Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock at process shutdown #54918

Open
lpinca opened this issue Sep 13, 2024 · 14 comments
Open

Deadlock at process shutdown #54918

lpinca opened this issue Sep 13, 2024 · 14 comments
Labels
confirmed-bug Issues with confirmed bugs. help wanted Issues that need assistance from volunteers or PRs that need help to proceed. v8 engine Issues and PRs related to the V8 dependency.

Comments

@lpinca
Copy link
Member

lpinca commented Sep 13, 2024

Version

v23.0.0-pre

Platform

Linux USR-PC2HD4AD 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

There is a deadlock that prevents the Node.js process from exiting. The issue is causing a lot (all?) of timeout failures in our CI. It can be reproduced by running a test in parallel with our test.py tool, for example

$ ./tools/test.py --repeat=10000 test/parallel/test-http2-large-write-multiple-requests.js

See also

How often does it reproduce? Is there a required condition?

Rarely, but often enough to be a serious issue for CI.

What is the expected behavior? Why is that the expected behavior?

The process exits.

What do you see instead?

The process does not exit.

Additional information

Attaching gdb to two of the hanging processes obtained from the command above, produces the following outputs:

$ gdb -p 15884
GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 15884
[New LWP 15953]
[New LWP 15952]
[New LWP 15951]
[New LWP 15950]
[New LWP 15900]
[New LWP 15889]
[New LWP 15888]
[New LWP 15887]
[New LWP 15886]
[New LWP 15885]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Download failed: Invalid argument.  Continuing without source file ./nptl/./nptl/futex-internal.c.
0x00007f432025fd61 in __futex_abstimed_wait_common64 (private=21855, cancel=true, abstime=0x0, op=393, expected=0,
    futex_word=0x555fa9fce8f0) at ./nptl/futex-internal.c:57

warning: 57     ./nptl/futex-internal.c: No such file or directory
(gdb) bt
#0  0x00007f432025fd61 in __futex_abstimed_wait_common64 (private=21855, cancel=true, abstime=0x0, op=393, expected=0,
    futex_word=0x555fa9fce8f0) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=21855, abstime=0x0, clockid=0, expected=0,
    futex_word=0x555fa9fce8f0) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x555fa9fce8f0, expected=expected@entry=0,
    clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007f43202627dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x555fa9fce870,
    cond=0x555fa9fce8c8) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x555fa9fce8c8, mutex=0x555fa9fce870) at ./nptl/pthread_cond_wait.c:627
#5  0x0000555fa3d30d7d in uv_cond_wait (cond=<optimized out>, mutex=<optimized out>)
    at ../deps/uv/src/unix/thread.c:814
#6  0x0000555fa2c8c063 in node::NodePlatform::DrainTasks(v8::Isolate*) ()
#7  0x0000555fa2ac528d in node::SpinEventLoopInternal(node::Environment*) ()
#8  0x0000555fa2c4abc4 in node::NodeMainInstance::Run() ()
#9  0x0000555fa2b9233a in node::Start(int, char**) ()
#10 0x00007f43201f11ca in __libc_start_call_main (main=main@entry=0x555fa2ababa0 <main>, argc=argc@entry=2,
    argv=argv@entry=0x7ffe74216608) at ../sysdeps/nptl/libc_start_call_main.h:58
#11 0x00007f43201f128b in __libc_start_main_impl (main=0x555fa2ababa0 <main>, argc=2, argv=0x7ffe74216608,
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe742165f8)
    at ../csu/libc-start.c:360
#12 0x0000555fa2ac06a5 in _start ()
(gdb)
$ gdb -p 15885
GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 15885
Reading symbols from /home/luigi/node/out/Release/node...
Reading symbols from /lib/x86_64-linux-gnu/libstdc++.so.6...
Reading symbols from /home/luigi/.cache/debuginfod_client/40b9b0d17fdeebfb57331304da2b7f85e1396ef2/debuginfo...
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...
Reading symbols from /usr/lib/debug/.build-id/90/32976b3ecc78d1362fedfcd88528562bbfb7e4.debug...
Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...
Reading symbols from /home/luigi/.cache/debuginfod_client/92123f0e6223c77754bac47062c0b9713ed363df/debuginfo...
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
Reading symbols from /usr/lib/debug/.build-id/6d/64b17fbac799e68da7ebd9985ddf9b5cb375e6.debug...
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/.build-id/35/3e1b6cb0eebc08cf3ff812eae8a51b4efd684e.debug...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Download failed: Invalid argument.  Continuing without source file ./misc/../sysdeps/unix/sysv/linux/epoll_pwait.c.
0x00007f43202f0ee0 in __GI_epoll_pwait (epfd=epfd@entry=9, events=events@entry=0x7f43201bdcb0,
    maxevents=maxevents@entry=1024, timeout=timeout@entry=-1, set=set@entry=0x0)
    at ../sysdeps/unix/sysv/linux/epoll_pwait.c:40

warning: 40     ../sysdeps/unix/sysv/linux/epoll_pwait.c: No such file or directory
(gdb) bt
#0  0x00007f43202f0ee0 in __GI_epoll_pwait (epfd=epfd@entry=9, events=events@entry=0x7f43201bdcb0,
    maxevents=maxevents@entry=1024, timeout=timeout@entry=-1, set=set@entry=0x0)
    at ../sysdeps/unix/sysv/linux/epoll_pwait.c:40
#1  0x0000555fa3d364e7 in uv__io_poll (loop=loop@entry=0x555fa9fcea88, timeout=<optimized out>)
    at ../deps/uv/src/unix/linux.c:1432
#2  0x0000555fa3d20c17 in uv_run (loop=0x555fa9fcea88, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:448
#3  0x0000555fa2c8d4b1 in node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Start()::{lambda(void*)#1}::_FUN(void*)
    ()
#4  0x00007f4320263a94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#5  0x00007f43202f0c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
(gdb)
@lpinca
Copy link
Member Author

lpinca commented Sep 13, 2024

As written in #52550 (comment), applying the patch from #47452 seems to fix the issue, but it is not a solution since that patch brings other issues.

@avivkeller
Copy link
Member

This can been seen in various CI runs, assuming confirmed-bug

@lpinca
Copy link
Member Author

lpinca commented Sep 13, 2024

The output of lldb on macOS:

$ lldb -p 5033
(lldb) process attach --pid 5033
Process 5033 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00007ff802363c3a libsystem_kernel.dylib`__psynch_cvwait + 10
libsystem_kernel.dylib`:
->  0x7ff802363c3a <+10>: jae    0x7ff802363c44            ; <+20>
    0x7ff802363c3c <+12>: movq   %rax, %rdi
    0x7ff802363c3f <+15>: jmp    0x7ff8023617e0            ; cerror_nocancel
    0x7ff802363c44 <+20>: retq   
Target 0: (node) stopped.
Executable module set to "/Users/luigi/code/node/out/Release/node".
Architecture set to: x86_64h-apple-macosx-.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007ff802363c3a libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff8023a16f3 libsystem_pthread.dylib`_pthread_cond_wait + 1211
    frame #2: 0x000000010df93b23 node`uv_cond_wait(cond=0x00007f9a4d404108, mutex=0x00007f9a4d404098) at thread.c:798:7 [opt]
    frame #3: 0x000000010d0be52b node`node::NodePlatform::DrainTasks(v8::Isolate*) [inlined] node::LibuvMutexTraits::cond_wait(cond=0x00007f9a4d404108, mutex=0x00007f9a4d404098) at node_mutex.h:175:5 [opt]
    frame #4: 0x000000010d0be511 node`node::NodePlatform::DrainTasks(v8::Isolate*) [inlined] node::ConditionVariableBase<node::LibuvMutexTraits>::Wait(this=0x00007f9a4d404108, scoped_lock=<unavailable>) at node_mutex.h:249:3 [opt]
    frame #5: 0x000000010d0be511 node`node::NodePlatform::DrainTasks(v8::Isolate*) [inlined] node::TaskQueue<v8::Task>::BlockingDrain(this=0x00007f9a4d404098) at node_platform.cc:640:20 [opt]
    frame #6: 0x000000010d0be4fb node`node::NodePlatform::DrainTasks(v8::Isolate*) [inlined] node::WorkerThreadsTaskRunner::BlockingDrain(this=0x00007f9a4d404098) at node_platform.cc:214:25 [opt]
    frame #7: 0x000000010d0be4fb node`node::NodePlatform::DrainTasks(this=0x0000600002718000, isolate=<unavailable>) at node_platform.cc:467:33 [opt]
    frame #8: 0x000000010cf18a50 node`node::SpinEventLoopInternal(env=0x00007f9a2c814e00) at embed_helpers.cc:44:17 [opt]
    frame #9: 0x000000010d084083 node`node::NodeMainInstance::Run() [inlined] node::NodeMainInstance::Run(this=<unavailable>, exit_code=<unavailable>, env=0x00007f9a2c814e00) at node_main_instance.cc:111:9 [opt]
    frame #10: 0x000000010d083fe9 node`node::NodeMainInstance::Run(this=<unavailable>) at node_main_instance.cc:100:3 [opt]
    frame #11: 0x000000010cfe4159 node`node::Start(int, char**) [inlined] node::StartInternal(argc=<unavailable>, argv=<unavailable>) at node.cc:1489:24 [opt]
    frame #12: 0x000000010cfe3f54 node`node::Start(argc=<unavailable>, argv=<unavailable>) at node.cc:1496:27 [opt]
    frame #13: 0x00007ff802015345 dyld`start + 1909
(lldb)

@aduh95
Copy link
Contributor

aduh95 commented Sep 17, 2024

@lpinca could you try ab23463? I'm not able to reproduce the deadlock locally, so I can't confirm if it is effective.

@lpinca
Copy link
Member Author

lpinca commented Sep 17, 2024

@aduh95 see #54918 (comment). It seems the same patch.

@aduh95
Copy link
Contributor

aduh95 commented Sep 17, 2024

No it's not, it's based on it but it's meant to fix the test failures.

@targos
Copy link
Member

targos commented Sep 17, 2024

@lpinca
Copy link
Member Author

lpinca commented Sep 17, 2024

The issue persists 😞

gdb -p 18560
GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 18560
[New LWP 18621]
[New LWP 18620]
[New LWP 18619]
[New LWP 18618]
[New LWP 18574]
[New LWP 18565]
[New LWP 18564]
[New LWP 18563]
[New LWP 18562]
[New LWP 18561]
Downloading separate debug info for /lib/x86_64-linux-gnu/libstdc++.so.6
Downloading separate debug info for system-supplied DSO at 0x7ffdf50f4000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Download failed: Invalid argument.  Continuing without source file ./nptl/./nptl/futex-internal.c.
0x00007fca5f88ad61 in __futex_abstimed_wait_common64 (private=21976, cancel=true, abstime=0x0, op=393, expected=0,
    futex_word=0x55d834ea10e0) at ./nptl/futex-internal.c:57

warning: 57     ./nptl/futex-internal.c: No such file or directory
(gdb) bt
#0  0x00007fca5f88ad61 in __futex_abstimed_wait_common64 (private=21976, cancel=true, abstime=0x0, op=393, expected=0,
    futex_word=0x55d834ea10e0) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=21976, abstime=0x0, clockid=0, expected=0,
    futex_word=0x55d834ea10e0) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55d834ea10e0, expected=expected@entry=0,
    clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007fca5f88d7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55d834ea1060,
    cond=0x55d834ea10b8) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x55d834ea10b8, mutex=0x55d834ea1060) at ./nptl/pthread_cond_wait.c:627
#5  0x000055d82f131fed in uv_cond_wait (cond=<optimized out>, mutex=<optimized out>)
    at ../deps/uv/src/unix/thread.c:814
#6  0x000055d82e08c853 in node::NodePlatform::DrainTasks(v8::Isolate*) ()
#7  0x000055d82dec528d in node::SpinEventLoopInternal(node::Environment*) ()
#8  0x000055d82e04b364 in node::NodeMainInstance::Run() ()
#9  0x000055d82df9276a in node::Start(int, char**) ()
#10 0x00007fca5f81c1ca in __libc_start_call_main (main=main@entry=0x55d82debac50 <main>, argc=argc@entry=2,
    argv=argv@entry=0x7ffdf50a9a48) at ../sysdeps/nptl/libc_start_call_main.h:58
#11 0x00007fca5f81c28b in __libc_start_main_impl (main=0x55d82debac50 <main>, argc=2, argv=0x7ffdf50a9a48,
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffdf50a9a38)
    at ../csu/libc-start.c:360
#12 0x000055d82dec0785 in _start ()
(gdb)

Also, test/wasi/test-wasi-initialize-validation.js and test/wasi/test-wasi-start-validation.js fail.

$ ./node test/wasi/test-wasi-initialize-validation.js
(node:13188) ExperimentalWarning: WASI is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
Mismatched noop function calls. Expected exactly 1, actual 0.
    at Proxy.mustCall (/home/luigi/node/test/common/index.js:451:10)
    at Object.<anonymous> (/home/luigi/node/test/wasi/test-wasi-initialize-validation.js:154:18)
    at Module._compile (node:internal/modules/cjs/loader:1557:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1702:10)
    at Module.load (node:internal/modules/cjs/loader:1328:32)
    at Module._load (node:internal/modules/cjs/loader:1138:12)
    at TracingChannel.traceSync (node:diagnostics_channel:315:14)
    at wrapModuleLoad (node:internal/modules/cjs/loader:217:24)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:166:5)
$ ./node test/wasi/test-wasi-start-validation.js
(node:13199) ExperimentalWarning: WASI is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
Mismatched noop function calls. Expected exactly 1, actual 0.
    at Proxy.mustCall (/home/luigi/node/test/common/index.js:451:10)
    at Object.<anonymous> (/home/luigi/node/test/wasi/test-wasi-start-validation.js:151:18)
    at Module._compile (node:internal/modules/cjs/loader:1557:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1702:10)
    at Module.load (node:internal/modules/cjs/loader:1328:32)
    at Module._load (node:internal/modules/cjs/loader:1138:12)
    at TracingChannel.traceSync (node:diagnostics_channel:315:14)
    at wrapModuleLoad (node:internal/modules/cjs/loader:217:24)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:166:5)

@avivkeller avivkeller added the help wanted Issues that need assistance from volunteers or PRs that need help to proceed. label Oct 15, 2024
@avivkeller
Copy link
Member

Hey, I've added the help wanted Issues that need assistance from volunteers or PRs that need help to proceed. label to this issue. While all issues deserve TLC from volunteers, this one is causing a serious CI issue, and a little extra attention wouldn't hurt.

@avivkeller
Copy link
Member

avivkeller commented Oct 15, 2024

From my testing, the following snippet hangs occassionally, which might be related to this?:

tasks_drained_.Wait(scoped_lock);

outstanding_tasks_ is 2 when it hangs, 0 otherwise. It also looks like one of those tasks is completed, but the other one is not?

Some task isn't completing (but that was probably already known), do we know what task, or how to find out?


Update 1

It seems a task is getting stuck during BlockingPop, which might be connected to the issue.

Update 2

The NodeMainInstance is never properly destructed, which prevents the NodePlatform from shutting down. However, this situation appears to be a catch-22. The NodeMainInstance seems to finish only after DrainTasks is completed, but DrainTasks can’t finish until NodeMainInstance is shut down.

Update 3

Removing platform->DrainTasks(isolate); from SpinEventLoopInternal prevents the error from occurring (the draining still happens elsewhere without interruption, although other errors persist).

Update 4

It's not a catch-22, because the hanging process appears to be hanging before the process destruction would even occur. I suspect that if you bt all the threads, you'll see that DrainTasks isn't the only thing hanging,

Indeed, our hanging worker thread appears to be:

Thread 8 (Thread 0x7fb7c74006c0 (LWP 1322244) "node"):
#0  0x00007fb7cd49e22e in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55a4a5eefdc8) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x55a4a5eefdc8, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
#2  0x00007fb7cd49e2ab in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55a4a5eefdc8, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007fb7cd4a0990 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55a4a5eefd78, cond=0x55a4a5eefda0) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x55a4a5eefda0, mutex=0x55a4a5eefd78) at ./nptl/pthread_cond_wait.c:618
#5  0x000055a46a13ad8c in void heap::base::Stack::SetMarkerForBackgroundThreadAndCallbackImpl<v8::internal::LocalHeap::ExecuteWhileParked<v8::internal::CollectionBarrier::AwaitCollectionBackground(v8::internal::LocalHeap*)::{lambda()#1}>(v8::internal::CollectionBarrier::AwaitCollectionBackground(v8::internal::LocalHeap*)::{lambda()#1})::{lambda()#1}>(heap::base::Stack*, void*, void const*) ()
#6  0x000055a46acb27e3 in PushAllRegistersAndIterateStack ()
#7  0x000055a46a13b2df in v8::internal::CollectionBarrier::AwaitCollectionBackground(v8::internal::LocalHeap*) ()
#8  0x000055a46a1ad972 in v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) ()
#9  0x000055a46a1ae0a8 in v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) ()
#10 0x000055a46a1e4891 in v8::internal::LocalFactory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) ()
#11 0x000055a46a17a4fe in v8::internal::FactoryBase<v8::internal::LocalFactory>::NewProtectedFixedArray(int) ()
#12 0x000055a46a364a6d in v8::internal::DeoptimizationData::New(v8::internal::LocalIsolate*, int) ()
#13 0x000055a46a7b8d3f in v8::internal::maglev::MaglevCodeGenerator::GenerateDeoptimizationData(v8::internal::LocalIsolate*) ()
#14 0x000055a46a7b991b in v8::internal::maglev::MaglevCodeGenerator::BuildCodeObject(v8::internal::LocalIsolate*) [clone .part.0] ()
#15 0x000055a46a7d9948 in v8::internal::maglev::MaglevCodeGenerator::Assemble() ()
#16 0x000055a46a82f269 in v8::internal::maglev::MaglevCompiler::Compile(v8::internal::LocalIsolate*, v8::internal::maglev::MaglevCompilationInfo*) ()
#17 0x000055a46a830b49 in v8::internal::maglev::MaglevCompilationJob::ExecuteJobImpl(v8::internal::RuntimeCallStats*, v8::internal::LocalIsolate*) ()
#18 0x000055a469ffb6bb in v8::internal::OptimizedCompilationJob::ExecuteJob(v8::internal::RuntimeCallStats*, v8::internal::LocalIsolate*) ()
#19 0x000055a46a831047 in v8::internal::maglev::MaglevConcurrentDispatcher::JobTask::Run(v8::JobDelegate*) ()
#20 0x000055a46b21b45f in v8::platform::DefaultJobWorker::Run() ()
#21 0x000055a469cb125f in node::(anonymous namespace)::PlatformWorkerThread(void*) ()
#22 0x00007fb7cd4a1732 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#23 0x00007fb7cd51c2b8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Update 5

The hanging tasks is pushed via the following bt:

 1: 0x5582238b587b node::TaskQueue<v8::Task>::Push(std::unique_ptr<v8::Task, std::default_delete<v8::Task> >) [out/Release/node]
 2: 0x5582238b3d47 node::NodePlatform::PostTaskOnWorkerThreadImpl(v8::TaskPriority, std::unique_ptr<v8::Task, std::default_delete<v8::Task> >, v8::SourceLocation const&) [out/Release/node]
 3: 0x558224e25304 v8::platform::DefaultJobState::CallOnWorkerThread(v8::TaskPriority, std::unique_ptr<v8::Task, std::default_delete<v8::Task> >) [out/Release/node]
 4: 0x558224e2548f  [out/Release/node]
 5: 0x558223c0ca77  [out/Release/node]
 6: 0x558223c0e337 v8::internal::Compiler::CompileOptimized(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSFunction>, v8::internal::ConcurrencyMode, v8::internal::CodeKind) [out/Release/node]
 7: 0x55822423e64a v8::internal::Runtime_CompileOptimized(int, unsigned long*, v8::internal::Isolate*) [out/Release/node]
 8: 0x5581c496f276 

So it looks like this is a V8 hanging-task issue? I'll reach out to them and see if they know anything. (https://issues.chromium.org/issues/374285493)

@avivkeller avivkeller added the v8 engine Issues and PRs related to the V8 dependency. label Oct 18, 2024
dubzzz added a commit to dubzzz/fast-check that referenced this issue Oct 29, 2024
Retries are needed for now against Node 23. There is an open ticket on
V8 side (see https://issues.chromium.org/issues/374285493) and one on
Node side (see nodejs/node#54918) regarding
hanging issues of Node 23.
@lpinca
Copy link
Member Author

lpinca commented Nov 19, 2024

From the discussion in https://issues.chromium.org/issues/374285493 (closed as Won't Fix - Intended Behavior) the issue is not in V8 but in Node.js.

lpinca added a commit to lpinca/node that referenced this issue Nov 28, 2024
The issue is the same as other tests that time out.

Refs: nodejs#54918
lpinca added a commit to lpinca/node that referenced this issue Nov 28, 2024
The issue is likely the same as other tests that time out.

Refs: nodejs#54918
lpinca added a commit to lpinca/node that referenced this issue Nov 28, 2024
The issue is likely the same as other tests that time out.

Refs: nodejs#54918
Refs: nodejs#54844
lpinca added a commit to lpinca/node that referenced this issue Nov 28, 2024
The issue is the same as other tests that time out.

Refs: nodejs#54918
Refs: nodejs#54802
lpinca added a commit to lpinca/node that referenced this issue Nov 28, 2024
The issue is likely the same as other tests that time out.

Refs: nodejs#54918
Refs: nodejs#54802
lpinca added a commit to lpinca/node that referenced this issue Nov 28, 2024
The issue is likely the same as other tests that time out.

Refs: nodejs#54918
Refs: nodejs#54844
Refs: nodejs#54802
lpinca added a commit to lpinca/node that referenced this issue Nov 28, 2024
Move `test/sequential/test-worker-arraybuffer-zerofill.js` back to
`test/parallel/test-worker-arraybuffer-zerofill.js` and remove the
flaky designation.

The original issue is likely the same as other tests that time out.

Refs: nodejs#54918
Refs: nodejs#54839
Refs: nodejs#54802
aduh95 pushed a commit that referenced this issue Dec 13, 2024
Move `test/sequential/test-worker-arraybuffer-zerofill.js` back to
`test/parallel/test-worker-arraybuffer-zerofill.js` and remove the
flaky designation.

The original issue is likely the same as other tests that time out.

Refs: #54918
Refs: #54839
Refs: #54802
PR-URL: #56053
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
aduh95 pushed a commit that referenced this issue Dec 13, 2024
The issue is likely the same as other tests that time out.

Refs: #54918
Refs: #54844
Refs: #54802
PR-URL: #56052
Reviewed-By: Michaël Zasso <targos@protonmail.com>
aduh95 pushed a commit that referenced this issue Dec 13, 2024
Move `test/sequential/test-worker-arraybuffer-zerofill.js` back to
`test/parallel/test-worker-arraybuffer-zerofill.js` and remove the
flaky designation.

The original issue is likely the same as other tests that time out.

Refs: #54918
Refs: #54839
Refs: #54802
PR-URL: #56053
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
aduh95 pushed a commit that referenced this issue Dec 13, 2024
The issue is likely the same as other tests that time out.

Refs: #54918
Refs: #54844
Refs: #54802
PR-URL: #56052
Reviewed-By: Michaël Zasso <targos@protonmail.com>
aduh95 pushed a commit that referenced this issue Dec 13, 2024
Move `test/sequential/test-worker-arraybuffer-zerofill.js` back to
`test/parallel/test-worker-arraybuffer-zerofill.js` and remove the
flaky designation.

The original issue is likely the same as other tests that time out.

Refs: #54918
Refs: #54839
Refs: #54802
PR-URL: #56053
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
aduh95 pushed a commit that referenced this issue Dec 13, 2024
The issue is likely the same as other tests that time out.

Refs: #54918
Refs: #54844
Refs: #54802
PR-URL: #56052
Reviewed-By: Michaël Zasso <targos@protonmail.com>
nodejs-github-bot pushed a commit that referenced this issue Dec 17, 2024
The issue is likely the same as other tests that time out.

Refs: #54918
Refs: #54802
PR-URL: #56051
Reviewed-By: Michaël Zasso <targos@protonmail.com>
aduh95 pushed a commit that referenced this issue Dec 18, 2024
Move `test/sequential/test-worker-arraybuffer-zerofill.js` back to
`test/parallel/test-worker-arraybuffer-zerofill.js` and remove the
flaky designation.

The original issue is likely the same as other tests that time out.

Refs: #54918
Refs: #54839
Refs: #54802
PR-URL: #56053
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
aduh95 pushed a commit that referenced this issue Dec 18, 2024
The issue is likely the same as other tests that time out.

Refs: #54918
Refs: #54844
Refs: #54802
PR-URL: #56052
Reviewed-By: Michaël Zasso <targos@protonmail.com>
aduh95 pushed a commit that referenced this issue Dec 18, 2024
The issue is likely the same as other tests that time out.

Refs: #54918
Refs: #54802
PR-URL: #56051
Reviewed-By: Michaël Zasso <targos@protonmail.com>
aduh95 pushed a commit that referenced this issue Dec 18, 2024
The issue is likely the same as other tests that time out.

Refs: #54918
Refs: #54802
PR-URL: #56051
Reviewed-By: Michaël Zasso <targos@protonmail.com>
nodejs-github-bot pushed a commit that referenced this issue Dec 20, 2024
The issue is the same as other tests that time out.

Refs: #54918
Refs: #54802
PR-URL: #56048
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
ruyadorno pushed a commit that referenced this issue Dec 20, 2024
The issue is likely the same as other tests that time out.

Refs: #54918
Refs: #54802
PR-URL: #56051
Reviewed-By: Michaël Zasso <targos@protonmail.com>
ruyadorno pushed a commit that referenced this issue Dec 21, 2024
The issue is likely the same as other tests that time out.

Refs: #54918
Refs: #54802
PR-URL: #56051
Reviewed-By: Michaël Zasso <targos@protonmail.com>
lpinca added a commit to lpinca/node that referenced this issue Dec 25, 2024
The original issue is likely the same as other tests that time out.

Refs: nodejs#54918
Refs: nodejs@84c2e712ebcd0f32dc0e
RefS: nodejs#52959
@mcollina
Copy link
Member

@Qard did you work on a similar issue?

@Qard
Copy link
Member

Qard commented Dec 26, 2024

You're thinking of #56191? I don't think it's related, but maybe something that was expected to happen is not now with the bailout when it can't call into JS? Can't see how that'd be possible though. 🤔

@mcollina
Copy link
Member

I mean that it might be a similar bug, more or less in the same spot.

nodejs-github-bot pushed a commit that referenced this issue Dec 28, 2024
The original issue is likely the same as other tests that time out.

Refs: #54918
Refs: 84c2e712ebcd0f32dc0e
RefS: #52959
PR-URL: #56365
Refs: #52959
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Matthew Aitken <maitken033380023@gmail.com>
aduh95 pushed a commit that referenced this issue Jan 2, 2025
The issue is the same as other tests that time out.

Refs: #54918
Refs: #54802
PR-URL: #56048
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
aduh95 pushed a commit that referenced this issue Jan 2, 2025
The original issue is likely the same as other tests that time out.

Refs: #54918
Refs: 84c2e712ebcd0f32dc0e
RefS: #52959
PR-URL: #56365
Refs: #52959
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Matthew Aitken <maitken033380023@gmail.com>
lpinca added a commit to lpinca/node that referenced this issue Jan 4, 2025
The original issue is likely the same as other tests that time out.

Refs: nodejs#54918
Refs: nodejs#53595
Refs: nodejs#53751
ruyadorno pushed a commit that referenced this issue Jan 5, 2025
Move `test/sequential/test-worker-arraybuffer-zerofill.js` back to
`test/parallel/test-worker-arraybuffer-zerofill.js` and remove the
flaky designation.

The original issue is likely the same as other tests that time out.

Refs: #54918
Refs: #54839
Refs: #54802
PR-URL: #56053
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: LiviaMedeiros <livia@cirno.name>
Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
ruyadorno pushed a commit that referenced this issue Jan 5, 2025
The issue is likely the same as other tests that time out.

Refs: #54918
Refs: #54844
Refs: #54802
PR-URL: #56052
Reviewed-By: Michaël Zasso <targos@protonmail.com>
ruyadorno pushed a commit that referenced this issue Jan 5, 2025
The issue is likely the same as other tests that time out.

Refs: #54918
Refs: #54802
PR-URL: #56051
Reviewed-By: Michaël Zasso <targos@protonmail.com>
jasnell pushed a commit that referenced this issue Jan 6, 2025
The original issue is likely the same as other tests that time out.

Refs: #54918
Refs: #53595
Refs: #53751
PR-URL: #56472
Reviewed-By: Michael Dawson <midawson@redhat.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed-bug Issues with confirmed bugs. help wanted Issues that need assistance from volunteers or PRs that need help to proceed. v8 engine Issues and PRs related to the V8 dependency.
Projects
None yet
Development

No branches or pull requests

6 participants