Deadlock at process shutdown #54918
Comments
As written in #52550 (comment), applying the patch from #47452 seems to fix the issue, but it is not a solution since that patch brings other issues. |
This can be seen in various CI runs, assuming confirmed-bug |
The output of
|
@aduh95 see #54918 (comment). It seems the same patch. |
No it's not, it's based on it but it's meant to fix the test failures. |
The issue persists 😞
Also,
|
Hey, I've added the
help wanted
|
From my testing, the following snippet hangs occasionally, which might be related to this?: Line 640 in 2545b9e
Some task isn't completing (but that was probably already known), do we know what task, or how to find out?
Update 1 It seems a task is getting stuck during
Update 4 It's not a catch-22, because the hanging process appears to be hanging before the process destruction would even occur. I suspect that if you
Indeed, our hanging worker thread appears to be:
Thread 8 (Thread 0x7fb7c74006c0 (LWP 1322244) "node"):
#0 0x00007fb7cd49e22e in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55a4a5eefdc8) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x55a4a5eefdc8, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
#2 0x00007fb7cd49e2ab in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55a4a5eefdc8, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x00007fb7cd4a0990 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55a4a5eefd78, cond=0x55a4a5eefda0) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x55a4a5eefda0, mutex=0x55a4a5eefd78) at ./nptl/pthread_cond_wait.c:618
#5 0x000055a46a13ad8c in void heap::base::Stack::SetMarkerForBackgroundThreadAndCallbackImpl<v8::internal::LocalHeap::ExecuteWhileParked<v8::internal::CollectionBarrier::AwaitCollectionBackground(v8::internal::LocalHeap*)::{lambda()#1}>(v8::internal::CollectionBarrier::AwaitCollectionBackground(v8::internal::LocalHeap*)::{lambda()#1})::{lambda()#1}>(heap::base::Stack*, void*, void const*) ()
#6 0x000055a46acb27e3 in PushAllRegistersAndIterateStack ()
#7 0x000055a46a13b2df in v8::internal::CollectionBarrier::AwaitCollectionBackground(v8::internal::LocalHeap*) ()
#8 0x000055a46a1ad972 in v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) ()
#9 0x000055a46a1ae0a8 in v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) ()
#10 0x000055a46a1e4891 in v8::internal::LocalFactory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) ()
#11 0x000055a46a17a4fe in v8::internal::FactoryBase<v8::internal::LocalFactory>::NewProtectedFixedArray(int) ()
#12 0x000055a46a364a6d in v8::internal::DeoptimizationData::New(v8::internal::LocalIsolate*, int) ()
#13 0x000055a46a7b8d3f in v8::internal::maglev::MaglevCodeGenerator::GenerateDeoptimizationData(v8::internal::LocalIsolate*) ()
#14 0x000055a46a7b991b in v8::internal::maglev::MaglevCodeGenerator::BuildCodeObject(v8::internal::LocalIsolate*) [clone .part.0] ()
#15 0x000055a46a7d9948 in v8::internal::maglev::MaglevCodeGenerator::Assemble() ()
#16 0x000055a46a82f269 in v8::internal::maglev::MaglevCompiler::Compile(v8::internal::LocalIsolate*, v8::internal::maglev::MaglevCompilationInfo*) ()
#17 0x000055a46a830b49 in v8::internal::maglev::MaglevCompilationJob::ExecuteJobImpl(v8::internal::RuntimeCallStats*, v8::internal::LocalIsolate*) ()
#18 0x000055a469ffb6bb in v8::internal::OptimizedCompilationJob::ExecuteJob(v8::internal::RuntimeCallStats*, v8::internal::LocalIsolate*) ()
#19 0x000055a46a831047 in v8::internal::maglev::MaglevConcurrentDispatcher::JobTask::Run(v8::JobDelegate*) ()
#20 0x000055a46b21b45f in v8::platform::DefaultJobWorker::Run() ()
#21 0x000055a469cb125f in node::(anonymous namespace)::PlatformWorkerThread(void*) ()
#22 0x00007fb7cd4a1732 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#23 0x00007fb7cd51c2b8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Update 5 The hanging task is pushed via the following bt:
So it looks like this is a V8 hanging-task issue? I'll reach out to them and see if they know anything. (https://issues.chromium.org/issues/374285493) |
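To make the shape of that backtrace easier to reason about, here is a minimal sketch of the wait pattern it shows: a background job parks on a condition variable until another thread performs a collection. This is only an analogy in Python, not V8 or Node.js code. The names `simulate_shutdown`, `background_job`, `gc_requested` and `gc_done` are made up for illustration, and the idea that the main thread stops servicing the request while it waits for the job is an assumption, not something confirmed in this thread. The real wait has no timeout (`abstime=0x0` in the frames above); the sketch bounds it so the demo always terminates.

```python
import threading


def simulate_shutdown(main_services_gc: bool, timeout: float = 0.2) -> bool:
    """Return True if the background job completed, False if it would hang.

    Hypothetical model: the job needs a "collection" that only the main
    thread can perform, and it parks on a condition variable until then.
    """
    cond = threading.Condition()
    state = {"gc_requested": False, "gc_done": False, "completed": False}

    def background_job() -> None:
        with cond:
            # The job's allocation failed, so it asks for a collection...
            state["gc_requested"] = True
            cond.notify_all()
            # ...and parks. The real code waits forever; this demo waits
            # at most `timeout` seconds so it can report the would-be hang.
            state["completed"] = cond.wait_for(
                lambda: state["gc_done"], timeout=timeout
            )

    worker = threading.Thread(target=background_job)
    worker.start()

    if main_services_gc:
        with cond:
            # Main thread notices the request and performs the collection.
            cond.wait_for(lambda: state["gc_requested"], timeout=timeout)
            state["gc_done"] = True
            cond.notify_all()

    # Assumed shutdown behaviour: the main thread only waits for the job.
    worker.join()
    return state["completed"]
```

With `main_services_gc=False` the job only gets out because of the artificial timeout, which is the situation an unbounded `pthread_cond_wait` would turn into a hang.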
Retries are needed for now against Node 23. There is an open ticket on V8 side (see https://issues.chromium.org/issues/374285493) and one on Node side (see nodejs/node#54918) regarding hanging issues of Node 23.
From the discussion in https://issues.chromium.org/issues/374285493 (closed as Won't Fix - Intended Behavior) the issue is not in V8 but in Node.js. |
The issue is the same as other tests that time out. Refs: nodejs#54918
The issue is likely the same as other tests that time out. Refs: nodejs#54918
The issue is likely the same as other tests that time out. Refs: nodejs#54918 Refs: nodejs#54844
The issue is the same as other tests that time out. Refs: nodejs#54918 Refs: nodejs#54802
The issue is likely the same as other tests that time out. Refs: nodejs#54918 Refs: nodejs#54802
The issue is likely the same as other tests that time out. Refs: nodejs#54918 Refs: nodejs#54844 Refs: nodejs#54802
Move `test/sequential/test-worker-arraybuffer-zerofill.js` back to `test/parallel/test-worker-arraybuffer-zerofill.js` and remove the flaky designation. The original issue is likely the same as other tests that time out. Refs: nodejs#54918 Refs: nodejs#54839 Refs: nodejs#54802
Move `test/sequential/test-worker-arraybuffer-zerofill.js` back to `test/parallel/test-worker-arraybuffer-zerofill.js` and remove the flaky designation. The original issue is likely the same as other tests that time out. Refs: #54918 Refs: #54839 Refs: #54802 PR-URL: #56053 Reviewed-By: Richard Lau <rlau@redhat.com> Reviewed-By: LiviaMedeiros <livia@cirno.name> Reviewed-By: Jake Yuesong Li <jake.yuesong@gmail.com>
The original issue is likely the same as other tests that time out. Refs: nodejs#54918 Refs: nodejs@84c2e712ebcd0f32dc0e RefS: nodejs#52959
@Qard did you work on a similar issue? |
You're thinking of #56191? I don't think it's related, but maybe something that was expected to happen is not now with the bailout when it can't call into JS? Can't see how that'd be possible though. 🤔 |
I mean that it might be a similar bug, more or less in the same spot. |
The original issue is likely the same as other tests that time out. Refs: #54918 Refs: 84c2e712ebcd0f32dc0e RefS: #52959 PR-URL: #56365 Refs: #52959 Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Matthew Aitken <maitken033380023@gmail.com>
The original issue is likely the same as other tests that time out. Refs: nodejs#54918 Refs: nodejs#53595 Refs: nodejs#53751
Version
v23.0.0-pre
Platform
Subsystem
No response
What steps will reproduce the bug?
There is a deadlock that prevents the Node.js process from exiting. The issue is causing a lot (all?) of timeout failures in our CI. It can be reproduced by running a test in parallel with our `test.py` tool, for example
See also
test-stream-readable-unpipe-resume #54133 (comment)
test-net-write-fully-async-buffer as flaky #52959 (comment)
How often does it reproduce? Is there a required condition?
Rarely, but often enough to be a serious issue for CI.
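Because the hang is rare, it helps to run the same test many times concurrently and count the runs that never exit. The following is a rough stand-alone sketch of that idea, not the `test.py` tool itself; `run_once` and `stress` are hypothetical helper names, and the default repeat count, job count and timeout are arbitrary.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_once(cmd, timeout):
    """Run `cmd` once and classify the result as 'ok', 'fail' or 'hang'."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # on expiry the child is killed and reaped
        )
    except subprocess.TimeoutExpired:
        return "hang"
    return "ok" if proc.returncode == 0 else "fail"


def stress(cmd, repeat=100, jobs=8, timeout=30.0):
    """Run `cmd` `repeat` times, `jobs` at a time, and tally the outcomes."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        results = list(pool.map(lambda _: run_once(cmd, timeout), range(repeat)))
    return {kind: results.count(kind) for kind in ("ok", "fail", "hang")}
```

A call could look like `stress(["out/Release/node", "test/parallel/test-stream-readable-unpipe-resume.js"])`, where both the binary path and the test path are assumptions about the local checkout. Note that a timed-out child is killed, so to inspect a hanging process with a debugger the timeout handling would need to leave it running instead.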
What is the expected behavior? Why is that the expected behavior?
The process exits.
What do you see instead?
The process does not exit.
Additional information
Attaching `gdb` to two of the hanging processes obtained from the command above produces the following outputs:
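For anyone who wants to collect the same kind of data, backtraces for every thread of a stuck process can be dumped non-interactively. A small sketch, assuming `gdb` is installed and the caller is allowed to attach to the process; the helper name `gdb_backtrace_argv` is made up for illustration.

```python
import shlex


def gdb_backtrace_argv(pid: int) -> list[str]:
    """Build a gdb command line that prints all thread backtraces of `pid`."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    # -p attaches to the process, -batch exits once the commands have run,
    # -ex runs the given gdb command.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def gdb_backtrace_command(pid: int) -> str:
    """Return the same command as a shell-quoted string, for copy and paste."""
    return shlex.join(gdb_backtrace_argv(pid))
```

The argv list can be passed to `subprocess.run(..., capture_output=True, text=True)` to save the output next to the CI logs.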