Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-125451: Fix deadlock in ProcessPoolExecutor shutdown #125492

Merged
merged 6 commits into from
Oct 16, 2024

Conversation

colesbury
Copy link
Contributor

@colesbury colesbury commented Oct 15, 2024

There was a deadlock when ProcessPoolExecutor shuts down at the same time that a queueing thread handles an error when processing a task. Don't use _shutdown_lock to protect the _ThreadWakeup pipes -- use an internal lock instead. This fixes the ordering deadlock where the ExecutorManagerThread holds the _shutdown_lock and joins the queueing thread, while the queueing thread is attempting to acquire the _shutdown_lock while closing the _ThreadWakeup.

There was a deadlock when `ProcessPoolExecutor` shuts down at the same
time that a queueing thread handles an error when processing a task.
Don't use `_shutdown_lock` to protect the `_ThreadWakeup` pipes -- use
an internal lock instead. This fixes the ordering deadlock where the
`ExecutorManagerThread` holds the `_shutdown_lock` and joins the
queueing thread, while the queueing thread is attempting to acquire the
`_shutdown_lock` while closing the `_ThreadWakeup`.
@colesbury colesbury added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Oct 15, 2024
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @colesbury for commit 2efa056 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@colesbury colesbury added needs backport to 3.13 bugs and security fixes needs backport to 3.12 bug and security fixes 🔨 test-with-buildbots Test PR w/ buildbots; report in status section labels Oct 15, 2024
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @colesbury for commit 2a30e4f 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Oct 15, 2024
@colesbury colesbury marked this pull request as ready for review October 15, 2024 15:46
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Near line 724, please remove the comment:

        # _shutdown_lock must be locked to access _ThreadWakeup (...)

Oh, I made commit a4dfe8e in 2020 to fix another race condition:

commit a4dfe8ede5a37576e17035dccfe109ba7752237e
Author: Victor Stinner <vstinner@python.org>
Date:   Wed Apr 29 03:32:06 2020 +0200

    bpo-39995: Fix concurrent.futures _ThreadWakeup (GH-19760)
    
    Fix a race condition in concurrent.futures._ThreadWakeup: access to
    _ThreadWakeup is now protected with the shutdown lock.

Lib/concurrent/futures/process.py Outdated Show resolved Hide resolved
@vstinner
Copy link
Member

buildbot/x86 Debian Installed with X PR — Build done.
buildbot/x86 Debian Non-Debug with X PR — Build done.

The failure of these 2 workers is known: #125535

@colesbury
Copy link
Contributor Author

!buildbot AMD64 Fedora Stable Refleaks

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @colesbury for commit 84db201 🤖

The command will test the builders whose names match following regular expression: AMD64 Fedora Stable Refleaks

The builders matched are:

  • AMD64 Fedora Stable Refleaks PR

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I confirm that this change fix #125451 issue.

Using a lock per _ThreadWakeup, instead of reusing the shutdown lock, sounds like a good idea.

@colesbury colesbury merged commit 760872e into python:main Oct 16, 2024
37 checks passed
@miss-islington-app
Copy link

Thanks @colesbury for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

@colesbury colesbury deleted the gh-125451-test_processes_terminate branch October 16, 2024 15:39
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 16, 2024
…GH-125492)

There was a deadlock when `ProcessPoolExecutor` shuts down at the same
time that a queueing thread handles an error processing a task.

Don't use `_shutdown_lock` to protect the `_ThreadWakeup` pipes -- use
an internal lock instead. This fixes the ordering deadlock where the
`ExecutorManagerThread` holds the `_shutdown_lock` and joins the
queueing thread, while the queueing thread is attempting to acquire the
`_shutdown_lock` while closing the `_ThreadWakeup`.
(cherry picked from commit 760872e)

Co-authored-by: Sam Gross <colesbury@gmail.com>
@miss-islington-app
Copy link

Sorry, @colesbury, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 760872efecb95017db8e38a8eda614bf23d2a22c 3.12

@bedevere-app
Copy link

bedevere-app bot commented Oct 16, 2024

GH-125598 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Oct 16, 2024
colesbury added a commit to colesbury/cpython that referenced this pull request Oct 16, 2024
…ythonGH-125492)

There was a deadlock when `ProcessPoolExecutor` shuts down at the same
time that a queueing thread handles an error processing a task.

Don't use `_shutdown_lock` to protect the `_ThreadWakeup` pipes -- use
an internal lock instead. This fixes the ordering deadlock where the
`ExecutorManagerThread` holds the `_shutdown_lock` and joins the
queueing thread, while the queueing thread is attempting to acquire the
`_shutdown_lock` while closing the `_ThreadWakeup`.
(cherry picked from commit 760872e)

Co-authored-by: Sam Gross <colesbury@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Oct 16, 2024

GH-125599 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 bug and security fixes label Oct 16, 2024
colesbury added a commit that referenced this pull request Oct 16, 2024
…5492) (GH-125598)

There was a deadlock when `ProcessPoolExecutor` shuts down at the same
time that a queueing thread handles an error processing a task.

Don't use `_shutdown_lock` to protect the `_ThreadWakeup` pipes -- use
an internal lock instead. This fixes the ordering deadlock where the
`ExecutorManagerThread` holds the `_shutdown_lock` and joins the
queueing thread, while the queueing thread is attempting to acquire the
`_shutdown_lock` while closing the `_ThreadWakeup`.
(cherry picked from commit 760872e)

Co-authored-by: Sam Gross <colesbury@gmail.com>
colesbury added a commit that referenced this pull request Oct 16, 2024
…5492) (#125599)

There was a deadlock when `ProcessPoolExecutor` shuts down at the same
time that a queueing thread handles an error processing a task.

Don't use `_shutdown_lock` to protect the `_ThreadWakeup` pipes -- use
an internal lock instead. This fixes the ordering deadlock where the
`ExecutorManagerThread` holds the `_shutdown_lock` and joins the
queueing thread, while the queueing thread is attempting to acquire the
`_shutdown_lock` while closing the `_ThreadWakeup`.
(cherry picked from commit 760872e)
ebonnal pushed a commit to ebonnal/cpython that referenced this pull request Jan 12, 2025
…#125492)

There was a deadlock when `ProcessPoolExecutor` shuts down at the same
time that a queueing thread handles an error processing a task.

Don't use `_shutdown_lock` to protect the `_ThreadWakeup` pipes -- use
an internal lock instead. This fixes the ordering deadlock where the
`ExecutorManagerThread` holds the `_shutdown_lock` and joins the
queueing thread, while the queueing thread is attempting to acquire the
`_shutdown_lock` while closing the `_ThreadWakeup`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants