Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-104341: Fix threading Module Shutdown #104560

Draft
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

ericsnowcurrently
Copy link
Member

@ericsnowcurrently ericsnowcurrently commented May 16, 2023

This should fix the frequent crashes in test_threading and test__xxsubinterpreters.

There is some duplication with the threading module which could be removed after this lands. We can also eliminate PyThreadState.on_delete and PyInterpreterState.threads.count, both of which are cases where the state of the threading module leaked out into the runtime.

(Also see gh-63008 and gh-18296.)

Comment on lines 86 to 92
PyThread_acquire_lock(threads->mutex, WAIT_LOCK);
while (threads->head != NULL) {
PyThread_release_lock(threads->mutex);
// XXX Sleep?
PyThread_acquire_lock(threads->mutex, WAIT_LOCK);
}
PyThread_release_lock(threads->mutex);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, this doesn't feel right. I'm going to look at what my options are. A condvar would work, but that doesn't seem to be something we do in CPython outside the GIL.

(I'm also open to ideas.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've come up with a slightly better approach.

@ericsnowcurrently ericsnowcurrently added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 16, 2023
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @ericsnowcurrently for commit 51d960e 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 16, 2023
@@ -22,7 +22,243 @@
static struct PyModuleDef thread_module;


/* threads owned by the modulo */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: module?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@ericsnowcurrently
Copy link
Member Author

FTR, I ran the buildbots on commit 51d960e: https://buildbot.python.org/all/#/changes/22637.

@ericsnowcurrently
Copy link
Member Author

ericsnowcurrently commented May 17, 2023

Based on CI and the buildbots, I'm thinking there may be something wrong with this PR on macOS. I'll take a look.


Run failed when multiple fork-related tests timed out:

On the "x86-64 macOS PR" builder, the run was cancelled 3 times:

@ericsnowcurrently
Copy link
Member Author

ericsnowcurrently commented May 17, 2023

CI run 1

(https://github.com/python/cpython/actions/runs/4996253064/jobs/8949214929)

12 test modules failed before the CI job was canceled.

(details)

test module timed out after 20 minutes (1200s):

  • test_multiprocessing_main_handling (test_directory_compiled)
  • test_httpservers (test_accept)
  • test_fork1 (test_wait in fork_wait.py)
  • test_thread (test_forkinthread)
  • test_socketserver (test_forking_handled)
  • test_asyncio (test_fork_asyncio_run)
  • test_os (test_fork_warns_when_non_python_thread_exists)

individual tests timed out:

  • test.fork_wait.ForkWait.test_wait, via test_wait4 (300s)
  • test.test_wait4.Wait4Test.test_wait (30s)
  • test.test_tracemalloc.TestTracemallocEnabled.test_fork (300s)
  • test.test_pty.PtyTest.test_fork
  • test.test_pty.PtyTest.test_spawn_doesnt_hang
  • test.test_uuid.TestUUIDWithExtModule.testIssue8621 (300s)
  • test.test_uuid.TestUUIDWithoutExtModule.testIssue8621 (300s)
  • test.test_support.TestSupport.test_reap_children (30s)
  • test.test_support.TestSupport.test_temp_dir__forked_child (300s)

still running:

  • test_mailbox
  • test_wait3
  • test_tempfile
  • test_random

CI run 2

(https://github.com/python/cpython/actions/runs/4997616864/jobs/8952165246)

12 test modules failed before the CI job was canceled.

(details)

test module timed out after 20 minutes (1200s):

  • test_thread (test_forkinthread)
  • test_subprocess (test_close_fds_after_preexec)
  • test_concurrent_futures (tearDown)
  • test_builtin (test_input_no_stdout_fileno)
  • test_socketserver (test_forking_handled)
  • test_httpservers (test_accept)
  • test.test_asyncio.test_unix_events (test_fork_asyncio_run)

individual tests timed out:

  • test.fork_wait.ForkWait.test_wait, via test_wait3 (300s)
  • test.test_wait3.Wait3Test.test_wait (30s)
  • test.test_uuid.TestUUIDWithExtModule.testIssue8621 (300s)
  • test.test_uuid.TestUUIDWithoutExtModule.testIssue8621 (300s)
  • test.fork_wait.ForkWait.test_wait, via test_wait4 (300s)
  • test.test_wait4.Wait4Test.test_wait (30s)
  • test.test_pty.PtyTest.test_fork
  • test.test_logging.HandlerTest.test_post_fork_child_no_deadlock (300s)

still running:

  • test_multiprocessing_spawn
  • test_multiprocessing_main_handling
  • test_os
  • test_tempfile

Buildbot (ARM64 macOS PR)

(https://buildbot.python.org/all/#/builders/721/builds/1163/steps/5/logs/stdio)

25 test modules failed

(details)

test module timed out after 15 minutes (900s):

  • test_httpservers (test_accept)
  • test_threading (test_clear_threads_states_after_fork)
  • test_fork1 (test_wait in fork_wait.py)
  • test_concurrent_futures (test_duplicate_futures)
  • test_multiprocessing_spawn (test_context)
  • test_tempfile (test_noinherit)
  • test_socketserver (test_forking_handled)
  • test_os (test_fork_warns_when_non_python_thread_exists)
  • test_mailbox (test_lock_conflict)
  • test_subprocess (test_close_fds_after_preexec)
  • test_thread (test_forkinthread)
  • test.test_asyncio.test_unix_events (test_fork_asyncio_run)
  • test_multiprocessing_main_handling (test_directory)
  • test_builtin (test_input_no_stdout_fileno)
  • test_cmd_line (test_no_std_streams)
  • test_random (test_after_fork)
  • test_multiprocessing_forkserver (test_closefd)

individual tests timed out:

  • test.test_tracemalloc.TestTracemallocEnabled.test_fork (300s)
  • test.fork_wait.ForkWait.test_wait, via test_wait4 (300s)
  • test.test_wait4.Wait4Test.test_wait (30s)
  • test.test_support.TestSupport.test_reap_children (30s)
  • test.test_support.TestSupport.test_temp_dir__forked_child (300s)
  • test.fork_wait.ForkWait.test_wait, via test_wait3 (300s)
  • test.test_wait3.Wait3Test.test_wait (30s)
  • test.test_platform.PlatformTest.test_mac_ver_with_fork (300s)
  • test.test_pty.PtyTest.test_fork
  • test.test_pty.PtyTest.test_spawn_doesnt_hang
  • test.test_logging.HandlerTest.test_post_fork_child_no_deadlock (300s)
  • test.test_uuid.TestUUIDWithExtModule.testIssue8621 (300s)
  • test.test_uuid.TestUUIDWithoutExtModule.testIssue8621 (300s)

@arhadthedev
Copy link
Member

arhadthedev commented May 17, 2023

I reran the job and got the same result (the tests are not done yet but started to time out already).

@ericsnowcurrently ericsnowcurrently marked this pull request as draft May 18, 2023 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants