gh-112069: Make sets thread-safe with the GIL disabled #113800

Merged on Mar 8, 2024 (28 commits)

Member

@tomasr8 tomasr8 commented Jan 7, 2024

Still have some failing tests in the GIL version so marking as a draft for now until I manage to fix them.
This PR is mostly based on Sam's version: colesbury/nogil-3.12@4ca2924f0d

Since sets have a lot of methods that are exposed, I made this list to keep track and help with the review:

set methods

Method Thread-safe (comments)
set_add
set_clear
set.__contains__
set_copy
set_discard
set_difference_multi
set_difference_update
set_intersection_multi
set_intersection_update_multi
set_isdisjoint
set_issubset
set_issuperset
set_pop
set.__reduce__
set_remove
set.__sizeof__
set_symmetric_difference
set_symmetric_difference_update
set_union
set_update

set_as_number

Method Thread-safe (comments)
set_sub
set_and
set_xor ✅ (via set_symmetric_difference)
set_or
set_isub
set_iand
set_ixor
set_ior

set tp_..

Method Thread-safe (comments)
set_dealloc Not needed
set_repr
set_as_sequence(set_len, set_contains)
set_traverse Not needed
set_clear_internal Not needed
set_richcompare ❌ (will be done separately)
set_iter
set_init
set_new
set_vectorcall

There are also some extra frozenset methods which I don't think need locking: frozenset_copy, frozenset_hash, frozenset_new, frozenset_vectorcall. The rest of frozenset methods are shared with sets.

C API methods:

Method Thread-safe (comments)
PySet_New
PyFrozenSet_New
PySet_Size ✅ (via set_len)
PySet_Clear ✅ (via set_clear)
PySet_Contains
PySet_Discard
PySet_Add
_PySet_NextEntry
PySet_Pop ✅ (via set_pop)
_PySet_Update

@colesbury
Contributor

Thanks @tomasr8! I will look at this today or tomorrow.

Contributor

@colesbury colesbury left a comment

Hi @tomasr8, thanks so much for working on this!

First, please let me know if you would like help debugging the test failures.

One of the challenges with this is that it's hard to know at a glance if a function does locking internally or assumes that the caller is supposed to do the locking. I think this will be easier to maintain if we generally push the locking to the outermost functions. Argument clinic makes this a bit easier by generating some of the critical section calls. I think we should:

  1. Convert the functions to use argument clinic with the @critical_section directive.
  2. For C APIs (like PySet_Add()) put the critical section code in the outermost function.

Sorry, this suggestion wasn't mentioned in the issue -- support for @critical_section in Argument Clinic was added a few days after I wrote up the set issue.

At the top of the file:

/*[clinic input]
class set "PySetObject *" "&PySet_Type"
[clinic start generated code]*/
/*[clinic end generated code: output=da39a3ee5e6b4b0d input=abe13a1b24961902]*/

#include "clinic/setobject.c.h"

And then, set_add would look like:

/*[clinic input]
@critical_section
set.add

    key: object
    /

Add an element to a set.

This has no effect if the element is already present.
[clinic start generated code]*/

static PyObject *
set_add_impl(PySetObject *self, PyObject *key)
/*[clinic end generated code: output=8d849b1bd2bd8b3a input=33fb8030b1ad21f5]*/
{
    if (set_add_key(self, key))
        return NULL;
    Py_RETURN_NONE;
}
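
The convention described above (helpers assume the lock is held, and only the outermost entry point takes it) can be sketched outside CPython. This is a minimal illustration, not the PR's code: `toy_set`, `toy_lock`, `toy_add_key_locked`, and `toy_set_add` are hypothetical names, and a small spin lock stands in for a per-object critical section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical container; NOT CPython's PySetObject. */
typedef struct {
    atomic_flag lock;   /* stand-in for a per-object lock */
    int items[8];
    size_t used;
} toy_set;

static void toy_lock(toy_set *s)
{
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire)) {
        /* spin until the flag is released */
    }
}

static void toy_unlock(toy_set *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Inner helper: assumes the caller already holds s->lock. */
static int toy_add_key_locked(toy_set *s, int key)
{
    for (size_t i = 0; i < s->used; i++) {
        if (s->items[i] == key) {
            return 0;               /* already present, nothing to do */
        }
    }
    if (s->used == 8) {
        return -1;                  /* full */
    }
    s->items[s->used++] = key;
    return 0;
}

/* Outermost entry point: the only place that takes the lock. */
int toy_set_add(toy_set *s, int key)
{
    toy_lock(s);
    int rv = toy_add_key_locked(s, key);
    toy_unlock(s);
    return rv;
}
```

Keeping the lock in one layer makes it possible to tell from a function's name or position whether it locks, which is the maintainability concern raised in the review.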

@colesbury
Contributor

On the thread-safety table:

Do not need to be thread-safe: set_dealloc , set_traverse, set_clear_internal (i.e., tp_clear hook).

set_richcompare should be thread-safe, but we may want to defer and just make a note of it
_PySet_NextEntry needs to be addressed separately due to borrowed reference counts
set_iter should be thread-safe. I don't think it needs locking, just load the size once via an atomic op.
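
As a rough illustration of "load the size once via an atomic op", here is a standalone C11 sketch. It is not the CPython implementation: `toy_set`, `toy_iter`, and both functions are hypothetical, and `<stdatomic.h>` is used in place of CPython's internal atomic helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical set whose element count may be written by other threads. */
typedef struct {
    atomic_long used;
} toy_set;

/* Hypothetical iterator that remembers the size it saw at creation. */
typedef struct {
    toy_set *set;
    long size_at_start;
    long pos;
} toy_iter;

/* Creating the iterator takes no lock: one relaxed atomic load is enough
 * to get a consistent (untorn) snapshot of the size. */
void toy_iter_init(toy_iter *it, toy_set *s)
{
    it->set = s;
    it->size_at_start = atomic_load_explicit(&s->used, memory_order_relaxed);
    it->pos = 0;
}

/* Returns 1 if the set's size differs from the snapshot. */
int toy_iter_size_changed(toy_iter *it)
{
    long now = atomic_load_explicit(&it->set->used, memory_order_relaxed);
    return now != it->size_at_start;
}
```

A relaxed load only guarantees the value is read in one piece; it gives no ordering with respect to other memory, which is sufficient for a snapshot used purely as a "did it change" check.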

@tomasr8
Member Author

tomasr8 commented Jan 9, 2024

Thanks a lot for the review! I wasn't aware of the new AC directive - I'll update the PR. As for the test failures, I haven't had the time yet to really dig into it, hopefully I'll have the time tomorrow. If I can't figure it out myself, I'll let you know :)

@erlend-aasland
Contributor

This is growing to be a large diff. I think we should consider splitting this up in two PRs:

  1. adapt the needed methods to Argument Clinic
  2. make it thread safe

@tomasr8
Member Author

tomasr8 commented Feb 6, 2024

> This is growing to be a large diff. I think we should consider splitting this up in two PRs:
>
> 1. adapt the needed methods to Argument Clinic
> 2. make it thread safe

Makes sense. I'll open a separate PR for the AC changes once I figure out how to convert all the methods :)

@erlend-aasland
Contributor

> I'll open a separate PR for the AC changes once I figure out how to convert all the methods :)

Great :) You can open a draft PR and ping me if you need help!

@tomasr8
Member Author

tomasr8 commented Feb 11, 2024

I've adapted the PR to use @critical_section wherever I could. Everywhere else I tried to put the critical sections in the outermost functions.

The only problem I still have is that the nogil build fails when I use the critical section macros in the C API functions:

./configure --with-pydebug --disable-gil --config-cache
make -s -j2
Segmentation fault (core dumped)
make: *** [Makefile:1607: Python/frozen_modules/getpath.h] Error 139
make: *** Waiting for unfinished jobs....
Segmentation fault (core dumped)
make: *** [Makefile:1612: Python/frozen_modules/importlib._bootstrap.h] Error 139

I'm guessing I might've forgotten some import or to regenerate some file? getpath.h is rather cryptic as well. What's weird is that I can build it just fine with the GIL enabled..

@tomasr8 tomasr8 marked this pull request as ready for review February 11, 2024 21:49
@tomasr8 tomasr8 requested a review from rhettinger as a code owner February 11, 2024 21:49
@colesbury
Contributor

Hi @tomasr8, the crash is because PySet_New(iterable) accepts a NULL iterable, but Py_BEGIN_CRITICAL_SECTION(iterable) does not work with a NULL argument and will crash. Here's how I debugged it:

  1. The make output indicated that the ./Programs/_freeze_module invocations were crashing. _freeze_module is a C program that embeds Python.
  2. I ran gdb --args ./Programs/_freeze_module getpath ./Modules/getpath.py Python/frozen_modules/getpath.h to re-run the command under GDB
  3. I typed run to run the program and then, after the Segmentation Fault, I ran bt (for backtrace)
  4. The first three lines of the backtrace look like:
#1  PyMutex_LockFast (lock_bits=0xa <error: Cannot access memory at address 0xa>) at ./Include/internal/pycore_lock.h:61
#2  _PyCriticalSection_Begin (m=0xa, c=0x7fffffffd980) at ./Include/internal/pycore_critical_section.h:179
#3  PySet_New (iterable=iterable@entry=0x0) at Objects/setobject.c:2543
  5. The first thing that stood out to me was m=0xa, which doesn't look like a valid pointer. These small but not quite zero pointers are common when you dereference a field from a NULL pointer. The second thing I noticed was iterable=iterable@entry=0x0, which is the same as NULL.
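
To make the "small but not quite zero pointer" observation concrete: taking the address of a field through a NULL base pointer yields the field's offset within the struct. The sketch below uses a hypothetical header layout, chosen so that the lock byte lands at offset 10 and matches the `m=0xa` in the backtrace. The layout is an assumption for illustration and is not copied from CPython's headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object header (an assumption, not CPython's definition). */
typedef struct {
    uint64_t owner_tid;   /* offset 0 */
    uint16_t padding;     /* offset 8 */
    uint8_t  mutex;       /* offset 10 == 0xa */
    uint8_t  gc_bits;     /* offset 11 */
    uint32_t ref_local;   /* offset 12 */
} toy_header;

/* The address a lock routine would compute for `&obj->mutex`,
 * given the numeric address of `obj`. */
uintptr_t toy_mutex_address(uintptr_t obj_address)
{
    return obj_address + offsetof(toy_header, mutex);
}
```

With `obj_address` equal to 0 (a NULL object), the computed lock address is just the field offset, which is why a debugger shows a tiny invalid pointer rather than 0x0.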

I'll add some suggestions on the PR.

Contributor

@colesbury colesbury left a comment

This is looking good overall.

  • Consider adding _Py_CRITICAL_SECTION_ASSERT_OBJECT_LOCKED(obj); to functions where you assume that the caller locks the object. For example, set_update_internal would be a good candidate
  • Dino added a few critical sections here that should probably be removed now that you are doing locking in the callers to those functions: in set_update_internal (for the dict) and set_symmetric_difference_update.
  • Since iterable may be NULL, do the critical section inside make_new_set instead of the callers to make_new_set
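
The first suggestion, asserting that the caller already holds the lock, can be sketched with a debug-only flag. This is a single-threaded illustration under stated assumptions: `toy_set`, `TOY_ASSERT_LOCKED`, `toy_begin`, and `toy_end` are hypothetical, and the `locked` flag is bookkeeping for the assertion rather than a real mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container; `locked` only records whether a (pretend)
 * critical section is currently open on this object. */
typedef struct {
    bool locked;
    int items[8];
    size_t used;
} toy_set;

#define TOY_ASSERT_LOCKED(s) assert((s)->locked)

static void toy_begin(toy_set *s) { assert(!s->locked); s->locked = true; }
static void toy_end(toy_set *s)   { assert(s->locked);  s->locked = false; }

/* Internal helper: states its precondition in code instead of a comment. */
static int toy_update_internal(toy_set *s, const int *src, size_t n)
{
    TOY_ASSERT_LOCKED(s);
    for (size_t i = 0; i < n; i++) {
        bool found = false;
        for (size_t j = 0; j < s->used; j++) {
            if (s->items[j] == src[i]) {
                found = true;
            }
        }
        if (found) {
            continue;               /* skip duplicates */
        }
        if (s->used == 8) {
            return -1;              /* full */
        }
        s->items[s->used++] = src[i];
    }
    return 0;
}

/* Outer function: opens and closes the critical section. */
int toy_set_update(toy_set *s, const int *src, size_t n)
{
    toy_begin(s);
    int rv = toy_update_internal(s, src, n);
    toy_end(s);
    return rv;
}
```

Calling `toy_update_internal` directly without `toy_begin` would trip the assertion in a debug build, which is the kind of early failure the review comment is after.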

@tomasr8
Member Author

tomasr8 commented Feb 19, 2024

To give an update, now that Py_XBEGIN_CRITICAL_SECTION can be used, I just need to move the locking from make_new_set to PySet_New and PyFrozenSet_New.

However, there is a small issue: set_update_internal (called by make_new_set) requires both of its arguments to be locked. I could simply keep the locking in make_new_set, but as @colesbury mentioned, that would be quite inefficient. I think the assertions in set_update_internal are useful, so I'm trying to come up with a solution where we can keep them but also avoid locking in make_new_set.
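
The idea behind a NULL-tolerant "begin" can be sketched as follows. This is an illustration of the pattern only and not the definition of Py_XBEGIN_CRITICAL_SECTION: `toy_obj`, `toy_xbegin`, `toy_xend`, and `toy_new_from` are hypothetical names, and a spin lock stands in for the critical section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical lockable object; lock_count records how often it was locked. */
typedef struct {
    atomic_flag lock;
    int lock_count;
} toy_obj;

static void toy_begin(toy_obj *o)
{
    while (atomic_flag_test_and_set_explicit(&o->lock, memory_order_acquire)) {
        /* spin */
    }
    o->lock_count++;
}

static void toy_end(toy_obj *o)
{
    atomic_flag_clear_explicit(&o->lock, memory_order_release);
}

/* NULL-tolerant variants: do nothing when there is no object to lock. */
void toy_xbegin(toy_obj *o) { if (o != NULL) { toy_begin(o); } }
void toy_xend(toy_obj *o)   { if (o != NULL) { toy_end(o); } }

/* Constructor-like function whose argument is optional, in the spirit of
 * a "new set from iterable, or empty if NULL" API. Returns 1 if an input
 * object was supplied. */
int toy_new_from(toy_obj *iterable)
{
    toy_xbegin(iterable);
    int had_input = (iterable != NULL);
    toy_xend(iterable);
    return had_input;
}
```

With this shape the lock can stay in the public entry point even though the argument may be NULL, instead of being pushed down into an inner helper.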

@colesbury colesbury self-requested a review March 6, 2024 22:03
@colesbury
Contributor

Hi @tomasr8, thanks so much for the work you've done in making set thread-safe. The remaining pieces are a bit tricky, and I don't think I can give helpful advice without actually making the edits, so I'm going to make some changes directly to the PR.

@colesbury colesbury requested a review from DinoV March 7, 2024 19:17
Contributor

@erlend-aasland erlend-aasland left a comment

Some style nitpicks.

colesbury and others added 2 commits March 7, 2024 17:34
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
 {
-    return ((PySetObject *)so)->used;
+    return _Py_atomic_load_ssize_relaxed(&so->used);
Contributor

This could be FT_ATOMIC_LOAD_SSIZE_RELAXED(so->used) now
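
The thread does not show how that macro is defined. The general pattern it suggests, a wrapper that is an atomic load in one build configuration and a plain read in the other, might look like the sketch below. `TOY_FREE_THREADED`, `TOY_LOAD_RELAXED`, `toy_ssize`, and `toy_set_len` are all hypothetical names; this is an assumption about the pattern, not CPython's definition.

```c
#include <assert.h>
#include <stdatomic.h>

/* TOY_FREE_THREADED is a hypothetical build flag for this sketch. */
#ifdef TOY_FREE_THREADED
typedef atomic_long toy_ssize;
#  define TOY_LOAD_RELAXED(field) \
       atomic_load_explicit(&(field), memory_order_relaxed)
#else
typedef long toy_ssize;
#  define TOY_LOAD_RELAXED(field) (field)   /* plain read, no atomics */
#endif

typedef struct {
    toy_ssize used;
} toy_set;

/* The call site is identical in both build configurations. */
long toy_set_len(toy_set *s)
{
    return TOY_LOAD_RELAXED(s->used);
}
```

The benefit of such a wrapper is that the default build pays nothing for the atomic, while the call site does not need its own `#ifdef`.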

return -1;
if (!PyArg_UnpackTuple(args, Py_TYPE(self)->tp_name, 0, 1, &iterable))
return -1;

Py_BEGIN_CRITICAL_SECTION(self);
Contributor

Maybe it's worth inlining set_update_internal into here? It'd mean we only take the critical section once, and the clear and update would be done atomically. As of now, someone could call __init__ on multiple threads and end up with elements from both callers.

Contributor

It's a bit more complicated than just inlining set_update_internal because sometimes we want to lock both self and iterable, sometimes iterable does not need to be locked, and sometimes iterable is NULL. Additionally, we would like to avoid locking self if it's newly created and not yet visible, which is the most common case.

I'd like to address this, but in a subsequent PR.
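
The atomicity concern in this exchange can be shown with a small sketch that contrasts two critical sections with one. It deliberately ignores the complications listed above (locking the iterable, NULL iterables, not-yet-visible objects). `toy_set` and the `toy_init_*` functions are hypothetical, and a spin lock stands in for the critical section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_CAP 8

typedef struct {
    atomic_flag lock;
    int items[TOY_CAP];
    size_t used;
} toy_set;

static void toy_lock(toy_set *s)
{
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire)) {
        /* spin */
    }
}

static void toy_unlock(toy_set *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Appends src[0..n) up to capacity; the caller must hold s->lock. */
static void toy_fill_locked(toy_set *s, const int *src, size_t n)
{
    for (size_t i = 0; i < n && s->used < TOY_CAP; i++) {
        s->items[s->used++] = src[i];
    }
}

/* Two critical sections: each step is safe on its own, but another
 * thread's init can run in the gap, so the result may mix both inputs. */
void toy_init_two_steps(toy_set *s, const int *src, size_t n)
{
    toy_lock(s);
    s->used = 0;                    /* clear */
    toy_unlock(s);
    /* <-- a concurrent init could clear and fill here */
    toy_lock(s);
    toy_fill_locked(s, src, n);     /* update */
    toy_unlock(s);
}

/* One critical section: clear and update appear as a single step. */
void toy_init_one_step(toy_set *s, const int *src, size_t n)
{
    toy_lock(s);
    s->used = 0;
    toy_fill_locked(s, src, n);
    toy_unlock(s);
}
```

With `toy_init_one_step`, two concurrent callers leave the set holding exactly one caller's elements; with `toy_init_two_steps`, an interleaving of clear, clear, fill, fill leaves elements from both.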

@tomasr8
Member Author

tomasr8 commented Mar 8, 2024

> Hi @tomasr8, thanks so much for the work you've done in making set thread-safe. The remaining pieces are a bit tricky, and I don't think I can give helpful advice without actually making the edits, so I'm going to make some changes directly to the PR.

Go for it :) Sorry for the lack of activity, I've been sick for the last couple of weeks :/

@colesbury
Contributor

@tomasr8, I hope you feel better soon and no need to apologize. I really appreciate the work you've done on set and previously with making hashlib thread-safe.

@colesbury colesbury merged commit c951e25 into python:main Mar 8, 2024
32 checks passed
@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 Debian root 3.x has failed when building commit c951e25.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/345/builds/7375) and take a look at the build logs.
  4. Check if the failure is related to this commit (c951e25) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/345/builds/7375

Summary of the results of the build (if available):

Click to see traceback logs
remote: Enumerating objects: 6, done.        
remote: Counting objects: 100% (6/6), done.        
remote: Compressing objects: 100% (4/4), done.        
remote: Total 6 (delta 2), reused 2 (delta 2), pack-reused 0        
From https://github.com/python/cpython
 * branch                  main       -> FETCH_HEAD
Note: switching to 'c951e25c24910064a4c8b7959e2f0f7c0d4d0a63'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at c951e25c24 gh-112069: Make sets thread-safe with the GIL disabled (#113800)
Switched to and reset branch 'main'

configure: WARNING: pkg-config is missing. Some dependencies may not be detected correctly.

Fatal Python error: init_import_site: Failed to import the site module
Python runtime state: initialized
Illegal instruction
make: *** [Makefile:1682: Python/frozen_modules/ntpath.h] Error 132

find: ‘build’: No such file or directory
find: ‘build’: No such file or directory
find: ‘build’: No such file or directory
find: ‘build’: No such file or directory
make: [Makefile:3085: clean-retain-profile] Error 1 (ignored)

@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 Debian root 3.x has failed when building commit c951e25.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/345/builds/7376) and take a look at the build logs.
  4. Check if the failure is related to this commit (c951e25) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/345/builds/7376

Failed tests:

  • test_long

Failed subtests:

  • test_bitop_identities - test.test_long.LongTest.test_bitop_identities
  • test_ordinal_conversions - test.datetimetester.TestDate_Pure.test_ordinal_conversions

Summary of the results of the build (if available):

==

Click to see traceback logs
Traceback (most recent call last):
  File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/test/datetimetester.py", line 1178, in test_ordinal_conversions
    self.assertEqual(d.toordinal(), n)
                     ~~~~~~~~~~~^^
  File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/_pydatetime.py", line 1095, in toordinal
    return _ymd2ord(self._year, self._month, self._day)
           ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/_pydatetime.py", line 74, in _ymd2ord
    assert 1 <= day <= dim, ('day must be in 1..%d' % dim)
           ^^^^^^^^^^^^^^^
AssertionError: day must be in 1..30


Traceback (most recent call last):
  File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/test/test_long.py", line 253, in check_bitop_identities_1
    eq(x << n >> n, x)
    ~~^^^^^^^^^^^^^^^^
  File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/unittest/case.py", line 885, in assertEqual
    assertion_func(first, second, msg=msg)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not callable


Traceback (most recent call last):
  File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/test/test_long.py", line 254, in check_bitop_identities_1
    eq(x // p2, x >> n)
    ~~^^^^^^^^^^^^^^^^^
AssertionError: 262143 != 17179869199


Traceback (most recent call last):
  File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/test/test_long.py", line 283, in test_bitop_identities
    self.check_bitop_identities_1(x)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/test/test_long.py", line 234, in check_bitop_identities_1
    with self.subTest(x=x):
    ...<14 lines>...
        eq(-x, ~(x-1))
  File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/contextlib.py", line 141, in __enter__
    return next(self.gen)
           ~~~~^^^^^^^^^^
  File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/unittest/case.py", line 538, in subTest
    with self._outcome.testPartExecutor(self._subtest, subTest=True):
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/contextlib.py", line 305, in helper
    return _GeneratorContextManager(func, args, kwds)
           ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/contextlib.py", line 112, in __init__
    doc = getattr(func, "__doc__", None)
          ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'str' object is not callable


Traceback (most recent call last):
  File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/test/test_long.py", line 257, in check_bitop_identities_1
    eq(x & -p2, x & ~(p2 - 1))
    ~~^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/unittest/case.py", line 885, in assertEqual
    assertion_func(first, second, msg=msg)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/unittest/case.py", line 690, in __call__
    return self.run(*args, **kwds)
           ~~~~~~~~^^^^^^^^^^^^^^^
TypeError: TestCase.run() got an unexpected keyword argument 'msg'

@colesbury
Contributor

Hmmm... the failures are really confusing. It's unclear to me if they are related to this PR. So far, I don't think I've seen failures on other buildbots.

colesbury added a commit to colesbury/cpython that referenced this pull request Mar 8, 2024
…ython#113800)"

The "AMD64 Debian root 3.x" is failing with strange errors.

This reverts commit c951e25.
@tomasr8 tomasr8 deleted the gil-set branch March 9, 2024 21:01
adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 25, 2024
…113800)

This makes nearly all the operations on set thread-safe in the free-threaded build, with the exception of `_PySet_NextEntry` and `setiter_iternext`.

Co-authored-by: Sam Gross <colesbury@gmail.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
…113800)

This makes nearly all the operations on set thread-safe in the free-threaded build, with the exception of `_PySet_NextEntry` and `setiter_iternext`.

Co-authored-by: Sam Gross <colesbury@gmail.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>