-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-112069: Make sets thread-safe with the GIL disabled #113800
Conversation
Thanks @tomasr8! I will look at this today or tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @tomasr8, thanks so much for working on this!
First, please let me know if you would like help debugging the test failures.
One of the challenges with this is that it's hard to know at a glance if a function does locking internally or assumes that the caller is supposed to do the locking. I think this will be easier to maintain if we generally push the locking to the outermost functions. Argument clinic makes this a bit easier by generating some of the critical section calls. I think we should:
- Convert the functions to use argument clinic with the
@critical_section
directive. - For C APIs (like
PySet_Add()
) put the critical section code in the outermost function.
Sorry, this suggestion wasn't mentioned in the issue -- support for @critical_section
in Argument Clinic was added a few days after I wrote up the set issue.
At the top of the file:
/*[clinic input]
class set "PySetObject *" "&PySet_Type"
[clinic start generated code]*/
/*[clinic end generated code: output=da39a3ee5e6b4b0d input=abe13a1b24961902]*/
#include "clinic/setobject.c.h"
And then, set_add
would look like:
/*[clinic input]
@critical_section
set.add
key: object
/
Add an element to a set.
This has no effect if the element is already present.
[clinic start generated code]*/
static PyObject *
set_add_impl(PySetObject *self, PyObject *key)
/*[clinic end generated code: output=8d849b1bd2bd8b3a input=33fb8030b1ad21f5]*/
{
if (set_add_key(self, key))
return NULL;
Py_RETURN_NONE;
}
On the thread-safety table: Do not need to be thread-safe:
|
Thanks a lot for the review! I wasn't aware of the new AC directive - I'll update the PR. As for the test failures, I haven't had the time yet to really dig into it, hopefully I'll have the time tomorrow. If I can't figure it out myself, I'll let you know :) |
This is growing to be a large diff. I think we should consider splitting this up in two PRs:
|
Makes sense. I'll open a separate PR for the AC changes once I figure out how to convert all the methods :) |
Great :) You can open a draft PR and ping me if you need help! |
I've adapted the PR to use The only problem I still have is that the nogil build fails when I use the critical section macros in the C API functions: ./configure --with-pydebug --disable-gil --config-cache
make -s -j2
Segmentation fault (core dumped)
make: *** [Makefile:1607: Python/frozen_modules/getpath.h] Error 139
make: *** Waiting for unfinished jobs....
Segmentation fault (core dumped)
make: *** [Makefile:1612: Python/frozen_modules/importlib._bootstrap.h] Error 139 I'm guessing I might've forgotten some import or to regenerate some file? |
Hi @tomasr8, the crash is because
I'll add some suggestions on the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good overall.
- Consider adding
_Py_CRITICAL_SECTION_ASSERT_OBJECT_LOCKED(obj);
to functions where you assume that the caller locks the object. For example,set_update_internal
would be a good candidate - Dino added a few critical sections here that should probably be removed now that you are doing locking in the callers to those functions: in
set_update_internal
(for the dict) andset_symmetric_difference_update
. - Since
iterable
may be NULL, do the critical section insidemake_new_set
instead of the callers tomake_new_set
To give an update, now that However there is a small issue which is that |
Hi @tomasr8, thanks so much for the work you've done in making set thread-safe. The remaining pieces are a bit tricky, and I don't think I can give helpful advice without actually making the edits, so I'm going to make some changes directly to the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some style nitpicks.
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Objects/setobject.c
Outdated
{ | ||
return ((PySetObject *)so)->used; | ||
return _Py_atomic_load_ssize_relaxed(&so->used); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be FT_ATOMIC_LOAD_SSIZE_RELAXED(so->used)
now
return -1; | ||
if (!PyArg_UnpackTuple(args, Py_TYPE(self)->tp_name, 0, 1, &iterable)) | ||
return -1; | ||
|
||
Py_BEGIN_CRITICAL_SECTION(self); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's worth inlining set_update_internal
into here? It'd mean we'd only take the critical section once, and also so that the clear and update are done atomically. As of now someone could call __init__
on multiple threads and end up with elements from both callers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit more complicate than just inlining set_update_internal
because sometimes we want to lock both self
and iterable
, sometimes iterable
does not need to be locked, and sometimes iterable
is NULL. Additionally, we would like to avoid locking self
if it's a newly created and not yet visible, which is the most common case.
I'd like to address this, but in a subsequent PR.
Go for it :) Sorry for the lack of activity, I've been sick for the last couple of weeks :/ |
@tomasr8, I hope you feel better soon and no need to apologize. I really appreciate the work you've done on set and previously with making hashlib thread-safe. |
|
|
Hmmm... the failures are really confusing. It's unclear to me if they are related to this PR. So far, I don't think I've seen failures on other buildbots. |
…ython#113800)" The "AMD64 Debian root 3.x" is failing with strange errors. This reverts commit c951e25.
…113800) This makes nearly all the operations on set thread-safe in the free-threaded build, with the exception of `_PySet_NextEntry` and `setiter_iternext`. Co-authored-by: Sam Gross <colesbury@gmail.com> Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
…113800) This makes nearly all the operations on set thread-safe in the free-threaded build, with the exception of `_PySet_NextEntry` and `setiter_iternext`. Co-authored-by: Sam Gross <colesbury@gmail.com> Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Still have some failing tests in the GIL version so marking as a draft for now until I manage to fix them.
This PR is mostly based on Sam's version: colesbury/nogil-3.12@4ca2924f0d
Since sets have a lot of methods that are exposed, I made this list to keep track and help with the review:
set methods
set_as_number
set_symmetric_difference
)set
tp_
..There are also some extra frozenset methods which I don't think need locking:
frozenset_copy
,frozenset_hash
,frozenset_new
,frozenset_vectorcall
. The rest of frozenset methods are shared with sets.C API methods:
set_len
)set_clear
)set_pop
)set
thread-safe in--disable-gil
builds #112069