Add support for free-threading builds of CPython #243
dpdani
commented
Nov 17, 2024
- Compile fails on 3.14 no-GIL #231
# Conflicts: # .github/workflows/test.yml # pyproject.toml
In addition to the failure you left a comment for, I also see a different failure which is more obviously a thread safety issue in hypothesis itself:
Hypothesis even warns about this:
So I suspect the other failure is caused by a similar problem. And indeed, looking at the hypothesis docs, they do not support running hypothesis simultaneously in multiple threads: https://hypothesis.readthedocs.io/en/latest/details.html#thread-safety-policy
I'll open an issue to document this in the free-threading porting guide and an issue on pytest-run-parallel to hopefully detect this and warn about it. I think the fix for the python-zstandard tests is to mark any tests using hypothesis as thread-unsafe.
It's also probably worth adding some tests for sharing a compressor or decompressor between threads. It looks like the GIL does get released when calling into the C zstd library, so any C-level thread safety issues that exist from sharing zstd contexts between threads are probably also present in the GIL-enabled build and no one has reported them yet.
mm, it's weird that I'm not seeing that. how are you running the tests?
I'm running it all locally on my mac dev machine. I installed the library with
I've added I'm seeing a lot of warnings with the annotations, am I using it wrong?
I've also added a test for a compressor object that is shared between several threads, and it does cause a segmentation fault both in the free-threading and in the default build. this mode of passing data to de/compression contexts for python-zstd does not seem to make sense anyways.
You can make it thread-safe if you do something like this to add a per-decompressor lock: https://py-free-threading.github.io/porting/#dealing-with-thread-unsafe-libraries. Of course that won't scale well but as you said it's a weird thing to do. Does the test segfault with the GIL too? I wouldn't be surprised if it does. If it does, the fact that no one has reported this issue means it's not a big problem in practice and maybe you can just document the thread safety caveats? I have a suspicion that python threading will spike in popularity soon as people adopt free-threading, so it was probably inevitable that someone would hit this eventually and in that case it probably is worth adding the locking.
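The per-object locking suggested above can be sketched in Python roughly as follows. This is only an illustration of the pattern, not python-zstandard's code: `StandInContext`, `LockedCompressor` and `compress_from_threads` are hypothetical names, and the stand-in uses the standard library's `zlib` so that the snippet runs without zstandard installed.

```python
import threading
import zlib


class StandInContext:
    """Hypothetical stand-in for a context that is unsafe to use concurrently."""

    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data)


class LockedCompressor:
    """Wrap a context so that only one thread at a time can call into it."""

    def __init__(self, context):
        self._context = context
        self._lock = threading.Lock()  # one lock per wrapped context

    def compress(self, data: bytes) -> bytes:
        # Callers on other threads block here until the current call returns.
        with self._lock:
            return self._context.compress(data)


def compress_from_threads(shared, payloads):
    """Compress each payload on its own thread using the same shared object."""
    results = [None] * len(payloads)

    def worker(index):
        results[index] = shared.compress(payloads[index])

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(len(payloads))
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because every call is serialized, threads sharing one wrapped object gain no parallelism, which is the scaling cost mentioned in the comment.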
Oh, you said it's a bug in the default build; I missed that. We've generally been trying to fix pre-existing thread safety issues that can be triggered in the default build if we can, but we don't see them as blockers for shipping wheels.
Maybe @andfoy or @lysnikolaou know what's up with the warning.
The docs in We need to ensure we don't crash if someone violates the "concurrent operations on multiple threads" rule. I'm fine with undefined behavior if someone attempts operations on the same zstd context from multiple threads. But I would prefer detecting and raising an error if this can be implemented with minimal runtime cost.
You could have an atomic flag that a thread sets when it acquires the context; if another thread tries to acquire a context with the flag set, then that would be an error. It's a little obnoxious to write cross-platform C code that uses atomics (see
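To make the flag idea concrete, here is a minimal Python sketch of the same pattern. It is not the C code from this PR: a C implementation would use an atomic test-and-set, whereas here a non-blocking `threading.Lock.acquire` plays the role of the flag. `ExclusiveContext`, `in_use` and `ConcurrentUseError` are hypothetical names introduced for illustration.

```python
import threading
from contextlib import contextmanager


class ConcurrentUseError(RuntimeError):
    """Raised when a context is entered while it is already in use."""


class ExclusiveContext:
    """Hypothetical context object that rejects concurrent use instead of blocking."""

    def __init__(self):
        # The lock is used purely as a "busy" flag; nobody ever waits on it.
        self._busy = threading.Lock()

    @contextmanager
    def in_use(self):
        # acquire(blocking=False) returns False immediately if the flag is set.
        if not self._busy.acquire(blocking=False):
            raise ConcurrentUseError("context is already in use")
        try:
            yield self
        finally:
            self._busy.release()  # clear the flag even if the body raised
```

A thread that finds the flag set gets an exception rather than waiting, so the only cost on the uncontended path is one flag set and one flag clear. Note that `threading.Lock` is not reentrant, so nested use from the same thread is rejected as well.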
one fairly easy cross-platform option is to just use pymutex behind some essentially, for Python >= 3.13 we could guarantee to throw an exception around concurrent use, and for prior versions we could retain the current behavior.
not sure if I should do it in this PR or open a new one.
You can also use PyThread_type_lock on older python versions. It's sadly undocumented in CPython but you can take a look at what I did to make NumPy's use of lapack_lite thread safe to see how to conditionally use either depending on Python version: https://github.com/numpy/numpy/blob/main/numpy/linalg/umath_linalg.cpp. Grep for HAVE_EXTERNAL_LAPACK and LOCK_LAPACK_LITE. The main difference wrt PyMutex is it's slower, supports a try lock API (which you probably don't need) and it requires a heap allocation.
@ngoldbaum I've ended up opting for an atomic flag, partly using your @indygreg the modifications I just pushed make it so that when a I believe the performance impact is as little as it can possibly be.
Just a note that you're not doing a relaxed read - at least for the C11 atomics case you would need to use atomic_load_explicit. Right now as I understand it this is using SeqCst ordering for all operations. It's possible to write something that will likely be fast and scale better, but given that this implements a flag that triggers an error case on multithreaded access to shared resources, I doubt that matters.
c-ext/_pyzstd_atomics.h
Outdated
I think in the long-term there should probably be some sort of C threading utility library that C extensions can use, including a full copy of the CPython atomics headers. Something that can be easily vendored as a submodule and ideally header-only. That said, that project doesn't yet exist so copying numpy's header for this purpose is probably the most practical solution for python-zstandard
nice catch! thanks 🙏
yeah, this is not the ideal solution. I hoped that CPython would expose them, but it seems unlikely it will.
a header-only library would be very nice, there may be other people interested in having it, out there in the C world.
maybe we could eventually push for CPython to create a separate C package and consume it itself, too.
ping @SonicField (author of ft_utils), we were talking about this last week.
didn't know about that library, looks very similar to my https://dpdani.github.io/cereggii/, maybe we could collaborate in the future!
@indygreg we'd appreciate it if you could give us your opinion on this approach and/or some code review. It would be really helpful to be able to have cp313t wheels available. One particularly disruptive impact of python-zstandard failing to build on the free-threaded Python at the moment, is that There are probably things that could happen inside hatch to avoid this issue, but I think the "right" fix is for python-zstandard to help out a bit.
Sorry it took so long to look at this. I've been exceptionally busy.
I suspect the level of support for free-threaded builds has improved since this PR was authored.
Please refresh and try to use the latest versions of things (with presumed FT compatibility) so we don't have to hack around missing support when 3.13 initially shipped.
Please split out the @pytest.mark.thread_unsafe annotations to their own PR along with adding the initial FT coverage to CI. I want to see the main branch running FT builds successfully, even if like 90% of tests are skipped.
Then we can revisit the meat of this change, which is adding the free-threaded detection to the C code.
Is that reasonable?
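One way a CI job could confirm that it is really running on a free-threaded interpreter is sketched below. This is an illustrative check, not something taken from this PR: the helper names `is_free_threaded_build` and `gil_is_enabled` are hypothetical, while the `Py_GIL_DISABLED` config variable and `sys._is_gil_enabled()` come from CPython 3.13.

```python
import sys
import sysconfig


def is_free_threaded_build() -> bool:
    """True if this interpreter was compiled with free-threading support."""
    # The config variable is 1 on free-threaded builds and 0 or None otherwise.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def gil_is_enabled() -> bool:
    """True if the GIL is active right now, even on a free-threaded build."""
    # sys._is_gil_enabled() only exists on Python 3.13 and later; on older
    # versions the GIL is always enabled.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else bool(check())


if __name__ == "__main__":
    print("free-threaded build:", is_free_threaded_build())
    print("GIL enabled at runtime:", gil_is_enabled())
```

The second check matters because a free-threaded interpreter can re-enable the GIL at runtime, for example when it imports an extension module that does not declare free-threading support.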
@@ -59,47 +60,60 @@ jobs:
      PYTHONDEVMODE: '1'
    steps:
      - name: Set up Python
-       uses: actions/setup-python@v5
+       uses: Quansight-Labs/setup-python@v5
Why do we need to adopt a less official action? Does actions/setup-python not support the free-threaded builds?
Unfortunately, GitHub has been quite unresponsive to this: actions/setup-python#771
Though, there seems to be some recent activity: actions/setup-python#973
The fork is tracking upstream and has the open PR to add free-threading support applied.
You could also use setup-uv, which can be used as a drop-in replacement if you install pip into the uv environment.
      # TODO enable once PyO3 supports 3.13.
      - name: Build (Rust)
-       if: matrix.arch == 'x64' && matrix.py != '3.13'
+       if: matrix.arch == 'x64' && !startsWith(matrix.py, '3.13')
I'm guessing this is supported now. But it would be scope bloat to resolve it in this PR.
      - name: Test CFFI Backend
        if: "!startsWith(matrix.py, '3.13')" # see pyproject.toml:4
Presumably this limitation no longer holds.
Unfortunately no, CFFI is one of the last low-level major dependencies without support. Recently the maintainers asked our team to work on a fork with free-threading support so they can review one big PR:
Hi Gregory 👋 Thanks for coming back to us! Tomorrow I'll take a look at bumping numbers 👍
Do you mean you would like the annotations merged beforehand?
Do you have any thoughts on this?
Actually, looking at this PR more and the prior discussion, I'm a bit confused what the purpose of the atomics code actually is. Our existing docs say that ZstdCompressor and ZstdDecompressor instances aren't thread safe. It was already possible for customers to footgun themselves on GIL builds by calling into multiple methods simultaneously since the GIL would be released when calling into libzstd C code.
The atomics - if implemented globally (which this PR doesn't yet do) - could be a nice quality-of-life guard to catch code patterns that violate our thread safety guarantees. Is that the only problem it solves?
I initially/naively thought that free-threaded builds would require a fair amount of invasive code changes to prevent SEGFAULTs or similar crashes. But I think our existing API contract is essentially already free-threaded compatible since we're purposefully not multi-threaded safe at the ZstdCompressor and ZstdDecompressor level? If so and there is no new potential for crashes in free-threaded builds, I'm inclined to do the minimal thing possible to support free-threaded builds. We can then look into the lightweight atomic-based context guards as a quality-of-life improvement as a separate and follow-up PR.
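Under the contract described above, the usual way for callers to stay safe is to give every thread its own instance rather than sharing one. A minimal sketch of that pattern follows. `StandInCompressor`, `thread_compressor` and `compress_all` are hypothetical names; the stand-in is built on the standard library's `zlib` so the snippet runs without zstandard installed, and real code would presumably create a `ZstdCompressor` per thread instead.

```python
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


class StandInCompressor:
    """Hypothetical stand-in for an object that must stay on one thread."""

    def __init__(self):
        self._owner = threading.get_ident()

    def compress(self, data: bytes) -> bytes:
        # Fail loudly if the instance leaks to a thread other than its creator.
        if threading.get_ident() != self._owner:
            raise RuntimeError("compressor used from a second thread")
        return zlib.compress(data)


_local = threading.local()


def thread_compressor() -> StandInCompressor:
    """Return the calling thread's own compressor, creating it on first use."""
    compressor = getattr(_local, "compressor", None)
    if compressor is None:
        compressor = _local.compressor = StandInCompressor()
    return compressor


def compress_all(chunks, workers=4):
    """Compress chunks in a thread pool without sharing any compressor."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda chunk: thread_compressor().compress(chunk), chunks))
```

Each worker thread lazily builds one instance and reuses it for every chunk it handles, so no instance is ever touched by two threads and no locking is needed.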
Yes, exactly right.
That’s also the approach we took with NumPy - we didn’t initially put effort into fixing preexisting thread safety issues on the GIL-enabled build and focused instead on free-threading specific issues to get an initial release out.
I'll remove the checks and the new test 👍
# Conflicts: # .github/workflows/test.yml # pyproject.toml
I have reverted the changes regarding the runtime context-sharing checks and have merged main, but I won't have time to go through the version bumps today.