Add support for free-threading builds of CPython #243

Open · wants to merge 11 commits into main

Conversation


@dpdani dpdani commented Nov 17, 2024

@ngoldbaum

In addition to the failure you left a comment for, I also see a different failure which is more obviously a thread safety issue in hypothesis itself:

self = <tests.test_decompressor_fuzzing.TestDecompressor_stream_reader_fuzzing testMethod=test_buffer_source_constant_read_size>

    @hypothesis.settings(
>       suppress_health_check=[hypothesis.HealthCheck.large_base_example]
    )

tests/test_decompressor_fuzzing.py:145:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

state = <hypothesis.core.StateForActualGivenExecution object at 0x4b3c0220190>
wrapped_test = <function accept.<locals>.test_buffer_source_constant_read_size at 0x4b3b4dc8fc0>, arguments = ()
kwargs = {'self': <tests.test_decompressor_fuzzing.TestDecompressor_stream_reader_fuzzing testMethod=test_buffer_source_constant_read_size>}
original_sig = <Signature (self, original, level, streaming, source_read_size, read_size)>

    def execute_explicit_examples(state, wrapped_test, arguments, kwargs, original_sig):
        assert isinstance(state, StateForActualGivenExecution)
        posargs = [
            p.name
            for p in original_sig.parameters.values()
>           if p.kind is p.POSITIONAL_OR_KEYWORD
        ]
E       RecursionError: maximum recursion depth exceeded

../../.pyenv/versions/3.13.0t/lib/python3.13t/site-packages/hypothesis/core.py:433: RecursionError

Hypothesis even warns about this:

tests/test_decompressor_fuzzing.py: 17 warnings
  /Users/goldbaum/.pyenv/versions/3.13.0t/lib/python3.13t/site-packages/hypothesis/core.py:822: HypothesisWarning: The recursion limit will not be reset, since it was changed from another thread or during execution of a test.
    with ensure_free_stackframes():

tests/test_decompressor_fuzzing.py: 17 warnings
  /Users/goldbaum/.pyenv/versions/3.13.0t/lib/python3.13t/site-packages/hypothesis/internal/conjecture/engine.py:316: HypothesisWarning: The recursion limit will not be reset, since it was changed from another thread or during execution of a test.
    with ensure_free_stackframes():

tests/test_decompressor_fuzzing.py: 10 warnings
  /Users/goldbaum/.pyenv/versions/3.13.0t/lib/python3.13t/site-packages/hypothesis/core.py:643: HypothesisWarning: The recursion limit will not be reset, since it was changed from another thread or during execution of a test.
    with ensure_free_stackframes():

So I suspect the other failure is caused by a similar problem.

And indeed, the hypothesis docs say that running hypothesis simultaneously in multiple threads is not supported:

https://hypothesis.readthedocs.io/en/latest/details.html#thread-safety-policy

I'll open an issue to document this in the free-threading porting guide, and an issue on pytest-run-parallel so it can hopefully detect this and warn about it.

I think the fix for the python-zstandard tests is to mark any tests using hypothesis as thread-unsafe.

@ngoldbaum

ngoldbaum commented Nov 19, 2024

It's also probably worth adding some tests for sharing a compressor or decompressor between threads. It looks like the GIL does get released when calling into the C zstd library, so any C-level thread safety issues that exist from sharing zstd contexts between threads are probably also present in the GIL-enabled build and no one has reported them yet.
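To make that hazard concrete, here is a minimal hypothetical sketch (not code from this repository) of the usual GIL-release pattern around a libzstd call; once the GIL is dropped, nothing stops a second Python thread from entering libzstd with the same ZSTD_CCtx, even on GIL-enabled builds:

#include <Python.h>
#include <zstd.h>

/* Hypothetical helper: the GIL is released for the duration of the
 * ZSTD_compressStream2() call, so two threads sharing one ZSTD_CCtx
 * can race inside libzstd concurrently. */
static size_t
compress_chunk(ZSTD_CCtx *cctx, ZSTD_outBuffer *out, ZSTD_inBuffer *in)
{
    size_t zresult;

    Py_BEGIN_ALLOW_THREADS
    zresult = ZSTD_compressStream2(cctx, out, in, ZSTD_e_continue);
    Py_END_ALLOW_THREADS

    return zresult;
}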

@dpdani
Author

dpdani commented Nov 19, 2024

mm, it's weird that I'm not seeing that.

how are you running the tests?
I'm doing act -W .github/workflows/test.yml

@ngoldbaum

I'm running it all locally on my mac dev machine. I installed the library with pip install -e . then ran ZSTD_SLOW_TESTS=1 pytest --numprocesses=1 --parallel-threads=10 -v tests/test_decompressor_fuzzing.py after installing the CI requirements.

@dpdani
Author

dpdani commented Nov 23, 2024

I've added thread_unsafe annotations to all tests that use hypothesis.

I'm seeing a lot of warnings with the annotations; am I using them wrong?

PytestUnknownMarkWarning: Unknown pytest.mark.thread_unsafe - is this a typo?

I've also added a test for a compressor object that is shared between several threads, and it does cause a segmentation fault both in the free-threading and in the default build.

This mode of passing data to de/compression contexts does not seem to make sense anyway.
This library does support multithreaded compression, but it spawns its own pool of threads internally.
There's probably no reason for a Python program to feed data into a de/compressor from multiple threads.

@ngoldbaum

ngoldbaum commented Nov 23, 2024

You can make it thread-safe if you do something like this to add a per-decompressor lock: https://py-free-threading.github.io/porting/#dealing-with-thread-unsafe-libraries. Of course that won't scale well but as you said it's a weird thing to do.

Does the test segfault with the GIL too? I wouldn't be surprised if it does.

If it does, the fact that no one has reported this issue means it's not a big problem in practice and maybe you can just document the thread safety caveats?

I have a suspicion that Python threading will spike in popularity soon as people adopt free-threading, so it was probably inevitable that someone would hit this eventually; in that case it probably is worth adding the locking.

@ngoldbaum

ngoldbaum commented Nov 23, 2024

Oh, you said it's a bug in the default build; I missed that. We've generally been trying to fix pre-existing thread safety issues that can be triggered in the default build when we can, but we don't see them as blockers for shipping wheels.

@ngoldbaum

Maybe @andfoy or @lysnikolaou know what's up with the warning.

@indygreg
Owner

so any C-level thread safety issues that exist from sharing zstd contexts between threads are probably also present in the GIL-enabled build and no one has reported them yet.

The docs in api_usage.rst state our policy around API thread safety.

We need to ensure we don't crash if someone violates the "concurrent operations on multiple threads" rule. I'm fine with undefined behavior if someone attempts operations on the same zstd context from multiple threads. But I would prefer detecting and raising an error if this can be implemented with minimal runtime cost.

@ngoldbaum

You could have an atomic flag that a thread sets when it acquires the context; if another thread tries to acquire a context with the flag set, that would be an error.

It's a little obnoxious to write cross-platform C code that uses atomics (see npy_atomic.h in NumPy for my basic attempt, based on the private CPython atomics headers), but in Rust or C++11 or newer it's pretty straightforward.
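A rough illustrative sketch of that idea, using C11 atomics directly rather than a vendored header (names are hypothetical, not from this PR); a compare-and-exchange either claims the flag or tells the caller another thread got there first:

#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool in_use;  /* true while some thread owns the context */
} context_guard;

/* Returns true if the calling thread now owns the context; false means
 * another thread already set the flag and an error should be raised. */
static bool
guard_acquire(context_guard *g)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&g->in_use, &expected, true);
}

static void
guard_release(context_guard *g)
{
    atomic_store(&g->in_use, false);
}

A method would call guard_acquire() on entry, raise the error on failure, and call guard_release() on every exit path.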

@dpdani
Author

dpdani commented Nov 24, 2024

One fairly easy cross-platform option is to just use PyMutex behind an #if PY_VERSION_HEX check, which would only affect some builds (notably the free-threading ones).

Essentially, for Python >= 3.13 we could guarantee that an exception is raised on concurrent use, and for prior versions we could retain the current behavior.
As Gregory pointed out, the documentation is pretty clear on this:

ZstdCompressor and ZstdDecompressor instances have no guarantees about thread safety. Do not operate on the same ZstdCompressor and ZstdDecompressor instance simultaneously from different threads. It is fine to have different threads call into a single instance, just not at the same time.

@dpdani
Author

dpdani commented Nov 24, 2024

not sure if I should do it in this PR or open a new one.

@ngoldbaum

ngoldbaum commented Nov 24, 2024

You can also use PyThread_type_lock on older python versions. It's sadly undocumented in CPython but you can take a look at what I did to make NumPy's use of lapack_lite thread safe to see how to conditionally use either depending on Python version: https://github.com/numpy/numpy/blob/main/numpy/linalg/umath_linalg.cpp. Grep for HAVE_EXTERNAL_LAPACK and LOCK_LAPACK_LITE.

The main differences w.r.t. PyMutex are that it's slower, it supports a try-lock API (which you probably don't need), and it requires a heap allocation.
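For illustration, a minimal sketch of that conditional approach, assuming hypothetical struct and function names (PyMutex is public API on 3.13+, PyThread_type_lock is available on older versions):

#include <Python.h>
#include <pythread.h>

typedef struct {
    PyObject_HEAD
    /* ... zstd context fields ... */
#if PY_VERSION_HEX >= 0x030D0000
    PyMutex lock;            /* zero-initialized; no allocation or cleanup */
#else
    PyThread_type_lock lock; /* from PyThread_allocate_lock(); free in dealloc */
#endif
} CompressorLike;

static void
compressor_lock(CompressorLike *self)
{
#if PY_VERSION_HEX >= 0x030D0000
    PyMutex_Lock(&self->lock);
#else
    PyThread_acquire_lock(self->lock, WAIT_LOCK);
#endif
}

static void
compressor_unlock(CompressorLike *self)
{
#if PY_VERSION_HEX >= 0x030D0000
    PyMutex_Unlock(&self->lock);
#else
    PyThread_release_lock(self->lock);
#endif
}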

@dpdani
Author

dpdani commented Nov 27, 2024

@ngoldbaum I've ended up opting for an atomic flag, partly using your npy_atomic.h. Thank you for sharing it! 🙏

@indygreg the modifications I just pushed make it so that when a ZstdCompressor is used concurrently, only one thread will succeed in using it.
All other threads get a ZstdError instead, with a message pointing to the wiki.
Is this desirable? Should I apply this rule throughout the library?

I believe the performance impact is as little as it can possibly be.
The additional overhead is just one relaxed read to check if there's a thread using the ZstdCompressor object and one atomic write to mark it as used, if it isn't already.


@ngoldbaum ngoldbaum left a comment


Just a note that you're not doing a relaxed read - at least for the C11 atomics case you would need to use atomic_load_explicit. Right now, as I understand it, this is using SeqCst ordering for all operations. It's possible to write something that will likely be fast and scale better, but given that this implements a flag that triggers an error case on multithreaded access to shared resources, I doubt that matters.
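For reference, a small sketch of what an explicitly relaxed load looks like in C11 (illustrative code, not from this PR); plain reads and writes of an _Atomic variable default to sequentially consistent ordering:

#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool in_use;

static bool
peek_in_use(void)
{
    /* Relaxed: atomicity only, no ordering guarantees. Reading
     * `in_use` directly would instead compile to a seq_cst load. */
    return atomic_load_explicit(&in_use, memory_order_relaxed);
}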


@ngoldbaum ngoldbaum Dec 2, 2024


I think in the long term there should probably be some sort of C threading utility library that C extensions can use, including a full copy of the CPython atomics headers. Something that can be easily vendored as a submodule and ideally header-only. That said, that project doesn't yet exist, so copying NumPy's header for this purpose is probably the most practical solution for python-zstandard.

Author


nice catch! thanks 🙏

Yeah, this is not the ideal solution. I hoped that CPython would expose them, but it seems unlikely it will.

A header-only library would be very nice; there may be other people out there in the C world interested in having it.
Maybe we could eventually push for CPython to create a separate C package and consume it itself, too.


ping @SonicField (author of ft_utils), we were talking about this last week.

Author


Didn't know about that library; it looks very similar to my https://dpdani.github.io/cereggii/. Maybe we could collaborate in the future!

@ngoldbaum

@indygreg we'd appreciate it if you could give us your opinion on this approach and/or some code review.

It would be really helpful to be able to have cp313t wheels available. One particularly disruptive impact of python-zstandard failing to build on free-threaded Python at the moment is that pip install hatch does not work, so any project using hatch/hatchling can't easily support free-threaded Python. Even without wheels, a release with this PR applied would help enormously, because the python-zstandard build would succeed on CI machines with developer tools installed.

There are probably things that could happen inside hatch to avoid this issue, but I think the "right" fix is for python-zstandard to help out a bit.


@indygreg indygreg left a comment


Sorry it took so long to look at this. I've been exceptionally busy.

I suspect the level of support for free-threaded builds has improved since this PR was authored.

Please refresh and try to use the latest versions of things (with presumed FT compatibility) so we don't have to hack around missing support when 3.13 initially shipped.

Please split out the @pytest.mark.thread_unsafe annotations to their own PR along with adding the initial FT coverage to CI. I want to see the main branch running FT builds successfully, even if like 90% of tests are skipped.

Then we can revisit the meat of this change, which is adding the free-threaded detection to the C code.

Is that reasonable?

@@ -59,47 +60,60 @@ jobs:
       PYTHONDEVMODE: '1'
     steps:
       - name: Set up Python
-        uses: actions/setup-python@v5
+        uses: Quansight-Labs/setup-python@v5
Owner


Why do we need to adopt a less official action? Does actions/setup-python not support the free-threaded builds?

Author


Unfortunately, GitHub has been quite unresponsive to this: actions/setup-python#771

Though, there seems to be some recent activity: actions/setup-python#973


The fork is tracking upstream and has the open PR to add free-threading support applied.

You could also use setup-uv, which can be used as a drop-in replacement if you install pip into the uv environment.

Comment on lines 88 to +90
       # TODO enable once PyO3 supports 3.13.
       - name: Build (Rust)
-        if: matrix.arch == 'x64' && matrix.py != '3.13'
+        if: matrix.arch == 'x64' && !startsWith(matrix.py, '3.13')
Owner


I'm guessing this is supported now, but it's scope bloat to resolve it in this PR.

- name: Test CFFI Backend
if: "!startsWith(matrix.py, '3.13')" # see pyproject.toml:4
Owner


Presumably this limitation no longer holds.


Unfortunately no, CFFI is one of the last low-level major dependencies without support. Recently the maintainers asked our team to work on a fork with free-threading support so they can review one big PR:

python-cffi/cffi#143 (comment)

@dpdani
Author

dpdani commented Feb 9, 2025

Hi Gregory 👋

Thanks for coming back to us!

Tomorrow I'll take a look at bumping version numbers 👍

Please split out the @pytest.mark.thread_unsafe annotations to their own PR along with adding the initial FT coverage to CI. I want to see the main branch running FT builds successfully, even if like 90% of tests are skipped.

Do you mean you would like the annotations merged beforehand?
If we were to run CI against a free-threading interpreter, we would trigger the mechanism that turns the GIL back on at runtime, since the module doesn't set the Py_MOD_GIL_NOT_USED flag.
We would effectively be testing with the GIL twice. Is this what you intended?
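For context, a minimal sketch of how an extension module declares free-threading support on CPython 3.13+; without the Py_mod_gil slot, the interpreter re-enables the GIL when the module is imported. The module name here is illustrative, not python-zstandard's actual init code:

#include <Python.h>

static PyModuleDef_Slot example_slots[] = {
#ifdef Py_GIL_DISABLED
    /* Tells free-threaded builds not to re-enable the GIL on import. */
    {Py_mod_gil, Py_MOD_GIL_NOT_USED},
#endif
    {0, NULL},
};

static struct PyModuleDef example_def = {
    PyModuleDef_HEAD_INIT,
    .m_name = "example",
    .m_size = 0,
    .m_slots = example_slots,
};

PyMODINIT_FUNC
PyInit_example(void)
{
    return PyModuleDef_Init(&example_def);
}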

[...] when a ZstdCompressor is concurrently used, only one thread will succeed in using it.
All other threads get an ZstdError instead with a message pointing to the wiki.

Do you have any thoughts on this?

@indygreg
Owner

indygreg commented Feb 9, 2025

Actually, looking at this PR more and the prior discussion, I'm a bit confused what the purpose of the atomics code actually is.

Our existing docs say that ZstdCompressor and ZstdDecompressor instances aren't thread safe. It was already possible for customers to footgun themselves on GIL builds by calling into multiple methods simultaneously, since the GIL would be released when calling into libzstd C code.

The atomics - if implemented globally (which this PR doesn't yet do) - could be a nice quality-of-life guard to catch code patterns that violate our thread safety guarantees. Is that the only problem it solves?

I initially/naively thought that free-threaded builds would require a fair amount of invasive code changes to prevent SEGFAULTs or similar crashes. But I think our existing API contract is essentially already free-threaded compatible since we're purposefully not multi-threaded safe at the ZstdCompressor and ZstdDecompressor level? If so and there is no new potential for crashes in free-threaded builds, I'm inclined to do the minimal thing possible to support free-threaded builds. We can then look into the lightweight atomic-based context guards as a quality-of-life improvement as a separate and follow-up PR.

@ngoldbaum

ngoldbaum commented Feb 9, 2025

Is that the only problem it solves?

Yes, exactly right.

I'm inclined to do the minimal thing possible to support free-threaded builds. We can then look into the lightweight atomic-based context guards as a quality-of-life improvement as a separate and follow-up PR.

That’s also the approach we took with NumPy - we didn’t initially put effort into fixing preexisting thread safety issues on the GIL-enabled build and focused instead on free-threading specific issues to get an initial release out.

@dpdani
Author

dpdani commented Feb 10, 2025

I'll remove the checks and the new test 👍

# Conflicts:
#	.github/workflows/test.yml
#	pyproject.toml
@dpdani
Author

dpdani commented Feb 10, 2025

I have reverted the changes regarding the runtime context-sharing checks and have merged main, but I won't have time to go through the version bumps today.
