Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix thread safety for pybind11 loader_life_support #3237

Merged
merged 28 commits into from
Sep 10, 2021
Merged

Fix thread safety for pybind11 loader_life_support #3237

merged 28 commits into from
Sep 10, 2021

Conversation

laramiel
Copy link
Contributor

@laramiel laramiel commented Sep 2, 2021

Description

Fixes issue: #2765

This converts the vector of PyObjects to either a single void* or a per-thread void* depending on the WITH_THREAD define.

The new void* field is used to construct a stack of loader_life_support frames that can extend the life of python objects.

The pointer is updated when the loader_life_support object is allocated (which happens before a call) as well as on release.

Each loader_life_support frame maintains a set of PyObject references that need to be lifetime extended; this is done by storing them in a c++ std::unordered_set and clearing the references when the object goes out of scope when the call has completed.

Suggested changelog entry:

Fixes thread safety for some pybind11::type_caster which require lifetime extension, such as for std::string_view.

Fixes issue: #2765

This converts the vector of PyObjects to either a single void* or
a per-thread void* depending on the WITH_THREAD define.

The new field is used by each thread to construct a stack
of loader_life_support frames that can extend the life of python
objects.

The pointer is updated when the loader_life_support object is allocated
(which happens before a call) as well as on release.

Each loader_life_support maintains a set of PyObject references
that need to be lifetime extended; this is done by storing them
in a c++ std::unordered_set and clearing the references when the
method completes.
@laramiel
Copy link
Contributor Author

laramiel commented Sep 2, 2021

Sure, a PyList or a vector or some other container would work as well.

The idea of using an unordered_set is intended to also address #1251
A PySet would probably also work for that case.

Alternatively we could allocate a simple empty list and then use the other patients mechanism to maintain the lifetime.

@daniel-smith
Copy link

Is it worth adding a python test similar to #2765 (comment) somewhere?

It may be appropriate to run this under msan/tsan/etc.
@laramiel laramiel requested a review from henryiii as a code owner September 3, 2021 01:26
@laramiel
Copy link
Contributor Author

laramiel commented Sep 3, 2021

I added a similar version; perhaps I should use a convertible reference rather than a string view. Let me know what you think.

@daniel-smith
Copy link

I added a similar version; perhaps I should use a convertible reference rather than a string view. Let me know what you think.

I believe that going via the string_view caster would exhibit the same issues - though I've not tested this out myself. I think as long as the test exercises the relevant code path, and verifies that things are thread-safe, then we're good.

@daniel-smith
Copy link

Actually @laramiel, one consideration of using string views for the test is that this is a C++17 feature - if we want to support compilation of tests on older C++ standards (?)

@laramiel
Copy link
Contributor Author

laramiel commented Sep 3, 2021

@daniel-smith Converted test to use lifetime-extended references.

@Skylion007
Copy link
Collaborator

Also please remove unused imports from the added test python file and ensure clang-tidy is happy.

@@ -154,7 +157,7 @@ struct type_info {
};

/// Tracks the `internals` and `type_info` ABI version independent of the main library version
#define PYBIND11_INTERNALS_VERSION 4
#define PYBIND11_INTERNALS_VERSION 5
Copy link
Collaborator

@henryiii henryiii Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are bumping the ABI, we should probably move through several of the PRs sitting round that bump the ABI and get them in too. Also, this is going to create a massive mess, as we'll have to start a conda-forge migration, and quite a few packages don't use the conda-forge's pybind11, so... It will be a mess. But I guess it's going to have to happen eventually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be a non-abi bumping strategy available; if you like I can take that approach.

@Skylion007
Copy link
Collaborator

Clang-tidy is still unhappy about a missing comment: (see the check failure) and the tests are failing on 2.7 and PyPy

Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My approval is based on Jeremy's and Daniel's reviews.

@Skylion007 or @henryiii, could you please also weigh in?

(One explicit needs to be added before we merge.)

@Skylion007
Copy link
Collaborator

Merging this in as a stop-gap. Definitely should be refactored later as per the above discussion.

@Skylion007 Skylion007 merged commit 0e59958 into pybind:master Sep 10, 2021
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Sep 10, 2021
def run(self):
try:
for i in range(10):
self.fn(i, i)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I avoid implicit cast (and triggering life support) by changing this line to self.fn(i, m.IntStruct(i))?

@daniel-smith
Copy link

Thanks everyone. Looking forward to seeing any future changes w.r.t. the issues @jbms raised.

jbms added a commit to jbms/pybind11 that referenced this pull request Sep 16, 2021
This revises the existing fix for
pybind#2765 in
pybind#3237 to reduce the amount of
TLS storage used.

The shared TLS key is stored in two different ways, depending on
`PYBIND11_INTERNALS_VERSION`.  If `PYBIND11_INTERNALS_VERSION ==
4` (as is currently set), the TLS key is stored in the
`internal::shared_data` map to avoid breaking ABI compatibility.  If
`PYBIND11_INTERNALS_VERSION > 4`, the TLS key is stored directly in
the `internals` struct.
jbms added a commit to jbms/pybind11 that referenced this pull request Sep 16, 2021
This revises the existing fix for
pybind#2765 in
pybind#3237 to reduce the amount of
TLS storage used.

The shared TLS key is stored in two different ways, depending on
`PYBIND11_INTERNALS_VERSION`.  If `PYBIND11_INTERNALS_VERSION ==
4` (as is currently set), the TLS key is stored in the
`internal::shared_data` map to avoid breaking ABI compatibility.  If
`PYBIND11_INTERNALS_VERSION > 4`, the TLS key is stored directly in
the `internals` struct.
jbms added a commit to jbms/pybind11 that referenced this pull request Sep 16, 2021
This revises the existing fix for
pybind#2765 in
pybind#3237 to reduce the amount of
TLS storage used.

The shared TLS key is stored in two different ways, depending on
`PYBIND11_INTERNALS_VERSION`.  If `PYBIND11_INTERNALS_VERSION ==
4` (as is currently set), the TLS key is stored in the
`internal::shared_data` map to avoid breaking ABI compatibility.  If
`PYBIND11_INTERNALS_VERSION > 4`, the TLS key is stored directly in
the `internals` struct.
jbms added a commit to jbms/pybind11 that referenced this pull request Sep 16, 2021
This revises the existing fix for
pybind#2765 in
pybind#3237 to reduce the amount of
TLS storage used.

The shared TLS key is stored in two different ways, depending on
`PYBIND11_INTERNALS_VERSION`.  If `PYBIND11_INTERNALS_VERSION ==
4` (as is currently set), the TLS key is stored in the
`internal::shared_data` map to avoid breaking ABI compatibility.  If
`PYBIND11_INTERNALS_VERSION > 4`, the TLS key is stored directly in
the `internals` struct.
jbms added a commit to jbms/pybind11 that referenced this pull request Sep 16, 2021
This revises the existing fix for
pybind#2765 in
pybind#3237 to reduce the amount of
TLS storage used.

The shared TLS key is stored in two different ways, depending on
`PYBIND11_INTERNALS_VERSION`.  If `PYBIND11_INTERNALS_VERSION ==
4` (as is currently set), the TLS key is stored in the
`internal::shared_data` map to avoid breaking ABI compatibility.  If
`PYBIND11_INTERNALS_VERSION > 4`, the TLS key is stored directly in
the `internals` struct.
@henryiii henryiii removed the needs changelog Possibly needs a changelog entry label Sep 16, 2021
jbms added a commit to jbms/pybind11 that referenced this pull request Sep 17, 2021
This revises the existing fix for
pybind#2765 in
pybind#3237 to reduce the amount of
TLS storage used.

The shared TLS key is stored in two different ways, depending on
`PYBIND11_INTERNALS_VERSION`.  If `PYBIND11_INTERNALS_VERSION ==
4` (as is currently set), the TLS key is stored in the
`internal::shared_data` map to avoid breaking ABI compatibility.  If
`PYBIND11_INTERNALS_VERSION > 4`, the TLS key is stored directly in
the `internals` struct.
jbms added a commit to jbms/pybind11 that referenced this pull request Sep 17, 2021
This revises the existing fix for
pybind#2765 in
pybind#3237 to reduce the amount of
TLS storage used.

The shared TLS key is stored in two different ways, depending on
`PYBIND11_INTERNALS_VERSION`.  If `PYBIND11_INTERNALS_VERSION ==
4` (as is currently set), the TLS key is stored in the
`internal::shared_data` map to avoid breaking ABI compatibility.  If
`PYBIND11_INTERNALS_VERSION > 4`, the TLS key is stored directly in
the `internals` struct.
jbms added a commit to jbms/pybind11 that referenced this pull request Sep 17, 2021
This revises the existing fix for
pybind#2765 in
pybind#3237 to reduce the amount of
TLS storage used.

The shared TLS key is stored in two different ways, depending on
`PYBIND11_INTERNALS_VERSION`.  If `PYBIND11_INTERNALS_VERSION ==
4` (as is currently set), the TLS key is stored in the
`internal::shared_data` map to avoid breaking ABI compatibility.  If
`PYBIND11_INTERNALS_VERSION > 4`, the TLS key is stored directly in
the `internals` struct.
jbms added a commit to jbms/pybind11 that referenced this pull request Sep 17, 2021
This revises the existing fix for
pybind#2765 in
pybind#3237 to reduce the amount of
TLS storage used.

The shared TLS key is stored in two different ways, depending on
`PYBIND11_INTERNALS_VERSION`.  If `PYBIND11_INTERNALS_VERSION ==
4` (as is currently set), the TLS key is stored in the
`internal::shared_data` map to avoid breaking ABI compatibility.  If
`PYBIND11_INTERNALS_VERSION > 4`, the TLS key is stored directly in
the `internals` struct.
rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Sep 19, 2021
This revises the existing fix for
pybind#2765 in
pybind#3237 to reduce the amount of
TLS storage used.

The shared TLS key is stored in two different ways, depending on
`PYBIND11_INTERNALS_VERSION`.  If `PYBIND11_INTERNALS_VERSION ==
4` (as is currently set), the TLS key is stored in the
`internal::shared_data` map to avoid breaking ABI compatibility.  If
`PYBIND11_INTERNALS_VERSION > 4`, the TLS key is stored directly in
the `internals` struct.

Fix test_pytypes.py::test_issue2361 failure on PyPy3.7

Add github actions tests for unstable ABI
rwgk pushed a commit that referenced this pull request Sep 20, 2021
* Eliminate duplicate TLS keys for loader_life_support stack

This revises the existing fix for
#2765 in
#3237 to reduce the amount of
TLS storage used.

The shared TLS key is stored in two different ways, depending on
`PYBIND11_INTERNALS_VERSION`.  If `PYBIND11_INTERNALS_VERSION ==
4` (as is currently set), the TLS key is stored in the
`internal::shared_data` map to avoid breaking ABI compatibility.  If
`PYBIND11_INTERNALS_VERSION > 4`, the TLS key is stored directly in
the `internals` struct.

* Fix test_pytypes.py::test_issue2361 failure on PyPy3.7

* Add github actions tests for unstable ABI
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 10, 2022
The proximate reason for the upgrade is to pick up a thread-safety fix (pybind/pybind11#3237) for JAX, but we might as well just upgrade to the current version for everyone.

Remove a backport of a patch now present in the released version (pybind/pybind11#2864).

Patch out the "VERSION" file from nsync to avoid conflicts with the <version> C++ header included by pybind11.

PiperOrigin-RevId: 420856622
Change-Id: Ied007ed62b93d96cf008d6ec6336017af1fa10fd
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Apr 18, 2022
The reason for the downgrade is that v2.9.x is causing a segmentation fault in the TF Lite interpreter in TF Text conversion tests.

The original reason for the 2.9.0 upgrade was to include a thread-safety fix (pybind/pybind11#3237, pybind/pybind11@0e59958) for JAX that was added in 2.8.0 and above, so performing this downgrade is not problematic for its intention.

PiperOrigin-RevId: 442586909
broken added a commit to broken/tensorflow that referenced this pull request May 6, 2022
The reason for the downgrade is that v2.9.x is causing a segmentation fault in the TF Lite interpreter in TF Text conversion tests.

The original reason for the 2.9.0 upgrade was to include a thread-safety fix (pybind/pybind11#3237, pybind/pybind11@0e59958) for JAX that was added in 2.8.0 and above, so performing this downgrade is not problematic for its intention.

PiperOrigin-RevId: 442586909
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants