Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix thread safety for pybind11 loader_life_support #3237
Fix thread safety for pybind11 loader_life_support #3237
Changes from all commits
53b3919
caa974f
9179d60
0c2bf55
4b7dc7a
fdfff88
d91a4a7
c6720ca
2c07c0d
8a1a59f
dfc94f3
4f25e31
366f40d
b5a0538
ffc52a3
dc7df66
dd8f264
c4c6acb
6ad3de6
d7e3067
638d091
a06f851
5c58953
5f66855
1237bbe
afbc066
fe49b37
5787104
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to use
pybind11::object
here, as then the reference counting doesn't have to be done manually.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: Previously the possibility of using PySet was discussed, but that cannot be used since if the type defines a custom hash and equality function then it won't work correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately py::object is not hashable, so that would require intrusive adaptors or similar. Since the control flow here is so limited, I think that the simplest answer is to just use PyObject* here and refcount it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It occurred to me that an unfortunate effect of using C++ thread_local is that every single pybind11 extension module will add an additional TLS variable --- that could make the thread-local storage rather large, as there can easily be a large number of extension modules.
To avoid that, the Python TLS API could be used instead --- the key that is allocated would have to be stored in the internals struct (which would make the ABI incompatible), or if we really want to maintain ABI compatibility, could be accessed via a separate PyCapsule that is handled in the same way as the existing internals struct. Probably it would make more sense to just break the ABI, and take the opportunity to merge in the other ABI-breaking changes, though I don't have sufficient context to really offer much judgement on that decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And why does that concern you? I suspect that many extensions may already use thread_local variables without this concern.
The problem with using the python TLS API is that without versioning the internal data structure we're left with the same issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some extensions may already use TLS, but I expect that is only a tiny fraction of them, and presumably it is providing useful functionality.
This will add an extra separate TLS variable for every pybind11 extension module (due to the use of -fvisibility=hidden, they will not be merged). As you can easily have hundreds of extension modules loaded in a program, this is effectively adding potentially a very large number of TLS variables, and the memory usage scales with
number_of_threads_in_program * number_of_pybind11_extensions
. Furthermore, with #3257 taking care of thestring_view
use case, the main remaining use case ispybind11::implicitly_convertible
, which I think is a relatively niche feature.Yes, the Python TLS API only helps if we ensure there is a single key shared by all pybind11 extensions. As far as versioning the
internals
data structure, note that pybind11 already has such a mechanism, it is just a matter of bumping the abi version number --- the cost is that the type registry is not shared between different ABI versions, which may create an incompatibility between extensions compiled. The internals data is currently accessed via a PyCapsule set as an attribute of the builtins module in the Python interpreter. To avoid breaking the ABI, it would be possible to create a separate PyCapsule using the same mechanism that holds any additional data, in this case just the single tls key.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're talking about potentially hundreds of instances, which is admittedly an outlier.
As it stands now, I'd rather that we get this in to fix the threading issue, which I see as significant, and then we can rework it with an API version revision to use a shared python TLS variable or a pycapsule.
I'd prefer broader consensus w.r.t. the pycapsule encapsulation, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think hundreds is an outlier. For example, tensorflow alone seems to have ~76.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another issue that occurred to me is that if you call
py::cast
from translation unita.cc
, and that call is nested within an invocation of a pybind11-bound function defined in translation unitb.cc
, and these two translation units have separate copies of pybind11 due to-fvisibility=hidden
, then previously this would have worked (albeit not thread safe) but now it will fail.For example, within Google's code base, this could occur if we define some utility function in a pybind_cc_library target that calls
py::cast
, then use that utility function from within a pybind11-bound function defined in a pybind_extension. All of the pybind_cc_library targets will share a single copy of the thread_local, while each pybind_extension will have its own copy of the thread_local.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If in the implementation of such a call you call
pybind11::cast
in such a way that a loader_life_support temporary is necessary then you would need to add a scoped instance ofloader_life_support
in the-fvisibility=hidden
method implementation. I think that this is a really rare edge case, and as I said, I think that the state of the world here is vastly better than before, and we can change the implementation with an internal API bump.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that there is already a
shared_data
member of the internals struct that could be used to store the TLS key without breaking ABI --- a macro could control whether to use that or a regular data member of the internals struct. For efficiency the TLS key could also be copied to the local_internals struct. Then when ABI is bumped the mechanism using shared_data could be removed.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just my two cents - I would say that this is a critical bug, of which the impact is far worse than any of the aforementioned side effects/edge cases. The team I work with have taken a fork of pybind11 and applied this fix so that we can move forward - as we have a strong requirement to run IO-bound and computationally expensive functions in parallel. Without this fix, we cannot reliably do so. I would second @laramiel's suggestion to get this fix in, and then rework/optimise later if required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any objection to merging this as is, especially given that there appears to be a relatively easy path to revising it to avoid duplicating the TLS variables without introducing an ABI break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could add a destructor that modified
value
or modifies some global variable, so that we aren't relying on debugging features of the memory allocator as much.Separately, perhaps add a note that this test should be run with asan for greater effectiveness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I avoid implicit cast (and triggering life support) by changing this line to
self.fn(i, m.IntStruct(i))
?