-
Notifications
You must be signed in to change notification settings - Fork 791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python::allow_threads
is unsound in the presence of scoped-tls
.
#3640
Comments
From a first glance, I fear that saving the current approach might mean to never ever expose any GIL-bound references no matter how short-lived and have all operations explicitly require a GIL token. This would be highly unergonomic and imply that our upcoming GIL-bound smart pointers If we somehow could avoid at least storing our GIL-bound smart pointers within
should be upheld. Or at least require an extra unsafe trait bound for usage of types with
Better yet, there would be a case of unsoundness implied by usage of @steffahn Since you are obviously experienced in the finding these holes, did you ever look into |
But this is not true, right? Because I could also use |
Yes. At least in the extended version provided in the crate |
use pyo3::prelude::*;
use scoped_tls_hkt::scoped_thread_local;
fn main() {
Python::with_gil(|py| {
scoped_thread_local!(static GIL: for<'a> Python<'a>);
GIL.set(py, || {
py.allow_threads(|| {
GIL.with(|smuggled_py: Python<'_>| {
// profit
});
});
});
});
} |
Ouch, this is painful. Thanks @steffahn for finding such an interesting case! My feeling here is that this is a much more severe mechanism to break It seems like the natural reaction is to declare What would a "safe" solution look like? Can we somehow express Alternatively, a future approach might involve contexts, where |
I think this would basically be equivalent to removing the |
At least that I think, yes. I think it might also have implications for My feeling is that coming up with a workable proposal for that is going to need someone to dive deep into this for at least a few days. How do you feel about accepting Do we think this has implications for the proposed |
(Given the |
My gut feeling is to not rush this (after all, we having been living with the
|
Sure thing. The last point in particular that |
Just for the sake of throwing out another middle-of-the-road solution, what if the method was renamed to I can see a very mixed reaction from the community to doing that (e.g. we might be accused of shipping code which doesn't trip |
FYI, I came across this issue from a context of discussing the role of In my reply in that thread I argued that a Of course that’s not any form of short-term solution; landing such a language feature would be a huge undertaking, unless we find a simpler way of keeping it backwards compatible. Regarding invariance as a solution, look at this comment and my answer, which is an essentially-invariant situation. Invariant lifetimes can always be hidden behind a shorter covariant one using trait objects. Regarding the py.allow_threads(unsafe { marker_name_tbd() }, || { … }) |
Thank you, I'll try to digest that thread in the near future! The |
Ok, one most likely idiotic idea. Thread-locals in whatever form currently break the isolation we require by Since we (at least on stable) already require the closure to be EDIT: Maybe we would need a completely need thread for soundness? Or we would need to track whether the GIL was ever acquired on these runtime threads? |
Probably enought to track whether GIL is currently acquired on such a runtime thread, not ever acquired. However, for this notion of “currently required” we would need to avoid unlocking via If |
I think it's actually quite a nice idea as a way to add a safe As moving work to a different thread seems like a semantic change, one option is to introduce this as a new safe API e.g. |
I think we should turn this around, make |
I think deep nesting is pathological but a certain degree of nesting can be expected due to callback-style API and things like that.
I think I would want to avoid changing |
One thing I just noticed is that |
I'm the author of I'm fairly familiar with the Python GIL, but not that familiar with how PyO3 currently solves this problem. If I were implementing from scratch, I would model it something like this:
Comparing this to what I see PyO3 currently implements, it looks like the main difference is that the lifetime on However, if you change |
If I understood things correctly, making the lifetime invariant was already ruled out as a solution in the middle section of #3640 (comment), i.e. via the linked technique used to create an exploit for |
That sounds, I think, like it would force users to get rid of all GIL-dependent python object references created with that GIL reference before the temporary release. I’m not actually a user of |
This should be independent of variance concerns, and all about |
A pretty typical use case for releasing the GIL is to process a That said, you could work around this via https://docs.rs/pyo3/latest/pyo3/prelude/struct.Py.html#method.as_bytes for this specific case. |
I don't agree - releasing the GIL would require a mutable borrow, so it's impossible to "store" the GIL somewhere for later and also call the release function.
It would force them to be downgraded (to non-GIL objects) or no longer used, yes.
Yeah sorry you're correct - it's the T that's invariant in &'a mut T, not the 'a. The important thing is that &mut guarantees uniqueness regardless of anything else, which is what is wanted with the GIL.
I mentioned "downgraded" types - I would expect that safe functionality (such as accessing immutable objects) would be available directly on the downgraded versions of types, so you wouldn't need the GIL in order to access them. |
While I agree that letting references live throughout the scope of the closure passed to In my opinion, forcing the whole ecosystem to implement downgraded wrapper types which continue to provide access while the GIL is temporarily released seems too big an ergonomics hit compared to having the type system work this out as it is now. EDIT: Also c.f. #3610 (comment) for another situation where the existing design seems to work out well automatically. Of course, the solution by starting threads might still be prohibitively expensive over there... |
Fair enough. Another option would be to always use the "downgraded" versions of the types everywhere, and require a GIL reference to be passed in to all methods that require it (including getters that access mutable data). |
Just chiming in here (without having read the entire thread, sorry) to note that as far as I am concerned, the fact that Looking at some of the prior discussion here around an |
My feeling is that this might be a viable long-term solution, but there are very compelling performance and ergonomics reasons to not require the "GIL reference to passed in to all methods that require it". The main reason is for interactions with traits: for example, what about
This is a much better expressed form of what I originally wrote at #3640 (comment)
I definitely agree that it's an ergonomics hit, but maybe it's possible to get there with a macro. Maybe for example we can one day have a macro that takes a list of idents to temporarily downgrade, and reupgrade once the allow_threads call is over. let list = PyList::new(...);
let data = "1, 2, 3";
allow_threads!(py, [list], { /* do something with `data`, `list` is only accessible as a downgraded reference */ }); I think this line of exploration of is likely impossible until we've moved |
I am less concerned with the syntactical side of things, but rather that every creator of such a downgraded type would have to worry what is still allowed and what is not, i.e. figuring out a sound API for these types all over the ecosystem. I think being able to rely on the existing |
I think we shouldn't do anything:
A solution like running it on a task pool has a significant runtime and code complexity cost. Making My personal preference would be to, in debug mode only, insert checks that assert that the gil really is held. |
I agree that this could be a reasonable approach. But I think that we should discuss such a choice with the community at large as soundness is quite important to many and I would rather be proactive than reactive in handling possible conflicts. I also think that if we do this, we should also update our documentation to fully explain these loopholes and why we do not try to close them. We should also drop the
Could you expand on where you would add these checks? If I understand things correctly, we would need to add them to all functions which require the GIL to be held, e.g. all methods on |
It's not all that obscure in my opinion: |
Doesn’t |
Yes and no. |
But it's fairly obscure at the user-level, right? Most of its dependents seem to be some key ecosystem crates (like tokio for example) and things like testing frameworks. |
Something like that, yes. |
I think that if we choose to do nothing we definitely need a big community consultation. I also think that the community (rightly) expects that safe == sound, and to ask for forgiveness on an exception will require lots of community goodwill. I also think that as we're now aware of three ways that If the community perception is that PyO3 has soundness gaps, my fear is that we become an obstacle rather than a tool for adoption of Rust as a memory safe companion to Python. We may weaken the argument for many as to why they should consider moving their Python extension from C/C++ to Rust, thereby failing to prevent the creation of other vulnerabilities. Personally, I would much prefer we act to keep PyO3 sound and find a way to modify these APIs, even if the end result is clunky. I'd hope that long term research would find new APIs or language features to improve ergonomics.
If these checks don't make debug mode unusably slow then I am always in favour of having extra checks irrespective of the rest of the discussion. |
If ergonomics are the most important, then perhaps the best option is to get rid of the GIL reference entirely from user code and have it live in a (And if the compiler's not smart enough yet, maybe that's something that could be worked on - eg. a MIR level optimization that converts scoped TLS accesses into extra function arguments) |
That's an interesting idea. I see two snags of an "auto gil" API which we'd have to explore:
Nevertheless I'm certainly open to understanding how it works out in practice. It would probably be worth at least understanding how it compares to our proposed API in #3382 before we push that migration on our users. |
I see a danger of deadlocks here which the GIL is already prone to too even without hidden attempts to acquire it. |
For historical reference, here are my original thoughts on this issue (from 2015!): dgrunwald/rust-cpython#15 (comment) |
Thanks @dgrunwald - it's really interesting reading through that old discussion. It's interesting to see you made some very astute observations there such as the HKT TLS, and even your analysis on API choices feel very familiar to me! Interestingly latest PyO3 now implements the design which you named "PyPtr", and just like you I am not super pleased about the lifetime ergonomics but it was also a necessary step to overcome peformance and safety costs of the "nicer" abstraction. |
In analogy to
Python::allow_threads
is unsound in the presence ofsend_wrapper
. #2141we can smuggle
Ungil
data acrossPython::allow_threads
usingscoped-tls
, too.(results in segfault in my test)
Unlike #2141, this issue is virtually unsolvable, i.e. even the auto trait approach with the
feature="nightly"
enabled cannot catch this at all. There’s no property in the callback toallow_threads
that can catch this. Really, the callback doesn’t even capture anything at all:Again, there’s a “whose fault” question to be asked, whether the fact that the standard library’s thread-locals require
T: 'static
means there’s any guarantees that non-'static
data isn’t allowed to be “smuggled” through thread-local storage. In my view, if you look at what kind of things it offers,scoped-tls
seems even less unreasonable to be called “sound”, compared tosync_wrapper
. Yet the consequences forUngil
are more detrimental.The text was updated successfully, but these errors were encountered: