
try_init breaks assumptions about pythonic module imports #29

Closed
ShadowJonathan opened this issue Jun 18, 2021 · 7 comments

ShadowJonathan commented Jun 18, 2021

It induces side effects at import time, which is a huge no-no for Python:

  • it calls get_event_loop, capturing a reference in global scope
    • this assumes that there will only be one event loop in the entire program, and that it will be the default event loop
    • this breaks when uvloop or another event loop implementation is introduced; these are often installed after all imports are done (imports must have no side effects)
    • this breaks when an asyncio event loop is spun up in a different thread, which may happen for a few reasons, compatibility being one of them. The default event loop is captured at import, so when the native extension is called, the awaitable is associated with the default (main-thread) event loop, which cannot be awaited by another event loop; attempting to do so throws an exception.
  • it installs the default thread pool executor on that event loop immediately
    • this assumes that no other executor is installed
      • this breaks when another executor is indeed installed by post-import code
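The different-loop failure mode described above is easy to reproduce in plain asyncio, with a bare Future standing in for the extension's awaitable (an illustrative stand-in, not pyo3-asyncio code):

```python
import asyncio

# Loop A stands in for the loop captured at import time; the pending
# future plays the role of the awaitable the native extension produces.
loop_a = asyncio.new_event_loop()
captured = loop_a.create_future()

async def consume():
    await captured

# Loop B is the loop the application actually runs on.
loop_b = asyncio.new_event_loop()
try:
    loop_b.run_until_complete(consume())
    outcome = "awaited fine"
except RuntimeError as exc:
    outcome = f"RuntimeError: {exc}"
finally:
    loop_a.close()
    loop_b.close()

print(outcome)  # RuntimeError: ... attached to a different loop
```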

Python analysis tools strongly discourage intermixing module imports with sequential code; the wide consensus is that no side effects must take place during import. try_init breaks this rule by assuming that the application it runs in will only ever have a single event loop, and that it will be the one asyncio reports at import time.

I recommend changing the code to call get_event_loop() only at the time asyncio-specific behaviour is invoked; no assumption should be made between asyncio-related calls, on the same thread or on different ones, that the asyncio event loop remains the same.


Additionally, according to the asyncio docs, get_event_loop only creates a loop automatically when called on the main thread. I can't confirm it, but I have a suspicion it will break when called from another thread where no event loop exists.
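That suspicion is straightforward to check with a small probe (plain asyncio, not pyo3-asyncio code): on CPython, get_event_loop raises rather than creating a loop when called from a non-main thread that has never had a loop set.

```python
import asyncio
import threading

result = {}

def worker():
    # This thread has no running loop and no loop set via set_event_loop.
    try:
        asyncio.get_event_loop()
        result["raised"] = False
    except RuntimeError:
        # get_event_loop only auto-creates a loop on the main thread.
        result["raised"] = True

t = threading.Thread(target=worker)
t.start()
t.join()
print(result["raised"])  # True
```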

awestlake87 (Owner) commented Jun 19, 2021

Thanks for bringing this to my attention! I've been thinking about some of these problems in #26, but you've also brought up some more considerations that I wasn't aware of.

it calls get_event_loop, capturing a reference to a global scope

  • this breaks when an asyncio event loop is spun up in a different thread, which may happen for a few reasons, compatibility being one of them. The default event loop is captured at import, so when the native extension is called, the awaitable is associated with the default (main-thread) event loop, which cannot be awaited by another event loop; attempting to do so throws an exception.

According to the docs, this will get the event loop associated with the current OS thread, so I assume it would work with event loops other than the default (main-thread) event loop. The problem with pyo3-asyncio right now is that we cache the reference returned by the first call to get_event_loop, so this library currently won't work with multiple event loops.
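That per-thread behaviour can be demonstrated directly: a loop registered with set_event_loop on a worker thread is exactly what get_event_loop returns on that thread.

```python
import asyncio
import threading

seen = {}

def worker():
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)  # register a loop for *this* thread
    seen["same"] = asyncio.get_event_loop() is loop
    loop.close()

t = threading.Thread(target=worker)
t.start()
t.join()
print(seen["same"])  # True: get_event_loop is per-OS-thread, not global
```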

try_init breaks this rule by assuming that the application it runs in will only ever have a single event loop, and that it will be the one asyncio reports at import time.

I think the only way to account for this single-event-loop assumption is by calling get_event_loop for every conversion. I was hoping to avoid this for performance reasons, but then again I haven't made any benchmarks yet, so I'm not certain how much slower this would be, or whether it's slower at all. It is more correct, though, so it's probably a good change to make.

  • get_running_loop might be even more correct, but I want to maintain compatibility with Python 3.6, and these functions appear to be the same except for the lazy initialization in get_event_loop. As long as initialization practices are outlined in the docs, I think this should be fine.
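The difference between the two is observable: get_running_loop always raises outside a running loop, while inside a coroutine the two functions agree (checked here on plain asyncio; get_running_loop requires Python 3.7+).

```python
import asyncio

# Outside any running loop, get_running_loop always raises...
try:
    asyncio.get_running_loop()
    outside = "got a loop"
except RuntimeError:
    outside = "RuntimeError"

# ...while inside a coroutine the two functions agree, so the difference
# only shows up in how they behave before a loop is running.
async def compare():
    return asyncio.get_event_loop() is asyncio.get_running_loop()

agree = asyncio.run(compare())
print(outside, agree)  # RuntimeError True
```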
  • this (try_init) installs the default thread pool executor on that event loop immediately
    • this assumes that no other executor is installed
      • this breaks when another executor is indeed installed by post-import code

Yeah, this is a problem. I think the ThreadPoolExecutor management can be taken out of pyo3-asyncio entirely since the internals of pyo3-asyncio don't need it to function correctly.
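The underlying conflict is that set_default_executor is last-writer-wins, so an executor installed at import time silently replaces (or is silently replaced by) whatever the application configures later. A small demonstration, with hypothetical "import-time" and "application" installs:

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

loop = asyncio.new_event_loop()

# What an import-time try_init-style call would do:
first = ThreadPoolExecutor(max_workers=1, thread_name_prefix="first")
loop.set_default_executor(first)

# What the application does later, unaware of the earlier install:
second = ThreadPoolExecutor(max_workers=1, thread_name_prefix="second")
loop.set_default_executor(second)

# run_in_executor(None, ...) now runs on "second"; "first" is orphaned.
name = loop.run_until_complete(
    loop.run_in_executor(None, lambda: threading.current_thread().name)
)
print(name)  # second_0
first.shutdown()
second.shutdown()
loop.close()
```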

Additionally, according to the asyncio docs, get_event_loop only creates a loop automatically when called on the main thread. I can't confirm it, but I have a suspicion it will break when called from another thread where no event loop exists.

My interpretation of those docs was that it would work on any thread, the exception being that it would not attempt to initialize the event loop unless the calling thread was also the main thread. One problem is that we can't rely on it to get the relevant event loop from a Rust thread, so we'll most likely have to add a loop parameter to the async/await conversions so they will associate themselves with the proper event loop.
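asyncio already has one explicit-loop API along these lines: run_coroutine_threadsafe lets a thread with no asyncio state of its own hand work to a loop it was given a handle to. This is a rough analogue of the proposed loop parameter, not the pyo3-asyncio API itself:

```python
import asyncio
import threading

loop = asyncio.new_event_loop()
results = []
done = threading.Event()

async def answer():
    return 42

def foreign_thread(target_loop):
    # Stand-in for a Rust thread: it has no notion of a "current" loop,
    # only the explicit handle it was given.
    fut = asyncio.run_coroutine_threadsafe(answer(), target_loop)
    results.append(fut.result(timeout=5))
    done.set()

t = threading.Thread(target=foreign_thread, args=(loop,))
t.start()

async def run_until_done():
    # Keep the loop spinning until the foreign thread has its result.
    while not done.is_set():
        await asyncio.sleep(0.01)

loop.run_until_complete(run_until_done())
t.join()
loop.close()
print(results)  # [42]
```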

So in short, I think we need to make the following changes:

  • Get rid of the ThreadPoolExecutor initialization and cleanup
  • Stop recommending that people call try_init in their module export function.
  • Stop caching the result of get_event_loop at the start to support multiple event loops
  • Change/Add conversions that accept a Python event loop as a parameter since these conversions might not take place on a thread that is aware of the relevant asyncio event loop.

Open questions:

  • Is it still ok to initialize the tokio event loop during the module import?
    • I think this is still ok considering it doesn't directly affect the Python code, and async-std initializes itself whenever it needs to.
  • Should we create a simpler compatibility mode for the single-event-loop case?
    • These changes will likely be pretty disruptive. Some users with single global event loops might find it cumbersome to work around these changes.
    • Can this be done without sacrificing correctness?

ShadowJonathan (Author) commented Jun 19, 2021

Thank you for leaving a candid response, sorry if my tone was otherwise harsh or rude.

Is it still ok to initialize the tokio event loop during the module import?

Yes. This problem is mainly about the Python-side effects; initialisation limited to a module's own "boundaries" during import happens all the time. Unless it starts actively running work in the background that steals the GIL or CPU cycles while idle (which can be frowned upon), this is entirely fine (as far as Python is concerned).

Should we create a simpler compatibility mode for the single-event-loop case?

I'd say some caching to keep performance is in order, but once an edge-case situation is detected (multiple event loops, in this case), the library should enter a "bailout" mode, where it regresses to a less performant (but more correct) state.

Should we create a simpler compatibility mode for the single-event-loop case?

Maybe not. It might be better (also for locality) to provide some way to reference the Python event loop via a "context" variable of sorts; maybe that is not exactly what you want, but for correctness' sake, it could be included.


With the way asyncio works, you can technically never be sure that, at the time of calling a function to be converted into a coroutine, you can reuse an old event loop.

I think that, for the sake of sanity, since every coroutine on the originating event loop's thread shares the same event loop, that event loop can be assumed not to have been dropped while any of them is alive. So holding an (A)Rc in that context object could work, with a weak reference in a global cache serving as a "canary" for whenever we can no longer be sure that the thread's event loop is still the same. This works as long as every coroutine (and therefore every Arc) gets dropped or goes out of scope when the event loop stops (more research needed; see below). It would improve performance under high load, as many coroutine objects are created at the same time, keeping the Arc canary alive. Whenever the canary is dead, get_event_loop has to be called on every coroutine/asyncio-related initialisation to check what the current event loop is (for that thread), after which the canary can immediately be revived.

This is sound, as the top-level call on a thread containing a new event loop will be asyncio.run (or an equivalent thereof), and everything asyncio does will run at least one stack frame below that. So as long as a coroutine is active (i.e. as long as the event loop is running), the event loop stays the same, which makes the cache as simple as a DashMap/HashMap<ThreadId, WeakRef<AsyncioEventLoop>> (or something of the sort).
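A minimal Python sketch of that cache (names and the Python rendering are illustrative; the real thing would live on the Rust side with a Weak reference per thread):

```python
import asyncio
import threading
import weakref

# thread id -> weak reference (the "canary") to that thread's event loop
_loop_cache = {}

def cached_event_loop():
    tid = threading.get_ident()
    ref = _loop_cache.get(tid)
    loop = ref() if ref is not None else None
    if loop is None or loop.is_closed():
        # Canary dead (or never set): do the full lookup, then revive it.
        loop = asyncio.get_event_loop()
        _loop_cache[tid] = weakref.ref(loop)
    return loop

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
cold = cached_event_loop() is loop   # first call: real get_event_loop lookup
warm = cached_event_loop() is loop   # second call: served by the canary
loop.close()
print(cold, warm)  # True True
```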

I don't know what to do on the Rust side of things once an event loop finishes; some investigation needs to be done there. I expect UB, but unless this affects how Rust must respond after a coroutine gets dropped in general (i.e. when an event loop gets stopped long before the program exits), I don't see this being a major issue.


I'd be willing to discuss this further on gitter/matrix whenever you have the time, as I know some details can fall through the cracks, and I would like to work on making this more correct.

awestlake87 (Owner) commented:

Thank you for leaving a candid response, sorry if my tone was otherwise harsh or rude.

No worries, I didn't get that impression. This library kinda spawned out of my use-case which was a Rust application using some async Python. It turns out the reverse is a bit more complicated, but arguably more important to the community, so I'm always happy to get feedback from this perspective.

As for the implementation, the caching mechanism has the potential to get really complicated, and I'm not entirely sure which parts are going to be the most important to optimize. I think we should focus on correctness first, make some benchmarks for the conversions, then target the most problematic areas for performance.

It might also be a good idea to outline some of the use-cases we think are going to be most common so we can be sure our design accounts for them. For example, custom event loops, multiple event loops, multiple native modules, etc.

I'd be willing to discuss this further on gitter/matrix whenever you have the time, as I know some details can fall through the cracks, and I would like to work on making this more correct.

I'd like to keep most of the macro-level design/implementation discussion either in this thread or #26. That way, if anyone else is searching for this or has some thoughts on it, they can read up on the rationale/progress and join the convo whenever. We can use gitter/matrix/discord to chat about the details, though. Whatever works best!

ShadowJonathan (Author) commented:

As for the implementation, the caching mechanism has the potential to get really complicated, and I'm not entirely sure which parts are going to be the most important to optimize. I think we should focus on correctness first, make some benchmarks for the conversions, then target the most problematic areas for performance.

That sounds good. The "correct" way here would be to run get_event_loop at every coroutine initialisation, though I don't know where else independent initialisation happens.

get_event_loop internally calls _get_running_loop, which can live in _asyncio, the native C module, meaning its performance should be good. Capturing and globalising a reference to get_event_loop (the function itself, not its result) would be alright here, I think.
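That fast path is observable from Python: asyncio exposes _get_running_loop as a low-level primitive, and caching the get_event_loop function reference (rather than the loop it returns) stays correct because the function still consults the live running-loop state at call time.

```python
import asyncio

# Cache the *function*, not the loop it returns: the function reference is
# stable for the process lifetime, while the loop it reports can change.
_get_event_loop = asyncio.get_event_loop

# _get_running_loop returns None rather than raising when no loop runs.
none_outside = asyncio._get_running_loop() is None

async def probe():
    # Inside a running loop, the cached function reports that loop.
    return _get_event_loop() is asyncio.get_running_loop()

inside = asyncio.run(probe())
print(none_outside, inside)  # True True
```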

For example, custom event loops, multiple event loops, multiple native modules, etc.

The problems and solutions I raised in this issue address the first two, but I don't think there's a good solution for the last one.

Some context: a while back I tried to do this exact thing, getting Rust async working from Python async, in a manner similar to pyo3-tokio, or with Python-driven Rust futures (#6). Not only did I not get very far, I also realised that the Python event loop and the Rust event loop are completely incompatible, as I'd have to map a boatload of stuff onto the asyncio side to get it "triggered" into the Rust side. It felt unintuitive, and this library's method of running two event loops side by side seemed to make the most sense in the end.

However, that also means that if two native extensions try to do the same thing, each running an event loop alongside Python, they'll each spawn their own event loop from their own compiled code, effectively creating duplication at runtime and at compile time.

This is unavoidable, I've decided, as you basically cannot guarantee that the code compiled by extension A will be compatible with extension B; minor version or compiler differences can crop up, and I don't think the effort of trying to match them, effectively using another dynamic library's async executor, is a good idea. Much UB can arise out of it, and I think it's easier (and safer) to just isolate each extension from the others, both at runtime and at compile time.

This can put some pressure on making the runtimes smaller; runtimes like smol can address that, although I have no idea whether its usage is as simple as dropping it into this library.

I'd like to keep most of the macro-level design/implementation discussion either in this thread or #26. That way, if anyone else is searching for this or has some thoughts on it, they can read up on the rationale/progress and join the convo whenever. We can use gitter/matrix/discord to chat about the details, though. Whatever works best!

Thanks! I'm currently sitting in the PyO3 lobby on the Matrix side as @jboi:jboi.nl if you want to DM me. I can't work on this for the next few weeks because of college pressure, but I'll consider spending some time contributing somewhere between now and then.

awestlake87 (Owner) commented:

However, that also means that if two native extensions try to do the same thing, each running an event loop alongside Python, they'll each spawn their own event loop from their own compiled code, effectively creating duplication at runtime and at compile time.

Ah ok, I wondered if this would be an issue back when I was writing up #26. Maybe this'll be possible sometime in the future when async runtimes are more mature and Rust has a relatively stable ABI. In the meantime, this caveat might deserve a place in the docs.

This can put some pressure on making the runtimes smaller; runtimes like smol can address that, although I have no idea whether its usage is as simple as dropping it into this library.

The traits in generic should make this relatively easy, although you never know until you try :). Both tokio and async-std conversions are implemented on top of these traits. If smol can address this issue, maybe it'd be worth adding first-class support for it as a feature like tokio and async-std.

Thanks! I'm currently sitting in the PyO3 lobby on the Matrix side as @jboi:jboi.nl if you want to DM me. I can't work on this for the next few weeks because of college pressure, but I'll consider spending some time contributing somewhere between now and then.

No problem, school comes first! I probably won't have much time to work on it this weekend, but I'll try to get started on some of these changes at some point next week. I'll post updates as I go, but there's really no deadline for this, so don't worry about responding or contributing unless you know you have the time.

awestlake87 mentioned this issue Jul 2, 2021

awestlake87 (Owner) commented Aug 9, 2021

I think with #30 merged this issue can be closed. We can open a follow-up issue at some point in the near future.

I know you and I both probably have some thoughts on how to improve this library further, but I kinda want to take a step back for a few weeks to wait and see if there are any unanticipated problems that users start running into with the v0.14 release before we start thinking seriously about changing things up for v0.15. I think we've covered a lot of our bases, but you never know :).

ShadowJonathan (Author) commented:

Thanks so much!
