Async Call-Stack Reconstruction #91048
Profiling tools that use the call-stack (i.e. all of them) paint an incomplete picture of what’s really going on in async-heavy codebases. They can only show the stack of the currently executing task; they miss the chain of awaitables that are transitively waiting on the current task. To remedy this, we have added support in Cinder to expose the async call-stack. This consists of the call stack for the currently executing task, followed by the chain of awaitables that are transitively reachable from the currently executing task. See below for a clarifying example.
When retrieved from f4, the two different stacks (top-of-stack last) are: We’d like to merge our implementation into CPython so that other heavy users of asyncio can benefit. This will consist of a few parts:
|
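To make the gap concrete, here is a minimal sketch in plain asyncio, not Cinder. The names f1 to f4 follow the chain referred to later in this thread, but the function bodies are assumptions because the original example isn't reproduced here. In this version, f2 runs f3 in a separate task. The sketch walks the live frame chain from inside f4, which is roughly what a conventional sampling profiler sees. It uses `sys._getframe()`, which is CPython-specific.

```python
import asyncio
import sys

FUNCS = {"f1", "f2", "f3", "f4"}

def sync_stack():
    # Walk the live frame chain (f_back links) of whatever is executing now,
    # keeping only our example functions; top-of-stack last.
    names = []
    frame = sys._getframe(1)  # start at the caller, skipping this helper
    while frame is not None:
        if frame.f_code.co_name in FUNCS:
            names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names[::-1]

async def f1():
    return await f2()

async def f2():
    # f3 runs in its own task; f2 stays suspended until that task finishes.
    return await asyncio.ensure_future(f3())

async def f3():
    return await f4()

async def f4():
    return sync_stack()

print(asyncio.run(f1()))  # f1 and f2 are suspended, so they are not on the frame stack
```

Only f3 and f4 show up. f1 and f2 are logically part of the same operation, but they are suspended in another task and so are invisible to a frame walk.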
Could you provide a link first, please? |
Sorry for the confusion, I'm working on a PR. I filed the BPO to gauge interest in the feature. |
I've recently dabbled a bit in some new primitives for asyncio, and based on that experience I think this would be very useful. IIRC Trio does this (presumably at considerable cost) in userland. |
The idea looks interesting. |
Somewhat related discussion (where this feature might have been helpful): https://discuss.python.org/t/can-i-get-a-second-opinion-on-an-asyncio-issue/18471
This is the Cinder 3.10 implementation of the |
If someone wants to move this forward, let them propose a design here and (if they're confident enough) submit a PR. |
@mpage Are you still interested in working on this? I am still interested in having this as a feature in CPython! |
@gvanrossum Yes, still interested! Just haven't found the time yet to start working on it. |
You might be interested in the existence of Task.get_stack(). Apparently it was part of the original asyncio code (I'd forgotten about it). I'm not sure if it addresses this problem in general (it's part of asyncio) or if it is fast enough or if it even works. |
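A quick way to check what `Task.get_stack()` returns for a task suspended a couple of awaits deep. This is a sketch with hypothetical f3/f4 coroutines. The asyncio documentation notes that only one stack frame is returned for a suspended coroutine. As a result, the f4 frame, and anything f4 awaits, is not included.

```python
import asyncio

async def f3():
    await f4()

async def f4():
    await asyncio.sleep(0.1)

async def main():
    task = asyncio.ensure_future(f3())
    await asyncio.sleep(0)  # let the task run until it suspends inside f4
    # get_stack() on a suspended task: only the outermost coroutine's frame.
    names = [frame.f_code.co_name for frame in task.get_stack()]
    await task
    return names

print(asyncio.run(main()))
```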
For example, the code from the initial comment is codified in https://gist.github.com/mpage/584a02fc986d32b11b290c7032700369. Unfortunately you need Cinder in order to run it. When using
|
Oh, you're right. Looking through the Cinder code it seems that this requires a fair amount of C code (which is maybe why you haven't submitted your PR yet?). Is that fundamental or an optimization? How would pure Python code in 3.11 go about finding the awaiter of a suspended coroutine? Is there just no way? What if we know it's an asyncio task? |
I just created a little proof-of-concept that gets the await chain for a task, unless that task is the current task. To get around that, you can create a dummy task and call it from there. Here's the basic code (sans dummy task hack):

```python
def get_stack(task):
    # Start at the task's top-level coroutine and follow cr_await inward.
    coro = task.get_coro()
    frames = []
    # Stop when the chain reaches something that isn't a coroutine
    # (e.g. the iterator of a Future the innermost coroutine is waiting on).
    while coro is not None and hasattr(coro, "cr_frame"):
        frame = coro.cr_frame
        frames.append(frame)
        coro = coro.cr_await
    return frames
```

It probably needs improvements (I see in …). Maybe that's enough if you need this in a debugging context? Or is this something where performance is important? Or am I missing something? (Maybe you need this when an uncaught exception is raised?) |
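The dummy-task hack can be sketched as follows. The assumption is that the coroutine of interest suspends itself on a throwaway helper task, and that helper then walks the now-suspended task with a copy of the `get_stack()` function from this comment. Here `get_stack()` skips frames of finished coroutines. The helper, f3 and f4 are hypothetical.

```python
import asyncio

def get_stack(task):
    # Follow cr_await inward from the task's coroutine; skip finished frames.
    coro = task.get_coro()
    frames = []
    while coro is not None and hasattr(coro, "cr_frame"):
        if coro.cr_frame is not None:
            frames.append(coro.cr_frame)
        coro = coro.cr_await
    return frames

async def helper(task):
    # Runs in its own task, so `task` is suspended rather than current.
    return [frame.f_code.co_name for frame in get_stack(task)]

async def f3():
    return await f4()

async def f4():
    me = asyncio.current_task()
    # Suspend this task on a throwaway helper task that inspects it.
    return await asyncio.ensure_future(helper(me))

async def main():
    # f3 is started as its own task.
    return await asyncio.ensure_future(f3())

print(asyncio.run(main()))
```

The walk stops at f4 because f4's `cr_await` is the awaited task's Future iterator, which is not a coroutine. Likewise, nothing in this walk reaches whoever awaits the f3 task.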
Talking to myself here: my code above doesn't appear to be able to cross task boundaries -- I can get it to produce [f3, f4] for your example program (by calling it from a dummy helper task), but it's missing [f1, f2]. It looks like the iterator used by the Future object (either the Python or the C version) is impenetrable. Thoughts? (Gist here.) |
I have a better version now. It relies on
This leaves a lot to be desired if you're in a coroutine world other than asyncio, and it's pretty inefficient, since it must traverse the My question for you at this point is, do you really need to do this purely at the coroutine level, or is it acceptable for your use case(s) if this is essentially a |
I just realized that f4 and f5 were missing from the output. A little tweak to the logic handles that -- the coro stack should be collected from the helper task as well. I've updated the gist, and the output is now:
|
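This is one way to cross task boundaries from pure Python, sketched under stated assumptions. It relies on the private `Task._fut_waiter` attribute, which holds the future a suspended task is blocked on. That attribute is a CPython implementation detail, not public API. The sketch is not necessarily what the gist above does. To find the awaiter of a task, it scans `asyncio.all_tasks()`, so every step outward costs O(number of tasks).

```python
import asyncio

def coro_names(task):
    # Follow cr_await inward from the task's coroutine (stops at a Future).
    names = []
    coro = task.get_coro()
    while coro is not None and hasattr(coro, "cr_frame"):
        if coro.cr_frame is not None:
            names.append(coro.cr_frame.f_code.co_name)
        coro = coro.cr_await
    return names

def find_awaiter(task):
    # Linear scan over all live tasks; _fut_waiter is private CPython state.
    for other in asyncio.all_tasks():
        if getattr(other, "_fut_waiter", None) is task:
            return other
    return None

def async_stack(task):
    # Prepend each awaiting task's coroutine chain until no awaiter is found.
    names = []
    while task is not None:
        names = coro_names(task) + names
        task = find_awaiter(task)
    return names

async def helper(task):
    return async_stack(task)

async def f1():
    return await f2()

async def f2():
    return await asyncio.ensure_future(f3())

async def f3():
    return await f4()

async def f4():
    # Dummy-task trick: suspend this task so its chain can be walked.
    return await asyncio.ensure_future(helper(asyncio.current_task()))

print(asyncio.run(f1()))
```

On CPython this should print the whole chain from f1 to f4, top-of-stack last. The per-step scan of every task is what makes this approach expensive when many tasks are alive.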
Our primary use case for this feature is an always-on sampling profiler (roughly equivalent to GWP at Google). Periodically, on each machine in the fleet, the profiler wakes up, grabs the await stack for the currently running task, and sends it to a service that performs aggregation. The primary use of this data is to provide CPU-time profiling for applications. Performance is important for us. Since this is always on and pauses request processing, we don't want the implementation to negatively impact the performance of in-flight requests. We typically have a large number of active tasks (I'm somewhat embarrassed to admit that I don't have a number handy here), so I'm not sure the approach you've taken here would work for us. The logic for collecting the await stack is currently implemented as an eBPF probe, and the awaiter implementation in Cinder simplifies that. The probe "only" has to walk the await stack backwards from the coroutine for the current task. The approach you've taken is neat, but unfortunately I don't think it can cross
|
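For contrast with the scan-all-tasks approach, here is a toy model of the walk described above. It assumes every awaitable carries a back-pointer to whatever is awaiting it. Under that assumption, reconstructing the stack is a plain linked-list walk whose cost depends only on stack depth, not on the number of live tasks. The `Awaitable` class and its `awaiter` field are hypothetical stand-ins, not Cinder's actual C structures.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

@dataclass
class Awaitable:
    # Hypothetical model: each awaitable records who is awaiting it.
    name: str
    awaiter: Optional[Awaitable] = None

def await_stack(current: Awaitable) -> list[str]:
    # Walk outward from the innermost awaitable: O(depth), independent of
    # how many other tasks exist.
    names = []
    node: Optional[Awaitable] = current
    while node is not None:
        names.append(node.name)
        node = node.awaiter
    names.reverse()  # top-of-stack last
    return names

f1 = Awaitable("f1")
f2 = Awaitable("f2", awaiter=f1)
f3 = Awaitable("f3", awaiter=f2)  # e.g. f3 runs in its own task, awaited by f2
f4 = Awaitable("f4", awaiter=f3)
print(await_stack(f4))  # ['f1', 'f2', 'f3', 'f4']
```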
@njsmith I think that this feature would solve the problem you describe as "the big one"? So it seems like it does solve a problem that Trio has? (EDIT: on second thought I'm not sure it does; need more study. Thanks for the summary of Trio's issues in this area.) |
I don't think it does, but I could be wrong -- here's a concrete example to make sure we're on the same page :-)

```python
import trio

async def parent1():
    await parent2()

async def parent2():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(child1)
        await parent3()

async def parent3():
    await trio.sleep(10)

async def child1():
    await child2()

async def child2():
    await trio.sleep(10)
```

From this, we want to reconstruct the child's call stack as: (And to clarify a few edge cases: (1) it doesn't matter where |
Per #103976 (comment), can this wait for 3.13? |
My understanding is that this feature would support tracing async stacks "inside-out": starting with a particular task or coroutine, trace outward to find who's waiting on it. There is no way to do that currently, except maybe via The semantics are decidedly funky, though. If you do
then the stack of Trio doesn't allow a task to directly wait on another task (except indirectly via synchronization primitives like Not everything in an async call stack is a coroutine object. Some of the challenging ones I ran into:
There are also regular generators (for The logic in stackscope for locating the context managers on a certain frame's stack (which is a prerequisite for drawing the nursery/task tree accurately) is by far the most convoluted/brittle part. There are two implementations, one that uses I would recommend to anyone working on async call stack introspection to review stackscope's implementation; it's a pretty good synthesis of all the dark corners I've found in ~3 years of poking at this. There are a lot of dark corners. It would be great if we could make CPython changes that result in fewer dark corners. I don't know how much |
Unfortunately @mpage has not had time to look at this recently, so I'm having a try at addressing the comments on his original PR #103976, and this issue. To start with I've made a completely new implementation here: jbower-fb@3918379. This isn't PR ready yet - it's only got a Python implementation, is lacking proper tests, and almost certainly needs some public iteration. So, I wanted to get it up early for comment. My version follows @markshannon and @njsmith's suggestions on #103976, and rather than linking coroutines instead links In addition to new await dependency data, I’ve added a function Finally there is a straight-forward Most of the implementation is now in the Note the problems @njsmith mentioned with Looking forward to feedback. |
Just popped in as a Google search for "python call stack create task" gave this page. |
…ms and enable profiling (#124640) Signed-off-by: Pablo Galindo <pablogsal@gmail.com> Co-authored-by: Pablo Galindo <pablogsal@gmail.com> Co-authored-by: Kumar Aditya <kumaraditya@python.org> Co-authored-by: Łukasz Langa <lukasz@langa.pl> Co-authored-by: Savannah Ostrowski <savannahostrowski@gmail.com> Co-authored-by: Jacob Coffee <jacob@z7x.org> Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
…r tasks This was missing from pythongh-124640. It's already covered by the new test_asyncio/test_free_threading.py in combination with the runtime assertion in set_ts_asyncio_running_task.
Since GH-124640 was merged, the Intel Mac buildbot has been timing out in |
Linked PRs
- asyncio-graph.rst doc #129224
- _asyncio.future_discard_from_awaited_by #129731