Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit asyncio thread safety #128002

Open
9 tasks done
kumaraditya303 opened this issue Dec 16, 2024 · 9 comments
Open
9 tasks done

Audit asyncio thread safety #128002

kumaraditya303 opened this issue Dec 16, 2024 · 9 comments

Comments

@kumaraditya303
Copy link
Contributor

kumaraditya303 commented Dec 16, 2024

The following functions needs to be made thread safe and atomic for free-threading for both pure python and C implementation:

  • asyncio._enter_task
  • asyncio._leave_task
  • asyncio._register_task
  • asyncio._unregister_task
  • asyncio._swap_current_task
  • asyncio.current_task
  • asyncio.all_tasks

Note that some of these were made thread safe in C impl #120974

The following classes needs to be thread safe for free-threading in C implementation:

  • asyncio.Task
  • asyncio.Future

Both of these classes are documented to be not thread safe but currently calling methods on these classes from multiple threads can crash the interpreter. The pure python implementation cannot crash the interpreter when called from multiple threads so changes are only needed for C implementation. Before making these thread safe in C I would gather some numbers for how much a difference the C implementation makes in free-threading and if it isn't much we can just disable the C extension for free-threading.

cc @colesbury

Linked PRs

@colesbury
Copy link
Contributor

You may be already planning on this, but I think it'd be helpful to have a summary of the current thread safety issues in these classes and functions.

@kumaraditya303
Copy link
Contributor Author

kumaraditya303 commented Dec 18, 2024

I have gathered data of performance of pure python implementation and C implementation.

Without free-threading

Benchmark C implementation Pure Python implementation
async_tree_cpu_io_mixed_tg 478 ms 746 ms: 1.56x slower
async_tree_io_tg 620 ms 1.00 sec: 1.61x slower
async_tree_cpu_io_mixed 493 ms 796 ms: 1.62x slower
async_tree_io 631 ms 1.03 sec: 1.63x slower
async_tree_memoization_tg 314 ms 580 ms: 1.85x slower
async_tree_memoization 338 ms 632 ms: 1.87x slower
async_tree_none 272 ms 522 ms: 1.92x slower
async_tree_none_tg 252 ms 489 ms: 1.94x slower
Geometric mean (ref) 1.50x slower

With free-threading

Benchmark C implementation Pure Python implementation
async_tree_cpu_io_mixed_tg 580 ms 951 ms: 1.64x slower
async_tree_cpu_io_mixed 627 ms 1.03 sec: 1.65x slower
async_tree_io_tg 765 ms 1.28 sec: 1.67x slower
async_tree_io 810 ms 1.37 sec: 1.69x slower
async_tree_memoization_tg 433 ms 805 ms: 1.86x slower
async_tree_memoization 472 ms 884 ms: 1.87x slower
async_tree_none 372 ms 733 ms: 1.97x slower
async_tree_none_tg 333 ms 679 ms: 2.04x slower
Geometric mean (ref) 1.53x slower

From this data, it looks like there's on average slowdown of 1.50x when disabling the C implementation as such it is too large of performance loss to consider dropping the C implementation even in case of free-threading.
I was hoping that the difference would be less so we could disable the C implementation for free-threading and reduce the changes but it isn't worth the perf loss.

Some thread safety Issues in C implementation

Here's an outline of some of the thread safety issues, I'll be adding more to this list.

  • Critical sections needs to be added methods of asyncio.Task and asyncio.Future otherwise they can crash the interpreter in C implementation. Even though tasks are documented to be not thread safe but still it shouldn't crash the interpreter. Example: tasks could be concurrently cancelled while it's stepping through the coroutine in another thread. The cancellation counter isn't atomic so it could lead to possible loss of cancellations requests.

    _asyncio_Task_uncancel_impl(TaskObj *self)
    /*[clinic end generated code: output=58184d236a817d3c input=68f81a4b90b46be2]*/
    /*[clinic end generated code]*/
    {
    if (self->task_num_cancels_requested > 0) {
    self->task_num_cancels_requested -= 1;
    if (self->task_num_cancels_requested == 0) {
    self->task_must_cancel = 0;
    }
    }
    return PyLong_FromLong(self->task_num_cancels_requested);

  • The eager_tasks set could be concurrently mutated while it's being iterated over in another thread in asyncio.all_tasks.

    // First add eager tasks to the set so that we don't miss
    // any tasks which graduates from eager to non-eager
    PyObject *eager_iter = PyObject_GetIter(state->eager_tasks);

  • asyncio.Handle needs to be thread safe because it can be called from any thread with loop.call_soon_threadsafe.

  • The internal asyncio linked list uses borrowed references to task as such can be concurrently deallocated while list is being read in another thread as deallocation doesn't hold the state lock.

Performance bottlenecks in free-threading

Consider an async server which uses multiple threads to run one independent loop per thread.

  • asyncio currently relies on globals dicts and sets like current_tasks, eager_tasks, non_asyncio_tasks etc which would perform poorly with free-threading. Moving these to per thread would help in performance scaling.
  • Use non python data structures, this was an overall performance win when I implemented double linked implementation because we can it avoids a lot of reference counting at expense of some locking which could possibly be further optimized to compare exchanges.
  • maybe defer reference counting on the loop objects which are the source of most reference counting operations.

@itamaro
Copy link
Contributor

itamaro commented Dec 18, 2024

  • asyncio currently relies on globals dicts and sets like current_tasks, eager_tasks, non_asyncio_tasks etc which would perform poorly with free-threading. Moving these to per thread would help in performance scaling.

I vaguely remember past discussions about moving the global state to live on the loop. I don't remember what was the conclusion (or if there was a conclusion), but if we do that, that would resolve most thread-safety issues, wouldn't it?

@kumaraditya303
Copy link
Contributor Author

kumaraditya303 commented Dec 18, 2024

I vaguely remember past discussions about moving the global state to live on the loop. I don't remember what was the conclusion (or if there was a conclusion), but if we do that, that would resolve most thread-safety issues, wouldn't it?

That has backwards compatibility issues with custom event loops, currently asyncio handles it for all loops. Moving things to event loop may improve performance for free-threading but will causes performance loss in normal builds and in cases where just a single loop runs because of extra attribute lookups and indirection and we cannot use non-python data structures like the new double linked-lists implementation.

kumaraditya303 added a commit that referenced this issue Jan 4, 2025
* Makes `_asyncio.Task` and `_asyncio.Future` thread-safe by adding critical sections
* Add assertions to check for thread safety checking locking of object by critical sections in internal functions
* Make `_asyncio.all_tasks` thread safe when eager tasks are used
* Add a thread safety test
WolframAlph pushed a commit to WolframAlph/cpython that referenced this issue Jan 4, 2025
)

* Makes `_asyncio.Task` and `_asyncio.Future` thread-safe by adding critical sections
* Add assertions to check for thread safety checking locking of object by critical sections in internal functions
* Make `_asyncio.all_tasks` thread safe when eager tasks are used
* Add a thread safety test
@kumaraditya303
Copy link
Contributor Author

kumaraditya303 commented Jan 6, 2025

The internal asyncio linked list1 currently uses borrowed references to task as such it can be concurrently deallocated while list is being read in another thread as deallocation doesn't hold the state lock. Borrowed references were used because otherwise it will extend the lifetime of the task object by holding a strong reference in regular builds however this isn't safe in free-threading.
To fix this I see two options:

  • use strong references to task, this would be a larger semantic change because asyncio has always used either weakset or linked list with borrowed references for its tasks. It would make sense to move this to per loop later on but for now this doesn't seem plausible because of compatibility changes and that this is only a issue for free-threading builds.
  • The fix for free-threading is that the linked list is traversed in only asyncio.all_tasks as such if all other threads are paused while reading from it using a stop the world event, it can still use the efficient linked list implementation. I have implemented this at gh-128002: fix asyncio.all_tasks against concurrent deallocations of tasks #128541 and seems like the best solution for the time being.

Footnotes

  1. https://github.com/python/cpython/blob/1ef6bf4e29db43bbf06639923516838db2d5a5ba/Modules/_asynciomodule.c#L138

srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this issue Jan 8, 2025
)

* Makes `_asyncio.Task` and `_asyncio.Future` thread-safe by adding critical sections
* Add assertions to check for thread safety checking locking of object by critical sections in internal functions
* Make `_asyncio.all_tasks` thread safe when eager tasks are used
* Add a thread safety test
@colesbury
Copy link
Contributor

@kumaraditya303 - I think we're missing a _PyObject_SetMaybeWeakref on Task objects. Without it, the _Py_TryIncref may fail in non-owning threads even when the total reference count is not zero. We can either call it in register_task or when the task object is created.

@kumaraditya303
Copy link
Contributor Author

I think we're missing a _PyObject_SetMaybeWeakref on Task objects. Without it, the _Py_TryIncref may fail in non-owning threads even when the total reference count is not zero.

So I think that would be the case when asyncio.all_tasks is called for a loop running in a different thread right?

We can either call it in register_task or when the task object is created.

I have a side question: judging from the function name _PyObject_SetMaybeWeakref shouldn't it be called for all objects which are weakrefable automatically. Are there reasons why this isn't done?

@colesbury
Copy link
Contributor

So I think that would be the case when asyncio.all_tasks is called for a loop running in a different thread right?

Yes. In that case, I think asyncio.all_tasks may currently miss some tasks.

judging from the function name _PyObject_SetMaybeWeakref shouldn't it be called for all objects which are weakrefable automatically

It's called on objects when weakrefs are actually created. Most objects that are weakrefable do not actually have any weakrefs. _PyObject_SetMaybeWeakref forces a slightly slower deallocation path. The extra overhead is not much, but it would add up to a few percent if it were enabled on every object.

@kumaraditya303
Copy link
Contributor Author

It's called on objects when weakrefs are actually created. Most objects that are weakrefable do not actually have any weakrefs. _PyObject_SetMaybeWeakref forces a slightly slower deallocation path. The extra overhead is not much, but it would add up to a few percent if it were enabled on every object.

That makes sense, created #128885 to fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Todo
Development

No branches or pull requests

3 participants