Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit asyncio thread safety #128002

Open
9 tasks
kumaraditya303 opened this issue Dec 16, 2024 · 4 comments
Open
9 tasks

Audit asyncio thread safety #128002

kumaraditya303 opened this issue Dec 16, 2024 · 4 comments

Comments

@kumaraditya303
Copy link
Contributor

kumaraditya303 commented Dec 16, 2024

The following functions needs to be made thread safe and atomic for free-threading for both pure python and C implementation:

  • asyncio._enter_task
  • asyncio._leave_task
  • asyncio._register_task
  • asyncio._unregister_task
  • asyncio._swap_current_task
  • asyncio.current_task
  • asyncio.all_tasks

Note that some of these were made thread safe in C impl #120974

The following classes needs to be thread safe for free-threading in C implementation:

  • asyncio.Task
  • asyncio.Future

Both of these classes are documented to be not thread safe but currently calling methods on these classes from multiple threads can crash the interpreter. The pure python implementation cannot crash the interpreter when called from multiple threads so changes are only needed for C implementation. Before making these thread safe in C I would gather some numbers for how much a difference the C implementation makes in free-threading and if it isn't much we can just disable the C extension for free-threading.

cc @colesbury

Linked PRs

@colesbury
Copy link
Contributor

You may be already planning on this, but I think it'd be helpful to have a summary of the current thread safety issues in these classes and functions.

@kumaraditya303
Copy link
Contributor Author

kumaraditya303 commented Dec 18, 2024

I have gathered data of performance of pure python implementation and C implementation.

Without free-threading

Benchmark C implementation Pure Python implementation
async_tree_cpu_io_mixed_tg 478 ms 746 ms: 1.56x slower
async_tree_io_tg 620 ms 1.00 sec: 1.61x slower
async_tree_cpu_io_mixed 493 ms 796 ms: 1.62x slower
async_tree_io 631 ms 1.03 sec: 1.63x slower
async_tree_memoization_tg 314 ms 580 ms: 1.85x slower
async_tree_memoization 338 ms 632 ms: 1.87x slower
async_tree_none 272 ms 522 ms: 1.92x slower
async_tree_none_tg 252 ms 489 ms: 1.94x slower
Geometric mean (ref) 1.50x slower

With free-threading

Benchmark C implementation Pure Python implementation
async_tree_cpu_io_mixed_tg 580 ms 951 ms: 1.64x slower
async_tree_cpu_io_mixed 627 ms 1.03 sec: 1.65x slower
async_tree_io_tg 765 ms 1.28 sec: 1.67x slower
async_tree_io 810 ms 1.37 sec: 1.69x slower
async_tree_memoization_tg 433 ms 805 ms: 1.86x slower
async_tree_memoization 472 ms 884 ms: 1.87x slower
async_tree_none 372 ms 733 ms: 1.97x slower
async_tree_none_tg 333 ms 679 ms: 2.04x slower
Geometric mean (ref) 1.53x slower

From this data, it looks like there's on average slowdown of 1.50x when disabling the C implementation as such it is too large of performance loss to consider dropping the C implementation even in case of free-threading.
I was hoping that the difference would be less so we could disable the C implementation for free-threading and reduce the changes but it isn't worth the perf loss.

Some thread safety Issues in C implementation

Here's an outline of some of the thread safety issues, I'll be adding more to this list.

  • Critical sections needs to be added methods of asyncio.Task and asyncio.Future otherwise they can crash the interpreter in C implementation. Even though tasks are documented to be not thread safe but still it shouldn't crash the interpreter. Example: tasks could be concurrently cancelled while it's stepping through the coroutine in another thread. The cancellation counter isn't atomic so it could lead to possible loss of cancellations requests.

    _asyncio_Task_uncancel_impl(TaskObj *self)
    /*[clinic end generated code: output=58184d236a817d3c input=68f81a4b90b46be2]*/
    /*[clinic end generated code]*/
    {
    if (self->task_num_cancels_requested > 0) {
    self->task_num_cancels_requested -= 1;
    if (self->task_num_cancels_requested == 0) {
    self->task_must_cancel = 0;
    }
    }
    return PyLong_FromLong(self->task_num_cancels_requested);

  • The eager_tasks set could be concurrently mutated while it's being iterated over in another thread in asyncio.all_tasks.

    // First add eager tasks to the set so that we don't miss
    // any tasks which graduates from eager to non-eager
    PyObject *eager_iter = PyObject_GetIter(state->eager_tasks);

  • asyncio.Handle needs to be thread safe because it can be called from any thread with loop.call_soon_threadsafe.

Performance bottlenecks in free-threading

Consider an async server which uses multiple threads to run one independent loop per thread.

  • asyncio currently relies on globals dicts and sets like current_tasks, eager_tasks, non_asyncio_tasks etc which would perform poorly with free-threading. Moving these to per thread would help in performance scaling.
  • Use non python data structures, this was an overall performance win when I implemented double linked implementation because we can it avoids a lot of reference counting at expense of some locking which could possibly be further optimized to compare exchanges.
  • maybe defer reference counting on the loop objects which are the source of most reference counting operations.

@itamaro
Copy link
Contributor

itamaro commented Dec 18, 2024

  • asyncio currently relies on globals dicts and sets like current_tasks, eager_tasks, non_asyncio_tasks etc which would perform poorly with free-threading. Moving these to per thread would help in performance scaling.

I vaguely remember past discussions about moving the global state to live on the loop. I don't remember what was the conclusion (or if there was a conclusion), but if we do that, that would resolve most thread-safety issues, wouldn't it?

@kumaraditya303
Copy link
Contributor Author

kumaraditya303 commented Dec 18, 2024

I vaguely remember past discussions about moving the global state to live on the loop. I don't remember what was the conclusion (or if there was a conclusion), but if we do that, that would resolve most thread-safety issues, wouldn't it?

That has backwards compatibility issues with custom event loops, currently asyncio handles it for all loops. Moving things to event loop may improve performance for free-threading but will causes performance loss in normal builds and in cases where just a single loop runs because of extra attribute lookups and indirection and we cannot use non-python data structures like the new double linked-lists implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Todo
Development

No branches or pull requests

3 participants