RFC: structured concurrency via task::scope
#2592
Comments
For "nested scopes", a couple of things stood out to me:
Just to clarify that I'm understanding this correctly: does this then mean that any call to …
To address:
I have added the following paragraph:
Does that answer your question? I will address the other points shortly.
The sub task may be concurrently executing. In that case, cancellation must wait for execution to complete, then drop.
I should probably call this out. Given that there is no AsyncDrop, you are correct. If a … However, the steps I outlined ensure that the parent scope's … The goal is that, at any level where … @jonhoo, did this answer your question?
This document is missing sections on what benefits it adds over the existing implementations at #2153, #2576, #2579. What are the shortcomings of the other proposals? What are the main benefits this adds over those?

Here are my comments for this proposal. First of all, this seems like a variation of the initial implementation at #2153, which defaulted to forceful cancellation. The main difference here seems to be that there is no explicit …

However, already in #2153 it turned out that forceful cancellation does not compose. You can't do:

```rust
task::scope(async {
    task::scope(async {
        tokio::spawn(...)
    }).await;
}).await;
```

without issues, since cancelling the outer scope would always lead to a force drop of the inner scope, for which no great solution is available. In this proposal the solution is to attach the tasks to the parent scope instead. My short comment on this is: **This breaks structured concurrency.** If tasks can migrate up the stack, there simply is no structured concurrency. Application programmers can no longer rely on the fact that a task does not outlive its containing scope. This is a super powerful guarantee - you can be 100% sure that nothing unexpected is still running in the background. E.g. some code might assume that a certain file that is worked on in a background task will always be closed after the scope exits. Then code after the scope can always successfully reopen it - or clean it - whatever it wants. Without structured concurrency we don't have this guarantee - the code might still be running.

Besides this:

**Forceful cancellation is the wrong default**

As pointed out, there are no great ways to deal with forceful cancellation. Any component that can not be forcefully cancelled (e.g. because it needs to perform some cleanup work, finish a transaction, etc.) can never run in a forcefully cancelled environment. Clean cancellation in our favorite new fancy io_uring topic is one of those examples; gracefully cancelling database transactions, asynchronous calls, threadpool work, etc. are other examples. Now if the top level system only supports forceful cancellation, there is no sane way to integrate those concepts. They can only rely on hacks (like a global spawn on drop) - which are exactly the kind of hacks that were intended to be avoided with structured concurrency. Real world cancellation is asynchronous. Even though Rust …

What do I do in a pure forceful cancellation world if my requirement is to always run a pending transaction to the end, and just stop starting new transactions?
In a world with graceful cancellation I can do:

```rust
scope::enter(async move {
    while let Some(transaction) = transaction_provider
        .accept_for_next_transaction_with_token(scope::current_cancellation_token())
        .await
    {
        scope::spawn(async move {
            // This would always run to completion inside the outer scope
            transaction.execute().await;
        });
    }
}).await;
```

In a world with forceful cancellation as the primary mechanism this wouldn't work:

```rust
scope::enter(async move {
    while let Some(transaction) = transaction_provider
        .accept_for_next_transaction_with_token(scope::current_cancellation_token())
        .await
    {
        scope::spawn(async move {
            // This would always run to completion inside the outer scope
            transaction.execute().timeout(time::Duration::from_secs(60)).await;
        });
    }
}).await;
```

You can use the same mechanism to e.g. stop a TCP listener from accepting incoming connections, while still giving the active requests some time to finish (draining phase), and then only return from the listener scope when you are 100% sure that everything shut down. I don't see this covered by this proposal.

**This makes the task system even more complicated instead of less**

Tokio's task implementation, together with shutdown and co, is already very high on the complexity scale. This implementation adds yet more complexity, in terms of cancellation states, more JoinHandle functionality, etc. The fact that everything needs to deal with forceful cancellation certainly contributes a lot to the complexity. Compared to this, #2579 would have allowed simplifying the task system instead, up to the point that only run-to-completion tasks without any cancellation abilities were required, since graceful and forceful cancellation ran on top of the system. This also allowed dropping the annoying error type on the …
I would say the majority of this proposal is the same as yours. This attempts to outline the API / behavior standalone without having to worry about implementation detail. The main difference is avoiding the need for builders/configuration when it comes to core functionality. For example, instead of controlling how cancellation signals get propagated to descendants via builder options, it is done via Rust ownership.
Yes, it is done via context.
I would disagree, the proposal could be implemented in a separate crate. Any integration with the core task system would be for performance reasons.
I do not believe this is correct. All code outside of …
All tasks need to be able to handle forceful cancellation in a safe way. The Tokio runtime may forcefully cancel all tasks at any point. Even with graceful cancellation, parent scopes will not allow sub tasks to gracefully cancel for an indefinite amount of time. Parent scopes will signal cancellation, wait for a fixed amount of time, then forcefully cancel. Sub tasks must be able to tolerate forceful cancellation. Because forceful cancellation is unavoidable, this proposal takes a "forceful first" approach: it is based on forceful cancellation and provides a strategy to layer best-effort graceful cancellation on top.
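The "signal, wait a bounded time, then force" sequence described above could look roughly like the following sketch. Note the hedges: `signal()` is only the method proposed by this RFC (it does not exist in Tokio today), and `run_subsystem` is a hypothetical task.

```rust
use std::time::Duration;

// Sketch only, under the proposed API: graceful cancellation layered on
// top of forceful cancellation.
let mut handle = tokio::spawn(run_subsystem()); // hypothetical task

// 1. Ask the task to wind down gracefully (proposed JoinHandle::signal).
handle.signal();

// 2. Give it a bounded window to finish on its own terms...
match tokio::time::timeout(Duration::from_secs(5), &mut handle).await {
    Ok(result) => { /* graceful shutdown completed */ }
    // 3. ...then fall back to forceful cancellation: under this proposal,
    //    dropping the JoinHandle forcefully cancels the task.
    Err(_elapsed) => drop(handle),
}
```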
This is difficult to respond to as it is fairly subjective. I'm not entirely sure what specifics you think are more complicated. You mention "forceful cancellation" as a root issue. As mentioned in the section above, no matter the proposal, handling forceful cancellation is a requirement, as it is a core part of Rust's future model. Are there any other aspects you think are more complex?
@carllerche I still believe …
@jonhoo I'm not set on any name. If you have suggestions, please share :)
Thanks for writing this up! I have two somewhat related comments. In the TCP listener accept loop pattern, am I correct in thinking that the expected usage of the new methods on the join handle would be:

```rust
while let Some(stream) = listener.accept().await? {
    tokio::spawn(async move {
        process(stream).await
    }).try_background().await?;
}
```

If so, can you add the above example to the initial RFC comment? My second comment echoes @jonhoo's dislike of the …
No, it is not. To start with, this is not structured concurrency. If you leak tasks in error cases then you simply don't have structured concurrency. This is the same as if the borrow checker were sometimes imprecise - we wouldn't accept that either. The goal of #1879 was to implement support for it. Now I have no idea why we throw that goal away in favor of a "tokio-shutdown shutdown system" which has other properties. Besides, this throws a few hundred hours of work away, since it's built on different primitives. The existing proposals build on a hierarchical cancellation token and a wait group, which are industry standard tools to achieve this behavior. As far as I understand, this now requires neither of those, and instead a wait group which bubbles tasks up on cancellation.

The latest implementation in #2579 (which still received no feedback) does not require any mandatory builder options. Its API surface is not more than what is described here. It was merely mentioned that its functionality could be expanded in the future by providing additional options to the scope.

In this proposal, things outside of the scope can rely on cancellation having been initiated. They have no guarantees about it having finished; the background task could still be running. The goal of structured concurrency is to have those guarantees.

No. This might be the current state of things. However, that doesn't mean that things have to stay that way. With a follow-up addition to #2579, any forceful runtime shutdown could have been removed, and users would have been relieved from worrying about having to support two different cancellation mechanisms, as they do today. There even exists an RFC for being able to indicate the run-to-completion vs. force-cancellable behavior using Rust's type system. I still think this proposal is even worse than the current state of things. Currently, authors of library code can rely on their code running to completion as long as: …

Both are fairly typical in long-running applications. If you implement this behavior, there is however no safe place left for code that does not want to deal with forced cancellation. The only thing people could do is spawn their own runtime that they have more control over.

See above. This is a choice, and I don't think it is the right choice. People should be able to think about tasks as they think about threads. All the differences between them just make tasks harder to work with. I do think lots of code out there would be buggy when encountering forced cancellation, in the same way that most code doesn't deal very well with … Being able to specify that those paths should never be taken for a controlled, large subset of an application, and to reduce the scope of force-cancellability to tasks which have been built to work with it, is a big plus.
The accept loop becomes:

```rust
while let Some(stream) = listener.accept().await? {
    tokio::spawn(async move {
        process(stream).await
    }).try_background();
}
```

There is no …
I'm not attached to the name.
They could, but we would still need to name them.
I think it would be more productive to discuss specific examples and how they are handled in both cases. I am not following your logic of how one option is less precise than the other. Perhaps I misread your PR somehow; it is a large chunk of code, and backing out behavior details can be tricky. Are you saying that there should be no forceful cancellation of tasks at all? Also, what happens if a scope is dropped from a …
I've been thinking more about what I think are the important bits. I think the "nesting by ownership" is less important.

**Nesting**

I think nested tasks need to have a very clear and deterministic shutdown order. To forcefully cancel a task, the following recursive steps are taken: …

To gracefully cancel a task, I believe it is important to only signal the task being gracefully cancelled itself. It should be that task's responsibility to gracefully cancel its sub tasks. Consider a task that has a hyper client, and that hyper client has a sub task to manage the socket. If the parent task gracefully cancels by first sending an HTTP request, it is important that the sub task continues to operate normally until the parent task is ready for it to terminate. The implicit nesting via ownership enables treating scopes as values that can be sent around (see the hyper example in the RFC's root comment), but I don't think that is critical.

**Forceful vs. graceful cancellation**

Given that a scope may be dropped at any time, I don't see a way to enable an API where the user does not have to consider the forceful cancellation scenario. I may be missing a detail that makes this assumption wrong; if so, that would need to be explained.
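The hyper-style example above could be sketched as follows. `manage_socket` and `finish_inflight_request` are hypothetical helpers, and `signal`/`task::signalled` are the API proposed in this RFC, so this is only an illustration of the intended behavior:

```rust
// Sketch: graceful cancellation stops at the task itself; the task decides
// when its sub task (here, the socket-management task) should stop.
task::scope(async {
    let io_task = tokio::spawn(manage_socket()); // e.g. hyper's connection task

    task::signalled().await;          // this task was asked to wind down
    finish_inflight_request().await;  // io_task must keep running for this
    io_task.signal();                 // only now propagate the signal
    io_task.await
}).await
```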
@tokio-rs/maintainers I would love for others to weigh in on the various points here that still lack consensus. |
@Matthias247 Ah, I remember the reason why I ended up going the path of linking scopes on drop.
This is described in the API docs of #2579: scopes can only be created inside other scopes which support graceful cancellation. If you don't adhere to this principle, they will panic when dropped. Thinking about it now, that should maybe be changed to panicking when they are created, to indicate the API violation earlier. While this isn't great, there are no really better options for this. And there have now been six months of research on this. Force cancelling and not waiting (as proposed here) breaks structured concurrency. Forceful cancellation plus a blocking wait on join might deadlock - and won't work on a single-threaded event loop to start with. #2597 supports forceful cancellation in all blocks which are explicitly marked as supporting it. To recap #2597, the API surface isn't really big: …
This doesn't really work if the task is blocked on the subtask. E.g. consider this example:

```rust
scope::enter(async move {
    scope::spawn(async move {
        wait_for_message_from_peer().await;
    }).await;
}).await;
```

Now if you would just gracefully cancel the parent without the cancellation signal propagating, this would potentially never terminate. Therefore all cancellation systems (and it doesn't matter whether we look at Kotlin, Go, C#, or anything else) went on to perform cancellation in depth. Of course you could make the example above work by instead writing:

```rust
scope::enter(async move {
    let handle = scope::spawn(async move {
        wait_for_message_from_peer().await;
    });
    select! {
        r = &handle => {
            return r;
        }
        _ = scope::current_cancellation_token() => {
            handle.cancel();
            return handle.await;
        }
    }
}).await;
```

But that really leads to a lot of boilerplate code for the common case.
Ok, just to make sure that I understand: scopes, as you propose them, cannot be used from within …
@Matthias247 If it is such a critical requirement, shouldn't we just say that a scope is the task and you can't enter sub scopes within the task? That would resolve most of the issues we are talking about, no? |
Closed in favor of #2596 |
This proposal is mostly the result of @Matthias247's work (#1879, #2579, #2576).
I took this work, summarized the API and added a few tweaks as proposals.
Summary
A specific proposal for structured concurrency. The background info and
motivation have already been discussed in #1879, so this will be skipped here.
There are also two PRs with specific proposals: #2579, #2576.
This RFC builds upon this work.
Also related: #2576
The RFC proposes breaking changes for 0.3. "Initial steps" proposes a way to add
the functionality without breaking changes in 0.2.
Requirements
At a high level, structured concurrency requires that: … no error (`Result` or panic) goes unhandled.

Again, see #2579 for more details.
Proposed Changes
- `task::scope`.
- `task::signalled().await` waits until the task is signalled, indicating it should gracefully terminate.
- `JoinHandle` forcefully cancels the task on drop.
- `JoinHandle` gains the following methods:
  - `background(self)`: run the task in the background until the scope drops.
  - `try_background(self)`: run the task in the background until the scope drops. If the task completes with `Err`, forcefully cancel the owning scope.
  - `signal(&self)`: signals the task to gracefully terminate.

Terminology
- **forceful cancellation**: The runtime drops the spawned task ASAP without providing an ability to gracefully complete. All cleanup is done synchronously in drop implementations.
- **graceful cancellation**: Signal to a task that it should stop processing and clean up any resources itself.
Details
Scopes
All spawned tasks must be spawned within a scope. The task is bound to the
scope.
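The snippet this paragraph originally introduced was lost; from the usage elsewhere in the thread, it was presumably along these lines (hypothetical reconstruction):

```rust
// Hypothetical reconstruction based on usage shown elsewhere in this RFC:
// the block runs inside a new scope, and spawns are bound to that scope.
task::scope(async {
    tokio::spawn(async {
        // bound to the enclosing scope
    });
    // ...
}).await;
```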
This creates a new scope and executes the provided block within the context of this scope. All calls to `tokio::spawn` link the task to the current scope. When the block provided to `scope` completes, any spawned tasks that have not yet completed are forcefully cancelled. Once all tasks are cancelled, the call to `task::scope(...).await` completes.

Scopes do not attempt to handle graceful cancellation. Graceful cancellation is provided by a separate set of utilities described below.
All tasks come with an implicit scope. In other words, spawning a task is
equivalent to:
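The snippet showing this equivalence was lost; presumably it was along the lines of the following hypothetical reconstruction, where every spawned task body is wrapped in its own scope:

```rust
// Hypothetical reconstruction: an "implicit scope" means spawning is
// roughly equivalent to giving the task body its own scope, so any tasks
// it spawns in turn are linked to it.
tokio::spawn(task::scope(async {
    // task body
}));
```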
There could also be a global scope that is used as a catch-all for tasks that need to run in the background without being tied to a specific scope. This global scope would be the "runtime global" scope.
Error propagation
As @Matthias247 pointed out in #1879, when using `JoinHandle` values, error propagation is naturally handled using Rust's `?` operator:

In this case, if `t1` completes with `Err`, `t1.await?` will return from the `async { ... }` block passed to `task::scope`. Once this happens, all outstanding tasks are forcibly cancelled, resulting in no tasks leaking.
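The example this section refers to was lost in formatting; based on the surrounding references to `t1`, it presumably resembled the following sketch (`fallible_work` is a hypothetical helper):

```rust
// Hypothetical reconstruction consistent with the text: `?` on the
// JoinHandle propagates the sub task's error out of the scope block,
// which then forcefully cancels all remaining tasks in the scope.
task::scope(async {
    let t1 = tokio::spawn(async {
        // ... work that may complete with Err
        fallible_work().await // hypothetical helper
    });

    t1.await?; // an Err here exits the scope block
    Ok(())
}).await
```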
Dropping `JoinHandle` forcefully cancels the task

Sub task errors must be handled. This requires not dropping them without processing them. To do this, the return value of the task must be processed somehow. This is done by using the `JoinHandle` returned from `tokio::spawn`. In order to ensure the return value is handled, dropping a `JoinHandle` results in the associated task being forcefully canceled. `JoinHandle` is also annotated with `#[must_use]`.

However, there are cases in which the caller does not need the return value. For example, take the `TcpListener` accept loop pattern. In this case, the caller does not need to track each `JoinHandle`. To handle these cases, `JoinHandle` gains two new functions (naming TBD): `background(self)` and `try_background(self)`. In the first case, the type signature indicates the task can only fail due to a panic. In the second, the task may fail due to `Err` being returned by the task.

When a task that is moved to the scope "background" fails due to a panic or `Err` return, the scope is forcibly canceled, resulting in any outstanding task associated with the scope being forcibly canceled.
Now, the `TcpListener` accept loop becomes: …

Nesting scopes

Usually, structured concurrency describes a "tree" of tasks. Scopes have N associated tasks. Each one of those tasks may have M sub tasks, etc. When an error happens at any level, all descendant tasks are canceled and all ancestors are canceled until the error is handled.
Instead of explicitly linking scopes, Rust's ownership system is used. A scope is nested by virtue of the return value of `task::scope` (`Scope`) being held in a task owned by a parent scope. If a `Scope` is dropped, it forcefully cancels all associated sub tasks. This, in turn, results in the sub tasks being dropped and any `Scope` values held by the sub tasks being dropped.

However, `Scope::drop` must be synchronous, yet cancelling a task and waiting for all sub tasks to drop is asynchronous. This is handled as follows: every `Scope` has a "wait set", which is the set of tasks that the scope needs to wait on when it is `.await`ed. When a `Scope` is dropped: …

By doing this, `containing_scope.await` does not complete until all descendant tasks are completely terminated.
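Reading this together with the comment thread (tasks "bubble up" to the parent scope on cancellation), the elided drop steps plausibly amount to something like the following sketch; the types, fields, and methods here are hypothetical, not real Tokio code:

```rust
// Hypothetical sketch of the drop behavior described above.
impl Drop for Scope {
    fn drop(&mut self) {
        for task in self.tasks.drain(..) {
            // 1. Forcefully cancel each task still associated with this scope.
            task.force_cancel();
            // 2. Move the task into the containing scope's wait set, so the
            //    parent's `.await` does not complete until it fully drops.
            self.parent.wait_set.push(task);
        }
    }
}
```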
If a scope block terminates early due to an unhandled sub task error, `task::scope(async { ... }).await` completes with an error. In this case, the task either handles the error or completes itself with an error. In the latter case, the task's containing scope will receive the error, and the process repeats up the task hierarchy until the error is handled.
Graceful task cancellation
Graceful task cancellation requires "signalling" a task, then allowing the task to terminate on its own time. To do this, `JoinHandle` gains a new function, `signal`. This sends a signal to the task. While this signal could be used for any purpose, it is implied to indicate graceful cancellation.

From the spawned task, the signal is received using `task::signalled().await`.

Putting it all together:
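The combined example originally shown here was lost; combining the pieces described above (`task::scope`, `task::signalled`, `JoinHandle::signal`), a sketch might look like this, with `next_item` and `process` as hypothetical helpers:

```rust
task::scope(async {
    let handle = tokio::spawn(async {
        loop {
            tokio::select! {
                // the proposed graceful-cancellation signal
                _ = task::signalled() => break,
                item = next_item() => process(item).await, // hypothetical helpers
            }
        }
        // clean up resources here, on the task's own schedule
    });

    // ... later, request graceful termination and wait for it:
    handle.signal();
    handle.await
}).await
```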
Initial steps
The majority of this work can be done without any breaking changes. As an initial step, all behavior in Tokio 0.2 would remain as is. Tasks would not get an implicit scope. Instead, `task::scope(...)` must always be called.

Secondly, there would be `tokio::spawn_scoped(...)`, which would spawn tasks within the current scope. In 0.3, `spawn_scoped` would become `tokio::spawn`.
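Under these initial steps, usage might look like the following sketch; `spawn_scoped` is the API proposed above, and `do_background_work` is a hypothetical helper:

```rust
// Tokio 0.2-compatible sketch: scopes are explicit, and spawn_scoped
// links the new task to the enclosing scope.
task::scope(async {
    tokio::spawn_scoped(async {
        do_background_work().await
    });
    // when this block completes, any unfinished scoped tasks are
    // forcefully cancelled before the outer .await resolves
}).await;
```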