Structured Concurrency Support #1879
Terminology comparison Kotlin/Rust

Since there will be a few comparisons to Kotlin due to its super interesting implementation of structured concurrency, here is a short terminology comparison of concepts for people not familiar with it:
|
Waiting for child tasks to complete

As mentioned in the requirements, parent tasks should be able to wait for child tasks to complete. I think some of the remaining problems to solve here are:

- Waiting for child tasks to complete with an absence of explicit
|
Some notes:

- I think Go's
- There's a long discussion of Rust/futures/structured concurrency here: https://trio.discourse.group/t/structured-concurrency-in-rust/73. I don't know if any of it's directly relevant, but you might find it interesting at least.
I'd say the core requirement is: a function can't spawn tasks that outlive the function's lifetime, unless explicitly granted that permission. That's the point of reifying nurseries. So you want to be careful here:
I'm not an expert on tokio, but in most systems, a "task" is a larger unit of work than a single function frame. So if a function can freely spawn a child task whose lifetime is bounded to the parent task, then that can violate the structured concurrency rule, because the task lifetime is larger than the lifetime of the function inside it. |
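To make that rule concrete, here is a minimal hedged sketch (the `Scope` type and its `spawn` method are hypothetical, not an existing tokio API): a helper function can only start concurrent work because its caller explicitly hands it a scope handle, so anything it spawns is bounded by that scope rather than by the helper's own frame.

```rust
// Hypothetical API sketch: `Scope` reifies the permission to spawn.
async fn helper(scope: &Scope<'_>) {
    // This task is bounded by `scope` (owned by some ancestor), not by
    // `helper`'s stack frame - so the structured concurrency rule holds.
    scope.spawn(async {
        // ... child work ...
    });
    // Without the `&Scope` parameter, `helper` would have no way to start
    // work that outlives its own await points.
}
```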
Cancelling child tasks

Rust is in the luxury situation that we can perform cancellation in a couple of ways:
Graceful cancellation has some benefits:
It obviously also has the drawback that cancellation signals might not be respected, or not acted on in a timely manner. I personally favor graceful cancellation.

In order to support graceful cancellation, a cancellation signal (e.g. a `CancellationToken`) needs to be made available to child tasks. The explicit version could look like:

```rust
async fn subtask(cancel_token: &CancellationToken) {
    select! {
        result = execute_other_logic() => {},
        _ = cancel_token => {
            // Subtask is being cancelled, but can still continue to execute code
        }
    }
}

spawn(subtask);
```

The implicit version would get the `CancellationToken` e.g. via task-local storage. With the implicit version we can directly listen to cancellation signals in low-level tokio types like sockets and timers, and thereby basically guarantee that cancellation is respected - even if the user does not explicitly forward the parameter. However it's obviously more magic, and might make it harder to standardize the mechanism.

When cancelling tasks we might have situations where we only want to cancel one particular subtask (e.g. as discussed in #1830), or one where we want to cancel all subtasks - e.g. because our server is shutting down. Both mechanisms should be efficient - which ideally means cancelling |
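As a hedged illustration of the implicit variant: today it could be modelled with tokio's `task_local!` macro and a token along the lines of tokio-util's `CancellationToken` (both are assumptions here, not what the comment proposed; `execute_other_logic` is the same placeholder as in the explicit example), so that subtasks and low-level primitives can look the token up without an explicit parameter:

```rust
use tokio_util::sync::CancellationToken;

tokio::task_local! {
    // The current scope's cancellation token. The scope/runtime would set
    // this via CANCEL_TOKEN.scope(token, future) when spawning the child.
    static CANCEL_TOKEN: CancellationToken;
}

async fn subtask() {
    // No explicit parameter: the token is picked up implicitly.
    let token = CANCEL_TOKEN.with(|t| t.clone());
    tokio::select! {
        _ = token.cancelled() => {
            // Graceful shutdown path: clean up, then return.
        }
        _ = execute_other_logic() => {}
    }
}
```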
Managing child/parent relations

As mentioned earlier, a list of child tasks can likely be stored by having an [intrusive] list of child tasks in the parent, or the
|
Would the functionality provided by https://github.com/bastion-rs/bastion (https://docs.rs/bastion/0.3.1/bastion/) qualify as structured concurrency? |
Maybe :) As a contributor you might be able to answer this question best. I just skimmed the docs and I'm not yet sure whether I fully figured out what it provides. The child hierarchies are certainly something similar. But it also seems very much inspired by an actor model - and what Tokio would offer is likely more general. |
I didn't find too much time to work on this so far. But here are just some further thoughts:

Waiting for child tasks to complete

I don't think anymore that

is something to pursue. One thing that I don't like is that it doesn't allow for multiple "scopes" inside a Task, within which we want to constrain concurrency. E.g. we couldn't do something along the lines of:

```rust
async fn some_method() {
    while !done {
        SCOPE_START
        // spawn child tasks here
        SCOPE_END
        // all child tasks are guaranteed to have finished here
    }
}
```

since the scope is automatically the task. It also makes it hard to pass scope handles around, since the task is not a value. It could only be addressed via functions which act on thread-locals, which are better avoided.

WaitSet

Therefore I now think

is the preferred approach, which I will investigate further. @carllerche proposed calling it

An exemplary usage was already provided earlier:

```rust
async fn parent_task() {
    WaitSet::new(|wait_set| async {
        let join_handle = wait_set.spawn(async {
            doSth().await;
        });
    }).await; // This waits for all child tasks to complete
}
```

This is the most basic example.

What else can we win

Before going deeper into some implementation questions, here is something that already this most simple version of the `WaitSet` could provide: It could allow borrowing of values on the parent task inside subtasks:

```rust
async fn parent_task() {
    let data = [0u8; 1024];
    WaitSet::new(|wait_set| async {
        for i in 0 .. 10 {
            wait_set.spawn(async {
                write_data_to_receiver(i, &data[..]).await
            });
        }
    }).await; // This waits for all child tasks to complete
}
```

since the child tasks would have a lifetime which is smaller than the lifetime of the parent task.

What happens if the
|
I just had a discussion with @carllerche whether borrowing from the parent task like in the following code is safe:

```rust
async fn parent_task() {
    let data = [0u8; 1024];
    WaitSet::new(|wait_set| async {
        for i in 0 .. 10 {
            wait_set.spawn(async {
                write_data_to_receiver(i, &data[..]).await
            });
        }
    }).await; // This waits for all child tasks to complete
}
```

The concern here is that due to the use of

In my understanding the mechanism is safe due to the following reasons:

I am not so sure about a

```rust
let data = [0u8; 1024];
let wait_set = WaitSet::new();
wait_set.spawn(async {
    write_data_to_receiver(&data[..]).await
});
std::mem::forget(wait_set)
```

If the

As far as I understand it also wouldn't be safe if

```rust
let wait_set = WaitSet::new();
{
    let data = [0u8; 1024];
    pin_mut!(wait_set);
    wait_set.spawn(async {
        write_data_to_receiver(&data[..]).await
    });
    std::mem::forget(wait_set)
}
```

But maybe one of our pinning experts (@Nemo157, @RalfJung) could chime in and correct me on everything I've written :) |
I am very happy to see discussion about this, but I would love for this not to be considered a tokio issue. This concerns the whole rust async ecosystem and if some sort of solution comes forth, it would be nice to not lock people into using tokio in order to benefit from it. I would love to see solutions which are executor agnostic. |
@najamelan I believe that an implementation would require access to runtime internals |
(Error) return values

This is mostly based on the last comments, and assumes we will start with a scope-based API along the following lines:

```rust
async fn parent_task() {
    let result = task::scope(|scope| async {
        let join_handle = scope.spawn(async {
            doSth().await;
        });
        let join_handle_2 = scope.spawn(async {
            doSth().await;
        });
        join_handle_2.await
    }).await; // This waits for all child tasks to complete
}
```

Based on this assumption, the remaining questions to clarify are:
I will start answering the last question first, because that was the one where I found my personal answer first: I think we should not silently exchange return values outside of the user's view. This would be very surprising, and typically not what the user wants. Instead of this, if the block inside the scope finishes
After having thought about this, I continued to wonder whether we can still allow the following use-case:
It then occurred to me that this actually works quite naturally due to how
This means I think we would need no special handling for

So as a more complete example of the upload files task, I expect users to be able to do:

```rust
let result = task::scope(|scope| async {
    let upload_task_handle_1 = scope.spawn(async {
        doUpload().await;
    });
    let upload_task_handle_2 = scope.spawn(async {
        doUpload().await;
    });
    try_join!(upload_task_handle_1, upload_task_handle_2)
}).await?;
```

Careful readers will notice the code looks pretty much the same as if users would have utilized a

The tasks will also feature a graceful cancellation facility, which is part of the next post. |
Cancellation

So the last piece of the puzzle is cancellation. Here there are 2 main questions to answer
Scope cancellation vs individual task cancellation

The approach as I outlined it so far has actually no real need for offering per-task cancellation. It is sufficient if all remaining tasks get cancelled as soon as the return from the scope happens. Therefore a per-task cancellation would be an additional feature.

Benefits of per-task cancellation: It makes the cancellation path more consistent. If we offer graceful cancellation (via some kind of

Drawbacks of per-task cancellation: We would need to track cancellation state per task. At the end of a scope we would likely need to iterate over all pending tasks in order to signal them the cancellation request. Compared to this, cancelling a shared

I personally think we should therefore offer per-task cancellation. It's likely we can also discover a way that allows to cancel

Graceful cancellation vs forced cancellation

I think we should offer graceful cancellation, because it's a superset of functionality, and users could still perform forced cancellation by using an adapter like:

```rust
task::scope(|scope| {
    scope.spawn(|cancel_token| async {
        select! {
            _ = cancel_token => { /* return cancellation error */ },
            result = actual_async_fn() => result,
        }
    })
})
```

In this case tokio can also offer a helper function like

The main drawback of graceful cancellation that I see is that it's not well integrated into the other code - e.g. not on Tokio sockets and co. If a user ignores a cancellation request, the socket will not automatically abort, and the connection might stick around for a while even if the task was already cancelled a while ago. So this is an additional source of errors. But on the other hand we could also call it a feature, since a cancelled task would not automatically lead to everything being abandoned at any IO point, but to tasks only being exited at points that have been explicitly designated by the application developer. |
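A hedged sketch of what such an adapter helper might look like (the name `cancellable` and the `Cancelled` error type are made up here, and the token is assumed to behave like tokio-util's `CancellationToken`): it races the wrapped future against the cancellation signal and drops it - i.e. force-cancels it - as soon as the signal fires.

```rust
use tokio_util::sync::CancellationToken;

#[derive(Debug)]
struct Cancelled;

// Turns graceful cancellation back into forced cancellation for callers
// that want it: the inner future is dropped once the token fires.
async fn cancellable<F: std::future::Future>(
    token: CancellationToken,
    fut: F,
) -> Result<F::Output, Cancelled> {
    tokio::select! {
        _ = token.cancelled() => Err(Cancelled),
        out = fut => Ok(out),
    }
}
```

With such a helper (and again assuming the proposed spawn signature), the adapter from the snippet above would collapse to something like `scope.spawn(|cancel_token| cancellable(cancel_token, actual_async_fn()))`.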
I actually played around a decent amount with graceful termination over in https://docs.rs/stream-cancel/, though specifically in the context of streams. There was no consideration there of something akin to |
I'm concerned about the possibility of the Leakpocalypse:

```rust
let mut t = Box::pin(task::scope(|scope| {
    // spawn a task
    scope.spawn(async { ... });
    // do other stuff
}));
futures::poll!(t.as_mut()); // the spawned task starts running
mem::forget(t);
```

Since we can forget the result of it without dropping or cancelling the passed future, the spawned tasks might outlive the parent. |
@pandaman64 I still need to read the entire thing in depth, but I believe you are correct. The proposed API would not support borrowing in a spawn and I don't know of an obvious way to make it work w/o blocking the thread owning the task. |
That said, I believe that we could support borrowed spawns from a blocking context. So, from outside of the runtime (or from a |
@jonhoo Providing wrappers that handle graceful cancellation is definitely an Option! The nice thing about
I think especially a graceful cancellation mechanism has some potential to lead to unexpected leaks, and we need to trade it off. But I also think we can maybe add runtime diagnostic mechanisms to discover those spots, in a similar fashion as it might be possible to detect accidentally blocking tasks. But that's for another discussion. I'm not concerned about your given example leaking, because
Indeed it can in this case! Which is a bit of a bummer. I guess we can agree this is a contrived example (why would you ever do this?), and that it's actually questionable whether there is a parent task at all in this case (you just kick off something via

Based on my understanding I do see the structured concurrency concept violation. But not yet the potential memory issue that @carllerche is concerned about. I think the child tasks here are only allowed to capture data with a lifetime that outlives the

I was initially wondering whether the violation could be fixed by requiring that a child scope always borrows its parent scope (which would require us to provide a |
Yeah, it's a contrived example. Typically users will await the scoped future (or at least drop it), so it's not an issue. |
I made a prototype based on Edit: I agree that |
@pandaman64 Nice work! Thanks for the quick prototyping. The macro might definitely be an option if it doesn't work out otherwise. I think macros still require that the functions they refer to are public - but we could at least make the implementation

Edit: The following comment was wrong. The program indeed compiles and allows to use parent data when the scope gets boxed and forgotten. Fixed code which runs the child task:

```rust
{
    let data = vec![1u32, 2, 3];
    let data_ref = &data;
    {
        let mut scope_fut = Box::pin(scope_impl(handle.clone(), |scope| async move {
            scope.spawn(async move {
                println!("Hello from 2nd scope");
                println!("Data: {:?}", data_ref);
                delay_for(Duration::from_millis(500)).await;
                println!("End from 2nd scope");
                println!("Data: {:?}", data_ref);
            });
            5u32
        }));
        futures::poll!(scope_fut.as_mut());
        std::mem::forget(scope_fut);
    }
    println!("Dead scope");
}
```

```rust
let data = vec![1, 2, 3];
let data_ref = &data;
let mut scope_fut = Box::pin(scope_impl(handle.clone(), |scope| async move {
    scope.spawn(async {
        println!("Hello from 2nd scope");
        println!("Data: {:?}", data_ref);
        delay_for(Duration::from_millis(500)).await;
        println!("End from 2nd scope");
        println!("Data: {:?}", data_ref);
    });
    5u32
}));
futures::poll!(scope_fut.as_mut());
std::mem::forget(scope_fut);
```
|
@pandaman64 I had a similar idea to yours to support

It would be built on top of the existing spawning API, with the following additions (and the added ability to cancel tasks):

```rust
unsafe fn spawn_scoped<'a, F, R>(func: F) -> Handle<'a, R>
where
    F: FnOnce() -> R + Send + 'a;

macro_rules! spawn_scoped {
    ($func:expr) => { unsafe { spawn_scoped($func) }.await }
}
```

The snags I could think of (and I believe they apply to yours as well) are that the

Edit: Not sure if this approach could be made fully sound (edit: especially in the face of a safe |
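A hedged usage sketch of the proposal above (everything here builds on the hypothetical `spawn_scoped` declaration, and assumes `Handle<'a, R>` resolves to `R` when awaited): because the macro expands to `unsafe { spawn_scoped(...) }.await` in a single expression, the caller never holds on to a forgettable handle while the borrow is live.

```rust
async fn parent() {
    let data = vec![1u32, 2, 3];
    // Expands to `unsafe { spawn_scoped(|| data.len()) }.await`: the handle
    // is awaited immediately, so `data` is still alive while the closure runs.
    let len = spawn_scoped!(|| data.len());
    assert_eq!(len, 3);
}
```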
Yeah, the scoped API must wait for all spawned tasks to reach a cancellation point (when |
@pandaman64

```rust
let data = vec![1u32, 2, 3];
let data_ref = &data;
let mut scope_fut = Box::pin(async {
    scope!(handle.clone(), |scope| async move {
        scope.spawn(async move {
            // Invalid access
            println!("Data: {:?}", data_ref);
        });
        5u32
    })
});
futures::poll!(scope_fut.as_mut());
std::mem::forget(scope_fut);
```

I think here the borrowing lifetime would need to be restricted to not be able to borrow things outside of

Proposals on what we can do regarding cancellation (force-cancel or block the thread) are already higher up in the thread. I'm not too worried about those at the moment. The whole

@udoprog I'm not sure if I understand it correctly. But isn't this even closer to the old scoped threads API in the synchronous world, where the problem was even bigger - since people could leak or |
Regarding borrowing data: it sounds to me like
But there is a future here that we do have control over, and can add to its drop body: the scope future itself. Given that we take ownership of the closure and store it in the scope future's "stack", it should be possible for the child tasks to borrow and share data from within the closure:
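A hedged sketch of that idea (the `task::scope` API here is hypothetical, following the shape used earlier in this thread): the data is moved into the closure, so it lives inside the scope future itself, and the children borrow from there rather than from the parent task's frame.

```rust
let shared = vec![1u32, 2, 3];
task::scope(move |scope| async move {
    // `shared` was moved into the closure, so it is owned by the scope
    // future's "stack", not by the parent task's stack frame.
    let shared = &shared;
    scope.spawn(async move { println!("child A sees {:?}", shared); });
    scope.spawn(async move { println!("child B sees {:?}", shared); });
}).await;
// Dropping the scope future stops the children before `shared` is dropped,
// and forgetting it leaks `shared` too - either way no reference dangles.
```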
The children will be stopped by the drop call that also owns the referenced data, so the tasks can't outlive their references. It's also fine to never drop the future, in which case the data is leaked so references to it are still valid. This isn't a totally satisfying answer in that we can't borrow from the actual parent task. But it is a useful step in the right direction, at least the children can share borrowed data while running in parallel. Regarding forced cancellation: I suspect there won't be a great solution without async destructors |
I mean, you can't actually forget a non-Unpin pinned future with |
This change adds task::scope as a mechanism for supporting structured concurrency as described in tokio-rs#1879. In this version of the change scopes can be created using a freestanding `task::scope` function, which creates a new detached scope (which is not coupled to the parent scope), `Scope::detached` and `Scope::with_parent`, and a `::child()` method on an existing scope handle. Since a few of those methods are doing the same thing, some of those might get dropped before the change is merged. For future extensibility, a `Scope::with_config` and a `ScopeConfigBuilder` are also demonstrated in this change. Those might also be part of the initial change.
This change adds task::scope as a mechanism for supporting structured concurrency as described in tokio-rs#1879. This version of the scope implementation makes use of implicit scopes, which are propagated within the task system through task local storage. Every task spawned via `scope::spawn` or `scope::spawn_cancellable` is automatically attached to its current scope without having to explicitly attach to it. This provides stronger guarantees, since child tasks in this model will never be able to outlive the parent - there is no `ScopeHandle` available to spawn a task on a certain scope after this is finished. One drawback of this approach is however that since no `ScopeHandle` is available, we also can't tie the lifetime of tasks and their `JoinHandle`s to this scope. This makes it less likely that we could borrow data from the parent task using this approach. One benefit however is that there seems to be an interesting migration path from tokio's current task system to this scoped approach:
- Using `tokio::spawn` could in the future be equivalent to spawning a task on the runtime's implicit top level scope. The task would not be force-cancellable, in the same fashion as tasks spawned via `scope::spawn` are not cancellable.
- Shutting down the runtime could be equivalent to leaving a scope: The remaining running tasks get a graceful cancellation signal and the scope would wait for those tasks to finish.
- However since the Runtime would never have to force-cancel a task (people would opt into this behavior using `scope::spawn_cancellable`) the `JoinError` could be removed from the "normal" spawn API. It is still available for cancellable spawns.
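As a hedged usage sketch of the API this commit message describes (names follow the description above; the `task::scope` entry point is written here in the shape used elsewhere in the thread, and `do_work`/`do_other_work` are placeholders - none of this is a shipped tokio API):

```rust
async fn server() {
    task::scope(async {
        // Implicitly attached to the surrounding scope via task-local
        // storage; not force-cancellable, so no JoinError on the handle.
        let h1 = scope::spawn(async { do_work().await });

        // Opt-in to cancellation: this task may observe a graceful
        // cancellation signal, and its handle can report it as an error.
        let h2 = scope::spawn_cancellable(async { do_other_work().await });

        h1.await;
        let _ = h2.await;
    })
    .await;
    // Leaving the scope waits for any remaining child tasks.
}
```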
I'm going to close this now as there is no way forward. More details here: #2596 (comment) Thanks again for all the work you did on this. Even though we didn't manage to get this through, I personally learned a lot on the topic and from your research. Also, once again, I'm sorry that I didn't put more time into this sooner. |
If people are interested in structured concurrency and can live without a borrowing scope, check out async_nursery. Disclaimer: I wrote that. If you are specifically after borrowing, you could check out async-scoped. |
Tokio - and Rust's async model in general - is pretty freaking cool, but it isn't a perfect fit for everything. After hammering for a few days, I'm pretty confident that it's not working out here:

- There's no way to enforce scoped async tasks without blocking the current thread.[1][2][3] This means that there's no async task equivalent to Rayon/Crossbeam-like scopes, and you're back to arcs and pins and all sorts of fun boilerplate if you'd like to foster parallelism with task::spawn().
- Traits and recursive calls need lots o' boxes, implemented by proc macros at best and by hand at worst.
- Since many FS syscalls block, tokio asyncifies them by wrapping each in a spawn_blocking(), which spawns a dedicated thread. Of course you can wrap chunks of synchronous file I/O in spawn_blocking() if kicking off a separate thread for each File::open() call doesn't sound fun, but that means you can't interact with async code anywhere inside...
- Add in the usual sprinkling of async/await/join/etc. throughout the code base - since anything that awaits needs to be a future itself, async code has a habit of bubbling up the stack.

None of these are dealbreakers by themselves, but they can add up to real overheads. Not just cognitive, but performance too, especially if you've already got a design with concurrent tasks that do a decent job of saturating I/O and farming CPU-heavy work out to a thread pool. < gestures around >

[1]: tokio-rs/tokio#1879
[2]: tokio-rs/tokio#2596
[3]: https://docs.rs/async-scoped/latest/async_scoped/struct.Scope.html#safety-3
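For the spawn_blocking point specifically, a small hedged sketch of the workaround that comment describes - batching a chunk of synchronous file I/O into one `spawn_blocking` call instead of one blocking-pool hop per syscall (the file name is just an example):

```rust
use tokio::task;

// One blocking-pool task for the whole synchronous chunk; no async code
// can run inside the closure.
async fn read_config() -> std::io::Result<String> {
    task::spawn_blocking(|| std::fs::read_to_string("config.toml"))
        .await
        .expect("blocking task panicked")
}
```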
I think this is possible in a way that hasn't been discussed yet, although it does have tradeoffs. |
That also has the issue that it breaks any type of in-task concurrency such as
Sorry I don't quite understand, are you talking about a situation where a Task joins two tasks that both call scope? ... that does seem like it would be broken. |
Even if only one branch of a |
First of all a disclaimer: This issue is not yet a full proposal. This serves more as a collection of things to explore, and to gather feedback on interest.
What is structured concurrency?
Structured concurrency describes a programming paradigm. Concurrent tasks are structured in a fashion where there exist clean task hierarchies, and where the lifetime of all sub-tasks/child-tasks is constrained within the lifetime of their parent task.
The term was likely brought up first by Martin Sustrik in this blog post, and was a guiding idea behind the libdill library. @njsmith utilized the term in Notes on structured concurrency, or: Go statement considered harmful, and designed the python trio library around the paradigm. I highly recommend reading the blog post.
The paradigm has also been adopted by Kotlin coroutines. @elizarov gave a talk at HydraConf about structured concurrency and the evolution of Kotlin's async task model, which I also highly recommend watching. It provides some hints on things to look out for, and what APIs could look like. Kotlin's documentation around coroutines is also a good resource.
Go adopted some support for structured concurrency with the `errgroup` package.

Benefits of structured concurrency
I again recommend checking out the linked resources, which also elaborate on this 😀
In short: Applying the structured concurrency paradigm can simplify reasoning about concurrent programs and thereby reduce errors. It is helpful at preventing resource leaks, in the same fashion as RAII helps to avoid leaks at the scope level. It might also allow for optimizations.
Examples around error reductions and simplifications
Here is one motivating example of how structured concurrency can simplify things:
We are building a web application `A`, which is intended to handle at least 1000 transactions per second. Internally each transaction requires a few concurrent interactions, which will involve reaching out to remote services. When one of those transactions fails, we need to perform certain actions. E.g. we need to call another service `B` for a cleanup or rollback. Without structured concurrency, we might have the idea just to do `spawn(cleanup_task())` in order to do this. While this works, it has a side effect: `cleanup` tasks might still be running while the main webservice handler has already terminated. This sounds harmless at first, but can have surprising consequences: We obviously want our services to be resilient against overloads, so we limit the amount of concurrent requests to 2000 via an async `Semaphore`. This works fine for our main service handler. But what happens if lots of transactions are failing? How many `cleanup` tasks can run at the same point in time? The answer is unfortunately that the number of those is effectively unbounded. Thereby our service can be overloaded through queuing up cleanup tasks - even though we protected ourselves against too many concurrent API calls. This can lead to large scale outages in distributed systems.

By making sure all cleanup logic is performed inside the lifetime/scope of the main service handler, we can guarantee that the number of cleanup tasks is also bounded by our `Semaphore`.
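A hedged sketch of the difference (`do_transaction` and `cleanup_task` are placeholder functions, and the semaphore limiting concurrent requests is the one described above):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Unstructured: the cleanup task is detached, so it keeps running after the
// permit is released - the number of in-flight cleanups is unbounded.
async fn handle_transaction_unstructured(limit: Arc<Semaphore>) {
    let _permit = limit.acquire_owned().await.expect("semaphore closed");
    if do_transaction().await.is_err() {
        tokio::spawn(cleanup_task()); // outlives this handler
    }
    // permit dropped here, possibly while cleanup_task() is still running
}

// Structured: cleanup happens within the handler's lifetime, so it is
// covered by the same permit and therefore bounded by the same semaphore.
async fn handle_transaction_structured(limit: Arc<Semaphore>) {
    let _permit = limit.acquire_owned().await.expect("semaphore closed");
    if do_transaction().await.is_err() {
        cleanup_task().await;
    }
    // permit is only released once cleanup has finished
}
```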
Another example could be applying configuration changes at runtime: While our service is running we want to be able to update its configuration. After the configuration change is applied, no transaction should still be utilizing the old configuration. What we need to do now is:
Without having a structured approach for concurrency, this is a lot more complicated a problem than it sounds. Any old transaction might have spawned a subtask which might still be executing after we have updated the configuration. There is no easy way for the higher-level code to check whether everything has finished.
Potential for optimizations
The application of structured concurrency might allow for optimizations. E.g. we might be able to allow subtasks to borrow data inside the parent task's scope without the need for additional heap allocations. Since the exact mechanisms are however not yet designed, the exact potential is unknown.
Core requirements
I think the core requirements for structured concurrency are:
futures-rs.

Regarding the last point I am not sure whether automatic error propagation is a required point of structured concurrency and whether it can be achieved on a task level, but it definitely makes things easier.
Do we actually need to have builtin support for this?
Rust's `async/await` mechanism already provides structured concurrency inside a particular `task`: By utilizing tools like `select!` or `join!` we can run multiple child-tasks which are constrained to the same lifetime - which is the current scope. This is not possible in Go or Kotlin - which require an explicit child-task to be spawned to achieve the behavior. Therefore the benefits might be lower.

I built some examples in futures-intrusive around those mechanisms.
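As a small illustration of this in-task structured concurrency (`fetch_user` and `fetch_settings` are assumed async functions): both child futures are owned by the function's own frame, so they cannot outlive it, and `join!` only returns once both have completed.

```rust
use futures::join;

async fn fetch_both() -> (String, String) {
    let a = fetch_user();     // not yet polled
    let b = fetch_settings(); // not yet polled
    // Polls both concurrently within this task; structured by construction.
    join!(a, b)
}
```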
However the concurrency inside a certain `task` will not scale very well, due to requiring polling of all child `Future`s. Therefore real concurrent `task`s will be required in most applications.

On this level we have a set of tools in our toolbox that allow us to structure our current tasks manually:
`Oneshot` channels or the new `JoinHandle`s

However these tools all require explicit code in order to guarantee correctness. Builtin support for structured concurrency could improve on usability and allow more developers to use good and correct defaults.
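For example, the `JoinHandle` variant of this manual structuring looks like the following with today's tokio API - correctness depends entirely on the author remembering the final `.await`:

```rust
async fn parent() -> Result<(), tokio::task::JoinError> {
    let child = tokio::spawn(async {
        // child work; must be 'static, so no borrowing from `parent`
    });
    // Omitting this await would silently break the parent/child structure.
    child.await
}
```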
And as mentioned earlier, I think builtin support could also allow for new usages, e.g. borrowing inside child tasks or potential scheduler improvements when switching between child tasks and parent tasks.
The following posts are now mainly a braindump around how these requirements could be fulfilled and how they align with existing mechanisms.