fix: stop background threads between estimations #7689
Conversation
Explicitly stop and wait for prefetching background threads to terminate when a testbed is dropped. This prevents estimations from being influenced by background threads left over from previous estimations, which we have observed since merging near#7661.
```rust
impl<'a> Drop for Testbed<'a> {
    fn drop(&mut self) {
        self.inner.stop_prefetching_threads();
    }
}
```
Let's propagate this drop to `ShardTriesInner`? Joining threads on drop should be the responsibility of the entity which spawns them. As a rule of thumb, every spawned thread should have its join.
Perhaps also use something like

```rust
struct ShardTriesInner {
    /// Prefetcher state, such as IO threads, per shard.
    prefetchers: RwLock<HashMap<ShardUId, (PrefetchApi, Vec<JoinHandle<()>>)>>,
}
```

?
The `Clone` impl is funky; better if we can separate `Prefetcher` and `PrefetcherHandle` at the type level.
Yeah, 100% agree that this clone thing is too funky, thanks for pointing it out.
However, `ShardTriesInner::drop` seems like the wrong place to me. At least I failed in my attempt to make it work. The problem is that we really must close the crossbeam channel first and only then join the threads. Joining when dropping `Testbed` only works because the testbed itself outlives all places that could hold a clone of a channel sender, which is not true for `ShardTriesInner`.

The sender is stored inside `PrefetchApi`, which in turn is created by `ShardTriesInner`. `PrefetchApi` is cloned into every instance of `Trie`, and that lives in various other structs, such as `TrieUpdate`. And unlike `Testbed`, `ShardTriesInner` is not guaranteed to outlive all of those. Thus, joining when `ShardTriesInner` is dropped results in deadlocks, as the background threads can still be waiting on an open channel.
I have now attempted to apply your suggestion in a slightly different way. Definitive ownership of the channel sender AND the join handles is now given to the clonable struct `WorkQueue`, which itself is stored inside `PrefetchApi`. This way all clones of the sender also clone the `Arc<Vec<JoinHandle>>`. When the last instance of that combination is dropped, it is safe to join the threads.
I am still not 100% happy with my implementation, though. It is still gimmicky and there are too many nested structs for my taste. But it's the truest representation of ownership that I could come up with right now.^^ Ideas for improvements are welcome. :)
The potential clones of `PrefetchApi` are unknown to its initial creator, the `InnerShardTries` instance. But the last channel sender must be dropped before joining the threads. Therefore, it is tricky to find the right place to join background threads. To solve it locally inside `core/store/src/trie/prefetching_trie_storage.rs` we use a helper struct `JoinGuard`. Dropping the join guard joins all threads. It is stored inside a reference-counted pointer right after the crossbeam sender, such that they are always cloned together. This ensures the join guard outlives the last sender to the channel.
I had a call with matklad just now. Plan of action is to create a second channel of some kind for actively shutting down IO threads and call those in the