fix: stop background threads between estimations #7689
Conversation
Explicitly stop and wait for prefetching background threads to terminate when a testbed is dropped. This prevents estimations from being influenced by background threads left over from previous estimations, which we have observed since merging near#7661.
```rust
impl<'a> Drop for Testbed<'a> {
    fn drop(&mut self) {
        self.inner.stop_prefetching_threads();
    }
}
```
Let's propagate this drop to `ShardTriesInner`? Joining threads on drop should be the responsibility of the entity which spawns them. As a rule of thumb, every spawned thread should have its join.
Perhaps also use something like

```rust
struct ShardTriesInner {
    /// Prefetcher state, such as IO threads, per shard.
    prefetchers: RwLock<HashMap<ShardUId, (PrefetchApi, Vec<JoinHandle<()>>)>>,
}
```

?
The `Clone` impl is funky; better if we can separate `Prefetcher` and `PrefetcherHandle` at the type level.
Yeah, 100% agree that this clone thing is too funky, thanks for pointing it out.
However, `ShardTriesInner::drop` seems like the wrong place to me. At least I failed in my attempt to make it work. The problem is that we really must close the crossbeam channel first and only then join the threads. Joining when dropping `Testbed` only works because the testbed itself outlives all places that could hold a clone of a channel sender, which is not true for `ShardTriesInner`.

The sender is stored inside `PrefetchApi`, which in turn is created by `ShardTriesInner`. `PrefetchApi` is cloned into every instance of `Trie`, and that lives in various other structs, such as `TrieUpdate`. And unlike `Testbed`, `ShardTriesInner` is not guaranteed to outlive all of those. Thus, joining when `ShardTriesInner` is dropped results in deadlocks, as the background threads can still be waiting on an open channel.
I have now attempted to apply your suggestion in a slightly different way. Definitive ownership of the channel sender AND the join handles is now given to the clonable struct `WorkQueue`, which itself is stored inside `PrefetchApi`. This way all clones of the sender also clone the `Arc<Vec<JoinHandle>>`. When the last instance of that combination is dropped, it is safe to join the threads.
I am still not 100% happy with my implementation, though. It is still gimmicky and there are too many nested structs for my taste. But it's the truest representation of ownership that I could come up with right now.^^ Ideas for improvements are welcome. :)
The potential clones of `PrefetchApi` are unknown to its initial creator, the `InnerShardTries` instance. But the last channel sender must be dropped before joining the threads. Therefore, it is tricky to find the right place to join background threads. To solve it locally inside `core/store/src/trie/prefetching_trie_storage.rs` we use a helper struct `JoinGuard`. Dropping the join guard joins all threads. It is stored inside a reference-counted pointer right after the crossbeam sender, such that they are always cloned together. This ensures the join guard outlives the last sender to the channel.
I had a call with matklad just now. Plan of action is to create a second channel of some kind for actively shutting down IO threads and call those in the