Throttle per-index snapshot deletes #100793
Conversation
Each per-index process during snapshot deletion takes some nonzero amount of working memory to hold the relevant snapshot IDs and metadata generations etc. which we can keep under tighter limits and release sooner if we limit the number of per-index processes running concurrently. That's what this commit does.
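To illustrate the idea in the description, here is a hypothetical, simplified sketch (not the actual Elasticsearch implementation) of capping the number of per-index deletion tasks in flight, so that each task's working state is held only while it occupies a slot:

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: at most maxConcurrent per-index tasks run at once;
// a new task is admitted only when a running one completes.
public final class PerIndexThrottleSketch {

    public static int runThrottled(Iterator<Runnable> perIndexTasks, int maxConcurrent)
        throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        Semaphore slots = new Semaphore(maxConcurrent);
        AtomicInteger completed = new AtomicInteger();
        while (perIndexTasks.hasNext()) {
            Runnable task = perIndexTasks.next();
            slots.acquire(); // block until one of the maxConcurrent slots frees up
            pool.execute(() -> {
                try {
                    task.run(); // per-index work: read metadata, compute updates, etc.
                    completed.incrementAndGet();
                } finally {
                    slots.release(); // admit the next per-index process
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Runnable> tasks = List.of(() -> {}, () -> {}, () -> {}, () -> {});
        System.out.println(runThrottled(tasks.iterator(), 2)); // prints 4
    }
}
```

The semaphore here stands in for the slot-limiting role that the PR's throttling plays; the real change uses the repository's existing snapshot threadpool rather than a dedicated executor.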
I have a couple of questions. They may all be inconsequential. I'd appreciate your input to help my understanding. Thanks!
```java
if (snapshotsToDelete.containsAll(snapshotsContainingIndex)) {
    return snapshotsContainingIndex.size() > snapshotsToDelete.size();
```
I think this means there are possibly duplicated `SnapshotId` entries in `snapshotsContainingIndex`? Why would that be possible? If there are duplicated entries, why do we consider this index to be "updated but not removed"? It seems to me that any duplicated entries are removed when computing `survivingSnapshots`.
Given these questions, I think some comments are necessary.
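To make the question concrete, here is a small standalone demo (illustrative names, not the real method) showing that once `containsAll` has passed, the size comparison can only be true when `snapshotsContainingIndex` holds duplicate entries, since a duplicate-free subset can never be larger than the containing set:

```java
import java.util.List;

// Demo of the check under discussion, isolated from its original context.
public final class ContainsAllDemo {

    static boolean updatedButNotRemoved(List<String> toDelete, List<String> containingIndex) {
        if (toDelete.containsAll(containingIndex)) {
            return containingIndex.size() > toDelete.size();
        }
        return false; // not the original control flow; just isolating this branch
    }

    public static void main(String[] args) {
        // no duplicates: every distinct element is in toDelete, so size can't exceed it
        System.out.println(updatedButNotRemoved(List.of("s1", "s2"), List.of("s1", "s2")));       // prints false
        // a duplicated entry is the only way the size comparison can hold
        System.out.println(updatedButNotRemoved(List.of("s1", "s2"), List.of("s1", "s1", "s2"))); // prints true
    }
}
```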
This is copied verbatim from the previous code. I also find it unclear, but I haven't had a chance to think about it very hard. Some comments would be welcome, but maybe not in this PR.
Yeah, I am aware this is existing code. I assumed that you knew the answers. Since it's unclear to you as well, perhaps adding a TODO would be useful?
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
.../java/org/elasticsearch/repositories/blobstore/BlobStoreRepositoryDeleteThrottlingTests.java
```java
// This test ensures that we appropriately throttle the per-index activity when deleting a snapshot by marking an index as "active" when
// its index metadata is read, and then as "inactive" when it updates the shard-level metadata. Without throttling, we would pretty much
```
IIUC, this depends on the index having only a single shard. If that is the case, it may be worth mentioning it in this comment.
```java
// etc. which we can keep under tighter limits and release sooner if we limit the number of concurrently processing indices.
// Each one needs at least one snapshot thread at all times, so threadPool.info(SNAPSHOT).getMax() of them at once is enough to
// keep the threadpool fully utilized.
ThrottledIterator.run(
```
This is a neat class, TIL. In `BlobStoreRepository` we have three ways of running jobs on a thread pool. I wonder whether there is some possibility of consolidating them:

- `ThreadPool#executor#execute` - eagerly runs jobs on all available threads and eagerly queues the remaining jobs.
- `ThrottledTaskRunner#enqueueTask` - uses up to a specified number of threads in the underlying threadpool. Manages its own queue to avoid flooding the threadpool's queue.
- `ThrottledIterator#run` - uses up to a specified number of threads in the underlying threadpool. Does not manage its own queue, but still avoids flooding the threadpool's queue by pulling from an iterator. The use of an iterator also makes it more memory efficient than the above classes, because jobs are not materialized until they are needed. One important difference is that all jobs must be supplied up front via an `Iterator`; it is not possible to add more jobs afterwards.

I am sure there are details that I am not aware of yet, but conceptually it seems useful to have one abstraction that does the following:

- Configurable maximum number of threads to use.
- Configurable queuing strategy, aggressive or conservative.
- Able to take new jobs beyond its instantiation time.
- Eagerly runs jobs on a single (or configurable number of) threads, similar to `ThrottledTaskRunner`.

Its most basic form would essentially behave just like a `ThreadPool#executor`, and the advanced forms would be like `ThrottledTaskRunner` or `ThrottledIterator`. If such a thing is possible, we could potentially use it in all the relevant places in `BlobStoreRepository`. Right now we are still using a plain `ThreadPool#executor` at the shard-deletion level. If that usage also needs to be migrated, I wonder whether it would be helpful to have an overarching threading abstraction that manages all the jobs arising from a snapshot deletion.

Apologies, this feels a bit rambling on my side. Feel free to disagree. Thanks!
I'm not sure we need to consolidate these things; they're quite different in many important ways. Conceptually:

- `ThrottledTaskRunner` is about fairness/priority in a threadpool queue (amongst different computations).
- `ThrottledIterator` is about limiting work-in-progress (within the context of a single computation) and is not really anything to do with threading.

It's appropriate to use a plain `ThreadPool#executor` for the shard-level operations: there are effectively no other operations for the threadpool to do at this point (except perhaps some older background cleanup work, but that can wait). It's important for the shard-level operations to go through as fast as possible because they're holding up the next snapshot. We don't want to interleave them with any background cleanup work from a previous delete, which is what a `ThrottledTaskRunner` would do.
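To illustrate the "limiting work-in-progress, not threading" point, here is a hypothetical single-threaded sketch of the lazy-pull structure (not the real `ThrottledIterator`): a new item is pulled from the iterator only when a completion callback frees a slot, so pending work is never materialized into a queue and no executor is involved at all.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical sketch of the ThrottledIterator idea, deliberately without any
// threads: at most `limit` items are in flight, and finishing one (by calling
// its release callback) pulls the next item from the iterator.
public final class WipLimitSketch<T> {

    private final Iterator<T> source;
    private final BiConsumer<T, Runnable> process; // item + completion callback

    public WipLimitSketch(Iterator<T> source, BiConsumer<T, Runnable> process) {
        this.source = source;
        this.process = process;
    }

    public void run(int limit) {
        for (int i = 0; i < limit && source.hasNext(); i++) {
            spawnOne();
        }
    }

    private void spawnOne() {
        T item = source.next();
        process.accept(item, this::onRelease); // the caller decides when work is "done"
    }

    private void onRelease() {
        if (source.hasNext()) {
            spawnOne(); // a freed slot immediately admits the next item
        }
    }

    public static void main(String[] args) {
        List<Integer> order = new ArrayList<>();
        // each item "completes" synchronously here, so all five run in order
        new WipLimitSketch<Integer>(List.of(1, 2, 3, 4, 5).iterator(), (item, release) -> {
            order.add(item);
            release.run();
        }).run(2);
        System.out.println(order); // prints [1, 2, 3, 4, 5]
    }
}
```

Note this sketch recurses when items complete synchronously; a production version would use a loop (and proper synchronization) instead, but the slot-and-pull shape is the point being illustrated.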
> It's important for the shard-level operations to go through as fast as possible because they're holding up the next snapshot.

I realised after writing this that of course this change does have the effect of interleaving the shard-level metadata ops with any leftover background cleanup activity a bit more. Eh. My feeling is that this is ok: background cleanup activity is not normally very onerous, and I believe it's more important to limit the WIP in this area.
Sorry for being dense here.

> ThrottledIterator is about limiting work-in-progress (within the context of a single computation) and not really anything to do with threading.

`ThrottledIterator` does not have an executor by itself, but it sort of expects the work items to fork. In the use case here, does it effectively control fairness/priority in the threadpool's queue? By "a single computation", do you mean that all tasks are passed in at once via the iterator? The way it triggers the next round of `run` is somewhat similar to how `ThrottledTaskRunner#pollAndSpawn` is triggered?
It's true that `ThrottledIterator` only makes sense if the tasks do not all complete immediately on the calling thread, but that might mean that they dispatch a single task to a threadpool executor, or they dispatch several tasks (which in turn possibly dispatch further tasks), or indeed they send some work to another node entirely.

Originally `ThrottledTaskRunner` did not take a `Releasable` that tasks could use to indicate their completion on a different thread; it was only about whether the forked task had completed execution on the calling thread or not. Now that we have that, I guess we can cover some similar cases with it. I'd still want it to take an `Iterator` which computes the tasks on the fly, rather than having a fully-materialized queue though.
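A minimal sketch of that `Releasable`-style contract (assumed names; the real `ThrottledTaskRunner` differs): the permit stays held until the task explicitly invokes its release handle, even when completion is signalled from a thread other than the one that ran the task body.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: each task receives a release handle, and the slot it
// occupies is freed only when that handle is invoked, possibly on another thread.
public final class ReleasableSketch {

    public static int runAndCountPermits(int tasks, int maxConcurrent) throws InterruptedException {
        Semaphore permits = new Semaphore(maxConcurrent);
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        CountDownLatch allReleased = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            permits.acquire(); // hold the permit for the whole lifetime of the task
            Runnable release = () -> {
                permits.release();
                allReleased.countDown();
            };
            // the task body returns quickly, but completion is signalled later
            // from a different thread entirely
            pool.execute(() -> CompletableFuture.runAsync(release));
        }
        allReleased.await();
        pool.shutdown();
        return permits.availablePermits(); // all permits back once every task released
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAndCountPermits(4, 2)); // prints 2
    }
}
```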
LGTM

The ongoing discussion about `ThrottledIterator` etc. is really for my education and does not block this PR.
Pinging @elastic/es-distributed (Team:Distributed)