Throttle per-index snapshot deletes #100793
Conversation
Each per-index process during snapshot deletion takes some nonzero amount of working memory to hold the relevant snapshot IDs and metadata generations etc. which we can keep under tighter limits and release sooner if we limit the number of per-index processes running concurrently. That's what this commit does.
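To illustrate the idea in the description, here is a hypothetical, simplified sketch (not the actual Elasticsearch implementation) of capping the number of per-index deletion tasks in flight, so that each task's working state is held only while it occupies a slot:

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: at most maxConcurrent per-index tasks run at once;
// a new task is admitted only when a running one completes.
public final class PerIndexThrottleSketch {

    public static int runThrottled(Iterator<Runnable> perIndexTasks, int maxConcurrent)
        throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        Semaphore slots = new Semaphore(maxConcurrent);
        AtomicInteger completed = new AtomicInteger();
        while (perIndexTasks.hasNext()) {
            Runnable task = perIndexTasks.next();
            slots.acquire(); // block until one of the maxConcurrent slots frees up
            pool.execute(() -> {
                try {
                    task.run(); // per-index work: read metadata, compute updates, etc.
                    completed.incrementAndGet();
                } finally {
                    slots.release(); // admit the next per-index process
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Runnable> tasks = List.of(() -> {}, () -> {}, () -> {}, () -> {});
        System.out.println(runThrottled(tasks.iterator(), 2)); // prints 4
    }
}
```

The semaphore here stands in for the slot-limiting role that the PR's throttling plays; the real change uses the repository's existing snapshot threadpool rather than a dedicated executor.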
I have a couple of questions. They may all be inconsequential. I'd appreciate your input to help my understanding. Thanks!
```java
if (snapshotsToDelete.containsAll(snapshotsContainingIndex)) {
    return snapshotsContainingIndex.size() > snapshotsToDelete.size();
```
I think this means there are possibly duplicated `SnapshotId` entries in `snapshotsContainingIndex`? Why would that be possible? If there are duplicated entries, why do we consider this index to be "updated but not removed"? It seems to me that any duplicated entries are removed when computing `survivingSnapshots`.
Given these questions, I think some comments are necessary.
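To make the question concrete, here is a small standalone demo (illustrative names, not the real method) showing that once `containsAll` has passed, the size comparison can only be true when `snapshotsContainingIndex` holds duplicate entries, since a duplicate-free subset can never be larger than the containing set:

```java
import java.util.List;

// Demo of the check under discussion, isolated from its original context.
public final class ContainsAllDemo {

    static boolean updatedButNotRemoved(List<String> toDelete, List<String> containingIndex) {
        if (toDelete.containsAll(containingIndex)) {
            return containingIndex.size() > toDelete.size();
        }
        return false; // not the original control flow; just isolating this branch
    }

    public static void main(String[] args) {
        // no duplicates: every distinct element is in toDelete, so size can't exceed it
        System.out.println(updatedButNotRemoved(List.of("s1", "s2"), List.of("s1", "s2")));       // prints false
        // a duplicated entry is the only way the size comparison can hold
        System.out.println(updatedButNotRemoved(List.of("s1", "s2"), List.of("s1", "s1", "s2"))); // prints true
    }
}
```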
This is copied verbatim from the previous code. I also find it unclear, but I haven't had a chance to think about it very hard. Some comments would be welcome, but maybe not in this PR.
Yeah, I am aware this is existing code. I assumed that you knew the answers. Since it's unclear to you as well, perhaps adding a TODO would be useful?
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
.../java/org/elasticsearch/repositories/blobstore/BlobStoreRepositoryDeleteThrottlingTests.java
```java
// This test ensures that we appropriately throttle the per-index activity when deleting a snapshot by marking an index as "active" when
// its index metadata is read, and then as "inactive" when it updates the shard-level metadata. Without throttling, we would pretty much
```
IIUC, this depends on the index having only a single shard. If that is the case, it may be worth mentioning it in this comment.
```java
// etc. which we can keep under tighter limits and release sooner if we limit the number of concurrently processing indices.
// Each one needs at least one snapshot thread at all times, so threadPool.info(SNAPSHOT).getMax() of them at once is enough to
// keep the threadpool fully utilized.
ThrottledIterator.run(
```
This is a neat class, TIL. In `BlobStoreRepository` we have three ways of running jobs on a thread pool. I wonder whether there is some possibility of consolidating them:

- `ThreadPool#executor#execute` - eagerly runs jobs on all available threads and eagerly queues the remaining jobs.
- `ThrottledTaskRunner#enqueueTask` - uses up to a specified number of threads in the underlying threadpool. Manages its own queue to avoid flooding the threadpool's queue.
- `ThrottledIterator#run` - uses up to a specified number of threads in the underlying threadpool. Does not manage its own queue, but still avoids flooding the threadpool's queue by pulling from an iterator. The use of an iterator also makes it more memory efficient than the above classes, because jobs are not materialized until they are needed. One important difference is that all jobs must be supplied up front via an `Iterator`; it is not possible to add more jobs afterwards.

I am sure there are details that I am not aware of yet, but conceptually it seems useful to have one abstraction that does the following:

- Configurable maximum number of threads to use.
- Configurable queuing strategy, aggressive or conservative.
- Able to take new jobs beyond its instantiation time.
- Eagerly runs jobs on a single (or configurable number of) threads, similar to `ThrottledTaskRunner`.

Its most basic form would essentially behave just like a `ThreadPool#executor`, and the advanced forms would be like `ThrottledTaskRunner` or `ThrottledIterator`. If such a thing is possible, we could potentially use it in all the relevant places in `BlobStoreRepository`. Right now we are still using a plain `ThreadPool#executor` at the shard-deletion level. If that usage also needs to be migrated, I wonder whether it would be helpful to have an overarching threading abstraction that manages all the jobs arising from a snapshot deletion.

Apologies, this feels a bit rambling on my side. Feel free to disagree. Thanks!
I'm not sure we need to consolidate these things; they're quite different in many important ways. Conceptually:

- `ThrottledTaskRunner` is about fairness/priority in a threadpool queue (amongst different computations).
- `ThrottledIterator` is about limiting work-in-progress (within the context of a single computation) and is not really anything to do with threading.

It's appropriate to use a plain `ThreadPool#executor` for the shard-level operations: there are effectively no other operations for the threadpool to do at this point (except perhaps some older background cleanup work, but that can wait). It's important for the shard-level operations to go through as fast as possible because they're holding up the next snapshot. We don't want to interleave them with any background cleanup work from a previous delete, which is what a `ThrottledTaskRunner` would do.
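To illustrate the "limiting work-in-progress, not threading" point, here is a hypothetical single-threaded sketch of the lazy-pull structure (not the real `ThrottledIterator`): a new item is pulled from the iterator only when a completion callback frees a slot, so pending work is never materialized into a queue and no executor is involved at all.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical sketch of the ThrottledIterator idea, deliberately without any
// threads: at most `limit` items are in flight, and finishing one (by calling
// its release callback) pulls the next item from the iterator.
public final class WipLimitSketch<T> {

    private final Iterator<T> source;
    private final BiConsumer<T, Runnable> process; // item + completion callback

    public WipLimitSketch(Iterator<T> source, BiConsumer<T, Runnable> process) {
        this.source = source;
        this.process = process;
    }

    public void run(int limit) {
        for (int i = 0; i < limit && source.hasNext(); i++) {
            spawnOne();
        }
    }

    private void spawnOne() {
        T item = source.next();
        process.accept(item, this::onRelease); // the caller decides when work is "done"
    }

    private void onRelease() {
        if (source.hasNext()) {
            spawnOne(); // a freed slot immediately admits the next item
        }
    }

    public static void main(String[] args) {
        List<Integer> order = new ArrayList<>();
        // each item "completes" synchronously here, so all five run in order
        new WipLimitSketch<Integer>(List.of(1, 2, 3, 4, 5).iterator(), (item, release) -> {
            order.add(item);
            release.run();
        }).run(2);
        System.out.println(order); // prints [1, 2, 3, 4, 5]
    }
}
```

Note this sketch recurses when items complete synchronously; a production version would use a loop (and proper synchronization) instead, but the slot-and-pull shape is the point being illustrated.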
> It's important for the shard-level operations to go through as fast as possible because they're holding up the next snapshot.

I realised after writing this that of course this change does have the effect of interleaving the shard-level metadata ops with any leftover background cleanup activity a bit more. Eh. My feeling is that this is ok: background cleanup activity is not normally very onerous, and I believe it's more important to limit the WIP in this area.
Sorry for being dense here.

> ThrottledIterator is about limiting work-in-progress (within the context of a single computation) and not really anything to do with threading.

`ThrottledIterator` does not have an executor by itself, but it sort of expects the work items to fork. In the use case here, does it effectively control fairness/priority in the threadpool's queue? By "a single computation", do you mean that all tasks are passed in at once via the iterator? The way it triggers the next round of `run` is somewhat similar to how `ThrottledTaskRunner#pollAndSpawn` is triggered?
It's true that `ThrottledIterator` only makes sense if the tasks do not all complete immediately on the calling thread, but that might mean that they dispatch a single task to a threadpool executor, or they dispatch several tasks (which in turn possibly dispatch further tasks), or indeed they send some work to another node entirely.

Originally `ThrottledTaskRunner` did not take a `Releasable` that tasks could use to indicate their completion on a different thread; it was only about whether the forked task had completed execution on the calling thread or not. Now that we have that, I guess we can cover some similar cases with it. I'd still want it to take an `Iterator` which computes the tasks on the fly, rather than having a fully-materialized queue though.
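A minimal sketch of that `Releasable`-style contract (assumed names; the real `ThrottledTaskRunner` differs): the permit stays held until the task explicitly invokes its release handle, even when completion is signalled from a thread other than the one that ran the task body.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: each task receives a release handle, and the slot it
// occupies is freed only when that handle is invoked, possibly on another thread.
public final class ReleasableSketch {

    public static int runAndCountPermits(int tasks, int maxConcurrent) throws InterruptedException {
        Semaphore permits = new Semaphore(maxConcurrent);
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        CountDownLatch allReleased = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            permits.acquire(); // hold the permit for the whole lifetime of the task
            Runnable release = () -> {
                permits.release();
                allReleased.countDown();
            };
            // the task body returns quickly, but completion is signalled later
            // from a different thread entirely
            pool.execute(() -> CompletableFuture.runAsync(release));
        }
        allReleased.await();
        pool.shutdown();
        return permits.availablePermits(); // all permits back once every task released
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAndCountPermits(4, 2)); // prints 2
    }
}
```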
LGTM

The ongoing discussion about `ThrottledIterator` etc. is really for my education and does not block this PR.
Pinging @elastic/es-distributed (Team:Distributed)