-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Guard Repository#getRepositoryData for exception throw #50970
Guard Repository#getRepositoryData for exception throw #50970
Conversation
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore) |
@albertzaharovits sorry for not thinking of it last night, but maybe instead of fixing the callers to the repository, shouldn't we better just fix |
@original-brownbear TBH I don't know how it's better, I think it depends on the conventions in the codebase. I don't like functions that throw and have a listener parameter. But in this case, it felt safer to me to handle the exception at a higher level since technically We could also fix I'm fine with any of the three options. Let me know your preference. |
We've synced and learned that the problem is not a case of a state applier thread, but the logic of listing to get the latest generation followed by a failure to read the blob for that generation and resetting the generation. |
Follow up is in #50987 |
In practice,
Repository#getRepositoryData
can throw exceptions, for example theBlobStoreRepository
implementation of itelasticsearch/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
Line 1088 in 3c6f649
The
getRepositoryData
method is called before any operation involving snapshots. If the exception is thrown on the cluster state applier thread, as it happens when creating a new snapshot, the state applier thread will loop, retrying to apply the same state. If the exception is persistent, as is the case of an encrypted repository with a wrong password (see #50846), the state applier thread will get stuck.This commit tracks and fixes all uses of
Repository#getRepositoryData
to make sure that at some point the exception is catched and forwarded to aListener#onFailure
.