Guard Repository#getRepositoryData for exception throw #50970

albertzaharovits · 2020-01-14T12:31:25Z

In practice, Repository#getRepositoryData can throw exceptions; the BlobStoreRepository implementation, for example, rethrows here:

elasticsearch/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java

Line 1088 in 3c6f649

throw e;

The getRepositoryData method is called before any operation involving snapshots. If the exception is thrown on the cluster state applier thread, as happens when creating a new snapshot, the state applier thread will loop, retrying to apply the same state. If the exception is persistent, as in the case of an encrypted repository with a wrong password (see #50846), the state applier thread will get stuck.

This commit tracks down and fixes all uses of Repository#getRepositoryData to make sure that at some point the exception is caught and forwarded to a Listener#onFailure.
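
A minimal sketch of the caller-side guard described above, not the code in this PR. The `Listener` and `Repository` interfaces here are simplified, hypothetical stand-ins for the Elasticsearch types, and `guardedGetRepositoryData` is an illustrative helper name:

```java
import java.util.concurrent.atomic.AtomicReference;

public class GuardedRepositoryCall {

    /** Hypothetical, simplified stand-in for a listener with success and failure callbacks. */
    interface Listener<T> {
        void onResponse(T value);
        void onFailure(Exception e);
    }

    /** Hypothetical, simplified stand-in for the repository; String stands in for the repository data. */
    interface Repository {
        void getRepositoryData(Listener<String> listener);
    }

    /** Calls getRepositoryData and routes a synchronous throw to listener.onFailure. */
    static void guardedGetRepositoryData(Repository repository, Listener<String> listener) {
        try {
            repository.getRepositoryData(listener);
        } catch (Exception e) {
            // Without this catch the exception would escape to the calling thread.
            listener.onFailure(e);
        }
    }

    public static void main(String[] args) {
        // A repository that throws instead of resolving the listener.
        Repository broken = l -> { throw new IllegalStateException("wrong repository password"); };
        AtomicReference<Exception> failure = new AtomicReference<>();
        guardedGetRepositoryData(broken, new Listener<String>() {
            public void onResponse(String value) { System.out.println("data: " + value); }
            public void onFailure(Exception e) { failure.set(e); }
        });
        System.out.println("listener received: " + failure.get().getMessage());
    }
}
```

One caveat with this shape: if the repository has already called `onResponse` and the listener itself throws, the catch block notifies the same listener a second time through `onFailure`.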

elasticmachine · 2020-01-14T12:31:28Z

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

original-brownbear · 2020-01-14T12:34:54Z

@albertzaharovits sorry for not thinking of it last night, but instead of fixing the callers of the repository, wouldn't it be better to just fix org.elasticsearch.repositories.blobstore.BlobStoreRepository#getRepositoryData so that it never throws and always resolves the listener? It seems like a pretty easy change as well.
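
For illustration, the "never throw, always resolve the listener" shape suggested here could look roughly like the sketch below. It is not the actual BlobStoreRepository code: `Listener`, `BlobReader`, and `readLatest` are hypothetical names, and `String` stands in for the repository data.

```java
import java.io.IOException;

public class NonThrowingRepository {

    /** Hypothetical, simplified stand-in for a listener with success and failure callbacks. */
    interface Listener<T> {
        void onResponse(T value);
        void onFailure(Exception e);
    }

    /** Hypothetical source of repository data that may fail, e.g. on a blob read error. */
    interface BlobReader {
        String readLatest() throws Exception;
    }

    private final BlobReader reader;

    NonThrowingRepository(BlobReader reader) {
        this.reader = reader;
    }

    /** Never throws on a read error; the listener is resolved exactly once on either path. */
    void getRepositoryData(Listener<String> listener) {
        final String data;
        try {
            data = reader.readLatest();
        } catch (Exception e) {
            listener.onFailure(e);
            return;
        }
        // Called outside the try block so a throwing listener is not notified twice.
        listener.onResponse(data);
    }

    public static void main(String[] args) {
        NonThrowingRepository repo = new NonThrowingRepository(() -> {
            throw new IOException("cannot decrypt blob");
        });
        repo.getRepositoryData(new Listener<String>() {
            public void onResponse(String value) { System.out.println("data: " + value); }
            public void onFailure(Exception e) { System.out.println("failed: " + e.getMessage()); }
        });
    }
}
```

Handling the failure inside the method keeps every caller simple, but it only covers this one implementation; an overriding or third-party implementation could still throw.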

albertzaharovits · 2020-01-14T12:55:29Z

@original-brownbear TBH I don't know which is better; I think it depends on the conventions in the codebase. I don't like functions that throw and also have a listener parameter. But in this case, it felt safer to me to handle the exception at a higher level, since technically BlobStoreRepository#getRepositoryData can be overridden (and other implementations of Repository#getRepositoryData can also mistakenly throw).

We could also do both: fix BlobStoreRepository#getRepositoryData to not throw, and still cover the case where some implementation erroneously throws (as this PR does).

I'm fine with any of the three options. Let me know your preference.

albertzaharovits · 2020-01-14T15:49:52Z

We've synced and learned that the problem is not the state applier thread itself, but the logic that lists blobs to get the latest generation, then fails to read the blob for that generation and resets the generation.
Armin is on it.

original-brownbear · 2020-01-14T16:24:58Z

Follow up is in #50987

albertzaharovits added 2 commits January 14, 2020 14:05

Done

1f61850

nit

51e091b

albertzaharovits added the >bug, :Distributed Coordination/Snapshot/Restore (anything directly related to the `_snapshot/*` APIs), v8.0.0, and v7.6.0 labels Jan 14, 2020

albertzaharovits requested a review from original-brownbear January 14, 2020 12:31

albertzaharovits self-assigned this Jan 14, 2020

albertzaharovits added 2 commits January 14, 2020 14:57

Merge branch 'master' into get-repository-exception

28458d4

Checkstyle

b74e3c4

albertzaharovits closed this Jan 14, 2020

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
