Fix RepoCleanup not Removed on Master-Failover (#49217) #49239

original-brownbear · 2019-11-18T12:23:15Z

The logic for `cleanupInProgress()` was backwards everywhere (in the method itself and in all but one caller). Also, we weren't checking it when removing a repository.

This led to a bug (in the one spot that didn't use the method backwards) that prevented
the cleanup entry from ever being removed from the cluster state if the master
failed over during the cleanup process.

This change corrects the backwards logic, adds a test that makes sure the cleanup
is always removed and adds a check that prevents repository removal during cleanup
to the repositories service.
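
To illustrate the polarity fix and the new removal guard, here is a minimal sketch. It is not the code from this PR: `CleanupStateSketch`, `startCleanup`, `finishCleanup` and `ensureRepositoryRemovable` are hypothetical names, and a plain list stands in for the cleanup entries held in the cluster state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the cleanup entries kept in the cluster state;
// this is not the real Elasticsearch class.
final class CleanupStateSketch {
    private final List<String> reposBeingCleaned = new ArrayList<>();

    void startCleanup(String repository) {
        reposBeingCleaned.add(repository);
    }

    void finishCleanup(String repository) {
        reposBeingCleaned.remove(repository);
    }

    // Corrected polarity: true only while at least one cleanup entry exists.
    // The inverted version would correspond to returning reposBeingCleaned.isEmpty().
    boolean cleanupInProgress() {
        return reposBeingCleaned.isEmpty() == false;
    }

    // Guard of the kind described above for the repositories service
    // (method name and exception type are assumptions).
    void ensureRepositoryRemovable(String repository) {
        if (cleanupInProgress()) {
            throw new IllegalStateException(
                "cannot remove repository [" + repository + "] while a cleanup is in progress");
        }
    }
}
```

With this shape, a caller that removes a repository calls `ensureRepositoryRemovable` first, and the call only succeeds once `finishCleanup` has cleared the entry.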

Also, the failure handling logic in the cleanup action was broken: repeated invocation would remove the cleanup from the cluster state even if it was still in progress. This is fixed by adding a flag that indicates whether any removal of the cleanup task from the cluster state must be executed. Sorry for mixing this in here, but I had to fix it in the same PR; otherwise the first test (for master failover) would often just delete the blocked cleanup action as a result of a transport master action retry.
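
The idea behind the flag can be sketched roughly as follows. This is an assumption-laden illustration, not the actual action code: `CleanupAttemptSketch`, `tryStart`, `onFailure` and the `AtomicBoolean` standing in for the cluster state entry are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: one instance per invocation of the cleanup action.
// The shared AtomicBoolean stands in for "a cleanup entry exists in the cluster state".
final class CleanupAttemptSketch {
    private final AtomicBoolean entryInClusterState;

    // The flag: set only if this attempt added the entry itself.
    private boolean startedCleanup = false;

    CleanupAttemptSketch(AtomicBoolean entryInClusterState) {
        this.entryInClusterState = entryInClusterState;
    }

    // Adds the entry only if none exists yet and records whether this attempt owns it.
    boolean tryStart() {
        startedCleanup = entryInClusterState.compareAndSet(false, true);
        return startedCleanup;
    }

    // Failure handling: remove the entry only if this attempt created it,
    // so a retried invocation cannot delete a cleanup that is still running.
    void onFailure() {
        if (startedCleanup) {
            entryInClusterState.set(false);
            startedCleanup = false;
        }
    }
}
```

In this sketch, a retry that fails to start because a cleanup is already running leaves the existing entry untouched when its failure handler runs.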

backport of #49217

original-brownbear · 2019-11-18T12:29:42Z

Jenkins test this

original-brownbear · 2019-11-18T12:49:51Z

Jenkins run elasticsearch-ci/master-fwc

original-brownbear · 2019-11-18T13:19:25Z

Jenkins run elasticsearch-ci/1

original-brownbear · 2019-11-18T13:19:57Z

Jenkins run elasticsearch-ci/master-fwc

original-brownbear · 2019-11-18T14:09:35Z

@elasticmachine run elasticsearch-ci/master-fwc

original-brownbear added the `:Distributed Coordination/Snapshot/Restore` and `backport` labels Nov 18, 2019

Merge remote-tracking branch 'elastic/7.x' into 49217-7.x

960b780

original-brownbear merged commit 25cc8e3 into elastic:7.x Nov 18, 2019

original-brownbear deleted the 49217-7.x branch November 18, 2019 15:44
