
Mark Deleted Snapshot Directories with Tombstones #40228

Closed
wants to merge 7 commits

Conversation

@original-brownbear (Member) commented Mar 19, 2019

Marking this WIP for now as it's just a suggestion for how to address the failing-to-delete-indices issue in the short term. This is not an actual fix for the situation, but rather a way of enabling a future fix in a safe way!

The issue with dangling index files in the blobstore is that we delete the indices from the metadata before we delete the actual index files. So if deleting the index files fails and there are no more references to the index in any snapshot metadata, it will never be deleted.

Enabling an automatic cleanup of these unreferenced indices in a safe way is hard. Technically it can be done by listing all index folders, reading all metadata, and treating as stale every index folder that is not referenced in any metadata. The trouble with this approach is that eventually consistent blob stores like S3 do not offer a consistent view of the metadata and the list of index folders. So one could run into the situation where an index folder is erroneously assumed to be unreferenced because the metadata referencing it was not yet readable.

Fortunately, S3 offers one piece of consistency that we can exploit to make such an automatic cleanup safe after all: reads after writes are consistent as long as no read has preceded the write (see https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel). So the following would be a safe way of identifying most stale index folders without running the risk of false positives:

Instead of currently doing:

  1. Update metadata (removing snapshots from it)
  2. Loop over all index blobs and delete them ignoring errors

this PR does (see the sketch after this list):

  1. Update metadata (removing snapshots from it)
  2. Write tombstone blob for each index that is now unreferenced
  3. Loop over all index blobs and keep track of errors
  4. Delete tombstone only if no errors were registered for a blob
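
To make the ordering concrete, here is a rough sketch of that flow in Java. This is not the PR's actual diff: deleteIndexBlobs() is a hypothetical stand-in for the real loop over the index folder, tombstoneBlob() mirrors the helper visible in the diff further down (its blob naming here is made up), and only basic BlobContainer calls are assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;

import org.elasticsearch.common.blobstore.BlobContainer;
import org.elasticsearch.repositories.IndexId;

// Hedged sketch of the tombstone-based delete flow, not the actual change in this PR.
final class TombstoneDeleteSketch {

    void deleteUnreferencedIndex(BlobContainer indicesBlobContainer, IndexId indexId) throws IOException {
        // Step 2: the repository metadata has already been updated and no longer references
        // this index, so write its tombstone now. Writing strictly after the metadata update
        // is what keeps the marker safe on an eventually consistent store like S3.
        indicesBlobContainer.writeBlob(tombstoneBlob(indexId), new ByteArrayInputStream(new byte[0]), 0, true);

        // Step 3: delete the blobs under the index folder, tracking failures instead of
        // silently ignoring them.
        final boolean deleteSuccess = deleteIndexBlobs(indexId);

        // Step 4: only drop the tombstone if every blob was deleted, so a partial failure
        // leaves a durable marker behind for a later cleanup pass.
        if (deleteSuccess) {
            indicesBlobContainer.deleteBlob(tombstoneBlob(indexId));
        }
    }

    private boolean deleteIndexBlobs(IndexId indexId) {
        return true; // placeholder for the real recursive delete of indices/<indexId>/...
    }

    private String tombstoneBlob(IndexId indexId) {
        return "tombstone-" + indexId.getId(); // hypothetical naming, may differ from the PR
    }
}
```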

Any tombstone written this way is safe, because it is only written after the updated metadata has been written.
This also allows a consistent check for stale index folders: first find the candidate folders by listing the index folders and cross-checking them against the metadata, then check whether the tombstone blob exists. As long as the tombstone blob is never read in any other situation, this existence check is a consistent operation, as explained in the linked S3 docs. A sketch of such a check follows.
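
Correspondingly, a cleanup utility could combine the two checks roughly as sketched below. How the folder listing and the set of referenced index ids are obtained is left out (they are passed in as plain sets); tombstoneBlobName() is the same kind of hypothetical naming helper as above, and the only repository API assumed is an existence check on the blob container.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.elasticsearch.common.blobstore.BlobContainer;

// Hedged sketch of the cleanup-side check, not an actual utility shipped with this PR.
final class StaleIndexFolderCheck {

    /**
     * @param indexFolders       folder names found by listing the indices/ path
     * @param referencedIndexIds index ids referenced by any snapshot in the repository metadata
     */
    static Set<String> findStaleIndexFolders(BlobContainer indicesBlobContainer,
                                             Set<String> indexFolders,
                                             Set<String> referencedIndexIds) throws IOException {
        final Set<String> stale = new HashSet<>();
        for (String folder : indexFolders) {
            // First filter: a folder is only a candidate if no snapshot metadata references it.
            if (referencedIndexIds.contains(folder)) {
                continue;
            }
            // Second filter: only trust the candidate if its tombstone exists. The tombstone is
            // written strictly after the metadata update and is never read anywhere else, so
            // this existence check is consistent on S3 (read-after-write consistency).
            if (indicesBlobContainer.blobExists(tombstoneBlobName(folder))) {
                stale.add(folder);
            }
        }
        return stale;
    }

    private static String tombstoneBlobName(String indexFolder) {
        return "tombstone-" + indexFolder; // hypothetical naming, may differ from the PR
    }
}
```
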
This approach has some limitations and would fail to correctly mark stale folders in two scenarios:

  • Writing a tombstone blob fails after the metadata has already been updated (possible, but a lot less likely than a failure during the subsequent, possibly long-running delete operation that loops over all the files in the index folder).
  • A data node writes a blob for a given index/shard after the master node has completed all the steps successfully and removed the tombstone (this can be overcome by never deleting tombstone blobs, as explained inline).

All in all, this still seems like a very safe approach that we could backport all the way to 6.x without risk, and it would allow writing a safe cleanup utility/script for stale blobs.

@elasticmachine (Collaborator)

Pinging @elastic/es-distributed

} catch (IOException ioe) {
    // a different IOException occurred while trying to delete - will just log the issue for now
    logger.debug(() -> new ParameterizedMessage("[{}] index [{}] no longer part of any snapshots in the repository, " +
        "but failed to clean up its index folder.", metadata.name(), indexId), ioe);
    deleteSuccess = false;
}
if (deleteSuccess) {
original-brownbear (Member, Author) commented on the snippet above:
Not adding any logic to deal with the failure scenario here because that is a fairly complex problem to automate (basically you'd want to recursively delete an index path once you're sure no more blobs will be added under it; that is trivial to do manually but non-trivial to do automatically).

    deleteSuccess = false;
}
if (deleteSuccess) {
    indicesBlobContainer.deleteBlob(tombstoneBlob(indexId));
original-brownbear (Member, Author) commented on the snippet above:

One alternative here could be to never delete these at all:

  • The storage cost of keeping them around would be minimal
  • It would solve the issue of "rogue" data nodes writing shard data for an already-deleted snapshot after this code finishes, a situation we would otherwise miss
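
Roughly, and reusing the hypothetical tombstoneBlobName() helper from the sketches in the description above, the per-folder check of a cleanup pass would then reduce to looking for the marker alone:

```java
// Hedged sketch: if tombstones are kept forever (and assuming index folder ids are not
// reused once deleted), seeing the marker is enough to treat a folder as stale, which also
// covers blobs written late by a "rogue" data node after the original delete completed.
static boolean isStaleIndexFolder(BlobContainer indicesBlobContainer, String indexFolder) throws IOException {
    return indicesBlobContainer.blobExists(tombstoneBlobName(indexFolder));
}
```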

@original-brownbear (Member, Author)

@ywelsch wdyt? IMO this is one step we could safely take towards a more controllable future. Even though it doesn't fix anything right here and now (apart from the test), it at least enables some control over the situation in the short term.

@original-brownbear (Member, Author)

Closing this: a more complete solution to stale data on master failover that incorporates the ideas here is incoming in https://github.com/elastic/elasticsearch/compare/master...original-brownbear:delete-lock-via-cs?expand=1

Successfully merging this pull request may close these issues.

[CI] DedicatedClusterSnapshotRestoreIT testSnapshotWithStuckNode failed
5 participants