Open engine should keep only starting commit #28228

dnhatn · 2018-01-15T18:09:39Z

Keeping existing unsafe commits when opening an engine can be problematic because these commits are not safe at the recovering time but they can suddenly become safe in the future.
The following issues can happen if unsafe commits are kept oninit.

1. Replica can use unsafe commit in peer-recovery. This happens when a replica with a safe commit c1(max_seqno=1) and an unsafe commit c2(max_seqno=2) recovers from a primary with c1(max_seqno=1). If a new document(seqno=2) is added without flushing, the global checkpoint is advanced to 2; and the replica recovers again, it will use the unsafe commit c2(max_seqno=2 <= gcp=2) as the starting commit for sequenced-based recovery even the commit c2 contains a stale operation and the document(with seqno=2) will not be replicated to the replica.

2. Min translog gen for recovery can go backwards in peer-recovery. This happens when are replica with a safe commit c1(local_checkpoint=1, recovery_translog_gen=1) and an unsafe commit c2(local_checkpoint=2, recovery_translog_gen=2). The replica recovers from a primary, and keeps c2 as the last commit, then sets last_translog_gen to 2. Flushing a new commit on the replica will cause exception as the new last commit c3 will have recovery_translog_gen=1. The recovery translog generation of a commit is calculated based on the current local checkpoint. The local checkpoint of c3 is 1 while the local checkpoint of c2 is 2.

3. Commit without translog can be used for recovery. An old index, which was created before multiple-commits is introduced (v6.2), may not have a safe commit. If that index has a snapshotted commit without translog and an unsafe commit, the policy can consider the snapshotted commit as a safe commit for recovery even the commit does not have translog.

These issues can be avoided if the combined deletion policy keeps only the starting commit onInit.

Relates #27804
Relates #28181

The current implementation of the combined deletion policy causes the following issues. # Replica can use an unsafe commit for sequence-based recovery 1. A replica has two commits c1 {max_seqno=1} and c2 {max_seqno=2} with op2 is a stale operation 2. A primary has one commit c1 {max_seqno=1} and the global checkpoint is 1 3. A replica recovers from the primary with c1 as a starting commit 4. Indexes one document to the cluster; the global checkpoint advances to 2 5. Restart the replica 6. The replica will mistakenly use c2 (as max_seqno <= gcp) as a starting commit even it contains stale ops # Translog generations can go backwards in sequence-based recovery 1. A replica has two commits c1 {local_checkpoint=1, translog_gen=1} and c2 {local_checkpoint=2, translog_gen=2} 2. A primary has one commit c1 {local_checkpoint=1} and the global checkpoint is 1 3. A replica recovers from the primary with c1 as a starting commit - c2 is kept as the last commit 4. Flush on a replica will throw exception as a new commit has `translog-gen=1` which is smaller the translog-gen of c2 # Snapshotted commit without translog can be considered a safe commit 1. An index was created before 6.2 has two commits: c1 (snapshotted without translog), c2 unsafe commit 2. We will select c2 as a starting commit when opening an engine as we make sure to select only commits with translog 3. Flush a new commit c3, the policy will consider c1 as a safe commit even it does not have translog. These issues can be avoided if the combined deletion policy keeps only the starting commit onInit.

bleskes

I think this makes sense. I left a suggestion. Also, I think it will be good to have an engine level test to show that opening an index cleans up commits above the safe commit. Last, we discussed some clarification for the PR description. It will help future readers to incorporate those in the PR description.

bleskes · 2018-01-15T19:26:48Z

server/src/main/java/org/elasticsearch/index/engine/CombinedDeletionPolicy.java

        this.snapshottedCommits = new ObjectIntHashMap<>();
    }

    @Override
    public void onInit(List<? extends IndexCommit> commits) throws IOException {
        switch (openMode) {
            case CREATE_INDEX_AND_TRANSLOG:
+                assert startingCommit == null : "CREATE_INDEX_AND_TRANSLOG must not have starting commit; commit [" + startingCommit + "]";
                break;
            case OPEN_INDEX_CREATE_TRANSLOG:


since we're changing the logic of OPEN_INDEX_AND_TRANSLOG, I think we can also consolidate the code paths for OPEN_INDEX_CREATE_TRANSLOG & OPEN_INDEX_AND_TRANSLOG to be the same whenever we open an index - less exceptions and more unity. To do so we need to set startingCommit to a valid value in InternalEngine#getStartingCommitPoint whenever we are actually opening something.

bleskes · 2018-01-15T19:29:47Z

server/src/test/java/org/elasticsearch/index/engine/CombinedDeletionPolicyTests.java

@@ -211,6 +215,31 @@ public void testDeleteInvalidCommits() throws Exception {
        }
    }

+    public void testKeepOnlyStartingCommitOnInit() throws Exception {


can you add java docs explaining why this behavior is needed. I think it will be hard to figure out.

bleskes · 2018-01-15T19:30:16Z

server/src/main/java/org/elasticsearch/index/engine/CombinedDeletionPolicy.java

-                onCommit(commits);
+                assert startingCommit != null && commits.contains(startingCommit) : "Starting commit not in the existing commit list; "
+                    + "startingCommit [" + startingCommit + "], commit list [" + commits + "]";
+                keepOnlyStartingCommitOnInit(commits);


can you leave a big comment explaining why this is needed?

bleskes

LGTM. I left some minor feedback. No need for another review on my side.

bleskes · 2018-01-16T01:13:50Z

server/src/main/java/org/elasticsearch/index/engine/CombinedDeletionPolicy.java

+        safeCommit = startingCommit;
+        // OPEN_INDEX_CREATE_TRANSLOG can open an index commit from other shard with a different translog history,
+        // We therefore should not use that index commit to update the translog deletion policy.
+        if (openMode == EngineConfig.OpenMode.OPEN_INDEX_AND_TRANSLOG) {


This feels strange here. Move it to the constructor?

bleskes · 2018-01-16T01:14:45Z

server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+            case OPEN_INDEX_CREATE_TRANSLOG:
+                // Use the last commit
+                existingCommits = DirectoryReader.listCommits(store.directory());
+                startingIndexCommit = existingCommits.get(existingCommits.size() - 1);


can we assert there is only one? I believe this is the case.

I added an assert but moved to IndexShard#openIndexAndCreateTranslog. We have some existing tests which opens engine with multiple commits.

bleskes · 2018-01-16T01:16:57Z

server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

-                // The index commit from IndexWriterConfig is null if the engine is open with other modes
-                // rather than CREATE_INDEX_AND_TRANSLOG. In those cases lastCommittedSegmentInfos will be retrieved from the last commit.
-                lastCommittedSegmentInfos = store.readCommittedSegmentsInfo(indexWriter.getConfig().getIndexCommit());
+                lastCommittedSegmentInfos = store.readLastCommittedSegmentsInfo();


I know we now trim things and this OK, but it feels risky - maybe we'll change our mind later in the future clean less heavily. It will be hard to find this place and change. Maybe just keep it as is?

The starting commit will be deleted before this line is executed if the open mode is OPEN_INDEX_CREATE_TRANSLOG as we have created a new commit (refresh translog uuid + gen).

This reverts commit a252d0b.

dnhatn · 2018-01-16T04:42:54Z

@bleskes Thank you for your helpful reviews. I will wait for CI then merge.

* master: [Docs] Fix base directory to include for put_mapping.asciidoc Open engine should keep only starting commit (elastic#28228) [Docs] Fix Java Api index administration usage (elastic#28133) Fix eclipse build. (elastic#28236) Never return null from Strings.tokenizeToStringArray (elastic#28224) Fallback to TransportMasterNodeAction for cluster health retries (elastic#28195) [Docs] Changes to ingest.asciidoc (elastic#28212) TEST: Update logging for testAckedIndexing

Keeping unsafe commits when opening an engine can be problematic because these commits are not safe at the recovering time but they can suddenly become safe in the future. The following issues can happen if unsafe commits are kept oninit. 1. Replica can use unsafe commit in peer-recovery. This happens when a replica with a safe commit c1 (max_seqno=1) and an unsafe commit c2 (max_seqno=2) recovers from a primary with c1(max_seqno=1). If a new document (seqno=2) is added without flushing, the global checkpoint is advanced to 2; and the replica recovers again, it will use the unsafe commit c2 (max_seqno=2 <= gcp=2) as the starting commit for sequenced based recovery even the commit c2 contains a stale operation and the document (with seqno=2) will not be replicated to the replica. 2. Min translog gen for recovery can go backwards in peer-recovery. This happens when a replica with a safe commit c1 (local_checkpoint=1, recovery_translog_gen=1) and an unsafe commit c2 (local_checkpoint=2, recovery_translog_gen=2). The replica recovers from a primary, and keeps c2 as the last commit, then sets last_translog_gen to 2. Flushing a new commit on the replica will cause exception as the new last commit c3 will have recovery_translog_gen=1. The recovery translog generation of a commit is calculated based on the current local checkpoint. The local checkpoint of c3 is 1 while the local checkpoint of c2 is 2. 3. Commit without translog can be used for recovery. An old index, which was created before multiple-commits is introduced (v6.2), may not have a safe commit. If that index has a snapshotted commit without translog and an unsafe commit, the policy can consider the snapshotted commit as a safe commit for recovery even the commit does not have translog. These issues can be avoided if the combined deletion policy keeps only the starting commit onInit. Relates #27804 Relates #28181

* master: (35 commits) Move the multi-get response tests to server Require JDK 9 for compilation (elastic#28071) Revert "[Docs] Fix Java Api index administration usage (elastic#28133)" Revert "[Docs] Fix base directory to include for put_mapping.asciidoc" Added multi get api to the high level rest client. [Docs] Clarify numeric datatype ranges (elastic#28240) [Docs] Fix base directory to include for put_mapping.asciidoc Open engine should keep only starting commit (elastic#28228) [Docs] Fix Java Api index administration usage (elastic#28133) Fix eclipse build. (elastic#28236) Never return null from Strings.tokenizeToStringArray (elastic#28224) Fallback to TransportMasterNodeAction for cluster health retries (elastic#28195) [Docs] Changes to ingest.asciidoc (elastic#28212) TEST: Update logging for testAckedIndexing [GEO] Add WKT Support to GeoBoundingBoxQueryBuilder Painless: Add whitelist extensions (elastic#28161) Fix daitch_mokotoff phonetic filter to use the dedicated Lucene filter (elastic#28225) Avoid doing redundant work when checking for self references. (elastic#26927) Fix casts in HotThreads. (elastic#27578) Ignore the `-snapshot` suffix when comparing the Lucene version in the build and the docs. (elastic#27927) ...

* master: Fix third-party audit tasks on JDK 8 Remove duplicated javadoc `fieldType` param Handle 5.6.6 and 6.1.2 release Introduce multi-release JAR Move the multi-get response tests to server Require JDK 9 for compilation (#28071) Revert "[Docs] Fix Java Api index administration usage (#28133)" Revert "[Docs] Fix base directory to include for put_mapping.asciidoc" Added multi get api to the high level rest client. [Docs] Clarify numeric datatype ranges (#28240) [Docs] Fix base directory to include for put_mapping.asciidoc Open engine should keep only starting commit (#28228)

* 6.x: Fix third-party audit tasks on JDK 8 Remove duplicated javadoc `fieldType` param Handle 5.6.6 and 6.1.2 release Introduce multi-release JAR Move the multi-get response tests to server Require JDK 9 for compilation (#28071) Revert "[Docs] Fix Java Api index administration usage (#28133)" Revert "[Docs] Fix base directory to include for put_mapping.asciidoc" Added multi get api to the high level rest client. Open engine should keep only starting commit (#28228) [Docs] Clarify numeric datatype ranges (#28240) Add migration guide for 6.1 [Docs] Fix base directory to include for put_mapping.asciidoc [Docs] Fix Java Api index administration usage (#28133) Add 6.1.2 release notes

dnhatn added >bug :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. v7.0.0 v6.2.0 labels Jan 15, 2018

dnhatn requested review from bleskes and ywelsch January 15, 2018 18:09

bleskes suggested changes Jan 15, 2018

View reviewed changes

dnhatn added 3 commits January 15, 2018 17:52

Add comment

f157ed5

Use starting commit for OPEN_INDEX_CREATE_TRANSLOG

11daf09

Add engine tests

5c2853d

dnhatn changed the title ~~Deletion policy should keep only starting commit on init~~ Open engine should delete unsafe index commits Jan 15, 2018

dnhatn changed the title ~~Open engine should delete unsafe index commits~~ Open engine should keep only starting commit Jan 15, 2018

Fix comment

bd544cf

bleskes approved these changes Jan 16, 2018

View reviewed changes

dnhatn added 4 commits January 15, 2018 22:00

Move OPEN_INDEX_CREATE_TRANSLOG case to onInit

64e692a

Add assert for OPEN_INDEX_CREATE_TRANSLOG

a252d0b

Revert "Add assert for OPEN_INDEX_CREATE_TRANSLOG"

97d3825

This reverts commit a252d0b.

Move assert to indexShard

26df28a

dnhatn merged commit 65e9007 into elastic:master Jan 16, 2018

dnhatn deleted the policy-keep-safe-commit-oninit branch January 16, 2018 13:37

dnhatn added the backport pending label Jan 16, 2018

dnhatn removed the backport pending label Jan 16, 2018

dnhatn mentioned this pull request Jan 26, 2018

[Jenkins] Failure in org.elasticsearch.index.engine.InternalEngineTests.testTranslogReplayWithFailure #28167

Closed

bleskes mentioned this pull request Jan 27, 2018

Safe commit is not safe #28405

Closed

jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Open engine should keep only starting commit #28228

Open engine should keep only starting commit #28228

dnhatn commented Jan 15, 2018 •

edited

Loading

bleskes left a comment

bleskes Jan 15, 2018

bleskes Jan 15, 2018

bleskes Jan 15, 2018

bleskes left a comment

bleskes Jan 16, 2018

bleskes Jan 16, 2018

dnhatn Jan 16, 2018

bleskes Jan 16, 2018

dnhatn Jan 16, 2018

dnhatn commented Jan 16, 2018

Open engine should keep only starting commit #28228

Open engine should keep only starting commit #28228

Conversation

dnhatn commented Jan 15, 2018 • edited Loading

bleskes left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bleskes left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dnhatn commented Jan 16, 2018

dnhatn commented Jan 15, 2018 •

edited

Loading