[Segment Replication] Trigger a round of replication for replica shards during peer recovery when segment replication is enabled #5332
Conversation
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Codecov Report

@@             Coverage Diff              @@
##               main    #5332      +/-   ##
============================================
- Coverage     71.06%   70.93%    -0.14%
+ Complexity    58136    58092       -44
============================================
  Files          4704     4704
  Lines        277244   277270       +26
  Branches      40137    40142        +5
============================================
- Hits         197025   196669      -356
- Misses        64095    64544      +449
+ Partials      16124    16057       -67
for (int i = 0; i < 10; i++) {
    client().prepareIndex(INDEX_NAME).setId(Integer.toString(i)).setSource("field", "value" + i).execute().actionGet();
}
logger.info("--> flush so we have an actual index");
Did you mean by `actual index` -> index/segment files on disk?
Yes, let me change the terminology here. `actual index` might be confusing.
 */
public void testAddNewReplica() throws Exception {
    logger.info("--> starting [node1] ...");
    final String node_1 = internalCluster().startNode();
nit: Usually I find it better to call nodes by their role. This makes it easier to understand when we perform any node-specific actions (e.g. restart(primary), stop(replica), etc.). Otherwise, we need to look back at where node_i was created to find its role.
Sure, makes sense. I will rename both nodes accordingly.
// is marked as Started.
if (indexShard.indexSettings().isSegRepEnabled()
    && shardRouting.primary() == false
    && ShardRoutingState.RELOCATING != shardRouting.state()) {
Should this condition be `ShardRoutingState.STARTED == shardRouting.state()`? The existing condition applies to UNASSIGNED and INITIALIZING shards, is that correct?
No, I think `ShardRoutingState.RELOCATING != shardRouting.state()` is an edge-case check we are doing so that a relocating shard doesn't receive any checkpoints.

For `ShardRoutingState.STARTED == shardRouting.state()`, this check will be false at this point, because we are performing a round of replication before marking the shard as STARTED. So the shard routing will never be in the STARTED state here.

Yes, the existing condition works for the INITIALIZING shard routing state; `ShardRoutingState.INITIALIZING` will be the shard routing state at this point. I'm not sure the shard routing state can be UNASSIGNED here, since after peer recovery completes the shard routing will usually be INITIALIZING.
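To make that reasoning visible at the call site, the condition could carry comments along these lines (illustrative annotations only; the code is the same hunk quoted above):

// Replica shards of segrep-enabled indices only; skip RELOCATING shards so a
// relocating shard never receives checkpoints. STARTED cannot occur here,
// since this runs before the shard is marked as STARTED.
if (indexShard.indexSettings().isSegRepEnabled()
    && shardRouting.primary() == false
    && ShardRoutingState.RELOCATING != shardRouting.state()) {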
    )
);
if (sendShardFailure == true) {
    logger.error("replication failure", e);
nit: These are logged at debug level by the failShard call. Maybe we can remove it from here.
Minor, but I would change the commit message/PR title to explain what you've done, as opposed to the side effect you're fixing. Something like "Trigger a round of replication during recovery" or whatever makes sense. In the description you can describe the bug you're fixing and any other details, but the message header should be a clear and concise description of what is changed.
Thanks @andrross for pointing that out. Sure, what you said makes sense. I will update the commit message and PR title.
…ication is enabled. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Merge branch 'seg-rep/force-replication' of https://github.com/Rishikesh1159/OpenSearch into seg-rep/force-replication
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Thanks @Rishikesh1159 for this quick fix. LGTM!
IndexShard indexShard = (IndexShard) indexService.getShardOrNull(shardRouting.id());
// For Segment Replication enabled indices, we want replica shards to start a replication event to fetch latest segments before it
// is marked as Started.
if (indexShard.indexSettings().isSegRepEnabled()
You will need a null check here given you are invoking `getShardOrNull` above.
Thanks for pointing this out. Sure, I will add a null check.
final String primary = internalCluster().startNode();

logger.info("--> creating test index ...");
prepareCreate(INDEX_NAME, Settings.builder().put("index.number_of_shards", 1).put("index.number_of_replicas", 1)).get();
Please use the actual settings instead of strings - `IndexMetadata.SETTING_NUMBER_OF_SHARDS`.
Yes, sure.
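For illustration, the suggested change might look like this (a sketch; `IndexMetadata.SETTING_NUMBER_OF_REPLICAS` is the companion constant for the replica count, assumed to apply the same way):

// Sketch of the reviewer's suggestion: use the constants from
// org.opensearch.cluster.metadata.IndexMetadata instead of raw setting strings.
prepareCreate(
    INDEX_NAME,
    Settings.builder()
        .put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, 1)
        .put(IndexMetadata.SETTING_NUMBER_OF_REPLICAS, 1)
).get();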
 * We don't perform any refresh on the index, and we assert on the new replica shard's doc hit count.
 * This test makes sure that when a new replica is added to an existing cluster it gets all the latest segments from the primary even without a refresh.
 */
public void testAddNewReplica() throws Exception {
This test is very similar to `testStartReplicaAfterPrimaryIndexesDocs`, can we reuse that test? That test currently indexes a doc after the replica is recovered to force another round of replication, but you could assert the doc count is sync'd on line 412 after ensureGreen().
Yes, I think you're right. Let me see if we can reuse it.
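A rough sketch of the suggested assertion (the `replica` node name is illustrative, and the doc count of 10 follows the indexing loop quoted earlier; `assertHitCount` is the usual OpenSearchAssertions helper):

// After ensureGreen(), the replica should already serve the synced doc count,
// without any extra indexing or refresh (sketch; node name is hypothetical).
ensureGreen(INDEX_NAME);
assertHitCount(client(replica).prepareSearch(INDEX_NAME).setSize(0).setPreference("_only_local").get(), 10);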
A few small changes required here - particularly the null check in handleRecoveryDone.
IndexShard indexShard = (IndexShard) indexService.getShardOrNull(shardRouting.id());
// For Segment Replication enabled indices, we want replica shards to start a replication event to fetch latest segments before it
// is marked as Started.
if (indexShard.indexSettings().isSegRepEnabled()
You could also read the setting from the index settings before fetching a reference to the IndexShard: `indexService.getIndexSettings().isSegRepEnabled()`.
Sure, I can add that.
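Put together with the earlier null-check comment, the guard might look roughly like this (a sketch under the discussion's assumptions, not the merged code):

// Sketch: read the setting from the IndexService first, then null-check the
// shard returned by getShardOrNull before dereferencing it.
if (indexService.getIndexSettings().isSegRepEnabled()
    && shardRouting.primary() == false
    && ShardRoutingState.RELOCATING != shardRouting.state()) {
    IndexShard indexShard = (IndexShard) indexService.getShardOrNull(shardRouting.id());
    if (indexShard != null) {
        // trigger a round of segment replication before marking the shard STARTED
    }
}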
);
if (sendShardFailure == true) {
    logger.error("replication failure", e);
    indexShard.failShard("replication failure", e);
I think we can reuse `handleRecoveryFailure` here instead of this added block.
Err, sorry, I'm off here. We'll need both: `indexShard.failShard("replication failure", e);`, which fails the engine, followed by `handleRecoveryFailure`, which removes the shard.
On that note - could you please add a test here for the failure case?
Yes, this is important. Thanks for catching this. I will update it and add a unit/integration test for the failure case.
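The failure path the thread converged on would look roughly like this (a sketch; the `handleRecoveryFailure` signature is assumed from the surrounding class, not confirmed by the diff):

// Sketch: fail the engine first, then reuse the existing recovery-failure
// handling, which removes the shard.
if (sendShardFailure) {
    logger.error("replication failure", e);
    indexShard.failShard("replication failure", e);   // fails the engine
}
handleRecoveryFailure(shardRouting, sendShardFailure, e);   // removes the shard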
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Only a nit so approving. Thanks for this change.
        );
    }
} else {
    shardStateAction.shardStarted(
Nit - this is now invoked 3x. You could clean this up by using a StepListener that, when it completes, marks the shard started.
Thanks @mch2, sure I can do that.
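A minimal sketch of the StepListener idea (`markShardStarted`, `needsSegRepRound`, and `startReplication` are illustrative names, not from the PR):

// Sketch: route every completion path through one StepListener so that
// shardStateAction.shardStarted(...) is invoked in exactly one place.
StepListener<Void> forceSegRepListener = new StepListener<>();
forceSegRepListener.whenComplete(
    ignored -> markShardStarted(shardRouting),           // wraps shardStateAction.shardStarted(...)
    e -> handleRecoveryFailure(shardRouting, true, e)
);
if (needsSegRepRound) {
    startReplication(indexShard, forceSegRepListener);   // hypothetical helper
} else {
    forceSegRepListener.onResponse(null);                // nothing to replicate; mark started
}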
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
The backport to 2.x failed.

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-5332-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 0cf67979064c6c8be95299911db0d1bf1ea5ed68
# Push it to GitHub
git push --set-upstream origin backport/backport-5332-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-5332-to-2.x.
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
…ds during peer recovery when segment replication is enabled (#5332)

* Fix newly added replica shards falling behind primary.
* Trigger a round of replication during peer recovery when segment replication is enabled.
* Remove unnecessary start replication overloaded method.
* Add test for failure case and refactor some code.
* Apply spotless check.
* Addressing comments on the PR.
* Remove unnecessary condition check.
* Apply spotless check.
* Add step listeners to resolve forcing round of segment replication.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
…ature/identity (#5581)
…ds during peer recovery when segment replication is enabled (opensearch-project#5332)
Description
This PR adds logic to trigger a round of replication during peer recovery before the shard is marked as STARTED. It fixes the bug of newly added replica shards falling behind the primary shard until an operation is performed on the index when segment replication is enabled. More detail about the bug is in issue #5313.
Solution used to fix the bug
With segment replication enabled, when a new replica is added to the cluster it goes through the peer recovery process. After peer recovery completes, and before the replica shard is marked as STARTED, we trigger a replication event on the replica to copy all the latest segments from the primary shard.
Issues Resolved
Resolves #5313
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.