[Segment Replication] Trigger a round of replication for replica shards during peer recovery when segment replication is enabled #5332
Conversation
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Codecov Report

@@             Coverage Diff              @@
##               main    #5332      +/-   ##
============================================
- Coverage     71.06%   70.93%    -0.14%
+ Complexity    58136    58092       -44
============================================
  Files          4704     4704
  Lines        277244   277270       +26
  Branches      40137    40142        +5
============================================
- Hits         197025   196669      -356
- Misses        64095    64544      +449
+ Partials      16124    16057       -67
for (int i = 0; i < 10; i++) {
    client().prepareIndex(INDEX_NAME).setId(Integer.toString(i)).setSource("field", "value" + i).execute().actionGet();
}
logger.info("--> flush so we have an actual index");
Did you mean by `actual index` -> index/segment files on disk?
Yes, let me change the terminology here. `actual index` might be confusing.
 */
public void testAddNewReplica() throws Exception {
    logger.info("--> starting [node1] ...");
    final String node_1 = internalCluster().startNode();
nit: Usually I find it better to call nodes by their role. This makes it easier to understand when we perform any node-specific actions (e.g. restart(primary), stop(replica), etc.). Otherwise, we need to look back at where node_i was created to find its role.
Sure, makes sense. I will rename both nodes accordingly.
// is marked as Started.
if (indexShard.indexSettings().isSegRepEnabled()
    && shardRouting.primary() == false
    && ShardRoutingState.RELOCATING != shardRouting.state()) {
Should this condition be `ShardRoutingState.STARTED == shardRouting.state()`? The existing condition applies to UNASSIGNED and INITIALIZING shards, is that correct?
No, I think `ShardRoutingState.RELOCATING != shardRouting.state()` is an edge-case check we are doing so that a relocating shard doesn't receive any checkpoints.

For `ShardRoutingState.STARTED == shardRouting.state()`, this check will be false at this point, because we are performing a round of replication before marking the shard as STARTED. So the shard routing will never be in the STARTED state here.

Yes, the existing condition works for the INITIALIZING shard routing state; `ShardRoutingState.INITIALIZING` will be the shard routing state at this point. I'm not sure the shard routing state can be UNASSIGNED here, since after peer recovery completes the shard routing will usually be INITIALIZING.
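To make that reasoning visible at the call site, the condition could carry comments along these lines (illustrative annotations only; the code is the same hunk quoted above):

// Replica shards of segrep-enabled indices only; skip RELOCATING shards so a
// relocating shard never receives checkpoints. STARTED cannot occur here,
// since this runs before the shard is marked as STARTED.
if (indexShard.indexSettings().isSegRepEnabled()
    && shardRouting.primary() == false
    && ShardRoutingState.RELOCATING != shardRouting.state()) {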
    )
);
if (sendShardFailure == true) {
    logger.error("replication failure", e);
nit: These are logged at debug level by the failShard call. Maybe we can remove it from here.
Minor, but I would change the commit message/PR title to explain what you've done, as opposed to the side effect you're fixing. Something like "Trigger a round of replication during recovery" or whatever makes sense. In the description you can describe the bug you're fixing and any other details, but the message header should be a clear and concise description of what is changed.
Thanks @andrross for pointing that out. Sure, what you said makes sense. I will update the commit message and PR title.
…ication is enabled. Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Merge branch 'seg-rep/force-replication' of https://github.com/Rishikesh1159/OpenSearch into seg-rep/force-replication
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Thanks @Rishikesh1159 for this quick fix. LGTM!
IndexShard indexShard = (IndexShard) indexService.getShardOrNull(shardRouting.id());
// For Segment Replication enabled indices, we want replica shards to start a replication event to fetch latest segments before it
// is marked as Started.
if (indexShard.indexSettings().isSegRepEnabled()
You will need a null check here given you are invoking `getShardOrNull` above.
Thanks for pointing this out. Sure, I will add a null check.
final String primary = internalCluster().startNode();

logger.info("--> creating test index ...");
prepareCreate(INDEX_NAME, Settings.builder().put("index.number_of_shards", 1).put("index.number_of_replicas", 1)).get();
Please use the actual settings instead of strings - `IndexMetadata.SETTING_NUMBER_OF_SHARDS`.
Yes, sure.
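For illustration, the suggested change might look like this (a sketch; `IndexMetadata.SETTING_NUMBER_OF_REPLICAS` is the companion constant for the replica count, assumed to apply the same way):

// Sketch of the reviewer's suggestion: use the constants from
// org.opensearch.cluster.metadata.IndexMetadata instead of raw setting strings.
prepareCreate(
    INDEX_NAME,
    Settings.builder()
        .put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, 1)
        .put(IndexMetadata.SETTING_NUMBER_OF_REPLICAS, 1)
).get();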
 * We don't perform any refresh on the index, and we assert on the new replica shard's doc hit count.
 * This test makes sure that when a new replica is added to an existing cluster it gets all the latest segments from the primary even without a refresh.
 */
public void testAddNewReplica() throws Exception {
This test is very similar to `testStartReplicaAfterPrimaryIndexesDocs`, can we reuse that test? That test currently indexes a doc after the replica is recovered to force another round of replication, but you could assert the doc count is sync'd on line 412 after ensureGreen().
Yes, I think you're right. Let me see if we can reuse it.
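A rough sketch of the suggested assertion (the `replica` node name is illustrative, and the doc count of 10 follows the indexing loop quoted earlier; `assertHitCount` is the usual OpenSearchAssertions helper):

// After ensureGreen(), the replica should already serve the synced doc count,
// without any extra indexing or refresh (sketch; node name is hypothetical).
ensureGreen(INDEX_NAME);
assertHitCount(client(replica).prepareSearch(INDEX_NAME).setSize(0).setPreference("_only_local").get(), 10);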
A few small changes required here - particularly the null check in handleRecoveryDone.
IndexShard indexShard = (IndexShard) indexService.getShardOrNull(shardRouting.id());
// For Segment Replication enabled indices, we want replica shards to start a replication event to fetch latest segments before it
// is marked as Started.
if (indexShard.indexSettings().isSegRepEnabled()
You could also read the setting from the index settings before fetching a reference to the IndexShard: `indexService.getIndexSettings().isSegRepEnabled()`.
Sure, I can add that.
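Put together with the earlier null-check comment, the guard might look roughly like this (a sketch under the discussion's assumptions, not the merged code):

// Sketch: read the setting from the IndexService first, then null-check the
// shard returned by getShardOrNull before dereferencing it.
if (indexService.getIndexSettings().isSegRepEnabled()
    && shardRouting.primary() == false
    && ShardRoutingState.RELOCATING != shardRouting.state()) {
    IndexShard indexShard = (IndexShard) indexService.getShardOrNull(shardRouting.id());
    if (indexShard != null) {
        // trigger a round of segment replication before marking the shard STARTED
    }
}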
);
if (sendShardFailure == true) {
    logger.error("replication failure", e);
    indexShard.failShard("replication failure", e);
I think we can reuse `handleRecoveryFailure` here instead of this added block.
Err, sorry, I'm off here. We'll need both: `indexShard.failShard("replication failure", e);`, which fails the engine, followed by `handleRecoveryFailure`, which removes the shard.
On that note - could you please add a test here for the failure case?
Yes, this is important. Thanks for catching this. I will update it and add a unit/integration test for the failure case.
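The failure path the thread converged on would look roughly like this (a sketch; the `handleRecoveryFailure` signature is assumed from the surrounding class, not confirmed by the diff):

// Sketch: fail the engine first, then reuse the existing recovery-failure
// handling, which removes the shard.
if (sendShardFailure) {
    logger.error("replication failure", e);
    indexShard.failShard("replication failure", e);   // fails the engine
}
handleRecoveryFailure(shardRouting, sendShardFailure, e);   // removes the shard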
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Only a nit so approving. Thanks for this change.
        );
    }
} else {
    shardStateAction.shardStarted(
Nit - this is now invoked 3x. You could clean this up by using a StepListener that, when it completes, marks the shard started.
Thanks @mch2, sure I can do that.
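A minimal sketch of the StepListener idea (`markShardStarted`, `needsSegRepRound`, and `startReplication` are illustrative names, not from the PR):

// Sketch: route every completion path through one StepListener so that
// shardStateAction.shardStarted(...) is invoked in exactly one place.
StepListener<Void> forceSegRepListener = new StepListener<>();
forceSegRepListener.whenComplete(
    ignored -> markShardStarted(shardRouting),           // wraps shardStateAction.shardStarted(...)
    e -> handleRecoveryFailure(shardRouting, true, e)
);
if (needsSegRepRound) {
    startReplication(indexShard, forceSegRepListener);   // hypothetical helper
} else {
    forceSegRepListener.onResponse(null);                // nothing to replicate; mark started
}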
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
The backport to 2.x failed.

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-5332-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 0cf67979064c6c8be95299911db0d1bf1ea5ed68
# Push it to GitHub
git push --set-upstream origin backport/backport-5332-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-5332-to-2.x.
Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
…ds during peer recovery when segment replication is enabled (#5332)

* Fix newly added replica shards falling behind primary.
* Trigger a round of replication during peer recovery when segment replication is enabled.
* Remove unnecessary start replication overloaded method.
* Add test for failure case and refactor some code.
* Apply spotless check.
* Addressing comments on the PR.
* Remove unnecessary condition check.
* Apply spotless check.
* Add step listeners to resolve forcing round of segment replication.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
…ature/identity (#5581)
…ds during peer recovery when segment replication is enabled (opensearch-project#5332)
Description
This PR adds logic to trigger a round of replication during peer recovery before the shard is marked as STARTED. It fixes the bug of newly added replica shards falling behind the primary shard until an operation is performed on the index when segment replication is enabled. More detail about the bug is in issue #5313.
Solution used to fix the bug
With segment replication enabled, when a new replica is added to the cluster it goes through the peer recovery process. After peer recovery completes, and before the replica shard is marked as STARTED, we trigger a replication event on the replica to copy all the latest segments from the primary shard.
Issues Resolved
Resolves #5313
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.