Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] Segment Replication - Change replicas to poll for new segments. #6334

Closed
wants to merge 15 commits into from

Conversation

mch2
Copy link
Member

@mch2 mch2 commented Feb 16, 2023

Description

still working on some tests & cleanup...

This changes Segment Replication flows to use a polling based architecture rather than relying on primaries to publish new checkpoints. The motivation for this change right now is to fix relocation bugs outlined here #6065 (comment). Though there are other benefits not listed on the issue.

This change has several parts, but largely in SegmentReplicationTarget/Source service classes.:

  • Removes SegmentReplicationCheckpointPublisher & related classes.
  • Adds a new Async task in IndexService to trigger a function in IndexShard that performs replication, which is a functional entrypoint in SegmentReplicationTargetService.
  • Updates SegmentReplicationTargetService to trigger segrep off the shard's current state, vs an incoming checkpoint.
  • Rewrites logic in SegmentReplicationSourceService to cache only a single copystate object at a time per shard and update it on primary refresh.

Issues Resolved

closes #6065

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

mch2 added 9 commits February 15, 2023 20:51
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

Signed-off-by: Marc Handalian <handalm@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testRelocateWithQueuedOperationsDuringHandoff

);
}
} catch (AlreadyClosedException ignored) {
if (indexService.getIndexSettings().isSegRepEnabled() == false) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With SR, the Source Service will still hold a reference to its latest CopyState until it is shut down, so skipped this assertion here as it is made before node shutdown.

Signed-off-by: Marc Handalian <handalm@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter
Copy link

Codecov Report

Merging #6334 (a5b3763) into main (6bb9e3e) will increase coverage by 0.11%.
The diff coverage is 60.94%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff              @@
##               main    #6334      +/-   ##
============================================
+ Coverage     70.68%   70.80%   +0.11%     
- Complexity    58994    59058      +64     
============================================
  Files          4800     4797       -3     
  Lines        282427   282394      -33     
  Branches      40717    40717              
============================================
+ Hits         199641   199950     +309     
+ Misses        66378    66038     -340     
+ Partials      16408    16406       -2     
Impacted Files Coverage Δ
...ain/java/org/opensearch/indices/IndicesModule.java 98.14% <ø> (+1.75%) ⬆️
...ensearch/indices/replication/common/CopyState.java 88.00% <ø> (-0.89%) ⬇️
...s/replication/SegmentReplicationTargetService.java 47.29% <32.43%> (-2.71%) ⬇️
...c/main/java/org/opensearch/index/IndexService.java 73.56% <40.00%> (-0.07%) ⬇️
...in/java/org/opensearch/index/shard/IndexShard.java 69.32% <45.45%> (-0.44%) ⬇️
...s/replication/SegmentReplicationSourceService.java 55.69% <60.71%> (-3.99%) ⬇️
.../java/org/opensearch/test/InternalTestCluster.java 58.16% <66.66%> (-0.45%) ⬇️
...ndices/replication/OngoingSegmentReplications.java 81.81% <73.17%> (-11.24%) ⬇️
...nsearch/index/shard/CheckpointRefreshListener.java 88.88% <75.00%> (ø)
...in/java/org/opensearch/indices/IndicesService.java 63.27% <100.00%> (-0.29%) ⬇️
... and 472 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Signed-off-by: Marc Handalian <handalm@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

mch2 added 2 commits February 17, 2023 00:55
…pyState map before close.

Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Marc Handalian <handalm@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@mch2 mch2 closed this Jun 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] failing IT test : SegmentReplicationRelocationIT
2 participants