Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Segment Replication] Consider primary shard balancing first #6325

Conversation

dreamer-89
Copy link
Member

@dreamer-89 dreamer-89 commented Feb 15, 2023

Description

This change considers primary shards first for rebalancing operation versus any random shard; that ensures more fairer primary distribution. This change allows nodes to converge towards a more balanced primary distribution versus skewed distribution though still balanced. This rebalancing logic considers shards of one index at a time starting with most unbalanced first, thus this change is non-intrusive.

Note: The primary shard distribution can still remain unbalanced in below cases:

  1. Multiple indices. Rebalancing logic runs balancing logic on one index at a time. This can result in nodes balanced specific to index but when combined. E.g. N1, N2 are two nodes, I1P1, I2P1 are primary shards of index I1 and I2, while I1R1 and I2R1 are replica shards. Below distribution is balanced considering I1, I2 separately but not when combining both.
    N1 N2
    I1P1 I1R1
    I2P1 I2R1

  2. Failover. During failover the replicas are immediately promoted as primary which results in primaries on dead node to be demoted as replica; resulting in uneven primary/replica distribution; which is balanced when considering node weight but unbalanced when evaluating primaries only.

Issues Resolved

#6210

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Suraj Singh <surajrider@gmail.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@@ -967,6 +969,13 @@ AllocateUnassignedDecision decideAllocateUnassigned(final ShardRouting shard) {
}

private static final Comparator<ShardRouting> BY_DESCENDING_SHARD_ID = Comparator.comparing(ShardRouting::shardId).reversed();
private static final Comparator<ShardRouting> PRIMARY_FIRST = Comparator.comparing(ShardRouting::primary).reversed();

private boolean isSegRepEnabled(String index) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shard Balancing algo shouldn't be aware of replication. Have method inside the IndexMetadata which return the mode PRIMARY_FIRST or not and that could check whether it is segrep enabled index or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @shwetathareja for the feedback and review. I will update this as you suggested.

This is an improvement over #6017 where primary shard balancing factor was introduced. It would be great if you can review and share your thoughts on issue #5240

Copy link
Member

@shwetathareja shwetathareja Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial thoughts:

Have you looked into adding an AllocationConstraint similar to isIndexShardsPerNodeBreached which essentially adds constraint to ensure each node shouldn't have more than avg. no.of shards per index even if a node has overall less shards compared to other nodes. This helps in preventing un-necessary rebalances later. More details are here - #487

I feel you can simply add a constraint for segrep indices to ensure a node shouldn't have more than avg. no. of expected primaries for an index. Lets say it is 5 node cluster and segrep index has 5 primaries + 2 replicas. So no node should have more than 1 primary. This wouldn't require coming up with magic value for weight factor. Later, we can evaluate if we want to extend this to balancing sum of primaries across all indices across all nodes as well. e.g. 1 segrep index has 3 primaries while other has 2 primaries and it is 5 node cluster, then each node shouldn't have more than 1 primary across indices as well.

Copy link
Member Author

@dreamer-89 dreamer-89 Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @shwetathareja for the feedback. This is a great idea!

Moving this discussion to issue #6210 to that it is not lost. I updated the PR, please have a look.

Signed-off-by: Suraj Singh <surajrider@gmail.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter
Copy link

Codecov Report

Merging #6325 (e1b8c29) into main (1540e00) will increase coverage by 0.17%.
The diff coverage is 89.47%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff              @@
##               main    #6325      +/-   ##
============================================
+ Coverage     70.67%   70.84%   +0.17%     
- Complexity    58948    59073     +125     
============================================
  Files          4800     4800              
  Lines        282421   282437      +16     
  Branches      40719    40723       +4     
============================================
+ Hits         199587   200104     +517     
+ Misses        66392    65941     -451     
+ Partials      16442    16392      -50     
Impacted Files Coverage Δ
...ting/allocation/allocator/LocalShardsBalancer.java 85.27% <81.81%> (+0.26%) ⬆️
...org/opensearch/cluster/metadata/IndexMetadata.java 84.62% <100.00%> (+0.14%) ⬆️
.../index/shard/IndexShardNotRecoveringException.java 0.00% <0.00%> (-50.00%) ⬇️
...n/decider/SnapshotInProgressAllocationDecider.java 34.78% <0.00%> (-34.79%) ⬇️
...nsearch/index/shard/IndexShardClosedException.java 66.66% <0.00%> (-33.34%) ⬇️
...arch/search/aggregations/pipeline/LinearModel.java 23.07% <0.00%> (-30.77%) ⬇️
...search/search/aggregations/pipeline/EwmaModel.java 24.44% <0.00%> (-28.89%) ⬇️
...tions/bucket/terms/ParsedSignificantLongTerms.java 73.68% <0.00%> (-21.06%) ⬇️
...ain/java/org/opensearch/geometry/MultiPolygon.java 80.00% <0.00%> (-20.00%) ⬇️
...n/java/org/opensearch/test/rest/yaml/Features.java 60.00% <0.00%> (-20.00%) ⬇️
... and 471 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@dreamer-89
Copy link
Member Author

Closing this in favour of #6422 which defines constraint based primary shard allocation and is controlled via a dynamic setting.

@dreamer-89 dreamer-89 closed this Feb 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants