[Segment Replication] Consider primary shard balancing first #6325
Conversation
Signed-off-by: Suraj Singh <surajrider@gmail.com>
Gradle Check (Jenkins) Run Completed with:
@@ -967,6 +969,13 @@ AllocateUnassignedDecision decideAllocateUnassigned(final ShardRouting shard) {
    }

    private static final Comparator<ShardRouting> BY_DESCENDING_SHARD_ID = Comparator.comparing(ShardRouting::shardId).reversed();
    private static final Comparator<ShardRouting> PRIMARY_FIRST = Comparator.comparing(ShardRouting::primary).reversed();

    private boolean isSegRepEnabled(String index) {
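The two comparators in the diff can be composed to order a shard list primaries-first, then by descending shard id. Here is a minimal, self-contained sketch of that composition, using a hypothetical `Shard` record in place of OpenSearch's `ShardRouting`:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PrimaryFirstSort {
    // Hypothetical stand-in for OpenSearch's ShardRouting.
    record Shard(int shardId, boolean primary) {}

    // Mirrors the PR's comparators: Boolean natural order puts false before
    // true, so reversing it places primaries ahead of replicas.
    static final Comparator<Shard> PRIMARY_FIRST =
        Comparator.comparing(Shard::primary).reversed();
    static final Comparator<Shard> BY_DESCENDING_SHARD_ID =
        Comparator.comparing(Shard::shardId).reversed();

    public static void main(String[] args) {
        List<Shard> shards = new ArrayList<>(List.of(
            new Shard(0, false), new Shard(1, true), new Shard(2, false)));
        // Primaries first, then higher shard ids first within each group.
        shards.sort(PRIMARY_FIRST.thenComparing(BY_DESCENDING_SHARD_ID));
        System.out.println(shards);
    }
}
```

With this ordering, the rebalancer visits primary shards before replicas when picking candidates to move.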
The shard balancing algo shouldn't be aware of replication. Have a method inside IndexMetadata which returns whether the mode is PRIMARY_FIRST or not, and that could check whether it is a segrep-enabled index.
Thank you @shwetathareja for the feedback and review. I will update this as you suggested.
This is an improvement over #6017, where the primary shard balancing factor was introduced. It would be great if you could review and share your thoughts on issue #5240.
My initial thoughts:
Have you looked into adding an AllocationConstraint similar to isIndexShardsPerNodeBreached, which essentially adds a constraint to ensure each node shouldn't have more than the avg. no. of shards per index, even if a node has fewer shards overall compared to other nodes? This helps in preventing unnecessary rebalances later. More details are here - #487
I feel you can simply add a constraint for segrep indices to ensure a node shouldn't have more than the avg. no. of expected primaries for an index. Let's say it is a 5-node cluster and the segrep index has 5 primaries + 2 replicas. So no node should have more than 1 primary. This wouldn't require coming up with a magic value for the weight factor. Later, we can evaluate if we want to extend this to balancing the sum of primaries across all indices across all nodes as well. E.g. one segrep index has 3 primaries while another has 2 primaries and it is a 5-node cluster; then each node shouldn't have more than 1 primary across indices as well.
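The suggested constraint could be expressed roughly as follows. This is only a sketch of the idea under the assumptions in the comment; the method name and signature are hypothetical and not the actual OpenSearch AllocationConstraints API:

```java
public class PrimaryCountConstraint {
    // Hypothetical check: the constraint is breached if the node already
    // holds at least the average number of primaries per node for the index
    // (rounded up), so allocating another primary there would skew balance.
    static boolean primariesPerIndexBreached(int primariesOnNode, int totalPrimaries, int nodeCount) {
        double avgPrimariesPerNode = (double) totalPrimaries / nodeCount;
        return primariesOnNode >= Math.ceil(avgPrimariesPerNode);
    }

    public static void main(String[] args) {
        // The comment's example: 5-node cluster, segrep index with 5 primaries.
        // Average is 1 primary per node, so a node already holding 1 is breached.
        System.out.println(primariesPerIndexBreached(1, 5, 5)); // true
        System.out.println(primariesPerIndexBreached(0, 5, 5)); // false
    }
}
```

Unlike a weight factor, a breached constraint simply deprioritizes the node for further primaries of that index, so no magic tuning value is needed.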
Thank you @shwetathareja for the feedback. This is a great idea!
Moving this discussion to issue #6210 so that it is not lost. I updated the PR, please have a look.
Signed-off-by: Suraj Singh <surajrider@gmail.com>
Gradle Check (Jenkins) Run Completed with:
Codecov Report
@@ Coverage Diff @@
## main #6325 +/- ##
============================================
+ Coverage 70.67% 70.84% +0.17%
- Complexity 58948 59073 +125
============================================
Files 4800 4800
Lines 282421 282437 +16
Branches 40719 40723 +4
============================================
+ Hits 199587 200104 +517
+ Misses 66392 65941 -451
+ Partials 16442 16392 -50
Closing this in favour of #6422, which defines constraint-based primary shard allocation and is controlled via a dynamic setting.
Description
This change considers primary shards first for the rebalancing operation, rather than any random shard, which ensures a fairer primary distribution. It allows nodes to converge towards a balanced primary distribution instead of a distribution that is balanced overall but skewed in primaries. The rebalancing logic considers shards of one index at a time, starting with the most unbalanced index first, so this change is non-intrusive.
Note: The primary shard distribution can still remain unbalanced in the cases below:
Multiple indices. The rebalancing logic runs on one index at a time. This can result in nodes that are balanced per index but not when all indices are combined. E.g. N1 and N2 are two nodes; I1P1 and I2P1 are primary shards of indices I1 and I2, while I1R1 and I2R1 are replica shards. The distribution below is balanced considering I1 and I2 separately, but not when combining both.
N1 N2
I1P1 I1R1
I2P1 I2R1
Failover. During failover, replicas are immediately promoted to primary, and the primaries on the dead node are demoted to replicas. This results in an uneven primary/replica distribution, which is balanced when considering node weight but unbalanced when evaluating primaries only.
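The multiple-indices case above can be checked numerically: counting primaries per node across indices exposes the skew that per-index balancing misses. A small sketch, with a hypothetical `Shard` record (not OpenSearch code):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PrimarySkewDemo {
    // Hypothetical shard model: index name, primary flag, and assigned node.
    record Shard(String index, boolean primary, String node) {}

    // Count primaries per node across all indices combined.
    static Map<String, Long> primariesPerNode(List<Shard> shards) {
        return shards.stream()
            .filter(Shard::primary)
            .collect(Collectors.groupingBy(Shard::node, Collectors.counting()));
    }

    public static void main(String[] args) {
        // The distribution from the description: each node holds one shard of
        // each index (balanced per index), yet N1 holds both primaries.
        List<Shard> shards = List.of(
            new Shard("I1", true, "N1"), new Shard("I1", false, "N2"),
            new Shard("I2", true, "N1"), new Shard("I2", false, "N2"));
        System.out.println(primariesPerNode(shards)); // N1 has 2, N2 has 0
    }
}
```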
Issues Resolved
#6210
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.