[BUG] Offline calculation of total shards across all nodes and caching it for weight calculation inside LocalShardBalancer #15108
Labels
bug, Indexing:Replication, untriaged
Describe the bug
When selecting the node on which a shard will be allocated, OpenSearch calculates the weight of that shard on every node. The weight of a shard compares the number of shards on a node against the average number of shards that should reside on each node, taking into account both the cluster as a whole and the shards of each index. Computing this average during weight calculation is resource-intensive: we sum the shard counts of all nodes by iterating through the metadata of every node, then divide the sum by the total number of nodes. Since this computation is repeated for every node during shard allocation, it becomes computationally expensive.

Because a single thread on the master node handles all of these operations, including running the allocation deciders and making allocation decisions, the deciders can keep that thread busy and delay high-priority tasks such as applying/sending cluster state updates, index creation, etc.
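For context, here is a simplified sketch of the weight computation described above, loosely modeled on the weight function in BalancedShardsAllocator. Names, signatures, and the `theta0`/`theta1` fields are approximations for illustration, not the exact OpenSearch code:

```java
// Illustrative sketch only: names approximate OpenSearch's BalancedShardsAllocator.
// The weight of a node for a given index blends two deviations from the average:
// the node's total shard count vs. the cluster-wide average, and the node's
// per-index shard count vs. that index's average shards per node.
float weight(ModelNode node, String index) {
    final float weightShard = node.numShards() - avgShardsPerNode();
    final float weightIndex = node.numShards(index) - avgShardsPerNode(index);
    return theta0 * weightShard + theta1 * weightIndex;
}

// The costly part described above: every call re-derives the cluster-wide
// average by walking the shard counts of all nodes, so weighing N candidate
// nodes for a single shard costs on the order of N^2 node visits.
float avgShardsPerNode() {
    int totalShards = 0;
    for (RoutingNode node : routingNodes) {  // one pass over every node...
        totalShards += node.size();          // ...summing its shard count
    }
    return (float) totalShards / routingNodes.size();
}
```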
As can be validated from the graph, about 50% of the time spent relocating 6k (empty) shards from 100 source nodes onto 100 destination nodes is attributable to the average-shard calculation during weight determination.
Related component
Indexing:Replication
To Reproduce
Create 500k shards on a setup with 1000 data nodes and 3 master nodes.
Expected behavior
Perform an offline calculation of the total shards across all nodes and cache it, so that LocalShardBalancer does not need to traverse the metadata of all nodes for weight calculation. A sketch of this idea follows.
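A minimal sketch of the proposed caching, assuming the totals can be computed in a single pass when the balancer is set up for an allocation round. The `ShardCountCache` class and its members are hypothetical names for illustration, not an actual OpenSearch API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: compute shard totals once per allocation round and
// reuse them for every subsequent weight evaluation, instead of re-traversing
// all node metadata on each call.
final class ShardCountCache {
    private final float avgShardsPerNode;            // cluster-wide average
    private final Map<String, Float> avgPerIndex;    // per-index averages

    ShardCountCache(Iterable<RoutingNode> routingNodes) {
        int nodeCount = 0;
        int totalShards = 0;
        final Map<String, Integer> shardsPerIndex = new HashMap<>();
        for (RoutingNode node : routingNodes) {      // one O(#nodes) pass, done once
            nodeCount++;
            totalShards += node.size();
            for (ShardRouting shard : node) {
                shardsPerIndex.merge(shard.getIndexName(), 1, Integer::sum);
            }
        }
        this.avgShardsPerNode = nodeCount == 0 ? 0f : (float) totalShards / nodeCount;
        this.avgPerIndex = new HashMap<>();
        final int nodes = nodeCount;
        shardsPerIndex.forEach((index, count) ->
            avgPerIndex.put(index, (float) count / nodes));
    }

    // O(1) lookups replace the per-call traversal in the weight function.
    float avgShardsPerNode() { return avgShardsPerNode; }
    float avgShardsPerNode(String index) { return avgPerIndex.getOrDefault(index, 0f); }
}
```

With the reproduction setup above (1000 data nodes, 500k shards), this would turn the repeated per-node metadata traversals in the weight function into a single pass per allocation round.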