[BUG] Shard replication tasks do not start on new nodes intermittently #482

Closed
soosinha opened this issue Aug 17, 2022 · 3 comments
Labels
bug Something isn't working must_fix v2.x v2.3.0 'Issues and PRs related to version v2.3.0'

Comments

@soosinha
Member

What is the bug?
When a node is replaced with another node and a shard migrates to the new node, the shard replication task sometimes does not start, so replication stops working for that shard.

The bug is in this code, where the index task checks for shard tasks across all indices. If even one shard task is present for any other index, it bails out without starting its own shard tasks. The check should be modified to consider only the shard tasks of the particular index.
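
To illustrate the difference, here is a minimal sketch of the two checks. It is not the plugin's actual code: `ShardTask`, `needs_shard_tasks_buggy`, `needs_shard_tasks_fixed`, and the index names are hypothetical, and the sketch only models the decision "should this index task start shard tasks?".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardTask:
    """Hypothetical stand-in for a running shard replication task."""
    index: str
    shard_id: int


def needs_shard_tasks_buggy(index: str, running: list[ShardTask]) -> bool:
    # Reported behavior: a shard task for ANY index counts as "present",
    # so the index task bails out even when its own shards have no task.
    return len(running) == 0


def needs_shard_tasks_fixed(index: str, running: list[ShardTask]) -> bool:
    # Proposed behavior: only shard tasks that belong to this index count.
    return not any(task.index == index for task in running)


# Two replicated indices; after the node replacement only "logs" has
# its shard task running again.
running = [ShardTask(index="logs", shard_id=0)]

print(needs_shard_tasks_buggy("metrics", running))  # False: bails out, replication stalls
print(needs_shard_tasks_fixed("metrics", running))  # True: shard tasks get started
```

Under this model the outcome depends on which index's shard tasks come back first, which would be consistent with the issue being intermittent.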

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Start replication for at least 2 indices
  2. Add a new node to the cluster
  3. Shut down ES on the node where the index replication task is running
  4. The index replication task should start on one of the remaining nodes
  5. The shard replication task may not start on the new node (the issue is intermittent, so the steps may have to be repeated to reproduce it)

What is the expected behavior?
Index and shard replication tasks should start again after they are stopped due to node replacement.

@soosinha soosinha added bug Something isn't working untriaged v2.x and removed untriaged labels Aug 17, 2022
@krishna-ggk
Collaborator

Thanks for finding the bug and root-causing it.

One thing to re-evaluate is whether we can unify the code paths that start missing shard tasks:

  1. When shard tasks are missing for all shards. (link)
  2. When shard tasks are missing for only some shards. (link)
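
One way to picture a unified path, as a rough sketch rather than the plugin's implementation: compute the shards of the index that have no running task and start tasks for exactly those. "All missing" and "some missing" then become the same case. The function name `missing_shards`, the `(index, shard_id)` tuple representation, and the index names are assumptions made for illustration.

```python
def missing_shards(index: str, shard_count: int,
                   running: set[tuple[str, int]]) -> list[int]:
    """Return the shard ids of `index` that have no running shard task.

    `running` holds (index, shard_id) pairs for every shard task in the
    cluster; pairs that belong to other indices are ignored.
    """
    return [shard for shard in range(shard_count)
            if (index, shard) not in running]


# Case 1: no shard task for "metrics" at all, so every shard is missing.
print(missing_shards("metrics", 3, {("logs", 0)}))  # [0, 1, 2]

# Case 2: only some shard tasks for "metrics" are missing.
print(missing_shards("metrics", 3, {("metrics", 0), ("metrics", 2)}))  # [1]
```

An empty result means nothing needs to be started, so the caller does not need a separate "bail out" check.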

@gbbafna gbbafna added the v2.3.0 'Issues and PRs related to version v2.3.0' label Aug 19, 2022
@ankitkala ankitkala self-assigned this Aug 23, 2022
@ankitkala
Member

I've raised a PR to fix the bug where we weren't filtering the ShardReplicationTasks for the current index: #497

We should still keep this issue open to refactor and unify the logic that spawns the missing/all shard tasks.

@ankitkala ankitkala mentioned this issue Sep 7, 2022
22 tasks
@ankitkala
Member

Closing the issue, as we've also addressed the refactoring as part of the same PR.
