Single node down when joining and leaving will cause total outage #1290

Open
bboreham opened this issue Mar 17, 2019 · 5 comments

Comments

@bboreham
Contributor

If you have 3 ingesters and a replication factor of 3, a rolling update produces this error:

at least 3 live ingesters required, could only find 2

This is because the distributor increases the required quorum size when it finds ingesters that are joining or leaving.

Strikes me there is something wrong here.
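
To make the arithmetic behind the error concrete, here is a minimal sketch in Python. It is not the Cortex implementation. It assumes the quorum is computed as `n // 2 + 1`, where `n` is the larger of the replication factor and the size of the replica set (which grows when joining or leaving ingesters are present), and that only `ACTIVE` ingesters count as live for writes. The function names and state strings are hypothetical; only the error text comes from the report above.

```python
from typing import List


def required_quorum(replication_factor: int, replica_set_size: int) -> int:
    """Assumed rule: majority of whichever is larger, the configured
    replication factor or the (possibly extended) replica set."""
    effective = max(replication_factor, replica_set_size)
    return effective // 2 + 1


def check_write_quorum(replication_factor: int, states: List[str]) -> int:
    """Return the required quorum, or raise if too few ingesters are live.

    Assumption: only ACTIVE ingesters count as live for writes.
    """
    required = required_quorum(replication_factor, len(states))
    live = sum(1 for state in states if state == "ACTIVE")
    if live < required:
        raise RuntimeError(
            f"at least {required} live ingesters required, could only find {live}"
        )
    return required


# Steady state: 3 ingesters, replication factor 3, quorum is 2 of 3.
print(check_write_quorum(3, ["ACTIVE", "ACTIVE", "ACTIVE"]))

# Rolling update: one ingester LEAVING and its replacement JOINING.
# The replica set now has 4 members, so the quorum rises to 3,
# but only 2 members are live.
try:
    check_write_quorum(3, ["ACTIVE", "ACTIVE", "LEAVING", "JOINING"])
except RuntimeError as err:
    print(err)
```

Under these assumptions, the rolling update raises the quorum from 2 to 3 at the same moment the number of live ingesters drops from 3 to 2, which matches the message in the report.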

@Serrvosky

I'm facing the same problem

@Serrvosky

#1488

@bboreham changed the title from "Errors when updating 3 ingesters" to "Single node down when joining and leaving will cause total outage" on Feb 20, 2020
@bboreham
Contributor Author

The same issue hits when you have more than 3 ingesters, one is LEAVING, and another one goes bad.
Distributors will fail the entire request back to the sender with a 500 code because they can only find 2 ingesters for some subset of the series.
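
The all-or-nothing behaviour described here can be sketched as follows. This is a simplified model rather than Cortex code: it assumes each series maps to its own replica set, that the quorum is a majority of the larger of the replication factor and the replica set size, that only `ACTIVE` ingesters count as live, and that one failing series fails the whole request. The names `quorum_ok`, `push_status`, and the series labels are hypothetical.

```python
from typing import Dict, List

LIVE_STATES = {"ACTIVE"}  # assumption: only ACTIVE ingesters accept writes


def quorum_ok(replication_factor: int, replica_states: List[str]) -> bool:
    """Assumed rule: a majority of max(replication factor, replica set size)
    must be live."""
    required = max(replication_factor, len(replica_states)) // 2 + 1
    live = sum(1 for state in replica_states if state in LIVE_STATES)
    return live >= required


def push_status(replication_factor: int,
                series_replicas: Dict[str, List[str]]) -> int:
    """Return an HTTP-style status for a write request.

    The request fails as a whole if any single series cannot reach quorum.
    """
    for replica_states in series_replicas.values():
        if not quorum_ok(replication_factor, replica_states):
            return 500
    return 200


request = {
    # All replicas healthy: quorum is 2 of 3.
    "series_a": ["ACTIVE", "ACTIVE", "ACTIVE"],
    # One LEAVING ingester extends the set to 4 (quorum 3), and one
    # replica is DOWN, so only 2 are live.
    "series_b": ["ACTIVE", "LEAVING", "DOWN", "ACTIVE"],
}
print(push_status(3, request))  # prints 500
```

In this model only the series whose replica set includes both the LEAVING and the bad ingester lose quorum, yet the sender sees the whole request rejected.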

@alanprot
Member

I think that if this happened with extended write set to true, it was fixed by #4626.

@bboreham
Contributor Author

#4626 is another issue; did you mean #4636?
