Add ingester.autoforget-unhealthy-timeout opt-in feature #3919

Merged

github-vincent-miszczak
Contributor

Follow-up to #3661. I took too long to push an update, and the PR was automatically closed :(

What this PR does / why we need it:
It adds a new opt-in parameter that automatically forgets unhealthy ingesters after the heartbeat timeout.

Which issue(s) this PR fixes:
Partially fixes #3360

Special notes for your reviewer:
Not everyone is using stateful sets. In my case, I'm running on AWS ECS in a stateless configuration. If one of my ingesters gets killed for whatever reason, it is not a big deal because I have RF=3: a new task is spawned.

At the moment, the failed ingester remains in the ring forever unless someone presses the Forget button on the ingester /ring page. If nobody does, a second task failure makes the cluster unavailable even if 2 out of 3 ingesters are healthy, because Loki does not use a sloppy quorum.
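
The arithmetic behind this can be sketched as follows. This is a simplified model of a strict quorum, not Loki's implementation: it assumes a write needs `RF/2 + 1` healthy replicas and that an unhealthy ring entry in a replica set is never substituted by another instance. The function names are hypothetical.

```go
package main

import "fmt"

// minSuccess is the number of healthy replicas a strict quorum needs
// for the given replication factor (assumption: RF/2 + 1).
func minSuccess(replicationFactor int) int {
	return replicationFactor/2 + 1
}

// quorumReachable reports whether a replica set still has enough healthy
// members. Unhealthy members simply count as failures; they are not
// replaced by other ring members (that would be a sloppy quorum).
func quorumReachable(replicaHealthy []bool, replicationFactor int) bool {
	healthy := 0
	for _, ok := range replicaHealthy {
		if ok {
			healthy++
		}
	}
	return healthy >= minSuccess(replicationFactor)
}

func main() {
	const rf = 3

	// One stale entry left in the ring: 2 healthy replicas, quorum holds.
	fmt.Println(quorumReachable([]bool{true, true, false}, rf))

	// Stale entry plus one newly failed ingester in the same replica set:
	// only 1 healthy replica remains, so the quorum is lost.
	fmt.Println(quorumReachable([]bool{true, false, false}, rf))
}
```

Under these assumptions, a stale entry that is never forgotten permanently uses up the single failure that RF=3 can tolerate.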

Checklist

  • Documentation added
  • Tests updated

@github-vincent-miszczak github-vincent-miszczak force-pushed the unhealthy-ingester-removal branch from 34e05cf to 00ca779 Compare July 1, 2021 16:44
@github-vincent-miszczak github-vincent-miszczak marked this pull request as ready for review July 1, 2021 16:56
@bt909
Contributor

bt909 commented Jul 2, 2021

I have tested it, because I was also looking for a feature like this. It works well. Thank you.

Co-authored-by: Karen Miller <84039272+KMiller-Grafana@users.noreply.github.com>
Member

@owen-d owen-d left a comment

LGTM

Successfully merging this pull request may close these issues.

"too many failed ingesters" using memberlist
4 participants