Ingesters not passing readiness probe #1502
Comments
Cortex itself doesn't care about readiness. If you visit the ring page you can see the state of every ingester in the ring and "Forget" the ones that have gone away.
Nice, that helped. I was able to "Forget" the pods that were unhealthy and now I'm up and running. Is there a way to automatically forget those unhealthy pods?
Right now we require human input because it's hard to decide what really happened: was it a bug in the ingester, data corruption in the ring, etc.?
I see. Are there any plans to automate this process?
Can't really plan around the absence of knowledge. For instance, can you say why you had unhealthy pods?
Is there somewhere I can look for that info? There is nothing in the logs of any service. But I know this was caused by replacing pods (changing vars or requests/limits on the deployment manifest): whenever a pod gets replaced this happens, and I need to manually fix the cluster using the "Forget" action on the ring page.
When you shut down an ingester it needs to hand over its chunks to a new one or flush them all to the store, which can take many minutes. After that it will remove itself from the ring. Under Kubernetes you will need a sufficiently large grace period on the pod definition. Also #1307 means you need to raise
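For reference, a minimal sketch of the kind of grace period mentioned above, on an assumed ingester StatefulSet (the name, image, and the 2400-second value are illustrative, not taken from this thread):

```yaml
# Illustrative fragment of an ingester pod spec; names and values are assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ingester
spec:
  serviceName: ingester
  replicas: 5
  selector:
    matchLabels:
      name: ingester
  template:
    metadata:
      labels:
        name: ingester
    spec:
      # Give the ingester enough time to hand over or flush its chunks
      # before Kubernetes force-kills it; flushing can take many minutes.
      terminationGracePeriodSeconds: 2400
      containers:
        - name: ingester
          image: quay.io/cortexproject/cortex:master  # assumed image tag
          args:
            - -target=ingester
```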
I have 5 ingesters running and none of them are passing the readiness probe (/ready). I exec'ed into the pod, ran the check manually, and I'm getting a 503.
The logs only show a memcached error (#1501), and some of the other pods are failing with:
What can I do to debug this issue in the ingesters?
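For context, a readiness probe like the one described in this issue is typically declared on the ingester container along these lines (the port and timing values here are assumptions; the port has to match the ingester's -server.http-listen-port):

```yaml
# Illustrative readiness probe on the ingester container; port and timings are assumptions.
readinessProbe:
  httpGet:
    path: /ready
    port: 80            # must match -server.http-listen-port
  initialDelaySeconds: 15
  timeoutSeconds: 1
```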