[elasticsearch] New readiness probes causing full cluster-restart #631
Comments
So I suspect this behaviour is related to the change to the readiness probe in #586. Would you be able to provide the output of kubectl describe for the affected master pod, including the events?
Also, as a general point of interest, if you're pinning to a custom imageTag, then it's probably worth pinning the chart version as well, as the only real reason to bump the chart version is to pick up changes related to new versions and bug fixes.
Yes, I also think that #586 is the reason for this. The events are not visible anymore and I can't restart the cluster right now. If you really need them I can do it today in the evening... I did not explicitly upgrade the chart because a new version was out; I made changes to the chart settings yesterday and thereby implicitly upgraded the chart to the newest version, as I don't pin the chart version unless it's really needed.

kubectl describe masterpod
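For reference, a minimal sketch of how to capture that output (the pod name here assumes the chart's default elasticsearch-master-N naming):

```bash
# Describe the affected master pod (name assumes the chart defaults):
kubectl describe pod elasticsearch-master-0

# Cluster events expire quickly, so capture them soon after the rollout:
kubectl get events --sort-by=.metadata.creationTimestamp
```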
Ok, I'll see if I can replicate locally, and I'll come back if I need the events...
Ah, ok... So for now, I'd suggest pinning the chart version to a previous version so that any future config updates don't result in a full restart.
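For example (the release name, values file, and version here are illustrative, not from the thread):

```bash
# Pin the chart version explicitly so config-only upgrades don't
# pull in new chart code at the same time:
helm upgrade elasticsearch elastic/elasticsearch \
  --version 7.6.2 \
  -f values.yaml
```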
Yes, that's the plan. If you need more info, I will try to get it 😉
@ckotzbauer OK, I've done a bit more testing locally, and I don't think the change in #586 is the cause here. That got me looking through the config and history, and I came across this issue: #63. By setting the config item discussed there, the long master re-election when the active master pod is deleted can be avoided. Do you want to try applying that config item?
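A sketch of applying it, assuming the config item meant here is the chart's masterTerminationFix value (my reading of #63; verify against your chart version's values.yaml):

```bash
# Enable the master-termination workaround discussed in #63
# (assumed here to be the masterTerminationFix value):
helm upgrade elasticsearch elastic/elasticsearch \
  --reuse-values \
  --set masterTerminationFix=true
```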
hm, okay that's an interesting point. I know this behavior too, that the election takes too long and the cluster is down if only the master is deleted... I can try this. But I (personally) don't think that this causes the problem of K8s doing the rollout too fast. The new pods are marked as ready, so K8s thinks it can go on. The intention of the readinessProbe implementation in the statefulset is to prevent K8s from doing this, by delaying the ready state until the cluster has recovered again. And that seems to be the part which does not work anymore. Or did I miss something here...? 🤔
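For illustration, a minimal sketch (not the chart's actual script) of the kind of check such a readiness probe performs:

```bash
#!/usr/bin/env bash
# Ask Elasticsearch to wait briefly for 'green'; the health endpoint
# returns HTTP 408 until that status is reached, so with --fail curl
# exits non-zero and the pod stays NotReady until recovery completes.
curl --fail --silent --output /dev/null \
  "http://127.0.0.1:9200/_cluster/health?wait_for_status=green&timeout=1s"
```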
@ckotzbauer So a quick update from me. We ran some more tests internally, and were able to reproduce the "too fast" rollout on an internal cluster. After a bit of digging, it looks like there was a set of quotes missed on the initial readiness-probe implementation, which meant the health check wasn't actually waiting for the cluster status it was meant to. I've opened #638, which re-works that behaviour, and I'm running some more tests internally... We'll probably be looking to do a patch release for this pretty quickly. However, as you're running a production cluster, my recommendation would still be to pin the chart version and bump it deliberately.
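The chart's actual line differs, but the failure mode of a missed set of quotes around a health-check URL looks roughly like this:

```bash
# Unquoted: the shell treats '&' as a background operator. curl is
# backgrounded and its result never checked; the probe's exit status
# is that of 'timeout=1s' (a variable assignment, which always
# succeeds), so the probe passes regardless of cluster health:
curl http://localhost:9200/_cluster/health?wait_for_status=green&timeout=1s

# Quoted: the full query string reaches Elasticsearch, and curl's
# exit status is what the probe actually sees:
curl --fail "http://localhost:9200/_cluster/health?wait_for_status=green&timeout=1s"
```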
Thank you so much for digging in, @fatmcgav. I really appreciate it!
@ckotzbauer The fix in #638 has been merged, and back-ported to the 6.8 and 7.7 branches for inclusion in the next minor release.
Chart version: 7.7.0
Kubernetes version: 1.17.1
Kubernetes provider: On-prem
Helm Version: 3.2.0
Output of helm get release:
Describe the bug:
I updated the chart to the newest version 7.7.0 and expected the three Elasticsearch nodes to be updated one after another, each waiting until the cluster is green again (in the past, the most recently restarted pod stayed not ready until the cluster was green again). Now, the pod became ready after a few minutes and Kubernetes moved on too quickly, so the cluster was red (though not completely down).
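For context, the upgrade path described was presumably something like this (release name and values file assumed):

```bash
# Refresh the repo index, then upgrade without a pinned --version,
# which picks up the latest chart (7.7.0 at the time):
helm repo update
helm upgrade elasticsearch elastic/elasticsearch -f values.yaml
```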
Steps to reproduce:
Expected behavior:
The readiness probe works as expected and marks the pod as not ready until the cluster is green again.
Any additional context:
I have done this kind of release update often in the past without such problems; it only happened today with the new version.