[elasticsearch] add master pre stop hook using voting_config_exclusion #108
Conversation
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?
I left some comments on how this can be improved. Even with these improvements I still feel that this is a very risky solution in comparison to my suggestion in #63 (comment). Due to the risks added by this I don't feel comfortable continuing with this approach unless you can convince me otherwise. If you see issues with my proposal, let's discuss them in the issue first to see if there is a solution that would be safe to use until this is fixed in Elasticsearch.
By automating the removal of masters it means that the cluster can end up in a state where it loses 1 or more masters in an automated fashion. The `_cluster/health` check that is used for the readinessProbe will not detect that the number of eligible masters is below quorum, meaning that Kubernetes will see this cluster as healthy and gladly continue with the rolling upgrade or other maintenance tasks that require restarting the pods.
The same thing will happen if the `while true` loop gets stuck for whatever reason (for example if the other masters have been lost and a new master can't be elected). Once the grace period is hit, the pod will be restarted and quorum will be lost.
The solution I suggested in #63 (comment) gets around this by never actually removing any masters from the voting configuration. Instead, a sidecar container will make sure that the pod's IP address is kept around until a new master has been elected, without the need to do anything risky. The implementation would just be an "if current master is not me" check in a while loop. The only issue that needs to be solved is making sure that the pod's IP address is kept around long enough for a new master election to happen. Doing anything else just adds more things that could go wrong.
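A minimal sketch of that sidecar loop, with the "who is master" command injected as a parameter so the waiting logic is runnable without a cluster (the `wait_until_not_master` name and the polling interval are assumptions, not chart code):

```shell
#!/usr/bin/env bash
# Sketch of the sidecar idea from #63: poll the master name and only
# return once a node other than ours holds the role. Nothing is ever
# added to or removed from the voting configuration.
wait_until_not_master() {
  local node_name="$1"; shift   # remaining args: command printing _cat/master output
  local master
  while true; do
    # `|| true` keeps a failed request from being mistaken for "not master"
    master="$("$@" || true)"
    if [[ -n "$master" && "$master" != *"$node_name"* ]]; then
      break
    fi
    sleep 0.2
  done
}
```

In the chart, the injected command would be the existing curl-based `http "/_cat/master"` helper.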
curl -X${method} -s -k --fail ${BASIC_AUTH} {{ .Values.protocol }}://127.0.0.1:{{ .Values.httpPort }}${path}
}

NODE_NAME=$(awk 'BEGIN {print ENVIRON["node.name"]}')
I was originally confused about why awk was needed here, but it looks like bash can't access environment variables with dots in their names.
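For illustration (the variable value here is made up), awk's `ENVIRON` array can read such names while bash's `${...}` syntax cannot:

```shell
# "node.name" is not a valid bash identifier, so ${node.name} is a syntax
# error in bash; awk's ENVIRON array has no such restriction.
env 'node.name=elasticsearch-master-0' awk 'BEGIN {print ENVIRON["node.name"]}'
```

This prints `elasticsearch-master-0`; `env` can set variable names that the shell itself cannot export.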
http "/_cluster/voting_config_exclusions/${NODE_NAME}" "POST"
while true ; do
An infinite loop seems to be ok here because pod termination waits for the graceful timeout for the preStop hook to finish: https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods.
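For context, a hedged illustration of the Kubernetes side (the field names are standard, but the values and script path are made-up examples): the preStop command gets up to `terminationGracePeriodSeconds` before the pod is force-killed.

```yaml
spec:
  # preStop gets up to this long before the pod is force-killed
  terminationGracePeriodSeconds: 120
  containers:
    - name: elasticsearch
      lifecycle:
        preStop:
          exec:
            # hypothetical path; the chart inlines its script instead
            command: ["bash", "/scripts/pre-stop-hook.sh"]
```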
NODE_NAME=$(awk 'BEGIN {print ENVIRON["node.name"]}')
echo "Exclude the node ${NODE_NAME} from voting config"

http "/_cluster/voting_config_exclusions/${NODE_NAME}" "POST"
I think this should be inside the `while true` loop. Otherwise `set -e` is going to cause the preStop script to exit here if this curl request fails.
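A minimal sketch of that fix, assuming the chart's `http` helper (stubbed here with a fake that fails once, so the block is runnable without a cluster):

```shell
#!/usr/bin/env bash
set -e

# Stub standing in for the chart's curl-based `http` helper: it fails on
# the first call to simulate a transient error, then succeeds.
attempts=0
http() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 2 ]
}

NODE_NAME="elasticsearch-master-0"
# Because the request is the loop condition, a failure retries instead of
# tripping `set -e` and aborting the whole preStop script.
until http "/_cluster/voting_config_exclusions/${NODE_NAME}" "POST"; do
  echo "Exclusion request failed, retrying..."
  sleep 1
done
echo "Node ${NODE_NAME} excluded from voting config"
```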
# Node won't actually be removed from the cluster,
# so it should be deleted from voting config exclusions
echo "Node ${NODE_NAME} is now being deleted from voting config exclusions"
http "/_cluster/voting_config_exclusions?wait_for_removal=false" "DELETE"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also be inside the `while true` loop. If this fails then this node won't be able to rejoin the cluster after starting up again.
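One way to sketch that, assuming a small hypothetical `retry` wrapper around the chart's `http` helper (demonstrated here with a fake flaky command so the block runs standalone):

```shell
#!/usr/bin/env bash
set -e

# Hypothetical helper: re-run a command until it succeeds. Under `set -e`
# the failing command sits in the loop condition, so failures don't abort
# the script; they just trigger another attempt.
retry() {
  until "$@"; do
    echo "Command failed, retrying: $*" >&2
    sleep 1
  done
}

# Example: a command that fails twice before succeeding.
n=0
flaky() {
  n=$((n + 1))
  [ "$n" -ge 3 ]
}
retry flaky
echo "succeeded after $n attempts"   # prints: succeeded after 3 attempts
```

In the preStop script this would wrap the request above: `retry http "/_cluster/voting_config_exclusions?wait_for_removal=false" "DELETE"`.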
while true ; do
echo -e "Wait for new master node to be elected"
if [[ "$(http "/_cat/master")" != *"${NODE_NAME}"* ]]; then
This if statement can be improved to be safer. Right now it will match if the curl command fails in any way:

[[ "" != "elasticsearch-master-0" ]] ; echo $?
0
A safer option would be to only break if elasticsearch-master is returned and it is not ${node.name}; otherwise this will produce a lot of false positives.
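A sketch of that safer check as a standalone predicate (the function name is an assumption; in the script it would wrap the `$(http "/_cat/master")` output):

```shell
#!/usr/bin/env bash
# Succeeds only when the _cat/master output is non-empty AND names a node
# other than ours. Empty output (e.g. from a failed curl) means "keep
# waiting" instead of producing a false positive.
new_master_elected() {
  local master_output="$1" node_name="$2"
  [[ -n "$master_output" && "$master_output" != *"$node_name"* ]]
}
```

Usage in the loop would then be: `if new_master_elected "$(http "/_cat/master")" "${NODE_NAME}"; then break; fi`.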
@Crazybus Thanks for the detailed review! Your comments helped me a lot in understanding and improving this PR.
Thank you for saying this. It means a lot. It's hard work maintaining open source software, even harder when you need to say "no" to something that someone has clearly spent a lot of time and effort on (twice in a row even). I really appreciate the time you have invested so far and I'm glad that we can work together to improve this helm chart :)
Hopefully this will fix #63 for es7 master nodes