kubeadm join is not fault tolerant to etcd endpoint failures #1432
I would not say this is a problem of join; instead, it is a problem of reset not cleaning up properly, and there is already work ongoing to fix that. See
For the etcd problem I'm not sure what kubeadm can really do (apart from implementing a hacky workaround). IMO this should be fixed in etcd, but I'm open to suggestions.
@fabriziopandini in the reproduction steps kubeadm reset is not used, which is a real-world scenario outside of intentional chaos engineering. When a node goes away there is no guarantee that kubeadm reset will be called, or, if it is called, that it will complete successfully (network partition, etc). So kubeadm should be resilient to those scenarios regardless. This issue may be a good place to track either the hacky workaround (order the map? retry the etcd client explicitly?) or the integration of the patched etcd client when it is available.
If nodes are terminated or shut down, I use
I am not sure if I understand your words right?
We need to:
WDYT?
AFAIK, if a node is terminated, we need to run a command on one of the masters that removes the etcd member from the cluster, and update the
If these steps using
@pytimer I am ok with kubeadm leaving it up to the user to do the etcd member management, i.e. if I have a cluster with 3 nodes A, B, C and I terminate C (and don't run kubeadm reset on C), I am left with an unhealthy etcd member, a node (
When adding a new node D, before joining the cluster it first:
At this point we have a 2-node cluster, all nodes healthy, and the
The goal of this issue is to not have to perform the explicit ClusterStatus update (the prework is sketched below). If there is a way forward where we don't have to do any of the cleanup prework, all the better, but it may be a better idea to start with a smaller scope.
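For reference, a minimal sketch of the manual prework described above, assuming a kubeadm stacked-etcd cluster; the node name, member ID, and namespace are placeholders, not anything kubeadm produces for you:

```sh
# 1. Remove the dead etcd member (run from a surviving control-plane node;
#    TLS flags omitted here, see the full etcdctl example later in the thread).
etcdctl member list
etcdctl member remove <member-id>

# 2. Delete the Kubernetes Node object for the terminated machine.
kubectl delete node node-c

# 3. Remove node-c's entry from ClusterStatus.apiEndpoints in the
#    kubeadm-config ConfigMap so the next join does not see a stale endpoint.
kubectl -n kube-system edit configmap kubeadm-config
```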
@danbeaulieu thanks for the explanation
I was following the discussion in #1300 and now here, and just wanted to throw in my support on this - everything @danbeaulieu pointed out is something we've noticed in our cluster management. Namely, we had a situation where we had a 3-node cluster, and for whatever reason one of the nodes died. The AWS ASG brought a new one up, but it hung on the situation described above. Our fix (in our provisioner) was the same as what Dan described: when a new node joins, reconcile the membership with etcd and the ClusterStatus.
@danbeaulieu @anitgandhi thanks for your support! It was also agreed to try to direct some of our bandwidth in v1.15 to implementing reset phases, so we can offer users some tools to recover from an uncontrolled loss of a control-plane node.
What is still missing from the above picture is how to determine that we are in this situation, and how to change the join workflow in this case.
Any help, suggestions, or contributions in this regard will be really appreciated.
@fabriziopandini I think this bug still exists in the etcd preflight:
This prevents the edge case where we try to
I will PR this unless anyone sees a problem with it, as it is safer in the long run and doesn't preempt the other node reset work being done.
/lifecycle active
Thanks, it works after removing the etcd member manually.
this is hopefully going to be fixed in 1.16
For folks that happen across this issue: I've created a static pod at git.io/etcdclient.yaml that can be used to interact with etcd once you have deployed it.
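A hypothetical usage example; the pod name `etcd-client` and the kube-system namespace are assumptions, so check what the manifest actually creates first:

```sh
# Find the pod created by the static pod manifest.
kubectl -n kube-system get pods | grep etcd

# Query the member list through it; the exact etcdctl flags depend on how
# the pod mounts the etcd certificates.
kubectl -n kube-system exec -it etcd-client -- etcdctl member list
```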
i've just sent a PR that may be the first step in deprecating the
PTAL and do tell if you object to serving an HTTP probe on localhost.
this should be fixed in 1.16, when kubeadm will use etcd 3.3.15.
@neolit123 I don't believe this is fixed in 1.16.2.
To test, I created a 2-node HA cluster, then modified the ClusterStatus in the kubeadm-config ConfigMap to add in 3 bogus nodes. After the failure I removed the bad nodes from the ClusterStatus and reran kubeadm join, and the node joined successfully.
The current workaround is to edit the kubeadm-config ConfigMap (its ClusterStatus layout is sketched below) to remove any "bad" nodes before running
Also, the log message
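For context, a rough sketch of where those entries live; the apiVersion assumes kubeadm v1beta2 (1.15+) and the node names and addresses are hypothetical:

```sh
# The ClusterStatus key of the kubeadm-config ConfigMap looks roughly like:
#
#   ClusterStatus: |
#     apiVersion: kubeadm.k8s.io/v1beta2
#     kind: ClusterStatus
#     apiEndpoints:
#       node-a:
#         advertiseAddress: 10.0.0.1
#         bindPort: 6443
#       node-b:
#         advertiseAddress: 10.0.0.2
#         bindPort: 6443
#
# Deleting a dead node's apiEndpoints entry here keeps its stale etcd
# endpoint out of the endpoint list a joining control-plane node builds.
kubectl -n kube-system edit configmap kubeadm-config
```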
@danbeaulieu as you can see we already switched 1.17 to this version, but such backports (e.g. for 1.16) are tricky:
I don't believe this is an etcd server version issue; I think it is a version issue in the etcd client used by kubeadm. The client is not fault tolerant to bad endpoints in the endpoint list if the first endpoint is bad. There are a lot of k/k and etcd PRs and issues related to this bug, but it isn't clear if this was ever actually fixed in 1.16 kubeadm.
i don't think that folks outside of the etcd maintainers understand this. we updated the client in master / 1.17 too.
@danbeaulieu this should be fixed by using the client in etcd versions > 3.3.14
the etcd client version used in the latest 1.16 is v3.3.17:
but latest 1.16 does not mean latest stable 1.16. the SHA for the v1.16.2 tag tells me that the etcd client is at v3.3.15 there:
so if this is supposed to be fixed by > 3.3.14, it seems it's not.
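One rough way to verify which etcd client a given Kubernetes tag vendors, assuming a local checkout of kubernetes/kubernetes; note that go.mod pins etcd by a pseudo-version, so the embedded commit has to be mapped back to an etcd release tag by hand:

```sh
# Inspect the etcd client pin at a specific release tag.
git checkout v1.16.2
grep 'go.etcd.io/etcd' go.mod
```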
etcd v3.3.15 release notes specifically reference that it fixes this issue...
yes, for 1.17 and master, kubeadm and k8s are now using the 3.4 client.
@neolit123 do you see anything that could explain why this doesn't seem to be fixed in 1.16.2 (v3.3.15 client) according to my reproduction steps? Is anyone able to reproduce?
nothing on the kubeadm side at least. we are planning to add e2e tests for removing and re-adding nodes eventually, but everyone is out of bandwidth ATM:
please try 1.17 as well - there are already pre-releases with images out. as mentioned above, it bundles the etcd 3.4 server and a newer client.
Hitting the same issue with 1.17.0, deleted a master node + etcd without running
What is the current workaround for this issue? Editing kubeadm-config accordingly and restarting the etcd + kube-apiserver pods does not solve the issue.
@cgebe if I'm not wrong you have to delete the member using etcdctl (a sketch follows below).
Btw: kubernetes/enhancements#1380 is going to remove the problems related to the kubeadm ClusterStatus getting stale.
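A sketch of the member removal, assuming a kubeadm stacked-etcd setup with the default certificate locations under /etc/kubernetes/pki/etcd; run it on a surviving control-plane node and treat the endpoint and member ID as placeholders:

```sh
# List members to find the hex ID of the dead node.
ETCDCTL_API=3 etcdctl \
  --endpoints https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  member list

# Remove the dead member by the ID from the listing above.
ETCDCTL_API=3 etcdctl \
  --endpoints https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  member remove <member-id>
```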
@fabriziopandini Thanks, removed the dead etcd node manually!
Join thereafter worked as expected!
What keywords did you search in kubeadm issues before filing this one?
etcd kubeadm join clusterstatus
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
kubeadm version (use kubeadm version):

```
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:35:32Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
```
Environment:
- Kubernetes version (use kubectl version): 1.13.4
- Kernel (e.g. uname -a): 4.15.0-1032-aws

What happened?
kubeadm join --experimental-control-plane sporadically fails when adding a new node to the control plane cluster after a node is removed.

What you expected to happen?
For the join to succeed.
How to reproduce it (as minimally and precisely as possible)?
Create an HA stacked control plane cluster. Terminate one of the control plane nodes. Start another node, remove the failed etcd member, delete the failed node (kubectl delete node ...) and run kubeadm join --experimental-control-plane on the new node (a sketch of this step follows below).
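As a concrete illustration of the last step, a hedged sketch of the join command; the endpoint, token, and hash are placeholders, and --experimental-control-plane was later renamed --control-plane:

```sh
# Run on the replacement control-plane node after the etcd member removal
# and `kubectl delete node` cleanup described above.
kubeadm join <load-balancer-endpoint>:6443 \
  --experimental-control-plane \
  --token <bootstrap-token> \
  --discovery-token-ca-cert-hash sha256:<ca-cert-hash>
```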
Anything else we need to know?

This is due to a few things: