[master] Add etcd-member-management controller to K3s #4001
node.Labels[EtcdRoleLabel] == "true" &&
node.Labels[ControlPlaneLabel] == "true" ||
Given that the kubelet interprets any value for this label as applying this role, should we consider testing for the presence of the node-role label(s) and ignoring the value(s)?
Potentially! Is there anywhere that the kubelet's behavior is defined for this? I know we just discussed this on the system-upgrade-controller project. Also, this is coming from copy-pasta (from my renaming of the controller)
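A minimal sketch of the presence-only check discussed above, assuming (as the review comment suggests, and not confirmed here) that the presence of a node-role label is what marks the role. `hasLabel` and `hasAnyRole` are hypothetical helpers, not functions from the K3s codebase; the label keys are the ones quoted in the diff.

```go
package main

import "fmt"

// Label keys as quoted in the diff under review.
const (
	MasterLabel       = "node-role.kubernetes.io/master"
	ControlPlaneLabel = "node-role.kubernetes.io/control-plane"
	EtcdRoleLabel     = "node-role.kubernetes.io/etcd"
)

// hasLabel (hypothetical helper) reports whether key is present in labels,
// regardless of the value stored under it.
func hasLabel(labels map[string]string, key string) bool {
	_, ok := labels[key]
	return ok
}

// hasAnyRole (hypothetical helper) reports whether at least one of the
// given role label keys is present.
func hasAnyRole(labels map[string]string, keys ...string) bool {
	for _, k := range keys {
		if hasLabel(labels, k) {
			return true
		}
	}
	return false
}

func main() {
	// A node labeled with the etcd role but an empty value.
	labels := map[string]string{EtcdRoleLabel: ""}

	fmt.Println(labels[EtcdRoleLabel] == "true") // false: the value comparison misses it
	fmt.Println(hasLabel(labels, EtcdRoleLabel)) // true: the presence check catches it
	fmt.Println(hasAnyRole(labels, MasterLabel, ControlPlaneLabel))
}
```

The difference only matters for nodes whose role label carries a value other than "true"; whether such nodes should count is the open question in this thread.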
A couple nits on use of node id vs member name.
etcd has members with name and id; the name and the id are not the same thing. We should be consistent about which one we're using. We should also be consistent about referring to Kubernetes nodes vs etcd members.
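One way to keep these apart in Go is to give each identifier its own named type, so that passing a Kubernetes node name where an etcd member name is expected fails to compile. This is only a sketch under that assumption; the type names, `findByName`, and the sample values are hypothetical and do not come from the PR.

```go
package main

import "fmt"

// NodeName is the name of a Kubernetes Node object.
type NodeName string

// MemberName is the human-readable name of an etcd member.
type MemberName string

// MemberID is the numeric ID that etcd assigns to a member.
type MemberID uint64

// String renders the ID in hexadecimal, the form etcdctl prints.
func (id MemberID) String() string { return fmt.Sprintf("%x", uint64(id)) }

// Member pairs the two etcd-side identifiers.
type Member struct {
	ID   MemberID
	Name MemberName
}

// findByName returns the member with the given etcd member name, if any.
func findByName(members []Member, name MemberName) (Member, bool) {
	for _, m := range members {
		if m.Name == name {
			return m, true
		}
	}
	return Member{}, false
}

func main() {
	// Made-up sample data for illustration only.
	members := []Member{{ID: 0x8e9e05c52164694d, Name: "server-1-abc123"}}
	var node NodeName = "server-1"

	// findByName(members, node) would not compile: NodeName is not MemberName.
	m, ok := findByName(members, MemberName("server-1-abc123"))
	fmt.Println(node, m.Name, m.ID, ok)
}
```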
MasterLabel = "node-role.kubernetes.io/master"
ControlPlaneLabel = "node-role.kubernetes.io/control-plane"
EtcdRoleLabel = "node-role.kubernetes.io/etcd"
We already have these as consts at lines 42 to 44 in 699ea16:
MasterRoleLabelKey = "node-role.kubernetes.io/master"
ControlPlaneRoleLabelKey = "node-role.kubernetes.io/control-plane"
ETCDRoleLabelKey = "node-role.kubernetes.io/etcd"
ad34bb0 to b5fe115
LGTM except for the duplicate consts; can we reuse those?
When using those constants, I run into an import cycle. Not sure if we want to move those to a more "universal" point.
Signed-off-by: Chris Kim <oats87g@gmail.com>
Signed-off-by: Chris Kim <oats87g@gmail.com>
b5fe115 to b7dc271
Ugh yeah, it looks like we also have another copy as string literals in servicelb as well, probably for the same reason. I guess we can clean it up later.
LGTM looks great thanks
Proposed Changes
This PR adds a new etcd member management controller to K3s, and moves the existing etcd controller to an "etcd metadata" controller that is responsible for updating the labels/annotations of the corresponding etcd node objects.
Types of Changes
Functional
Verification
2. Run the `etcdctl member list` command, using a downloaded `etcdctl` binary from the etcd project, on any of the `etcd` nodes and observe that there should be N etcd members listed, where N is the number of etcd nodes in the cluster.
3. Annotate one of your etcd nodes (preferably not the "bootstrap" node) with the annotation `etcd.k3s.cattle.io/remove=true`. List/describe the annotations of the node and watch the `etcd.k3s.cattle.io/removed-node-name` annotation be set to the etcd node name that was shown for that node in the `member list` output from above.
4. Re-run `etcdctl member list` and observe that the etcd member is NOT listed anymore. Additionally, looking at the `k3s server` logs of the node that was removed, observe that it appears quite angry and is constantly emitting logs about the etcd server being stopped and so forth.

These 4 steps cover the primary additional functionality of the etcd member management controller. To validate the other bugfixes (2 of them) that were included by this PR, you can:
1. Restart the `k3s server` process, i.e. `systemctl restart k3s`, on the etcd node that was removed. Observe that it should re-join the etcd cluster (as observed by `etcdctl member list`) BUT it should get a new `etcd` member name and ID. When you `kubectl describe` the node, observe that the `remove` annotation should be reset to `false`.
2. `k3s` should not automatically be restarted by `systemd` or `openrc` once the node is removed from the etcd cluster. The node should basically just go into a bad state and need to be manually fixed via `systemctl restart`.
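The member-list checks in the steps above (the member count, a removed member no longer being listed, a re-joined node appearing under a new member name and ID) could also be scripted. The following is a minimal sketch, assuming `etcdctl member list` prints its default comma-separated output with the ID in the first field and the name in the third. The sample output, member names, and the `parseMemberList` and `hasMember` helpers are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// member holds the two identifiers compared in the verification steps.
type member struct {
	ID   string // hexadecimal ID as printed by etcdctl
	Name string
}

// parseMemberList parses comma-separated `etcdctl member list` output.
// Assumed field order: ID, status, name, peer URLs, client URLs, ...
func parseMemberList(out string) ([]member, error) {
	var members []member
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // skip blank lines
		}
		fields := strings.Split(line, ",")
		if len(fields) < 3 {
			return nil, fmt.Errorf("unexpected member line: %q", line)
		}
		members = append(members, member{
			ID:   strings.TrimSpace(fields[0]),
			Name: strings.TrimSpace(fields[2]),
		})
	}
	return members, sc.Err()
}

// hasMember reports whether a member with the given name is listed.
func hasMember(members []member, name string) bool {
	for _, m := range members {
		if m.Name == name {
			return true
		}
	}
	return false
}

func main() {
	// Made-up sample output captured before and after the removal.
	before := `1a2b3c4d5e6f7a8b, started, server-1-aaaa1111, https://10.0.0.1:2380, https://10.0.0.1:2379, false
2b3c4d5e6f7a8b9c, started, server-2-bbbb2222, https://10.0.0.2:2380, https://10.0.0.2:2379, false`
	after := `1a2b3c4d5e6f7a8b, started, server-1-aaaa1111, https://10.0.0.1:2380, https://10.0.0.1:2379, false`

	for label, out := range map[string]string{"before": before, "after": after} {
		members, err := parseMemberList(out)
		if err != nil {
			fmt.Println("parse error:", err)
			return
		}
		fmt.Println(label, len(members), hasMember(members, "server-2-bbbb2222"))
	}
}
```

After the restart in step 1, the same comparison applies in the other direction: the node should be listed again, but under a member name and ID that differ from the ones recorded before the removal.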
Linked Issues
User-Facing Change
There are no immediate user facing changes. Most users should not be using this annotation functionality for managing etcd members.
Further Comments