Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[master] Add etcd-member-management controller to K3s #4001

Merged
merged 2 commits into from
Sep 14, 2021

Conversation

Oats87
Copy link
Member

@Oats87 Oats87 commented Sep 13, 2021

Proposed Changes

This PR adds a new etcd member management controller to K3s, as well as moves the existing etcd controller to a "etcd metadata" controller that is responsible for updating the labels/annotations of corresponding etcd node objects.

Types of Changes

Functional

Verification

  1. Create a multi-node etcd-backed K3s cluster.
  2. Perform an etcdctl member list command using a downloaded etcdctl binary from the etcd project on any of the etcd nodes similar to:
etcdctl --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt --key=/var/lib/rancher/k3s/server/tls/etcd/client.key member list -w table

and observe that there should be N etcd members listed, where N is the number of etcd nodes in the cluster.
3. Annotate one of your etcd nodes (preferably not the "bootstrap" node) with the annotation etcd.k3s.cattle.io/remove=true. List/describe the annotations of the node and watch the etcd.k3s.io/removed-node-name annotation be set to the etcd node name that corresponded to the list of nodes in the member list command from above.
4. Re-run the etcdctl member list and observe that the etcd member is NOT listed in the list anymore. Additionally, looking at the logs of k3s server of the node that was removed, observe that it appears quite angry and is constantly spitting logs related to etcd server stopped and so forth.

These 4 steps cover the primary additional functionality of the etcd member management controller. To validate the other bugfixes (2 of them) that were included by this PR, you can:

  1. Restart the k3s server process i.e. systemctl restart k3s on the etcd node that was removed. Observe that it should re-join the etcd cluster (as observed by etcdctl member list) BUT it should get a new etcd member name and ID. When you kubectl describe the node, observe that the remove annotation should be reset to false.
  2. Observe that k3s should not automatically be restarted by systemd or openrc once the node is removed from the etcd cluster. The node should basically just go into a bad state and need to be manually fixed via systemctl restart

Linked Issues

User-Facing Change

There are no immediate user facing changes. Most users should not be using this annotation functionality for managing etcd members.


Further Comments

@Oats87 Oats87 requested a review from a team as a code owner September 13, 2021 21:32
Comment on lines +50 to +51
node.Labels[EtcdRoleLabel] == "true" &&
node.Labels[ControlPlaneLabel] == "true" ||
Copy link
Contributor

@dweomer dweomer Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the kubelet interprets any value for this label as applying this role, should we consider testing for the presence of the node-role label(s) and ignoring the value(s)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially! Is there anywhere that the kubelet's behavior is defined for this? I know we just discussed this on the system-upgrade-controller project. Also, this is coming from copy-pasta (from my renaming of the controller)

Copy link
Member

@brandond brandond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple nits on use of node id vs member name.
etcd has members with name and id; the name and the id are not the same thing. We should be consistent about which one we're using. We should also be consistent about referring to Kubernetes nodes vs etcd members.

Comment on lines +59 to +63
MasterLabel = "node-role.kubernetes.io/master"
ControlPlaneLabel = "node-role.kubernetes.io/control-plane"
EtcdRoleLabel = "node-role.kubernetes.io/etcd"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have these as consts at

k3s/pkg/server/server.go

Lines 42 to 44 in 699ea16

MasterRoleLabelKey = "node-role.kubernetes.io/master"
ControlPlaneRoleLabelKey = "node-role.kubernetes.io/control-plane"
ETCDRoleLabelKey = "node-role.kubernetes.io/etcd"

pkg/etcd/member_controller.go Outdated Show resolved Hide resolved
pkg/etcd/member_controller.go Outdated Show resolved Hide resolved
Copy link
Member

@brandond brandond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for the duplicate consts; can we reuse those?

@Oats87
Copy link
Member Author

Oats87 commented Sep 13, 2021

LGTM except for the duplicate consts; can we reuse those?

When using those constants, I run into an import cycle. Not sure if we want to move those to a more "universal" point

Signed-off-by: Chris Kim <oats87g@gmail.com>
Signed-off-by: Chris Kim <oats87g@gmail.com>
@brandond
Copy link
Member

brandond commented Sep 14, 2021

When using those constants, I run into an import cycle. Not sure if we want to move those to a more "universal" point

Ugh yeah, it looks like we also have another copy as string literals in servicelb as well, probably for the same reason. I guess we can clean it up later.

Copy link
Contributor

@galal-hussein galal-hussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM looks great thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants