-
Notifications
You must be signed in to change notification settings - Fork 524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Introduce an etcd operator leader status field #694
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: ironcladlou The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
See the discussion in openshift/machine-config-operator#1897 for more details about one way this could be useful. |
This patch explores adding an etcd operator status field which reports leader member information. Exposing this would allow, for example, smarter decision-making by the MCO regarding reboot ordering by providing a hint to help minimize disruption via excessive leader changes.
6ffe8b6
to
06b1a7d
Compare
// name is the etcd leader member name, if available. | ||
Name string `json:"name,omitempty"` | ||
// node is the etcd leader member node, if available. | ||
Node string `json:"node,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be e.g. NodeRef *corev1.ObjectReference
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be e.g.
NodeRef *corev1.ObjectReference
?
No. We encourage the creation of specific reference types. See the reasoning in https://github.com/kubernetes/api/blob/master/core/v1/types.go#L5172-L5186
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Hmm. So, since there's nothing more we care about here to a node than its name, does that argue for keeping it as a string
?
@ironcladlou: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
P.S., this is for now to help promote discussion and experimentation — we haven't actually done the work/measurement to prove the cited use case has a benefit with expanding the API for |
@ironcladlou what about something like status struct{
conditions
[]EtcdMembers
}
EtcdMembers struct{
name string
node string
status string // learning,unhealthy,not-a-member,healthy
memberType string // leader,follower (whatever this is)
fsyncP99InLastMinute string
peerLatencyP99InLastMinute string
// or whatever it is you want
} |
OK so I'm about at the point in openshift/machine-config-operator#1897 work where I'm blocking on this, because I really want to test OS update things after we're only updating etcd followers first. I starting looking at hacking something into the MCD for this but it's ugly. |
If we don't land this my tentative thoughts are:
In playing with this what I'm stumbling over is getting the right creds set up to talk to etcd, and I don't know if there's a non-polling way to watch for leader changes. The rest of the logic for managing upgrade order would be in the MCC and proceed basically the same. |
You know what may be far simpler (and avoid the "etcd writes its state to kube which writes to etcd" issue) is having the etcd pod write out something extremely simple like Anyways for now I wrote some hacky code that reuses the certs from the apiserver in the MCD |
Or, rather than trying to extend the API here we could use a node annotation. |
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten |
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /close |
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This patch explores adding an etcd operator status field which reports
leader member information. Exposing this would allow, for example, smarter
decision-making by the MCO regarding reboot ordering by providing a hint
to help minimize disruption via excessive leader changes.