-
Notifications
You must be signed in to change notification settings - Fork 482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Domain-specific reboot minimisation enhancement #159
Domain-specific reboot minimisation enhancement #159
Conversation
/assign @eparis |
AFAICS none of the MCO developers are CC on this enhancement? See also openshift/machine-config-operator#650 I think rather than phrasing this like "dictionary of file paths" which just sounds odd to me, we should say: First, the MCD will gain support for applying at least one type of change without rebooting; an excellent target example would be the kubelet maxPods (we may need to restart kubelet, but not the host). Second, we attempt to generalize and expand that infrastructure. There's a lot of things that need to be plumbed through in the MCO to handle this at all - most likely we need to better expand our "diff MCs" code so we can filter down on targeted changes better. One thing I mused about that may help is if we don't try to inspect file paths at all, but only handle the case where a high level CRD like kubeletconfig changes, and reverse-engineer that from the generated MC. Then, responsibility for "live apply" is delegated to the high level controllers. IOW it'd be the kubeletconfigcontroller that knows how to live apply max pods. |
Thanks for the feedback @cgwalters
Odd. I definitely meant to. @runcom has been added
Would "lookup table" be any better?
That's an interesting alternative to consider. The intention from Alex was that this would be a minimally invasive approach that could be implemented in the short term to meet telco requirements. He envisioned a more expressive solution in the longer term that defined a higher level syntax that solved the coming Ignition format problem and also, somehow, indicated the requirements for various fields. This sounds broadly inline with your other idea. Expanding the diff code seems to fall somewhere in the middle. |
Call out that this is a short-term enabler that should not preclude a more comprehensive approach.
Is it assumed that any change that would necessitate a draining of pods would default to simply rebooting? If we stick with file paths, it seems like we'd want a review process around approving addition of those paths to the dictionary, ie: My change to this file may be safe without a reboot, but Alice's change to the same file requires a reboot, therefore we should not add the file. |
I think the enhancement should be specified at a high level; our goal is to support changes without reboots. "dictionary" and "lookup table" are more potential implementation details. |
Agreed. If any exposed part of the file requires a reboot, then any change to the file requires reboot. Opening that particular can of worms is a much larger piece of work and is unlikely to be possible in the required timeframe. Also agreed that we need to be extremely cautious about which paths get added to the list. |
I would amend the goal to be "support some changes without reboots ASAP", hence the simplifying assumption of dealing with file paths instead of individual options. I will see if I can find a better balance in the PR to write it at a high level but also keep it focused on low hanging fruit rather than a bigger and more comprehensive solution (I am assuming that would be a follow up PR that would in part stem from the experience we gain from this initial implementation). |
This looks fine to me. @runcom do you or the rest of the MCO team have any concerns with this? |
I think this enhancement was filed with an eye towards bare metal, where rebooting can entail whole minutes in the BIOS. So it may be useful to distinguish between draining/restarting kubelet and rebooting. |
After the MCD writes out a configuation change, it will consult the whitelist | ||
before deciding if a reboot is required. If the filename is present, has a | ||
valid action and any associated data, the action will be performed. Otherwise, | ||
a reboot will be performed as in prior 4.x versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there was a case where MCD was expected to drain the node even before any configuration change is applied... and here proposal is regarding actions after configuration has been applied.. so it seems like skip drain is not an action that can be supported?? would love to see details wrt to drain before reboot...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing is, if the configuration change (which is usually multiple file/units) has conflicting actions ie some files require reboot, some say skip reboot.. what should be the expected behavior. Would like to see those details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any pointers to additional information around the drain-before-reboot usecase would be great. It would be good to understand the motivation before making any claims about how the two features should interact.
The proposal calls for a whitelist that MCD uses to decide which action to take | ||
after updating a configuration file for the MCO. To prevent the admin from | ||
tampering with the whitelist, it will be included as part of the MCD image as a | ||
static file in yaml format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this file going to be versioned, or there is a strict compatibility with the current MCD only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of the reason for including it with the MCD was to avoid the need for versioning.
But I don't have strong feelings about it.
Promote the timeout to a top-level field, timeouts can only be trusted if they are enforced by the caller, not callee. Change the kubelet example to show how drain + restart might be achieved.
Allow the ability to drain independently of any other action. We don't want teams to re-implement their own each time
Spot on.
I've modified the proposal to allow drain to be specified in addition to whatever other action is appropriate. This avoids duplicating functionality for files requiring special combinations. |
Signed-off-by: Andrew Beekhof <andrew@beekhof.net>
OK I only skimmed this update. There was a lot of discussion on this and it's actually yet another of a "important for bare metal, less for cloud" things we have going on (e.g. nmstate, usbguard in extension system, etc). The lack of visibility into which actions are going to result in reboot is going to be very confusing for admins. I guess I'd still try to implement this differently were I doing it, but in the end this enhancement as written doesn't preclude changing the implementation model (we're not adding new APIs) so I'll give it a 👍 from that PoV. You really need the MCO team signoff now I think. |
cc @runcom |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current iteration of this enhancement looks sane as a first step towards avoiding reboots in special cases. I actually like a modification to the mcd that does produce a list of actions to take in certain scenarios (we can do that but the risk of missing a required reboot because we just didn’t anticipate it could be high, are we fully ok with that? What do we suggest to do in that case?)
My only worry are, as usual, upgrades and I wouldn’t be surprised to find some behavior that we’re relying on today with drain+reboot that could potentially be changed by this.
The enhancement focuses on specific cases mainly so that should alleviate the above I guess, as we reboot anyway during major upgrades (I’m thinking about network upgrades as others already pointed out).
So, assuming we’re not doing ansible again and everything is controlled by us (and mcd), I think this looks good as a first step and doesn’t prevent us from future iterations of a rebootless behavior with other mechanisms (without regressing on what we introduce with this).
## Summary | ||
|
||
A node's configuration is managed by the MCO. When the configuation is changed | ||
by the system or an admin, the MCO and MCD write a new version of the file to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MCD does the write (nit) the MCO just starts the whole process in particular cases (eg the network config changes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully the new wording is more precise :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This proposal seems to assume that the reader already fully understands what's being proposed.
@@ -0,0 +1,247 @@ | |||
--- | |||
title: mcd-reboot-dictionary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The word "dictionary" does not appear anywhere else in this document. It is unclear what the name "mcd-reboot-dictionary" means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that was the old title. Will look for any other occurrences too
## Proposal | ||
|
||
1. Add a function to MCD that compares two MachineConfigs, and outputs an | ||
ordered list of known actions that are needed to apply it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does it know what changes require what actions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the absence of any other information, the function would normally return drain+reboot
if there is any difference detected.
The intention is that in very specific cases where an alternative has been proven to be safe (such as drain + service restart
for changes that only affect ICSP), the function will modified to return a different set of actions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the answer here would be that we're going to base this on which files changed between 2 rendered machine configs and compute the actions based on that - we can add whatever complexity we want here also. I think the concern has been captured here #159 (comment) (addition vs deletion mainly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I mean, how does it know that modifications to file X require a reboot but modifications to file Y don't? Especially, is this something which is hardcoded into the MCO by us, or is it something configured at runtime by the administrator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I mean, how does it know that modifications to file X require a reboot but modifications to file Y don't?
the MCO only knows that a modification to a file X "doesn't require a reboot" - anything else will fall into "reboot as always"
Especially, is this something which is hardcoded into the MCO by us, or is it something configured at runtime by the administrator?
The aim is to avoid leaking this choice to the admin so this logic is gonna be computed by us in the MCO (and we drive that for future cases where we might want to avoid a reboot)
### Implementation Details/Notes/Constraints [optional] | ||
|
||
How the comparision determines what constitutes a domain, whether by target file | ||
or by configuration element, is left for later discussion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear what "domain" means here. It seems to mean something much more specific than in, eg, "Domain specific minimisation of config driven cluster reboots"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworded to explain the intended usage/meaning
- Unit tests for the comparision function that cover | ||
- Identical, missing, and invalid MachineConfigs | ||
- Changes to domains for which no exception exists | ||
- Changes to only domains for which a 'do nothing' exception exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no previous explanation of "exceptions"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an explanation: the case when unit of configuration can be applied by a means other than the default drain+reboot behaviour
- Retitle to reflect recent discussions and remove references to "dictionary" - Explain the use of the term "exception" in the "Test Plan" section - More precisely state the role of the MCO in applying configuration changes - Explain usage of the term "domain" in "Implementation Details" Signed-off-by: Andrew Beekhof <andrew@beekhof.net>
I think going this way to address users needs on BM especially is a good first step and as @cgwalters pointed out, we can expand on a real API later (that's why we're starting with a minimal set of "things" that won't trigger a reboot). I guess this all boils down to "when do we need this" and it seems the answer is yesterday so this is still the most practical way to account immediate users need. If we had time, as always, I agree with Colin this would have been different and we could have leveraged the API and CRD objects to carefully drive all this. /approve |
Before this enhancement merges I'd like to ensure that:
|
@ashcrow: GitHub didn't allow me to request PR reviews from the following users: beekhof. Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cc @marrusl |
@ashcrow: GitHub didn't allow me to request PR reviews from the following users: marrusl. Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@ashcrow all valid concerns, responses in-line
The epic associated with this work (MGMT-300) was part of the 4.6 telco epic readout that included Eric and Derek.
Talking to @runcom last night, he described the enhancement as exactly what was in his head but didn't have the opportunity to write down yet. So I think we're on the path of a long term approach even if some details change down the road.
I don't like bringing teams problems without the engineering resources to address them, so my team is absolutely volunteering for all the engineering work on this. This obviously includes regression and e2e tests for CI. The only requirement from the MCO team is code reviews once we refine the PoC. |
@runcom Is there anything still stopping this from being merged? |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: beekhof, runcom The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Today MCO defaults to rebooting node in order to apply successfully changes to the node. This is the safest way to apply a change on the node as we don't have to worry about things like which change is safe to skip reboot. In certain environments, rebooting is expensive and due to complex hardware setup sometimes can cause problems and node won't boot up. This will help to avoid rebooting nodes for certain cases where MCO thinks it is safe to do so. See - openshift/enhancements#159
No description provided.