
Domain-specific reboot minimisation enhancement #159

Merged
merged 19 commits into from
Jul 13, 2020

Conversation

beekhof
Contributor

@beekhof beekhof commented Dec 18, 2019

No description provided.

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 18, 2019
@beekhof
Contributor Author

beekhof commented Dec 18, 2019

/assign @eparis

@cgwalters
Member

AFAICS none of the MCO developers are CC'd on this enhancement?

See also openshift/machine-config-operator#650

I think rather than phrasing this like "dictionary of file paths", which just sounds odd to me, we should say:

First, the MCD will gain support for applying at least one type of change without rebooting; an excellent target example would be the kubelet maxPods (we may need to restart kubelet, but not the host).

Second, we attempt to generalize and expand that infrastructure.

There are a lot of things that need to be plumbed through in the MCO to handle this at all - most likely we need to expand our "diff MCs" code so we can filter down to targeted changes better.

One thing I mused about that may help is if we don't try to inspect file paths at all, but only handle the case where a high level CRD like kubeletconfig changes, and reverse-engineer that from the generated MC. Then, responsibility for "live apply" is delegated to the high level controllers. IOW it'd be the kubeletconfigcontroller that knows how to live apply max pods.
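For concreteness, here is a minimal sketch (in Go, since the MCO is written in Go) of the kind of "diff two rendered MachineConfigs and decide what to do" helper being discussed. The function name, the action set, and the example paths are hypothetical illustrations, not part of the enhancement or the MCO codebase:

```go
package main

import "fmt"

// Action describes how the MCD could react to a configuration diff.
// Larger values are more disruptive.
type Action int

const (
	ActionNone           Action = iota // no follow-up needed
	ActionRestartKubelet               // restart the kubelet unit, keep the host up
	ActionReboot                       // default: drain and reboot the host
)

// pathActions is a hypothetical, hard-coded table of file paths whose
// changes are known to be safe to apply without a full reboot.
var pathActions = map[string]Action{
	"/etc/kubernetes/kubelet.conf": ActionRestartKubelet, // hypothetical entry
}

// requiredAction looks at the files that differ between two rendered
// MachineConfigs and returns the most disruptive action needed,
// falling back to a reboot for anything it does not recognise.
func requiredAction(changedPaths []string) Action {
	worst := ActionNone
	for _, p := range changedPaths {
		a, known := pathActions[p]
		if !known {
			return ActionReboot // unknown file: keep the safe default
		}
		if a > worst {
			worst = a
		}
	}
	return worst
}

func main() {
	fmt.Println(requiredAction([]string{"/etc/kubernetes/kubelet.conf"})) // 1 (restart kubelet)
	fmt.Println(requiredAction([]string{"/etc/chrony.conf"}))             // 2 (reboot)
}
```

The conservative default here (any unrecognised change falls back to drain+reboot) is the behaviour the thread converges on below.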

@beekhof
Contributor Author

beekhof commented Jan 6, 2020

Thanks for the feedback @cgwalters

> AFAICS none of the MCO developers are CC'd on this enhancement?

Odd. I definitely meant to. @runcom has been added

> See also openshift/machine-config-operator#650

> I think rather than phrasing this like "dictionary of file paths", which just sounds odd to me,

Would "lookup table" be any better?

> we should say:
>
> First, the MCD will gain support for applying at least one type of change without rebooting; an excellent target example would be the kubelet maxPods (we may need to restart kubelet, but not the host).
>
> Second, we attempt to generalize and expand that infrastructure.
>
> There are a lot of things that need to be plumbed through in the MCO to handle this at all - most likely we need to expand our "diff MCs" code so we can filter down to targeted changes better.

That's an interesting alternative to consider.

The intention from Alex was that this would be a minimally invasive approach that could be implemented in the short term to meet telco requirements. He envisioned a more expressive solution in the longer term that defined a higher-level syntax that solved the coming Ignition format problem and also, somehow, indicated the requirements for various fields. This sounds broadly in line with your other idea.

Expanding the diff code seems to fall somewhere in the middle.
Should we discuss the two approaches here or in the enhancement?

Call out that this is a short-term enabler that should not preclude a more 
comprehensive approach.
@sdodson
Member

sdodson commented Jan 6, 2020

Is it assumed that any change that would necessitate a draining of pods would default to simply rebooting? If we stick with file paths, it seems like we'd want a review process around approving the addition of those paths to the dictionary, i.e.: my change to this file may be safe without a reboot, but Alice's change to the same file requires a reboot, therefore we should not add the file.

@cgwalters
Member

Would "lookup table" be any better?

I think the enhancement should be specified at a high level; our goal is to support changes without reboots. "dictionary" and "lookup table" are more potential implementation details.

@beekhof
Contributor Author

beekhof commented Jan 6, 2020

> Is it assumed that any change that would necessitate a draining of pods would default to simply rebooting? If we stick with file paths, it seems like we'd want a review process around approving the addition of those paths to the dictionary, i.e.: my change to this file may be safe without a reboot, but Alice's change to the same file requires a reboot, therefore we should not add the file.

Agreed. If any exposed part of the file requires a reboot, then any change to the file requires a reboot. Opening that particular can of worms is a much larger piece of work and is unlikely to be possible in the required timeframe.

Also agreed that we need to be extremely cautious about which paths get added to the list.

@beekhof
Contributor Author

beekhof commented Jan 6, 2020

Would "lookup table" be any better?

I think the enhancement should be specified at a high level; our goal is to support changes without reboots. "dictionary" and "lookup table" are more potential implementation details.

I would amend the goal to be "support some changes without reboots ASAP", hence the simplifying assumption of dealing with file paths instead of individual options.

I will see if I can find a better balance in the PR to write it at a high level but also keep it focused on low-hanging fruit rather than a bigger and more comprehensive solution (I am assuming that would be a follow-up PR that would in part stem from the experience we gain from this initial implementation).

@crawford
Contributor

This looks fine to me. @runcom do you or the rest of the MCO team have any concerns with this?

@cgwalters
Member

> Is it assumed that any change that would necessitate a draining of pods would default to simply rebooting?

I think this enhancement was filed with an eye towards bare metal, where rebooting can entail whole minutes in the BIOS. So it may be useful to distinguish between draining/restarting kubelet and rebooting.

Comment on lines 98 to 101
After the MCD writes out a configuration change, it will consult the whitelist
before deciding if a reboot is required. If the filename is present, has a
valid action and any associated data, the action will be performed. Otherwise,
a reboot will be performed as in prior 4.x versions.
Contributor

I think there was a case where the MCD was expected to drain the node even before any configuration change is applied... and here the proposal is about actions after the configuration has been applied, so it seems like "skip drain" is not an action that can be supported? Would love to see details w.r.t. drain before reboot...

Contributor

Another thing: if the configuration change (which usually spans multiple files/units) has conflicting actions, i.e. some files require a reboot and some say to skip the reboot, what should the expected behavior be? Would like to see those details.

Contributor Author

Any pointers to additional information around the drain-before-reboot use case would be great. It would be good to understand the motivation before making any claims about how the two features should interact.

The proposal calls for a whitelist that MCD uses to decide which action to take
after updating a configuration file for the MCO. To prevent the admin from
tampering with the whitelist, it will be included as part of the MCD image as a
static file in yaml format.
Contributor

Is this file going to be versioned, or is there strict compatibility with the current MCD only?

Contributor Author

@beekhof beekhof Jan 30, 2020

Part of the reason for including it with the MCD was to avoid the need for versioning.
But I don't have strong feelings about it.
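To illustrate what a bundled, unversioned whitelist might look like in practice, here is a sketch of loading a static yaml file from the MCD image. The file location, field names, and yaml layout are assumptions made for illustration, not anything the enhancement specifies:

```go
package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v2"
)

// Entry is a hypothetical whitelist record: a file path plus the action
// (and any associated data) the MCD may take instead of rebooting.
type Entry struct {
	Path    string `yaml:"path"`
	Action  string `yaml:"action"`            // e.g. "none" or "restart-unit"
	Unit    string `yaml:"unit,omitempty"`    // systemd unit for "restart-unit"
	Drain   bool   `yaml:"drain,omitempty"`   // drain the node before acting
	Timeout string `yaml:"timeout,omitempty"` // enforced by the caller, not the callee
}

// loadWhitelist reads the static yaml file baked into the MCD image.
// The path is an assumed location, used here only for illustration.
func loadWhitelist(path string) ([]Entry, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var entries []Entry
	if err := yaml.Unmarshal(raw, &entries); err != nil {
		return nil, err
	}
	return entries, nil
}

func main() {
	entries, err := loadWhitelist("/etc/machine-config-daemon/reboot-whitelist.yaml")
	if err != nil {
		fmt.Println("no usable whitelist, defaulting to reboot:", err)
		return
	}
	fmt.Printf("loaded %d whitelist entries\n", len(entries))
}
```

Shipping the file inside the image sidesteps both tampering and versioning: the whitelist is always exactly as old as the MCD that reads it.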

Promote the timeout to a top-level field; timeouts can only be trusted if they are enforced by the caller, not the callee.
Change the kubelet example to show how drain + restart might be achieved.
Allow the ability to drain independently of any other action.
We don't want teams to re-implement their own each time
@beekhof
Contributor Author

beekhof commented Jan 31, 2020

> I think this enhancement was filed with an eye towards bare metal, where rebooting can entail whole minutes in the BIOS.

Spot on.

> So it may be useful to distinguish between draining/restarting kubelet and rebooting.

I've modified the proposal to allow drain to be specified in addition to whatever other action is appropriate. This avoids duplicating functionality for files requiring special combinations.
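A sketch of how that could look when the actions are applied: drain is handled as an independent step before whatever per-file action applies, and anything unrecognised falls back to the prior drain+reboot behaviour. Entry and the helper functions are hypothetical stand-ins, not the MCD's real code:

```go
package main

import "fmt"

// Entry mirrors the hypothetical whitelist record sketched earlier.
type Entry struct {
	Path   string
	Action string // "none", "restart-unit", or anything else => reboot
	Unit   string
	Drain  bool
}

// Stubbed helpers; a real MCD would drain via the API server and restart
// units via systemd rather than printing.
func drainNode() error           { fmt.Println("draining node"); return nil }
func restartUnit(u string) error { fmt.Printf("restarting %s\n", u); return nil }
func rebootNode() error          { fmt.Println("draining and rebooting node"); return nil }

// applyEntry handles drain independently of the per-file action: drain
// first if requested, then run the action, and fall back to a full
// reboot for anything the MCD does not recognise.
func applyEntry(e Entry) error {
	if e.Drain {
		if err := drainNode(); err != nil {
			return err
		}
	}
	switch e.Action {
	case "none":
		return nil
	case "restart-unit":
		return restartUnit(e.Unit)
	default:
		return rebootNode()
	}
}

func main() {
	// Hypothetical kubelet example: drain the node, then restart the unit.
	_ = applyEntry(Entry{
		Path:   "/etc/kubernetes/kubelet.conf",
		Action: "restart-unit",
		Unit:   "kubelet.service",
		Drain:  true,
	})
}
```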

Signed-off-by: Andrew Beekhof <andrew@beekhof.net>
@cgwalters
Member

OK, I only skimmed this update. There was a lot of discussion on this and it's actually yet another of the "important for bare metal, less for cloud" things we have going on (e.g. nmstate, usbguard in the extension system, etc.).

The lack of visibility into which actions are going to result in reboot is going to be very confusing for admins.

I guess I'd still try to implement this differently were I doing it, but in the end this enhancement as written doesn't preclude changing the implementation model (we're not adding new APIs) so I'll give it a 👍 from that PoV.

You really need the MCO team signoff now I think.

@ashcrow
Member

ashcrow commented Jun 23, 2020

cc @runcom

Member

@runcom runcom left a comment

I think the current iteration of this enhancement looks sane as a first step towards avoiding reboots in special cases. I actually like a modification to the MCD that produces a list of actions to take in certain scenarios (we can do that, but the risk of missing a required reboot because we just didn't anticipate it could be high; are we fully OK with that? What do we suggest doing in that case?)
My only worry is, as usual, upgrades, and I wouldn't be surprised to find some behavior that we're relying on today with drain+reboot that could potentially be changed by this.
The enhancement mainly focuses on specific cases, so that should alleviate the above I guess, as we reboot anyway during major upgrades (I'm thinking about network upgrades, as others already pointed out).

So, assuming we’re not doing ansible again and everything is controlled by us (and mcd), I think this looks good as a first step and doesn’t prevent us from future iterations of a rebootless behavior with other mechanisms (without regressing on what we introduce with this).

## Summary

A node's configuration is managed by the MCO. When the configuration is changed
by the system or an admin, the MCO and MCD write a new version of the file to
Member

(nit) The MCD does the write; the MCO just starts the whole process in particular cases (e.g. the network config changes).

Contributor Author

Hopefully the new wording is more precise :)

Contributor

@danwinship danwinship left a comment

This proposal seems to assume that the reader already fully understands what's being proposed.

@@ -0,0 +1,247 @@
---
title: mcd-reboot-dictionary
Contributor

The word "dictionary" does not appear anywhere else in this document. It is unclear what the name "mcd-reboot-dictionary" means.

Contributor Author

Oh, that was the old title. Will look for any other occurrences too

## Proposal

1. Add a function to MCD that compares two MachineConfigs, and outputs an
ordered list of known actions that are needed to apply it
Contributor

How does it know what changes require what actions?

Contributor Author

In the absence of any other information, the function would normally return drain+reboot if there is any difference detected.

The intention is that in very specific cases where an alternative has been proven to be safe (such as drain + service restart for changes that only affect ICSP), the function will be modified to return a different set of actions.

Member

I guess the answer here would be that we're going to base this on which files changed between 2 rendered machine configs and compute the actions based on that - we can add whatever complexity we want here also. I think the concern has been captured here #159 (comment) (addition vs deletion mainly)

Contributor

But I mean, how does it know that modifications to file X require a reboot but modifications to file Y don't? Especially, is this something which is hardcoded into the MCO by us, or is it something configured at runtime by the administrator?

Member

> But I mean, how does it know that modifications to file X require a reboot but modifications to file Y don't?

the MCO only knows that a modification to a file X "doesn't require a reboot" - anything else will fall into "reboot as always"

> Especially, is this something which is hardcoded into the MCO by us, or is it something configured at runtime by the administrator?

The aim is to avoid leaking this choice to the admin so this logic is gonna be computed by us in the MCO (and we drive that for future cases where we might want to avoid a reboot)

### Implementation Details/Notes/Constraints [optional]

How the comparison determines what constitutes a domain, whether by target file
or by configuration element, is left for later discussion.
Contributor

It is not clear what "domain" means here. It seems to mean something much more specific than in, eg, "Domain specific minimisation of config driven cluster reboots"

Contributor Author

Reworded to explain the intended usage/meaning

- Unit tests for the comparison function that cover
- Identical, missing, and invalid MachineConfigs
- Changes to domains for which no exception exists
- Changes to only domains for which a 'do nothing' exception exists
Contributor

There's no previous explanation of "exceptions"

Contributor Author

Added an explanation: the case where a unit of configuration can be applied by a means other than the default drain+reboot behaviour

@beekhof beekhof changed the title from "Create mcd-reboot-dictionary enhancement" to "Domain-specific reboot minimisation enhancement" on Jun 26, 2020
- Retitle to reflect recent discussions and remove references to "dictionary"
- Explain the use of the term "exception" in the "Test Plan" section
- More precisely state the role of the MCO in applying configuration changes
- Explain usage of the term "domain" in "Implementation Details"

Signed-off-by: Andrew Beekhof <andrew@beekhof.net>
@runcom
Member

runcom commented Jun 29, 2020

> I guess I'd still try to implement this differently were I doing it, but in the end this enhancement as written doesn't preclude changing the implementation model (we're not adding new APIs) so I'll give it a 👍 from that PoV.

I think going this way to address user needs, especially on BM, is a good first step and, as @cgwalters pointed out, we can expand on a real API later (that's why we're starting with a minimal set of "things" that won't trigger a reboot). I guess this all boils down to "when do we need this", and it seems the answer is yesterday, so this is still the most practical way to address immediate user needs. If we had time, as always, I agree with Colin that this would have been different and we could have leveraged the API and CRD objects to carefully drive all this.

/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 29, 2020
@ashcrow
Member

ashcrow commented Jun 29, 2020

Before this enhancement merges I'd like to ensure that:

  1. @smarterclayton and/or @eparis are comfortable with this being started soon
  2. This isn't just an interim solution, but iteration towards the real feature future state
  3. The MCO team will not be owning development/bugs for this if it's being handled external to the team -- they have committed feature work themselves already at capacity

/cc @runcom @beekhof @imcleod

@openshift-ci-robot

@ashcrow: GitHub didn't allow me to request PR reviews from the following users: beekhof.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Before this enhancement merges I'd like to ensure that:

  1. @smarterclayton and/or @eparis are comfortable with this being started soon
  2. This isn't just an interim solution, but iteration towards the real feature future state
  3. The MCO team will not be owning development/bugs for this if it's being handled external to the team -- they have committed feature work themselves already at capacity

/cc @runcom @beekhof @imcleod

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ashcrow
Member

ashcrow commented Jun 29, 2020

/cc @marrusl

@openshift-ci-robot

@ashcrow: GitHub didn't allow me to request PR reviews from the following users: marrusl.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @marrusl

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@beekhof
Contributor Author

beekhof commented Jun 30, 2020

@ashcrow all valid concerns, responses in-line

> Before this enhancement merges I'd like to ensure that:

>   1. @smarterclayton and/or @eparis are comfortable with this being started soon

The epic associated with this work (MGMT-300) was part of the 4.6 telco epic readout that included Eric and Derek.
I didn't hear anything that sounded remotely like "don't work on this for 4.6" but I'm following up with Eric and Clayton for an explicit ack that I can reference.

>   2. This isn't just an interim solution, but iteration towards the real feature future state

When I talked to @runcom last night, he described the enhancement as exactly what was in his head but hadn't had the opportunity to write down yet. So I think we're on the path of a long term approach even if some details change down the road.

>   3. The MCO team will not be owning development/bugs for this if it's being handled external to the team -- they have committed feature work themselves already at capacity

I don't like bringing teams problems without the engineering resources to address them, so my team is absolutely volunteering for all the engineering work on this. This obviously includes regression and e2e tests for CI.
We also discussed QE and docs on the call last night.

The only requirement from the MCO team is code reviews once we refine the PoC.

/cc @runcom @beekhof @imcleod

@beekhof
Contributor Author

beekhof commented Jul 10, 2020

@runcom Is there anything still stopping this from being merged?

@runcom
Member

runcom commented Jul 13, 2020

/approve
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 13, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: beekhof, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit d75f480 into openshift:master Jul 13, 2020
sinnykumari added a commit to sinnykumari/machine-config-operator that referenced this pull request Nov 24, 2020
Today the MCO defaults to rebooting the node in order to successfully apply
changes to the node. This is the safest way to apply a change on the
node, as we don't have to worry about things like which changes are safe
to apply without a reboot.

In certain environments, rebooting is expensive and, due to complex
hardware setups, can sometimes cause problems where the node won't boot up.

This will help avoid rebooting nodes in certain cases where
the MCO thinks it is safe to do so.

See - openshift/enhancements#159