
Domain-specific reboot minimisation enhancement #159

Merged
merged 19 commits into from
Jul 13, 2020

Conversation

beekhof
Contributor

@beekhof beekhof commented Dec 18, 2019

No description provided.

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 18, 2019
@beekhof
Contributor Author

beekhof commented Dec 18, 2019

/assign @eparis

@cgwalters
Member

AFAICS none of the MCO developers are CC'd on this enhancement?

See also openshift/machine-config-operator#650

I think rather than phrasing this like "dictionary of file paths", which just sounds odd to me, we should say:

First, the MCD will gain support for applying at least one type of change without rebooting; an excellent target example would be the kubelet maxPods (we may need to restart kubelet, but not the host).

Second, we attempt to generalize and expand that infrastructure.

There are a lot of things that need to be plumbed through in the MCO to handle this at all - most likely we need to expand our "diff MCs" code so we can filter down to targeted changes better.

One thing I mused about that may help is if we don't try to inspect file paths at all, but only handle the case where a high level CRD like kubeletconfig changes, and reverse-engineer that from the generated MC. Then, responsibility for "live apply" is delegated to the high level controllers. IOW it'd be the kubeletconfigcontroller that knows how to live apply max pods.
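For concreteness, here is a minimal sketch (in Go, since the MCO is written in Go) of the kind of "diff two rendered MachineConfigs and decide what to do" helper being discussed. The function name, the action set, and the example paths are hypothetical illustrations, not part of the enhancement or the MCO codebase:

```go
package main

import "fmt"

// Action describes how the MCD could react to a configuration diff.
// Larger values are more disruptive.
type Action int

const (
	ActionNone           Action = iota // no follow-up needed
	ActionRestartKubelet               // restart the kubelet unit, keep the host up
	ActionReboot                       // default: drain and reboot the host
)

// pathActions is a hypothetical, hard-coded table of file paths whose
// changes are known to be safe to apply without a full reboot.
var pathActions = map[string]Action{
	"/etc/kubernetes/kubelet.conf": ActionRestartKubelet, // hypothetical entry
}

// requiredAction looks at the files that differ between two rendered
// MachineConfigs and returns the most disruptive action needed,
// falling back to a reboot for anything it does not recognise.
func requiredAction(changedPaths []string) Action {
	worst := ActionNone
	for _, p := range changedPaths {
		a, known := pathActions[p]
		if !known {
			return ActionReboot // unknown file: keep the safe default
		}
		if a > worst {
			worst = a
		}
	}
	return worst
}

func main() {
	fmt.Println(requiredAction([]string{"/etc/kubernetes/kubelet.conf"})) // 1 (restart kubelet)
	fmt.Println(requiredAction([]string{"/etc/chrony.conf"}))             // 2 (reboot)
}
```

The conservative default here (any unrecognised change falls back to drain+reboot) is the behaviour the thread converges on below.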

@beekhof
Contributor Author

beekhof commented Jan 6, 2020

Thanks for the feedback @cgwalters

> AFAICS none of the MCO developers are CC'd on this enhancement?

Odd. I definitely meant to. @runcom has been added

> See also openshift/machine-config-operator#650

> I think rather than phrasing this like "dictionary of file paths", which just sounds odd to me,

Would "lookup table" be any better?

> we should say:
>
> First, the MCD will gain support for applying at least one type of change without rebooting; an excellent target example would be the kubelet maxPods (we may need to restart kubelet, but not the host).
>
> Second, we attempt to generalize and expand that infrastructure.
>
> There are a lot of things that need to be plumbed through in the MCO to handle this at all - most likely we need to expand our "diff MCs" code so we can filter down to targeted changes better.

That's an interesting alternative to consider.

The intention from Alex was that this would be a minimally invasive approach that could be implemented in the short term to meet telco requirements. He envisioned a more expressive solution in the longer term that defined a higher-level syntax that solved the coming Ignition format problem and also, somehow, indicated the requirements for various fields. This sounds broadly in line with your other idea.

Expanding the diff code seems to fall somewhere in the middle.
Should we discuss the two approaches here or in the enhancement?

Call out that this is a short-term enabler that should not preclude a more 
comprehensive approach.
@sdodson
Member

sdodson commented Jan 6, 2020

Is it assumed that any change that would necessitate a draining of pods would default to simply rebooting? If we stick with file paths, it seems like we'd want a review process around approving the addition of those paths to the dictionary, i.e.: my change to this file may be safe without a reboot, but Alice's change to the same file requires a reboot, therefore we should not add the file.

@cgwalters
Member

Would "lookup table" be any better?

I think the enhancement should be specified at a high level; our goal is to support changes without reboots. "dictionary" and "lookup table" are more potential implementation details.

@beekhof
Contributor Author

beekhof commented Jan 6, 2020

> Is it assumed that any change that would necessitate a draining of pods would default to simply rebooting? If we stick with file paths, it seems like we'd want a review process around approving the addition of those paths to the dictionary, i.e.: my change to this file may be safe without a reboot, but Alice's change to the same file requires a reboot, therefore we should not add the file.

Agreed. If any exposed part of the file requires a reboot, then any change to the file requires a reboot. Opening that particular can of worms is a much larger piece of work and is unlikely to be possible in the required timeframe.

Also agreed that we need to be extremely cautious about which paths get added to the list.

@beekhof
Contributor Author

beekhof commented Jan 6, 2020

Would "lookup table" be any better?

I think the enhancement should be specified at a high level; our goal is to support changes without reboots. "dictionary" and "lookup table" are more potential implementation details.

I would amend the goal to be "support some changes without reboots ASAP", hence the simplifying assumption of dealing with file paths instead of individual options.

I will see if I can find a better balance in the PR to write it at a high level but also keep it focused on low-hanging fruit rather than a bigger and more comprehensive solution (I am assuming that would be a follow-up PR that would in part stem from the experience we gain from this initial implementation).

@crawford
Contributor

This looks fine to me. @runcom do you or the rest of the MCO team have any concerns with this?

@cgwalters
Member

> Is it assumed that any change that would necessitate a draining of pods would default to simply rebooting?

I think this enhancement was filed with an eye towards bare metal, where rebooting can entail whole minutes in the BIOS. So it may be useful to distinguish between draining/restarting kubelet and rebooting.

Comment on lines 98 to 101
After the MCD writes out a configuration change, it will consult the whitelist
before deciding if a reboot is required. If the filename is present, has a
valid action and any associated data, the action will be performed. Otherwise,
a reboot will be performed as in prior 4.x versions.
Contributor

I think there was a case where the MCD was expected to drain the node even before any configuration change is applied... and here the proposal is about actions after the configuration has been applied, so it seems like "skip drain" is not an action that can be supported? Would love to see details w.r.t. drain before reboot...

Contributor

Another thing: if the configuration change (which usually spans multiple files/units) has conflicting actions, i.e. some files require a reboot and some say to skip the reboot, what should the expected behavior be? Would like to see those details.

Contributor Author

Any pointers to additional information around the drain-before-reboot use case would be great. It would be good to understand the motivation before making any claims about how the two features should interact.

The proposal calls for a whitelist that MCD uses to decide which action to take
after updating a configuration file for the MCO. To prevent the admin from
tampering with the whitelist, it will be included as part of the MCD image as a
static file in yaml format.
Contributor

Is this file going to be versioned, or is there strict compatibility with the current MCD only?

Contributor Author

@beekhof beekhof Jan 30, 2020

Part of the reason for including it with the MCD was to avoid the need for versioning.
But I don't have strong feelings about it.
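To illustrate what a bundled, unversioned whitelist might look like in practice, here is a sketch of loading a static yaml file from the MCD image. The file location, field names, and yaml layout are assumptions made for illustration, not anything the enhancement specifies:

```go
package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v2"
)

// Entry is a hypothetical whitelist record: a file path plus the action
// (and any associated data) the MCD may take instead of rebooting.
type Entry struct {
	Path    string `yaml:"path"`
	Action  string `yaml:"action"`            // e.g. "none" or "restart-unit"
	Unit    string `yaml:"unit,omitempty"`    // systemd unit for "restart-unit"
	Drain   bool   `yaml:"drain,omitempty"`   // drain the node before acting
	Timeout string `yaml:"timeout,omitempty"` // enforced by the caller, not the callee
}

// loadWhitelist reads the static yaml file baked into the MCD image.
// The path is an assumed location, used here only for illustration.
func loadWhitelist(path string) ([]Entry, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var entries []Entry
	if err := yaml.Unmarshal(raw, &entries); err != nil {
		return nil, err
	}
	return entries, nil
}

func main() {
	entries, err := loadWhitelist("/etc/machine-config-daemon/reboot-whitelist.yaml")
	if err != nil {
		fmt.Println("no usable whitelist, defaulting to reboot:", err)
		return
	}
	fmt.Printf("loaded %d whitelist entries\n", len(entries))
}
```

Shipping the file inside the image sidesteps both tampering and versioning: the whitelist is always exactly as old as the MCD that reads it.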

Promote the timeout to a top-level field; timeouts can only be trusted if they are enforced by the caller, not the callee.
Change the kubelet example to show how drain + restart might be achieved.
Allow the ability to drain independently of any other action.
We don't want teams to re-implement their own each time
@beekhof
Contributor Author

beekhof commented Jan 31, 2020

> I think this enhancement was filed with an eye towards bare metal, where rebooting can entail whole minutes in the BIOS.

Spot on.

> So it may be useful to distinguish between draining/restarting kubelet and rebooting.

I've modified the proposal to allow drain to be specified in addition to whatever other action is appropriate. This avoids duplicating functionality for files requiring special combinations.
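A sketch of how that could look when the actions are applied: drain is handled as an independent step before whatever per-file action applies, and anything unrecognised falls back to the prior drain+reboot behaviour. Entry and the helper functions are hypothetical stand-ins, not the MCD's real code:

```go
package main

import "fmt"

// Entry mirrors the hypothetical whitelist record sketched earlier.
type Entry struct {
	Path   string
	Action string // "none", "restart-unit", or anything else => reboot
	Unit   string
	Drain  bool
}

// Stubbed helpers; a real MCD would drain via the API server and restart
// units via systemd rather than printing.
func drainNode() error           { fmt.Println("draining node"); return nil }
func restartUnit(u string) error { fmt.Printf("restarting %s\n", u); return nil }
func rebootNode() error          { fmt.Println("draining and rebooting node"); return nil }

// applyEntry handles drain independently of the per-file action: drain
// first if requested, then run the action, and fall back to a full
// reboot for anything the MCD does not recognise.
func applyEntry(e Entry) error {
	if e.Drain {
		if err := drainNode(); err != nil {
			return err
		}
	}
	switch e.Action {
	case "none":
		return nil
	case "restart-unit":
		return restartUnit(e.Unit)
	default:
		return rebootNode()
	}
}

func main() {
	// Hypothetical kubelet example: drain the node, then restart the unit.
	_ = applyEntry(Entry{
		Path:   "/etc/kubernetes/kubelet.conf",
		Action: "restart-unit",
		Unit:   "kubelet.service",
		Drain:  true,
	})
}
```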

Signed-off-by: Andrew Beekhof <andrew@beekhof.net>
@cgwalters
Member

OK, I only skimmed this update. There was a lot of discussion on this and it's actually yet another of the "important for bare metal, less for cloud" things we have going on (e.g. nmstate, usbguard in the extension system, etc.).

The lack of visibility into which actions are going to result in reboot is going to be very confusing for admins.

I guess I'd still try to implement this differently were I doing it, but in the end this enhancement as written doesn't preclude changing the implementation model (we're not adding new APIs) so I'll give it a 👍 from that PoV.

You really need the MCO team signoff now I think.

@ashcrow
Member

ashcrow commented Jun 23, 2020

cc @runcom

Member

@runcom runcom left a comment

I think the current iteration of this enhancement looks sane as a first step towards avoiding reboots in special cases. I actually like a modification to the MCD that produces a list of actions to take in certain scenarios (we can do that, but the risk of missing a required reboot because we just didn't anticipate it could be high; are we fully OK with that? What do we suggest doing in that case?)
My only worry is, as usual, upgrades, and I wouldn't be surprised to find some behavior that we're relying on today with drain+reboot that could potentially be changed by this.
The enhancement mainly focuses on specific cases, so that should alleviate the above I guess, as we reboot anyway during major upgrades (I'm thinking about network upgrades, as others already pointed out).

So, assuming we’re not doing ansible again and everything is controlled by us (and mcd), I think this looks good as a first step and doesn’t prevent us from future iterations of a rebootless behavior with other mechanisms (without regressing on what we introduce with this).

## Summary

A node's configuration is managed by the MCO. When the configuration is changed
by the system or an admin, the MCO and MCD write a new version of the file to
Member

(nit) The MCD does the write; the MCO just starts the whole process in particular cases (e.g. the network config changes).

Contributor Author

Hopefully the new wording is more precise :)

Contributor

@danwinship danwinship left a comment

This proposal seems to assume that the reader already fully understands what's being proposed.

@@ -0,0 +1,247 @@
---
title: mcd-reboot-dictionary
Contributor

The word "dictionary" does not appear anywhere else in this document. It is unclear what the name "mcd-reboot-dictionary" means.

Contributor Author

Oh, that was the old title. Will look for any other occurrences too

## Proposal

1. Add a function to MCD that compares two MachineConfigs, and outputs an
ordered list of known actions that are needed to apply it
Contributor

How does it know what changes require what actions?

Contributor Author

In the absence of any other information, the function would normally return drain+reboot if there is any difference detected.

The intention is that in very specific cases where an alternative has been proven to be safe (such as drain + service restart for changes that only affect ICSP), the function will be modified to return a different set of actions.

Member

I guess the answer here would be that we're going to base this on which files changed between 2 rendered machine configs and compute the actions based on that - we can add whatever complexity we want here also. I think the concern has been captured here #159 (comment) (addition vs deletion mainly)

Contributor

But I mean, how does it know that modifications to file X require a reboot but modifications to file Y don't? Especially, is this something which is hardcoded into the MCO by us, or is it something configured at runtime by the administrator?

Member

> But I mean, how does it know that modifications to file X require a reboot but modifications to file Y don't?

the MCO only knows that a modification to a file X "doesn't require a reboot" - anything else will fall into "reboot as always"

> Especially, is this something which is hardcoded into the MCO by us, or is it something configured at runtime by the administrator?

The aim is to avoid leaking this choice to the admin so this logic is gonna be computed by us in the MCO (and we drive that for future cases where we might want to avoid a reboot)

### Implementation Details/Notes/Constraints [optional]

How the comparison determines what constitutes a domain, whether by target file
or by configuration element, is left for later discussion.
Contributor

It is not clear what "domain" means here. It seems to mean something much more specific than in, eg, "Domain specific minimisation of config driven cluster reboots"

Contributor Author

Reworded to explain the intended usage/meaning

- Unit tests for the comparison function that cover
- Identical, missing, and invalid MachineConfigs
- Changes to domains for which no exception exists
- Changes to only domains for which a 'do nothing' exception exists
Contributor

There's no previous explanation of "exceptions"

Contributor Author

Added an explanation: the case where a unit of configuration can be applied by a means other than the default drain+reboot behaviour

@beekhof beekhof changed the title from "Create mcd-reboot-dictionary enhancement" to "Domain-specific reboot minimisation enhancement" on Jun 26, 2020
- Retitle to reflect recent discussions and remove references to "dictionary"
- Explain the use of the term "exception" in the "Test Plan" section
- More precisely state the role of the MCO in applying configuration changes
- Explain usage of the term "domain" in "Implementation Details"

Signed-off-by: Andrew Beekhof <andrew@beekhof.net>
@runcom
Member

runcom commented Jun 29, 2020

> I guess I'd still try to implement this differently were I doing it, but in the end this enhancement as written doesn't preclude changing the implementation model (we're not adding new APIs) so I'll give it a 👍 from that PoV.

I think going this way to address user needs, especially on BM, is a good first step and, as @cgwalters pointed out, we can expand on a real API later (that's why we're starting with a minimal set of "things" that won't trigger a reboot). I guess this all boils down to "when do we need this", and it seems the answer is yesterday, so this is still the most practical way to address immediate user needs. If we had time, as always, I agree with Colin that this would have been different and we could have leveraged the API and CRD objects to carefully drive all this.

/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 29, 2020
@ashcrow
Member

ashcrow commented Jun 29, 2020

Before this enhancement merges I'd like to ensure that:

  1. @smarterclayton and/or @eparis are comfortable with this being started soon
  2. This isn't just an interim solution, but iteration towards the real feature future state
  3. The MCO team will not be owning development/bugs for this if it's being handled external to the team -- they have committed feature work themselves already at capacity

/cc @runcom @beekhof @imcleod

@openshift-ci-robot

@ashcrow: GitHub didn't allow me to request PR reviews from the following users: beekhof.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Before this enhancement merges I'd like to ensure that:

  1. @smarterclayton and/or @eparis are comfortable with this being started soon
  2. This isn't just an interim solution, but iteration towards the real feature future state
  3. The MCO team will not be owning development/bugs for this if it's being handled external to the team -- they have committed feature work themselves already at capacity

/cc @runcom @beekhof @imcleod

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ashcrow
Member

ashcrow commented Jun 29, 2020

/cc @marrusl

@openshift-ci-robot

@ashcrow: GitHub didn't allow me to request PR reviews from the following users: marrusl.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @marrusl

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@beekhof
Contributor Author

beekhof commented Jun 30, 2020

@ashcrow all valid concerns, responses in-line

> Before this enhancement merges I'd like to ensure that:

>   1. @smarterclayton and/or @eparis are comfortable with this being started soon

The epic associated with this work (MGMT-300) was part of the 4.6 telco epic readout that included Eric and Derek.
I didn't hear anything that sounded remotely like "don't work on this for 4.6" but I'm following up with Eric and Clayton for an explicit ack that I can reference.

>   2. This isn't just an interim solution, but iteration towards the real feature future state

When I talked to @runcom last night, he described the enhancement as exactly what was in his head but hadn't had the opportunity to write down yet. So I think we're on the path of a long term approach even if some details change down the road.

>   3. The MCO team will not be owning development/bugs for this if it's being handled external to the team -- they have committed feature work themselves already at capacity

I don't like bringing teams problems without the engineering resources to address them, so my team is absolutely volunteering for all the engineering work on this. This obviously includes regression and e2e tests for CI.
We also discussed QE and docs on the call last night.

The only requirement from the MCO team is code reviews once we refine the PoC.

/cc @runcom @beekhof @imcleod

@beekhof
Contributor Author

beekhof commented Jul 10, 2020

@runcom Is there anything still stopping this from being merged?

@runcom
Member

runcom commented Jul 13, 2020

/approve
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 13, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: beekhof, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit d75f480 into openshift:master Jul 13, 2020
sinnykumari added a commit to sinnykumari/machine-config-operator that referenced this pull request Nov 24, 2020
Today the MCO defaults to rebooting the node in order to successfully apply
changes to the node. This is the safest way to apply a change on the
node, as we don't have to worry about things like which changes are safe
to apply without a reboot.

In certain environments, rebooting is expensive and, due to complex
hardware setups, can sometimes cause problems where the node won't boot up.

This will help avoid rebooting nodes in certain cases where
the MCO thinks it is safe to do so.

See - openshift/enhancements#159