Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add kubens.service, drop-ins, and kubensenter prefix to kubelet.service #3274

Merged
merged 2 commits into from
Aug 26, 2022

Conversation

lack
Copy link
Member

@lack lack commented Jul 29, 2022

- What I did

In support of openshift enhancement https://github.com/openshift/enhancements/blob/master/enhancements/hide-container-mountpoints.md

This comes in 2 parts:

  • One prefixes the kubelet.service files so they start via the new 'kubensenter' script.
  • The other adds the new 'kubens.service' and related drop-ins for kubelet and cri-o.

The new 'kubens.service' that drives this new feature is disabled by default, so should be safe to go in as soon as the openshift-hyperkube RPM lands which includes the new 'kubensenter' script required for the 1st part. (openshift/kubernetes#1327)

/hold until openshift-hyperkube arrives with the new kubensenter script.

- How to verify it

This cannot be mergedf until the new kubensenter script arrives via an updated openshift-hyperkube, but enabling of this feature also requires that a CRI-O change (cri-o/cri-o#5974) that will arrive once RHCOS includes CRI-O 1.25.

Once both prerequisites are in place, this feature should be verified as follows:

  • Initially this must have NO EFFECT unless enabled. Systemd, kubelet, and cri-o are all in the same mount namespace as usual:
[root@node ~]# readlink /proc/1/ns/mnt
mnt:[4026531840]
[root@node ~]# readlink /proc/$(pgrep kubelet)/ns/mnt
mnt:[4026531840]
[root@node ~]# readlink /proc/$(pgrep crio)/ns/mnt
mnt:[4026531840]

Note all 3 are identical

  • Enabling the feature can be done by injecting a single MachineConfig object that enables the new kubens.service systemd service. If this is done, both kubelet and CRI-O should be in a different namespace that systemd, but the system should otherwise behave normally:

With the feature enabled:

[root@node ~]# readlink /proc/1/ns/mnt
mnt:[4026531840]
[root@node ~]# readlink /proc/$(pgrep kubelet)/ns/mnt
mnt:[4026534696]
[root@node ~]# readlink /proc/$(pgrep crio)/ns/mnt
mnt:[4026534696]

Note that kubelet and crio are identical, but unique from pid 1

- Description for the changelog

Adds a new optional systemd service called kubens.service which can be enabled to segregate all Kubernetes-specific mount points into a new mount namespace separate from the host OS and systemd.

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jul 29, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 29, 2022

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@lack lack marked this pull request as ready for review July 29, 2022 20:26
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 29, 2022
@openshift-ci openshift-ci bot requested a review from jkyros July 29, 2022 20:28
@lack
Copy link
Member Author

lack commented Jul 30, 2022

I expect these tests to fail until we get builds with the updated openshift-hyperkube RPM.

@lack
Copy link
Member Author

lack commented Aug 1, 2022

/retest

@lack
Copy link
Member Author

lack commented Aug 1, 2022

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 1, 2022
@lack
Copy link
Member Author

lack commented Aug 1, 2022

/retest

@lack
Copy link
Member Author

lack commented Aug 2, 2022

/retest-required

1 similar comment
@lack
Copy link
Member Author

lack commented Aug 4, 2022

/retest-required

@lack lack force-pushed the kubens-service-combined branch from e9cd9bf to f1d6fd0 Compare August 5, 2022 19:25
@sinnykumari
Copy link
Contributor

This PR has changes more specific to kubelet and crio, so will be good to get reviewed by node team first to ensure that it doesn't break anything.

Looking briefly at the changes, don't see any concern from MCO side. Will do another pass once LGTM'ed from node team side.

/assign @rphillips @haircommander

@sinnykumari
Copy link
Contributor

/retest

@haircommander
Copy link
Member

it may benefit us to put some work into making sure https://bugzilla.redhat.com/show_bug.cgi?id=2057618 is not due to the mount namespace before going forward with this. it's been assigned to me but I've not been able to give it enough attention

@lack
Copy link
Member Author

lack commented Aug 11, 2022

it may benefit us to put some work into making sure https://bugzilla.redhat.com/show_bug.cgi?id=2057618 is not due to the mount namespace before going forward with this. it's been assigned to me but I've not been able to give it enough attention

I can take a peek at it, to see if I can find it it's related to this feature.

@lack
Copy link
Member Author

lack commented Aug 17, 2022

FYI: CRI-O 1.25 has now landed in RHCOS as of Build 412.86.202208032219-0

This means that it should now be possible to spin up a cluster with that RHCOS and this PR and do actual end-to-end testing with the resulting system by enabling the 'kubens.service' in systemd.

@lack
Copy link
Member Author

lack commented Aug 17, 2022

/retest

@haircommander
Copy link
Member

/approve

@lack did some investigation trying to see if the bug I attached could be related. Because he did not find anything, and my hunch is only that, I am okay moving forward with this

@lack lack force-pushed the kubens-service-combined branch from f1d6fd0 to e4e3ca6 Compare August 17, 2022 19:13
@lack
Copy link
Member Author

lack commented Aug 18, 2022

I have done some preliminary testing via ClusterBot, building this PR. I have verified:

  • Without any action taken, the cluster comes up fine, and systemd/CRI-O/Kubelet are all in the same namespace
sh-4.4# readlink /proc/1/ns/mnt
mnt:[4026531840]
sh-4.4# readlink /proc/$(pgrep kubelet)/ns/mnt
mnt:[4026531840]
sh-4.4# readlink /proc/$(pgrep crio)/ns/mnt
mnt:[4026531840]
  • When I add a MachineConfig to enable kubens.service, the cluster comes up fine, but now with CRI-O and Kubelet in their own mount namespace separate from systemd:
sh-4.4# readlink /proc/1/ns/mnt
mnt:[4026531840]
sh-4.4# readlink /proc/$(pgrep kubelet)/ns/mnt
mnt:[4026532278]
sh-4.4# readlink /proc/$(pgrep crio)/ns/mnt
mnt:[4026532278]

@rphillips
Copy link
Contributor

/retest
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 18, 2022
@lack
Copy link
Member Author

lack commented Aug 23, 2022

Side note: pretty sure this would have entirely prevented https://bugzilla.redhat.com/show_bug.cgi?id=2111817

Sounds likely! This is why telco likes it especially for SNO - systemd CPU usage goes way down! And why we want it for everyone; who doesn't want a couple more CPU cycles available for workloads?

@lack lack force-pushed the kubens-service-combined branch from cd543c8 to 50f0bff Compare August 23, 2022 16:18
@lack
Copy link
Member Author

lack commented Aug 23, 2022

/hold until openshift/kubernetes#1350 goes in so we can remove the NotifyAccess=all line from kubelet.service

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 23, 2022
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 25, 2022
lack added 2 commits August 25, 2022 11:03
This imports the systemd components from the vendored
github.com/containers/kubensmnt and keeps them in-sync via `make update`
and `make verify` (thanks to the new `hack/update-templates.sh` script)

Signed-off-by: Jim Ramsay <jramsay@redhat.com>
To be upgrade-compatible with RHEL workers (where the MC may take effect
before the openshift-hyperkube RPM is upgraded, which is what installs
the new 'kubensenter' script), as well as to be backwards compatible
with older openshift-hyperkube RPMs or other potential edge cases, we
wrap kubelet's execution in a temporary wrapper script which will
execute kubebnsenter if it is available on the system as the service
starts up.

Signed-off-by: Jim Ramsay <jramsay@redhat.com>
@lack lack force-pushed the kubens-service-combined branch from 50f0bff to d8f42ae Compare August 25, 2022 15:04
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 25, 2022
@lack
Copy link
Member Author

lack commented Aug 25, 2022

The CI checks are going to fail until openshift/kubernetes#1350 goes in and the change makes its way into an RHCOS nightly. Once that's done I'll remove the hold and retest, and we should be good to go!

@lack
Copy link
Member Author

lack commented Aug 26, 2022

/unhold
/retest-required

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 26, 2022
@lack
Copy link
Member Author

lack commented Aug 26, 2022

/retest

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 26, 2022

@lack: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-openstack d8f42ae link false /test e2e-openstack

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@@ -0,0 +1,18 @@
# Autogenerated by hack/update-templates.sh; do not edit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this being here mean we'll have the kubens enabled by default? is that intended?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This drop-in does nothing except set the environment variable that will be laid down by kubens.service if it's running, so it's safe fro this to be present all the time.

The kubens.service itself is not being enabled by default (see https://github.com/openshift/machine-config-operator/pull/3274/files/d8f42ae7a8e66970a5fcbdaac6fd8712d1466ba7#diff-c38a0be931e5bc991f959879c6af89aee96be94967b4544f63ace1d52e5f88ccR3)

# Note: This compatibility wrapper is needed to bridge the gap from OCP 4.11->4.12 when the new 'kubensenter' script is being introduced.
# It can be removed (and the kubelet.service should call kubensenter directly) when 4.11 is no longer a release we must upgrade from.
mode: 0755
path: "/usr/local/bin/kubenswrapper"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why /usr/local instead of /usr?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignition can't write to /usr. See #3137

@haircommander
Copy link
Member

LGTM

@cgwalters
Copy link
Member

From my PoV
/approve
Though one more question - is a machineconfig enabling this going to be added to one of the e2e tests or periodics?

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 26, 2022
@sakhoury
Copy link

From my PoV /approve Though one more question - is a machineconfig enabling this going to be added to one of the e2e tests or periodics?

@cgwalters it is tested in an e2e periodic openshift/release#31715

@haircommander
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 26, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 26, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, haircommander, lack, rphillips

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 6704eef into openshift:master Aug 26, 2022
@lack lack deleted the kubens-service-combined branch August 26, 2022 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants