Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libct/cg/sd/v1: fix freezeBeforeSet (alt 2) #3166

Merged

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Aug 16, 2021

This is a combo of #3143, #3151, and #3161, which aims to be both fast and correct for all use cases.

From #3143 -- fixes to getting and checking device properties;
From #3151 -- fast(er) path in freezeBeforeSet (in case devices.list is empty we can skip getting unit properties);
From #3161 -- SkipFreezeOnSet knob for kubernetes.
From #3148 -- test cases (those are also cherry-picked in #3143)

Aimed to fix kubernetes/kubernetes#104280 (once we backport, release 1.0.2, and vendor it to kubernetes)

1.0 backport: #3167

@mrunalp
Copy link
Contributor

mrunalp commented Aug 17, 2021

Can we also add the knob on top of this? :)

@kolyshkin kolyshkin force-pushed the fix-freeze-before-set-alt-2 branch 2 times, most recently from dec9ced to a5c822a Compare August 17, 2021 00:41
@kolyshkin
Copy link
Contributor Author

Can we also add the knob on top of this? :)

Done.

I can't say I am the biggest fan of this complex monster we have created, but this seems to cover all the use cases in the best known way.

@kolyshkin kolyshkin force-pushed the fix-freeze-before-set-alt-2 branch from a5c822a to f2d67bc Compare August 17, 2021 00:53
@kolyshkin kolyshkin requested a review from cyphar August 17, 2021 01:59
@mrunalp
Copy link
Contributor

mrunalp commented Aug 17, 2021

I am a 👍 on this approach. @odinuge @AkihiroSuda ptal

@kolyshkin kolyshkin added area/cgroupv1 area/systemd backport/1.0-todo A PR in main branch which needs to be backported to release-1.0 kind/bug labels Aug 17, 2021
@odinuge
Copy link
Contributor

odinuge commented Aug 17, 2021

Yeah, I think this approach is ok overall. 👍

mrunalp
mrunalp previously approved these changes Aug 17, 2021
// hasDeviceRules checks cgroup device rules. Returns true if any rules
// other than "allow all" exist. Used by freezeBeforeSet.
func (m *legacyManager) hasDeviceRules() bool {
const allowAll = "a *:* rwm\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a deny-list rule set, this won't be able to detect it. Unfortunately there isn't a nice solution for this really (there's no real way to figure out what rules are denied if a cgroup is in default-allow mode). If this shortcut really is required, then okay (though this behaviour does at least deserve some kind of comment) but I'd prefer if we didn't have to do this if at all possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a deny-list rule set, this won't be able to detect it.

Are you talking about rules from the spec? Note that this code is only used when we have SkipDevices set.

A long explanation of this commit is below.

If all three conditions below is true

  1. we are setting properties of an existing systemd unit,
  2. we have SkipDevices set (e.g. no device rules to be applied)
  3. devices.list file of unit's cgroup contains nothing but a *:* rwm plus a newline,

then systemd will not be writing any device rules in a manner resulting in spurious "access denied" errors.

This is so because (2) means we are not setting any device rules, and (1) and (3) means no rules were ever set before for this systemd unit. This also relies on assumption that systemd properties are in sync with the cgroup state.

Now, if the above (1) && (2) && (3) is not true, we fall back to the slow path of checking systemd unit properties. This happens e.g. if a parent cgroup has some device access restrictions.

Now, I am more than ready to drop this shortcut, especially since we have SkipFreezeOnSet flag which is a "shorter shortcut" (if I can say so), but I am just trying to understand what am I missing here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit removed; PTAL @cyphar

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so because (2) means we are not setting any device rules, and (1) and (3) means no rules were ever set before for this systemd unit. This also relies on assumption that systemd properties are in sync with the cgroup state.

The issue is that (3) doesn't necessarily mean that no rules were set.

If you put a cgroup into "deny-list mode" (in other words, you write a *:* rwm to devices.allow and then rules to devices.deny) then devices.list always shows a *:* rwm even if you add deny rules to the cgroup. This is a pretty big anti-feature in cgroupv1 and is one of the reasons there's quite a few hacks in the devicesemulator code to deal with the fact that you can't tell whether a cgroup actually has no rules set if devices.list says the rule set is a *:* rwm.

cyphar
cyphar previously requested changes Aug 18, 2021
Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meant to make the last review a "request changes" one

odinuge and others added 3 commits August 18, 2021 12:43
This fixes the behavior intended to avoid freezing containers/control
groups without it being necessary. This is important for end users of
libcontainer who rely on the behavior of no freeze.

The previous implementation would always get error trying to get
DevicePolicy from the Unit via dbus, since the Unit interface doesn't
contain DevicePolicy.

Signed-off-by: Odin Ugedal <odin@uged.al>
Add a test for freezeBeforeSet, checking various scenarios including
those that were failing before the fix in the previous commit.

[v2: add more cases, add a check before creating a unit.]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This is helpful to kubernetes in cases it knows for sure that the freeze
is not required (since it created the systemd unit with no device
restrictions).

As the code is trivial, no tests are required.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin kolyshkin requested a review from cyphar August 18, 2021 23:19
@kolyshkin kolyshkin added this to the 1.1.0 milestone Aug 18, 2021
@kolyshkin kolyshkin merged commit 3023e6c into opencontainers:master Aug 19, 2021
@cyphar
Copy link
Member

cyphar commented Aug 20, 2021

(Delayed LGTM.)

@kolyshkin kolyshkin added backport/1.0-done A PR in main branch which has been backported to release-1.0 and removed backport/1.0-todo A PR in main branch which needs to be backported to release-1.0 labels Nov 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cgroupv1 area/go-api libcontainer Go API area/systemd backport/1.0-done A PR in main branch which has been backported to release-1.0 kind/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Kubelet regularly freeze control groups causing issues further down
5 participants