
[WIP] libct/cg/sd/v1: Set: don't leave cgroup frozen #3072

Closed
wants to merge 2 commits into from

Conversation

Contributor

@kolyshkin kolyshkin commented Jul 7, 2021

This is the alternative to #3065 with a different approach and the same test case.

I. In case a parent cgroup is frozen, the current code reads the cgroup
state (into targetFreezerState) as FROZEN, and sets it explicitly after
setting unit properties. This is obviously wrong, as we should not
explicitly freeze the cgroup because its parent was frozen.

The issue happens because:

  1. The m.GetFreezerState method does not distinguish between
    the "self frozen" and "parent frozen" states (i.e. it does not
    consult freezer.self_freezing).

  2. The m.Freeze method changes m.cgroups.Resources.Frozen field.

II. The current code does freeze/thaw unconditionally, meaning it
freezes the already frozen cgroup (which is a no-op except it
still requires some reads and writes to freezer.state), and
thaws the cgroup which is about to be frozen.

Solve all this by figuring out whether we need to freeze and thaw,
and introducing and using a method which does not change value of
m.cgroups.Resources.Frozen.

Add a test case (initially developed in #3065 (comment), and tested to fail without the fix in #3066).

NOTE the current code assumes that a container that is currently frozen
will remain frozen for the duration of SetUnitProperties. This might not
be true, but there is no way to guarantee it (even if we freeze the
cgroup ourselves).

kolyshkin and others added 2 commits July 7, 2021 15:35
Reported-by: Odin Ugedal <odin@uged.al>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Initial test work was done by Kir Kolyshkin (@kolyshkin).

Co-Authored-By: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Odin Ugedal <odin@uged.al>
	if targetFreezerState == configs.Undefined {
		targetFreezerState = configs.Thawed
	needsFreeze := true
	if freezerState == configs.Frozen {
Contributor

This could result in a race between the freezing of the current cgroup and one of its ancestors.

If one ancestor is FROZEN, this will not freeze the control group. If the parent is thawed before systemd does its device update sequence, things will not work as expected.

Contributor

This does also not fix the difference between cgroup v1 and v2.

If the parent control group is frozen, and the control group itself is not, GetFreezerState will return FROZEN on v1 and THAWED on v2. I don't think the naming or definition is important, other than the fact that it works the same way on both.

Contributor

Does that make sense to you, @kolyshkin and @cyphar?

Contributor

@odinuge odinuge Jul 8, 2021

But overall, I agree that always thawing the cgroup is a valid way to do it, if people want to freeze it.

However, doing:

	runc pause container
	runc update container
	# Container will now still be paused
	runc resume container

now works as expected, but with your PR it will just resume the container on update, won't it?

edit: that should work as expected

Contributor Author

This could result in a race between the freezing of the current cgroup and one of its ancestors.

You are right, and I anticipated that; it is noted in the commit and PR description.

		t.Fatalf("expected container cgroup path %q to be under pod cgroup path %q",
			cm.Path("freezer"), pm.Path("freezer"))
	}

Contributor

Depending on the definition of GetFreezerState, we should do a check of its output here as well, to make sure it works the same on cgroup v1 and v2.

	cf, err := cm.GetFreezerState()
	if err != nil {
		t.Fatal(err)
	}
	if cf != configs.Thawed {
		t.Fatalf("expected container to be thawed, got %v", cf)
	}

Contributor Author

Good catch; will add.

@kolyshkin
Contributor Author

Closing in favor of #3080 which carries this one.

@kolyshkin kolyshkin closed this Jul 9, 2021