cgroup: fix --manage-cgroups=ignore #1800

adrianreber · 2022-04-03T16:55:33Z

Using --manage-cgroups=ignore was still trying to restore cgroup properties at certain places.

Restoring a OCI container in a different cgroup (than the one it was checkpointed from) resulted in CRIU trying to put the container in the original cgroup but the container runtime (runc, crun) was putting it another cgroup. A second checkpoint of the container with two different cgroup settings did no longer work.

With this change CRIU really completely ignores the cgroup and the container can be checkpointed and restored multiple times even if the cgroup changes.

This PR includes a test to verify --manage-cgroups=ignore' works as expected.

codecov-commenter · 2022-04-03T17:16:07Z

Codecov Report

Merging #1800 (0772605) into criu-dev (5b2e585) will increase coverage by 0.05%.
The diff coverage is 100.00%.

❗ Current head 0772605 differs from pull request most recent head 6d0b0b2. Consider uploading reports for the commit 6d0b0b2 to get more accurate results

@@             Coverage Diff              @@
##           criu-dev    #1800      +/-   ##
============================================
+ Coverage     68.90%   68.96%   +0.05%     
============================================
  Files           128      128              
  Lines         33191    33194       +3     
============================================
+ Hits          22871    22892      +21     
+ Misses        10320    10302      -18

Impacted Files	Coverage Δ
criu/cgroup.c	`77.23% <100.00%> (+0.05%)`	⬆️
compel/src/lib/infect.c	`56.10% <0.00%> (+0.38%)`	⬆️
criu/uffd.c	`78.88% <0.00%> (+2.41%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b2e585...6d0b0b2. Read the comment docs.

criu/cgroup.c

avagin · 2022-04-04T16:24:17Z

I don't agree that runsc or crun has to use --manage-cgroups=ignore. A container can have a local hierarchy that we need to restore. I think a proper use of --cgroup-root should fix the origin problem.

test/zdtm/static/cgroup_ignore.c

avagin · 2022-04-04T16:30:08Z

You need to add a commit message for the test. It has to describe what test does.

rst0git

LGTM

I can confirm that the patch in this PR fixes containers/podman#13672

adrianreber · 2022-04-05T14:17:19Z

I don't agree that runsc or crun has to use --manage-cgroups=ignore.

Currently with the use of --manage-cgroups=soft we are using something which restores the container in the wrong cgroup. With ignore it works correctly.

A container can have a lock hierarchy that we need to restore.

Sorry, I don't understand this.

I think a proper use of --cgroup-root should fix the origin problem.

Currently the container runtimes are moving the restored process into the cgroup after restore and that seems to work.

avagin · 2022-04-05T14:28:11Z

Sorry, I don't understand this.

It was my typo. I mean it can have a local hierarchy. Do you mount cgroups into a container? Can processes in a container create new cgroups?

avagin · 2022-04-05T14:29:08Z

I think a proper use of --cgroup-root should fix the origin problem.

Currently the container runtimes are moving the restored process into the cgroup after restore and that seems to work.

This works only in cases when processes in a container don't create cgroups...

test/others/rpc/Makefile

adrianreber · 2022-04-05T14:36:17Z

Sorry, I don't understand this.

It was my type. I mean it can have a local hierarchy. Do you mount cgroups into a container? Can processes in a container create new cgroups?

Probably yes, but so far no one seems to be using it in OCI containers, because until now I have not seen any reports of people complaining about non working cgroups inside of the container.

During run in Podman the cgroup limit is specified. During restore this limit is again set in the cgroup and the container is restored into that newly created cgroup and the resulting container has the same limits as the original containers.

cgroup limits which are set during runtime of the container using exec are not preserved this way, but I am not sure anybody really uses it in OCI containers.

adrianreber · 2022-04-05T15:07:28Z

Anyway, the fix for CRIU is to correctly work if ignore is specified. Currently CRIU does not correctly handle ignore.

What the container engine does is not really important here.

avagin · 2022-04-05T15:56:39Z

Anyway, the fix for CRIU is to correctly work if ignore is specified. Currently CRIU does not correctly handle ignore.

It works how it was designed by its author. It doesn't restore a cgroup tree, but it moves tasks in their cgroups. This behaviour was introduced in commit f95b05e:

Date:   Wed Aug 6 22:23:00 2014 +0400

    opts: add --manage-cgroups option

    criu managed cgroups is now an opt-in thing, so by default criu does not manage
    (i.e. dump or restore) cgroups. This allows users to use the previous behavior.

Do we have uses that use this behaviour and whom new changes will break? I think the chance of breaking someone is low.

adrianreber · 2022-04-05T16:07:59Z

Anyway, the fix for CRIU is to correctly work if ignore is specified. Currently CRIU does not correctly handle ignore.

It works how it was designed by its author. It doesn't restore a cgroup tree, but it moves tasks in their cgroups. This behaviour was introduced in commit f95b05e:
Date:   Wed Aug 6 22:23:00 2014 +0400

    opts: add --manage-cgroups option

    criu managed cgroups is now an opt-in thing, so by default criu does not manage
    (i.e. dump or restore) cgroups. This allows users to use the previous behavior.

No, I do not agree. According to the man page ignore should Don’t deal with cgroups and pretend that they don’t exist. and if I restore a process with ignore that does not happen:

# sleep infinity &
# cat /proc/`pidof sleep`/cgroup
0::/user.slice/user-0.slice/session-1.scope
# mkdir /sys/fs/cgroup/test1
# echo `pidof sleep` > /sys/fs/cgroup/test1/cgroup.procs 
# cat /proc/`pidof sleep`/cgroup
0::/test1
# criu dump -t `pidof sleep` -j -D /tmp/2 --manage-cgroups=ignore
# criu restore -j -D /tmp/2 -v4 -d --manage-cgroup=ignore
(00.000005) Version: 3.16.1 (gitid 0)
(00.000037) Running on fedora02 Linux 5.16.18-200.fc35.x86_64 #1 SMP PREEMPT Mon Mar 28 14:10:07 UTC 2022 x86_64
(00.000130) Loaded kdat cache from /run/criu/criu.kdat
(00.000149) rlimit: RLIMIT_NOFILE unlimited for self
(00.000509) cpu: x86_family 25 x86_vendor_id AuthenticAMD x86_model_id AMD Ryzen 5 5600G with Radeon Graphics         
(00.000561) cpu: fpu: xfeatures_mask 0x205 xsave_size 2440 xsave_size_max 2440 xsaves_size 840
(00.000590) cpu: fpu: x87 floating point registers     xstate_offsets      0 / 0      xstate_sizes    160 / 160   
(00.000598) cpu: fpu: AVX registers                    xstate_offsets    576 / 576    xstate_sizes    256 / 256   
(00.000611) cpu: fpu: Protection Keys User registers   xstate_offsets   2432 / 832    xstate_sizes      8 / 8     
(00.000616) cpu: fpu:1 fxsr:1 xsave:1 xsaveopt:1 xsavec:1 xgetbv1:1 xsaves:1
(00.000708) kernel pid_max=4194304
(00.000741) Reading image tree
(00.000906) Add mnt ns 6 pid 769
(00.000921) Add net ns 2 pid 769
(00.000924) Add pid ns 1 pid 769
(00.000929) pstree pid_max=769
(00.000938) Will restore in 0 namespaces
(00.000949) NS mask to use 0
(00.001031) Collecting 51/56 (flags 3)
(00.001121) No memfd.img image
(00.001146)  `- ... done
(00.001174) Collecting 40/54 (flags 2)
(00.001252) Collected [usr/bin/sleep] ID 0x1
(00.001280) Collected [usr/lib/locale/locale-archive] ID 0x2
(00.001304) Collected [usr/lib64/libc.so.6] ID 0x3
(00.001326) Collected [usr/lib64/ld-linux-x86-64.so.2] ID 0x4
(00.001363) Collected [dev/pts/0] ID 0x6
(00.001439) Collected [tmp] ID 0x7
(00.001496) Collected [.] ID 0x8
(00.001518)  `- ... done
(00.001529) Collecting 46/68 (flags 0)
(00.001536) No remap-fpath.img image
(00.001581)  `- ... done
(00.001771) No apparmor.img image
(00.001897) Running pre-restore scripts
(00.002084) No pidns-1.img image
(00.002246) Forking task with 769 pid (flags 0x0)
(00.002295) Creating process using clone3()
(00.002717) PID: real 769 virt 769
(00.003161)    769: timens: monotonic -3 172033336
(00.003195)    769: timens: boottime -3 171999592
(00.003281)    769: cg: Move into 2
(00.003299)    769: cg:   `-> unifie//test1/cgroup.procs
(00.003324)    769: Error (criu/cgroup.c:1092): cg: Can't move 769 into unifie//test1/cgroup.procs (-1/-1): Bad file descriptor
(00.003337)    769: Error (criu/cgroup.c:1206): cg: Can't move into unifie//test1/cgroup.procs (-1/-1): Bad file descriptor
(00.003390) Error (criu/cr-restore.c:2447): Restoring FAILED.
(00.003806) Error (criu/cr-restore.c:1480): 769 killed by signal 9: Killed

With my patch that works. After restore the process is back in the current cgroup:

# cat /proc/`pidof sleep`/cgroup
0::/user.slice/user-0.slice/session-1.scope

The ignore functionality is partially implemented currently but it is missing at two places. That is what my patch fixes.

avagin · 2022-04-05T16:24:50Z

No, I do not agree. According to the man page ignore should Don’t deal with cgroups and pretend that they don’t exist.

You can be disagree and it is what it is. The source of truth is the commit message and the code but not the man page that was written much later than the code.

And I lean to accept this patch rather than introduce a new mode.

adrianreber · 2022-04-05T16:34:38Z

And I lean to accept this patch rather than introduce a new mode.

When did we talk about a new mode? This is just about fixing the existing ignore mode. Nothing else.

adrianreber · 2022-04-05T16:35:51Z

manage_cgroup as a boolean is deprecated and according to the source the right thing to do is --manage-cgroups-mode=MODE.

kolyshkin · 2022-04-07T00:54:23Z

Ahh! That must explain why my earlier use of --manage-cgroup-mode=ignore did not fix the issues I was seeing with lazy migration (opencontainers/runc#2924 and opencontainers/runc#2760)!

Snorch · 2022-04-07T09:21:15Z

I agree with @avagin, "ignore" should only mean that the container runtime is in charge of creating the whole container cgroup directory tree, but criu should still put processes it creates into same topology it had seen them on dump.

In Virtuozzo criu when we migrate container with change of container ID we do dump it with something like --cgroup-root /machine.slice/$old_ct_ID and restore with --cgroup-root /machine.slice/$new_ct_ID. This way if process was dumped in /sys/fs/cgroup/memory/machine.slice/$old_ct_ID/sub/directory it would restore in /sys/fs/cgroup/memory/machine.slice/$new_ct_ID/sub/directory.

Also note that in "soft" mode we don't restore cgroup options on root container cgroups if cgroup directories were pre-created by container environment, so container environment is in charge to put the same values there as was on dump. Using "full" mode solves it but requires some fixes, see e.g.: https://src.openvz.org/projects/OVZ/repos/criu/commits/6cab5493c59d699fdd231fd1ad6d808fe3bc2278

adrianreber · 2022-04-07T09:46:50Z

I agree with @avagin, "ignore" should only mean that the container runtime is in charge of creating the whole container cgroup directory tree, but criu should still put processes it creates into same topology it had seen them on dump.

Independent of what the container runtime should do, ignore currently does not work. See my example in #1800 (comment)

Snorch · 2022-04-07T10:24:51Z

I agree with @avagin, "ignore" should only mean that the container runtime is in charge of creating the whole container cgroup directory tree, but criu should still put processes it creates into same topology it had seen them on dump.

Independent of what the container runtime should do, ignore currently does not work. See my example in #1800 (comment)

Yeh probably you are right, proving hunks from the original patch:

c7d646afb ("cgroups: Introduce cgroup management modes")
...
+#define CG_MODE_IGNORE         (0u << 0)       /* Zero is important here */
...
@@ -1122,10 +1130,14 @@ static int prepare_cgroup_sfd(CgroupEntry *ce)
        int off, i, ret;
        char paux[PATH_MAX];
 
-       pr_info("Preparing cgroups yard\n");
+       pr_info("Preparing cgroups yard (cgroups restore mode %#x)\n",
+               opts.manage_cgroups);
+       if (!opts.manage_cgroups)
+               return 0;

Cgroup yard was never initialized for "ignore", so it was never possible to put tasks in right cgroups with "ignore".

Please also apply this fix to you pr, so that "unified" does not convert to "unifie":

diff --git a/criu/cgroup.c b/criu/cgroup.c
index 82d9b16a2..7490cebf2 100644
--- a/criu/cgroup.c
+++ b/criu/cgroup.c
@@ -1035,7 +1035,7 @@ static int ctrl_dir_and_opt(CgControllerEntry *ctl, char *dir, int ds, char *opt
                }
 
                if (n[0] == 0)
-                       doff += snprintf(dir + doff, ds - doff, "unified");
+                       doff += snprintf(dir + doff, ds - doff, "unified,");
                else
                        doff += snprintf(dir + doff, ds - doff, "%s,", n);
                if (opt)

After this fix it is obvious that with "ignore" the fail happens as we always have -1 CGROUP_YARD service fd in userns_move.

test/zdtm/static/cgroup_ignore.c

Snorch

Looks good.

avagin · 2022-04-08T16:30:07Z

pls remove the passage about oci containers from the commit message. It is not correlate directly with the patch.

test/zdtm/static/cgroup_ignore.c

rst0git · 2022-04-08T17:24:58Z

pls remove the passage about oci containers from the commit message. It is not correlate directly with the patch.

Actually, the commit message describes the problem we are trying to fix: containers/podman#13672

test/zdtm/static/cgroup_ignore.c

avagin · 2022-04-13T21:59:10Z

BTW @Snorch explained me that what I think the ignore mode means is actually covered by the none mode.

The code expected that the cgroup directory ends with a ',' and unconditionally removes the last character. For the "unified" case this resulted in the last 'd' being remove instead of the non existing comma. This just adds a comma after "unified" so that the last removed character is not the 'd'. Suggested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: Adrian Reber <areber@redhat.com>

Using '--manage-cgroups=ignore' fails during restore. This fixes the use of '--manage-cgroups=ignore'. Signed-off-by: Adrian Reber <areber@redhat.com>

Test to ensure that --manage-cgroups=ignore works correctly. Signed-off-by: Adrian Reber <areber@redhat.com>

adrianreber force-pushed the 2022-04-03-really-ignore-cgroups branch from e6dd649 to 824b9c7 Compare April 3, 2022 17:18

This was referenced Apr 3, 2022

Checkpoint of imported container with different name fails containers/podman#13672

Closed

criu: add support for different manage cgroups modes containers/crun#903

Merged

runc/crun, cgroups and CRIU #1793

Open

avagin reviewed Apr 4, 2022

View reviewed changes

criu/cgroup.c Outdated Show resolved Hide resolved

avagin reviewed Apr 4, 2022

View reviewed changes

test/zdtm/static/cgroup_ignore.c Outdated Show resolved Hide resolved

adrianreber force-pushed the 2022-04-03-really-ignore-cgroups branch 2 times, most recently from f221917 to aedc3be Compare April 5, 2022 07:00

rst0git approved these changes Apr 5, 2022

View reviewed changes

adrianreber force-pushed the 2022-04-03-really-ignore-cgroups branch 4 times, most recently from 4897c12 to 10d6512 Compare April 5, 2022 14:14

avagin reviewed Apr 5, 2022

View reviewed changes

test/others/rpc/Makefile Outdated Show resolved Hide resolved

avagin force-pushed the criu-dev branch from 22ec298 to 5b2e585 Compare April 5, 2022 14:44

adrianreber force-pushed the 2022-04-03-really-ignore-cgroups branch from 10d6512 to 6d0b0b2 Compare April 5, 2022 15:25

adrianreber force-pushed the 2022-04-03-really-ignore-cgroups branch from 6d0b0b2 to 91af633 Compare April 8, 2022 07:52

Snorch reviewed Apr 8, 2022

View reviewed changes

test/zdtm/static/cgroup_ignore.c Outdated Show resolved Hide resolved

Snorch approved these changes Apr 8, 2022

View reviewed changes

adrianreber force-pushed the 2022-04-03-really-ignore-cgroups branch from 91af633 to 6425209 Compare April 8, 2022 13:46

avagin reviewed Apr 8, 2022

View reviewed changes

test/zdtm/static/cgroup_ignore.c Outdated Show resolved Hide resolved

avagin reviewed Apr 8, 2022

View reviewed changes

test/zdtm/static/cgroup_ignore.c Outdated Show resolved Hide resolved

adrianreber force-pushed the 2022-04-03-really-ignore-cgroups branch 2 times, most recently from 67a9557 to d62380c Compare April 11, 2022 06:55

avagin reviewed Apr 11, 2022

View reviewed changes

test/zdtm/static/cgroup_ignore.c Outdated Show resolved Hide resolved

avagin reviewed Apr 11, 2022

View reviewed changes

test/zdtm/static/cgroup_ignore.c Show resolved Hide resolved

adrianreber force-pushed the 2022-04-03-really-ignore-cgroups branch from d62380c to 4db6280 Compare April 12, 2022 08:01

avagin reviewed Apr 12, 2022

View reviewed changes

test/zdtm/static/cgroup_ignore.c Show resolved Hide resolved

avagin reviewed Apr 12, 2022

View reviewed changes

test/zdtm/static/cgroup_ignore.c Outdated Show resolved Hide resolved

adrianreber added 3 commits April 14, 2022 08:06

cgroup: fix --manage-cgroups=ignore

1493816

Using '--manage-cgroups=ignore' fails during restore. This fixes the use of '--manage-cgroups=ignore'. Signed-off-by: Adrian Reber <areber@redhat.com>

cgroup: test for --manage-cgroups=ignore

bed20ce

Test to ensure that --manage-cgroups=ignore works correctly. Signed-off-by: Adrian Reber <areber@redhat.com>

adrianreber force-pushed the 2022-04-03-really-ignore-cgroups branch from 4db6280 to bed20ce Compare April 14, 2022 09:11

avagin merged commit c1125b1 into checkpoint-restore:criu-dev Apr 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cgroup: fix --manage-cgroups=ignore #1800

cgroup: fix --manage-cgroups=ignore #1800

adrianreber commented Apr 3, 2022

codecov-commenter commented Apr 3, 2022 •

edited

Loading

avagin commented Apr 4, 2022 •

edited

Loading

avagin commented Apr 4, 2022

rst0git left a comment •

edited

Loading

adrianreber commented Apr 5, 2022

avagin commented Apr 5, 2022 •

edited

Loading

avagin commented Apr 5, 2022

adrianreber commented Apr 5, 2022

adrianreber commented Apr 5, 2022

avagin commented Apr 5, 2022 •

edited

Loading

adrianreber commented Apr 5, 2022 •

edited

Loading

avagin commented Apr 5, 2022

adrianreber commented Apr 5, 2022

adrianreber commented Apr 5, 2022

kolyshkin commented Apr 7, 2022

Snorch commented Apr 7, 2022 •

edited

Loading

adrianreber commented Apr 7, 2022

Snorch commented Apr 7, 2022 •

edited

Loading

Snorch left a comment

avagin commented Apr 8, 2022

rst0git commented Apr 8, 2022

avagin commented Apr 13, 2022

cgroup: fix --manage-cgroups=ignore #1800

cgroup: fix --manage-cgroups=ignore #1800

Conversation

adrianreber commented Apr 3, 2022

codecov-commenter commented Apr 3, 2022 • edited Loading

Codecov Report

avagin commented Apr 4, 2022 • edited Loading

avagin commented Apr 4, 2022

rst0git left a comment • edited Loading

Choose a reason for hiding this comment

adrianreber commented Apr 5, 2022

avagin commented Apr 5, 2022 • edited Loading

avagin commented Apr 5, 2022

adrianreber commented Apr 5, 2022

adrianreber commented Apr 5, 2022

avagin commented Apr 5, 2022 • edited Loading

adrianreber commented Apr 5, 2022 • edited Loading

avagin commented Apr 5, 2022

adrianreber commented Apr 5, 2022

adrianreber commented Apr 5, 2022

kolyshkin commented Apr 7, 2022

Snorch commented Apr 7, 2022 • edited Loading

adrianreber commented Apr 7, 2022

Snorch commented Apr 7, 2022 • edited Loading

Snorch left a comment

Choose a reason for hiding this comment

avagin commented Apr 8, 2022

rst0git commented Apr 8, 2022

avagin commented Apr 13, 2022

codecov-commenter commented Apr 3, 2022 •

edited

Loading

avagin commented Apr 4, 2022 •

edited

Loading

rst0git left a comment •

edited

Loading

avagin commented Apr 5, 2022 •

edited

Loading

avagin commented Apr 5, 2022 •

edited

Loading

adrianreber commented Apr 5, 2022 •

edited

Loading

Snorch commented Apr 7, 2022 •

edited

Loading

Snorch commented Apr 7, 2022 •

edited

Loading