crun delete: call systemd's reset-failed #1295
Merged
According to the OCI runtime spec (https://github.com/opencontainers/runtime-spec/blob/main/runtime.md#delete), the runtime's delete operation is supposed to remove all the container's artefacts.
When the systemd cgroup driver is used and the systemd unit has failed (e.g. it was oom-killed), systemd won't remove the unit (that is, unless the "CollectMode: inactive-or-failed" property is set).
Leaving a failed unit behind is a violation of the runtime spec; in addition, a leftover unit makes it impossible to start a container with the same systemd unit name (such an operation will fail with a "unit already exists" error).
Call reset-failed from systemd's cgroup manager `destroy_cgroup` call, so the failed unit will be removed (by systemd) after "crun delete".

This change is similar to the one in runc (see opencontainers/runc#3888). A (slightly modified) test case from runc added by the above change was used to check that the bug is fixed.
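
For illustration, the effect of the reset-failed call can be reproduced by hand with the `systemctl reset-failed` command. The sketch below is not crun's C implementation; it only shows the equivalent CLI operation. The `reset_failed` helper and its injectable `run` parameter are hypothetical names introduced here.

```python
import subprocess


def reset_failed(unit_name, run=subprocess.run):
    """Ask systemd to drop the failed state of a unit.

    Hypothetical helper: it shells out to `systemctl reset-failed`,
    which is only a stand-in for the call made inside crun.
    """
    # check=False: report failure through the return value, not an exception.
    result = run(["systemctl", "reset-failed", unit_name], check=False)
    return result.returncode == 0
```

The `run` parameter exists only so the helper can be exercised without a running systemd.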
For bigger picture, see:
To test manually, systemd >= 244 is needed. Create a container config that runs `sleep 10` and has the following systemd annotations:

Start a container using the `--systemd-cgroup` option.
The container will be killed by systemd in 2 seconds, thus its systemd unit status will be "failed". Once it has failed, `systemctl status $UNIT_NAME` should have an exit code of 3 (meaning "unit is not active").

Now, run `crun delete $CTID` and repeat `systemctl status $UNIT_NAME`. It should result in an exit code of 4 (meaning "no such unit").
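
The manual check above could be scripted roughly as follows. This is a minimal sketch, not part of the change itself: `verify_delete` and `run_exit_code` are hypothetical helpers, and the unit name and container ID must be supplied by the caller. It relies only on the two exit codes mentioned above.

```python
import subprocess

# Exit codes of `systemctl status`, as described in the text above.
UNIT_NOT_ACTIVE = 3
NO_SUCH_UNIT = 4


def run_exit_code(argv):
    """Run a command and return its exit code, discarding its output."""
    return subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def verify_delete(unit_name, ctid, run=run_exit_code):
    """Return True if the inactive unit disappears after `crun delete`."""
    before = run(["systemctl", "status", unit_name])
    if before != UNIT_NOT_ACTIVE:
        # The unit is still active, or it is already gone.
        return False
    run(["crun", "delete", ctid])
    after = run(["systemctl", "status", unit_name])
    return after == NO_SUCH_UNIT
```

Call `verify_delete(unit_name, ctid)` once the container's unit has failed; the `run` parameter is there so the logic can be tried without systemd or crun installed.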