Remove leftover autoremove containers during refresh #21541
Conversation
Oops. Fixes #21482
Force-pushed from 1030bc3 to 805351e (Compare)
During system shutdown, Podman should go down gracefully, meaning that we have time to spawn cleanup processes which remove any containers set to autoremove. Unfortunately, this isn't always the case. If we get a SIGKILL because the system is going down immediately, we can't recover from this, and the autoremove containers are not removed.

However, we can pick up any leftover autoremove containers when we refresh the DB state, which is the first thing Podman does after a reboot. By detecting any autoremove containers that have actually run (a container that was created but never run doesn't need to be removed) at that point and removing them, we keep the fresh boot clean, even if Podman was terminated abnormally.

Fixes containers#21482

[NO NEW TESTS NEEDED] This requires a reboot to realistically test.

Signed-off-by: Matt Heon <mheon@redhat.com>
Force-pushed from 805351e to 9983e87 (Compare)
libpod/runtime.go
Outdated
@@ -856,6 +859,22 @@ func (r *Runtime) refresh(alivePath string) error {
    if err := ctr.refresh(); err != nil {
        logrus.Errorf("Refreshing container %s: %v", ctr.ID(), err)
    }
    // This is the only place it's safe to use ctr.state.State unlocked.
    // We're holding the alive lock, guaranteed to be the only Libpod on the system right now.
    if ctr.AutoRemove() && ctr.state.State == define.ContainerStateExited {
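For orientation, here is a minimal, self-contained sketch of the idea behind that hunk. These are simplified stand-in types, not the actual libpod code; the real check uses ctr.AutoRemove() and define.ContainerStateExited while the alive lock is held, and removal goes through the runtime's normal container-removal path.

```go
package main

import (
	"context"
	"fmt"
)

// Simplified stand-ins for libpod's container states; not the real definitions.
type ContainerState int

const (
	StateConfigured ContainerState = iota // created, never started
	StateExited                           // ran at some point and has since stopped
)

// Container is a simplified stand-in for libpod's Container type.
type Container struct {
	ID         string
	AutoRemove bool
	State      ContainerState
}

// refreshCleanup mirrors the idea in the diff above: after a reboot, any
// autoremove container that actually ran should already be gone, so remove
// it now. Containers that were only created are left alone.
func refreshCleanup(ctx context.Context, ctrs []*Container, remove func(context.Context, *Container) error) {
	for _, ctr := range ctrs {
		if ctr.AutoRemove && ctr.State == StateExited {
			if err := remove(ctx, ctr); err != nil {
				fmt.Printf("Removing autoremove container %s: %v\n", ctr.ID, err)
			}
		}
	}
}

func main() {
	ctrs := []*Container{
		{ID: "aaa", AutoRemove: true, State: StateExited},     // removed
		{ID: "bbb", AutoRemove: true, State: StateConfigured}, // kept: never ran
		{ID: "ccc", AutoRemove: false, State: StateExited},    // kept: not autoremove
	}
	remove := func(_ context.Context, c *Container) error {
		fmt.Println("removing", c.ID)
		return nil
	}
	refreshCleanup(context.Background(), ctrs, remove)
}
```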
I don't see how this can work. resetContainerState() seems to set the state to ContainerStateConfigured if the state was not ContainerStateExited, which may be fine, but if you consider that this is supposed to fix the case of a hard shutdown, I would think the previous state could have been running, stopped, or stopping.
The only case where the state is ContainerStateExited is likely when podman container cleanup has already run successfully, at which point it should be safe to assume the container was already deleted.
I really think the state check must happen before resetContainerState(), and it should delete for all states except ContainerStateConfigured.
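Put as code, the ordering being suggested here might look roughly like the following. This is a hypothetical sketch with simplified stand-in types and helpers (resetContainerState below only mimics the behavior described above), not the PR's actual implementation; the point is that the pre-reset state is captured first, and the removal decision is based on that captured value.

```go
package main

import "fmt"

type ContainerState int

const (
	StateConfigured ContainerState = iota // created, never started
	StateRunning
	StateStopped
	StateExited
)

type Container struct {
	ID         string
	AutoRemove bool
	State      ContainerState
}

// resetContainerState stands in for the reset-on-refresh behavior described
// above: anything that is not Exited is pushed back to Configured.
func resetContainerState(ctr *Container) {
	if ctr.State != StateExited {
		ctr.State = StateConfigured
	}
}

func refreshOne(ctr *Container, remove func(*Container) error) error {
	// Capture the state *before* it is reset: a Running or Stopped
	// container from the previous boot has clearly run.
	ranBefore := ctr.State != StateConfigured
	resetContainerState(ctr)
	if ctr.AutoRemove && ranBefore {
		return remove(ctr)
	}
	return nil
}

func main() {
	ctr := &Container{ID: "aaa", AutoRemove: true, State: StateRunning}
	_ = refreshOne(ctr, func(c *Container) error {
		fmt.Println("removing", c.ID)
		return nil
	})
}
```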
The refresh logic should be forcing anything except ContainerStateConfigured and ContainerStateInitialized to ContainerStateExited. If we're not forcing Running and Stopped to Exited, that's a problem.
We should probably also catch ContainerStateRemoving now that I think about it. Pass it through unchanged and remove immediately if true.
Yeah, our refresh logic is definitely wrong; we should be catching more than just Exited.
Updated. We now catch anything that was running and convert it into an Exited container.
Indeed this looks much better.
We were preserving ContainerStateExited, which is better than nothing, but definitely not correct. A container that ran at any point during the last boot should be moved to Exited state to preserve the fact that it was run at least once. This means we have to convert Running, Stopped, Stopping, and Paused containers to Exited as well.

Signed-off-by: Matt Heon <mheon@redhat.com>
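A compact sketch of the normalization this commit (and the review discussion above) describes, again using simplified stand-ins rather than the real libpod state handling: states that imply the container ran are folded into Exited, Configured and Initialized are left untouched, and Removing is passed through so removal can be finished immediately.

```go
package main

import "fmt"

type ContainerState int

const (
	StateConfigured ContainerState = iota // created, never started
	StateInitialized                      // prepared to run, never started
	StateRunning
	StateStopped
	StateStopping
	StatePaused
	StateExited
	StateRemoving
)

// normalizeAfterReboot maps a pre-reboot state to its post-refresh state.
// Anything that actually ran is recorded as Exited so the fact that it ran
// at least once is preserved.
func normalizeAfterReboot(s ContainerState) ContainerState {
	switch s {
	case StateConfigured, StateInitialized:
		return s // never ran; nothing to normalize
	case StateRemoving:
		return s // removal was in flight; refresh can finish it immediately
	default:
		// Running, Stopped, Stopping, Paused, and Exited all become Exited.
		return StateExited
	}
}

func main() {
	for _, s := range []ContainerState{StateConfigured, StateRunning, StatePaused, StateRemoving} {
		fmt.Println(s, "->", normalizeAfterReboot(s))
	}
}
```

Running this prints the mapping for a few representative states; the real refresh would apply the equivalent logic per container while holding the alive lock.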
LGTM. I am somewhat concerned that this critical part of system refresh after boot is pretty much untested.
I wonder if we can do an e2e test: start a container, then SIGKILL conmon, delete the rundir/tmpdir (should simulate a reboot since that holds the alive file), then do a podman inspect and we should see the exited state. WDYT?
You'd probably want to force-kill the container process and unmount the container as well before you removed the rundir... Is there a solid way of unmounting a container from the CLI without using a Podman command, I wonder?
We show the overlayfs MergedDir in podman inspect, so a simple umount of that path should do it.
Do you want me to do that in this PR, or should that be a follow-on?
I am fine with a follow-on.
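For the record, a rough sketch of the test idea discussed above, written as a plain Go program rather than an actual Podman e2e (Ginkgo) test. The dedicated --tmpdir, the ConmonPid and Pid inspect fields, and the alpine image are assumptions about how such a test could be wired up; the unmount step the reviewers mention is only noted in a comment.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
	"syscall"
)

// run executes a command and returns its trimmed output, exiting on error.
func run(args ...string) string {
	out, err := exec.Command(args[0], args[1:]...).CombinedOutput()
	if err != nil {
		fmt.Fprintf(os.Stderr, "%v failed: %v\n%s", args, err, out)
		os.Exit(1)
	}
	return strings.TrimSpace(string(out))
}

// kill sends SIGKILL to the given PID string, ignoring parse errors.
func kill(pidStr string) {
	if pid, err := strconv.Atoi(pidStr); err == nil && pid > 1 {
		_ = syscall.Kill(pid, syscall.SIGKILL)
	}
}

func main() {
	// Dedicated tmpdir so deleting it simulates a reboot (it holds the alive file).
	tmpdir := "/tmp/podman-refresh-test"
	base := []string{"podman", "--tmpdir", tmpdir}

	// Start a long-running autoremove container.
	ctrID := run(append(base, "run", "--rm", "-d", "alpine", "sleep", "600")...)

	// SIGKILL conmon and the container process so no cleanup process ever runs.
	kill(run(append(base, "inspect", "--format", "{{.State.ConmonPid}}", ctrID)...))
	kill(run(append(base, "inspect", "--format", "{{.State.Pid}}", ctrID)...))

	// The container is still mounted at this point; a real test would also
	// unmount it (e.g. a plain umount of the MergedDir shown by inspect).

	// "Reboot": remove the tmpdir so the next podman command refreshes state.
	_ = os.RemoveAll(tmpdir)

	// After refresh, the leftover --rm container should be gone (a
	// non-autoremove container would instead show as Exited).
	fmt.Println(run(append(base, "ps", "--all")...))
}
```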
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: mheon, rhatdan. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/hold cancel
Merged 3aa413f into containers:main
During system shutdown, Podman should go down gracefully, meaning that we have time to spawn cleanup processes which remove any containers set to autoremove. Unfortunately, this isn't always the case. If we get a SIGKILL because the system is going down immediately, we can't recover from this, and the autoremove containers are not removed.
However, we can pick up any leftover autoremove containers when we refresh the DB state, which is the first thing Podman does after a reboot. By detecting any autoremove containers that have actually run (a container that was created but never run doesn't need to be removed) at that point and removing them, we keep the fresh boot clean, even if Podman was terminated abnormally.
Fixes #21482
Does this PR introduce a user-facing change?