Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Move from restore_dir() to podman system reset for user #1598

Merged
merged 1 commit into from
Mar 1, 2024

Conversation

martinpitt
Copy link
Member

@martinpitt martinpitt commented Mar 1, 2024

The restore_dir() for podman's data directory is highly problematic:
This interferes with btrfs subvolumes and overlayfs mounts, and often
causes cp failures like

cp: cannot stat '/home/admin/.local/share/containers/storage/overlay/compat3876082856': No such file or directory

So move to podman system reset, and restore the test images
with podman load for each test.

Unfortunately podman system reset defaults to the 10 s wait timeout
(containers/podman#21874), so we still need
the separate rm --time 0 hack. But conceptually that can go away once
that bug is fixed.

This approach would also be nice on the system podman side, but it is super
hard to get right there especially on CoreOS: There we simultaneously want a
thorough cleanup, but also rely on the running cockpit/ws container. It also
collides with the "force unmount everything below /var/lib/containers" hack
that we unfortunately still need for some OSes. But doing it for the user at
least solves half of the problem. The observed failures in the field
all occurred on the user directory, anyway.

Fixes #1591


I tried the "full" approach in #1592, with pieces of it in #1596 and #1597, but this is a mine field 😢

The `restore_dir()` for podman's data directory is highly problematic:
This interferes with btrfs subvolumes and overlayfs mounts, and often
causes `cp` failures like

```
cp: cannot stat '/home/admin/.local/share/containers/storage/overlay/compat3876082856': No such file or directory
```

So move to `podman system reset`, and restore the test images
with `podman load` for each test.

Unfortunately `podman system reset` defaults to the 10 s wait timeout
(containers/podman#21874), so we still need
the separate `rm --time 0` hack. But conceptually that can go away once
that bug is fixed.

This approach would also be nice on the system podman side, but it is super
hard to get right there especially on CoreOS: There we simultaneously want a
thorough cleanup, but also rely on the running cockpit/ws container. It also
collides with the "force unmount everything below /var/lib/containers" hack
that we unfortunately still need for some OSes. But doing it for the user at
least solves half of the problem. The observed failures in the field
all occurred on the user directory, anyway.

Fixes cockpit-project#1591
@martinpitt
Copy link
Member Author

This by and large worked, I just forgot the pause container for ubuntu-2204. Fixed, and so it can re-run again to make sure.

@martinpitt martinpitt marked this pull request as ready for review March 1, 2024 09:46
@martinpitt martinpitt requested a review from jelly March 1, 2024 09:46
Copy link
Member

@jelly jelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

done
loginctl disable-linger $(id -u admin)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to cleanup the lingering!

podman system reset --force
""")
# HACK: system reset has 10s timeout, make that faster with an extra `stop`
# https://github.com/containers/podman/issues/21874
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reporting that!

@martinpitt martinpitt merged commit 9e3d0c7 into cockpit-project:main Mar 1, 2024
30 checks passed
@martinpitt martinpitt deleted the cleanup-user branch March 1, 2024 10:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Frequent setup/cleanup failures in tests
2 participants