On MacOS container checkpoint fails - crun returns 1 #22947

Closed
dilyanpalauzov opened this issue Jun 8, 2024 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. remote Problem is in podman-remote

Comments

@dilyanpalauzov
Contributor

Issue Description

On macOS 14.5 I run:

% podman machine init --now --rootful --memory 4096
% podman pull alpine
% podman run -d alpine:latest top
% sudo podman container checkpoint what_ever
-> it works as expected
% podman rm --force what_ever

% podman run --cap-add=CAP_MKNOD -d -p 2222:22 -p 2225:25 -p 2465:465 -p 2587:587 -p2143:143 -p2993:993 abc
87840b6fe995e85845ec4d89178685def5051fad2a7061e3677ea65252bebcd2
% sudo podman container checkpoint -R 87840b6fe995e85845ec4d89178685def5051fad2a7061e3677ea65252bebcd2
Error: /usr/bin/crun checkpoint --image-path /var/lib/containers/storage/overlay-containers/87840b6fe995e85845ec4d89178685def5051fad2a7061e3677ea65252bebcd2/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/87840b6fe995e85845ec4d89178685def5051fad2a7061e3677ea65252bebcd2/userdata --leave-running --leave-running 87840b6fe995e85845ec4d89178685def5051fad2a7061e3677ea65252bebcd2 failed: exit status 1

% sudo podman run --cap-add=CAP_MKNOD -d -p 2222:22 -p 2225:25 -p 2465:465 -p 2587:587 -p2143:143 -p2993:993 abc
80dbc805c2b698639c19ecf00128edd8bcbe817fd7442113c8ef14fb71d5f857

% sudo podman container checkpoint -R 80dbc805c2b698639c19ecf00128edd8bcbe817fd7442113c8ef14fb71d5f857
Error: /usr/bin/crun checkpoint --image-path /var/lib/containers/storage/overlay-containers/80dbc805c2b698639c19ecf00128edd8bcbe817fd7442113c8ef14fb71d5f857/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/80dbc805c2b698639c19ecf00128edd8bcbe817fd7442113c8ef14fb71d5f857/userdata --leave-running --leave-running 80dbc805c2b698639c19ecf00128edd8bcbe817fd7442113c8ef14fb71d5f857 failed: exit status 1

Steps to reproduce the issue

See above

Describe the results you received

crun returns 1

Describe the results you expected

crun returns 0

podman info output

host:
  arch: amd64
  buildahVersion: 1.36.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: '
  cpuUtilization:
    idlePercent: 99.33
    systemPercent: 0.33
    userPercent: 0.34
  cpus: 6
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: coreos
    version: "40"
  eventLogger: journald
  freeLocks: 2047
  hostname: localhost.localdomain
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.8.11-300.fc40.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 3212447744
  memTotal: 4095819776
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.11.0-1.20240531102943328308.main.4.g6838c50.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.0-dev
    package: netavark-1.11.0-1.20240606174759319307.main.8.gfebe31a.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.12.0-dev
  ociRuntime:
    name: crun
    package: crun-1.15-1.20240607090105650503.main.32.gea54402.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version UNKNOWN
      commit: 7211fa058c4981d69f180cf079f55b7ec032c233
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240510.g7288448-1.fc40.x86_64
    version: |
      pasta 0^20240510.g7288448-1.fc40.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-2.fc40.x86_64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 0h 41m 57.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /usr/lib/containers/storage
    overlay.mountopt: nodev,metacopy=on
    overlay.use_composefs: "false"
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 106769133568
  graphRootUsed: 4587167744
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 6
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.1.1
  Built: 1717459200
  BuiltTime: Tue Jun  4 02:00:00 2024
  GitCommit: ""
  GoVersion: go1.22.3
  Os: linux
  OsArch: linux/amd64
  Version: 5.1.1

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes


@dilyanpalauzov added the kind/bug label Jun 8, 2024
@github-actions bot added the remote label Jun 8, 2024
@dilyanpalauzov
Contributor Author

Here is a more concise reproducer. The Containerfile is:

FROM docker.io/library/ubuntu:20.04
RUN apt-get update && apt-get install -y sudo python3 openssh-server
ENTRYPOINT ["/lib/systemd/systemd"]

and then

% podman build -t uuu .
% podman run -d uuu:latest
8147b370dcc131310521682426b6df7769c47009f5b397e15b1f2fb8745a33ad
% sudo podman container checkpoint 8147b370dcc131310521682426b6df7769c47009f5b397e15b1f2fb8745a33ad
Error: `/usr/bin/crun checkpoint --image-path /var/lib/containers/storage/overlay-containers/8147b370dcc131310521682426b6df7769c47009f5b397e15b1f2fb8745a33ad/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/8147b370dcc131310521682426b6df7769c47009f5b397e15b1f2fb8745a33ad/userdata 8147b370dcc131310521682426b6df7769c47009f5b397e15b1f2fb8745a33ad` failed: exit status 1

sudo, python3 and openssh-server are installed because one of them pulls in systemd, which is the entrypoint.

@giuseppe
Member

giuseppe commented Jun 8, 2024

This is the error I get on Fedora 40:

(33.251449) epoll: Dumping: eventpoll-tfd: id 0x00005e tfd       10 events 0x000019 data 0x0000000000000a
(33.251449) epoll: Dumping: eventpoll-tfd: id 0x00005e tfd        9 events 0x000019 data 0x00000000000009
(33.251450) epoll: Dumping: eventpoll-tfd: id 0x00005e tfd        8 events 0x000019 data 0x00000000000008
(33.251451) epoll: Dumping: eventpoll-tfd: id 0x00005e tfd        5 events 0x000019 data 0x00000000000005
(33.251452) epoll: Dumping: eventpoll-tfd: id 0x00005e tfd        6 events 0x000019 data 0x00000000000006
(33.251452) epoll: Dumping: eventpoll-tfd: id 0x00005e tfd        3 events 0x000019 data 0x00000000000003
(33.251467) 822520 fdinfo 5: pos:                0 flags:                0/0x1
(33.251489) fsnotify: wd: wd 0x000003 s_dev 0x000097 i_ino       0x30d156d8 mask 0x0002c8
(33.251491) fsnotify: 	[fhandle] bytes 0x000024 type 0x0000f8 __handle 0x810021fb00000000:0x7a4be1136260a9fa
(33.251497) fsnotify: 		Trying via mntid 2612 root / ns_mountpoint @./ (28)
(33.251500) Warn  (criu/fsnotify.c:281): fsnotify: 	Handle 0x97:0x30d156d8 cannot be opened
(33.251501) irmap: Resolving 97:30d156d8 path
(33.251502) irmap: 	Found /usr/share/dbus-1/system-services in cache
(33.251503) fsnotify: 	Dumping /usr/share/dbus-1/system-services as path for handle
(33.251504) fsnotify: wd: wd 0x000002 s_dev 0x000097 i_ino       0x2149dfd3 mask 0x0002c8
(33.251505) fsnotify: 	[fhandle] bytes 0x000024 type 0x0000f8 __handle 0x810021fb00000000:0x7a4be1136260a9fa
(33.251508) fsnotify: 		Trying via mntid 2612 root / ns_mountpoint @./ (28)
(33.251510) Warn  (criu/fsnotify.c:281): fsnotify: 	Handle 0x97:0x2149dfd3 cannot be opened
(33.251512) irmap: Resolving 97:2149dfd3 path
(33.251514) irmap: 	Found /etc/dbus-1/system.d in cache
(33.251515) fsnotify: 	Dumping /etc/dbus-1/system.d as path for handle
(33.251515) fsnotify: wd: wd 0x000001 s_dev 0x000097 i_ino         0xa1e942 mask 0x0002c8
(33.251516) fsnotify: 	[fhandle] bytes 0x000024 type 0x0000f8 __handle 0x810021fb00000000:0x7a4be1136260a9fa
(33.251519) fsnotify: 		Trying via mntid 2612 root / ns_mountpoint @./ (28)
(33.251521) Warn  (criu/fsnotify.c:281): fsnotify: 	Handle 0x97:0xa1e942 cannot be opened
(33.251522) irmap: Resolving 97:a1e942 path
(33.251523) irmap: Scanning /etc hint
(33.251530) irmap: Scanning /var/spool hint
(33.251531) irmap: Scanning /var/log hint
(33.251531) irmap: Scanning /usr/share/dbus-1/system-services hint
(33.251532) irmap: Scanning /var/lib/polkit-1/localauthority hint
(33.251533) irmap: Scanning /usr/share/polkit-1/actions hint
(33.251533) irmap: Scanning /lib/udev hint
(33.251534) irmap: Scanning /. hint
(33.251534) irmap: Scanning /no-such-path hint
(33.251535) irmap: Refresh stat for /no-such-path
(33.251545) Warn  (criu/irmap.c:104): irmap: Can't stat /no-such-path: No such file or directory
(33.251547) Error (criu/fsnotify.c:284): fsnotify: 	Can't dump that handle
(33.251550) ----------------------------------------
(33.251557) Error (criu/cr-dump.c:1674): Dump files (pid: 822520) failed with -1
(33.251560) Waiting for 822520 to trap
(33.251565) Daemon 822520 exited trapping
(33.251568) Sent msg to daemon 3 0 0
pie: 30: __fetched msg: 3 0 0
pie: 30: 30: new_sp=0x7fd3119a0e48 ip 0x7fd311d4c64a
(33.251589) 822520 was trapped
(33.251596) 822520 was trapped
(33.251598) 822520 (native) is going to execute the syscall 15, required is 15
(33.251603) 822520 was stopped
(33.251726) net: Unlock network
(33.251728) Running network-unlock scripts
(33.268963) Unfreezing tasks into 1
(33.268982) 	Unseizing 822446 into 1
(33.268986) 	Unseizing 822505 into 1
(33.268988) 	Unseizing 822520 into 1
(33.269010) 	Unseizing 822521 into 1
(33.269012) 	Unseizing 822523 into 1
(33.269015) 	Unseizing 822531 into 1
(33.269035) Error (criu/cr-dump.c:2098): Dumping FAILED.
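The actual failure is easy to lose among the hundreds of lines CRIU writes. A minimal shell sketch for filtering a saved CRIU log down to its `Warn` and `Error` lines; `criu_failures` is a hypothetical helper, and the pattern assumes the `(seconds) Level (file:line):` prefix seen in the log above. It also assumes the log was kept on disk, for example via `podman container checkpoint --keep`, which keeps CRIU's log files such as `dump.log`; their exact location (presumably the `--work-path` directory from the error message) is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print only the Warn/Error lines of a CRIU log read
# from stdin, e.g. `criu_failures < dump.log`.
# Assumes each CRIU message starts with "(seconds) Level ", as in the log above;
# "pie:" lines and plain progress messages are dropped.
criu_failures() {
    grep -E '^\( *[0-9.]+\) (Warn|Error) '
}
```

Applied to the log above, this would keep four warnings and three errors, the first error being `criu/fsnotify.c:284` ("Can't dump that handle").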

This looks similar to checkpoint-restore/criu#2324.

If it can be of any help, in my case the inode is:

# podman exec -lti find -inum 10611010
./usr/share/dbus-1/system.d
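The decimal inode passed to `find -inum` here is the same number CRIU logs in hex as `i_ino 0xa1e942`, the one fsnotify handle that irmap could not map back to a path. A small sketch for converting between the two notations; both function names are hypothetical, and the code relies on the shell's `printf` accepting C-style `0x` hex arguments.

```shell
#!/bin/sh
# Hypothetical helpers for matching CRIU fsnotify output with `find -inum`.
# CRIU prints inode numbers in hex ("i_ino 0xa1e942"); find expects decimal.

# Hex (as printed by CRIU) -> decimal (for find -inum).
criu_ino_to_dec() {
    printf '%d\n' "$1"
}

# Decimal (e.g. from ls -i or stat) -> hex (to search the CRIU log).
dec_ino_to_criu() {
    printf '0x%x\n' "$1"
}

criu_ino_to_dec 0xa1e942    # prints 10611010
dec_ino_to_criu 10611010    # prints 0xa1e942
```

The first call yields 10611010, matching the lookup above that returned `./usr/share/dbus-1/system.d`.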

@adrianreber any idea what it could be?

@adrianreber
Collaborator

Upstream, we do not test checkpointing systemd much, so I would recommend using a container without systemd.

@Luap99
Member

Luap99 commented Jun 10, 2024

From https://docs.podman.io/en/latest/markdown/podman-container-checkpoint.1.html

IMPORTANT: If the container is using systemd as entrypoint checkpointing the container might not be possible.

So it is already documented that systemd may not work.

In any case, this does not seem to be a Podman bug: CRIU is failing, not Podman, so I am closing this here.
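Given the documented limitation, a wrapper script could warn before attempting a checkpoint. A minimal sketch; `looks_like_systemd` is a hypothetical helper, and the list of init paths it matches is an assumption, not Podman's own detection logic.

```shell
#!/bin/sh
# Hypothetical pre-checkpoint check: succeed (return 0) if the given container
# command path looks like systemd or an init binary, which the podman
# checkpoint docs warn may not be checkpointable.
# The matched paths are an assumption; extend as needed.
looks_like_systemd() {
    case "$1" in
        */systemd | systemd | */init | init) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `looks_like_systemd "$(podman inspect --format '{{.Path}}' CONTAINER)" && echo 'systemd entrypoint: checkpoint may fail'`, where `CONTAINER` is a placeholder and `.Path` is assumed to hold the container's command path as reported by `podman inspect`.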

@Luap99 closed this as not planned Jun 10, 2024
@stale-locking-app bot added the locked - please file new issue/PR label Sep 9, 2024
@stale-locking-app bot locked as resolved and limited conversation to collaborators Sep 9, 2024