podman checkpoint on macOS fails #12053

Closed
alanruttenberg opened this issue Oct 20, 2021 · 16 comments
Labels
kind/bug, locked - please file new issue/PR

Comments

@alanruttenberg

/kind bug

Description

podman checkpoint on macOS fails.

Steps to reproduce the issue:

brew install podman
podman machine init
podman machine start
sudo podman run -dt -p 8080:80/tcp docker.io/library/httpd
<id printed>
sudo podman container checkpoint <id>

Describe the results you received:

Error: `/usr/bin/crun checkpoint --image-path /var/home/core/.local/share/containers/storage/overlay-containers/<id>/userdata/checkpoint --work-path /var/home/core/.local/share/containers/storage/overlay-containers/<id>/userdata <id>` failed: exit status 1

Describe the results you expected:

Checkpoint succeeds

Additional information you deem important (e.g. issue happens only occasionally):

Not sure it is supposed to work yet

Output of podman version:

Client:
Version:      3.4.0
API Version:  3.4.0
Go Version:   go1.17.2
Built:        Thu Sep 30 14:44:31 2021
OS/Arch:      darwin/amd64

Server:
Version:      3.3.1
API Version:  3.3.1
Go Version:   go1.16.6
Built:        Mon Aug 30 16:46:36 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.22.3
  cgroupControllers: []
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.29-2.fc34.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.29, commit: '
  cpus: 1
  distribution:
    distribution: fedora
    version: "34"
  eventLogger: journald
  hostname: localhost
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.14.9-200.fc34.x86_64
  linkmode: dynamic
  logDriver: ""
  memFree: 968065024
  memTotal: 2061860864
  ociRuntime:
    name: crun
    package: crun-1.0-1.fc34.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.0
      commit: 139dc6971e2f1d931af520188763e984d6cdfbf8
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc34.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.0
  swapFree: 0
  swapTotal: 0
  uptime: 1h 16m 31.02s (Approximately 0.04 days)
plugins:
  log: null
  network: null
  volume: null
registries:
  search:
  - docker.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 5
    paused: 0
    running: 1
    stopped: 4
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 2
  runRoot: /run/user/1000/containers
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 3.3.1
  Built: 1630356396
  BuiltTime: Mon Aug 30 20:46:36 2021
  GitCommit: ""
  GoVersion: go1.16.6
  OsArch: linux/amd64
  Version: 3.3.1

Package info (e.g. output of rpm -q podman or apt list podman):

Not sure what the macOS equivalent is. Providing "brew info" output:

podman: stable 3.4.0 (bottled), HEAD
Tool for managing OCI containers and pods
https://podman.io/
/usr/local/Cellar/podman/3.4.0_2 (170 files, 39.5MB) *
  Poured from bottle on 2021-10-20 at 15:39:42
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/podman.rb
License: Apache-2.0
==> Dependencies
Build: go ✘, go-md2man ✘
Required: qemu ✔
==> Options
--HEAD
	Install HEAD version
==> Caveats
Bash completion has been installed to:
  /usr/local/etc/bash_completion.d
==> Analytics
install: 13,562 (30 days), 27,084 (90 days), 54,317 (365 days)
install-on-request: 13,556 (30 days), 27,078 (90 days), 54,221 (365 days)
build-error: 0 (30 days)

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):
macOS Catalina 10.15.7

openshift-ci bot added the kind/bug label Oct 20, 2021

@vrothberg
Member

#12281 should fix it. Remote checkpoint/restore had a number of issues.

@vrothberg
Member

> #12281 should fix it. Remote checkpoint/restore had a number of issues.

Merged, closing.

@alanruttenberg
Author

Hi,
Thanks for this! Could you give me an idea of the best way to test it? I tried brew install podman --HEAD, deleted and re-initialized the podman machine, and started it. I then ran an image and tried to checkpoint it, but it still fails.

podman run -it localhost/c2ffi (in another shell)
podman container list

CONTAINER ID  IMAGE                   COMMAND     CREATED        STATUS            PORTS       NAMES
a9b7cc139273  localhost/c2ffi:latest  bash        5 minutes ago  Up 5 minutes ago              flamboyant_wescoff

sudo podman container checkpoint a9b7cc139273

Error: `/usr/bin/crun checkpoint --image-path /var/home/core/.local/share/containers/storage/overlay-containers/a9b7cc13927386f3b6b5ee087ee6060baf5807d2fb54802d632a4bb56f02282f/userdata/checkpoint --work-path /var/home/core/.local/share/containers/storage/overlay-containers/a9b7cc13927386f3b6b5ee087ee6060baf5807d2fb54802d632a4bb56f02282f/userdata a9b7cc13927386f3b6b5ee087ee6060baf5807d2fb54802d632a4bb56f02282f` failed: exit status 1

I notice a couple of other regressions:


podman run   -dt -p 8080:80/tcp docker.io/library/httpd

Trying to pull docker.io/library/httpd:latest...
Getting image source signatures
<copies blobs>
Writing manifest to image destination
Storing signatures
Error: error configuring network namespace for container c3813de70b16c5d0f3c1c176ade216428ed45ada999efd43e2f5dae54983021a: error adding pod happy_edison_happy_edison to CNI network "podman": Post "http://host.crc.testing:7777/services/forwarder/expose": dial tcp 192.168.127.254:7777: connect: connection refused

podman network ls

NETWORK ID  NAME        DRIVER
Error: template: list:1:13: executing "list" at <.ID>: error calling ID: runtime error: slice bounds out of range [:12] with length 0

@alanruttenberg
Author

podman info --debug

host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.30-2.fc35.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.30, commit: '
  cpus: 1
  distribution:
    distribution: fedora
    variant: coreos
    version: "35"
  eventLogger: journald
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.14.14-300.fc35.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 726077440
  memTotal: 2061524992
  networkBackend: ""
  ociRuntime:
    name: crun
    package: crun-1.2-1.fc35.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.2
      commit: 4f6c8e0583c679bfee6a899c05ac6b916022561b
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc35.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.0
  swapFree: 0
  swapTotal: 0
  uptime: 14m 0.06s
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 0
    stopped: 2
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: ""
  imageStore:
    number: 2
  runRoot: /run/user/1000/containers
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 3.4.1
  Built: 1634740316
  BuiltTime: Wed Oct 20 14:31:56 2021
  GitCommit: ""
  GoVersion: go1.16.8
  OsArch: linux/amd64
  Version: 3.4.1

@vrothberg
Member

Thank you for checking!

For testing, you would need to compile Podman for Mac and for Linux (and get that into the VM) from the main branch. I don't think that there are daily builds available in brew.

@baude @ashley-cui what do you think?
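
Because the fix has to be present on both sides (the macOS client and the Podman inside the VM), it can help to confirm which versions are actually in play before testing. The following is a minimal sketch, not an official procedure: the helper name `versions_match` and the `PODMAN` override variable are hypothetical, and matching versions do not by themselves prove that a given fix is included.

```shell
# PODMAN may be overridden, e.g. to point at a locally built client binary.
PODMAN="${PODMAN:-podman}"

# Print the client and server versions; return non-zero when they differ.
versions_match() {
    client=$($PODMAN version --format '{{.Client.Version}}') || return 2
    server=$($PODMAN version --format '{{.Server.Version}}') || return 2
    echo "client=$client server=$server"
    [ "$client" = "$server" ]
}
```

With the versions from the original report (client 3.4.0, server 3.3.1), this helper would report a mismatch.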

@ashley-cui
Member

Sounds about right.

@alanruttenberg
Author

Hi,
I can compile for Mac, but I don't know how to get the right version into the VM. Is this fix going to make it into a release? I just tried again with version 3.4.2 on the Mac and in the VM (Fedora CoreOS 35.20211203.2.1), but no love. Failing that, if you could give me (or point me at) some instructions on how to get the right version into the VM, I'd be very grateful. Or perhaps there's an alternative VM that I can use with podman machine?

@mheon
Member

mheon commented Dec 13, 2021

This did not make it into the recent 3.4.4 release (didn't backport cleanly, diffs versus main were too large). It will be fixed in Podman 4.0.0, released early next year. At that point it should make its way into the VM and Brew at around the same time.

@alanruttenberg
Author

alanruttenberg commented Dec 13, 2021 via email

@alanruttenberg
Author

Hi,
I see that podman 4.0.0 is released. Congratulations! Does the podman machine image that it downloads (fedora-coreos-35.20220305.dev.0-qemu.x86_64.qcow2.xz) support 4.0.0? It didn't seem to work with a container checkpoint I tried. If not, is there some way to test this now?

podman info --debug
host:
  arch: amd64
  buildahVersion: 1.24.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-2.fc35.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: '
  cpus: 1
  distribution:
    distribution: fedora
    variant: coreos
    version: "35"
  eventLogger: journald
  hostname: localhost.localdomain
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.15.18-200.fc35.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 530939904
  memTotal: 2061381632
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.4.2-1.fc35.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.2
      commit: f6fbc8f840df1a414f31a60953ae514fa497c748
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc35.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 22m 14.78s
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 4
    paused: 0
    running: 1
    stopped: 3
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 2
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.0.2
  Built: 1646319416
  BuiltTime: Thu Mar  3 09:56:56 2022
  GitCommit: ""
  GoVersion: go1.16.14
  OsArch: linux/amd64
  Version: 4.0.2

@baude
Member

baude commented Mar 10, 2022

Yes, it supports Podman 4. An exact reproducer might help, as would running podman machine ssh and looking at journalctl and other relevant things. Also, does it work if you ssh into the machine and run the commands in Linux?
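
The debugging steps suggested here can be sketched as a few shell helpers. This is a sketch under stated assumptions: the helper names and the `PODMAN` override are hypothetical, the default machine is assumed to be running, and the command is passed to `podman machine ssh` as a single string on the assumption that it is forwarded to the VM's shell unchanged.

```shell
# PODMAN may be overridden, e.g. PODMAN="echo podman" to preview the commands.
PODMAN="${PODMAN:-podman}"

# Run a command inside the default podman machine VM.
# The arguments are joined into one string for the remote shell.
in_vm() {
    $PODMAN machine ssh "$*"
}

# Show the most recent journal entries from the VM (default of 50 is arbitrary).
vm_journal_tail() {
    in_vm sudo journalctl --no-pager -n "${1:-50}"
}

# Retry the checkpoint directly on the Linux side, bypassing the remote client.
vm_checkpoint() {
    in_vm sudo podman container checkpoint "$1"
}
```

If `vm_checkpoint <id>` succeeds while the same checkpoint from macOS fails, the problem is more likely in the remote client path than in crun or CRIU.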

@alanruttenberg
Author

I am mistaken, and checkpointing works! Thank you!

@alanruttenberg
Author

While the example checkpoint from the documentation works, the case I am working on fails. I am trying to export the checkpoint with "-e"; the resulting checkpoint file is empty, but no error is reported. If I run the checkpoint inside the podman machine, I get this error:

"2022-03-11T00:54:46.000434761Z: CRIU checkpointing failed -52.  Please check CRIU logfile /var/lib/containers/storage/overlay-containers/23ce5e1491a0c922a442ed62fe28040ceb11ad079b6e06404e79ec4f24b3f5a2/userdata/dump.log
Error: `/usr/bin/crun checkpoint --image-path /var/lib/containers/storage/overlay-containers/23ce5e1491a0c922a442ed62fe28040ceb11ad079b6e06404e79ec4f24b3f5a2/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/23ce5e1491a0c922a442ed62fe28040ceb11ad079b6e06404e79ec4f24b3f5a2/userdata 23ce5e1491a0c922a442ed62fe28040ceb11ad079b6e06404e79ec4f24b3f5a2` failed: exit status 1

The CRIU logfile is over 1m, so in the interest of getting started I'm including only the output of tail -50 below, which shows an issue with file locks near the end. In case it matters, the container is run interactively, and I am checkpointing from another shell.

I am happy to provide any additional information you need in order to debug this.

(01.384343) 0x7f3217522000-0x7f3217523000 (4K) prot 0x3 flags 0x2 fdflags 0 st 0x41 off 0x26000 reg fp  shmid: 0x3
(01.384370) 0x7f3217523000-0x7f3217524000 (4K) prot 0x3 flags 0x22 fdflags 0 st 0x201 off 0 reg ap  shmid: 0
(01.384383) 0x7ffd57732000-0x7ffd57753000 (132K) prot 0x3 flags 0x122 fdflags 0 st 0x201 off 0 reg ap  shmid: 0
(01.384396) 0x7ffd57774000-0x7ffd57778000 (16K) prot 0x1 flags 0x22 fdflags 0 st 0x1201 off 0 reg vvar ap  shmid: 0
(01.384409) 0x7ffd57778000-0x7ffd5777a000 (8K) prot 0x5 flags 0x22 fdflags 0 st 0x209 off 0 reg vdso ap  shmid: 0
(01.384421) 0xffffffffff600000-0xffffffffff601000 (4K) prot 0x5 flags 0x22 fdflags 0 st 0x204 off 0 vsys ap  shmid: 0
(01.384434) Obtaining task auvx ...
(01.384700) Dumping task cwd id 0x6 root id 0x7
(01.385916) mnt: Dumping mountpoints
(01.385981) mnt: 	628: 35:/0 @ ./dev/console
(01.386085) mnt: 	730: 32:/sysrq-trigger @ ./proc/sysrq-trigger
(01.386118) mnt: 	729: 32:/sys @ ./proc/sys
(01.386131) mnt: 	728: 32:/irq @ ./proc/irq
(01.386142) mnt: 	727: 32:/fs @ ./proc/fs
(01.386154) mnt: 	726: 32:/bus @ ./proc/bus
(01.386166) mnt: 	725: 32:/asound @ ./proc/asound
(01.386177) mnt: 	724: 3a:/ @ ./sys/dev/block
(01.391153) mnt: 	723: 39:/ @ ./sys/fs/selinux
(01.395192) mnt: 	722: 38:/ @ ./sys/firmware
(01.399614) mnt: 	721: 37:/ @ ./proc/scsi
(01.403877) mnt: 	720: 5:/null @ ./proc/timer_list
(01.403912) mnt: 	719: 5:/null @ ./proc/latency_stats
(01.403928) mnt: 	718: 5:/null @ ./proc/keys
(01.403956) mnt: 	717: 5:/null @ ./proc/kcore
(01.403973) mnt: 	716: 36:/ @ ./proc/acpi
(01.407960) mnt: 	715: 1b:/ @ ./sys/fs/cgroup
(01.407982) mnt: 	714: 2c:/ @ ./dev/shm
(01.408204) mnt: 	713: 1a:/containers/storage/overlay-containers/23ce5e1491a0c922a442ed62fe28040ceb11ad079b6e06404e79ec4f24b3f5a2/userdata/hosts @ ./etc/hosts
(01.408278) mnt: 	712: 1a:/containers/storage/overlay-containers/23ce5e1491a0c922a442ed62fe28040ceb11ad079b6e06404e79ec4f24b3f5a2/userdata/resolv.conf @ ./etc/resolv.conf
(01.408294) mnt: 	711: 1a:/containers/storage/overlay-containers/23ce5e1491a0c922a442ed62fe28040ceb11ad079b6e06404e79ec4f24b3f5a2/userdata/run/secrets @ ./run/secrets
(01.408308) mnt: 	710: 1a:/containers/storage/overlay-containers/23ce5e1491a0c922a442ed62fe28040ceb11ad079b6e06404e79ec4f24b3f5a2/userdata/.containerenv @ ./run/.containerenv
(01.408320) mnt: 	709: 1a:/containers/storage/overlay-containers/23ce5e1491a0c922a442ed62fe28040ceb11ad079b6e06404e79ec4f24b3f5a2/userdata/hostname @ ./etc/hostname
(01.408337) mnt: 	708: 31:/ @ ./dev/mqueue
(01.408469) mnt: 	707: 35:/ @ ./dev/pts
(01.408484) mnt: 	706: 34:/ @ ./sys
(01.408497) mnt: 	705: 33:/ @ ./dev
(01.408509) mnt: Mount is not fully visible ./dev
(01.408561) mnt: 	mount has children ./dev
(01.414033) mnt: 	704: 32:/ @ ./proc
(01.414059) mnt: 	703: 2f:/ @ ./
(01.414213) Dumping file-locks
(01.414254) Error (criu/file-lock.c:110): Some file locks are hold by dumping tasks! You can try --file-locks to dump them.
(01.415096) Unlock network
(01.415141) Running network-unlock scripts
(01.421282) Unfreezing tasks into 1
(01.421303) 	Unseizing 72212 into 1
(01.421335) 	Unseizing 72214 into 1
(01.421358) 	Unseizing 72220 into 1
(01.421381) 	Unseizing 72223 into 1
(01.422129) Error (criu/cr-dump.c:1781): Dumping FAILED.
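
The two error lines are easy to miss in a log this long. As a minimal sketch (the helper name `criu_errors` is hypothetical, and it assumes the log is readable at the dump.log path that crun prints), filtering for lines containing "Error (" surfaces them with their line numbers:

```shell
# Hypothetical helper: print only the error lines of a CRIU log,
# each prefixed with its line number in the file.
# $1: path to the CRIU log (for example, the dump.log named in the crun error)
criu_errors() {
    grep -n 'Error (' "$1"
}
```

Against the 50-line excerpt above, only the file-lock.c and cr-dump.c lines match.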

@alanruttenberg
Author

Some more information: The container I'm working with first builds a number of jar files and then runs Java, which presents an interactive prompt. If I checkpoint the container after the build but before I run Java, the checkpoint succeeds. It fails when I try to checkpoint while Java is presenting the prompt. I could upload the image and instructions to start the application if desired.

@alanruttenberg
Author

The image is available here: https://mumble.net/~alanr/lsw-podman-image.gz
When the image is run with podman run -it lsw2/lisp, it starts at a bash prompt: podmanlsw:~/repos/lsw2/bin %. The command to start the application is ./lsw

I try to checkpoint when the prompt "CL-USER(1):" is presented.

I just found CRaC, which is a modified CRIU used in a project aiming to checkpoint Java programs and provide an API for knowing that a checkpoint is being restored. I don't understand CRIU, so I don't know whether the changes they make are relevant.

@alanruttenberg
Author

The container checkpoints and restores if I invoke the commands with --file-locks --tcp-established.
Not sure why the container has either (file locks or established TCP connections), but I will investigate.
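
For anyone landing here with the same CRIU error, the working invocation can be sketched as follows. This is an illustration rather than the exact commands used in this thread: the helper names, the `PODMAN` override, and the archive path argument are hypothetical, and `sudo` may or may not be needed depending on how the machine is set up. The last helper guards against the earlier symptom of an exported checkpoint file that is empty without any error being reported.

```shell
# PODMAN may be overridden, e.g. PODMAN="sudo podman" or PODMAN="echo podman".
PODMAN="${PODMAN:-podman}"

# Checkpoint a container that holds file locks and open TCP connections.
# $1: container ID or name; $2: archive path for the exported checkpoint
checkpoint_with_flags() {
    $PODMAN container checkpoint --file-locks --tcp-established --export "$2" "$1"
}

# Restore from an archive written by checkpoint_with_flags.
# $1: archive path
restore_with_flags() {
    $PODMAN container restore --file-locks --tcp-established --import "$1"
}

# Succeeds only if the exported archive exists and is not empty.
archive_nonempty() {
    [ -s "$1" ]
}
```

A typical sequence would be `checkpoint_with_flags <id> <archive>`, then `archive_nonempty <archive>`, then `restore_with_flags <archive>`.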

github-actions bot added the locked - please file new issue/PR label Sep 20, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023