checkpoint broken for Podman or Docker #2011
Comments
Docker + CRIU logs
Please update to the latest version of CRIU and Podman. That should resolve your problems. |
PODMAN
Hello @adrianreber, What version of CRIU should I use? I can try to upgrade, want to make sure I'm using the right one. Thanks. |
Just use the latest. It seems you are using Fedora 33, which went EOL over a year ago. The latest Podman error log you are posting could be solved by not specifying |
Will do.
I pass --tz or --env TZ= in Docker because it is needed in this case. CRIU checkpoint should not crash because of that. But for sake of argument I tried that too and it dies with a new error message now:
As you suggested will try again with a newer version of the tools and OS. Please do not close the issue yet as I want to confirm this will go away with the new version. |
Well, it does not crash. The problem is just that the timezone handling is done in a way that cannot be handled by CRIU. This has already been discussed in Podman. But the mountpoint of the timezone is not correctly specified in We would love to have a fix for it. Please contribute a fix.
It seems you have a non-standard SELinux setup. This should be fixed in newer versions of CRIU. But if you require a timezone in your container, this is not something CRIU can fix as long as the container is not correctly set up by Podman/Docker. So far these discussions have not led anywhere, but if you have an idea how to solve it, please provide a fix. |
I did not create that image, but I'll crack it open to see what they are doing there and will report my findings here. I was about to test on another distro (Ubuntu Focal 20.04), but then I saw this issue related to the OS, so I will stick with Fedora for the testing. |
Yes, latest Fedora or RHEL clone should work really well. Depending on the Ubuntu kernel it can work if it is new enough. On 22.04 you have to watch out to install a newer version of CRIU from our repository. The CRIU version in Ubuntu is too old and actually completely broken due to missing support for restartable sequences. |
A friendly reminder that this issue had no activity for 30 days. |
Hello,
Upgraded to the following versions of Docker, Podman and CRIU (also enabled the Docker experimental settings).
Docker version
[josevnz@dmaf5 ~]$ rpm -qi docker-ce
Name : docker-ce
Epoch : 3
Version : 20.10.22
Release : 3.fc37
Architecture: x86_64
Install Date: Fri 30 Dec 2022 07:56:35 PM EST
Group : Tools/Docker
Size : 87739886
License : ASL 2.0
Signature : RSA/SHA512, Fri 16 Dec 2022 03:39:02 AM EST, Key ID c52feb6b621e9f35
Source RPM : docker-ce-20.10.22-3.fc37.src.rpm
Build Date : Thu 15 Dec 2022 05:26:12 PM EST
Build Host : 6e7c0d90a1d1
Packager : Docker <[email protected]>
Vendor : Docker
URL : https://www.docker.com
Summary : The open-source application container engine
Description :
Docker is a product for you to build, ship and run any application as a
lightweight container.
Docker containers are both hardware-agnostic and platform-agnostic. This means
they can run anywhere, from your laptop to the largest cloud compute instance
and everything in between - and they don't require you to use a particular
language, framework or packaging system. That makes them great building blocks
for deploying and scaling web apps, databases, and backend services without
depending on a particular stack or provider.
CRIU version
[josevnz@dmaf5 ~]$ rpm -qi criu
criu criu-libs
[josevnz@dmaf5 ~]$ rpm -qi criu
Name : criu
Version : 3.17.1
Release : 3.fc37
Architecture: x86_64
Install Date: Sat 05 Nov 2022 04:43:39 AM EDT
Group : Unspecified
Size : 1545609
License : GPLv2
Signature : RSA/SHA256, Wed 20 Jul 2022 08:33:11 PM EDT, Key ID f55ad3fb5323552a
Source RPM : criu-3.17.1-3.fc37.src.rpm
Build Date : Wed 20 Jul 2022 07:59:41 PM EDT
Build Host : buildhw-x86-07.iad2.fedoraproject.org
Packager : Fedora Project
Vendor : Fedora Project
URL : http://criu.org/
Bug URL : https://bugz.fedoraproject.org/criu
Summary : Tool for Checkpoint/Restore in User-space
Description :
criu is the user-space part of Checkpoint/Restore in User-space
(CRIU), a project to implement checkpoint/restore functionality for
Linux in user-space.
Podman version
[josevnz@dmaf5 ~]$ rpm -qi podman
Name : podman
Epoch : 4
Version : 4.3.1
Release : 1.fc37
Architecture: x86_64
Install Date: Fri 30 Dec 2022 09:21:59 PM EST
Group : Unspecified
Size : 43050129
License : ASL 2.0 and BSD and ISC and MIT and MPLv2.0
Signature : RSA/SHA256, Fri 11 Nov 2022 11:35:55 AM EST, Key ID f55ad3fb5323552a
Source RPM : podman-4.3.1-1.fc37.src.rpm
Build Date : Fri 11 Nov 2022 10:01:24 AM EST
Build Host : buildhw-x86-09.iad2.fedoraproject.org
Packager : Fedora Project
Vendor : Fedora Project
URL : https://podman.io/
Bug URL : https://bugz.fedoraproject.org/podman
Summary : Manage Pods, Containers and Container Images
Description :
podman (Pod Manager) is a fully featured container engine that is a simple
daemonless tool. podman provides a Docker-CLI comparable command line that
eases the transition from other container engines and allows the management of
pods, containers and images. Simply put: alias docker=podman.
Most podman commands can be run as a regular user, without requiring
additional privileges.
podman uses Buildah(1) internally to create container images.
Both tools share image (not container) storage, hence each can use or
manipulate images (but not containers) created by the other.
Manage Pods, Containers and Container Images
podman Simple management tool for pods, containers and images
Got the following error with Docker:
[josevnz@dmaf5 ~]$ docker run --detach --name webtop-test --env PUID=1000 --env PGID=1000 --env TZ=America/New_York --env TITLE='Webtop test' --publish 3000:3000 --volume /home/josevnz/webtop/config2:/config lscr.io/linuxserver/webtop:ubuntu-kde
[josevnz@dmaf5 ~]$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8e6a55bfae2a lscr.io/linuxserver/webtop:ubuntu-kde "/init" 43 seconds ago Up 42 seconds 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp, 3389/tcp webtop-test
[josevnz@dmaf5 ~]$ docker logs webtop-test
[custom-init] No custom services found, skipping...
[migrations] started
[migrations] no migrations found
-------------------------------------
_ ()
| | ___ _ __
| | / __| | | / \
| | \__ \ | | | () |
|_| |___/ |_| \__/
Brought to you by linuxserver.io
-------------------------------------
To support LSIO projects visit:
https://www.linuxserver.io/donate/
-------------------------------------
GID/UID
-------------------------------------
User uid: 1000
User gid: 1000
-------------------------------------
Generating 2048 bit rsa key...
ssl_gen_key_xrdp1 ok
saving to rsakeys.ini
..........+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*............+.....+....+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*........+..+....+.....+......+.............+.........+...+..+...+..................+....+......+..+...+................+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----
[custom-init] No custom files found, skipping...
guacd[204]: INFO: Guacamole proxy daemon (guacd) version 1.1.0 started
guacd[204]: INFO: Listening on host 0.0.0.0, port 4822
guacd[204]: INFO: Guacamole connection closed during handshake
Starting guacamole-lite websocket server
listening on *:3000
[guac-init] Auto start not set, application start on login
[ls.io-init] done.
[josevnz@dmaf5 ~]$ docker checkpoint create webtop-test checkpoint
Error response from daemon: Cannot checkpoint container webtop-test: runc did not terminate successfully: exit status 1: criu failed: type NOTIFY errno 0 path= /run/containerd/io.containerd.runtime.v2.task/moby/8e6a55bfae2afedb5cf79ee39c4cc05f7cd8e32100323d7f593e7a6c66cab7d3/criu-dump.log: unknown
From the log file:
(00.726002) mnt: 1290: 1f:/home/josevnz/webtop/config2 @ ./config
(00.726008) mnt: 1289: 61:/ @ ./dev/shm
(00.731678) mnt: 1288: 5c:/ @ ./dev/mqueue
(00.731740) mnt: 1287: 1a:/ @ ./sys/fs/cgroup
(00.731745) mnt: 1286: 60:/ @ ./sys
(00.731750) mnt: 1285: 5f:/ @ ./dev/pts
(00.731754) mnt: 1284: 5e:/ @ ./dev
(00.731759) mnt: Mount is not fully visible ./dev(1284)
(00.731803) mnt: mount has children ./dev(1284)
(00.737816) mnt: 1283: 5d:/ @ ./proc
(00.737839) mnt: 1282: 1f:/root/var/lib/docker/btrfs/subvolumes/200ebbf8ab6ab1dfd46167cc3e33fb1c2fddacde7d7334b32663874da3fdd25b @ ./
(00.737885) Dumping file-locks
(00.737891) Error (criu/file-lock.c:110): Some file locks are hold by dumping tasks! You can try --file-locks to dump them.
(00.737949) Unlock network
(00.737954) Running network-unlock scripts
(00.737958) RPC
(00.754236) Unfreezing tasks into 1
(00.754252) Unseizing 18029 into 1
(00.754261) Unseizing 18075 into 1
(00.754269) Unseizing 18077 into 1
(00.754278) Unseizing 18096 into 1
(00.754286) Unseizing 18110 into 1
(00.754294) Unseizing 18097 into 1
(00.754300) Unseizing 18098 into 1
(00.754306) Unseizing 18317 into 1
(00.754313) Unseizing 18328 into 1
(00.754320) Unseizing 18099 into 1
(00.754327) Unseizing 18270 into 1
(00.754336) Unseizing 18100 into 1
(00.754343) Unseizing 18299 into 1
(00.754374) Unseizing 18101 into 1
(00.754382) Unseizing 18260 into 1
(00.754388) Unseizing 18102 into 1
(00.754394) Unseizing 18253 into 1
(00.754441) Error (criu/cr-dump.c:2053): Dumping FAILED.
Podman error
[josevnz@dmaf5 ~]$ sudo -i
[sudo] password for josevnz:
[root@dmaf5 ~]# podman run \
--detach \
--name webtop-test \
--env PUID=1000 \
--env PGID=1000 \
--tz America/New_York \
--env TITLE='Webtop test' \
--publish 3000:3000 \
--volume /home/josevnz/webtop/config:/config \
lscr.io/linuxserver/webtop:ubuntu-kde
Trying to pull lscr.io/linuxserver/webtop:ubuntu-kde...
Getting image source signatures
Copying blob 2f5c2978749a done
Copying blob 8eebfc527709 done
Copying blob 96464c4b8240 done
Copying blob cbba887b2540 done
Copying blob 274402f9efdb done
Copying blob a15ce2a609e0 done
Copying blob 4416ec73a8df done
Copying blob d78d143d868a done
Copying blob b3fd396573c7 done
Copying blob cfa496bce4e3 done
Copying blob 92c76390af24 done
Copying blob da6b6383c539 done
Copying blob 964f23c9979e done
Copying blob 8e742e7730bb done
Copying blob e47ddd62fe0d done
Copying blob c379c3f6fe6b done
Copying config 481d084330 done
Writing manifest to image destination
Storing signatures
856379f32514148b48aa73308cfd53c94c5fc4014534a33f4fcc7c5555aceb77
[root@dmaf5 ~]# podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
856379f32514 lscr.io/linuxserver/webtop:ubuntu-kde 9 seconds ago Up 10 seconds ago 0.0.0.0:3000->3000/tcp webtop-test
[root@dmaf5 ~]# podman container checkpoint webtop-test
2023-01-05T02:53:52.535876Z: CRIU checkpointing failed -52. Please check CRIU logfile /var/lib/containers/storage/overlay-containers/856379f32514148b48aa73308cfd53c94c5fc4014534a33f4fcc7c5555aceb77/userdata/dump.log
Error: `/usr/bin/crun checkpoint --image-path /var/lib/containers/storage/overlay-containers/856379f32514148b48aa73308cfd53c94c5fc4014534a33f4fcc7c5555aceb77/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/856379f32514148b48aa73308cfd53c94c5fc4014534a33f4fcc7c5555aceb77/userdata 856379f32514148b48aa73308cfd53c94c5fc4014534a33f4fcc7c5555aceb77` failed: exit status 1
From the log file:
(00.021433) mnt: Inspecting sharing on 1804 shared_id 0 master_id 0 (@./)
(00.021459) Error (criu/mount.c:753): mnt: 1809:./usr/share/zoneinfo/Etc/UTC doesn't have a proper root mount
(00.021496) Unlock network
(00.021504) Running network-unlock scripts
(00.041280) Unfreezing tasks into 1
(00.041317) Unseizing 20205 into 1
(00.041333) Unseizing 20216 into 1
(00.041343) Unseizing 20217 into 1
(00.041353) Unseizing 20237 into 1
(00.041364) Unseizing 20251 into 1
(00.041374) Unseizing 20238 into 1
(00.041385) Unseizing 20239 into 1
(00.041399) Unseizing 20240 into 1
(00.041411) Unseizing 20241 into 1
(00.041427) Unseizing 20242 into 1
(00.041438) Unseizing 20243 into 1
(00.041482) Error (criu/cr-dump.c:2053): Dumping FAILED.
So there are two different types of errors (Docker versus Podman). I can try a different container to test this use case; any recommendations? |
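Most of the two logs above is "Unseizing" noise, and the final "Dumping FAILED" line is only a summary. A minimal sketch for pulling out just the error lines from a CRIU log follows; `criu_errors` is a hypothetical helper name, not part of CRIU, and it assumes errors are logged with the `Error (file:line)` prefix seen in the excerpts above.

```shell
#!/bin/sh
# criu_errors: hypothetical helper, not part of CRIU.
# Prints only the "Error (...)" lines from a CRIU log and strips the
# leading "(seconds.micros) " timestamp so the messages are easier to compare.
criu_errors() {
    # $1: path to a CRIU log file (for example a dump.log or criu-dump.log)
    grep 'Error (' "$1" | sed 's/^([0-9.]*) //'
}
```

Called as `criu_errors /path/to/dump.log`, the first line printed is usually the one to look at: criu/file-lock.c:110 in the Docker log and criu/mount.c:753 in the Podman log.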
The docker error can be solved by The Podman error can be solved by removing the timezone parameter. The timezone bind mount is not configured correctly by Podman and so CRIU fails during checkpointing. Podman should have a parameter |
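To see which mount CRIU is complaining about, one option is to look at the container's mount table. The sketch below filters mountinfo-formatted text (the format of /proc/PID/mountinfo described in proc(5)) for mount points that look timezone-related. `tz_mounts` is a hypothetical helper, and matching on "zoneinfo" or "localtime" is an assumption based on the path in the error message above.

```shell
#!/bin/sh
# tz_mounts: hypothetical helper for inspecting a container's mount table.
# Reads mountinfo-formatted text from the file given as $1 and prints the
# mount ID, the root of the mount within its filesystem, and the mount point
# for every entry whose mount point looks timezone-related.
tz_mounts() {
    # mountinfo fields: 1 = mount ID, 4 = root, 5 = mount point
    awk '$5 ~ /zoneinfo|localtime/ { print $1, $4, $5 }' "$1"
}
```

As root, something like `tz_mounts "/proc/$(podman inspect --format '{{.State.Pid}}' webtop-test)/mountinfo"` should list the timezone bind mount; the first column should correspond to the mount ID in the CRIU message (1809 in the log above).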
Hello, I got it to work with Docker. Here are the steps:
sudo mkdir -p /etc/criu/
echo file-locks | sudo tee -a /etc/criu/runc.conf # 'sudo echo ... >> file' would perform the redirection as the unprivileged user
docker run --detach --name webtop-test --env PUID=1000 --env PGID=1000 --env TZ=America/New_York --env TITLE='Webtop test' --publish 3000:3000 --volume /home/josevnz/webtop/config2:/config lscr.io/linuxserver/webtop:ubuntu-kde
docker checkpoint create webtop-test checkpoint1 # No crash
docker checkpoint ls webtop-test # Shows checkpoint1
docker start --checkpoint checkpoint1 webtop-test # Bring it back
docker logs --follow webtop-test # Shows the container running
As for Podman using --file-locks, no luck. It is interesting that Docker managed to dodge the bullet:
[root@dmaf5 ~]# podman container checkpoint --file-locks webtop-test
2023-01-05T09:43:22.959145Z: CRIU checkpointing failed -52. Please check CRIU logfile /var/lib/containers/storage/overlay-containers/bc1375ab4317d5f3a667c52d7eee6a07bc035d20b28e35f50015be3ecb8f65ae/userdata/dump.log
Error: `/usr/bin/crun checkpoint --image-path /var/lib/containers/storage/overlay-containers/bc1375ab4317d5f3a667c52d7eee6a07bc035d20b28e35f50015be3ecb8f65ae/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/bc1375ab4317d5f3a667c52d7eee6a07bc035d20b28e35f50015be3ecb8f65ae/userdata --file-locks bc1375ab4317d5f3a667c52d7eee6a07bc035d20b28e35f50015be3ecb8f65ae` failed: exit status 1
Log message:
(00.016607) mnt: Inspecting sharing on 1936 shared_id 0 master_id 0 (@./)
(00.016625) Error (criu/mount.c:753): mnt: 1941:./usr/share/zoneinfo/Etc/UTC doesn't have a proper root mount
(00.016671) Unlock network
(00.016682) Running network-unlock scripts
(00.044621) Unfreezing tasks into 1
I see someone else pointed out this issue; they initially thought Podman was the culprit, then came back to CRIU. It seems that the takeaway from this problem is:
I'm closing this issue, hopefully this. Also thanks for spending your time looking into this, I'm more hopeful now that we can use checkpoints in our applications after getting this complex container to work. |
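The Docker workaround above comes down to having a `file-locks` line in /etc/criu/runc.conf. A minimal sketch for adding that line only once, so repeated runs do not keep appending it, follows. `ensure_criu_option` is a hypothetical helper name, and the sketch assumes the one-option-per-line layout used in the steps above.

```shell
#!/bin/sh
# ensure_criu_option: hypothetical helper, not part of CRIU or runc.
# Appends an option to a CRIU configuration file unless an identical
# line is already there, creating the parent directory if needed.
ensure_criu_option() {
    opt=$1    # option name without leading dashes, e.g. file-locks
    conf=$2   # configuration file, e.g. /etc/criu/runc.conf
    mkdir -p "$(dirname "$conf")"
    # -q quiet, -x match the whole line, -F treat the option as a fixed string
    if ! grep -qxF -- "$opt" "$conf" 2>/dev/null; then
        printf '%s\n' "$opt" >> "$conf"
    fi
}
```

It has to run in a root shell, since the append happens inside the function: `ensure_criu_option file-locks /etc/criu/runc.conf`.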
Description
Checkpoint feature is broken with the following error:
Steps to reproduce the issue:
Confirm it is running (podman top webtop-test, podman logs --follow webtop-test)
podman container checkpoint webtop-test
podman fails to freeze the container
Describe the results you received:
Podman/Docker crashes
Describe the results you expected:
container is frozen.
Additional information you deem important (e.g. issue happens only occasionally):
This feature doesn't work for any non trivial components. Maybe it should be removed completely from CRIU.
CRIU logs and information:
Will attach full log as file, below just the snippet with the issue
CRIU full dump/restore logs:
PODMAN
DOCKER
Output of `criu --version`:
Output of `criu check --all`:
Additional environment details: