Container doesn't start after a system reboot #2150
Comments
@nalind Potentially an issue with the ZFS graph driver? There is a debug message showing that we definitely instructed c/storage to mount the container, but the container entrypoint is missing, leading me to believe that mount didn't actually succeed. @greg-hydrogen Can you do a …
For whatever reason I decided to remove the test container (sorry about that). I recreated it, and here is the mount just after running sudo podman mount. Container is up and running. Reboot.
[greg@host ~]$ sudo ls /var/lib/containers/storage/zfs/graph/8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0a
sudo podman --log-level debug start test
sudo ls /var/lib/containers/storage/zfs/graph/8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0a
Now if I manually mount the dataset:
sudo mount -t zfs containers/podman/storage/8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0a /var/lib/containers/storage/zfs/graph/8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0
sudo ls /var/lib/containers/storage/zfs/graph/8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0a
We can see all the folders are correct, and if I try starting via sudo podman start test, the container now runs. This time I will keep the test container around.
The directories present after …
I just updated the zfs code from Moby in containers/storage, but I did not find many differences.
Looks like that fixed it! I am able to start the container without any issues after a reboot! Shall I close the issue or wait until it is merged?
Wonderful! Let's wait until the fix is merged into c/storage and merged here.
Hmmm... it looks like I might have spoken too soon. I just built podman again with the storage patch and I am hitting the same error I was before. I have no idea why it was working a day ago and now it has stopped. I can repost the output from a new container if that helps.
Thanks for reporting back! We'll have a look.
Anything you need from me, anything I can do to help?
Try again on master. We removed something that we did not think was necessary, and now we have added it back.
Just tried again, still the same issue. It seems like it doesn't mount the dataset; going in and mounting it manually and then starting the container works.
Anything else I can help with here? I don't have any coding skills, but I can test.
One problem here is that none of us has any experience with ZFS. @greg-hydrogen When you say you mount the dataset, are you seeing some content mounted and other content not mounted?
@rhatdan - when I manually mount the zfs dataset via …
For some reason when issuing …
If I create a new container everything works as expected: I am able to stop and start it, but when I reboot and try to start it, it fails. There must be something in the start code where the mount is not getting called, or something like that, but of course that is just a pure guess on my part.
What happens if you just do a podman mount CTR? Does the mount point get created?
Doesn't look like it; it says it mounts it, but nothing happens.
sudo podman inspect nessus | grep -i mountpoint -B 1
[greg@greg-lnxworkstn ~]$ sudo podman mount nessus
[greg@greg-lnxworkstn ~]$ sudo podman mount
[greg@greg-lnxworkstn ~]$ sudo podman --log-level debug mount nessus
[greg@greg-lnxworkstn ~]$ sudo podman start nessus
[greg@greg-lnxworkstn ~]$ sudo mount -t zfs storage/podman/storage/735999f8b905627bdfe17a76dff12455e6998d6e285d38561b90919ed970e268 /var/lib/containers/storage/zfs/graph/735999f8b905627bdfe17a76dff12455e6998d6e285d38561b90919ed970e268
[greg@greg-lnxworkstn ~]$ sudo podman start nessus
[greg@greg-lnxworkstn ~]$ sudo podman ps
That indicates the mount count is off. Try podman umount nessus several times until it fails, then try to mount it again and see if it mounts.
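For illustration, here is a minimal Go sketch of the reference-counted mount model being described. This is a simplified, hypothetical stand-in rather than the actual containers/storage code: the real mount only happens when the counter rises from zero, and the real unmount only when it falls back to zero, so a stale count that survives a reboot turns Mount() into a no-op.

package main

import "fmt"

// refCountedMount is a simplified, hypothetical model of how a graph
// driver can track mounts: the real mount only happens on the first
// Mount() call and the real unmount only when the count drops to zero.
type refCountedMount struct {
	count int
}

func (m *refCountedMount) Mount() {
	m.count++
	if m.count == 1 {
		fmt.Println("performing the real mount") // e.g. mount -t zfs dataset mountpoint
		return
	}
	fmt.Println("already mounted, count is now", m.count)
}

func (m *refCountedMount) Unmount() error {
	if m.count == 0 {
		return fmt.Errorf("not mounted")
	}
	m.count--
	if m.count == 0 {
		fmt.Println("performing the real unmount")
	}
	return nil
}

func main() {
	// Pretend the count survived a reboot while the real mount did not:
	// Mount() becomes a no-op and the container root looks empty.
	m := &refCountedMount{count: 2}
	m.Mount()

	// Running umount until it fails drives the count back to zero,
	// after which the next Mount() performs the real mount again.
	for m.Unmount() == nil {
	}
	m.Mount()
}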
Ah, I think you got it. I haven't tried rebooting yet to test, but here is the output:
[greg@greg-lnxworkstn ~]$ sudo podman umount nessus
[greg@greg-lnxworkstn ~]$ sudo podman mount nessus
[greg@greg-lnxworkstn ~]$ sudo podman mount nessus
This looks promising!
So I think what is happening is: you run your container and it is still running, you reboot, and as far as containers/storage is concerned your file system is still mounted. For some reason we don't figure out that the zfs dataset is not mounted when we check, which we do with other drivers.
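One way to catch this, sketched below under the assumption that parsing /proc/self/mountinfo is acceptable (an illustration of the idea only, not the actual containers/storage check), is to verify that the path is really a mount point before trusting any persisted state.

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// isMountpoint reports whether path shows up as a mount point in
// /proc/self/mountinfo; the mount point is the fifth whitespace-separated
// field of each line.
func isMountpoint(path string) (bool, error) {
	f, err := os.Open("/proc/self/mountinfo")
	if err != nil {
		return false, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 5 && fields[4] == path {
			return true, nil
		}
	}
	return false, scanner.Err()
}

func main() {
	// Path taken from the Mountpoint reported by podman inspect.
	path := "/var/lib/containers/storage/zfs/graph/56b635ed4657a9202edd3e2ed29edc5a2ed026edc31dd1d8b7e4dbe80cb28ceb"
	mounted, err := isMountpoint(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("really mounted:", mounted)
}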
I did forget to mention that after rebooting and performing the same steps as above it works. Anything I can assist with?
Any chance someone can look into this?
@rhatdan did the last round of ZFS fixes not resolve this?
Oops, it seems I lost this in the noise. As I recall, the problem here is on reboot: the mounted check on zfs thinks that the container is still mounted. Here is my attempt to fix this. BTW, is /run on your machine not on a tmpfs?
Fix?: containers/storage#296
The maintainer of Container Storage really does not like the patch. He would like to know what OS you are running and why /run is not a mounted tmpfs. The idea is that this storage content is on a tmpfs and gets wiped out on reboot. If putting a tmpfs on /run is problematic, could you mount a tmpfs on /run/containers? That would also fix your issue.
It's interesting: I installed the patch and I can't seem to reproduce the issue, so the patch might have worked.
I actually run all my containers with --tmpfs /run and --tmpfs /tmp, and for the containers listed in this issue, since they are systemd containers, I always run them like this (even though podman adds this automatically):
podman run -v /sys/fs/cgroup:/sys/fs/cgroup --tmpfs /run --tmpfs /tmp -dti fedora /sbin/init
So I am not sure if this issue is related to mounting tmpfs.
@greg-hydrogen I am not talking about inside of your containers; I am talking about on your host. The issue you are seeing with the storage being confused after a reboot is because information on /run is supposed to be temporary and cleared on a reboot. If you have a tmpfs mounted on your host's /run, or at least on /run/containers, then when you reboot your system the mount count will be reset to 0 when the first container is run.
@rhatdan that is very odd, and I will take a look when I am back in front of both machines. I don't make any modifications to the standard mounts, so I am surprised if /run is not mounted as tmpfs. I will report back.
@rhatdan - below are my tmpfs mounts for both machines.
Machine 1: …
Machine 2: …
Is /var/run/containers/storage actually on tmpfs? |
/var/run/containers/storage is actually on the zfs filesystem. I can try mounting it with tmpfs if that would help.
Can you put it back on the tmpfs and see if it works?
So I moved everything back, and now I have the following mounts for 2 containers:
shm on /var/lib/containers/storage/zfs-containers/9c8f4e4dd2bddaee8b3c89d6a6df4e2f7022a46f4b874a230e8b68b000d430bc/userdata/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,context="system_u:object_r:container_file_t:s0:c385,c724",size=64000k)
When I first tried to launch the container it didn't create the necessary folders, but after a reboot everything looks good. I have rebooted multiple times and everything is still coming up. I did the same for my rootless mounts as well, so I will make the changes to go back to shm for the runroot.
Not sure why I had configured both the graphroot and the runroot to use zfs, but I do recall that I couldn't start any container with the zfs driver unless both were set (otherwise there was no reason for me to do this). Either way it looks like this was incorrect, and it appears to be working... I really apologize for wasting everyone's time.
I will set up a new VM and test as well.
Thanks for all the help.
Make sure the runroot won't persist after a reboot; if it happens then we can carry wrong information on the current active mounts.
Closes: containers/podman#2150
Signed-off-by: Giuseppe Scrivano <[email protected]>
While this has been solved and can be closed now, I think it is still safe to add a check that the runroot is on volatile storage, since we assume that. I've opened a PR here: containers/storage#317
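As a rough illustration of such a check (a sketch only; it is not claimed that containers/storage#317 is implemented this way), one could verify on Linux that the run root is backed by tmpfs using golang.org/x/sys/unix:

package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// isTmpfs reports whether the filesystem backing path is a tmpfs by
// comparing the statfs magic number against unix.TMPFS_MAGIC (Linux only).
func isTmpfs(path string) (bool, error) {
	var st unix.Statfs_t
	if err := unix.Statfs(path, &st); err != nil {
		return false, err
	}
	return st.Type == unix.TMPFS_MAGIC, nil
}

func main() {
	runroot := "/var/run/containers/storage" // the run root discussed above
	ok, err := isTmpfs(runroot)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if !ok {
		fmt.Printf("warning: %s is not on tmpfs; mount state may survive a reboot\n", runroot)
		return
	}
	fmt.Printf("%s is on tmpfs\n", runroot)
}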
/kind bug
Description
Podman does not start a container after the system is rebooted. This is on a ZFS on Linux system.
Steps to reproduce the issue:
A container is created with
sudo podman run --name test -v /sys/fs/cgroup:/sys/fs/cgroup -p 80:80 -p 443:443 -p 5001:5001 --tmpfs /run -dti centos /sbin/init
The system is rebooted
Try to start the container via
sudo podman start test
Describe the results you received:
Trying to start the container again, I receive the following error:
unable to start container "test": container create failed: container_linux.go:344: starting container process caused "exec: "/sbin/init": stat /sbin/init: no such file or directory"
: internal libpod error
Running debug level logging
sudo podman --log-level debug start test
DEBU[0000] Initializing boltdb state at /var/lib/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver zfs
DEBU[0000] Using graph root /var/lib/containers/storage
DEBU[0000] Using run root /var/run/containers/storage
DEBU[0000] Using static dir /var/lib/containers/storage/libpod
DEBU[0000] Using tmp dir /var/run/libpod
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "zfs"
DEBU[0000] [zfs] zfs get -rHp -t filesystem all containers/podman/storage
INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist
DEBU[0000] Made network namespace at /var/run/netns/cni-e4dc6d29-4077-4615-5e34-f3bda7aa82fd for container 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9
INFO[0000] Got pod network &{Name:test Namespace:test ID:70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 NetNS:/var/run/netns/cni-e4dc6d29-4077-4615-5e34-f3bda7aa82fd PortMappings:[{HostPort:80 ContainerPort:80 Protocol:tcp HostIP:} {HostPort:443 ContainerPort:443 Protocol:tcp HostIP:} {HostPort:5001 ContainerPort:5001 Protocol:tcp HostIP:}] Networks:[] NetworkConfig:map[]}
INFO[0000] About to add CNI network cni-loopback (type=loopback)
INFO[0000] Got pod network &{Name:test Namespace:test ID:70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 NetNS:/var/run/netns/cni-e4dc6d29-4077-4615-5e34-f3bda7aa82fd PortMappings:[{HostPort:80 ContainerPort:80 Protocol:tcp HostIP:} {HostPort:443 ContainerPort:443 Protocol:tcp HostIP:} {HostPort:5001 ContainerPort:5001 Protocol:tcp HostIP:}] Networks:[] NetworkConfig:map[]}
INFO[0000] About to add CNI network podman (type=bridge)
DEBU[0000] mounted container "70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9" at "/var/lib/containers/storage/zfs/graph/56b635ed4657a9202edd3e2ed29edc5a2ed026edc31dd1d8b7e4dbe80cb28ceb"
DEBU[0000] Created root filesystem for container 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 at /var/lib/containers/storage/zfs/graph/56b635ed4657a9202edd3e2ed29edc5a2ed026edc31dd1d8b7e4dbe80cb28ceb
DEBU[0000] [0] CNI result: Interfaces:[{Name:cni0 Mac:4a:79:e0:38:5a:7e Sandbox:} {Name:veth3e16c7ea Mac:8e:2b:b7:c7:17:11 Sandbox:} {Name:eth0 Mac:32:3c:14:21:01:2c Sandbox:/var/run/netns/cni-e4dc6d29-4077-4615-5e34-f3bda7aa82fd}], IP:[{Version:4 Interface:0xc00027b100 Address:{IP:10.88.0.112 Mask:ffff0000} Gateway:10.88.0.1}], Routes:[{Dst:{IP:0.0.0.0 Mask:00000000} GW:}], DNS:{Nameservers:[] Domain: Search:[] Options:[]}
DEBU[0000] /etc/system-fips does not exist on host, not mounting FIPS mode secret
DEBU[0000] parsed reference into "[zfs@/var/lib/containers/storage+/var/run/containers/storage]@1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] parsed reference into "[zfs@/var/lib/containers/storage+/var/run/containers/storage]@1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] exporting opaque data as blob "sha256:1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] parsed reference into "[zfs@/var/lib/containers/storage+/var/run/containers/storage]@1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] exporting opaque data as blob "sha256:1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] parsed reference into "[zfs@/var/lib/containers/storage+/var/run/containers/storage]@1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] Setting CGroups for container 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 to machine.slice:libpod:70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9
WARN[0000] failed to parse language "en_CA.utf8": language: tag is not well-formed
DEBU[0000] reading hooks from /usr/share/containers/oci/hooks.d
DEBU[0000] reading hooks from /etc/containers/oci/hooks.d
DEBU[0000] Created OCI spec for container 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 at /var/lib/containers/storage/zfs-containers/70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9/userdata/config.json
DEBU[0000] /usr/libexec/podman/conmon messages will be logged to syslog
DEBU[0000] running conmon: /usr/libexec/podman/conmon args=[-s -c 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 -u 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 -r /usr/bin/runc -b /var/lib/containers/storage/zfs-containers/70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9/userdata -p /var/run/containers/storage/zfs-containers/70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9/userdata/pidfile -l /var/lib/containers/storage/zfs-containers/70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9/userdata/ctr.log --exit-dir /var/run/libpod/exits --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --storage-driver --exit-command-arg zfs --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 --socket-dir-path /var/run/libpod/socket -t --log-level debug --syslog]
INFO[0000] Running conmon under slice machine.slice and unitName libpod-conmon-70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9.scope
DEBU[0000] Received container pid: -1
DEBU[0000] Cleaning up container 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9
DEBU[0000] Tearing down network namespace at /var/run/netns/cni-e4dc6d29-4077-4615-5e34-f3bda7aa82fd for container 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9
INFO[0000] Got pod network &{Name:test Namespace:test ID:70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 NetNS:/var/run/netns/cni-e4dc6d29-4077-4615-5e34-f3bda7aa82fd PortMappings:[{HostPort:80 ContainerPort:80 Protocol:tcp HostIP:} {HostPort:443 ContainerPort:443 Protocol:tcp HostIP:} {HostPort:5001 ContainerPort:5001 Protocol:tcp HostIP:}] Networks:[] NetworkConfig:map[]}
INFO[0000] About to del CNI network podman (type=bridge)
DEBU[0000] unmounted container "70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9"
ERRO[0000] unable to start container "test": container create failed: container_linux.go:344: starting container process caused "exec: "/sbin/init": stat /sbin/init: no such file or directory"
: internal libpod error
It appears that the container dataset is not mounted during the podman start after a reboot.
sudo podman inspect test| grep -i mountpoint -B 1
"Dataset": "containers/podman/storage/56b635ed4657a9202edd3e2ed29edc5a2ed026edc31dd1d8b7e4dbe80cb28ceb",
"Mountpoint": "/var/lib/containers/storage/zfs/graph/56b635ed4657a9202edd3e2ed29edc5a2ed026edc31dd1d8b7e4dbe80cb28ceb"
Below are the contents of the mount point:
sudo ls /var/lib/containers/storage/zfs/graph/56b635ed4657a9202edd3e2ed29edc5a2ed026edc31dd1d8b7e4dbe80cb28ceb
dev etc proc run sys tmp var
However, if I mount the dataset manually, everything works correctly afterwards:
sudo mount -t zfs containers/podman/storage/56b635ed4657a9202edd3e2ed29edc5a2ed026edc31dd1d8b7e4dbe80cb28ceb /var/lib/containers/storage/zfs/graph/56b635ed4657a9202edd3e2ed29edc5a2ed026edc31dd1d8b7e4dbe80cb28ceb
sudo podman start test
test
sudo podman exec test cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
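For what it's worth, here is a small diagnostic sketch (assuming the zfs CLI is on PATH and using the Dataset value reported by podman inspect; not part of podman itself) that asks ZFS directly whether the container's dataset is mounted, via zfs get -H -o value mounted:

package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// datasetMounted runs "zfs get -H -o value mounted <dataset>" and returns
// true when ZFS reports the dataset as mounted ("yes").
func datasetMounted(dataset string) (bool, error) {
	out, err := exec.Command("zfs", "get", "-H", "-o", "value", "mounted", dataset).Output()
	if err != nil {
		return false, err
	}
	return strings.TrimSpace(string(out)) == "yes", nil
}

func main() {
	// Dataset name as reported by "podman inspect test".
	dataset := "containers/podman/storage/56b635ed4657a9202edd3e2ed29edc5a2ed026edc31dd1d8b7e4dbe80cb28ceb"
	mounted, err := datasetMounted(dataset)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("dataset mounted:", mounted)
}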
Describe the results you expected:
Podman should mount the dataset and run the container as expected.
Additional information you deem important (e.g. issue happens only occasionally):
Output of podman version:
Output of podman info:
Additional environment details (AWS, VirtualBox, physical, etc.):
Fedora 29 Server on a physical host, selinux is in permissive mode