Container doesn't start after a system reboot #2150

Closed
greg-hydrogen opened this issue Jan 13, 2019 · 35 comments
Labels: kind/bug, locked - please file new issue/PR

Comments

@greg-hydrogen

/kind bug
Description

Podman does not start a container after the system is rebooted. This is on a ZFS on Linux system.

Steps to reproduce the issue:

  1. A container is created with
    sudo podman run --name test -v /sys/fs/cgroup:/sys/fs/cgroup -p 80:80 -p 443:443 -p 5001:5001 --tmpfs /run -dti centos /sbin/init

  2. The system is rebooted

  3. Try to start the container via
    sudo podman start test

Describe the results you received:
Trying to start the container again, I receive the following error:
unable to start container "test": container create failed: container_linux.go:344: starting container process caused "exec: "/sbin/init": stat /sbin/init: no such file or directory"
: internal libpod error

Running with debug-level logging:
sudo podman --log-level debug start test
DEBU[0000] Initializing boltdb state at /var/lib/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver zfs
DEBU[0000] Using graph root /var/lib/containers/storage
DEBU[0000] Using run root /var/run/containers/storage
DEBU[0000] Using static dir /var/lib/containers/storage/libpod
DEBU[0000] Using tmp dir /var/run/libpod
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "zfs"
DEBU[0000] [zfs] zfs get -rHp -t filesystem all containers/podman/storage
INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist
DEBU[0000] Made network namespace at /var/run/netns/cni-e4dc6d29-4077-4615-5e34-f3bda7aa82fd for container 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9
INFO[0000] Got pod network &{Name:test Namespace:test ID:70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 NetNS:/var/run/netns/cni-e4dc6d29-4077-4615-5e34-f3bda7aa82fd PortMappings:[{HostPort:80 ContainerPort:80 Protocol:tcp HostIP:} {HostPort:443 ContainerPort:443 Protocol:tcp HostIP:} {HostPort:5001 ContainerPort:5001 Protocol:tcp HostIP:}] Networks:[] NetworkConfig:map[]}
INFO[0000] About to add CNI network cni-loopback (type=loopback)
INFO[0000] Got pod network &{Name:test Namespace:test ID:70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 NetNS:/var/run/netns/cni-e4dc6d29-4077-4615-5e34-f3bda7aa82fd PortMappings:[{HostPort:80 ContainerPort:80 Protocol:tcp HostIP:} {HostPort:443 ContainerPort:443 Protocol:tcp HostIP:} {HostPort:5001 ContainerPort:5001 Protocol:tcp HostIP:}] Networks:[] NetworkConfig:map[]}
INFO[0000] About to add CNI network podman (type=bridge)
DEBU[0000] mounted container "70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9" at "/var/lib/containers/storage/zfs/graph/56b635ed4657a9202edd3e2ed29edc5a2ed026edc31dd1d8b7e4dbe80cb28ceb"
DEBU[0000] Created root filesystem for container 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 at /var/lib/containers/storage/zfs/graph/56b635ed4657a9202edd3e2ed29edc5a2ed026edc31dd1d8b7e4dbe80cb28ceb
DEBU[0000] [0] CNI result: Interfaces:[{Name:cni0 Mac:4a:79:e0:38:5a:7e Sandbox:} {Name:veth3e16c7ea Mac:8e:2b:b7:c7:17:11 Sandbox:} {Name:eth0 Mac:32:3c:14:21:01:2c Sandbox:/var/run/netns/cni-e4dc6d29-4077-4615-5e34-f3bda7aa82fd}], IP:[{Version:4 Interface:0xc00027b100 Address:{IP:10.88.0.112 Mask:ffff0000} Gateway:10.88.0.1}], Routes:[{Dst:{IP:0.0.0.0 Mask:00000000} GW:}], DNS:{Nameservers:[] Domain: Search:[] Options:[]}
DEBU[0000] /etc/system-fips does not exist on host, not mounting FIPS mode secret
DEBU[0000] parsed reference into "[zfs@/var/lib/containers/storage+/var/run/containers/storage]@1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] parsed reference into "[zfs@/var/lib/containers/storage+/var/run/containers/storage]@1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] exporting opaque data as blob "sha256:1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] parsed reference into "[zfs@/var/lib/containers/storage+/var/run/containers/storage]@1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] exporting opaque data as blob "sha256:1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] parsed reference into "[zfs@/var/lib/containers/storage+/var/run/containers/storage]@1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] Setting CGroups for container 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 to machine.slice:libpod:70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9
WARN[0000] failed to parse language "en_CA.utf8": language: tag is not well-formed
DEBU[0000] reading hooks from /usr/share/containers/oci/hooks.d
DEBU[0000] reading hooks from /etc/containers/oci/hooks.d
DEBU[0000] Created OCI spec for container 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 at /var/lib/containers/storage/zfs-containers/70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9/userdata/config.json
DEBU[0000] /usr/libexec/podman/conmon messages will be logged to syslog
DEBU[0000] running conmon: /usr/libexec/podman/conmon args=[-s -c 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 -u 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 -r /usr/bin/runc -b /var/lib/containers/storage/zfs-containers/70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9/userdata -p /var/run/containers/storage/zfs-containers/70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9/userdata/pidfile -l /var/lib/containers/storage/zfs-containers/70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9/userdata/ctr.log --exit-dir /var/run/libpod/exits --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --storage-driver --exit-command-arg zfs --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 --socket-dir-path /var/run/libpod/socket -t --log-level debug --syslog]
INFO[0000] Running conmon under slice machine.slice and unitName libpod-conmon-70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9.scope
DEBU[0000] Received container pid: -1
DEBU[0000] Cleaning up container 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9
DEBU[0000] Tearing down network namespace at /var/run/netns/cni-e4dc6d29-4077-4615-5e34-f3bda7aa82fd for container 70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9
INFO[0000] Got pod network &{Name:test Namespace:test ID:70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9 NetNS:/var/run/netns/cni-e4dc6d29-4077-4615-5e34-f3bda7aa82fd PortMappings:[{HostPort:80 ContainerPort:80 Protocol:tcp HostIP:} {HostPort:443 ContainerPort:443 Protocol:tcp HostIP:} {HostPort:5001 ContainerPort:5001 Protocol:tcp HostIP:}] Networks:[] NetworkConfig:map[]}
INFO[0000] About to del CNI network podman (type=bridge)
DEBU[0000] unmounted container "70a2ed4ade73b6c66cb6f7f4480afd4b1ed5bc178e09ba832848022617b7d4f9"
ERRO[0000] unable to start container "test": container create failed: container_linux.go:344: starting container process caused "exec: "/sbin/init": stat /sbin/init: no such file or directory"
: internal libpod error

It appears that the container's dataset is not mounted during podman start after a reboot.

sudo podman inspect test| grep -i mountpoint -B 1
"Dataset": "containers/podman/storage/56b635ed4657a9202edd3e2ed29edc5a2ed026edc31dd1d8b7e4dbe80cb28ceb",
"Mountpoint": "/var/lib/containers/storage/zfs/graph/56b635ed4657a9202edd3e2ed29edc5a2ed026edc31dd1d8b7e4dbe80cb28ceb"
Below are the contents of the mount point:
sudo ls /var/lib/containers/storage/zfs/graph/56b635ed4657a9202edd3e2ed29edc5a2ed026edc31dd1d8b7e4dbe80cb28ceb
dev etc proc run sys tmp var

However, if I mount the dataset manually, everything works correctly afterwards:
sudo mount -t zfs containers/podman/storage/56b635ed4657a9202edd3e2ed29edc5a2ed026edc31dd1d8b7e4dbe80cb28ceb /var/lib/containers/storage/zfs/graph/56b635ed4657a9202edd3e2ed29edc5a2ed026edc31dd1d8b7e4dbe80cb28ceb

sudo podman start test
test
sudo podman exec test cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)

Describe the results you expected:
Podman should mount the dataset and run the container as expected.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

using the latest master (January 13)
podman version 0.12.2-dev

Output of podman info:

host:
  BuildahVersion: 1.6-dev
  Conmon:
    package: podman-0.12.1.2-1.git9551f6b.fc29.x86_64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 1.14.0-dev, commit: 9b1f0a08285a7f74b21cc9b6bfd98a48905a7ba2'
  Distribution:
    distribution: fedora
    version: "29"
  MemFree: 65933111296
  MemTotal: 67545485312
  OCIRuntime:
    package: runc-1.0.0-66.dev.gitbbb17ef.fc29.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc6+dev
      commit: ead425507b6ba28278ef71ad06582df97f2d5b5f
      spec: 1.0.1-dev
  SwapFree: 0
  SwapTotal: 0
  arch: amd64
  cpus: 16
  hostname: gitlab.fusion.local
  kernel: 4.19.10-300.fc29.x86_64
  os: linux
  rootless: false
  uptime: 13m 47.36s
insecure registries:
  registries: []
registries:
  registries:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 2
  GraphDriverName: zfs
  GraphOptions: null
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Compression: lz4
    Parent Dataset: containers/podman/storage
    Parent Quota: "no"
    Space Available: "442634854400"
    Space Used By Parent: "13794312192"
    Zpool: containers
    Zpool Health: ONLINE
  ImageStore:
    number: 8
  RunRoot: /var/run/containers/storage

Additional environment details (AWS, VirtualBox, physical, etc.):
Fedora 29 Server on a physical host, selinux is in permissive mode

openshift-ci-robot added the kind/bug label on Jan 13, 2019
@mheon (Member) commented Jan 13, 2019

@nalind Potentially an issue with the ZFS graph driver?

There is a debug message showing that we definitely instructed c/storage to mount the container, but the container entrypoint is missing, leading me to believe that mount didn't actually succeed.

@greg-hydrogen Can you do a podman mount on an existing container after a reboot, and browse the contents of the directory it gives back? I'm wondering if it's empty.

@greg-hydrogen (Author)

For whatever reason I decided to remove the test container (sorry about that). I recreated it, and here is the mount just after running
sudo podman run --name test -v /sys/fs/cgroup:/sys/fs/cgroup -p 80:80 -p 443:443 -p 5001:5001 --tmpfs /run -dti centos /sbin/init

sudo podman mount
d33b85ad6305 /var/lib/containers/storage/zfs/graph/8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0a

Container is up and running
sudo podman exec test cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)

Reboot

[greg@host ~]$ sudo ls /var/lib/containers/storage/zfs/graph/8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0a
[greg@host ~]$
sudo podman mount
d33b85ad6305 /var/lib/containers/storage/zfs/graph/8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0a
sudo ls /var/lib/containers/storage/zfs/graph/8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0a
[greg@host ~]$
There are no contents in the folder. Running mount on the host to check whether the actual dataset is mounted returns nothing:
[greg@gitlab ~]$ sudo mount | grep -i 8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0a
[greg@gitlab ~]$
Now if I try to run podman start test, it fails and populates the directory:

sudo podman --log-level debug start test
DEBU[0000] Initializing boltdb state at /var/lib/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver zfs
DEBU[0000] Using graph root /var/lib/containers/storage
DEBU[0000] Using run root /var/run/containers/storage
DEBU[0000] Using static dir /var/lib/containers/storage/libpod
DEBU[0000] Using tmp dir /var/run/libpod
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "zfs"
DEBU[0000] [zfs] zfs get -rHp -t filesystem all containers/podman/storage
INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist
DEBU[0000] Made network namespace at /var/run/netns/cni-80dadcdd-c32d-ed58-6e0d-c79ad7b2a524 for container d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83
INFO[0000] Got pod network &{Name:test Namespace:test ID:d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83 NetNS:/var/run/netns/cni-80dadcdd-c32d-ed58-6e0d-c79ad7b2a524 PortMappings:[{HostPort:443 ContainerPort:443 Protocol:tcp HostIP:} {HostPort:5001 ContainerPort:5001 Protocol:tcp HostIP:} {HostPort:80 ContainerPort:80 Protocol:tcp HostIP:}] Networks:[] NetworkConfig:map[]}
INFO[0000] About to add CNI network cni-loopback (type=loopback)
DEBU[0000] mounted container "d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83" at "/var/lib/containers/storage/zfs/graph/8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0a"
DEBU[0000] Created root filesystem for container d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83 at /var/lib/containers/storage/zfs/graph/8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0a
INFO[0000] Got pod network &{Name:test Namespace:test ID:d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83 NetNS:/var/run/netns/cni-80dadcdd-c32d-ed58-6e0d-c79ad7b2a524 PortMappings:[{HostPort:443 ContainerPort:443 Protocol:tcp HostIP:} {HostPort:5001 ContainerPort:5001 Protocol:tcp HostIP:} {HostPort:80 ContainerPort:80 Protocol:tcp HostIP:}] Networks:[] NetworkConfig:map[]}
INFO[0000] About to add CNI network podman (type=bridge)
DEBU[0000] [0] CNI result: Interfaces:[{Name:cni0 Mac:52:04:5f:ba:50:23 Sandbox:} {Name:veth58dd603d Mac:a6:be:33:c7:66:06 Sandbox:} {Name:eth0 Mac:b6:2c:03:14:ec:58 Sandbox:/var/run/netns/cni-80dadcdd-c32d-ed58-6e0d-c79ad7b2a524}], IP:[{Version:4 Interface:0xc0004410e0 Address:{IP:10.88.0.117 Mask:ffff0000} Gateway:10.88.0.1}], Routes:[{Dst:{IP:0.0.0.0 Mask:00000000} GW:}], DNS:{Nameservers:[] Domain: Search:[] Options:[]}
DEBU[0000] /etc/system-fips does not exist on host, not mounting FIPS mode secret
DEBU[0000] parsed reference into "[zfs@/var/lib/containers/storage+/var/run/containers/storage]@1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] parsed reference into "[zfs@/var/lib/containers/storage+/var/run/containers/storage]@1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] exporting opaque data as blob "sha256:1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] parsed reference into "[zfs@/var/lib/containers/storage+/var/run/containers/storage]@1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] exporting opaque data as blob "sha256:1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] parsed reference into "[zfs@/var/lib/containers/storage+/var/run/containers/storage]@1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb"
DEBU[0000] Setting CGroups for container d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83 to machine.slice:libpod:d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83
WARN[0000] failed to parse language "en_CA.utf8": language: tag is not well-formed
DEBU[0000] reading hooks from /usr/share/containers/oci/hooks.d
DEBU[0000] reading hooks from /etc/containers/oci/hooks.d
DEBU[0000] Created OCI spec for container d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83 at /var/lib/containers/storage/zfs-containers/d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83/userdata/config.json
DEBU[0000] /usr/libexec/podman/conmon messages will be logged to syslog
DEBU[0000] running conmon: /usr/libexec/podman/conmon args=[-s -c d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83 -u d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83 -r /usr/bin/runc -b /var/lib/containers/storage/zfs-containers/d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83/userdata -p /var/run/containers/storage/zfs-containers/d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83/userdata/pidfile -l /var/lib/containers/storage/zfs-containers/d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83/userdata/ctr.log --exit-dir /var/run/libpod/exits --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --storage-driver --exit-command-arg zfs --exit-command-arg container --exit-command-arg cleanup --exit-command-arg d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83 --socket-dir-path /var/run/libpod/socket -t --log-level debug --syslog]
INFO[0000] Running conmon under slice machine.slice and unitName libpod-conmon-d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83.scope
DEBU[0000] Received container pid: -1
DEBU[0000] Cleaning up container d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83
DEBU[0000] Tearing down network namespace at /var/run/netns/cni-80dadcdd-c32d-ed58-6e0d-c79ad7b2a524 for container d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83
INFO[0000] Got pod network &{Name:test Namespace:test ID:d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83 NetNS:/var/run/netns/cni-80dadcdd-c32d-ed58-6e0d-c79ad7b2a524 PortMappings:[{HostPort:443 ContainerPort:443 Protocol:tcp HostIP:} {HostPort:5001 ContainerPort:5001 Protocol:tcp HostIP:} {HostPort:80 ContainerPort:80 Protocol:tcp HostIP:}] Networks:[] NetworkConfig:map[]}
INFO[0000] About to del CNI network podman (type=bridge)
DEBU[0001] unmounted container "d33b85ad630562fec94460cbfd77d3b36bd3aa3b0de2cff0fab5ffb70888ea83"
ERRO[0001] unable to start container "test": container create failed: container_linux.go:344: starting container process caused "exec: "/sbin/init": stat /sbin/init: no such file or directory"
: internal libpod error

sudo ls /var/lib/containers/storage/zfs/graph/8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0a
dev etc proc run sys tmp var

Now, if I manually mount the dataset:
"Dataset": "containers/podman/storage/8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0a",
"Mountpoint": "/var/lib/containers/storage/zfs/graph/8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0a"

sudo mount -t zfs containers/podman/storage/8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0a /var/lib/containers/storage/zfs/graph/8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0a

sudo ls /var/lib/containers/storage/zfs/graph/8afa03e4f13a72a30fefe4636a3e020eb6ee7ae592147dc17372fa072b523b0a
anaconda-post.log bin dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var

We can see that all the folders are present, and if I start it via sudo podman start test the container now runs.

This time I will keep the test container around

@mheon (Member) commented Jan 13, 2019

The directories present after podman start are probably runc creating a minimal skeleton to run the container with (probably necessary if you have a container image with only a single statically linked binary in it, for example). So it seems like mounting containers is definitely not working with the ZFS driver after a system restart.

@rhatdan (Member) commented Jan 14, 2019

I just updated the zfs code from Moby in containers/storage, but I did not find many differences.

containers/storage#263

@greg-hydrogen (Author)

Looks like that fixed it! I am able to start the container without any issues after a reboot! Shall I close the issue or wait until it is merged?

@vrothberg (Member)

Wonderful! Let's wait until the fix is merged into c/storage and merged here.

@greg-hydrogen (Author)

Hmmm... it looks like I might have spoken too soon. I just built podman again with the storage patch and I am hitting the same error as before. I have no idea why it was working a day ago and now it stopped. I can repost the output from a new container if that helps...

@vrothberg (Member)

Thanks for reporting back! We'll have a look.

@greg-hydrogen (Author)

Is there anything you need from me, or anything I can do to help?

@rhatdan (Member) commented Jan 18, 2019

Try again on master. We removed something that we did not think was necessary, and now we have added it back.

@greg-hydrogen (Author)

Just tried again, still the same issue. It seems like it doesn't mount the dataset; going in, mounting it manually, and then starting the container works.

@greg-hydrogen (Author)

Anything else I can help with here? I don't have any coding skills, but I can test.

@vrothberg (Member)

@nalind @rhatdan it looks like we're still hitting the issue.

@rhatdan (Member) commented Jan 29, 2019

One problem here is that none of us has any experience with ZFS. @greg-hydrogen When you say you mount the dataset, are you seeing some content mounted and other content not mounted?

@greg-hydrogen (Author) commented Jan 29, 2019

@rhatdan - when I manually mount the zfs dataset via
mount -t zfs dataset podmanmountpoint
all the content is there and I am able to start the container

For some reason when issuing
podman start container
it doesn't perform the mount operation

If I create a new container, everything works as expected: I am able to stop and start it. But when I reboot and try to start it, it fails.

There must be something in the start code where the mount operation is not getting called, or something like that, but of course that is just a guess on my part.

@rhatdan (Member) commented Jan 29, 2019

What happens if you just do a podman mount CTR? Does the mount point get created?

@greg-hydrogen (Author) commented Jan 29, 2019

Doesn't look like it. It says it mounts it, but nothing happens:

sudo podman inspect nessus | grep -i mountpoint -B 1
"Dataset": "storage/podman/storage/735999f8b905627bdfe17a76dff12455e6998d6e285d38561b90919ed970e268",
"Mountpoint": "/var/lib/containers/storage/zfs/graph/735999f8b905627bdfe17a76dff12455e6998d6e285d38561b90919ed970e268"

[greg@greg-lnxworkstn ~]$ sudo podman mount nessus
/var/lib/containers/storage/zfs/graph/735999f8b905627bdfe17a76dff12455e6998d6e285d38561b90919ed970e268

[greg@greg-lnxworkstn ~]$ sudo podman mount
ebf970cb0e13 /var/lib/containers/storage/zfs/graph/33237b78a0eb12a6fed38552ed266e4c20bf887838449224390f61dc602b0319
f684c208cec7 /var/lib/containers/storage/zfs/graph/1f9f4d82ae6a079e7e50d43fef56a0f522fa6b147aa2401737c1ba6c8e07f7e7

[greg@greg-lnxworkstn ~]$ sudo podman --log-level debug mount nessus
DEBU[0000] Initializing boltdb state at /var/lib/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver zfs
DEBU[0000] Using graph root /var/lib/containers/storage
DEBU[0000] Using run root /var/run/containers/storage
DEBU[0000] Using static dir /var/lib/containers/storage/libpod
DEBU[0000] Using tmp dir /var/run/libpod
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "zfs"
DEBU[0000] [zfs] zfs get -rHp -t filesystem all storage/podman/storage
INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist
DEBU[0000] mounted container "3a890ee17a93497efbe7f98bfcd688e5ebe69d6a49d5c5e3915255312ae742c0" at "/var/lib/containers/storage/zfs/graph/735999f8b905627bdfe17a76dff12455e6998d6e285d38561b90919ed970e268"
/var/lib/containers/storage/zfs/graph/735999f8b905627bdfe17a76dff12455e6998d6e285d38561b90919ed970e268

[greg@greg-lnxworkstn ~]$ sudo podman start nessus
unable to start container "nessus": container create failed: container_linux.go:344: starting container process caused "exec: "/sbin/init": stat /sbin/init: no such file or directory"
: internal libpod error

[greg@greg-lnxworkstn ~]$ sudo mount -t zfs storage/podman/storage/735999f8b905627bdfe17a76dff12455e6998d6e285d38561b90919ed970e268 /var/lib/containers/storage/zfs/graph/735999f8b905627bdfe17a76dff12455e6998d6e285d38561b90919ed970e268

[greg@greg-lnxworkstn ~]$ sudo podman start nessus
nessus

[greg@greg-lnxworkstn ~]$ sudo podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3a890ee17a93 localhost/nessus-builder:v1 /sbin/init 3 weeks ago Up About a minute ago 0.0.0.0:8834->8834/tcp nessus

@rhatdan (Member) commented Jan 29, 2019

That indicates the mount count is off.

Try podman umount nessus several times until it fails, and then try to mount it again, and see if it mounts.
We are keeping track of the number of times a filesystem is mounted, and in theory we only mount it when it is not already mounted, but this bookkeeping might be getting screwed up.
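
A quick way to see that mismatch (a sketch, using the container name and graph path already shown in this issue) is to compare what podman reports with what the kernel reports:

sudo podman mount nessus
findmnt /var/lib/containers/storage/zfs/graph/735999f8b905627bdfe17a76dff12455e6998d6e285d38561b90919ed970e268
sudo ls /var/lib/containers/storage/zfs/graph/735999f8b905627bdfe17a76dff12455e6998d6e285d38561b90919ed970e268

If podman mount prints a path but findmnt shows nothing mounted there and the listing is empty, the stored mount count no longer matches reality.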

@greg-hydrogen (Author)

Ah, I think you got it. I haven't tried rebooting yet to test, but here is the output:

[greg@greg-lnxworkstn ~]$ sudo podman umount nessus
3a890ee17a93497efbe7f98bfcd688e5ebe69d6a49d5c5e3915255312ae742c0
[greg@greg-lnxworkstn ~]$ sudo podman umount nessus
3a890ee17a93497efbe7f98bfcd688e5ebe69d6a49d5c5e3915255312ae742c0
[greg@greg-lnxworkstn ~]$ sudo podman umount nessus
3a890ee17a93497efbe7f98bfcd688e5ebe69d6a49d5c5e3915255312ae742c0
[greg@greg-lnxworkstn ~]$ sudo podman umount nessus
3a890ee17a93497efbe7f98bfcd688e5ebe69d6a49d5c5e3915255312ae742c0
[greg@greg-lnxworkstn ~]$ sudo podman umount nessus
3a890ee17a93497efbe7f98bfcd688e5ebe69d6a49d5c5e3915255312ae742c0
[greg@greg-lnxworkstn ~]$ sudo podman umount nessus
3a890ee17a93497efbe7f98bfcd688e5ebe69d6a49d5c5e3915255312ae742c0
[greg@greg-lnxworkstn ~]$ sudo podman umount nessus
3a890ee17a93497efbe7f98bfcd688e5ebe69d6a49d5c5e3915255312ae742c0
[greg@greg-lnxworkstn ~]$ sudo podman umount nessus
3a890ee17a93497efbe7f98bfcd688e5ebe69d6a49d5c5e3915255312ae742c0
[greg@greg-lnxworkstn ~]$ sudo podman umount nessus
error unmounting container nessus: error unmounting container 3a890ee17a93497efbe7f98bfcd688e5ebe69d6a49d5c5e3915255312ae742c0 root filesystem: layer is not mounted
[greg@greg-lnxworkstn ~]$ sudo podman umount nessus
error unmounting container nessus: error unmounting container 3a890ee17a93497efbe7f98bfcd688e5ebe69d6a49d5c5e3915255312ae742c0 root filesystem: layer is not mounted

[greg@greg-lnxworkstn ~]$ sudo podman mount nessus
/var/lib/containers/storage/zfs/graph/735999f8b905627bdfe17a76dff12455e6998d6e285d38561b90919ed970e268
[greg@greg-lnxworkstn ~]$

[greg@greg-lnxworkstn ~]$ sudo podman mount nessus
/var/lib/containers/storage/zfs/graph/735999f8b905627bdfe17a76dff12455e6998d6e285d38561b90919ed970e268
[greg@greg-lnxworkstn ~]$ sudo podman start nessus
nessus
[greg@greg-lnxworkstn ~]$ sudo podman stop nessus
3a890ee17a93497efbe7f98bfcd688e5ebe69d6a49d5c5e3915255312ae742c0
[greg@greg-lnxworkstn ~]$ sudo podman umount nessus
3a890ee17a93497efbe7f98bfcd688e5ebe69d6a49d5c5e3915255312ae742c0
[greg@greg-lnxworkstn ~]$ sudo podman umount nessus
error unmounting container nessus: error unmounting container 3a890ee17a93497efbe7f98bfcd688e5ebe69d6a49d5c5e3915255312ae742c0 root filesystem: layer is not mounted
[greg@greg-lnxworkstn ~]$ sudo podman start nessus
nessus
[greg@greg-lnxworkstn ~]$

this looks promising!

@rhatdan (Member) commented Jan 29, 2019

So I think what is happening is this: you run your container and it is still running, you reboot, and as far as containers/storage is concerned your filesystem is still mounted. For some reason we don't figure out that the zfs dataset is not actually mounted when we check, which we do with other drivers.
@nalind Make sense?

@greg-hydrogen (Author)

I forgot to mention that after rebooting and performing the same steps as above it works. Is there anything I can assist with?

@greg-hydrogen (Author)

any chance someone can look into this?

@mheon (Member) commented Mar 4, 2019

@rhatdan did the last round of ZFS fixes not resolve this?

@rhatdan (Member) commented Mar 4, 2019

Oops, I seem to have lost this in the noise.

As I recall, the problem here is on reboot: the mounted check on zfs thinks that the container is still mounted.

Here is my attempt to fix this.

BTW, is /run on your machine not on a tmpfs?

@rhatdan (Member) commented Mar 4, 2019

Fix?: containers/storage#296

@rhatdan (Member) commented Mar 6, 2019

The maintainer of containers/storage really does not like the patch. He would like to know what OS you are running and why /run is not a mounted tmpfs. The idea is that this storage content lives on a tmpfs and gets wiped out on reboot.

If putting a tmpfs on /run is problematic, could you mount a tmpfs on /run/containers instead? That would also fix your issue.
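
For reference, mounting a tmpfs on /run/containers could look roughly like this (a sketch; the paths match the defaults shown earlier in this issue, and on most Fedora installs /run itself is already a tmpfs):

sudo mkdir -p /run/containers
sudo mount -t tmpfs -o mode=0755 tmpfs /run/containers
# to persist across reboots, an /etc/fstab entry along these lines would work:
# tmpfs  /run/containers  tmpfs  mode=0755  0  0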

@greg-hydrogen (Author)

It's interesting: I installed the patch and I can't seem to reproduce the issue, so the patch might have worked.

I actually run all my containers with --tmpfs /run and --tmpfs /tmp. For the containers listed in this issue, since they are systemd containers, I always run them like this (even though podman adds this automatically):

podman run -v /sys/fs/cgroup:/sys/fs/cgroup --tmpfs /run --tmpfs /tmp -dti fedora /sbin/init

So I am not sure whether this issue is related to mounting tmpfs.

@rhatdan (Member) commented Mar 7, 2019

@greg-hydrogen I am not talking about inside of your containers. I am talking about on your host.

The issue you are seeing, with storage being confused after a reboot, is because information under /run is supposed to be temporary and cleared on a reboot. If you have a tmpfs mounted on your host's /run, or at least on /run/containers, then when you reboot your system the mount count will be reset to 0 when the first container is run.
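
To check which filesystem actually backs those paths on the host, something like the following should be enough (assuming the default runroot reported by podman info earlier in this issue):

findmnt -n -o SOURCE,FSTYPE -T /run
findmnt -n -o SOURCE,FSTYPE -T /var/run/containers/storage

The second command should report tmpfs; if it reports something persistent (for example a zfs dataset), the mount counts kept under the runroot will survive a reboot.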

@greg-hydrogen (Author)

@rhatdan that is very odd, and I will take a look when I am back in front of both machines. I don't make any modifications to the standard mounts, so I would be surprised if /run is not mounted as tmpfs. I will report back.

@greg-hydrogen (Author)

@rhatdan - below are my tmpfs mounts for both machines

Machine 1
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel)
tmpfs on /ramdrive type tmpfs (rw,relatime,seclabel)
tmpfs on /run/netns type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=6596228k,mode=700,uid=1000,gid=1000)

Machine 2
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel)
tmpfs on /ramdrive type tmpfs (rw,relatime,seclabel)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=6595492k,mode=700,uid=1000,gid=1000)

@rhatdan (Member) commented Mar 8, 2019

Is /var/run/containers/storage actually on tmpfs?

@greg-hydrogen (Author) commented Mar 8, 2019

/var/run/containers/storage is actually on the zfs filesystem.
Originally I was not able to start any container unless the graphroot and runroot were both on zfs.

I can try mounting it with tmpfs if that would help.
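
For anyone following along, the runroot is set in /etc/containers/storage.conf, so moving it back to the tmpfs-backed default would look roughly like this (a sketch based on the paths from podman info above; exact keys may vary between versions):

# /etc/containers/storage.conf (relevant keys only)
[storage]
  driver = "zfs"
  runroot = "/var/run/containers/storage"    # should resolve to the tmpfs-mounted /run
  graphroot = "/var/lib/containers/storage"  # can stay on the zfs pool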

@rhatdan (Member) commented Mar 8, 2019

Can you put it back on the tmpfs and see if it works?

@greg-hydrogen (Author)

So I moved everything back, and now I have the following mounts for the 2 containers:

shm on /var/lib/containers/storage/zfs-containers/9c8f4e4dd2bddaee8b3c89d6a6df4e2f7022a46f4b874a230e8b68b000d430bc/userdata/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,context="system_u:object_r:container_file_t:s0:c385,c724",size=64000k)
shm on /var/lib/containers/storage/zfs-containers/f53b7f2d81415791686aa28b4089785a207ccea76c4bbd313b40af938872671a/userdata/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,context="system_u:object_r:container_file_t:s0:c435,c675",size=64000k)

When I first tried to launch the container it didn't create the necessary folders, but after a reboot everything looks good. I have rebooted multiple times and everything is still coming up.

I did the same for my rootless mounts as well so I will make the changes to go back to shm for the runroot.

Not sure why I had configured both the graphroot and the runroot to use zfs, but I do recall that I couldn't start any container with the zfs driver unless both were set (otherwise there was no reason for me to do this). Either way, it looks like that was incorrect, and it now appears to be working... I really apologize for wasting everyone's time.

I will setup a new VM and test as well

Thanks for all the help

giuseppe added a commit to giuseppe/storage that referenced this issue Apr 12, 2019
Make sure the runroot won't persist after a reboot, if it happens then
we can carry wrong information on the current active mounts.

Closes: containers/podman#2150

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe (Member)

While this has been solved and can be closed now, I think it is still worth adding a check that the runroot is on volatile storage, since we assume that. I've opened a PR here: containers/storage#317
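
A rough shell equivalent of that kind of check (just a sketch, not the actual containers/storage code) would be:

RUNROOT=/var/run/containers/storage
FSTYPE=$(findmnt -n -o FSTYPE -T "$RUNROOT")
if [ "$FSTYPE" != "tmpfs" ]; then
    echo "warning: runroot $RUNROOT is on $FSTYPE; mount state may survive a reboot" >&2
fi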

The github-actions bot locked this issue as resolved and limited conversation to collaborators on Sep 24, 2023.