Podman does not remove the overlay storage when the systemd service is restarted during reboot or shutdown #21093

Closed
disi opened this issue Dec 27, 2023 · 7 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@disi

disi commented Dec 27, 2023

Issue Description

When the system is rebooted and systemd stops the services during shutdown, Podman fails to remove the storage of every container on the system. Once the system comes back up and systemd tries to start the services, Podman cannot create new storage because the old storage still exists.
Dec 27 11:08:07 dombox podman[1525]: time="2023-12-27T11:08:07Z" level=warning msg="Unmounting container \"frigate\" while attempting to delete storage: unmounting \"/var/lib/containers/storage/overlay/300f007294edabcaf4dff453cca24448d74d235076494e326f057262fed864f0/merged\": invalid argument"
And when the service tries to start after reboot:
dombox podman[1490890]: Error: reading CIDFile: open /run/container-frigate.service.ctr-id: no such file or directory

This worked before and stopped working about a week ago.
It happens to all containers with podman systemd services.
Once the containers are running, I can restart, stop, and start them via systemd. Only a reboot, or a shutdown followed by a fresh boot of the operating system, breaks them.

Steps to reproduce the issue

  0. podman run -d --name frigate <other options>
  1. podman generate systemd -f --new --name frigate
  2. cp container-frigate.service /usr/lib/systemd/system/
  3. systemctl daemon-reload
  4. systemctl enable --now container-frigate.service
  5. reboot

Describe the results you received

Dec 27 11:08:07 dombox podman[1525]: time="2023-12-27T11:08:07Z" level=warning msg="Unmounting container \"frigate\" while attempting to delete storage: unmounting \"/var/lib/containers/storage/overlay/300f007294edabcaf4dff453cca24448d74d235076494e326f057262fed864f0/merged\": invalid argument"
And during boot
dombox podman[1490890]: Error: reading CIDFile: open /run/container-frigate.service.ctr-id: no such file or directory

Describe the results you expected

A clean shutdown of the containers during reboot or shutdown.

podman info output

`podman-4.6.1-7.el9_3.x86_64`

Client:       Podman Engine
Version:      4.6.1
API Version:  4.6.1
Go Version:   go1.20.10
Built:        Tue Dec 12 22:13:14 2023
OS/Arch:      linux/amd64
host:
  arch: amd64
  buildahVersion: 1.31.3
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.8-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: 879ca989e09d731947cd8d9cbb41038549bf669d'
  cpuUtilization:
    idlePercent: 90.19
    systemPercent: 2.91
    userPercent: 6.91
  cpus: 4
  databaseBackend: boltdb
  distribution:
    distribution: '"almalinux"'
    version: "9.3"
  eventLogger: journald
  freeLocks: 2033
  hostname: dombox.dom
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.14.0-362.13.1.el9_3.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 30156939264
  memTotal: 33225449472
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.7.0-1.el9.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.7.0
    package: netavark-1.7.0-2.el9_3.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.7.0
  ociRuntime:
    name: crun
    package: crun-1.8.7-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.7
      commit: 53a9996ce82d1ee818349bdcc64797a1fa0433c4
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /bin/slirp4netns
    package: slirp4netns-1.2.1-1.el9.x86_64
    version: |-
      slirp4netns version 1.2.1
      commit: 09e31e92fa3d2a1d3ca261adaeb012c8d75a8194
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 511700992
  swapTotal: 511700992
  uptime: 0h 7m 53.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 8
    paused: 0
    running: 8
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 103028883456
  graphRootUsed: 18046713856
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 8
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.6.1
  Built: 1702419194
  BuiltTime: Tue Dec 12 22:13:14 2023
  GitCommit: ""
  GoVersion: go1.20.10
  Os: linux
  OsArch: linux/amd64
  Version: 4.6.1


Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details

Additional environment details

Additional information

I am not a podman expert. This all ran fine for a year; the problem only started about a week ago. I have since tried adding Before= and After= to the services to order them relative to the other container services, but I get the same result: the first container that systemd tries to start after a reboot already fails.
I only noticed these issues now. I update the OS (AlmaLinux) regularly.

As a workaround, I wrote a little script that I run after every reboot or shutdown, and then I reboot again. After the second reboot the services/containers start as expected:

[root@dombox ~]# cat remove_podman_storage.sh
#!/bin/bash

yes | podman rm --storage homeassistant
yes | podman rm --storage mosquitto
yes | podman rm --storage zigbee2mqtt
yes | podman rm --storage wyoming-piper
yes | podman rm --storage wyoming-whisper
yes | podman rm --storage frigate
yes | podman rm --storage pihole
yes | podman rm --storage hass-configurator
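
For readers applying the same workaround, the script above can be sketched as a loop. This is only a sketch under stated assumptions: the container names are copied from the script, the `PODMAN` variable and the `remove_leftover_storage` function are hypothetical names introduced here (they are not part of Podman), and the `yes |` prefix is dropped on the assumption that `podman rm --storage` does not ask for confirmation.

```shell
#!/bin/bash
# Container names taken from the workaround script above.
containers=(homeassistant mosquitto zigbee2mqtt wyoming-piper
            wyoming-whisper frigate pihole hass-configurator)

# Hypothetical override for a dry run, e.g. PODMAN=echo ./script.sh
# (this variable is an illustration; podman itself does not read it).
PODMAN="${PODMAN:-podman}"

# Remove leftover storage for each named container.
remove_leftover_storage() {
    local name
    for name in "$@"; do
        # Keep going if one container has no leftover storage to remove.
        "$PODMAN" rm --storage "$name" || echo "skipped $name" >&2
    done
}

# Run only when executed as a script, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    remove_leftover_storage "${containers[@]}"
fi
```

Running it with `PODMAN=echo` prints the commands instead of executing them, which may help to check the list before touching any storage.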

@disi disi added the kind/bug Categorizes issue or PR as related to a bug. label Dec 27, 2023
@rhatdan
Member

rhatdan commented Dec 27, 2023

I am not sure what is going on, but it looks like the containers are killed before they clean up. Have you thought about using quadlets to define your containers? Quadlet uses the latest podman commands for running containers under systemd, while podman generate systemd generates a systemd unit file that is a snapshot in time and could carry older bugs.

@disi
Author

disi commented Dec 27, 2023

I am not sure what is going on, but it looks like the containers are killed before they clean up. Have you thought about using quadlets to define your containers? Quadlet uses the latest podman commands for running containers under systemd, while podman generate systemd generates a systemd unit file that is a snapshot in time and could carry older bugs.

Thank you for the tip, I'll have a look. Not too many containers need to be moved, I'll try with one container for now.

P.S. Rewriting the services as quadlets is quite a task, but after a reboot the only container that is up is the one I rewrote:

CONTAINER ID  IMAGE                                   COMMAND     CREATED         STATUS         PORTS       NAMES
71cbf8cb183f  ghcr.io/blakeblackshear/frigate:stable              20 seconds ago  Up 21 seconds              frigate

The disadvantage is that it still does not run correctly, because it cannot access the OpenVINO libraries for the GPU:
RuntimeError: Failed to create plugin /usr/local/lib/python3.9/dist-packages/openvino/libs/libopenvino_intel_gpu_plugin.so for device GPU
I guess this is related to the privileged mode I used with plain podman; I could not find a matching option here:
https://docs.podman.io/en/latest/markdown/podman-systemd.unit.5.html
Also, the dry run told me that ShmSize is an unsupported key in the "Container" section:
converting "container-frigate.container": unsupported key 'ShmSize' in group 'Container' in /etc/containers/systemd/container-frigate.container

I'll go back to the normal systemd service for now to get it up and running, apart from the reboot problem.

P.P.S.
OK, Frigate is fixed and running via quadlet now :) It was missing:

AddDevice=/dev/dri/card0
AddDevice=/dev/dri/renderD128

It may be better to run these containers unprivileged.

@disi
Author

disi commented Dec 28, 2023

So, I have rewritten them all as quadlets and now I have the same issue:
And after reboot:

Dec 28 10:16:41 dombox container-frigate[1521]: time="2023-12-28T10:16:41Z" level=warning msg="Unmounting container \"frigate\" while attempting to delete storage: unmounting \"/var/lib/containers/storage/overlay/0c260cff1e6c0ab7b04bae2249df0d518e8a42b13a649701e55390a523e2cfcb/merged\": invalid argument"
Dec 28 10:16:41 dombox container-frigate[1521]: Error: removing storage for container "frigate": unmounting "/var/lib/containers/storage/overlay/0c260cff1e6c0ab7b04bae2249df0d518e8a42b13a649701e55390a523e2cfcb/merged": invalid argument
Dec 28 10:16:41 dombox podman[1521]: 2023-12-28 10:16:41.457749847 +0000 GMT m=+0.025109765 image pull dac652c4cf36785c91195cdba2d45ad2b320ae417ca221fcff91b4817c34d4dd ghcr.io/blakeblackshear/frigate:stable
Dec 28 10:16:41 dombox systemd[1]: container-frigate.service: Main process exited, code=exited, status=125/n/a
Dec 28 10:16:41 dombox systemd[1]: var-lib-containers-storage-overlay.mount: Deactivated successfully.
Dec 28 10:16:41 dombox systemd[1]: container-frigate.service: Failed with result 'exit-code'.
Dec 28 10:16:41 dombox systemd[1]: Failed to start Podman Quadlet container-frigate.
Dec 28 10:16:41 dombox systemd[1]: container-frigate.service: Scheduled restart job, restart counter is at 5.
Dec 28 10:16:41 dombox systemd[1]: Stopped Podman Quadlet container-frigate.
Dec 28 10:16:41 dombox systemd[1]: container-frigate.service: Start request repeated too quickly.
Dec 28 10:16:41 dombox systemd[1]: container-frigate.service: Failed with result 'exit-code'.
Dec 28 10:16:41 dombox systemd[1]: Failed to start Podman Quadlet container-frigate.

The system runs only 8 containers as services.
Running my workaround script, I can then start the services one by one.

@rhatdan
Member

rhatdan commented Dec 28, 2023

@vrothberg @giuseppe thoughts?

@disi
Author

disi commented Dec 28, 2023

Here is the container:

[root@dombox ~]# cat /etc/containers/systemd/container-frigate.container
[Unit]
Description=Podman Quadlet container-frigate

[Container]
ContainerName=frigate

Image=ghcr.io/blakeblackshear/frigate:stable

Network=host

Volume=/etc/localtime:/etc/localtime:ro
Volume=/stratis/frigate:/config
Volume=/frigate:/media/frigate

AddDevice=/dev/dri/card0
AddDevice=/dev/dri/renderD128

AutoUpdate=registry

Environment=FRIGATE_RTSP_PASSWORD=dom
Tmpfs=/tmp/cache:rw

[Service]
Restart=always

[Install]
WantedBy=default.target

I only show the logs for that frigate container to keep it consistent, but the others have the same problem.

@vrothberg
Member

So, I have rewritten them all as quadlets and now I have the same issue:
And after reboot:

I don't see the reading CIDFile error in the logs with Quadlet. Any chance you can use a newer version of Podman?

@Luap99
Member

Luap99 commented Jan 3, 2024

Dup of #19913 and #19491, I think. Basically, podman 4.6 ships a c/storage version that reports a bunch of errors that we can't handle on unclean shutdown. That is fixed in podman 4.7.
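
Since the fix is tied to the Podman version, a quick check of the installed version may be useful before debugging further. The following is a minimal sketch, assuming GNU `sort -V` is available and that `podman --version` prints a line ending in the version number. `version_ge` is a hypothetical helper name, and the 4.7.0 threshold is taken from the comment above.

```shell
#!/bin/bash
# version_ge A B: succeeds when version A is greater than or equal to B.
# Relies on GNU sort -V (version sort): the smaller version sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v podman >/dev/null 2>&1; then
    # Last field of output such as "podman version 4.6.1".
    installed="$(podman --version | awk '{print $NF}')"
    if version_ge "$installed" 4.7.0; then
        echo "podman $installed: 4.7 or newer"
    else
        echo "podman $installed: older than 4.7, consider updating"
    fi
else
    echo "podman not found"
fi
```

On the system described in this issue (podman 4.6.1), the check would take the "older than 4.7" branch.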

@Luap99 Luap99 closed this as not planned Jan 3, 2024
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Apr 3, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 3, 2024