
podman with custom static_dir slow to tear-down containers #19938

Closed
srcshelton opened this issue Sep 12, 2023 · 26 comments · Fixed by #20088
Labels
kind/bug, locked - please file new issue/PR

Comments

@srcshelton
Contributor

srcshelton commented Sep 12, 2023

Issue Description

Whilst podman is able to start containers from even very large images fast enough that it feels interactively instant, shutting them down again takes what feels like a long time.

I wonder whether this is because I'm running the btrfs storage-driver, because I'm not running on a system managed by systemd (and so perhaps some call-back is failing?), or whether some default timeout is being adhered to even if the container processes have exited near the start of the timeout period?

Steps to reproduce the issue

$ sudo podman run -it --rm docker.io/alpine:latest ; date
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob 7264a8db6415 skipped: already exists
Copying config 7e01a0d0a1 done
Writing manifest to image destination
/ # date ; exit
Tue Sep 12 08:40:27 UTC 2023
Tue 12 Sep 09:40:47 BST 2023

Describe the results you received

There is a 20s pause between the container process exiting and the container itself having been fully torn-down.

N.B. This is not a new problem - this system has been upgraded from podman 2.x with the VFS storage-driver, but has always been slow to tear-down containers.

Describe the results you expected

In other scenarios, if the process within the container exits then the container is torn-down effectively instantly.

podman run has a --stop-timeout option which apparently defaults to 10 seconds, but the stop delay is actually twice that. In this case the container process has itself exited, so I'd imagine that any timeout dependent on processes within the container (as opposed to related host processes such as crun/runc) should not apply?
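
(A quick sanity check, as a sketch only: set the stop timeout explicitly on the same run and see whether the exit delay changes at all.)

$ sudo podman run -it --rm --stop-timeout 1 docker.io/alpine:latest ; date
/ # exit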

podman info output

host:
  arch: amd64
  buildahVersion: 1.31.2
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: app-containers/conmon-2.1.8
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.1.8, commit: 00e08f4a9ca5420de733bf542b930ad58e1a7e7d'
  cpuUtilization:
    idlePercent: 95.44
    systemPercent: 2.05
    userPercent: 2.5
  cpus: 8
  databaseBackend: sqlite
  distribution:
    distribution: gentoo
    version: "2.14"
  eventLogger: file
  freeLocks: 2021
  hostname: dellr330
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.5.2-gentoo
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 33188667392
  memTotal: 67331043328
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: app-containers/aardvark-dns-1.7.0
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.7.0
    package: app-containers/netavark-1.7.0
    path: /usr/libexec/podman/netavark
    version: netavark 1.7.0
  ociRuntime:
    name: crun
    package: app-containers/crun-1.9
    path: /usr/bin/crun
    version: |-
      crun version 1.9
      commit: a538ac4ea1ff319bcfe2bf81cb5c6f687e2dc9d3
      rundir: /var/run/crun
      spec: 1.0.0
      +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 42949652480
  swapTotal: 42949652480
  uptime: 8h 6m 42.00s (Approximately 0.33 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  localhost:5000:
    Blocked: false
    Insecure: true
    Location: localhost:5000
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: localhost:5000
    PullFromMirror: ""
  search:
  - docker.io
  - docker.pkg.github.com
  - quay.io
  - public.ecr.aws
  - registry.fedoraproject.org
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 25
    paused: 0
    running: 24
    stopped: 1
  graphDriverName: btrfs
  graphOptions: {}
  graphRoot: /mnt/podman-storage
  graphRootAllocated: 2000397795328
  graphRootUsed: 73760190464
  graphStatus:
    Build Version: Btrfs v6.3.3
    Library Version: "102"
  imageCopyTmpDir: /var/tmp/.private/root
  imageStore:
    number: 120
  runRoot: /var/run/podman
  transientStore: false
  volumePath: /space/podman/volumes
version:
  APIVersion: 4.6.2
  Built: 1693253621
  BuiltTime: Mon Aug 28 21:13:41 2023
  GitCommit: 5db42e86862ef42c59304c38aa583732fd80f178
  GoVersion: go1.20.7
  Os: linux
  OsArch: linux/amd64
  Version: 4.6.2

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details

Host is running openrc, not systemd.

Additional information

The size of the image (within reason) does not seem to have a significant effect on the tear-down time - for a 1.08GB image, the result is:

$ sudo podman run -it --rm --entrypoint /bin/sh localhost/gentoo-build:latest ; date
/ # date ; exit
Tue Sep 12 10:03:07 BST 2023
Tue 12 Sep 10:03:27 BST 2023

… so the delay appears to be a consistent 20s.

@srcshelton added the kind/bug label on Sep 12, 2023
@giuseppe
Member

looks like it takes a while to delete the btrfs volume.

Can you try with the overlay driver?

@srcshelton
Contributor Author

srcshelton commented Sep 12, 2023

It's been like this since before I switched to the btrfs driver - if anything, it was worse with the overlay driver (although I didn't gather any timing on this).

(Is there any way to switch drivers without having to podman system reset and save/restore all images?)

Or is there any instrumentation I can add, or debug options I can enable, that will get the innards of podman/libpod to tell us how much time is taken removing the btrfs subvolume, to rule this in or out?

I believe that subvolume removal should be instant, and whenever I've removed subvolumes manually on the same system, there's never been any noticeable delay.

@giuseppe
Member

you could experiment with overlay on a different store:

podman --storage-driver overlay --root /somewhere/else ...

and see how that performs.

You cannot switch drivers without recreating the images, what you could try is to copy them:

$ podman --root /new/storage pull "containers-storage:[btrfs@/old/path+/run/dir]IMAGE"

@Luap99
Member

Luap99 commented Sep 13, 2023

The 20 seconds is the delay used to wait for the exit code to appear in the db. You can see that with --log-level debug.

So the real problem is likely the podman container cleanup command not working correctly, or perhaps something wrong in conmon.

@srcshelton
Contributor Author

The 20 seconds is the delay used to wait for the exit code to appear in the db. You can see that with --log-level debug.

So the real problem is likely the podman container cleanup command not working correctly, or perhaps something wrong in conmon.

Setup:

$ sudo podman run -it --rm --log-level debug docker.io/alpine:latest ; date
INFO[0000] podman filtering at log level debug          
DEBU[0000] Called run.PersistentPreRunE(podman run -it --rm --log-level debug docker.io/alpine:latest) 
DEBU[0000] Using conmon: "/usr/libexec/podman/conmon"   
DEBU[0000] Using graph driver btrfs                     
DEBU[0000] Using graph root /mnt/podman-storage         
DEBU[0000] Using run root /var/run/podman               
DEBU[0000] Using static dir /space/podman/static        
DEBU[0000] Using tmp dir /var/run/libpod                
DEBU[0000] Using volume path /space/podman/volumes      
DEBU[0000] Using transient store: false                 
DEBU[0000] [graphdriver] trying provided driver "btrfs" 
DEBU[0000] Initializing event backend file              
DEBU[0000] Configured OCI runtime runsc initialization failed: no valid executable found for OCI runtime runsc: invalid argument 
DEBU[0000] Configured OCI runtime ocijail initialization failed: no valid executable found for OCI runtime ocijail: invalid argument 
DEBU[0000] Configured OCI runtime crun-wasm initialization failed: no valid executable found for OCI runtime crun-wasm: invalid argument 
DEBU[0000] Configured OCI runtime runj initialization failed: no valid executable found for OCI runtime runj: invalid argument 
DEBU[0000] Configured OCI runtime kata initialization failed: no valid executable found for OCI runtime kata: invalid argument 
DEBU[0000] Configured OCI runtime youki initialization failed: no valid executable found for OCI runtime youki: invalid argument 
DEBU[0000] Configured OCI runtime krun initialization failed: no valid executable found for OCI runtime krun: invalid argument 
DEBU[0000] Using OCI runtime "/usr/bin/crun"            
INFO[0000] Setting parallel job count to 25             
DEBU[0000] Successfully loaded 1 networks               
DEBU[0000] Pulling image docker.io/alpine:latest (policy: missing) 
DEBU[0000] Looking up image "docker.io/alpine:latest" in local containers storage 
DEBU[0000] Normalized platform linux/amd64 to {amd64 linux  [] } 
DEBU[0000] Trying "docker.io/library/alpine:latest" ... 
DEBU[0000] parsed reference into "[btrfs@/mnt/podman-storage+/var/run/podman]@7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb" 
DEBU[0000] Found image "docker.io/alpine:latest" as "docker.io/library/alpine:latest" in local containers storage 
DEBU[0000] Found image "docker.io/alpine:latest" as "docker.io/library/alpine:latest" in local containers storage ([btrfs@/mnt/podman-storage+/var/run/podman]@7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb) 
DEBU[0000] exporting opaque data as blob "sha256:7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb" 
DEBU[0000] Looking up image "docker.io/library/alpine:latest" in local containers storage 
DEBU[0000] Normalized platform linux/amd64 to {amd64 linux  [] } 
DEBU[0000] Trying "docker.io/library/alpine:latest" ... 
DEBU[0000] parsed reference into "[btrfs@/mnt/podman-storage+/var/run/podman]@7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb" 
DEBU[0000] Found image "docker.io/library/alpine:latest" as "docker.io/library/alpine:latest" in local containers storage 
DEBU[0000] Found image "docker.io/library/alpine:latest" as "docker.io/library/alpine:latest" in local containers storage ([btrfs@/mnt/podman-storage+/var/run/podman]@7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb) 
DEBU[0000] exporting opaque data as blob "sha256:7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb" 
DEBU[0000] Looking up image "docker.io/alpine:latest" in local containers storage 
DEBU[0000] Normalized platform linux/amd64 to {amd64 linux  [] } 
DEBU[0000] Trying "docker.io/library/alpine:latest" ... 
DEBU[0000] parsed reference into "[btrfs@/mnt/podman-storage+/var/run/podman]@7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb" 
DEBU[0000] Found image "docker.io/alpine:latest" as "docker.io/library/alpine:latest" in local containers storage 
DEBU[0000] Found image "docker.io/alpine:latest" as "docker.io/library/alpine:latest" in local containers storage ([btrfs@/mnt/podman-storage+/var/run/podman]@7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb) 
DEBU[0000] exporting opaque data as blob "sha256:7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb" 
DEBU[0000] Inspecting image 7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb 
DEBU[0000] exporting opaque data as blob "sha256:7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb" 
DEBU[0000] exporting opaque data as blob "sha256:7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb" 
DEBU[0000] Inspecting image 7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb 
DEBU[0000] Inspecting image 7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb 
DEBU[0000] Inspecting image 7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb 
DEBU[0000] Inspecting image 7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb 
DEBU[0000] using systemd mode: false                    
DEBU[0000] No hostname set; container's hostname will default to runtime default 
DEBU[0000] Loading seccomp profile from "/usr/share/containers/seccomp.json" 
DEBU[0000] Allocated lock 27 for container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 
DEBU[0000] parsed reference into "[btrfs@/mnt/podman-storage+/var/run/podman]@7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb" 
DEBU[0000] exporting opaque data as blob "sha256:7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb" 
DEBU[0000] Created container "509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938" 
DEBU[0000] Container "509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938" has work directory "/mnt/podman-storage/btrfs-containers/509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938/userdata" 
DEBU[0000] Container "509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938" has run directory "/var/run/podman/btrfs-containers/509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938/userdata" 
DEBU[0000] Handling terminal attach                     
INFO[0000] Received shutdown.Stop(), terminating!        PID=18606
DEBU[0000] Enabling signal proxying                     
DEBU[0000] Mounted container "509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938" at "/mnt/podman-storage/btrfs/subvolumes/5e0f0219a6a9ea6b9515a5517ece74918ee2aa48078a1acd9625f0aca195251e" 
DEBU[0000] Made network namespace at /var/run/netns/netns-df26baeb-2b07-198b-6297-4e5edf5a92c1 for container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 
DEBU[0000] Created root filesystem for container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 at /mnt/podman-storage/btrfs/subvolumes/5e0f0219a6a9ea6b9515a5517ece74918ee2aa48078a1acd9625f0aca195251e 
[DEBUG netavark::network::validation] "Validating network namespace..."
[DEBUG netavark::commands::setup] "Setting up..."
[INFO  netavark::firewall] Using iptables firewall driver
[DEBUG netavark::network::bridge] Setup network podman
[DEBUG netavark::network::bridge] Container interface name: eth0 with IP addresses [172.18.95.73/16]
[DEBUG netavark::network::bridge] Bridge name: podman0 with IP addresses [172.18.0.1/16]
[DEBUG netavark::network::core_utils] Setting sysctl value for net.ipv4.ip_forward to 1
[DEBUG netavark::network::core_utils] Setting sysctl value for /proc/sys/net/ipv6/conf/eth0/autoconf to 0
[INFO  netavark::network::netlink] Adding route (dest: 0.0.0.0/0 ,gw: 172.18.0.1, metric 100)
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK-1D8721804F16F created on table nat
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK_ISOLATION_2 created on table filter
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK_ISOLATION_3 created on table filter
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK_FORWARD exists on table filter
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK_FORWARD exists on table filter
[DEBUG netavark::firewall::varktables::helpers] rule -d 172.18.0.0/16 -j ACCEPT created on table nat and chain NETAVARK-1D8721804F16F
[DEBUG netavark::firewall::varktables::helpers] rule ! -d 224.0.0.0/4 -j MASQUERADE created on table nat and chain NETAVARK-1D8721804F16F
[DEBUG netavark::firewall::varktables::helpers] rule -s 172.18.0.0/16 -j NETAVARK-1D8721804F16F created on table nat and chain POSTROUTING
[DEBUG netavark::firewall::varktables::helpers] rule -d 172.18.0.0/16 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT created on table filter and chain NETAVARK_FORWARD
[DEBUG netavark::firewall::varktables::helpers] rule -s 172.18.0.0/16 -j ACCEPT created on table filter and chain NETAVARK_FORWARD
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK-HOSTPORT-SETMARK exists on table nat
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK-HOSTPORT-SETMARK exists on table nat
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK-HOSTPORT-MASQ exists on table nat
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK-HOSTPORT-MASQ exists on table nat
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK-HOSTPORT-DNAT exists on table nat
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK-HOSTPORT-DNAT exists on table nat
[DEBUG netavark::firewall::varktables::helpers] rule -j MARK  --set-xmark 0x2000/0x2000 exists on table nat and chain NETAVARK-HOSTPORT-SETMARK
[DEBUG netavark::firewall::varktables::helpers] rule -j MASQUERADE -m comment --comment 'netavark portfw masq mark' -m mark --mark 0x2000/0x2000 exists on table nat and chain NETAVARK-HOSTPORT-MASQ
[DEBUG netavark::firewall::varktables::helpers] rule -j NETAVARK-HOSTPORT-DNAT -m addrtype --dst-type LOCAL exists on table nat and chain PREROUTING
[DEBUG netavark::firewall::varktables::helpers] rule -j NETAVARK-HOSTPORT-DNAT -m addrtype --dst-type LOCAL exists on table nat and chain OUTPUT
[DEBUG netavark::commands::setup] {
    "podman": StatusBlock {
        dns_search_domains: Some([]),
        dns_server_ips: Some([]),
        interfaces: Some({
            "eth0": NetInterface {
                mac_address: "fe:84:9e:e3:96:eb",
                subnets: Some([
                    NetAddress {
                        gateway: Some(172.18.0.1),
                        ipnet: 172.18.95.73/16,
                    },
                ]),
            },
        }),
    },
}
[DEBUG netavark::commands::setup] "Setup complete"
DEBU[0000] /etc/system-fips does not exist on host, not mounting FIPS mode subscription 
DEBU[0000] Setting Cgroup path for container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 to /libpod_parent/libpod-509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 
DEBU[0000] reading hooks from /usr/share/containers/oci/hooks.d 
DEBU[0000] Workdir "/" resolved to host path "/mnt/podman-storage/btrfs/subvolumes/5e0f0219a6a9ea6b9515a5517ece74918ee2aa48078a1acd9625f0aca195251e" 
DEBU[0000] Created OCI spec for container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 at /mnt/podman-storage/btrfs-containers/509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938/userdata/config.json 
DEBU[0000] /usr/libexec/podman/conmon messages will be logged to syslog 
DEBU[0000] running conmon: /usr/libexec/podman/conmon    args="[--api-version 1 -c 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 -u 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 -r /usr/bin/crun -b /mnt/podman-storage/btrfs-containers/509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938/userdata -p /var/run/podman/btrfs-containers/509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938/userdata/pidfile -n zen_hertz --exit-dir /var/run/libpod/exits --full-attach -l k8s-file:/mnt/podman-storage/btrfs-containers/509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938/userdata/ctr.log --log-level debug --syslog -t --conmon-pidfile /var/run/podman/btrfs-containers/509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /mnt/podman-storage --exit-command-arg --runroot --exit-command-arg /var/run/podman --exit-command-arg --log-level --exit-command-arg debug --exit-command-arg --cgroup-manager --exit-command-arg cgroupfs --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --network-config-dir --exit-command-arg  --exit-command-arg --network-backend --exit-command-arg netavark --exit-command-arg --volumepath --exit-command-arg /space/podman/volumes --exit-command-arg --db-backend --exit-command-arg sqlite --exit-command-arg --transient-store=false --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg btrfs --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg --syslog --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938]"
DEBU[0000] Received: 19209                              
INFO[0000] Got Conmon PID as 19207                      
DEBU[0000] Created container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 in OCI runtime 
DEBU[0000] Adding nameserver(s) from network status of '[]' 
DEBU[0000] Adding search domain(s) from network status of '[]' 
DEBU[0000] Attaching to container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 
DEBU[0000] Starting container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 with command [/bin/sh] 
DEBU[0000] Received a resize event: {Width:165 Height:46} 
DEBU[0000] Started container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 
DEBU[0000] Notify sent successfully

Tear-down:

/ # date ; exit
Wed Sep 13 09:45:46 UTC 2023
DEBU[0028] Exceeded conmon timeout waiting for container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 to exit 
DEBU[0028] Checking if container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 should restart 
DEBU[0028] Removing container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 
DEBU[0028] Cleaning up container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 
DEBU[0028] Tearing down network namespace at /var/run/netns/netns-df26baeb-2b07-198b-6297-4e5edf5a92c1 for container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 
[DEBUG netavark::commands::teardown] "Tearing down.."
[INFO  netavark::firewall] Using iptables firewall driver
[INFO  netavark::network::bridge] removing bridge podman0
[DEBUG netavark::commands::teardown] "Teardown complete"
DEBU[0029] Successfully cleaned up container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 
DEBU[0029] Unmounted container "509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938" 
DEBU[0029] Removing all exec sessions for container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 
DEBU[0029] Container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 storage is already unmounted, skipping... 
DEBU[0029] Called run.PersistentPostRunE(podman run -it --rm --log-level debug docker.io/alpine:latest) 
DEBU[0029] Shutting down engines                        
Wed 13 Sep 10:46:07 BST 2023

Unfortunately there's no per-debug-statement timing information, but Exceeded conmon timeout waiting for container looks suspicious.

@Luap99
Member

Luap99 commented Sep 13, 2023

Yes, Exceeded conmon timeout waiting for container is the line printed after 20s of waiting:

	case !conmonAlive:
		// Give the exit code at most 20 seconds to
		// show up in the DB. That should largely be
		// enough for the cleanup process.
		timerDuration := time.Second * 20
		conmonTimer = *time.NewTimer(timerDuration)
		conmonTimerSet = true
	case conmonAlive:
		// Continue waiting if conmon's still running.
		return false, -1, nil
	}
}
timedout := ""
if !containerRemoved {
	// If conmon is dead for more than $timerDuration or if the
	// container has exited properly, try to look up the exit code.
	select {
	case <-conmonTimer.C:
		logrus.Debugf("Exceeded conmon timeout waiting for container %s to exit", id)
		timedout = " [exceeded conmon timeout waiting for container to exit]"
	default:

@Luap99
Member

Luap99 commented Sep 13, 2023

Can you run podman ps -a in another terminal while it is waiting, so we can see what state the container is in?
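
For example, something like this from a second terminal while the run hangs (a sketch; the name filter is whatever name the run was given):

$ while sleep 1; do sudo podman ps -a --filter name=zen_hertz --format '{{.Names}} {{.Status}}'; done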

@srcshelton
Contributor Author

you could experiment with overlay on a different store:

podman --storage-driver overlay --root /somewhere/else ...

and see how that performs.

Ah - you might be on to something here!

Running from tmpfs:

$ sudo podman run -it --rm --storage-driver overlay --root /tmp/podman-alt-root docker.io/library/alpine:latest ; date
/ # date ; exit
Wed Sep 13 09:53:52 UTC 2023
Wed 13 Sep 10:53:52 BST 2023

… running from NVMe, but using the overlay driver:

$ sudo podman run -it --rm --storage-driver overlay --root /mnt/nvme/podman-alt docker.io/library/alpine:latest ; date
/ # date ; exit
Wed Sep 13 09:56:05 UTC 2023
Wed 13 Sep 10:56:05 BST 2023

So the issue does seem to be related to the btrfs storage-driver.

… apologies, my recollection is that it had always been slow even before migrating to btrfs (which was a change I only made a few months ago).

Also, thanks for the image-copy command - I wasn't aware that was an option!

@srcshelton
Contributor Author

srcshelton commented Sep 13, 2023

Can you run podman ps -a in another terminal while it is waiting, so we can see what state the container is in?

docker.io/alpine:latest  /bin/sh  14 seconds ago  Exited (0) 2 seconds ago

… so the state seems to have moved to Exited even though the instance is waiting on conmon (which only appears to happen with the btrfs driver).
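
(A related quick check, as a sketch: see whether the conmon process for that container is still alive during the wait, from a second terminal.)

$ pgrep -af conmon    # the full command line includes the container ID, so the relevant process is easy to spot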

@rhatdan
Member

rhatdan commented Sep 14, 2023

None of the core maintainers works on the BTRFS driver, so this is unlikely to be fixed unless community members step up.

@srcshelton
Contributor Author

Is it worth filing a bug against https://github.com/containers/conmon, or is that the same audience of maintainers/contributors as here - or indeed, does the above debugging information indicate that the issue is (triggered) within podman rather than conmon?

@giuseppe
Member

It's hard to see where the error is without being able to reproduce it.

I'd suggest you manually try running podman container cleanup while the cleanup hangs to see if you can catch any useful error, or use something like /usr/share/bcc/tools/execsnoop to detect what processes run when the container terminates.
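
Something along these lines (a sketch, assuming bcc-tools is installed at its usual path):

# terminal 1: trace every process executed on the host
$ sudo /usr/share/bcc/tools/execsnoop

# terminal 2: reproduce the slow exit and note the wall-clock gap
$ sudo podman run -it --rm docker.io/alpine:latest ; date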

@srcshelton
Contributor Author

I've reproduced this issue on a couple of systems which are using the overlay driver, one amd64 and one arm64.

amd64:

$ sudo podman run -it --rm docker.io/alpine:latest ; date
Password:
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob 7264a8db6415 done
Copying config 7e01a0d0a1 done
Writing manifest to image destination
/ # date ; exit
Thu Sep 14 14:18:01 UTC 2023
Thu 14 Sep 15:18:22 BST 2023

Configuration:

host:
  arch: amd64
  buildahVersion: 1.31.2
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: app-containers/conmon-2.1.8
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: 00e08f4a9ca5420de733bf542b930ad58e1a7e7d'
  cpuUtilization:
    idlePercent: 96.75
    systemPercent: 2.85
    userPercent: 0.4
  cpus: 4
  databaseBackend: boltdb
  distribution:
    distribution: gentoo
    version: "2.14"
  eventLogger: file
  freeLocks: 2032
  hostname: apu2e
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.1.4-gentoo
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 1112412160
  memTotal: 4115214336
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: app-containers/aardvark-dns-1.7.0
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.7.0
    package: app-containers/netavark-1.7.0
    path: /usr/libexec/podman/netavark
    version: netavark 1.7.0
  ociRuntime:
    name: crun
    package: app-containers/crun-1.9
    path: /usr/bin/crun
    version: |-
      crun version 1.9
      commit: a538ac4ea1ff319bcfe2bf81cb5c6f687e2dc9d3
      rundir: /var/run/crun
      spec: 1.0.0
      +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 2684350464
  swapTotal: 2684350464
  uptime: 5306h 12m 48.00s (Approximately 221.08 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  localhost:5000:
    Blocked: false
    Insecure: true
    Location: localhost:5000
    MirrorByDigestOnly: false
    Mirrors: []
    Prefix: localhost:5000
    PullFromMirror: ""
  search:
  - docker.io
  - docker.pkg.github.com
  - quay.io
  - public.ecr.aws
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 6
    paused: 0
    running: 4
    stopped: 2
  graphDriverName: overlay
  graphOptions:
    overlay.ignore_chown_errors: "false"
  graphRoot: /storage/containers/podman/storage
  graphRootAllocated: 31526481920
  graphRootUsed: 16571924480
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /storage/containers/podman/tmp
  imageStore:
    number: 315
  runRoot: /storage/containers/podman/run
  transientStore: false
  volumePath: /storage/containers/podman/volumes
version:
  APIVersion: 4.6.2
  Built: 1693466451
  BuiltTime: Thu Aug 31 08:20:51 2023
  GitCommit: 5db42e86862ef42c59304c38aa583732fd80f178
  GoVersion: go1.20.7
  Os: linux
  OsArch: linux/amd64
  Version: 4.6.2

Concurrent containers:

$ sudo podman ps -a
CONTAINER ID  IMAGE                                          COMMAND               CREATED      STATUS                  PORTS       NAMES
e11efc411a22  localhost/service.sys-apps.irqbalance:1.9.2    --foreground --pi...  13 days ago  Up 13 days                          openrc-irqbalance-1.9.2
a4b0d0ddda5b  localhost/service.app-admin.metalog:20220214   -p /var/run/metal...  13 days ago  Up 13 days                          openrc-metalog-20220214
929329c36baf  localhost/service.net-misc.openntpd:6.2_p3-r2  -d -f /etc/opennt...  13 days ago  Up 13 days                          openrc-openntpd-6.2_p3-r2
8fab39f502fa  localhost/service.net-proxy.privoxy:3.0.33     --no-daemon --pid...  13 days ago  Up 13 days                          openrc-privoxy-3.0.33

arm64:

$ sudo podman run -it --rm docker.io/alpine:latest ; date
Password:
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob 9fda8d8052c6 done
Copying config f6648c04cd done
Writing manifest to image destination
/ # date ; exit
Thu Sep 14 14:18:51 UTC 2023
Thu 14 Sep 15:19:11 BST 2023

Configuration:

host:
  arch: arm64
  buildahVersion: 1.31.2
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - pids
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: app-containers/conmon-2.1.7
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.1.7, commit: f633919178f6c8ee4fb41b848a056ec33f8d707d'
  cpuUtilization:
    idlePercent: 95.94
    systemPercent: 3.32
    userPercent: 0.74
  cpus: 4
  databaseBackend: sqlite
  distribution:
    distribution: gentoo
    version: "2.14"
  eventLogger: file
  freeLocks: 2044
  hostname: turnpike
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.1.21-v8+
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 2312245248
  memTotal: 8190758912
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: app-containers/aardvark-dns-1.7.0
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.7.0
    package: app-containers/netavark-1.7.0
    path: /usr/libexec/podman/netavark
    version: netavark 1.7.0
  ociRuntime:
    name: crun
    package: app-containers/crun-1.8.5
    path: /usr/bin/crun
    version: |-
      crun version 1.8.5
      commit: b6f80f766c9a89eb7b1440c0a70ab287434b17ed
      rundir: /var/run/crun
      spec: 1.0.0
      +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 8589406208
  swapTotal: 8589930496
  uptime: 92h 29m 14.00s (Approximately 3.83 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  localhost:5000:
    Blocked: false
    Insecure: true
    Location: localhost:5000
    MirrorByDigestOnly: false
    Mirrors: []
    Prefix: localhost:5000
    PullFromMirror: ""
  search:
  - docker.io
  - docker.pkg.github.com
  - quay.io
  - public.ecr.aws
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 4
    paused: 0
    running: 4
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev
  graphRoot: /storage/containers/podman/storage
  graphRootAllocated: 67049664512
  graphRootUsed: 17296240640
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /storage/containers/podman/tmp
  imageStore:
    number: 238
  runRoot: /storage/containers/podman/run.tmp
  transientStore: false
  volumePath: /storage/containers/podman/volumes
version:
  APIVersion: 4.6.1
  Built: 1692036008
  BuiltTime: Mon Aug 14 19:00:08 2023
  GitCommit: f3069b3ff48e30373c33b3f5976f15abf8cfee20
  GoVersion: go1.20.5
  Os: linux
  OsArch: linux/arm64
  Version: 4.6.1

Concurrent containers:

$ sudo podman ps -a
CONTAINER ID  IMAGE                                          COMMAND               CREATED     STATUS      PORTS       NAMES
af3fa1acec15  localhost/service.app-admin.syslog-ng:4.2.0    --cfgfile /etc/sy...  2 days ago  Up 2 days               openrc-syslog-ng-4.2.0
93c3ee4b92c5  localhost/service.net-dns.bind:9.16.41         -u named -f           2 days ago  Up 2 days               openrc-bind-9.16.41
004d89162a52  localhost/service.net-misc.openntpd:6.2_p3-r2  -d -f /etc/opennt...  2 days ago  Up 2 days               openrc-openntpd-6.2_p3-r2
8c8cc3dce354  localhost/service.net-misc.dhcp:4.4.3_p1-r4    -cf /etc/dhcp/dhc...  2 days ago  Up 2 days               openrc-dhcp-4.4.3_p1-r4

… which is interesting: on the original system the problem went away when temporarily changing to the overlay driver, which suggests that the problem is not with the btrfs (or overlay) driver, but that using a pristine storage directory nevertheless makes it go away.

Also, both boltdb and sqlite database backends are affected.

Given this, could there be any link between the total image size within podman storage and this delay?

Original system:

$ sudo podman info --debug | grep -F podman | awk '{ print $2 }' | xargs -rn 1 sudo du -hxs
1.5G    /mnt/podman-storage
113M    /space/podman/volumes

New amd64 system:

$ sudo podman info --debug | grep -F '/storage/' | awk '{ print $2 }' | xargs -rn 1 sudo du -hxs
16G     /storage/containers/podman/storage
8.0K    /storage/containers/podman/tmp
120K    /storage/containers/podman/run
42M     /storage/containers/podman/volumes

arm64 system:

16G     /storage/containers/podman/storage
12K     /storage/containers/podman/tmp
72K     /storage/containers/podman/run.tmp
4.0K    /storage/containers/podman/volumes

@vrothberg
Member

I am absolutely incapable of reproducing this. Can you run /usr/share/bcc/tools/execsnoop in parallel and share which processes are being run? Does it also happen without --rm? If so, can you share podman ps --all afterwards?

@srcshelton
Contributor Author

  • Run without --rm, the issue is still there:
$ sudo podman run --net none docker.io/library/alpine:latest 'date'; date
Mon Sep 18 14:14:54 UTC 2023
Mon 18 Sep 15:15:15 BST 2023

Output of podman ps -a:

$ sudo podman ps -a | grep 'alpine.*date'
61eef5c90f21  docker.io/library/alpine:latest  date  38 seconds ago  Exited (0) 39 seconds ago  upbeat_swanson

There's no significant delay removing this Exited container:

$ time sudo podman rm upbeat_swanson
upbeat_swanson

real    0m0.216s
user    0m0.002s
sys     0m0.003s
  • Interestingly, if I include an invalid command, podman exits immediately (i.e. the problem is not seen):
$ sudo podman run --net none docker.io/library/alpine:latest 'exit'
Error: crun: executable file `exit` not found in $PATH: No such file or directory: OCI runtime attempted to invoke a command that was not found

… although in this case, the container persists with state Created. If I then remove this Created container, the result is also almost instant:

$ time sudo podman rm 1364702f0f2a
1364702f0f2a

real    0m0.063s
user    0m0.003s
sys     0m0.002s

I'm not familiar with 'execsnoop', but I'll take a look now...

@vrothberg
Member

Thanks for sharing!

I'm not familiar with 'execsnoop', but I'll take a look now...

It's part of bcc-tools and a veeeeery helpful script.

@srcshelton
Contributor Author

I feel I'm making progress - the bcc utilities require LLVM/clang which is quite chunky, so I thought I'd try running it within a container.

With a default/unconfigured installation, as expected, podman within a container exits instantly.

However, if I do nothing more than import my configuration, change the driver to overlay, and create the referenced directories, then the problem again returns.

… so the problem only appears with some non-default setting in /etc/containers/ which I seem to have set.

I'll keep digging...

@srcshelton
Contributor Author

… it's something in containers.conf, which I have as follows:

[containers]
init_path = "/usr/libexec/podman/catatonit"
[network]
network_backend = "netavark"
default_subnet="172.18.0.0/16"
[engine]
cgroup_manager = "cgroupfs"
conmon_path = [
  "/usr/libexec/podman/conmon",
  "/usr/bin/conmon"
]
database_backend="sqlite"
env = ["TMPDIR=/space/podman/tmp"]
events_logger = "file"
events_logfile_path = "/var/log/podman/events.log"
events_logfile_max_size = 10485760
infra_command = "/usr/bin/pause"
infra_image = "gcr.io/google_containers/pause-amd64:3.2"
remote_uri= "unix://var/run/podman/podman.sock"
static_dir = "/space/podman/static"
tmp_dir = "/var/run/libpod"
volume_path = "/space/podman/volumes"
runtime = "crun"
runtime_supports_json = ["crun", "runc"]
runtime_supports_nocgroups = ["crun"]
runtime_supports_kvm = []
image_copy_tmp_dir = "/space/podman/tmp"
[engine.runtimes]
runc = [
  "/usr/bin/runc"
]
crun = [
  "/usr/bin/crun"
]
kata = []

@srcshelton
Contributor Author

Urgh - I've found the specific issue:

  • If static_dir is set in /etc/containers/containers.conf, then there's a 20-second delay on container exit which goes away once the setting is removed.

… which I suspect means that some component has hard-coded the default (/var/lib/containers/storage/libpod?) and is breaking if the expected content isn't there?

Rough reproducer:

In a fresh container, install podman and then run:

[ -s /etc/containers/policy.json.example ] && [ ! -s /etc/containers/policy.json ] && cp /etc/containers/policy.json.example /etc/containers/policy.json
podman stop -a ;
podman rm -v -a ;
podman image rm -f -a ;
podman system reset -f ;
[ -s /etc/containers/containers.conf ] && rm -r $( sed 's/#.*$//' /etc/containers/containers.conf | grep -o '"/[^"]\+"' | grep -v -e 'bin/' -e 'libexec/' -e '\.log"' ) ;
podman system reset -f ;
[ -s /etc/containers/containers.conf ] && mkdir -p $( sed 's/#.*$//' /etc/containers/containers.conf | grep -o '"/[^"]\+"' | grep -v -e 'bin/' -e 'libexec/' -e '\.log"' ) ;
podman image pull docker.io/library/alpine:latest ;
podman run --net none docker.io/library/alpine:latest 'date' ; date

… then create /etc/containers/containers.conf containing:

[engine]
static_dir = "/tmp/podman/static"

… and re-run the above. The path used for static_dir doesn't matter: if the setting is absent, containers exit instantly; if it's present, there's a 20-second timeout.

(This is with crun; I'm not sure whether that is related. I've not tried with runc installed.)
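
A minimal way to compare the two cases with the reproducer above (a sketch; CONTAINERS_CONF is the standard containers.conf override variable, and the temporary path is illustrative):

# with static_dir set in /etc/containers/containers.conf
$ time sudo podman run --net none docker.io/library/alpine:latest 'date'

# again, pointing at a copy of the file with the static_dir line removed
$ sed '/^static_dir/d' /etc/containers/containers.conf > /tmp/containers-no-static.conf
$ time sudo env CONTAINERS_CONF=/tmp/containers-no-static.conf podman run --net none docker.io/library/alpine:latest 'date'
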

@srcshelton
Contributor Author

Furthermore, the specified static_dir contains only a bolt_state.db which hasn't changed since May.

It appears that the actual bolt_state.db is now in (graph root)/libpod regardless of the static_dir setting.

Is static_dir now vestigial?

Interestingly, a system reset may be needed to fix this, as commenting out static_dir on the host system (with running containers) and trying to run date in Alpine still gives a 20s delay.


I also have:

database_backend="sqlite"

… in /etc/containers/containers.conf, and yet I see /var/lib/containers/cache (is this a hard-coded path which can't be configured?) contains:

# ls -l cache/
total 74096
-rw------- 1 root root 92598272 Sep 18 10:41 blob-info-cache-v1.boltdb
-rw-r--r-- 1 root root    32768 Sep 16 15:27 blob-info-cache-v1.sqlite

… why do I have a boltdb file which is more up to date (and much larger) than the sqlite one - I thought sqlite was a complete replacement?

@srcshelton
Contributor Author

srcshelton commented Sep 18, 2023

Update: Removing static_dir from containers.conf isn't enough to resolve the issue on a non-fresh system: even after a reboot, the 20s delay persists.

(Although running with --transient-store also prevents the delay from occurring)
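
(For reference, a sketch of that check, using the global flag against the same reproducer:)

$ time sudo podman --transient-store run --net none docker.io/library/alpine:latest 'date'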

@vrothberg
Member

  • If static_dir is set in /etc/containers/containers.conf, then there's a 20-second delay on container exit which goes away once the setting is removed.

I can reproduce as well with this setting. Great work tracking this down, @srcshelton! Why this happens needs some investigation.

… why do I have a boltdb file which is more up to date (and much larger) than the sqlite one - I thought sqlite was a complete replacement?

I assume you changed the db_backend, so the previous one will continue lingering around.

@vrothberg changed the title from "podman slow to tear-down containers" to "podman with custom static_dir slow to tear-down containers" on Sep 19, 2023
@vrothberg
Member

I made the observation that the cleanup process does not see (or read) the containers.conf file correctly. I configured mine in the $HOME directory and noticed that podman container cleanup doesn't pick up the configured value. I'll continue digging.
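
(One way to see what the cleanup process actually resolves, as a sketch: run it by hand against an exited container with debug logging and look for the "Using static dir" line; the container ID is illustrative.)

$ sudo podman --log-level debug container cleanup CONTAINER-ID 2>&1 | grep -i 'static dir'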

@srcshelton
Contributor Author

I can raise a new bug report for this, but I suspect it might be the same root-cause:

Even with database_backend="sqlite" in /etc/containers/containers.conf, I'm still seeing /var/lib/containers/cache/blob-info-cache-v1.boltdb created and used, and /var/lib/containers/cache/blob-info-cache-v1.sqlite unchanged in the past 3 days, despite several reboots and container restarts.

@vrothberg
Member

Yes, there is something really strange going on. container cleanup doesn't seem to pick up the correct values from containers.conf. I'll be buried in meetings today but will take a look.

Cc: @mheon

@vrothberg
Member

Even with database_backend="sqlite" in /etc/containers/containers.conf, I'm still seeing /var/lib/containers/cache/blob-info-cache-v1.boltdb

This is expected. The blob-info-cache is something the image library was using.

vrothberg added a commit to vrothberg/libpod that referenced this issue Sep 25, 2023
The processing and setting of the static and volume directories was
scattered across the code base (including c/common) leading to subtle
errors that surfaced in containers#19938.

There were multiple issues that I try to summarize below:

 - c/common loaded the graphroot from c/storage to set the defaults for
   static and volume dir.  That ignored Podman's --root flag and
   surfaced in containers#19938 and other bugs.  c/common does not set the
   defaults anymore which gives Podman the ability to detect when the
   user/admin configured a custom directory (not empty value).

 - When parsing the CLI, Podman (ab)uses containers.conf structures to
   set the defaults but also to override them in case the user specified
   a flag.  The --root flag overrode the static dir which is wrong and
   broke a couple of use cases.  Now there is a dedicated field in the
   "PodmanConfig" which also includes a containers.conf struct.

 - The defaults for static and volume dir are now being set correctly
   and adhere to --root.

 - The CONTAINERS_CONF_OVERRIDE env variable has not been passed to the
   cleanup process.  I believe that _all_ env variables should be passed
   to conmon to avoid such subtle bugs.

Overall I find that the code and logic is scattered and hard to
understand and follow.  I refrained from larger refactorings as I really
just want to get containers#19938 fixed and then go back to other priorities.

containers/common#1659 broke three pkg/machine
tests.  Those have been commented out until getting fixed.

Fixes: containers#19938
Signed-off-by: Valentin Rothberg <[email protected]>
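
(Once a build containing this fix is installed, re-running the earlier reproducer with the custom static_dir left in place should show the two timestamps effectively identical; a sketch:)

$ sudo podman run --net none docker.io/library/alpine:latest 'date' ; date
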
@github-actions bot added the locked - please file new issue/PR label on Dec 25, 2023
@github-actions bot locked as resolved and limited conversation to collaborators on Dec 25, 2023