Slow performances depending on the volume used by the podman machine #19467

slemeur · 2023-08-01T08:35:10Z

Issue Description

When using Podman Desktop, I'm seeing strange behavior: the podman service running in the podman machine becomes unresponsive.

Please see the corresponding Podman Desktop issue:
podman-desktop/podman-desktop#3376

Checking inside the podman machine, I can see that the podman service is consuming a lot of CPU. Podman Desktop is probably still sending requests while podman is already overloaded. When I stop Podman Desktop, the podman service goes back to normal CPU usage.

This may be due to the time podman takes to compute something that Podman Desktop requests constantly: the requests overload podman and, as there is no cache, it becomes unresponsive.

Steps to reproduce the issue

I'm not entirely sure how to reproduce the issue. I think it appears after I have used podman save and podman load, which create a large file.

On the other hand:

When I mount my home folder ($HOME:$HOME), I hit the issue
When I create an empty folder and mount it into the podman machine, I don't get the issue
When I move the tar file from podman save into an empty folder and mount that folder into the podman machine, I don't get the issue

Describe the results you received

Slow performance

Describe the results you expected

Podman stays responsive

podman info output

host:
  arch: amd64
  buildahVersion: 1.30.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.7-2.fc38.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 99.57
    systemPercent: 0.25
    userPercent: 0.18
  cpus: 4
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: coreos
    version: "38"
  eventLogger: journald
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
    uidmap:
    - container_id: 0
      host_id: 501
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
  kernel: 6.3.12-200.fc38.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 7743348736
  memTotal: 8113209344
  networkBackend: netavark
  networkBackendInfo:
    backend: ""
    dns: {}
  ociRuntime:
    name: crun
    package: crun-1.8.5-1.fc38.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.5
      commit: b6f80f766c9a89eb7b1440c0a70ab287434b17ed
      rundir: /run/user/501/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    exists: true
    path: /run/user/501/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-12.fc38.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 1h 50m 3.00s (Approximately 0.04 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 7
    paused: 0
    running: 0
    stopped: 7
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphRootAllocated: 106769133568
  graphRootUsed: 8531062784
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 9
  runRoot: /run/user/501/containers
  transientStore: false
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 4.5.1
  Built: 1685123928
  BuiltTime: Fri May 26 19:58:48 2023
  GitCommit: ""
  GoVersion: go1.20.4
  Os: linux
  OsArch: linux/amd64
  Version: 4.5.1

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details

macOS Ventura, Intel

Additional information

Additional information like issue happens only occasionally or issue happens with a particular architecture or on a particular setting


Luap99 · 2023-08-01T09:10:46Z

@vrothberg @ashley-cui Can one of you take a look and try to reproduce?
If you run podman machine ssh, you can observe high CPU usage for the system service. It may only be triggered when running podman-desktop, as this keeps the service process up due to the constant event listening.

vrothberg · 2023-08-01T09:12:09Z

Sorry, I am currently flooded.

rhatdan · 2023-08-01T14:51:58Z

Perhaps this is podman machine attempting to figure out the disk space used in a volume, which would use lots of CPU and hammer the disk as it walks the directory tree?

If podman desktop is asking for the size of volumes, or of containers' modifications on top of images, this could cause Podman to get very active.

baude · 2023-08-02T14:53:31Z

I looked at this; hold tight. My observation is that, in the case where it is slow, the service is being flooded with requests, including a lot of df (expensive!) requests. PD folks are looking at that and I will keep an eye on it.

vrothberg · 2023-08-08T12:46:39Z

Out of curiosity, is there an update on this issue?

benoitf · 2023-08-09T16:34:40Z

I can reproduce this using only podman on macOS.

And there is a huge difference between rootless and rootful.

pre-reqs:

podman machine init
podman machine start
podman machine ssh

then if I run 5 containers like

podman run -d docker.io/redis/redis-stack:latest
podman run -d docker.io/redis/redis-stack:latest
podman run -d docker.io/redis/redis-stack:latest
podman run -d docker.io/redis/redis-stack:latest
podman run -d docker.io/redis/redis-stack:latest

and do the df command

[core@localhost ~]$ time podman system df
TYPE           TOTAL       ACTIVE      SIZE        RECLAIMABLE
Images         1           1           527.2MB     0B (0%)
Containers     5           5           9.578MB     0B (0%)
Local Volumes  0           0           0B          0B (0%)

real	0m0.083s
user	0m0.061s
sys	0m0.000s

so, very fast

now if I switch to rootful

sudo podman run -d docker.io/redis/redis-stack:latest
sudo podman run -d docker.io/redis/redis-stack:latest
sudo podman run -d docker.io/redis/redis-stack:latest
sudo podman run -d docker.io/redis/redis-stack:latest
sudo podman run -d docker.io/redis/redis-stack:latest

[core@localhost ~]$ time sudo podman system df
TYPE           TOTAL       ACTIVE      SIZE        RECLAIMABLE
Images         1           1           527.2MB     0B (0%)
Containers     5           5           9.578MB     0B (0%)
Local Volumes  0           0           0B          0B (0%)

real	0m6.397s
user	0m2.324s
sys	0m3.663s

then it's like 7s

vrothberg · 2023-08-10T11:27:39Z

Thanks @benoitf! I can even reproduce on Fedora with Podman v4.6.

Very similar numbers: 0.071s rootless vs 7.562s root

vrothberg · 2023-08-10T11:37:01Z

With an instrumented binary:

podman (fix-19467) $ sudo ./bin/podman system df
[sudo] password for vrothberg:                  
ERRO[0000] image 1.666118ms                     
ERRO[0007] containers 7.470431037s              
ERRO[0007] volumes 114.444µs

Processing containers seems to be the bottleneck:
https://github.com/containers/podman/blob/main/pkg/domain/infra/abi/system.go#L284-L324

vrothberg · 2023-08-10T11:52:04Z

@giuseppe @nalind can you check what's going on?

veillard · 2023-08-10T12:30:09Z

I ran strace on the podman machine df on Stevan's machine. The time is spent scanning 2GB under /var/lib/containers/storage/: each file gets a newfstatat(), which sounds normal, then 2 lgetxattr security.capability and 2 llistxattr security.selinux calls, which is a bit weird to me. ltrace shows malloc()/free() and opendir()/dirfd()/closedir(), which all look fine, but with bursts of SIGURG signals across the 3 processes involved, so there could be something weird in the way the processes communicate with each other:
....
6550 --- SIGURG (Urgent I/O condition) ---
6548 --- SIGURG (Urgent I/O condition) ---
6548 --- SIGURG (Urgent I/O condition) ---
6552 --- SIGURG (Urgent I/O condition) ---
6552 --- SIGURG (Urgent I/O condition) ---
6552 --- SIGURG (Urgent I/O condition) ---
6552 --- SIGURG (Urgent I/O condition) ---
6548 --- SIGURG (Urgent I/O condition) ---
6552 --- SIGURG (Urgent I/O condition) ---
....
Also, podman machine df clearly takes a lock, which blocks parallel operations. On one hand that is sensible, because those du(1)-like operations should all return the same result and running several at once will lead to thrashing. At the same time, this is read-only work, so serializing it is useless; it would obviously benefit from caching.

Daniel

veillard · 2023-08-10T12:31:24Z

Also, while the podman machine df is running, top on the host only shows qemu using more CPU than normal.

giuseppe · 2023-08-14T09:40:57Z

rootless doesn't use naive diff since metacopy up is not supported for overlay in a user namespace.

We could probably drop the naive diff in any case and just look at the directory size:

diff --git a/drivers/overlay/overlay.go b/drivers/overlay/overlay.go
index 6b6f20637..7e91dd161 100644
--- a/drivers/overlay/overlay.go
+++ b/drivers/overlay/overlay.go
@@ -2162,10 +2162,6 @@ func (d *Driver) getLowerDiffPaths(id string) ([]string, error) {
 // and its parent and returns the size in bytes of the changes
 // relative to its base filesystem directory.
 func (d *Driver) DiffSize(id string, idMappings *idtools.IDMappings, parent string, parentMappings *idtools.IDMappings, mountLabel string) (size int64, err error) {
-       if d.options.mountProgram == "" && (d.useNaiveDiff() || !d.isParent(id, parent)) {
-               return d.naiveDiff.DiffSize(id, idMappings, parent, parentMappings, mountLabel)
-       }
-
        p, err := d.getDiffPath(id)
        if err != nil {
                return 0, err

vrothberg · 2023-08-14T11:07:24Z

> We could probably drop the naive diff in any case and just look at the directory size:

Yes, that does the trick. At the moment, that drops it from 19.3s to 0.11s on my machine. A mind-blowing difference.

I think it's OK to just look at the directory size; at least for this use case.

Computing the diff size for the rootful overlay storage driver used the naive diff. The reasoning was that we have made use of rootful copy-up. The downside is a mind-blowing performance penalty in comparison to the rootless case. Hence, drop the naive diff and only compute the size of the directory, which is absolutely sufficient for the motivating use case of podman-system-df.

This drops the execution of system-df from 19.3s to 0.11s listing 5 containers and 1 image.

Fixes: github.com/containers/podman/issues/19467
Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Valentin Rothberg <[email protected]>

vrothberg · 2023-08-14T11:14:04Z

I opened containers/storage#1688.



The performance issue in containers#19467 drove me to add a benchmark for system-df to avoid regressing on it in the future. Comparing current HEAD to v4.6.0 yields

```
/home/vrothberg/containers/podman/bin/podman system df ran 201.47 times faster than /usr/bin/podman system df
```

Signed-off-by: Valentin Rothberg <[email protected]>

slemeur added the kind/bug Categorizes issue or PR as related to a bug. label Aug 1, 2023

github-actions bot added the remote Problem is in podman-remote label Aug 1, 2023

slemeur mentioned this issue Aug 1, 2023

Podman Desktop becomes out of sync when mounting large volume podman-desktop/podman-desktop#3376

Closed

Luap99 added machine podman-desktop labels Aug 1, 2023

vrothberg mentioned this issue Aug 14, 2023

overlay: Diffsize: avoid naive diff containers/storage#1688

Merged

vrothberg added a commit to vrothberg/libpod that referenced this issue Aug 23, 2023

vendor c/storage@6902c2d

e092f88

Mainly to merge fixes for containers#19467 into the main branch. Fixes: containers#19467 Signed-off-by: Valentin Rothberg <[email protected]>

vrothberg mentioned this issue Aug 23, 2023

vendor c/storage@6902c2d #19709

Merged

openshift-merge-robot closed this as completed in #19709 Aug 23, 2023

vrothberg mentioned this issue Aug 23, 2023

[CI:DOCS] hack/perf/system-df.sh: add df benchmarks #19710

Merged

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Nov 22, 2023

github-actions bot locked as resolved and limited conversation to collaborators Nov 22, 2023
