Possible regression with podman system service causing high cpu usage #12308

Closed
rod260185 opened this issue Nov 16, 2021 · 19 comments
Labels
locked - please file new issue/PR · stale-issue

Comments

@rod260185

rod260185 commented Nov 16, 2021

I have the same problem now (#7946), with this configuration:
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.30-1.1.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.30, commit: unknown'
  cpus: 4
  distribution:
    distribution: '"opensuse-microos"'
    version: "20211107"
  eventLogger: journald
  hostname: ev00000895
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.14.14-1-default
  linkmode: dynamic
  logDriver: journald
  memFree: 11157291008
  memTotal: 20200374272
  ociRuntime:
    name: runc
    package: runc-1.0.2-1.2.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.2
      spec: 1.0.2-dev
      go: go1.13.15
      libseccomp: 2.5.2
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: true
    capabilities: CAP_AUDIT_WRITE,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_MKNOD,CAP_NET_BIND_SERVICE,CAP_NET_RAW,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.11-1.2.x86_64
    version: |-
      slirp4netns version 1.1.11
      commit: unknown
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 0
  swapTotal: 0
  uptime: 4m 7.01s
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.opensuse.org
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 27
    paused: 0
    running: 24
    stopped: 3
  graphDriverName: btrfs
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Build Version: 'Btrfs v5.14.1 '
    Library Version: "102"
  imageStore:
    number: 235
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.4.1
  Built: 1634688000
  BuiltTime: Tue Oct 19 21:00:00 2021
  GitCommit: ""
  GoVersion: go1.13.15
  OsArch: linux/amd64
  Version: 3.4.1

It seems to be some kind of regression. In my case the problem shows up when using the Docker extension for VSCode, which inspects the API through podman.sock.

Originally posted by @rod260185 in #7946 (comment)

@rod260185
Author

Additionally, when a service run through docker-compose uses rabbitmq, it triggers CPU usage of 120-160% across the 4 CPUs.

rod260185 changed the title from "Possible regression with podman system service" to "Possible regression with podman system service causing high cpu usage" on Nov 16, 2021
@Luap99
Member

Luap99 commented Nov 16, 2021

Can you share the API calls to get into this situation? Also the server debug log would be helpful.

@rod260185
Author

rod260185 commented Nov 17, 2021

I can't tell which calls are causing it, but it happens when I use the default VSCode Docker extension, which I think makes regular API calls equivalent to "podman image ls", "podman ps -a", etc. It also happens with taiga-docker run through docker-compose: the two rabbitmq services first go down at startup, and after I restart just those they work, but that triggers this endless high-CPU loop in the podman API. I will need some help with how to get this server debug log.

I noticed that the rootless API has the same problem; I now have two API services with high CPU usage, consuming about 2/3 of the four 3 GHz cores.

@rhatdan
Member

rhatdan commented Nov 17, 2021

Could you turn on logging on the server side with --log-level=debug and see if you can figure out what the server is doing?
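For reference, on a rootful setup like the one reported above this roughly amounts to running the API service manually with debug logging (stop the socket-activated units first if they already own the socket; unit names are the distro defaults):

systemctl stop podman.service podman.socket
# run the API service in the foreground; --time=0 disables the inactivity timeout
# and the debug messages are printed straight to the terminal
podman --log-level=debug system service --time=0 unix:///run/podman/podman.sock

Then reproduce the VSCode / docker-compose activity and capture the output.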

@rod260185
Author

I did that and exported a system log after starting the rabbitmq services on taiga and using the VS Code "Docker" extension; it seems related to conmon.
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="exporting opaque data as blob "sha256:4c9846a172155e344066941749b9952a17419d2835c244913c4cac0096e7455d""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="parsed reference into "[btrfs@/var/lib/containers/storage+/run/containers/storage]@0bf09b798ed8482f6723cdab38f9c6e4eab671d569e7ba5d41fc7b0bf2644904""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="exporting opaque data as blob "sha256:0bf09b798ed8482f6723cdab38f9c6e4eab671d569e7ba5d41fc7b0bf2644904""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="parsed reference into "[btrfs@/var/lib/containers/storage+/run/containers/storage]@e23479e83ecf4009958f6750299953892f02602fc101bc0f8708600edda62df5""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="exporting opaque data as blob "sha256:e23479e83ecf4009958f6750299953892f02602fc101bc0f8708600edda62df5""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="parsed reference into "[btrfs@/var/lib/containers/storage+/run/containers/storage]@efbff82a59aca4964778acd6bf628a9d41a5f30eda00b06804e23461ccce8295""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="exporting opaque data as blob "sha256:efbff82a59aca4964778acd6bf628a9d41a5f30eda00b06804e23461ccce8295""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="exporting opaque data as blob "sha256:7f0d80c744114c33a84d47a16f989038b31ac2992cace6b50f1e03590776314f""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="parsed reference into "[btrfs@/var/lib/containers/storage+/run/containers/storage]@5804835e0a7a4c2e1de543bcadb74ad57d674579cf40b2e88a9eb18f729ca1f2""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="exporting opaque data as blob "sha256:5804835e0a7a4c2e1de543bcadb74ad57d674579cf40b2e88a9eb18f729ca1f2""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="parsed reference into "[btrfs@/var/lib/containers/storage+/run/containers/storage]@fa6c6c1eadce56ded0c766c0d465498e310baad23615c7b5275e605562c1d6b1""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="exporting opaque data as blob "sha256:fa6c6c1eadce56ded0c766c0d465498e310baad23615c7b5275e605562c1d6b1""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="parsed reference into "[btrfs@/var/lib/containers/storage+/run/containers/storage]@28231b07225c455cced78e37b52ed0c96e54724721e5911fac7506a5bfa8ba0c""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="exporting opaque data as blob "sha256:28231b07225c455cced78e37b52ed0c96e54724721e5911fac7506a5bfa8ba0c""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="parsed reference into "[btrfs@/var/lib/containers/storage+/run/containers/storage]@3b39ed2c715dd3d85a18e1ce436f8cc7e0b185e26f23306a7702d902ac07dcce""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="exporting opaque data as blob "sha256:3b39ed2c715dd3d85a18e1ce436f8cc7e0b185e26f23306a7702d902ac07dcce""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="parsed reference into "[btrfs@/var/lib/containers/storage+/run/containers/storage]@0617a2192ed59ce93c17726ef12bd52d5575d36f1e72582bdc3b41ef65ca3754""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="exporting opaque data as blob "sha256:0617a2192ed59ce93c17726ef12bd52d5575d36f1e72582bdc3b41ef65ca3754""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="parsed reference into "[btrfs@/var/lib/containers/storage+/run/containers/storage]@4c9846a172155e344066941749b9952a17419d2835c244913c4cac0096e7455d""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="exporting opaque data as blob "sha256:4c9846a172155e344066941749b9952a17419d2835c244913c4cac0096e7455d""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="parsed reference into "[btrfs@/var/lib/containers/storage+/run/containers/storage]@0bf09b798ed8482f6723cdab38f9c6e4eab671d569e7ba5d41fc7b0bf2644904""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="exporting opaque data as blob "sha256:0bf09b798ed8482f6723cdab38f9c6e4eab671d569e7ba5d41fc7b0bf2644904""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="parsed reference into "[btrfs@/var/lib/containers/storage+/run/containers/storage]@e23479e83ecf4009958f6750299953892f02602fc101bc0f8708600edda62df5""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="exporting opaque data as blob "sha256:e23479e83ecf4009958f6750299953892f02602fc101bc0f8708600edda62df5""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="parsed reference into "[btrfs@/var/lib/containers/storage+/run/containers/storage]@efbff82a59aca4964778acd6bf628a9d41a5f30eda00b06804e23461ccce8295""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="exporting opaque data as blob "sha256:efbff82a59aca4964778acd6bf628a9d41a5f30eda00b06804e23461ccce8295""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="parsed reference into "[btrfs@/var/lib/containers/storage+/run/containers/storage]@28231b07225c455cced78e37b52ed0c96e54724721e5911fac7506a5bfa8ba0c""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="exporting opaque data as blob "sha256:28231b07225c455cced78e37b52ed0c96e54724721e5911fac7506a5bfa8ba0c""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="parsed reference into "[btrfs@/var/lib/containers/storage+/run/containers/storage]@5804835e0a7a4c2e1de543bcadb74ad57d674579cf40b2e88a9eb18f729ca1f2""
Nov 17 08:18:51 ev00000897 podman[972]: time="2021-11-17T08:18:51-03:00" level=debug msg="exporting opaque data as blob "sha256:5804835e0a7a4c2e1de543bcadb74ad57d674579cf40b2e88a9eb18f729ca1f2""

@rod260185
Author

Nov 17 08:18:51 ev00000897 podman[972]: @ - - [17/Nov/2021:08:18:51 -0300] "GET /containers/json?all=true HTTP/1.1" 200 45653 "" ""
This seems related to VSCode.
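(For anyone trying to narrow this down, the same compat-API call can be issued directly against the socket, outside VSCode; the host name after http:// is a dummy that curl ignores when --unix-socket is used:)

# time a single "list all containers" call against the rootful socket
time curl -s --unix-socket /run/podman/podman.sock "http://d/containers/json?all=true" > /dev/null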

@rhatdan
Member

rhatdan commented Nov 17, 2021

@giuseppe PTAL

@vnznznz

vnznznz commented Dec 17, 2021

I think I have the same problem: rootless podman on Fedora 35. podman images takes ~10 s for 134 images and causes a short spike in CPU usage.

But when I open the Docker tab in VSCode and it starts querying the podman daemon, I get a lot of these messages in the debug log. Even after VSCode runs into a timeout error, the daemon keeps logging these for a while before it stops, causing a long spike in CPU usage.

Dec 17 13:26:46 fedora.fritz.box podman[45238]: time="2021-12-17T13:26:46+01:00" level=debug msg="parsed reference into \"[overlay@/home/vinz/.local/share/containers/storage+/run/user/1000/containers]@2fc3789bf0497da8c20ebb45b3f95ae61656f3d89936dad81f4d9e694baf07b8\""
Dec 17 13:26:46 fedora.fritz.box podman[45238]: time="2021-12-17T13:26:46+01:00" level=debug msg="exporting opaque data as blob \"sha256:2fc3789bf0497da8c20ebb45b3f95ae61656f3d89936dad81f4d9e694baf07b8\""

When I have multiple instances of VSCode open, all querying the podman daemon, my system becomes very sluggish.

Is this a problem with VSCode hitting the API too hard, or should podman be faster at serving image information? Are 134 images too many? Should I prune them more often?

host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.30-2.fc35.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.30, commit: '
  cpus: 8
  distribution:
    distribution: fedora
    variant: workstation
    version: "35"
  eventLogger: journald
  hostname: fedora.fritz.box
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.15.4-201.fc35.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 13687418880
  memTotal: 33360867328
  ociRuntime:
    name: crun
    package: crun-1.3-1.fc35.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.3
      commit: 8e5757a4e68590326dafe8a8b1b4a584b10a1370
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc35.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 8589930496
  swapTotal: 8589930496
  uptime: 4h 5m 26.37s (Approximately 0.17 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/vinz/.config/containers/storage.conf
  containerStore:
    number: 11
    paused: 0
    running: 6
    stopped: 5
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/vinz/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 927
  runRoot: /run/user/1000/containers
  volumePath: /home/vinz/.local/share/containers/storage/volumes
version:
  APIVersion: 3.4.2
  Built: 1636748737
  BuiltTime: Fri Nov 12 21:25:37 2021
  GitCommit: ""
  GoVersion: go1.16.8
  OsArch: linux/amd64
  Version: 3.4.2

@vnznznz

vnznznz commented Dec 17, 2021

Here are all the GET requests from a freshly started VSCode Docker extension (I pressed refresh once):

14:17:43.784 >>> process code [54228 -> 3275] len 78(78)
GET /containers/json?all=true HTTP/1.1
--
14:17:43.784 >>> process code [54228 -> 3275] len 114(114)
GET /images/json?filters=%7B%22dangling%22%3A%5B%22false%22%5D%7D HTTP/1.1
--
14:17:45.784 >>> process code [54228 -> 3275] len 78(78)
GET /containers/json?all=true HTTP/1.1
--
14:17:45.784 >>> process code [54228 -> 3275] len 114(114)
GET /images/json?filters=%7B%22dangling%22%3A%5B%22false%22%5D%7D HTTP/1.1
--
14:17:47.784 >>> process code [54228 -> 3275] len 78(78)
GET /containers/json?all=true HTTP/1.1
--
14:17:47.785 >>> process code [54228 -> 3275] len 114(114)
GET /images/json?filters=%7B%22dangling%22%3A%5B%22false%22%5D%7D HTTP/1.1
--
14:17:49.786 >>> process code [54228 -> 3275] len 78(78)
GET /containers/json?all=true HTTP/1.1
--
14:17:49.786 >>> process code [54228 -> 3275] len 114(114)
GET /images/json?filters=%7B%22dangling%22%3A%5B%22false%22%5D%7D HTTP/1.1
--
14:17:51.787 >>> process code [54228 -> 3275] len 78(78)
GET /containers/json?all=true HTTP/1.1
--
14:17:51.787 >>> process code [54228 -> 3275] len 114(114)
GET /images/json?filters=%7B%22dangling%22%3A%5B%22false%22%5D%7D HTTP/1.1
--
14:17:53.786 >>> process code [54228 -> 3275] len 78(78)
GET /containers/json?all=true HTTP/1.1
--
14:17:53.787 >>> process code [54228 -> 3275] len 114(114)
GET /images/json?filters=%7B%22dangling%22%3A%5B%22false%22%5D%7D HTTP/1.1
--
14:17:55.034 >>> process code [54228 -> 3275] len 114(114)
GET /images/json?filters=%7B%22dangling%22%3A%5B%22false%22%5D%7D HTTP/1.1
--
14:17:55.787 >>> process code [54228 -> 3275] len 78(78)
GET /containers/json?all=true HTTP/1.1
--
14:17:55.787 >>> process code [54228 -> 3275] len 114(114)
GET /images/json?filters=%7B%22dangling%22%3A%5B%22false%22%5D%7D HTTP/1.1
--
14:17:57.786 >>> process code [54228 -> 3275] len 78(78)
GET /containers/json?all=true HTTP/1.1
--
14:17:57.787 >>> process code [54228 -> 3275] len 114(114)
GET /images/json?filters=%7B%22dangling%22%3A%5B%22false%22%5D%7D HTTP/1.1
--
14:17:59.787 >>> process code [54228 -> 3275] len 78(78)
GET /containers/json?all=true HTTP/1.1
--
14:17:59.788 >>> process code [54228 -> 3275] len 114(114)
GET /images/json?filters=%7B%22dangling%22%3A%5B%22false%22%5D%7D HTTP/1.1
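
The dump above is simply the extension polling the two list endpoints every ~2 seconds. Roughly the same load can be generated without VSCode with a loop like this (socket path is the rootless default from the info above), while watching the CPU usage of the podman service process:

# poll the same two compat endpoints at the same ~2 s interval the extension uses
while true; do
  curl -s --unix-socket /run/user/1000/podman/podman.sock "http://d/containers/json?all=true" > /dev/null
  curl -s --unix-socket /run/user/1000/podman/podman.sock "http://d/images/json?filters=%7B%22dangling%22%3A%5B%22false%22%5D%7D" > /dev/null
  sleep 2
done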

@vnznznz

vnznznz commented Dec 17, 2021

OK, after pruning my podman images down to 23, performance is back to nominal.
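(For reference, getting the image count down typically comes to something like the following; note that --all also removes unused non-dangling images, so check what you still need first:)

podman image prune        # remove dangling images only
podman image prune --all  # also remove images not referenced by any container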

@rhatdan
Member

rhatdan commented Dec 18, 2021

@vrothberg thoughts?

@vrothberg
Member

There was a regression in c/common that was fixed with containers/common@f745869fbc76, which has made it into Podman's main branch. It has not been backported to Podman v3.4.x.

@rhatdan please let me know if we should backport or wait for 4.0

@rhatdan
Member

rhatdan commented Dec 20, 2021

Probably should backport.

@vrothberg
Member

vrothberg commented Dec 20, 2021

@mheon, WDYT, since you mentioned a 3.5 earlier today?

@mheon
Member

mheon commented Dec 21, 2021

I was going to focus 3.5 mostly on OS X backports and fixes, and this does seem to qualify as a podman-remote fix. Feel free to backport to the v3.4 branch; I'll be basing v3.5 on v3.4.

@vrothberg
Member

Here's a PR to backport the fix along with some other libimage-related fixes that will come in handy: containers/common#866

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Jan 25, 2022

@vrothberg Is this fixed in the main branch?

@vrothberg
Member

> @vrothberg Is this fixed in the main branch?

Yes, we can close.

github-actions bot added the locked - please file new issue/PR label Sep 21, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023