
Periodic log API calls cause high CPU/memory usage #14879

Closed
atsushi-tsuda opened this issue Jul 9, 2022 · 4 comments · Fixed by #14984

Comments

@atsushi-tsuda

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Calling the log API periodically (for example, every 3 seconds) causes high CPU/memory usage in the Podman API service.

Steps to reproduce the issue:

  1. Install podman
    Note: I tried Podman 3.0.2, 3.3.1 and 4.0.2, but the result does not change.
  2. Run a "do-nothing" container
  3. Call the log API every 3 seconds, as shown below.
while :
do
    curl -m 3 --output - --unix-socket /var/run/podman/podman.sock -v 'http://localhost/containers/<Container ID>/logs?follow=1&since=0&stderr=1&stdout=1&tail=all' > /dev/null
done
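
For reference, a rough Go equivalent of this polling loop (a sketch only; the socket path and the container name `test` are illustrative placeholders, not taken from the report):

```go
package main

import (
	"context"
	"io"
	"log"
	"net"
	"net/http"
	"time"
)

func main() {
	// Route all HTTP requests through the Podman unix socket (path is an assumption).
	client := &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", "/var/run/podman/podman.sock")
			},
		},
	}
	for {
		// Follow the logs, but drop the connection after 3 seconds,
		// mirroring `curl -m 3` in the shell loop above.
		ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
		req, err := http.NewRequestWithContext(ctx, http.MethodGet,
			"http://localhost/containers/test/logs?follow=1&since=0&stderr=1&stdout=1&tail=all", nil)
		if err != nil {
			log.Fatal(err)
		}
		if resp, err := client.Do(req); err == nil {
			io.Copy(io.Discard, resp.Body) // discard output until the timeout fires
			resp.Body.Close()
		}
		cancel()
	}
}
```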

Describe the results you received:
Podman CPU/memory usage grew high after several hours, leading to resource shortage and OOM on the Podman host.

| # | Podman version | CPU usage (%) | Memory usage (%) | Used memory (KiB) | Duration |
|---|----------------|---------------|------------------|---------------------|----------|
| 1 | 3.0.1 | 2.6 -> 75.4 | 1.3 -> 36.8 | 55,288 -> 1,483,168 | 6h |
| 2 | 3.3.2 | 2 -> 15.4 | 1.4 -> 28.8 | 59,848 -> 1,160,136 | 6h |
| 3 | 4.0.1 | 0.3 -> 27.6 | 0.3 -> 19.7 | 62,112 -> 3,176,820 | 17h |

Describe the results you expected:

The Podman API service should run with bounded resource usage.

Additional information you deem important (e.g. issue happens only occasionally):

None.

Output of podman version:

[root@localhost ~]# podman version
Client:       Podman Engine
Version:      4.0.2
API Version:  4.0.2
Go Version:   go1.17.7

Built:      Tue Apr 19 19:16:32 2022
OS/Arch:    linux/amd64

Output of podman info --debug:

[root@localhost ~]# podman info --debug
host:
  arch: amd64
  buildahVersion: 1.24.1
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.0-1.module+el8.6.0+14877+f643d2d6.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: 87b7a9037cbd1d81288bdf2d6705dfda889f7cf9'
  cpus: 4
  distribution:
    distribution: '"rhel"'
    version: "8.4"
  eventLogger: file
  hostname: localhost.localdomain
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.18.0-305.el8.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 12447719424
  memTotal: 16512086016
  networkBackend: cni
  ociRuntime:
    name: runc
    package: runc-1.0.3-2.module+el8.6.0+14877+f643d2d6.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.3
      spec: 1.0.2-dev
      go: go1.17.7
      libseccomp: 2.5.2
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.8-2.module+el8.6.0+14877+f643d2d6.x86_64
    version: |-
      slirp4netns version 1.1.8
      commit: d361001f495417b880f20329121e3aa431a8f90f
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 8472489984
  swapTotal: 8472489984
  uptime: 17h 1m 4.81s (Approximately 0.71 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  rdocker:6000:
    Blocked: false
    Insecure: true
    Location: rdocker:6000
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: rdocker:6000
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.0.2
  Built: 1650363392
  BuiltTime: Tue Apr 19 19:16:32 2022
  GitCommit: ""
  GoVersion: go1.17.7
  OsArch: linux/amd64
  Version: 4.0.2

Package info (e.g. output of rpm -q podman or apt list podman):

[root@localhost ~]# rpm -q podman
podman-4.0.2-6.module+el8.6.0+14877+f643d2d6.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

None.

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 9, 2022
rhatdan (Member) commented Jul 9, 2022

This looks like some kind of leak. You could verify this by changing your script to fire off requests more than 5 seconds apart, since the service is supposed to shut down after 5 seconds of inactivity. If you change your curl script to fire every 10 seconds, do you still see the problem?

atsushi-tsuda (Author) commented Jul 11, 2022

@rhatdan

Thank you for the comments.

I tried a 10-second interval with Podman 4.0.1.
CPU/memory usage grows more slowly, but the increasing trend appears unchanged.

| CPU usage (%) | Memory usage (%) | Used memory (KiB) | Duration after service start |
|---------------|------------------|-------------------|-------------------------------|
| 0.3 | 0.3 | 62,112 | 0h |
| 3.8 | 2.2 | 366,504 | 5h |
| 7.3 | 4.7 | 762,448 | 13h |

Actually, I'm using the log API with the follow option and disconnecting from the client side 3 or 10 seconds after the connection is established, so the Podman API service may effectively be running all the time.

Luap99 (Member) commented Jul 11, 2022

Does the container produce a lot of output or does it even happen without any container output?

atsushi-tsuda (Author)

> Does the container produce a lot of output or does it even happen without any container output?

-> The container does not produce much output, only some logs. The log is 49,152 bytes (48 KiB) and 883 lines.

@Luap99 Luap99 self-assigned this Jul 13, 2022
Luap99 added a commit to Luap99/libpod that referenced this issue Jul 20, 2022
When running a single podman logs this is not really important since we
will exit when we finish reading the logs. However, for the system
service this is very important. Leaking goroutines will cause
increased memory and CPU usage over time.

Both the event and log backends have goroutine leaks with both the
file and journald drivers.

The journald backend has the problem that journal.Wait(IndefiniteWait)
will block until we get a new journald event. So when a client closes
the connection the goroutine would still wait until there is a new
journal entry. To fix this we just wait for a maximum of 5 seconds;
after that, we can check if the client connection was closed and exit
correctly in this case.
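
A minimal sketch of that bounded-wait loop, assuming the coreos/go-systemd sdjournal bindings and illustrative names for everything else (this is not Podman's actual code):

```go
package logstream

import (
	"context"
	"time"

	"github.com/coreos/go-systemd/v22/sdjournal"
)

// streamJournal follows journal entries until the client's context is cancelled.
func streamJournal(ctx context.Context, j *sdjournal.Journal, send func(string)) {
	for {
		// Exit as soon as the HTTP client has disconnected.
		select {
		case <-ctx.Done():
			return
		default:
		}
		// Wait at most 5 seconds instead of sdjournal.IndefiniteWait, so the loop
		// can re-check ctx even when no new journal entries ever arrive.
		if j.Wait(5*time.Second) == sdjournal.SD_JOURNAL_NOP {
			continue
		}
		for {
			n, err := j.Next()
			if err != nil || n == 0 {
				break
			}
			entry, err := j.GetEntry()
			if err != nil {
				return
			}
			send(entry.Fields["MESSAGE"])
		}
	}
}
```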

For the file backend we can fix this by waiting for either the log line
or context cancel at the same time. Currently it would block waiting for
new log lines and only check afterwards if the client closed the
connection and thus hang forever if there are no new log lines.
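
A minimal sketch of the file-backend pattern described above (illustrative types and names, assuming log lines are delivered over a channel):

```go
package logstream

import "context"

// LogLine stands in for the log driver's line type.
type LogLine struct{ Msg string }

// followFileLog streams lines until the reader finishes or the client disconnects.
func followFileLog(ctx context.Context, lines <-chan *LogLine, send func(*LogLine)) {
	for {
		select {
		case <-ctx.Done():
			// Client closed the connection: return instead of blocking
			// forever waiting for a log line that may never come.
			return
		case line, ok := <-lines:
			if !ok {
				return // the log file reader is done
			}
			send(line)
		}
	}
}
```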

[NO NEW TESTS NEEDED] I am open to ideas how we can test memory leaks in
CI.
To test manually run a container like this:
`podman run --log-driver $driver  --name test -d alpine sh -c 'i=1; while [ "$i" -ne 1000 ]; do echo "line $i"; i=$((i + 1)); done; sleep inf'`
where `$driver` can be either `journald` or `k8s-file`.
Then start the podman system service and use:
`curl -m 1 --output -  --unix-socket $XDG_RUNTIME_DIR/podman/podman.sock -v 'http://d/containers/test/logs?follow=1&since=0&stderr=1&stdout=1' &>/dev/null`
to get the logs from the API and then it closes the connection after 1 second.
Now run the curl command several times and check the memory usage of the service.

Fixes containers#14879

Signed-off-by: Paul Holzinger <[email protected]>
mheon pushed a commit to mheon/libpod that referenced this issue Jul 26, 2022
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023