
Periodic log API calls cause high CPU/memory usage #14879

Closed
atsushi-tsuda opened this issue Jul 9, 2022 · 4 comments · Fixed by #14984

Comments

@atsushi-tsuda

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Calling the log API periodically (for example, every 3 seconds) causes high CPU/memory usage in the Podman API service.

Steps to reproduce the issue:

  1. Install podman
    Note: I tried Podman 3.0.2, 3.3.1 and 4.0.2, but the result does not change.
  2. Run a "do-nothing" container
  3. Call the log API every 3 seconds, as shown below.
while :
do
    curl -m 3 --output - --unix-socket /var/run/podman/podman.sock -v 'http://localhost/containers/<Container ID>/logs?follow=1&since=0&stderr=1&stdout=1&tail=all' > /dev/null
done
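
For reference, a rough Go equivalent of this polling loop (a sketch only; the socket path and the container name `test` are illustrative placeholders, not taken from the report):

```go
package main

import (
	"context"
	"io"
	"log"
	"net"
	"net/http"
	"time"
)

func main() {
	// Route all HTTP requests through the Podman unix socket (path is an assumption).
	client := &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", "/var/run/podman/podman.sock")
			},
		},
	}
	for {
		// Follow the logs, but drop the connection after 3 seconds,
		// mirroring `curl -m 3` in the shell loop above.
		ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
		req, err := http.NewRequestWithContext(ctx, http.MethodGet,
			"http://localhost/containers/test/logs?follow=1&since=0&stderr=1&stdout=1&tail=all", nil)
		if err != nil {
			log.Fatal(err)
		}
		if resp, err := client.Do(req); err == nil {
			io.Copy(io.Discard, resp.Body) // discard output until the timeout fires
			resp.Body.Close()
		}
		cancel()
	}
}
```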

Describe the results you received:
Podman CPU/memory usage grew high after several hours, leading to resource shortage and OOM on the Podman host.

| # | Podman version | CPU usage (%) | Memory usage (%) | Used memory (KiB) | Duration |
|---|----------------|---------------|------------------|---------------------|----------|
| 1 | 3.0.1 | 2.6 -> 75.4 | 1.3 -> 36.8 | 55,288 -> 1,483,168 | 6h |
| 2 | 3.3.2 | 2 -> 15.4 | 1.4 -> 28.8 | 59,848 -> 1,160,136 | 6h |
| 3 | 4.0.1 | 0.3 -> 27.6 | 0.3 -> 19.7 | 62,112 -> 3,176,820 | 17h |

Describe the results you expected:

The Podman API service should run with bounded resource usage.

Additional information you deem important (e.g. issue happens only occasionally):

None.

Output of podman version:

[root@localhost ~]# podman version
Client:       Podman Engine
Version:      4.0.2
API Version:  4.0.2
Go Version:   go1.17.7

Built:      Tue Apr 19 19:16:32 2022
OS/Arch:    linux/amd64

Output of podman info --debug:

[root@localhost ~]# podman info --debug
host:
  arch: amd64
  buildahVersion: 1.24.1
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.0-1.module+el8.6.0+14877+f643d2d6.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: 87b7a9037cbd1d81288bdf2d6705dfda889f7cf9'
  cpus: 4
  distribution:
    distribution: '"rhel"'
    version: "8.4"
  eventLogger: file
  hostname: localhost.localdomain
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.18.0-305.el8.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 12447719424
  memTotal: 16512086016
  networkBackend: cni
  ociRuntime:
    name: runc
    package: runc-1.0.3-2.module+el8.6.0+14877+f643d2d6.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.3
      spec: 1.0.2-dev
      go: go1.17.7
      libseccomp: 2.5.2
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.8-2.module+el8.6.0+14877+f643d2d6.x86_64
    version: |-
      slirp4netns version 1.1.8
      commit: d361001f495417b880f20329121e3aa431a8f90f
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 8472489984
  swapTotal: 8472489984
  uptime: 17h 1m 4.81s (Approximately 0.71 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  rdocker:6000:
    Blocked: false
    Insecure: true
    Location: rdocker:6000
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: rdocker:6000
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.0.2
  Built: 1650363392
  BuiltTime: Tue Apr 19 19:16:32 2022
  GitCommit: ""
  GoVersion: go1.17.7
  OsArch: linux/amd64
  Version: 4.0.2

Package info (e.g. output of rpm -q podman or apt list podman):

[root@localhost ~]# rpm -q podman
podman-4.0.2-6.module+el8.6.0+14877+f643d2d6.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

None.

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 9, 2022
rhatdan (Member) commented Jul 9, 2022

This looks like some kind of leak. You could verify this by changing your script to fire off requests more than 5 seconds apart, since the service is supposed to shut down after 5 seconds of inactivity. If you change your curl script to fire every 10 seconds, do you still see the problem?

atsushi-tsuda (Author) commented Jul 11, 2022

@rhatdan

Thank you for the comments.

I tried a 10-second interval with Podman 4.0.1.
CPU/memory usage grows more slowly, but the increasing trend appears unchanged.

| CPU usage (%) | Memory usage (%) | Used memory (KiB) | Duration after service start |
|---------------|------------------|-------------------|-------------------------------|
| 0.3 | 0.3 | 62,112 | 0h |
| 3.8 | 2.2 | 366,504 | 5h |
| 7.3 | 4.7 | 762,448 | 13h |

Actually, I'm using the log API with the follow option and disconnecting from the client side 3 or 10 seconds after the connection is established, so the Podman API service may effectively be running all the time.

Luap99 (Member) commented Jul 11, 2022

Does the container produce a lot of output or does it even happen without any container output?

atsushi-tsuda (Author)

> Does the container produce a lot of output or does it even happen without any container output?

-> The container does not produce much output, only some logs. The log is 49,152 bytes (48 KiB) and 883 lines.

@Luap99 Luap99 self-assigned this Jul 13, 2022
Luap99 added a commit to Luap99/libpod that referenced this issue Jul 20, 2022
When running a single podman logs this is not really important since we
will exit when we finish reading the logs. However, for the system
service this is very important. Leaking goroutines will cause
increased memory and CPU usage over time.

Both the event and log backends have goroutine leaks with both the
file and journald drivers.

The journald backend has the problem that journal.Wait(IndefiniteWait)
will block until we get a new journald event. So when a client closes
the connection the goroutine would still wait until there is a new
journal entry. To fix this we just wait for a maximum of 5 seconds;
after that, we can check if the client connection was closed and exit
correctly in this case.
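
A minimal sketch of that bounded-wait loop, assuming the coreos/go-systemd sdjournal bindings and illustrative names for everything else (this is not Podman's actual code):

```go
package logstream

import (
	"context"
	"time"

	"github.com/coreos/go-systemd/v22/sdjournal"
)

// streamJournal follows journal entries until the client's context is cancelled.
func streamJournal(ctx context.Context, j *sdjournal.Journal, send func(string)) {
	for {
		// Exit as soon as the HTTP client has disconnected.
		select {
		case <-ctx.Done():
			return
		default:
		}
		// Wait at most 5 seconds instead of sdjournal.IndefiniteWait, so the loop
		// can re-check ctx even when no new journal entries ever arrive.
		if j.Wait(5*time.Second) == sdjournal.SD_JOURNAL_NOP {
			continue
		}
		for {
			n, err := j.Next()
			if err != nil || n == 0 {
				break
			}
			entry, err := j.GetEntry()
			if err != nil {
				return
			}
			send(entry.Fields["MESSAGE"])
		}
	}
}
```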

For the file backend we can fix this by waiting for either the log line
or context cancel at the same time. Currently it would block waiting for
new log lines and only check afterwards if the client closed the
connection and thus hang forever if there are no new log lines.
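
A minimal sketch of the file-backend pattern described above (illustrative types and names, assuming log lines are delivered over a channel):

```go
package logstream

import "context"

// LogLine stands in for the log driver's line type.
type LogLine struct{ Msg string }

// followFileLog streams lines until the reader finishes or the client disconnects.
func followFileLog(ctx context.Context, lines <-chan *LogLine, send func(*LogLine)) {
	for {
		select {
		case <-ctx.Done():
			// Client closed the connection: return instead of blocking
			// forever waiting for a log line that may never come.
			return
		case line, ok := <-lines:
			if !ok {
				return // the log file reader is done
			}
			send(line)
		}
	}
}
```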

[NO NEW TESTS NEEDED] I am open to ideas how we can test memory leaks in
CI.
To test manually run a container like this:
`podman run --log-driver $driver  --name test -d alpine sh -c 'i=1; while [ "$i" -ne 1000 ]; do echo "line $i"; i=$((i + 1)); done; sleep inf'`
where `$driver` can be either `journald` or `k8s-file`.
Then start the podman system service and use:
`curl -m 1 --output -  --unix-socket $XDG_RUNTIME_DIR/podman/podman.sock -v 'http://d/containers/test/logs?follow=1&since=0&stderr=1&stdout=1' &>/dev/null`
to get the logs from the API and then it closes the connection after 1 second.
Now run the curl command several times and check the memory usage of the service.

Fixes containers#14879

Signed-off-by: Paul Holzinger <[email protected]>
mheon pushed a commit to mheon/libpod that referenced this issue Jul 26, 2022
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023