podman stats reports only the container's PID 1 information, which is not so useful for systemd containers #12400

Closed
jfroy opened this issue Nov 23, 2021 · 1 comment · Fixed by #12403
Labels: kind/bug, locked - please file new issue/PR

Comments

jfroy commented Nov 23, 2021

/kind bug

Description

podman stats appears to report only the container's PID 1 information, which is not very useful for systemd containers.

Steps to reproduce the issue:

  1. Build and run a simple systemd-based container image:
FROM registry.fedoraproject.org/fedora:33
RUN dnf install -y systemd && \
    dnf clean all
CMD [ "/usr/sbin/init" ]
# podman image build --rm -t localhost/fedora-systemd:latest .
# podman container run --rm --detach --name fedora-systemd01 localhost/fedora-systemd:latest
  2. Run podman stats:
# podman stats --no-stream --no-reset --latest
ID            NAME              CPU %       MEM USAGE / LIMIT  MEM %       NET IO             BLOCK IO    PIDS
86eddf9e8ebe  fedora-systemd01  4.40%       1.159MB / 66.86GB  0.00%       1.076kB / 2.212kB  -- / --     1

Describe the results you received:

The returned stats cover only the cgroup of the container's PID 1 (here systemd, in init.scope) and exclude the other processes. This can be confirmed by running strace on podman (using memory as the example):

strace -f -e trace=file podman stats --no-stream --no-reset --latest 2>&1 | grep memory.current
[pid 453023] openat(AT_FDCWD, "/sys/fs/cgroup/machine.slice/libpod-86eddf9e8ebe881f362e4e6eb28f10f13b8ca24ac8bb3d86f1373f80f0bcd6ab.scope/init.scope/memory.current", O_RDONLY|O_CLOEXEC) = 9

If the container's parent cgroup is inspected instead, a more reasonable number is obtained:

# cat /sys/fs/cgroup/machine.slice/libpod-86eddf9e8ebe881f362e4e6eb28f10f13b8ca24ac8bb3d86f1373f80f0bcd6ab.scope/memory.current 
31125504
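
The same comparison can be scripted. The following is a minimal Python sketch, assuming a cgroup v2 host and the libpod-<id>.scope/init.scope layout seen in the strace output above. The helper names are hypothetical and are not part of podman.

```python
from pathlib import Path


def read_memory_current(cgroup_dir):
    """Return the memory.current value (in bytes) of one cgroup v2 directory."""
    return int((Path(cgroup_dir) / "memory.current").read_text().strip())


def compare_scope_and_init(scope_dir):
    """Return (whole_scope_bytes, init_scope_bytes) for a container scope.

    scope_dir is assumed to be the libpod-<id>.scope directory. init.scope is
    the sub-cgroup that systemd, running as PID 1, moves itself into.
    """
    scope = Path(scope_dir)
    return read_memory_current(scope), read_memory_current(scope / "init.scope")
```

On cgroup v2, memory.current of a cgroup includes its descendants, so the first value should never be smaller than the second. Pass the full scope directory under /sys/fs/cgroup/machine.slice/ as scope_dir.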

Describe the results you expected:

By default, as a user, I expected podman stats to give me numbers for the whole container.

There may be a compatibility concern: historically, containers have been single-process, so podman stats may need to keep reporting only on PID 1. In that case, perhaps a flag could be added.

Output of podman version:

Version:      3.2.3
API Version:  3.2.3
Go Version:   go1.15.14
Built:        Thu Sep 23 14:22:19 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.21.3
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.29-1.module_el8.4.0+886+c9a8d9ad.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.29, commit: 97bba1e91aaab5be2e93bacd34ec4e66655a02ae'
  cpus: 32
  distribution:
    distribution: '"centos"'
    version: "8"
  eventLogger: file
  hostname: localhost
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.15.0-1.el8.elrepo.x86_64
  linkmode: dynamic
  memFree: 26997882880
  memTotal: 66860367872
  ociRuntime:
    name: runc
    package: runc-1.0.0-74.rc95.module_el8.4.0+886+c9a8d9ad.x86_64
    path: /usr/bin/runc
    version: |-
      runc version spec: 1.0.2-dev
      go: go1.15.14
      libseccomp: 2.5.1
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 33762570240
  swapTotal: 33764143104
  uptime: 192h 52m 17.55s (Approximately 8.00 days)
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 2
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /home/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 36
  runRoot: /run/containers/storage
  volumePath: /home/containers/storage/volumes
version:
  APIVersion: 3.2.3
  Built: 1632432139
  BuiltTime: Thu Sep 23 14:22:19 2021
  GitCommit: ""
  GoVersion: go1.15.14
  OsArch: linux/amd64
  Version: 3.2.3

Package info (e.g. output of rpm -q podman or apt list podman):

podman-3.2.3-0.11.module_el8.4.0+942+d25aada8.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

I have not tried the latest version of podman, but I have tried with a locally built crun from a recent commit, and that made no difference.

# /usr/local/crun/bin/crun --version
crun version 1.2.34-e2dd
commit: e2ddb7c503b78b5446824b7ac40293fb461344ee
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL

Additional environment details (AWS, VirtualBox, physical, etc.):

The base system is CentOS 8, running the latest ELRepo kernel. systemd is configured to use the unified cgroup hierarchy. SELinux is in permissive mode.

# uname -r
5.15.0-1.el8.elrepo.x86_64
# systemctl --version
systemd 239 (239-45.el8)
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=legacy
# cat /proc/cmdline 
... systemd.unified_cgroup_hierarchy=1
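
The default-hierarchy=legacy value printed by systemctl --version is, as far as I know, only the build-time default. The kernel command line overrides it, which matches the cgroupVersion: v2 reported by podman info. A minimal Python sketch for checking which hierarchy is actually mounted, assuming the usual /sys/fs/cgroup mount point (the function name is hypothetical):

```python
from pathlib import Path


def uses_unified_hierarchy(cgroup_root="/sys/fs/cgroup"):
    """Return True if cgroup_root looks like a cgroup v2 (unified) mount.

    The root of a cgroup v2 hierarchy contains a cgroup.controllers file.
    A legacy (v1) layout has per-controller directories there instead.
    """
    return (Path(cgroup_root) / "cgroup.controllers").is_file()
```

Calling uses_unified_hierarchy() with no argument checks the default mount point. The parameter exists so that a different root can be inspected.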
openshift-ci bot added the kind/bug label Nov 23, 2021
giuseppe self-assigned this Nov 24, 2021
@giuseppe
Member

opened a PR: #12403

giuseppe added a commit to giuseppe/libpod that referenced this issue Nov 24, 2021
improve the heuristic to detect the scope that was created for the container.
This is necessary with systemd running as PID 1, since it moves itself
to a different sub-cgroup, thus stats would not account for other
processes in the same container.

Closes: containers#12400

Signed-off-by: Giuseppe Scrivano <[email protected]>
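
To illustrate the idea in the commit message, here is a minimal Python sketch of one possible heuristic. It is not the code from the PR: it assumes cgroup v2, the libpod-<id>.scope naming seen in the strace output earlier in this issue, and hypothetical helper names.

```python
from pathlib import PurePosixPath


def parse_cgroup_v2_line(line):
    """Extract the cgroup path from a v2 entry of /proc/<pid>/cgroup ("0::<path>")."""
    hierarchy_id, controllers, path = line.strip().split(":", 2)
    if hierarchy_id != "0" or controllers != "":
        raise ValueError("not a cgroup v2 entry: %r" % line)
    return path


def container_scope(cgroup_path, container_id):
    """Return the container's scope for a cgroup path reported for PID 1.

    Walks from the given path up through its ancestors and returns the first
    one named libpod-<container_id>.scope. If none matches, the path is
    returned unchanged.
    """
    path = PurePosixPath(cgroup_path)
    wanted = "libpod-%s.scope" % container_id
    for candidate in (path, *path.parents):
        if candidate.name == wanted:
            return str(candidate)
    return cgroup_path
```

With systemd as PID 1, the reported path ends in init.scope, and the walk returns its parent scope, which accounts for every process in the container. For a single-process container whose PID 1 stays in the scope itself, the first candidate already matches.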
mheon pushed a commit to mheon/libpod that referenced this issue Dec 6, 2021
improve the heuristic to detect the scope that was created for the container.
This is necessary with systemd running as PID 1, since it moves itself
to a different sub-cgroup, thus stats would not account for other
processes in the same container.

Closes: containers#12400

Signed-off-by: Giuseppe Scrivano <[email protected]>

<MH: Fixed cherry-pick conflicts>

Signed-off-by: Matthew Heon <[email protected]>
github-actions bot added the locked - please file new issue/PR label Sep 21, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023