Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

podman stats as root broken #13824

Closed
nivekuil opened this issue Apr 11, 2022 · 12 comments · Fixed by #14421
Closed

podman stats as root broken #13824

nivekuil opened this issue Apr 11, 2022 · 12 comments · Fixed by #14421
Assignees
Labels
Good First Issue This issue would be a good issue for a first time contributor to undertake. kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@nivekuil
Copy link

nivekuil commented Apr 11, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Steps to reproduce the issue:

  1. Install podman on Arch Linux (4.0.3), Centos 8 Stream (4.0.2) or use Fedora Coreos (4.0.2)

  2. log in as root

  3. podman stats

Describe the results you received:
It just prints Error: Link not found

Describe the results you expected:
not that

Additional information you deem important (e.g. issue happens only occasionally):
podman stats seems to work when run as non-root, tested on Arch Linux, but I get the same error there when run as root.

Using the socket API:
curl --unix-socket /run/podman/podman.sock 'http://d/v4.0.0/libpod/containers/stats'
{"Error":{},"Stats":null}

Output of podman version:
4.0.3 and 4.0.2

** podman info -D **

From coreos:

host:
  arch: amd64
  buildahVersion: 1.24.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-2.fc36.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: '
  cpus: 4
  distribution:
    distribution: fedora
    variant: coreos
    version: "36"
  eventLogger: journald
  hostname: dro-1
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.17.0-300.fc36.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 14985670656
  memTotal: 16773632000
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.4.3-1.fc36.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.3
      commit: 61c9600d1335127eba65632731e2d72bc3f0b9e8
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 40h 43m 52.05s (Approximately 1.67 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 11
    paused: 0
    running: 11
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 8
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.0.2
  Built: 1646319369
  BuiltTime: Thu Mar  3 14:56:09 2022
  GitCommit: ""
  GoVersion: go1.18beta2
  OsArch: linux/amd64
  Version: 4.0.2
@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 11, 2022
@Luap99
Copy link
Member

Luap99 commented Apr 11, 2022

Can you share how do you create your containers? Did you use the network connect/disconnect commands?

@nivekuil
Copy link
Author

The network is configured statically at pod create time like --network hatchery-local-bridge:interface_name=local-bridge,ip=fd42::7cc1:445d. I think you're on to something that it's something to do with how I create the containers though, since I don't have any containers running rootless and that's where podman stats still works. The only other suspicious option I see is --security-opt=seccomp=unconfined

@Luap99
Copy link
Member

Luap99 commented Apr 11, 2022

No this is about the network configuration.
Reproducer:

podman network create t1
podman run --network podman --name test  -d alpine top
podman network connect t1 test
podman network disconnect podman test
podman stats
Error: Link not found

This is the problem:

// FIXME get the interface from the container netstatus
dev := "eth0"

Should be a simple fix if you are interested in creating a PR for that.

@Luap99 Luap99 added the Good First Issue This issue would be a good issue for a first time contributor to undertake. label Apr 11, 2022
@nivekuil
Copy link
Author

How is that supposed to work when the container has multiple interfaces?

@Luap99
Copy link
Member

Luap99 commented Apr 11, 2022

I don't think it does at the moment. The only idea would be to add the sum off all interfaces but I don't think this is a good idea either because of the potential integer overflow.

@nivekuil
Copy link
Author

If we currently imply the existence of an eth0, then a container without any interfaces would also break podman stats? For my purposes I'm trying to extract prometheus metrics from podman, and I would prefer that the stats API be more resilient, delivering what it can even if the network info is unavailable. That might be more immediately tractable than a proper implementation of network stats?

@Luap99
Copy link
Member

Luap99 commented Apr 11, 2022

We already have the network interface names in the network status. This just needs to be changed so that we take a actual interface name from that. If there is no interface name in the status we need to skip obviously.

@nivekuil
Copy link
Author

My concern is whether it would be better to skip and give no data instead of picking one arbitrarily, especially if the API doesn't give feedback about which interface was picked, and give potentially misleading data. The method of picking an interface also should be deterministic based on the network name for consistency across hosts.

@Luap99
Copy link
Member

Luap99 commented Apr 12, 2022

I agree it should be deterministic but skipping it completely is not a good idea. Most people only have one interface so they do not care. Also there could be users who already rely on the returned network stats.

@nivekuil
Copy link
Author

Right, if there is only one then it would not be an arbitrary selection. But it might be better to skip if there are >1 for now.

@vrothberg
Copy link
Member

Ping. Do you ~argree on the solution?

@Luap99
Copy link
Member

Luap99 commented May 4, 2022

I checked what docker is doing. The api is returning the statistics per interface.
The cli will then add them together.

Since we can not break the API at the moment I think we could add them in the backend for now (and fix it in 5.0). This should make it deterministic. Not sure if we need to prevent an integer overflow.

@Luap99 Luap99 self-assigned this May 30, 2022
Luap99 added a commit to Luap99/libpod that referenced this issue May 31, 2022
Hardcoding the interface name is a bad idea. We have no control over the
actual interface name since the user can change it.

The correct thing is to read them from the network status. Since the
contianer can have more than one interface we have to add the RX/TX
values. The other values are currently not used.

For podman 5.0 we should change it so that the API can return the
statistics per interface and the client should sum the TX/RX for the
command output. This is what docker is doing.

Fixes containers#13824

Signed-off-by: Paul Holzinger <[email protected]>
mheon pushed a commit to mheon/libpod that referenced this issue Jun 14, 2022
Hardcoding the interface name is a bad idea. We have no control over the
actual interface name since the user can change it.

The correct thing is to read them from the network status. Since the
contianer can have more than one interface we have to add the RX/TX
values. The other values are currently not used.

For podman 5.0 we should change it so that the API can return the
statistics per interface and the client should sum the TX/RX for the
command output. This is what docker is doing.

Fixes containers#13824

Signed-off-by: Paul Holzinger <[email protected]>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Good First Issue This issue would be a good issue for a first time contributor to undertake. kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants