
New flake: unable to obtain cgroup stats: open /sys/fs/cgroup/pids/pids.current: ENOENT #8397

Closed
edsantiago opened this issue Nov 18, 2020 · 13 comments · Fixed by #8422
Labels
flakes (Flakes from Continuous Integration), kind/bug, locked - please file new issue/PR

Comments

@edsantiago
Member

This one started just yesterday, and it's hitting us all over the place. The basic gist is:

$ podman-remote [options] pod stats --no-stream ...
Error: unable to obtain cgroup stats: open /sys/fs/cgroup/pids/pids.current: no such file or directory

Only on ubuntu-19 and -20; only root, not rootless. Seems to happen both locally and remotely. What changed yesterday?


Podman pod stats [It] podman stats on running pods

Podman pod stats [It] podman stats on a specific running pod

Podman pod stats [It] podman stats on a specific running pod with name

Podman pod stats [It] podman stats on a specific running pod with shortID

Podman pod stats [It] podman stats on net=host post

Podman pod stats [It] podman stats with GO template

Podman pod stats [It] podman stats with json output

@edsantiago added the flakes and kind/bug labels Nov 18, 2020
@rhatdan
Member

rhatdan commented Nov 18, 2020

@giuseppe @cevich WDYT?
Seems like something is screwed up in cgroups v1 on these boxes? Could there be new versions of Ubuntu out there?

@cevich
Member

cevich commented Nov 18, 2020

Oh! I seem to remember encountering this problem recently as well. I don't recall updating the images recently, but I could be wrong, and that's the only way they could change versions 😕 @edsantiago, commit 92e31a2 updated to images I built the day before, so Tue Nov 10th.

Both Ubuntu images (19/20) are pulling cri-o-runc from OBS. I'm not sure how to tell if/when those packages were updated relative to the 10th. @lsm5, do you have any insight?

If not, that leaves the "hard" way: trawling through the Cirrus-CI build history and comparing the package_versions script output. Most likely I can take a look at that tomorrow afternoon.

@edsantiago
Member Author

So, #8290 merged yesterday, and it deals with stats and cgroups. My quick look at it doesn't find anything that could be related to this failure, but it's the only cgroup- or stats-related PR in the last few weeks. @vrothberg, would you mind looking at the failures above and seeing whether your PR could possibly have caused this?

@vrothberg
Member

#8290 may be related but it doesn't add up yet.

@giuseppe WDYT?

@giuseppe
Member

> #8290 may be related but it doesn't add up yet.
>
> @giuseppe WDYT?

It could be. If the host is on cgroup v1, there are multiple controllers present in /proc/PID/cgroup, and we may be using the wrong one?
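
For context, each line of /proc/PID/cgroup on cgroup v1 has the form hierarchy-ID:controller-list:cgroup-path, one line per mounted hierarchy. Below is a minimal Go sketch of parsing that file into a controller-to-path map; it is an illustration only, not podman's actual code, and parseCgroupFile is a hypothetical helper.

package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

// parseCgroupFile reads a cgroup v1 /proc/<pid>/cgroup file and returns a
// map from controller name to that controller's cgroup path.
// Hypothetical helper, for illustration only.
func parseCgroupFile(path string) (map[string]string, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close()

    controllers := make(map[string]string)
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        // Each line looks like "6:pids:/user.slice/user-0.slice/session-78.scope".
        parts := strings.SplitN(scanner.Text(), ":", 3)
        if len(parts) != 3 {
            continue
        }
        // The middle field may name several controllers, e.g. "cpu,cpuacct".
        for _, ctrl := range strings.Split(parts[1], ",") {
            controllers[ctrl] = parts[2]
        }
    }
    return controllers, scanner.Err()
}

func main() {
    ctrls, err := parseCgroupFile("/proc/self/cgroup")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    // Code that only looks at one line can easily end up with a hierarchy
    // whose path is "/", i.e. the controller root.
    fmt.Println("pids cgroup:", ctrls["pids"])
}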

@cevich
Member

cevich commented Nov 19, 2020

Yes... both Ubuntu 19/20 are cgroups v1, with runc coming from OBS.

@edsantiago
Member Author

I don't think it's a runc issue; I did a quick check yesterday, comparing package_versions from today vs. two weeks ago, and runc/crun are the same. I'm feeling more and more certain that it's #8290.

@zhangguanzhang
Collaborator

https://storage.googleapis.com/cirrus-ci-6707778565701632-fcae48/artifacts/containers/podman/5207530957701120/html/int-remote-ubuntu-20-root-host.log.html

         Running: podman-remote [options] pod stats --no-stream
[+2840s] Error: unable to obtain cgroup stats: open /sys/fs/cgroup/pids/pids.current: no such file or directory

On my host:

[root@Centos8 ~]# ls -l /sys/fs/cgroup/pids/ | grep pids
# no output

@edsantiago
Member Author

Ooooh! If you still have access to this system, can you post the output of cat /proc/$$/cgroup, please?

@zhangguanzhang
Collaborator

[root@Centos8 ~]# ls -l /sys/fs/cgroup/pids/ | grep pids
[root@Centos8 ~]# cat /proc/$$/cgroup
12:freezer:/
11:perf_event:/
10:hugetlb:/
9:cpu,cpuacct:/user.slice
8:net_cls,net_prio:/
7:memory:/user.slice/user-0.slice/session-78.scope
6:pids:/user.slice/user-0.slice/session-78.scope
5:rdma:/
4:devices:/user.slice
3:blkio:/user.slice
2:cpuset:/
1:name=systemd:/user.slice/user-0.slice/session-78.scope
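
On cgroup v1 the stats file for a controller lives under the controller mount plus the path from the matching /proc/PID/cgroup entry, so the pids entry above maps to /sys/fs/cgroup/pids/user.slice/user-0.slice/session-78.scope/pids.current, while the root of the pids hierarchy has no pids.* files at all (as the empty ls above shows). If the path were taken from a different entry, such as the first line's "/", the join would produce exactly the failing /sys/fs/cgroup/pids/pids.current. A small sketch of that join; the mount point constant and helper name are assumptions for illustration, not podman's code:

package main

import (
    "fmt"
    "path/filepath"
)

// cgroupV1Mount is the conventional cgroup v1 mount point; an assumption
// for this sketch, not a value taken from podman.
const cgroupV1Mount = "/sys/fs/cgroup"

// pidsCurrentPath joins the pids controller mount with a per-process
// cgroup path taken from /proc/<pid>/cgroup.
func pidsCurrentPath(cgroupPath string) string {
    return filepath.Join(cgroupV1Mount, "pids", cgroupPath, "pids.current")
}

func main() {
    // Using the pids entry from the output above:
    fmt.Println(pidsCurrentPath("/user.slice/user-0.slice/session-78.scope"))
    // Using the first entry's path "/" reproduces the failing open:
    fmt.Println(pidsCurrentPath("/")) // /sys/fs/cgroup/pids/pids.current
}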

@edsantiago
Member Author

Thank you! @giuseppe @vrothberg, I hope that helps?

vrothberg added a commit to vrothberg/libpod that referenced this issue Nov 20, 2020
When running on cgroups v1, `/proc/{PID}/cgroup` has multiple entries,
each potentially pointing to a different cgroup.  Some may be empty,
some may point to parents.

The one we really need is the libpod-specific one, which is always the
longest path.  So instead of looking at the first entry, look at all of
them and select the longest one.

Fixes: containers#8397
Signed-off-by: Valentin Rothberg <[email protected]>
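
A minimal sketch of the heuristic described in that commit message, assuming the controller map from the parse sketch earlier in the thread; this is not the actual #8422 diff:

package main

import "fmt"

// longestCgroupPath returns the longest cgroup path among the parsed
// controller entries; per the commit message above, that is the
// libpod-specific one. Sketch only, not the actual fix.
func longestCgroupPath(controllers map[string]string) string {
    longest := ""
    for _, p := range controllers {
        if len(p) > len(longest) {
            longest = p
        }
    }
    return longest
}

func main() {
    entries := map[string]string{
        "freezer": "/",
        "pids":    "/user.slice/user-0.slice/session-78.scope",
        "memory":  "/user.slice/user-0.slice/session-78.scope",
    }
    fmt.Println(longestCgroupPath(entries)) // prints the session scope, not "/"
}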
@vrothberg
Member

Opened #8422 after talking with @giuseppe

@vrothberg
Member

Thanks for providing all the data, @edsantiago

@github-actions bot added the locked - please file new issue/PR label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023