podman run or restore: requested cgroup controller `pids` is not available #9752
Two more instances, both on the same test run:

- sys: podman run : --userns=keep-id: passwd file is modifiable
- sys: podman run : add username to /etc/passwd if --userns=keep-id
And another, again in remote ubuntu-2010 root.
And yet another.
I think some runner machine does not enable the `pids` controller.
Another.
Two more:

- sys: podman run : user namespace preserved root ownership
- sys: podman run docker-archive
Now seeing it in buildah CI, in setup (the registry thing), in https://github.com/containers/buildah/pull/3186/checks?check_run_id=2450159721
A friendly reminder that this issue had no activity for 30 days.
I just saw this on my own laptop, testing main @ 192d16e6a3c4801dee468b6b7f4de52952a80b09:

    # /home/esm/src/atomic/2018-02.podman/libpod/bin/podman container restore 342c2357fdd47755f5f6231b361968485bc343a05953fe2ccea6dcab1d9dcb6e
    Error: OCI runtime error: the requested cgroup controller `pids` is not available

It has worked all day, and ran fine on …. Local, not remote, so I've removed the remote tag. Failing test: Podman run [It] podman run with cgroups=split
Okay... so I'm working on #11957, and in my local (laptop) testing I'm seeing this flake about once in every 5-10 runs. That means that, if my PR gets merged, it will flake in half of CI runs. This flake needs to be fixed. Pretty please? Here's the best reproducer I can offer:

    $ while :;do sudo bin/podman run -d --name foo quay.io/libpod/testimage:20210610 sh -c 'while :;do cat /proc/uptime;done';sudo bin/podman container checkpoint foo;sudo bin/podman container logs foo >/dev/null;sudo bin/podman container inspect foo >/dev/null;sleep 0.5;sudo bin/podman container restore foo || break;sudo bin/podman container rm -f -t 0 foo;done
    ...
    4476405605bf413c7f2305ec9e19abba6044175514dcb4947b024daf3c97cfa3
    Error: OCI runtime error: the requested cgroup controller `pids` is not available

It's a poor reproducer: in one attempt it failed within seconds; in another, it ran fine for 15 minutes. I will try to work on a better one, but right now I need to move on for the day.
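For readability, here's the same loop reflowed — identical commands, just split across lines with comments added:

    # Same reproducer, one command per line: start a busy container,
    # checkpoint it, then restore; loop until the restore fails.
    while :; do
        sudo bin/podman run -d --name foo quay.io/libpod/testimage:20210610 \
            sh -c 'while :; do cat /proc/uptime; done'
        sudo bin/podman container checkpoint foo
        sudo bin/podman container logs foo >/dev/null
        sudo bin/podman container inspect foo >/dev/null
        sleep 0.5
        sudo bin/podman container restore foo || break   # the flake fires here
        sudo bin/podman container rm -f -t 0 foo
    done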
Here's a slightly different reproducer; this one has failed at 2s, 54s, 283s, 294s.

    t0=$SECONDS;while :;do sudo bin/podman run -d --name foo quay.io/libpod/testimage:20210610 sh -c "while :;do awk '{print $1}' </proc/uptime | tr -d .;sleep 0.1;done";sleep 0.1;sudo bin/podman container logs foo >/dev/null;sudo bin/podman container checkpoint foo;sleep 0.4;sudo bin/podman container restore foo || break;sudo bin/podman container rm -f -t 0 foo;done;t1=$SECONDS;echo $((t1 - t0)) seconds
Yep, that works well enough: 2s, 20s, 177s, always less than 5 minutes. One more data point: after this crash, retrying still fails, but in a different way:

    $ sudo bin/podman ps -a
    CONTAINER ID  IMAGE                              COMMAND               CREATED        STATUS                    PORTS  NAMES
    373c121f8693  quay.io/libpod/testimage:20210610  sh -c while :;do ...  4 minutes ago  Exited (0) 4 minutes ago         foo
    $ sudo bin/podman container restore foo
    Error: OCI runtime error: sd-bus call: File exists
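A guess at what to look for here (my suggestion, not something established in the thread): "File exists" from an sd-bus call suggests the container's systemd scope unit survived the failed restore. With the systemd cgroup manager, podman creates a libpod-<container-id>.scope per container, so leftover scopes can be listed:

    # A stale libpod-*.scope left over from the failed restore would
    # explain the "File exists" error on the retry.
    systemctl list-units --type=scope | grep libpod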
In my experience, a major milestone in fixing races is getting a fast reproducer, so this is excellent. What sort of environment is that being run under? Always crun? Eyeballing the environments above, it looks like a lot of Ubuntu 21.10 + root. Has @giuseppe taken a look at this?
Oops! I forgot to mention: that's on my laptop (f34) using main as of yesterday. And yes, crun.
Oh, that's interesting, so plenty of CPU and memory available then. Ya, I think this is mosdef @giuseppe territory.
It is an issue in crun (or rather, I think, in the kernel), but in any case we need to account for it in crun. I am still validating my patch; I'll open a PR as soon as I've finished testing it.
    It seems the kernel can return EBUSY when a process was moved to a
    sub-cgroup and the controllers are enabled in its parent cgroup.
    On EBUSY retry a few times until a controller could be enabled.

    Reported: containers/podman#9752
    Signed-off-by: Giuseppe Scrivano <[email protected]>
PR here: containers/crun#758
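For illustration, a minimal shell sketch of the retry idea the commit describes. The actual fix lives in crun's C code and checks specifically for errno EBUSY; the cgroup path and retry parameters below are assumptions:

    # Enable the pids controller for child cgroups, retrying on transient
    # failure: the kernel can refuse while a process is still being
    # migrated out of the parent cgroup.
    parent=/sys/fs/cgroup/machine.slice    # example path, not crun's actual one
    for attempt in 1 2 3 4 5; do
        if echo "+pids" > "$parent/cgroup.subtree_control" 2>/dev/null; then
            break                          # controller enabled
        fi
        sleep 0.05                         # brief backoff before retrying
    done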
The "requested cgroup controller 'pids' is not available" message also appeared (not recently) in the cgroups=split test: Podman run [It] podman run with cgroups=split
|
containers/crun#758 merged on Oct 19, and since CI is using a newer crun, I think we're good to close. Please reopen if I am mistaken.
People are still running into this on RHEL 8 boxes, as per #1897579 on the Bugzilla. The setup: containers running rootless as a linger-enabled user, with cgroup v2 user units enabled on the host via systemd running with modified defaults:

    DefaultCPUAccounting=yes
    DefaultIOAccounting=yes
    DefaultMemoryAccounting=yes
    DefaultTasksAccounting=yes

passed through to the user via:

    [Slice]
    CPUAccounting=yes
    MemoryAccounting=yes
    IOAccounting=yes
    TasksAccounting=yes

podman configured to use:

    [containers]
    runtime = "crun"

and running the default kernel. Doing a system-wide …; however, it will still yield broken rootless containers on host restart. Would a systemd user-unit dependency for cgroup setup on the podman user units, to ensure the cgroups are created, be a possibility?
Can you comment to that effect in the Bugzilla? We can swap it over to point at systemd, but having more context on what fix is necessary would be good.
Another one of those hard-to-track-down flakes that appears in different tests. Only two instances, both in the last two days, both on Ubuntu 2010:

- sys: podman cp - will not recognize symlink pointing into host space
- sys: Verify /run/.containerenv exist