podman run or restore: requested cgroup controller pids is not available #9752

Status: Closed
edsantiago opened this issue Mar 18, 2021 · 22 comments
Labels: flakes (Flakes from Continuous Integration), kind/bug (Categorizes issue or PR as related to a bug), locked - please file new issue/PR (Assist humans wanting to comment on an old issue or PR with locked comments)

@edsantiago (Member) commented:

Another one of those hard-to-track-down flakes that appear in different tests:

# podman-remote run [...]
Error: error preparing container <sha> for attach: the requested cgroup controller `pids` is not available: OCI runtime error

Only two instances so far, both in the last two days, both on Ubuntu 20.10:

sys: podman cp - will not recognize symlink pointing into host space

sys: Verify /run/.containerenv exist

edsantiago added the flakes, kind/bug, and remote (Problem is in podman-remote) labels on Mar 18, 2021
@edsantiago (Member Author) commented:

Two more instances, both on the same test run:

sys: podman run : --userns=keep-id: passwd file is modifiable

sys: podman run : add username to /etc/passwd if --userns=keep-id

@edsantiago (Member Author) commented:

And another: again in remote ubuntu-2010 root

@edsantiago (Member Author) commented:

And yet another:

@zhangguanzhang (Collaborator) commented:

I think some of the runner machines do not have the pids cgroup controller enabled.
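
(As a quick sanity check, one way to see which cgroup v2 controllers a machine actually makes available; paths assume the standard unified-hierarchy mount at /sys/fs/cgroup:)

$ cat /sys/fs/cgroup/cgroup.controllers        # controllers available at the root
$ cat /sys/fs/cgroup/cgroup.subtree_control    # controllers delegated to child cgroups; "pids" must appear here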

@edsantiago (Member Author) commented:

Another:

@edsantiago (Member Author) commented:

Two more:

sys: podman run : user namespace preserved root ownership

sys: podman run docker-archive

@edsantiago (Member Author) commented:

Now seeing it in buildah CI, in setup (the registry thing), in https://github.com/containers/buildah/pull/3186/checks?check_run_id=2450159721

@github-actions bot commented:

A friendly reminder that this issue had no activity for 30 days.

edsantiago removed the remote (Problem is in podman-remote) and stale-issue labels on Oct 13, 2021
@edsantiago (Member Author) commented:

I just saw this on my own laptop, testing main @ 192d16e6a3c4801dee468b6b7f4de52952a80b09, on a podman container restore.

# /home/esm/src/atomic/2018-02.podman/libpod/bin/podman container restore 342c2357fdd47755f5f6231b361968485bc343a05953fe2ccea6dcab1d9dcb6e
Error: OCI runtime error: the requested cgroup controller `pids` is not available

It had worked all day, and ran fine on !! (rerunning the same command), so it's a flake.

Local, not remote, so I've removed the remote tag.

Podman run [It] podman run with cgroups=split

@edsantiago (Member Author) commented:

Okay... so I'm working on #11957, and in my local (laptop) testing, I'm seeing this flake about once in every 5-10 runs. This means that, if my PR gets merged, it will flake in half of CI runs. This flake needs to be fixed. Pretty please? Here's the best reproducer I can offer:

$ while :; do
      sudo bin/podman run -d --name foo quay.io/libpod/testimage:20210610 \
          sh -c 'while :;do cat /proc/uptime;done'
      sudo bin/podman container checkpoint foo
      sudo bin/podman container logs foo >/dev/null
      sudo bin/podman container inspect foo >/dev/null
      sleep 0.5
      sudo bin/podman container restore foo || break
      sudo bin/podman container rm -f -t 0 foo
  done
...
4476405605bf413c7f2305ec9e19abba6044175514dcb4947b024daf3c97cfa3
Error: OCI runtime error: the requested cgroup controller `pids` is not available

It's a poor reproducer: in one attempt, it failed within seconds. On another, it ran fine for 15 minutes. I will try to work on a better one, but right now I need to move on for the day.

edsantiago changed the title from "remote: podman run: requested cgroup controller pids is not available" to "podman run or restore: requested cgroup controller pids is not available" on Oct 17, 2021
@edsantiago (Member Author) commented:

Here's a slightly different reproducer; this one has failed at 2s, 54s, 283s, 294s.

t0=$SECONDS
while :; do
    sudo bin/podman run -d --name foo quay.io/libpod/testimage:20210610 \
        sh -c "while :;do awk '{print $1}' </proc/uptime | tr -d .;sleep 0.1;done"
    sleep 0.1
    sudo bin/podman container logs foo >/dev/null
    sudo bin/podman container checkpoint foo
    sleep 0.4
    sudo bin/podman container restore foo || break
    sudo bin/podman container rm -f -t 0 foo
done
t1=$SECONDS; echo $((t1 - t0)) seconds

@edsantiago (Member Author) commented:

Yep, that works well enough. 2s, 20s, 177s, always less than 5 minutes.

One more data point: after this crash, retrying still fails, but in a different way:

$ sudo bin/podman ps -a
CONTAINER ID  IMAGE                              COMMAND               CREATED        STATUS                    PORTS       NAMES
373c121f8693  quay.io/libpod/testimage:20210610  sh -c while :;do ...  4 minutes ago  Exited (0) 4 minutes ago              foo
$ sudo bin/podman container restore foo
Error: OCI runtime error: sd-bus call: File exists

@cevich (Member) commented Oct 18, 2021:

> Yep, that works well enough. 2s, 20s, 177s, always less than 5 minutes.

In my experience, a major milestone in fixing races is getting a fast reproducer, so this is excellent. What sort of environment is that being done under? Always crun and never runc?

Eyeballing the environments above, it looks like a lot of Ubuntu 21.10 + crun. A few mentions of F34, which I assume are also crun.

Has @giuseppe taken a look at this?

@edsantiago (Member Author) commented:

Oops! I forgot to mention: that's on my laptop (f34) using main as of yesterday. And yes, crun.

@cevich (Member) commented Oct 18, 2021:

Oh, that's interesting: plenty of CPU and memory available, then. Yeah, I think this is most definitely @giuseppe territory.

@giuseppe (Member) commented:

It's an issue in crun (or rather, I think, in the kernel), but in any case we need to account for it in crun. I am still validating my patch; I'll open a PR as soon as I've finished testing it.

giuseppe added a commit to giuseppe/crun that referenced this issue Oct 19, 2021
It seems the kernel can return EBUSY when a process was moved to a
sub-cgroup and the controllers are enabled in its parent cgroup.

On EBUSY, retry a few times until the controller can be enabled.

Reported: containers/podman#9752

Signed-off-by: Giuseppe Scrivano <[email protected]>
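
(For illustration, a minimal shell sketch of the retry idea described in that commit message, against a hypothetical parent cgroup "parent.slice"; the actual fix lives in crun's C code:)

# try to enable the pids controller for children of a parent cgroup;
# the kernel may transiently return EBUSY, so retry a few times
for i in 1 2 3 4 5; do
    echo "+pids" > /sys/fs/cgroup/parent.slice/cgroup.subtree_control && break
    sleep 0.1
done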
@giuseppe (Member) commented:

PR here: containers/crun#758

@edsantiago (Member Author) commented:

The "requested cgroup controller 'pids' is not available" message also appeared (not recently) in the cgroups=split test:

Podman run [It] podman run with cgroups=split

@vrothberg (Member) commented:

containers/crun#758 merged on Oct 19, and since CI is using a newer crun, I think we're good to close. Please reopen if I am mistaken.
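
(One way to confirm which OCI runtime and version a host is actually using, assuming a reasonably recent podman:)

$ podman info --format '{{.Host.OCIRuntime.Name}} {{.Host.OCIRuntime.Version}}'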

@emansom (Contributor) commented Jun 5, 2022:

People are still running into this on RHEL 8 boxes; see #1897579 on the Red Hat Bugzilla.

Running the containers rootless under a linger-enabled user, with the podman and podman-restart user units enabled. The latter always fails on reboot, since no cgroup controllers have been set up for the user yet.
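
(For reference, roughly how that setup is created, assuming the stock user units shipped with podman and a hypothetical user "someuser":)

# loginctl enable-linger someuser
$ systemctl --user enable podman.service podman-restart.service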

cgroups v2 is enabled on the host via systemd.unified_cgroup_hierarchy=1 on the kernel cmdline.
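
(One way to verify the unified hierarchy is active, and to see which controllers the user's slice actually received:)

$ stat -fc %T /sys/fs/cgroup    # should print "cgroup2fs"
$ cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/cgroup.controllers    # "pids" should be listed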

systemd is running with modified defaults in /etc/systemd/system.conf:

DefaultCPUAccounting=yes
DefaultIOAccounting=yes
DefaultMemoryAccounting=yes
DefaultTasksAccounting=yes

These are passed through to the user slices via /etc/systemd/system/user-.slice.d/override.conf:

[Slice]
CPUAccounting=yes
MemoryAccounting=yes
IOAccounting=yes
TasksAccounting=yes

Podman is configured to use crun by default in /etc/containers/containers.conf:

[containers]
runtime = "crun"

Running the default kernel (4.18.0-372.9.1.el8.x86_64) and the default systemd (systemd-239-58.el8.x86_64).

Doing a system-wide systemctl daemon-reload works as a workaround.

However, rootless containers are still broken after a host restart. Would it be possible for the podman user units to declare a systemd dependency on cgroup setup, to ensure the controllers exist before the containers start?
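
(If so, perhaps something like this purely hypothetical drop-in for the user unit, e.g. ~/.config/systemd/user/podman-restart.service.d/override.conf; the unit that actually guarantees controller delegation would still need to be identified:)

[Unit]
# order after the user session's basic setup
After=basic.target

[Service]
# retry instead of failing permanently when controllers are not yet delegated
Restart=on-failure
RestartSec=5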

@emansom (Contributor) commented Jun 6, 2022:

This seems related to systemd issues #9578 / #9512. Can that fix be backported to RHEL 8?

@mheon (Member) commented Jun 6, 2022:

Can you comment to that effect in the Bugzilla? We can swap it over to point at systemd, but having more context on what fix is necessary would be good.

github-actions bot added the locked - please file new issue/PR label on Sep 20, 2023
github-actions bot locked as resolved and limited conversation to collaborators on Sep 20, 2023