podman run --uidmap: fails on cgroups v1 #15025

Closed
edsantiago opened this issue Jul 21, 2022 · 23 comments
Labels
jetsam ("...cargo that is cast overboard to lighten the load in time of distress"), locked - please file new issue/PR, stale-issue

Comments

@edsantiago
Member

Anything using --uidmap is failing on cgroups v1 with runc:

# podman run --uidmap 0:10001:10002 --rm --hostname BtoukoyxkBlQPjuQDYLbVeZzC quay.io/libpod/testimage:20220615 grep BtoukoyxkBlQPjuQDYLbVeZzC /etc/hosts
Error: runc: runc create failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount /proc/self/fd/11:/sys/fs/cgroup/systemd (via /proc/self/fd/12), flags: 0x20502f: operation not permitted: OCI permission denied
[ rc=126 (** EXPECTED 0 **) ]
edsantiago added a commit to edsantiago/libpod that referenced this issue Jul 22, 2022
...and enable the at-test-time confirmation, the one that
double-checks that if CI requests runc we actually use runc.
This exposed a nasty surprise in our setup: there are steps to
define $OCI_RUNTIME, but that's actually a total fakeout!
OCI_RUNTIME is used only in e2e tests; it has no effect
whatsoever on actual podman itself as invoked via the command
line, such as in system tests. Solution: use containers.conf

Given how fragile all this runtime stuff is, I've also added
new tests (e2e and system) that will check $CI_DESIRED_RUNTIME.

Image source: containers/automation_images#146

Since we haven't actually been testing with runc, we need
to fix a few tests:

  - handle an error-message change (make it work in both crun and runc)
  - skip one system test, "survive service stop", that doesn't
    work with runc and I don't think we care.

...and skip a bunch, filing issues for each:

  - containers#15013 pod create --share-parent
  - containers#15014 timeout in dd
  - containers#15015 checkpoint tests time out under $CONTAINER
  - containers#15017 networking timeout with registry
  - containers#15018 restore --pod gripes about missing --pod
  - containers#15025 run --uidmap broken
  - containers#15027 pod inspect cgrouppath broken
  - ...and a bunch more ("podman pause") that probably don't
    even merit filing an issue.

Also, use /dev/urandom in one test (was: /dev/random) because
the test is timing out and /dev/urandom does not block. (But
the test is still timing out anyway, even with this change)

Also, as part of the VM switch we are now using go 1.18 (up
from 1.17) and this broke the gitlab tests. Thanks to @Luap99
for a quick fix.

Also, slight tweak to containers#15021: include the timeout value, and
reword message so command string is at end.

Also, fixed a misspelling in a test name.

Fixes: containers#14833

Signed-off-by: Ed Santiago <[email protected]>
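For context, the containers.conf approach mentioned above would look roughly like this; a minimal sketch, assuming the system-wide /etc/containers/containers.conf is the file being edited, and using podman info as one way to confirm which runtime is selected:

# append an [engine] section selecting runc as the OCI runtime
# cat >> /etc/containers/containers.conf <<'EOF'
[engine]
runtime = "runc"
EOF
# podman info --format '{{.Host.OCIRuntime.Name}}'
runc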
@giuseppe
Member

I've run that command manually in an Ubuntu 22.04 VM configured for cgroup v1 with:

# sed -i -e 's/^GRUB_CMDLINE_LINUX=""/GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"/' /etc/default/grub
# update-grub
# reboot

and I don't see that failure:

# podman --runtime runc run --uidmap 0:10001:10002 --rm --hostname BtoukoyxkBlQPjuQDYLbVeZzC quay.io/libpod/testimage:20220615 grep BtoukoyxkBlQPjuQDYLbVeZzC /etc/hosts
10.88.0.14      BtoukoyxkBlQPjuQDYLbVeZzC nostalgic_euclid

# podman --runtime /root/runc-upstream/runc run --uidmap 0:10001:10002 --rm --hostname BtoukoyxkBlQPjuQDYLbVeZzC quay.io/libpod/testimage:20220615 grep BtoukoyxkBlQPjuQDYLbVeZzC /etc/hosts
10.88.0.15      BtoukoyxkBlQPjuQDYLbVeZzC busy_thompson

# podman --cgroup-manager cgroupfs --runtime /root/runc/runc run --uidmap 0:10001:10002 --rm --hostname BtoukoyxkBlQPjuQDYLbVeZzC quay.io/libpod/testimage:20220615 grep BtoukoyxkBlQPjuQDYLbVeZzC /etc/hosts
10.88.0.16      BtoukoyxkBlQPjuQDYLbVeZzC keen_booth

# podman --cgroup-manager cgroupfs --runtime runc run --uidmap 0:10001:10002 --rm --hostname BtoukoyxkBlQPjuQDYLbVeZzC quay.io/libpod/testimage:20220615 grep BtoukoyxkBlQPjuQDYLbVeZzC /etc/hosts
10.88.0.17      BtoukoyxkBlQPjuQDYLbVeZzC objective_fermat

same results using Podman from the main branch.

Could be something in our images or the runc version used?

@cevich could you point me to how the image was created?
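As a quick sanity check (not part of the original comment), one way to confirm a VM really came back up on cgroup v1 after the grub change above is to look at the filesystem type mounted at /sys/fs/cgroup; it is tmpfs on a v1 hierarchy and cgroup2fs on v2:

# stat -fc %T /sys/fs/cgroup
tmpfs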

@edsantiago
Member Author

There's a script called hack/get_ci_vm that (if it works) can be used to get a CI VM that you can ssh into.

What I normally do these days, if I have a PR that fails, is go to the Cirrus log page; there's a button at top right that says Re-Run, with an option to re-run with an in-browser ssh session. It doesn't last long, but it works well enough. You could submit a PR that comments out my skips, then when it fails, do that magic re-run and poke around in the system.

(Or, you can see if hack/get_ci_vm will work for you.)
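If hack/get_ci_vm does work for you, the invocation is roughly of the form below; this is a sketch that assumes the script takes the name of a Cirrus CI task and drops you into an ssh session on a matching VM (the task name is just a placeholder):

$ hack/get_ci_vm <cirrus-task-name>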

@giuseppe
Member

Thanks! That was helpful.

It is an issue in the runc version installed in the Ubuntu image.

It is already fixed upstream by commit d370e3c04660201e72ba6968342ce964c31a2d7f.
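For anyone who wants to reproduce that on a VM, a rough sketch of building and pointing podman at an upstream runc containing that commit; it assumes git, make, a Go toolchain, and runc's build dependencies (e.g. libseccomp headers) are available, and the checkout path simply mirrors the one used in the commands above:

# git clone https://github.com/opencontainers/runc /root/runc-upstream
# cd /root/runc-upstream && make
# podman --runtime /root/runc-upstream/runc run --uidmap 0:10001:10002 --rm quay.io/libpod/testimage:20220615 true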

@edsantiago
Member Author

Thank you! I guess I know what I'm doing with my week.

@cevich
Member

cevich commented Jul 25, 2022

Thanks @edsantiago and @giuseppe for tracking this down. Getting this fixed in our CI (finally) will be a significant improvement. There are plenty of people running podman in CGv1 environments, so having things like --uidmap not break for them is important. Knowing about these problems early in developers' PRs goes a long way toward increasing confidence in future released packages.

edsantiago added a commit to edsantiago/libpod that referenced this issue Jul 25, 2022
Source: containers/automation_images#157

Reason: see if new Ubuntu images have fixed runc

Fixes: containers#15025

Signed-off-by: Ed Santiago <[email protected]>
@edsantiago
Member Author

Anyone know where we stand on this? Has the magic version been updated on Ubuntu? Has anyone built VMs recently?

edsantiago added a commit to edsantiago/container_automation_images that referenced this issue Aug 10, 2022
This is an empty commit and empty PR. Purpose is to see if Ubuntu
has picked up a fixed runc, which would allow us to close

   containers/podman#15025

Signed-off-by: Ed Santiago <[email protected]>
@cevich
Member

cevich commented Aug 10, 2022

Most recent images I'm aware of are c5473162832904192 (two days old IIRC). Maybe there are clues in quay.io/libpod/ubuntu_podman:c5473162832904192 (warning: 600+MB image):

||/ Name           Version        Architecture Description
+++-==============-==============-============-=================================
ii  runc           1.1.0-0ubuntu1 amd64        Open Container Project - runtime

So nothing there. The nuclear-option here is to just ask @lsm5 to roll a custom one for us as we've done for many years. He's been avoiding that (IIRC) b/c it's a PITA to maintain.
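One way to reproduce that check yourself; this is an assumption about how the listing above was obtained, and any equivalent dpkg invocation inside the image would do:

# podman run --rm quay.io/libpod/ubuntu_podman:c5473162832904192 dpkg-query -W -f '${Package} ${Version}\n' runc
runc 1.1.0-0ubuntu1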

@edsantiago
Member Author

Never mind; no images have been built since July 27. Opened containers/automation_images#166 to see if Ubuntu is fixed (and to find out what new bugs appear).

@edsantiago
Member Author

Hi. Me again. I took a look at the Ubuntu image built in containers/automation_images#178. Nope, still the same broken runc. Carry on.

@cevich
Member

cevich commented Aug 30, 2022

Nope, still the same broken runc. Carry on.

There should be another Ubuntu release coming up in October (22.10). Worth checking if there's a "beta" available and if it has the "right" runc version?

@edsantiago
Member Author

Not really. I care primarily about RHEL, secondarily about Fedora, and pretty much zero about Ubuntu. My experience with Ubuntu is that every time they fix one thing, they break two others, so every VM update is an ordeal. My preference, therefore, is to minimize VM updates.

@cevich
Member

cevich commented Aug 30, 2022

OMG, Ubuntu is a nightmare. I'd prefer to not care about it either; unfortunately, it's quite popular 😞 In any case, given your priorities, I'd suggest not even bothering to check until 22.10 comes out in our CI. I highly doubt we'll see runc change at all/ever in 22.04.

@giuseppe
Member

Would it make sense to test cgroup v1 and runc on another distro where we have more control (like Fedora or CentOS Stream)? We could restrict Ubuntu to the default configuration with crun and cgroup v2.

@edsantiago
Member Author

I would love to get away from Ubuntu; see #15337. And it's not just cgroups v1/v2, see #15360 (comment)

@rhatdan
Member

rhatdan commented Aug 31, 2022

Would it make more sense to test on CentOS Stream 8 for v1 and runc, drop old versions of Ubuntu, and only test crun/v2?

@lsm5
Member

lsm5 commented Aug 31, 2022

ATM, we're only testing the LTS (which is also the current latest) Ubuntu. /cc @cevich

@lsm5
Member

lsm5 commented Aug 31, 2022

I guess I can push a new runc to Fedora and use debbuild to generate a Debian package as well. I don't want to add it to my autobuilder job, but I can do runc updates on demand. Would that work?

@giuseppe
Member

IMO we are not really testing Ubuntu if we use different packages. We end up doing the work without getting much advantage out of it.

@cevich
Member

cevich commented Aug 31, 2022

IMO we are not really testing Ubuntu if we use different packages.

And THAT is the crux of the matter right there.

I also fear we'd end up in that same boat if we tried testing with CentOS; it's just "too old" for the bleeding-edge needs of upstream CI. Heck, we've even turned off F35 in CI, despite it still being fully supported, because we don't want to deal with old crap.

So for the "how to test runc" issue which seems central here: I don't see any technical reason we couldn't add a runc item to some parts of our test matrix, it's easy enough to switch it at runtime (the package is typically already installed). Logistically it's a bit of a hassle on humans, simply given the number of tasks we're running in CI.

I can reduce the number of tasks presented, but the trade-off is that more are bundled together, which has its own downsides for flakes and re-running.

@edsantiago
Member Author

"Too old" may be exactly what we want: our goal, remember, is to test RHEL — or, since we can't test RHEL, a proxy.

@cevich
Member

cevich commented Sep 1, 2022

No, assuming we're talking about upstream CI, that puts us in exactly the same position. We wouldn't be testing an OS that a user would ever actually use; it becomes a franken' OS due to replacing vast numbers of (for example) bleeding-edge podman dependencies.

Big-picture architecture-wise, remember that upstream CI is there to catch low-hanging-fruit problems as quickly as possible, before they make it downstream. That DOES NOT relieve downstream of the responsibility for additional integration testing in THEIR USERS' environment. In other words, we should not try to mix two different testing contexts within upstream. That road only leads to pain.

I would even go so far as to suggest that perhaps we shouldn't be testing Fedora releases upstream at all, only Rawhide. We've not gone there yet, but I wouldn't consider it unreasonable.

@github-actions

github-actions bot commented Oct 2, 2022

A friendly reminder that this issue had no activity for 30 days.

edsantiago added the jetsam ("...cargo that is cast overboard to lighten the load in time of distress") label Jan 9, 2023
@edsantiago
Member Author

Abandoned. Should anyone other than me care about this, please reopen.

edsantiago added a commit to edsantiago/libpod that referenced this issue Jan 10, 2023
To silence my find-obsolete-skips script:
 - containers#11784 : issue closed wont-fix
 - containers#15013 : issue closed, we no longer test with runc
 - containers#15014 : bump timeout, see if that fixes things
 - containers#15025 : issue closed, we no longer test with runc

...and one FIXME not associated with an issue, ubuntu-related,
and we no longer test ubuntu.

Signed-off-by: Ed Santiago <[email protected]>
edsantiago added a commit to edsantiago/libpod that referenced this issue Jul 13, 2023
To silence my find-obsolete-skips script, remove the '#'
from the following issues in skip messages:

  containers#11784 containers#15013 containers#15025 containers#17433 containers#17436 containers#17456

Also update the messages to reflect the fact that the issues
will never be fixed.

Also remove ubuntu skips: we no longer test ubuntu.

Also remove one buildah skip that is no longer applicable:

Fixes: containers#17520

Signed-off-by: Ed Santiago <[email protected]>
github-actions bot added the locked - please file new issue/PR label Sep 4, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 4, 2023