podman run --uidmap: fails on cgroups v1 #15025
...and enable the at-test-time confirmation, the one that double-checks that if CI requests runc we actually use runc.

This exposed a nasty surprise in our setup: there are steps to define $OCI_RUNTIME, but that's actually a total fakeout! OCI_RUNTIME is used only in e2e tests; it has no effect whatsoever on actual podman itself as invoked via command line, such as in system tests. Solution: use containers.conf.

Given how fragile all this runtime stuff is, I've also added new tests (e2e and system) that will check $CI_DESIRED_RUNTIME.

Image source: containers/automation_images#146

Since we haven't actually been testing with runc, we need to fix a few tests:
- handle an error-message change (make it work in both crun and runc)
- skip one system test, "survive service stop", that doesn't work with runc and I don't think we care.

...and skip a bunch, filing issues for each:
- containers#15013 pod create --share-parent
- containers#15014 timeout in dd
- containers#15015 checkpoint tests time out under $CONTAINER
- containers#15017 networking timeout with registry
- containers#15018 restore --pod gripes about missing --pod
- containers#15025 run --uidmap broken
- containers#15027 pod inspect cgrouppath broken
- ...and a bunch more ("podman pause") that probably don't even merit filing an issue.

Also, use /dev/urandom in one test (was: /dev/random) because the test is timing out and /dev/urandom does not block. (But the test is still timing out anyway, even with this change.)

Also, as part of the VM switch we are now using Go 1.18 (up from 1.17), and this broke the gitlab tests. Thanks to @Luap99 for a quick fix.

Also, slight tweak to containers#15021: include the timeout value, and reword the message so the command string is at the end.

Also, fixed a misspelling in a test name.

Fixes: containers#14833

Signed-off-by: Ed Santiago <[email protected]>
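The "use containers.conf" solution above refers to podman's main configuration file. The exact file that CI drops in place is not shown in this thread, but a minimal runtime override would look something like this sketch (path and contents are an assumption based on the documented `[engine]` table):

```toml
# Sketch only: /etc/containers/containers.conf (system-wide) or
# ~/.config/containers/containers.conf (per-user).
# Forces podman itself to use runc, which $OCI_RUNTIME alone does not do.
[engine]
runtime = "runc"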
I've manually launched that command in an Ubuntu 22.04 VM configured with cgroup v1, with:

and I don't see that failure:
Same results using Podman from the main branch. Could it be something in our images, or the runc version used? @cevich, could you point me to how the image was created?
There's a script called

What I normally do these days is: if I have a PR that fails, go to the Cirrus log page; there's a button at top right that says Re-Run, and it has an option to re-run with an in-browser ssh session. It doesn't last long, but it works well enough. You could submit a PR that comments out my

(Or, you can see if
Thanks! That was helpful. It's an issue in the runc version installed in the Ubuntu image. It's already fixed upstream with d370e3c04660201e72ba6968342ce964c31a2d7f
Thank you! I guess I know what I'm doing with my week.
Thanks @edsantiago and @giuseppe for tracking this down. Getting this fixed in our CI (finally) will be a significant improvement. There are plenty of people running podman in CGv1 environments. So having things like
Source: containers/automation_images#157
Reason: see if new Ubuntu images have fixed runc

Fixes: containers#15025
Signed-off-by: Ed Santiago <[email protected]>
Anyone know where we stand on this? Has the magic version been updated on Ubuntu? Has anyone built VMs recently?
This is an empty commit and empty PR. Purpose is to see if Ubuntu has picked up a fixed runc, which would allow us to close containers/podman#15025

Signed-off-by: Ed Santiago <[email protected]>
Most recent images I'm aware of are

So nothing there. The nuclear option here is to just ask @lsm5 to roll a custom one for us, as we've done for many years. He's been avoiding that (IIRC) because it's a PITA to maintain.
Never mind. No images built since July 27. Opened containers/automation_images#166 to see if Ubuntu is fixed (and to find out what new bugs appear)
Hi. Me again. I took a look at the Ubuntu image built in containers/automation_images#178. Nope, still the same broken
There should be another Ubuntu release coming up in October (
Not really. I care primarily about RHEL, secondarily about Fedora, and pretty much zero about Ubuntu. My experience with Ubuntu is that every time they fix one thing, they break two others, so every VM update is an ordeal. My preference, therefore, is to minimize VM updates.
OMG, Ubuntu is a nightmare. I'd prefer not to care about it either; unfortunately, it's quite popular 😞 In any case, given your priorities, I'd suggest not even bothering to check until
Would it make sense to test cgroup v1 and runc on another distro where we have more control (like Fedora or CentOS Stream)? We could restrict Ubuntu to the default configuration with crun and cgroup v2.
I would love to get away from Ubuntu; see #15337. And it's not just cgroups v1/v2, see #15360 (comment)
Would it make more sense to test on CentOS Stream 8 for v1 and runc, drop old versions of Ubuntu, and only test crun/v2?
ATM, we're only testing the LTS (which is also the current latest) Ubuntu. /cc @cevich
I guess I can push a new runc to Fedora and use debbuild to generate a debian package as well. I don't want to add it to my autobuilder job, but I can do runc updates on-demand. Would that work?
IMO we are not really testing Ubuntu if we use different packages. We'd end up doing the work without getting much advantage out of it.
And THAT is the crux of the matter right there. I also fear we'd end up in that same boat if we tried testing with CentOS; it's just "too old" for the bleeding-edge needs of upstream CI. Heck, we've even turned off F35 in CI, despite it still being fully supported, due to not wanting to deal with old crap.

So for the "how to test runc" issue, which seems central here: I don't see any technical reason we couldn't add a

I can reduce the number of tasks presented, but the trade-off is that more are bundled together, which has its own downsides for flakes and re-running.
"Too old" may be exactly what we want: our goal, remember, is to test RHEL — or, since we can't test RHEL, a proxy.
No, assuming we're talking about upstream CI, that puts us in exactly the same position. We wouldn't be testing an OS that a user would ever actually use; it becomes a franken-OS due to replacing vast numbers of (for example) bleeding-edge podman dependencies.

Big-picture, architecture-wise, remember that upstream CI exists to catch low-hanging-fruit problems as quickly as possible, before they make it downstream. That DOES NOT relieve downstream of the responsibility for additional integration testing, in THEIR USERS' environment. In other words, we should not try to mix two different testing contexts all within upstream. That road only leads to pain.

I would even go so far as to suggest that perhaps we shouldn't even be testing Fedora releases upstream, only Rawhide. We've not gone there yet, but I wouldn't consider it unreasonable.
A friendly reminder that this issue had no activity for 30 days.
Abandoned. Should anyone other than me care about this, please reopen.
To silence my find-obsolete-skips script:
- containers#11784 : issue closed wont-fix
- containers#15013 : issue closed, we no longer test with runc
- containers#15014 : bump timeout, see if that fixes things
- containers#15025 : issue closed, we no longer test with runc

...and one FIXME not associated with an issue, ubuntu-related, and we no longer test ubuntu.

Signed-off-by: Ed Santiago <[email protected]>
To silence my find-obsolete-skips script, remove the '#' from the following issues in skip messages:

containers#11784 containers#15013 containers#15025 containers#17433 containers#17436 containers#17456

Also update the messages to reflect the fact that the issues will never be fixed.

Also remove ubuntu skips: we no longer test ubuntu.

Also remove one buildah skip that is no longer applicable.

Fixes: containers#17520

Signed-off-by: Ed Santiago <[email protected]>
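For context, the kind of scan these commits describe might look roughly like the following. The real find-obsolete-skips script is not shown in this thread; this is a guessed sketch of the idea (flag skip messages that still carry a '#NNNN' issue reference, so removing the '#' makes the scan stop flagging them):

```shell
# Guessed sketch only: flag skip messages still referencing an open issue
# by '#NNNN'. The sample file below stands in for a real test directory.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
skip "containers#15025 run --uidmap broken"
skip "no issue reference here"
EOF
# Lines still carrying a '#<digits>' reference are candidates to re-check:
matches=$(grep -nE '#[0-9]+' "$tmpfile")
echo "$matches"
rm -f "$tmpfile"
```

Stripping the '#' from a skip message, as the commit above does, is exactly what would make a pattern like `#[0-9]+` stop matching it.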
Anything using --uidmap is failing on cgroups v1 with runc:
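The exact failing command is not shown above, but a reproduction would presumably look something like the sketch below. The image name and mapping range are illustrative assumptions, and podman is not actually invoked here (the command is only printed, so the sketch stays runnable anywhere):

```shell
# Illustrative only: per the report, any --uidmap use fails under
# cgroups v1 when runc is the OCI runtime; the same command reportedly
# works with crun. Image and 0:100000:65536 range are placeholders.
repro='podman --runtime runc run --rm --uidmap 0:100000:65536 alpine true'
echo "repro: $repro"
```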