Force CgroupsV1 on Ubuntu #146
Conversation
Exactly the same problem as last week but with a different version. That one griped about 3.3, this one is griping about 3.4. I will re-run in about 4 hours, that should fix it.
Cirrus CI build successful. Image ID
criu failure is actually b0rkage in glibc. Being tracked here for now: checkpoint-restore/criu#1935
LGTM. Thanks @edsantiago for taking this on. Note: The AWS images don't show up yet in the 'new image ID' comment posted by the github-actions bot. You have to manually go into the cirrus task (for each AWS cache-image) and pull out the "AMI-" ID. I've got a jira card to fix this in the pipeline.
Example, for
/hold @cevich thanks but this cannot merge: criu is totally broken. I don't know when it'll be fixed.
Force-pushed f869817 to 5239508
Cirrus CI build successful. Image ID
Force-pushed 5239508 to 34e1a6a
Cirrus CI build successful. Image ID
Force-pushed 34e1a6a to 54167a6
Cirrus CI build successful. Image ID
Force-pushed a68c90d to acf8000
LGTM, just one small question/change request.
Reminder: Rebase this to pick up the golang 1.18 change. Also I'm going to add labels to block the
Built image ID: For podman-machine, the x86_64 AMI ID is:
Well, the resulting images are manifesting all sorts of crises, but all of them seem to be our fault, not the images'. @cevich this is ready for review at your convenience. I've confirmed that the resulting images use runc on Ubuntu, and that criu works.
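For anyone re-checking those two claims on a built image, a smoke test along these lines would do it; the image and container name below (docker.io/library/alpine, cr-test) are placeholders, not necessarily what CI runs:

```bash
# Which OCI runtime did podman actually resolve?  Expect "runc" on the Ubuntu VMs.
podman info --format '{{.Host.OCIRuntime.Name}}'

# criu smoke test (checkpoint/restore needs root): freeze and thaw a trivial container.
sudo podman run -d --name cr-test docker.io/library/alpine sleep 600
sudo podman container checkpoint cr-test
sudo podman container restore cr-test
sudo podman rm -f cr-test
```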
LGTM
This is fairly typical, and can be the seeds of months-long podman PRs 😞 As Lokesh found, they can be quite overwhelming. I recommend focusing on one problem at a time and leaning on the team extensively for help. I'll take a look as well...
# >>> PLEASE REMOVE THIS ONCE CRIU GETS FIXED IN REGULAR UBUNTU!
# >>> (No, I -- Ed -- have no idea how to even check that, sorry).
# Context: https://github.com/containers/podman/pull/14972
# Context: https://github.com/checkpoint-restore/criu/issues/1935
This is fine, thanks for the comment and links.
Note: Fedora-35 is disabled due to missing golang 1.18

Ref: containers/automation_images#140 and containers/automation_images#149 and containers/automation_images#146

Signed-off-by: Chris Evich <[email protected]>
...and enable the at-test-time confirmation, the one that double-checks that if CI requests runc we actually use runc.

This exposed a nasty surprise in our setup: there are steps to define $OCI_RUNTIME, but that's actually a total fakeout! OCI_RUNTIME is used only in e2e tests, it has no effect whatsoever on actual podman itself as invoked via command line such as in system tests. Solution: use containers.conf

Given how fragile all this runtime stuff is, I've also added new tests (e2e and system) that will check $CI_DESIRED_RUNTIME.

Image source: containers/automation_images#146

Since we haven't actually been testing with runc, we need to fix a few tests:

- handle an error-message change (make it work in both crun and runc)
- skip one system test, "survive service stop", that doesn't work with runc and I don't think we care.

...and skip a bunch, filing issues for each:

- containers#15013 pod create --share-parent
- containers#15014 timeout in dd
- containers#15015 checkpoint tests time out under $CONTAINER
- containers#15017 networking timeout with registry
- containers#15018 restore --pod gripes about missing --pod
- containers#15025 run --uidmap broken
- containers#15027 pod inspect cgrouppath broken
- ...and a bunch more ("podman pause") that probably don't even merit filing an issue.

Also, use /dev/urandom in one test (was: /dev/random) because the test is timing out and /dev/urandom does not block. (But the test is still timing out anyway, even with this change)

Also, as part of the VM switch we are now using go 1.18 (up from 1.17) and this broke the gitlab tests. Thanks to @Luap99 for a quick fix.

Also, slight tweak to containers#15021: include the timeout value, and reword message so command string is at end.

Also, fixed a misspelling in a test name.

Fixes: containers#14833

Signed-off-by: Ed Santiago <[email protected]>
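For context on the "Solution: use containers.conf" line above, pinning the runtime system-wide generally looks like the sketch below; the exact path (a drop-in under /etc/containers/containers.conf.d/ is also common) is an assumption, not the literal change made in that podman PR:

```bash
# Minimal sketch: pin podman's OCI runtime to runc via containers.conf.
sudo mkdir -p /etc/containers
sudo tee -a /etc/containers/containers.conf >/dev/null <<'EOF'
[engine]
runtime = "runc"
EOF

# Confirm the override is what podman actually resolves.
podman info --format '{{.Host.OCIRuntime.Name}}'
```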
PR #115 removed a force-cgroups-v2 setup for Ubuntu, possibly
assuming that Ubuntu uses cgroups v1 by default? That doesn't
seem to be the case: the Ubuntu I've looked at (via Cirrus
rerun-with-terminal) seems to default to v2. End result is
that we've been running CI for months without testing runc.
This PR forces cgroups v1 on Ubuntu, via grub boot args.
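For illustration (not necessarily the exact boot-arg wiring in this PR's scripts), checking which hierarchy a VM booted with and switching Ubuntu to cgroups v1 typically looks like this:

```bash
# Check the current mode:
#   "cgroup2fs" => cgroups v2 (unified hierarchy), "tmpfs" => cgroups v1.
stat -fc %T /sys/fs/cgroup

# Force cgroups v1 on the next boot by telling systemd not to mount the
# unified hierarchy, then regenerate the grub config and reboot.
sudo sed -i 's/^GRUB_CMDLINE_LINUX="/&systemd.unified_cgroup_hierarchy=0 /' /etc/default/grub
sudo update-grub
sudo reboot
```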
As of 2022-07-20 the version of criu in Ubuntu is broken,
which requires us to install from something called OBS.
There was some OBS-installing code present, but it didn't
lend itself to reuse, so I refactored it and added a
temporary use-criu-from-obs line with a timestamped FIXME.
Signed-off-by: Ed Santiago <[email protected]>
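For reference, installing criu from OBS (the openSUSE Build Service) on Ubuntu usually follows the pattern below; the repository URL, distro string, and keyring path are assumptions rather than what the refactored use-criu-from-obs code actually does:

```bash
# Sketch: add the OBS devel:tools:criu repository and install criu from it.
REPO="https://download.opensuse.org/repositories/devel:/tools:/criu/xUbuntu_22.04"

# Fetch and install the repo signing key, then register the apt source.
curl -fsSL "$REPO/Release.key" \
    | sudo gpg --dearmor -o /usr/share/keyrings/criu-obs.gpg
echo "deb [signed-by=/usr/share/keyrings/criu-obs.gpg] $REPO/ /" \
    | sudo tee /etc/apt/sources.list.d/criu-obs.list

sudo apt-get update
sudo apt-get install -y criu
```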