Force CgroupsV1 on Ubuntu #146

Merged: 1 commit merged into containers:main on Jul 21, 2022
Conversation

@edsantiago (Member) commented Jul 6, 2022

PR #115 removed a force-cgroups-v2 setup for Ubuntu, possibly
assuming that Ubuntu uses cgroups v1 by default? That doesn't
seem to be the case: the Ubuntu images I've looked at (via Cirrus
rerun-with-terminal) default to v2. The end result is that we've
been running CI for months without testing runc. This PR forces
cgroups v1 on Ubuntu via GRUB boot args.
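
For reference, forcing cgroups v1 on a systemd-based distro comes down to a single kernel argument. A minimal sketch of the GRUB approach (standard systemd/GRUB mechanics; the PR's actual diff may differ):

    # Tell systemd not to mount the unified (v2) hierarchy, then
    # regenerate grub.cfg; takes effect on the next boot.
    sed -i 's/^GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0 /' \
        /etc/default/grub
    update-grub

    # After reboot, verify which hierarchy is mounted:
    stat -fc %T /sys/fs/cgroup   # "tmpfs" => v1, "cgroup2fs" => v2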

As of 2022-07-20 the version of criu in Ubuntu is broken, which
requires us to install from OBS (the openSUSE Build Service).
There was some OBS-installing code present, but it didn't
lend itself to reuse, so I refactored it and added a
temporary use-criu-from-obs line with a timestamped FIXME.
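
For context, installing criu from OBS on Ubuntu typically looks like the sketch below. The repository URL follows criu's usual OBS layout and is an assumption here; the refactored code in this PR may differ:

    # Add the criu OBS repo and install criu from it instead of the
    # broken Ubuntu archive package. Adjust the distro suffix
    # (xUbuntu_22.04) to match the image being built.
    REPO="https://download.opensuse.org/repositories/devel:/tools:/criu/xUbuntu_22.04"
    curl -fsSL "$REPO/Release.key" | gpg --dearmor > /usr/share/keyrings/criu-obs.gpg
    echo "deb [signed-by=/usr/share/keyrings/criu-obs.gpg] $REPO/ /" \
        > /etc/apt/sources.list.d/criu-obs.list
    apt-get update && apt-get install -y criu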

Signed-off-by: Ed Santiago <[email protected]>

@edsantiago (Member, Author)

Exactly the same problem as last week, but with a different version: that one griped about 3.3, this one gripes about 3.4. I will re-run in about 4 hours; that should fix it.

    ubuntu: The following packages have unmet dependencies:
    ubuntu:  libsystemd-dev : Depends: libsystemd0 (= 249.11-0ubuntu3) but 249.11-0ubuntu3.4 is to be installed
    ubuntu:  libudev-dev : Depends: libudev1 (= 249.11-0ubuntu3) but 249.11-0ubuntu3.4 is to be installed
    ubuntu: E: Unable to correct problems, you have held broken packages.
    ubuntu:     exit(100)
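
This is presumably the classic mirror-skew failure: the archive has published libsystemd0 249.11-0ubuntu3.4, but the matching -dev packages haven't propagated yet, so the strict '=' dependency can't be satisfied until the mirrors converge. A generic way to see the skew (illustrative apt commands, not from this PR):

    # Candidate versions for the runtime vs. -dev packages; a mismatch
    # here means the mirror is mid-sync and a later retry will pass.
    apt-cache policy libsystemd0 libsystemd-dev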

@github-actions (bot) commented Jul 7, 2022

Cirrus CI build successful. Image ID c5075989926510592 ready for use.

@edsantiago (Member, Author)

Looks like criu is broken in f35 and Ubuntu.

@edsantiago (Member, Author)

criu failure is actually b0rkage in glibc. Being tracked here for now: checkpoint-restore/criu#1935

@cevich (Member) left a comment

LGTM. Thanks @edsantiago for taking this on. Note: the AWS images don't show up yet in the 'new image ID' comment posted by the github-actions bot; you have to go into the Cirrus task for each AWS cache-image manually and pull out the "ami-" ID. I've got a Jira card to fix this in the pipeline.

@cevich (Member) commented Jul 11, 2022

For example, for the fedora-aws cache image, you can look at the manifest.json artifact to see the AMI ID.
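
If the build uses Packer's manifest post-processor (an assumption; this repo's tooling may differ), the AMI ID can be extracted mechanically:

    # Packer records artifact_id as "<region>:ami-..."; strip the
    # region prefix to get the bare AMI ID.
    jq -r '.builds[-1].artifact_id' manifest.json | cut -d: -f2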

@edsantiago (Member, Author)

/hold

@cevich thanks, but this cannot merge: criu is totally broken, and I don't know when it'll be fixed.

@github-actions (bot)

Cirrus CI build successful. Image ID c5005250640740352 ready for use.

@github-actions (bot)

Cirrus CI build successful. Image ID c5316115306905600 ready for use.

@github-actions (bot)

Cirrus CI build successful. Image ID c4996377506742272 ready for use.

@edsantiago force-pushed the ubuntu_cgroups_v1 branch 6 times, most recently from a68c90d to acf8000, on July 20, 2022
@cevich (Member) left a comment

LGTM, just one small question/change request.

Review thread on cache_images/ubuntu_packaging.sh (resolved)
@cevich (Member) commented Jul 20, 2022

Reminder: rebase this to pick up the golang 1.18 change. Also, I'm going to add labels to block the prior-fedora builds, since the team decided to suspend testing there for now (due to golang 1.18 unavailability).

@cevich added the labels no_prior-fedora ("Don't build any prior-fedora images") and no_prior-fedora_podman ("Don't build the prior-fedora_podman image") on Jul 20, 2022
@cevich (Member) commented Jul 20, 2022

Built image ID: c6706201604915200 (bot is broken ATM)

For podman-machine, the x86_64 AMI ID is: ami-0829a020372a04284

@edsantiago (Member, Author)

Well, the resulting images are manifesting all sorts of crises, but all of them seem to be our fault, not the images'.

@cevich this is ready for review at your convenience. I've confirmed that the resulting images use runc on Ubuntu, and that criu works.
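
For illustration, one quick way to confirm the resolved runtime on a built VM, using a standard podman query (not necessarily the exact check used here):

    # Should print "runc" on the Ubuntu images built by this PR:
    podman info --format '{{.Host.OCIRuntime.Name}}'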

@rhatdan (Member) commented Jul 21, 2022

LGTM

@cevich (Member) commented Jul 21, 2022

> all of them seem to be our fault, not the images.

This is fairly typical, and can be the seed of months-long podman PRs 😞 As Lokesh found, they can be quite overwhelming. I recommend focusing on one problem at a time and leaning on the team extensively for help. I'll take a look as well...

Snippet under review (cache_images/ubuntu_packaging.sh):

    # >>> PLEASE REMOVE THIS ONCE CRIU GETS FIXED IN REGULAR UBUNTU!
    # >>> (No, I -- Ed -- have no idea how to even check that, sorry).
    # Context: https://github.com/containers/podman/pull/14972
    # Context: https://github.com/checkpoint-restore/criu/issues/1935
@cevich (Member)
This is fine, thanks for the comment and links.

@cevich merged commit 4f34a04 into containers:main on Jul 21, 2022
@cevich added a commit to cevich/buildah that referenced this pull request on Jul 21, 2022
@edsantiago deleted the ubuntu_cgroups_v1 branch on July 21, 2022 at 19:41
@edsantiago added a commit to edsantiago/libpod that referenced this pull request on Jul 22, 2022:
...and enable the at-test-time confirmation, the one that
double-checks that if CI requests runc we actually use runc.
This exposed a nasty surprise in our setup: there are steps to
define $OCI_RUNTIME, but that's actually a total fakeout!
OCI_RUNTIME is used only in e2e tests; it has no effect
whatsoever on actual podman itself as invoked via the command
line, such as in system tests. Solution: use containers.conf.
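
For reference, pinning the runtime through containers.conf looks roughly like this (a minimal sketch; the actual commit may write the setting differently):

    # containers.conf is read by the podman binary itself, so unlike
    # $OCI_RUNTIME it also governs plain command-line invocations
    # (i.e., the system tests). Append an [engine] runtime setting:
    printf '[engine]\nruntime = "runc"\n' >> /etc/containers/containers.conf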

Given how fragile all this runtime stuff is, I've also added
new tests (e2e and system) that will check $CI_DESIRED_RUNTIME.

Image source: containers/automation_images#146

Since we haven't actually been testing with runc, we need
to fix a few tests:

  - handle an error-message change (make it work in both crun and runc)
  - skip one system test, "survive service stop", that doesn't
    work with runc and that I don't think we care about.

...and skip a bunch, filing issues for each:

  - containers#15013 pod create --share-parent
  - containers#15014 timeout in dd
  - containers#15015 checkpoint tests time out under $CONTAINER
  - containers#15017 networking timeout with registry
  - containers#15018 restore --pod gripes about missing --pod
  - containers#15025 run --uidmap broken
  - containers#15027 pod inspect cgrouppath broken
  - ...and a bunch more ("podman pause") that probably don't
    even merit filing an issue.

Also, use /dev/urandom in one test (was: /dev/random) because
the test is timing out and /dev/urandom does not block. (But
the test is still timing out anyway, even with this change.)

Also, as part of the VM switch we are now using go 1.18 (up
from 1.17), and this broke the gitlab tests. Thanks to @Luap99
for a quick fix.

Also, slight tweak to containers#15021: include the timeout value, and
reword message so command string is at end.

Also, fixed a misspelling in a test name.

Fixes: containers#14833

Signed-off-by: Ed Santiago <[email protected]>