Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cirrus: Update VM Images; Both Fedora and Ubuntu "prior" flavors run with CGroupsV1 & runc #8312

Merged
merged 7 commits into from
Dec 16, 2020

Conversation

cevich
Copy link
Member

@cevich cevich commented Nov 12, 2020

TODO: Open issues for disabled tests

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 12, 2020
@cevich cevich force-pushed the new_ubuntu_images branch 3 times, most recently from 7f981c6 to 675aec0 Compare November 16, 2020 20:05
@cevich
Copy link
Member Author

cevich commented Nov 17, 2020

@rhatdan
Copy link
Member

rhatdan commented Nov 17, 2020

@cevich Can we switch to using crun?

@cevich
Copy link
Member Author

cevich commented Nov 17, 2020

@rhatdan Yes, I believe we can...or at least make an attempt. I was actually looking into this last night, it seems that systemd in Ubuntu enables cgroupsv2 the same way as Fedora (just not by default).

@mheon @baude @giuseppe Q: Assuming it can be done, are all'y'all okay with the only testing of runc, occurring on "prior" Ubuntu (20.04 LTS) VMs? This implies we would drop testing with runc in the future, when the 20.04 VMs go out of style.

@rhatdan
Copy link
Member

rhatdan commented Nov 17, 2020

Well maybe at some point someone will update ubuntu version of runc. We could also switch one of the fedora tests to use runc.

@cevich
Copy link
Member Author

cevich commented Nov 17, 2020

We could also switch one of the fedora tests to use runc.

This is true...and based on past experience, much easier to obtain necessary package updates.

@cevich cevich force-pushed the new_ubuntu_images branch 4 times, most recently from 66ac642 to bc5b220 Compare December 8, 2020 15:27
@cevich cevich force-pushed the new_ubuntu_images branch 6 times, most recently from 5730d1e to 10765a3 Compare December 10, 2020 15:48
@cevich cevich changed the title WIP: Cirrus: Add support for Ubuntu 20.x Cirrus: Add support for Ubuntu 20.x Dec 10, 2020
@cevich cevich force-pushed the new_ubuntu_images branch 2 times, most recently from a8159a0 to 0ff630d Compare December 10, 2020 20:19
@cevich
Copy link
Member Author

cevich commented Dec 10, 2020

Oof! This is the best I can do, I really need help sorting through why/fixing the 11x F32 (CGv1) rootless tests are failing.

@cevich cevich force-pushed the new_ubuntu_images branch 3 times, most recently from d69b9b5 to db8f43a Compare December 11, 2020 18:57
@cevich cevich marked this pull request as ready for review December 11, 2020 19:02
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 11, 2020
@edsantiago
Copy link
Member

Thank you! #8726 is halfway through CI, maybe an hour left. I'm monitoring it closely to restart flakes quickly; will ping you once it's merged.

@cevich
Copy link
Member Author

cevich commented Dec 15, 2020

maybe an hour left.

Gee wiz, I've been saying the exact same thing on this PR...for about a month 😄

If only it were that simple...keep in mind THIS PR will then need to be rebased, and rerun all the tests again...with fingers crossed nothing got missed. I dunno if the team wants to wait that long or not.

@edsantiago
Copy link
Member

#8726 is merged!

Previously automation always dropped the minor version number for
distributions.  This was intended for presentation and conditional
simplicity. Bash does not support non-integer comparison natively.

With the release of version 20.10, supporting testing with it and
the LTS release (20.04) requires scripts to consider minor version
numbers for Ubuntu VMs.  This is necessary because many times in
the past, some behaviors needed to be conditional on the release
version number.

With this commit, the images and embedded scripts/tooling uses an
altered format of `$UBUNTU_NAME', `$PRIOR_UBUNTU_NAME`, and (crucially)
`$OS_RELEASE_VER` and `$OS_REL_VER`.  Any `.` characters appearing
in the official version (from `/etc/os-release`) are dropped, and
the result is concatenated.

For example the current Ubuntu LTS version is `20.04`.  Prior to
this commit, `$OS_RELEASE_VER` would have been `20`.  With this
change, `$OS_RELEASE_VER` will now show `2004`.  Similarly `20.10`
is shown as `2010`.

Signed-off-by: Chris Evich <[email protected]>
These tests fail with `Error: opening file `io.bfq.weight` for writing:
Permission denied: OCI permission denied`.  Upon examination of the
VMs, it was found the kernel and OS lacks support for the `BFQ`
scheduler (which supplies the `weight` option).  The only available
schedulers are `none` and `mq-deadline`.

Note: Recently updated F32 (prior-fedora) and Ubuntu 20.04
(prior-ubuntu) VMs always use CGroupsV1 with runc.  F33 and
Ubuntu 20.10 were updated to always use CGroupsV2 with crun.

Signed-off-by: Chris Evich <[email protected]>
Nearly/all of the 'podman stats' tests fail on Fedora when
executing testing inside a container, and CGroupsV1 is used on the
host.  The typical failure message is of the form `Error: unable to
load cgroup at /machine.slice/.../: cgroup deleted`.

Note: Recently updated F32 (prior-fedora) and Ubuntu 20.04
(prior-ubuntu) VMs always use CGroupsV1 with runc.  F33 and
Ubuntu 20.10 were updated to always use CGroupsV2 with crun.

Signed-off-by: Chris Evich <[email protected]>
This should be addressed by PR
containers#8685

Note: Recently updated F32 (prior-fedora) and Ubuntu 20.04
(prior-ubuntu) VMs always use CGroupsV1 with runc.  F33 and
Ubuntu 20.10 were updated to always use CGroupsV2 with crun.

Signed-off-by: Chris Evich <[email protected]>
When running as rootless, on a CgroupV1 host these tests all report:
`Error: pod stats is not supported in rootless mode without cgroups v2`

Note: Recently updated F32 (prior-fedora) and Ubuntu 20.04
(prior-ubuntu) VMs always use CGroupsV1 with runc.  F33 and
Ubuntu 20.10 were updated to always use CGroupsV2 with crun.

Signed-off-by: Chris Evich <[email protected]>
These tests simply will not work under these conditions.

Note: Recently updated F32 (prior-fedora) and Ubuntu 20.04
(prior-ubuntu) VMs always use CGroupsV1 with runc.  F33 and
Ubuntu 20.10 were updated to always use CGroupsV2 with crun.

Signed-off-by: Chris Evich <[email protected]>
@edsantiago
Copy link
Member

Tricky. I'm on it.

@edsantiago
Copy link
Member

Fix (tested on f32 cgroupsv1 VM; also on my f32 cgroupsv2 laptop):

diff --git a/test/system/600-completion.bats b/test/system/600-completion.bats
index 1e43cdc41..d1244c7d7 100644
--- a/test/system/600-completion.bats
+++ b/test/system/600-completion.bats
@@ -8,6 +8,17 @@

 load helpers

+# Returns true if we are able to podman-pause
+function _can_pause() {
+    # Even though we're just trying completion, not an actual unpause,
+    # podman barfs with:
+    #    Error: unpause is not supported for cgroupv1 rootless containers
+    if is_rootless && is_cgroupsv1; then
+        return 1
+    fi
+    return 0
+}
+
 function check_shell_completion() {
     local count=0

@@ -70,8 +81,13 @@ function check_shell_completion() {
                     ;;

                 *CONTAINER*)
+                    # podman unpause fails early on rootless cgroupsv1
+                    if [[ $cmd = "unpause" ]] && ! _can_pause; then
+                        continue 2
+                    fi
+
                     run_completion "$@" $cmd "${extra_args[@]}" ""
-                    is "$output" ".*-$random_container_name${nl}" "Found expected container in suggestions"
+                    is "$output" ".*-$random_container_name${nl}" "Found expected container in suggestions for '$cmd'"

                     match=true
                     # resume
@@ -212,7 +228,9 @@ function _check_completion_end() {
     run_podman create --name created-$random_container_name $IMAGE
     run_podman run --name running-$random_container_name -d $IMAGE top
     run_podman run --name pause-$random_container_name -d $IMAGE top
-    run_podman pause pause-$random_container_name
+    if _can_pause; then
+        run_podman pause pause-$random_container_name
+    fi
     run_podman run --name exited-$random_container_name -d $IMAGE echo exited

     # create pods for each state

downloadable diff

@Luap99 PTAL

Thanks Ed Santiago <[email protected]> for the fix.

Signed-off-by: Chris Evich <[email protected]>
@cevich
Copy link
Member Author

cevich commented Dec 16, 2020

@edsantiago tip, if you're using the WebUI: you can drag-n-drop files directly into the comment box. It will upload and automatically splat in the necessary markdown.

@edsantiago
Copy link
Member

I'm a crusty old UNIX user from the 1980s. I've heard of drag & drop, but it requires newfangled window managers or something like that which I've never bothered upgrading to. Now get off my lawn!

@cevich
Copy link
Member Author

cevich commented Dec 16, 2020

Now get off my lawn!

Nownow ---> I <---- also grew up in the 80's, but with M$ DOS...shudder. Hell, I only graduated (downgraded) from a flip-phone just a few years ago. If I can learn to clicky-clicky...so can you 😅

The truly magical part is the uploading...It seems to go into some special storage location associated with the repository, but which is otherwise inaccessible (in the WebUI). Like this:

somefile.txt

@cevich
Copy link
Member Author

cevich commented Dec 16, 2020

@edsantiago we got really lucky here. Something just fell over on the Cirrus-CI side at the end of the run. The "Success" task (a noop) normally starts within seconds, it took a very slow 1:25 to get going. Anyway, thanks for your help clearing up the system-testing issue. I recommend we don't push our luck any further before the hollow-days break. No need to subject people to Ubuntu 19, or hold up Dan's PR any longer.

@edsantiago
Copy link
Member

/lgtm
/hold

(Disclosure: I did the vast majority of review yesterday; right now I just reviewed the diffs between this and the prior PR revision). This is a large enough PR that I'd like one more pair of eyes before merging, hence the hold.

Nice work @cevich . Thank you for your patience with my intransigence.

@openshift-ci-robot openshift-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Dec 16, 2020
@rhatdan
Copy link
Member

rhatdan commented Dec 16, 2020

/hold cancel
LGTM

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 16, 2020
@rhatdan
Copy link
Member

rhatdan commented Dec 16, 2020

/approve

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cevich, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 16, 2020
@openshift-merge-robot openshift-merge-robot merged commit bacb2fc into containers:master Dec 16, 2020
@cevich cevich deleted the new_ubuntu_images branch June 30, 2021 18:08
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants