
Bump VMs, to Ubuntu 2204 with cgroups v1 #14972

Merged
merged 1 commit into containers:main from ubuntu_cgroups_v1
Jul 22, 2022

Conversation

edsantiago
Member

@edsantiago edsantiago commented Jul 19, 2022

...and enable the at-test-time confirmation, the one that
double-checks that if CI requests runc we actually use runc.
This exposed a nasty surprise in our setup: there are steps to
define $OCI_RUNTIME, but that's actually a total fakeout!
OCI_RUNTIME is used only in e2e tests, it has no effect
whatsoever on actual podman itself as invoked via command
line such as in system tests. Solution: use containers.conf

Given how fragile all this runtime stuff is, I've also added
new tests (e2e and system) that will check $CI_DESIRED_RUNTIME.

Image source: containers/automation_images#146

Since we haven't actually been testing with runc, we need
to fix a few tests:

  • handle an error-message change (make it work in both crun and runc)
  • skip one system test, "survive service stop", that doesn't
    work with runc and I don't think we care.

...and skip a bunch, filing issues for each:

  • #15013 pod create --share-parent
  • #15014 timeout in dd
  • #15015 checkpoint tests time out under $CONTAINER
  • #15017 networking timeout with registry
  • #15018 restore --pod gripes about missing --pod
  • #15025 run --uidmap broken
  • #15027 pod inspect cgrouppath broken
  • ...and a bunch more ("podman pause") that probably don't
    even merit filing an issue.

Also, use /dev/urandom in one test (was: /dev/random) because
the test is timing out and /dev/urandom does not block. (But
the test is still timing out anyway, even with this change)

Also, as part of the VM switch we are now using go 1.18 (up
from 1.17) and this broke the gitlab tests. Thanks to @Luap99
for a quick fix.

Also, slight tweak to #15021: include the timeout value, and
reword message so command string is at end.

Also, fixed a misspelling in a test name.

Fixes: #14833

Signed-off-by: Ed Santiago [email protected]
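
(For readers unfamiliar with the containers.conf mechanism referenced above, here is a minimal sketch of forcing the runtime that way; the exact file and keys the CI scripts write may differ.)

# Illustrative only, run as root: point podman itself (not just the e2e suite)
# at the requested runtime via containers.conf instead of exporting $OCI_RUNTIME.
cat >/etc/containers/containers.conf <<EOF
[engine]
runtime = "runc"
EOF
podman info --format '{{.Host.OCIRuntime.Name}}'   # should now report runc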

We are now testing with runc again. Golang has been updated from 1.17 to 1.18.

@openshift-ci openshift-ci bot added the release-note-none and approved labels Jul 19, 2022
@edsantiago
Member Author

Ubuntu (cgroups v1, runc) is failing on restore with what looks like the same issue as checkpoint-restore/criu#1935:

# bin/podman --runtime runc --storage-driver vfs run -d --name foo quay.io/libpod/testimage:20220615 top
186bb39301e62a9bf3376a6b3ef0fcd77f268ce78cd1fca94fd095062679893b
# bin/podman --runtime runc --storage-driver vfs container checkpoint foo
186bb39301e62a9bf3376a6b3ef0fcd77f268ce78cd1fca94fd095062679893b
# bin/podman --runtime runc --storage-driver vfs container restore foo
Error: OCI runtime error: runc: criu failed: type NOTIFY errno 0
log file: /var/lib/containers/storage/vfs-containers/186bb39301e62a9bf3376a6b3ef0fcd77f268ce78cd1fca94fd095062679893b/userdata/restore.log
...
(00.070611) pie: 1: Preadv 0x55aa68fd5000:4096... (7 iovs)
(00.070677) pie: 1: `- returned 65536
(00.070680) pie: 1:    `- skip pagemap
(00.070682) pie: 1:    `- skip pagemap
(00.070685) pie: 1:    `- skip pagemap
(00.070687) pie: 1:    `- skip pagemap
(00.070690) pie: 1:    `- skip pagemap
(00.070692) pie: 1:    `- skip pagemap
(00.070695) pie: 1:    `- skip pagemap
(00.070757) Error (criu/cr-restore.c:1492): 219029 stopped by signal 11: Segmentation fault
(00.071136) mnt: Switching to new ns to clean ghosts
(00.071492) Error (criu/cr-restore.c:2447): Restoring FAILED.

@adrianreber any advice on how to get criu rebuilt for Ubuntu?

@edsantiago
Member Author

Confirming evidence, I think:

# grep -R RSEQ_SIG /usr/include
/usr/include/x86_64-linux-gnu/bits/rseq.h:/* RSEQ_SIG is a signature required before each abort handler code.
/usr/include/x86_64-linux-gnu/bits/rseq.h:   RSEQ_SIG is used with the following reserved undefined instructions, which
/usr/include/x86_64-linux-gnu/bits/rseq.h:#define RSEQ_SIG        0x53053053

dpkg -l shows criu 3.16.1-2 and libc6 3.16.1-2
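
(A quick way to confirm the suspect combination, as a sketch; the version threshold reflects criu 3.17 being the first release that copes with glibc's rseq registration, per this thread:)

criu --version            # 3.16.x predates criu's rseq handling; 3.17+ is needed
ldd --version | head -1   # Ubuntu 22.04 ships glibc 2.35, which registers rseq by default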

@adrianreber
Collaborator

@rst0git knows how to update the obs and launchpad packages

@rst0git
Contributor

rst0git commented Jul 19, 2022

@edsantiago I pushed an update for version 3.17.1 in OBS.

@edsantiago
Member Author

@rst0git thank you! I see what looks like an ubuntu 2204 log indicating that it succeeded (assuming 'xUbuntu' == 'Ubuntu'). Do you have a sense for how long that will then take to get into standard Ubuntu repos?

@rst0git
Contributor

rst0git commented Jul 19, 2022

Do you have a sense for how long that will then take to get into standard Ubuntu repos?

A simple test with Ubuntu 22.04 container indicates that the criu package has been updated:

podman run -it ubuntu:22.04 bash

apt-get update && apt-get install -y sudo curl gpg
echo 'deb http://download.opensuse.org/repositories/devel:/tools:/criu/xUbuntu_22.04/ /' | tee /etc/apt/sources.list.d/devel:tools:criu.list
curl -fsSL https://download.opensuse.org/repositories/devel:tools:criu/xUbuntu_22.04/Release.key | gpg --dearmor | tee /etc/apt/trusted.gpg.d/devel_tools_criu.gpg > /dev/null
apt update
apt install -y criu
criu --version

@edsantiago
Member Author

Thanks again. I've restarted the VM build, which will take a few hours; then I'll need to resubmit this PR using those images (plus some runc fixes); that too will take a few hours. Progress!

@edsantiago
Member Author

Sigh. nope. I'll try again tomorrow.

@edsantiago
Member Author

Failing in pod create --share-parent test:

Expected
      <string>: host
  to contain substring
      <string>: 1e6d7d50e0d46c60c9906a5518d06969235bf521ea7e71db6dc153131e977e6c

The string in question is something to do with cgroups.

@cdoern this is your code, could you spare some cycles to tell me how to fix it? The problem here is that it's failing in cgroupsv1 with runc, because we haven't been testing runc. TIA.
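
(For context, a rough way to poke at this by hand, with illustrative names; the failing assertion appears to compare a cgroup-related string against the pod/infra ID:)

podman pod create --name p --share-parent=true
podman pod inspect p --format '{{.CgroupPath}}'                       # the pod's cgroup parent
podman run --rm --pod p quay.io/libpod/testimage:20220615 cat /proc/self/cgroup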

@edsantiago
Member Author

Failing in Remote build .containerignore filtering embedded directory with a timeout in the dd, even after I switched to /dev/urandom.

@jwhonce this is your code, could you please tell me how to fix it? Like, for instance, is it absolutely necessary for the file to be 1G? Is there something else that could be causing the timeout?
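
(If the payload only has to exist and report 1G, one low-cost option is a sparse file, so dd writes a single block; a sketch, assuming nothing in the test actually reads the full gigabyte:)

dd if=/dev/urandom of=bigfile bs=1M count=1 seek=1023   # apparent size 1G, ~1M of real I/O
ls -lsh bigfile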

@edsantiago
Member Author

@containers/podman-maintainers cry for help: all checkpoint tests are hanging in container environment. Everything that runs podman checkpoint or restore times out in 90s.

I can't reproduce. I've tried:

$ sudo bin/podman run --rm --privileged --cgroupns=host -v $(pwd):/home/podman -v /dev/fuse:/dev/fuse -it quay.io/libpod/fedora_podman:c6706201604915200 bash
[root@9fcf6ce7a5a1 podman]# pm() { /home/podman/bin/podman --network-backend netavark --storage-driver vfs --cgroup-manager cgroupfs --events-backend file "$@"; }

[root@9fcf6ce7a5a1 podman]# pm all-sorts-of-things

Maybe I'm missing some magic option.
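
(The "all-sorts-of-things" above boils down to checkpoint/restore sequences like the following, mirroring the commands shown earlier in this thread; illustrative, not a verbatim transcript:)

pm run -d --name c quay.io/libpod/testimage:20220615 top
pm container checkpoint c     # this is the step that times out in CI
pm container restore c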

Two requests:

  1. Can someone please tell me how to fix this?
  2. Can some Go expert please fix this stupid code so it actually emits useful output? https://github.com/containers/podman/blob/b5612df55060317bcef92de48e5b2cd400374814/test/utils/utils.go#L367-L372

Right now that fails with "command timed out but I'm not going to tell you what command it was nor what its output was". I think it might be slightly more helpful to say "this was the command that timed out, and this was the output I got up to this point". I can't figure it out from the GInkgo docs, and have spent much too long on it already.

Thank you!

@edsantiago
Member Author

Sigh, another one. @containers/podman-maintainers who owns the gitlab tests? They're failing hard:

Obtaining necessary gitlab-runner testing bits
+ ssh some12100dude@localhost -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o CheckHostIP=no env GOPATH=/var/tmp/go go get -u github.com/jstemmer/go-junit-report  # /var/tmp/go/src/github.com/containers/podman/./contrib/cirrus/setup_environment.sh:353 in main()
Warning: Permanently added 'localhost' (ED25519) to the list of known hosts.
go: go.mod file not found in current directory or any parent directory.
	'go get' is no longer supported outside a module.
	To build and install a command, use 'go install' with a version,
	like 'go install example.com/cmd@latest'
	For more information, see https://golang.org/doc/go-get-install-deprecation
	or run 'go help get' or 'go help install'.

...and I bet a nickel it's because of the switch to go 1.18. This is not something I'm going to try to solve right now, sorry.

@edsantiago
Member Author

Another new one having to do with checkpoint/restore and pods, I think? log. Symptom:

 # podman-remote [options] container restore --pod POD-SHA -i /tmp/checkpoint-bar.tar.gz
  Error: cannot restore pod container without --pod

@edsantiago
Member Author

And another hard failure, timed out waiting for port XXXX, ubuntu remote only:

Expected
               <*fmt.wrapError | 0xc000a35440>: {
                   msg: "error running \"podman-registry\": : `podman-registry -i docker-archive:/tmp/quay.io-libpod-registry-2.6.tar start` failed: Getting image source signatures\nCopying blob sha256:9d08b7a37338462dc618dd3c25c2c4f713fe3e833f75561950a92bd9130f77ac\nCopying blob sha256:d91b7ec2cc52de561af295d6ea89c55c0c891f8728e65d794e96422f9e815ca7\nCopying blob sha256:849b4a0a6bf5273a3ea90c036c62439d058946f28bb4ad4a8d19255d6237f475\nCopying blob sha256:0341148c78bcafac0ba6f9e0b9056f91af7a380ac1b67ca8d770bdd4eb532bb2\nCopying blob sha256:7444ea29e45e927abea1f923bf24cac20deaddea603c4bb1c7f2f5819773d453\nCopying config sha256:10b45af23ff36baa99dda944a461425494a4bd103f3d4361d30e929f13aa8dda\nWriting manifest to image destination\nStoring signatures\npodman-registry: Timed out waiting for port 5353\n  (exit status 1)",
                   err: <*errors.errorString | 0xc0008971c0>{
                       s: "`podman-registry -i docker-archive:/tmp/quay.io-libpod-registry-2.6.tar start` failed: Getting image source signatures\nCopying blob sha256:9d08b7a37338462dc618dd3c25c2c4f713fe3e833f75561950a92bd9130f77ac\nCopying blob sha256:d91b7ec2cc52de561af295d6ea89c55c0c891f8728e65d794e96422f9e815ca7\nCopying blob sha256:849b4a0a6bf5273a3ea90c036c62439d058946f28bb4ad4a8d19255d6237f475\nCopying blob sha256:0341148c78bcafac0ba6f9e0b9056f91af7a380ac1b67ca8d770bdd4eb532bb2\nCopying blob sha256:7444ea29e45e927abea1f923bf24cac20deaddea603c4bb1c7f2f5819773d453\nCopying config sha256:10b45af23ff36baa99dda944a461425494a4bd103f3d4361d30e929f13aa8dda\nWriting manifest to image destination\nStoring signatures\npodman-registry: Timed out waiting for port 5353\n  (exit status 1)",
                   },
               }

Seems networking-related. Are netavark et al up-to-date on Ubuntu?

@edsantiago
Member Author

Another sigh. containerized failed again, with checkpoint hangs as expected. But I instrumented the timeout code so it would dump stdout/stderr... and nothing.

@edsantiago
Member Author

Good news is, int non-remote ubuntu tests all pass (root and rootless), indicating that we have a working criu again.

@Luap99
Member

Luap99 commented Jul 21, 2022

Sigh, another one. @containers/podman-maintainers who owns the gitlab tests? They're failing hard:

Obtaining necessary gitlab-runner testing bits
+ ssh some12100dude@localhost -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o CheckHostIP=no env GOPATH=/var/tmp/go go get -u github.com/jstemmer/go-junit-report  # /var/tmp/go/src/github.com/containers/podman/./contrib/cirrus/setup_environment.sh:353 in main()
Warning: Permanently added 'localhost' (ED25519) to the list of known hosts.
go: go.mod file not found in current directory or any parent directory.
	'go get' is no longer supported outside a module.
	To build and install a command, use 'go install' with a version,
	like 'go install example.com/cmd@latest'
	For more information, see https://golang.org/doc/go-get-install-deprecation
	or run 'go help get' or 'go help install'.

...and I bet a nickel it's because of the switch to go 1.18. This is not something I'm going to try to solve right now, sorry.

I don't know anything about the gitlab test, but go 1.18 uses module mode by default, and without a go.mod this fails. The correct way to install things is go install, not go get -u.

@Luap99
Member

Luap99 commented Jul 21, 2022

@edsantiago This also pins the version to prevent breaking PRs when the tool is updated:

diff --git a/contrib/cirrus/setup_environment.sh b/contrib/cirrus/setup_environment.sh
index 4952f8dd2..e2d0c655a 100755
--- a/contrib/cirrus/setup_environment.sh
+++ b/contrib/cirrus/setup_environment.sh
@@ -351,7 +351,7 @@ case "$TEST_FLAVOR" in
         slug="gitlab.com/gitlab-org/gitlab-runner"
         helper_fqin="registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-latest-pwsh"
         ssh="ssh $ROOTLESS_USER@localhost -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o CheckHostIP=no env GOPATH=$GOPATH"
-        showrun $ssh go get -u github.com/jstemmer/go-junit-report
+        showrun $ssh go install github.com/jstemmer/go-junit-report/[email protected]
         showrun $ssh git clone https://$slug $GOPATH/src/$slug
         showrun $ssh make -C $GOPATH/src/$slug development_setup
         showrun $ssh bash -c "'cd $GOPATH/src/$slug && GOPATH=$GOPATH go get .'"

Although I guess it would make more sense to directly install this into the VM images to reduce downloads and compile time at runtime.

@Luap99
Member

Luap99 commented Jul 21, 2022

And another hard failure, timed out waiting for port XXXX, ubuntu remote only:

Expected
               <*fmt.wrapError | 0xc000a35440>: {
                   msg: "error running \"podman-registry\": : `podman-registry -i docker-archive:/tmp/quay.io-libpod-registry-2.6.tar start` failed: Getting image source signatures\nCopying blob sha256:9d08b7a37338462dc618dd3c25c2c4f713fe3e833f75561950a92bd9130f77ac\nCopying blob sha256:d91b7ec2cc52de561af295d6ea89c55c0c891f8728e65d794e96422f9e815ca7\nCopying blob sha256:849b4a0a6bf5273a3ea90c036c62439d058946f28bb4ad4a8d19255d6237f475\nCopying blob sha256:0341148c78bcafac0ba6f9e0b9056f91af7a380ac1b67ca8d770bdd4eb532bb2\nCopying blob sha256:7444ea29e45e927abea1f923bf24cac20deaddea603c4bb1c7f2f5819773d453\nCopying config sha256:10b45af23ff36baa99dda944a461425494a4bd103f3d4361d30e929f13aa8dda\nWriting manifest to image destination\nStoring signatures\npodman-registry: Timed out waiting for port 5353\n  (exit status 1)",
                   err: <*errors.errorString | 0xc0008971c0>{
                       s: "`podman-registry -i docker-archive:/tmp/quay.io-libpod-registry-2.6.tar start` failed: Getting image source signatures\nCopying blob sha256:9d08b7a37338462dc618dd3c25c2c4f713fe3e833f75561950a92bd9130f77ac\nCopying blob sha256:d91b7ec2cc52de561af295d6ea89c55c0c891f8728e65d794e96422f9e815ca7\nCopying blob sha256:849b4a0a6bf5273a3ea90c036c62439d058946f28bb4ad4a8d19255d6237f475\nCopying blob sha256:0341148c78bcafac0ba6f9e0b9056f91af7a380ac1b67ca8d770bdd4eb532bb2\nCopying blob sha256:7444ea29e45e927abea1f923bf24cac20deaddea603c4bb1c7f2f5819773d453\nCopying config sha256:10b45af23ff36baa99dda944a461425494a4bd103f3d4361d30e929f13aa8dda\nWriting manifest to image destination\nStoring signatures\npodman-registry: Timed out waiting for port 5353\n  (exit status 1)",
                   },
               }

Seems networking-related. Are netavark et al up-to-date on Ubuntu?

Likely still some parallel cni and netavark use?

@edsantiago
Member Author

Likely still some parallel cni and netavark use?

Well, the failure is 100% and it's been happening since my first iteration on this PR. That is: I've never seen it pass. That suggests something harder than a race condition.

@Luap99
Member

Luap99 commented Jul 21, 2022

Likely still some parallel cni and netavark use?

Well, the failure is 100% and it's been happening since my first iteration on this PR. That is: I've never seen it pass. That suggests something harder than a race condition.

I am not saying it is a race, just that something is still using CNI while the e2e tests use netavark; check the iptables output to be sure.
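
(A quick way to check for that, as a sketch; the chain prefixes are the usual CNI-/NETAVARK ones, and the info field assumes podman 4.x:)

podman info --format '{{.Host.NetworkBackend}}'    # should report netavark
sudo iptables -S | grep -E 'CNI|NETAVARK'          # leftover CNI- chains would indicate mixed use
netavark --version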

@cdoern
Contributor

cdoern commented Jul 21, 2022

...and, re-pushing because #14876 merged and nobody's going to look at this PR until tomorrow anyway.

Getting my test writing skills ready for whatever 14876 introduced... I think I avoided most of the issues.

@edsantiago
Member Author

@cdoern your PR includes lots of "bps" strings, so I'm guessing all these new bps failures are related. I'm outta here for the day (LONG DAY) but would appreciate your help. TIA.

@edsantiago edsantiago force-pushed the ubuntu_cgroups_v1 branch 2 times, most recently from 0e600b3 to 6823f5c on July 22, 2022 00:19
@cdoern
Contributor

cdoern commented Jul 22, 2022

@cdoern your PR includes lots of "bps" strings, so I'm guessing all these new bps failures are related. I'm outta here for the day (LONG DAY) but would appreciate your help. TIA.

sorry about that, the good news is these failures are cropping up with both iops and bps devices so I can narrow it down. will keep you updated

@edsantiago
Member Author

Don't sweat it. I really want to get this merged, because I don't want to be here on Friday, so I've resubmitted with Skips which you can then fix at your leisure.

...and enable the at-test-time confirmation, the one that
double-checks that if CI requests runc we actually use runc.
This exposed a nasty surprise in our setup: there are steps to
define $OCI_RUNTIME, but that's actually a total fakeout!
OCI_RUNTIME is used only in e2e tests, it has no effect
whatsoever on actual podman itself as invoked via command
line such as in system tests. Solution: use containers.conf

Given how fragile all this runtime stuff is, I've also added
new tests (e2e and system) that will check $CI_DESIRED_RUNTIME.

Image source: containers/automation_images#146

Since we haven't actually been testing with runc, we need
to fix a few tests:

  - handle an error-message change (make it work in both crun and runc)
  - skip one system test, "survive service stop", that doesn't
    work with runc and I don't think we care.

...and skip a bunch, filing issues for each:

  - containers#15013 pod create --share-parent
  - containers#15014 timeout in dd
  - containers#15015 checkpoint tests time out under $CONTAINER
  - containers#15017 networking timeout with registry
  - containers#15018 restore --pod gripes about missing --pod
  - containers#15025 run --uidmap broken
  - containers#15027 pod inspect cgrouppath broken
  - ...and a bunch more ("podman pause") that probably don't
    even merit filing an issue.

Also, use /dev/urandom in one test (was: /dev/random) because
the test is timing out and /dev/urandom does not block. (But
the test is still timing out anyway, even with this change)

Also, as part of the VM switch we are now using go 1.18 (up
from 1.17) and this broke the gitlab tests. Thanks to @Luap99
for a quick fix.

Also, slight tweak to containers#15021: include the timeout value, and
reword message so command string is at end.

Also, fixed a misspelling in a test name.

Fixes: containers#14833

Signed-off-by: Ed Santiago <[email protected]>
Member

@vrothberg vrothberg left a comment


LGTM

Restarted the flake. Thanks everybody!

@Luap99
Member

Luap99 commented Jul 22, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jul 22, 2022
@openshift-merge-robot openshift-merge-robot merged commit 99bf6f9 into containers:main Jul 22, 2022
@edsantiago edsantiago deleted the ubuntu_cgroups_v1 branch July 22, 2022 12:07
@edsantiago
Member Author

edsantiago commented Jul 22, 2022

Thanks everyone.

There may be in-flight PRs that break cgroups V1, such that once they merge they will break CI for everyone. Please consider that during your PR reviews over the next week, and please suggest that everyone rebase.

@cevich
Member

cevich commented Jul 22, 2022

Thanks @edsantiago for your efforts here, so much fun isn't it 😉 Seriously though, I'm happy to see you successfully navigated building new VM images and integrating them into podman CI. I know there's some testing fallout from this merging; however, I don't believe I got an answer to my update question: could you help me update these VM images in c/skopeo, c/image, and c/storage? I've got one going for buildah already.

@edsantiago
Member Author

@cevich I'm sorry, I seem to have missed that question, and can't even find it in my email archives. If I understand your reference correctly, you would like me to submit .cirrus.yml PRs for those three projects, updating the cXXX number? If so, would you like me to wait until containers/automation_images#157 merges instead?

If I misunderstood, could you please point me at your original request, or help me understand? Again, I'm sorry for not being able to find it.

@cevich
Member

cevich commented Jul 25, 2022

submit .cirrus.yml PRs for those three projects, updating the cXXX number? If so, would you like me to wait until containers/automation_images#157 merges instead?

Yes please. There's no need to wait on 157; it doesn't change anything in the VM images, only a few tooling containers. There are 14 repos total that use GCE images, and probably 8 desperately in need of an update. Most of the time, once podman CI is passing, it goes pretty smoothly, though the Ubuntu version and runc update could cause some hiccups. So help with this is very much appreciated (there's a sketch of the typical one-line change after the list below).

  • aardvark-dns (Good enough)
  • buildah (DONE)
  • BuildSourceImage
  • common
  • conmon
  • dnsname
  • image (I got it)
  • netavark (Good enough)
  • oci-seccomp-bpf-hook
  • podman (DONE)
  • podman-py (I got it)
  • skopeo (I got it)
  • storage (I got it)
  • udica (I got it)
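
(For anyone picking up one of the remaining repos, the change is typically a one-line bump of the image-suffix value in that repo's .cirrus.yml; the variable name and the c-number below are illustrative, so copy the exact suffix from the new automation_images build:)

sed -i 's/^\( *IMAGE_SUFFIX:\).*/\1 "c6706201604915200"/' .cirrus.yml
git diff .cirrus.yml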

If I misunderstood, could you please point me at your original request, or help me understand? Again, I'm sorry for not being able to find it.

No worries, comments get eaten by github sometimes, and it's not like we receive a small amount of github mail.
Found it (it was buried): #14972 (comment)

@edsantiago
Member Author

Well, we need to rebuild VMs anyway (#15025), so I'll wait until that's fixed.

I tried Re-run on your PR, it still fails (same criu bug on ubuntu), so no point in doing anything until that's fixed.

@cevich
Member

cevich commented Jul 25, 2022

Ugh, -sigh-, okay, I guess I thought it would be simple this time 😢 Okay, I'll /hold all the PRs I already opened.

edsantiago added a commit to edsantiago/libpod that referenced this pull request Aug 16, 2022
Two fixes done in containers#14972 (the "oops test under runc again" PR
which was not backported into 4.2):

 - "survive service stop" - skip. Test is only applicable
   under crun.
 - "volume exec/noexec" - update the expected error message

One hail-mary fix for a test failure seen in RHEL87 gating:

 - "nonexistent labels" - slight tweak to expected error message

None of these fixes will actually be tested in CI, because v4.2
does not run any runc tests. We'll have to wait and see what
happens on the next RHEL build.

Signed-off-by: Ed Santiago <[email protected]>
edsantiago added a commit to edsantiago/libpod that referenced this pull request Aug 18, 2022
This exposed a nasty bug in our system-test setup: Ubuntu (runc)
was writing a scratch containers.conf file, and setting CONTAINERS_CONF
to point to it. This was well-intentionedly introduced in containers#10199 as
part of our long sad history of not testing runc. What I did not
understand at that time is that CONTAINERS_CONF is **dangerous**:
it does not mean "I will read standard containers.conf and then
override", it means "I will **IGNORE** standard containers.conf
and use only the settings in this file"! So on Ubuntu we were
losing all the default settings: capabilities, sysctls, all.

Yes, this is documented in containers.conf(5) but it is such
a huge violation of POLA that I need to repeat it.

In containers#14972, as yet another attempt to fix our runc crisis, I
introduced a new runc-override mechanism: create a custom
/etc/containers/containers.conf when OCI_RUNTIME=runc.
Unlike the CONTAINERS_CONF envariable, the /etc file
actually means what you think it means: "read the default
file first, then override with the /etc file contents".
I.e., we get the desired defaults. But I didn't remember
this helpers.bash workaround, so our runc testing has
actually been flawed: we have not been testing with
the system containers.conf. This commit removes the
no-longer-needed and never-actually-wanted workaround,
and by virtue of testing the cap-drops in kube generate,
we add a regression test to make sure this never happens
again.

It's a little scary that we haven't been testing capabilities.

Also scary: this PR requires python, for converting yaml to json.
I think that should be safe: python3 'import yaml' and 'json'
works fine on a RHEL8.7 VM from 1minutetip.

Signed-off-by: Ed Santiago <[email protected]>
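
(To make the CONTAINERS_CONF distinction in that commit message concrete, here is a sketch; the scratch path is illustrative:)

printf '[engine]\nruntime = "runc"\n' > /tmp/scratch.conf
# Env var: podman reads ONLY this file and ignores the standard containers.conf locations,
# which is how the default capabilities and sysctls were silently lost.
CONTAINERS_CONF=/tmp/scratch.conf podman info --format '{{.Host.OCIRuntime.Name}}'
# /etc/containers/containers.conf instead: the same key is merged on top of the
# defaults, so everything else keeps its normal value.
podman info --format '{{.Host.OCIRuntime.Name}}'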
@github-actions github-actions bot added the locked - please file new issue/PR label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023