Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cirrus,ci: default to overlay if using vfs #18822

Merged
merged 1 commit into from
Jun 9, 2023

Conversation

flouthoc
Copy link
Collaborator

@flouthoc flouthoc commented Jun 8, 2023

In debian environment we are hitting an edge-case where older buildah version is not compatible with newer podman version because both of them are using different storage driver.

I.e

  • Podmand defaults to native overlay.
  • Older buildah version defaults to vfs.

See discussions below for more details

Does this PR introduce a user-facing change?

cirrus,ci: default to `overlay` for debian env

@openshift-ci openshift-ci bot added release-note approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 8, 2023
@flouthoc flouthoc force-pushed the debian-use-overlay branch 5 times, most recently from d1a406d to c292a39 Compare June 8, 2023 06:09
@flouthoc
Copy link
Collaborator Author

flouthoc commented Jun 8, 2023

@Luap99 @cevich @edsantiago @nalind PTAL

@flouthoc flouthoc force-pushed the debian-use-overlay branch 2 times, most recently from 77abac0 to c632ac5 Compare June 8, 2023 08:16
contrib/cirrus/test.conf Outdated Show resolved Hide resolved
contrib/cirrus/runner.sh Outdated Show resolved Hide resolved
@flouthoc flouthoc force-pushed the debian-use-overlay branch from c632ac5 to 5f4d924 Compare June 8, 2023 09:21
@flouthoc flouthoc requested a review from Luap99 June 8, 2023 09:22
@flouthoc flouthoc force-pushed the debian-use-overlay branch from 5f4d924 to 7f638f0 Compare June 8, 2023 10:01
@edsantiago edsantiago added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 8, 2023
Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is frighteningly complicated and duplicatey. Can it be done like this instead?

diff --git a/contrib/cirrus/setup_environment.sh b/contrib/cirrus/setup_environment.sh
index f3068787c..b3da90147 100755
--- a/contrib/cirrus/setup_environment.sh
+++ b/contrib/cirrus/setup_environment.sh
@@ -100,6 +100,17 @@ esac
 # shellcheck disable=SC2154
 printf "[engine]\ndatabase_backend=\"$CI_DESIRED_DATABASE\"\n" > /etc/containers/containers.conf.d/92-db.conf
 
+# Blah blah include a FIXME comment to reevaluate after (e.g.) 2023-12-31
+buildah_storage=$(buildah info --format '{{.store.GraphDriverName}}')
+if [[ "$buildah_storage" != "overlay" ]]; then
+    conf=/etc/containers/storage.conf
+    if [[ -e $conf ]]; then
+        die "FATAL! INTERNAL ERROR! Cannot override $conf"
+    fi
+    msg "Overriding $conf, setting overlay (was: $buildah_storage)"
+    printf "...driver=overlay\n" >$conf
+fi
+
 if ((CONTAINER==0)); then  # Not yet running inside a container
     # Discovered reemergence of BFQ scheduler bug in kernel 5.8.12-200
     # which causes a kernel panic when system is under heavy I/O load.

@flouthoc flouthoc force-pushed the debian-use-overlay branch from 7f638f0 to 82f00ae Compare June 8, 2023 13:06
@flouthoc
Copy link
Collaborator Author

flouthoc commented Jun 8, 2023

@edsantiago Thanks looks much better, I have a doubt about if [[ -e $conf ]]; then check, shouldn't we just create it instead of breaking here ?

Also I previously tried setting similar check in setup_environment.sh before but I was getting weird build errors, maybe I was trying it in the wrong place. I hope it should pass this time.

@edsantiago
Copy link
Member

edsantiago commented Jun 8, 2023

/etc/containers/storage.conf should never exist on a CI system. If it does, that means someone has done something special to change defaults -- and if we overwrite it, we will be clobbering that and changing expectations. I believe we should panic early and loudly if this happens.

PS f38 flake was unlinkat/EBUSY. I restarted.

if [[ "$buildah_storage" != "overlay" ]]; then
conf=/etc/containers/storage.conf
msg "Overriding $conf, setting overlay (was: $buildah_storage)"
printf '[storage]\ndriver = "overlay"\nrunroot = "/run/containers/storage"\ngraphroot = "/var/lib/containers/storage"\n' >$conf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are all the other things (runroot, graphroot) needed?

And, please reinstate the check for existing storage.conf, thanks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

| Why are all the other things (runroot, graphroot) needed?

@edsantiago When default file is selected and runroot is empty we hit error https://github.com/containers/storage/blob/main/types/options.go#L159 here, this needs to be fixed on c/storage I'll open a different PR on c/storage for this.

@flouthoc flouthoc force-pushed the debian-use-overlay branch from 82f00ae to ae2faa2 Compare June 8, 2023 15:26
@cevich
Copy link
Member

cevich commented Jun 8, 2023

Since this only applies to debian (and likely, only for a short while), consider moving this block under the debian-host setup case - around line 142-147.

Also, I'm guessing this only impacts a very specific and known range of buildah versions. Perhaps consider adding a TODO comment indicating this change may be removed once buildah version XYZ is present in CI VM images.

@flouthoc flouthoc requested a review from edsantiago June 8, 2023 15:27
@flouthoc flouthoc changed the title cirrus,ci: default to overlay for debian env cirrus,ci: default to overlay if using vfs Jun 8, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 8, 2023
@flouthoc flouthoc force-pushed the debian-use-overlay branch from ae2faa2 to c0cf74e Compare June 8, 2023 15:40
@flouthoc
Copy link
Collaborator Author

flouthoc commented Jun 8, 2023

@cevich done added a note.

@edsantiago
Copy link
Member

LGTM but let's wait for CI on #13808

@edsantiago
Copy link
Member

CI still failing in what looks like exactly the same way, in buildah treadmill, even with your PR cherrypicked. I'll try to rerun in terminal and figure out what happened.

# for more details.
# TODO: remove this once all CI VM have newer buildah version. (i.e where buildah
# does not defaults to using `vfs` as storage driver)
buildah_storage=$(buildah info --format '{{.store.GraphDriverName}}')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When run as root, this always returns overlay. I'm sorry, I didn't know that. I guess it's OK to revert to checking for "debian", unless someone can think of a better test.

@flouthoc flouthoc force-pushed the debian-use-overlay branch from c0cf74e to 21694bf Compare June 8, 2023 18:02
@flouthoc
Copy link
Collaborator Author

flouthoc commented Jun 8, 2023

@edsantiago I reverted to old commit, could you cherry-pick this one ?

@edsantiago
Copy link
Member

gah, no, that one is really not acceptable. I am testing a simpler fix in #13808 right now.

@edsantiago
Copy link
Member

just stay tuned. Once #13808 passes I will comment here. (Probably very, very late your time, so just check back in tomorrow)

@flouthoc flouthoc force-pushed the debian-use-overlay branch from 21694bf to ec2034b Compare June 8, 2023 18:05
@edsantiago
Copy link
Member

edsantiago commented Jun 8, 2023

Good news, bad news.

Good news: debian rootless passed.

Bad news: new failure seen in remote debian root and also remote f38 aarch64 root:

<+009ms> # # podman-remote --url unix:/tmp/podman_tmp_9HtW pod rm c3cc31049b088d86ef22ad5b279ce5c1d01b88887116bc9f6486b9b0a28f2080
<+104ms> # Error: 2 errors occurred:
         # 	* removing container 0c7966924a988d4e00f2585da31627a0db0621e6fbd6531cea4d02779e3bcfe2 from pod c3cc31049b088d86ef22ad5b279ce5c1d01b88887116bc9f6486b9b0a28f2080: cleaning up storage: removing container 0c7966924a988d4e00f2585da31627a0db0621e6fbd6531cea4d02779e3bcfe2 root filesystem: unmounting "/var/lib/containers/storage/overlay/4e8aaf7a108d0bf91f873379b4558162620f9e4ffaca942d06ef4d40778456bc/merged": invalid argument
         # 	* removing container 55ade4c16aa0fffb758fc893be4d24c8fdaabcde14eec89df7885b42c4b03881 from pod c3cc31049b088d86ef22ad5b279ce5c1d01b88887116bc9f6486b9b0a28f2080: cleaning up storage: removing container 55ade4c16aa0fffb758fc893be4d24c8fdaabcde14eec89df7885b42c4b03881 root filesystem: unmounting "/var/lib/containers/storage/overlay/1ef8c38e52bd1d345801a0a4a7b2e0631b85965881be472dcb6db0afd6c3a65b/merged": invalid argument

After that, everything is hosed: nothing ever works again, all tests fail.

It looks a heck of a lot like unlinkat/EBUSY, except this one is unmounting/EINVAL. I'm debating right now if I should re-run them. UPDATE: yep, they were flakes. Flakes I've never seen before, hence, probably a problem with the new vendoring and probably something we'll be seeing soon in podman.

@edsantiago
Copy link
Member

@flouthoc here's a patch you can use. Please apply it exactly precisely 100% as it is there, no edits. That will guarantee that it will pass (modulo flakes) in the treadmill.

In debian environment we are hitting an edge-case where older buildah
version is not compatible with newer podman version because both of them
are using different storage driver.

I.e
* Podmand defaults to native `overlay`.
* Older buildah version defaults to `vfs`.

See discussions below for more details
* containers#18510 (comment)

Co-authored-by: Ed Santiago <[email protected]>
Signed-off-by: Aditya R <[email protected]>
@flouthoc flouthoc force-pushed the debian-use-overlay branch from ec2034b to 02432fc Compare June 9, 2023 05:14
@flouthoc flouthoc requested a review from edsantiago June 9, 2023 05:14
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 9, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, flouthoc, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Luap99,edsantiago,flouthoc]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@edsantiago
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 9, 2023
@Luap99
Copy link
Member

Luap99 commented Jun 9, 2023

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 9, 2023
@openshift-merge-robot openshift-merge-robot merged commit 6db512d into containers:main Jun 9, 2023
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 8, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants