Build multi-arch buildah images #2958

Closed
barthy1 opened this issue Feb 3, 2021 · 30 comments · Fixed by #3006

barthy1 commented Feb 3, 2021

This issue is created to track the discussion about the best ways to build multi-arch images for buildah.
So far, several CI tools have been discussed:

  • Cirrus-CI
    • default CI tool for buildah
    • no hardware available except x86_64
    • no docker buildx out of the box for hardware emulation in builds

Conclusion: needs more investigation; buildah bud --arch .. is probably a solution, but everything still needs to be set up from scratch

  • Travis
    • not currently used in the buildah repo
    • has real hardware to run the builds or tests
    • has limitations on free usage
    • limited to the physically available hardware (amd64, arm64, ppc64le, s390x)
    • skopeo already has a multi-arch build setup to use as an example

Conclusion: depends on community opinion on whether Travis should be back on the table for buildah

  • Github Actions
    • not currently used in the buildah repo
    • no hardware available, except x86_64
    • has a "library" to run multi-arch builds in emulation mode

Conclusion: easy to use, but a new CI tool

Long-term approach: Cirrus-CI should be the right way to do everything.
Short-term approach: probably GitHub Actions, as the easiest way to prepare the code.

Start of discussion - containers/skopeo#1104 (comment)

cc @cevich @TomSweeneyRedHat @rhatdan
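The `buildah bud --arch` approach mentioned above can be sketched roughly as follows. This is a dry-run sketch, not the project's actual scripts: the image name and arch list are assumptions, and a real run additionally needs buildah plus qemu-user-static binfmt handlers registered for the non-native arches.

```shell
#!/usr/bin/env bash
# Dry-run sketch: cross-building with `buildah bud --arch` and collecting
# the per-arch images into one manifest list. DRY_RUN=1 only prints the
# commands; flip it to 0 on a host with buildah and qemu-user-static.
set -euo pipefail

DRY_RUN=1
ARCHES="amd64 arm64 ppc64le s390x"
IMAGE="quay.io/buildah/stable:latest"   # hypothetical target image

run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$@"; else "$@"; fi; }

# One emulated build per architecture, all added to a single manifest list.
for arch in $ARCHES; do
    run buildah bud --arch "$arch" --manifest "$IMAGE" -f Containerfile .
done
# Push the whole manifest list (all arch entries) to the registry.
run buildah manifest push --all "$IMAGE" "docker://$IMAGE"
```

The non-x86_64 builds run under emulation, which is where the slowness discussed below comes from.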


cevich commented Feb 3, 2021

Re: Cirrus-CI - we 100% control the contents of the VM images and the speed/size of the VMs. The only restrictions are those imposed by the underlying cloud (currently GCP). Adding packages is pretty easy. The only possible exceptions are the images for MacOS and Windows tasks, where we use the default ones provided by the platform.


cevich commented Feb 3, 2021

Re: Travis - The multi-arch hardware is really the only plus. The Ubuntu-centric-ness and slowness/limitations are HUGE detractors.


cevich commented Feb 3, 2021

Re: Github Actions - Biggest attraction is the "slick" containers-based workflow. IMHO, this is also its biggest detractor - it's not very portable to other CI systems. Secondarily, it's a relatively immature platform (some strange/quirky "features" and limitations), and I fear that (given time) they may start imposing more heavy-handed usage limitations like Travis.


cevich commented Feb 3, 2021

Oh! Another option here could be to use OBS, IIRC they do have native hardware available...but I have zero knowledge of how it works. @vrothberg knows more about this.


rhatdan commented Feb 3, 2021

The issue with buildah bud --arch ...
is that downloading the dnf/yum XML content and processing it can take forever in emulation mode. Using ubi8 images is much better than Fedora for this, since there are much smaller differences between the image and the latest updates.


cevich commented Feb 3, 2021

Dang. Those are the kinds of things that tend to be slow and arrive at 99% before suddenly failing.

Maybe there is an arch-sensitive dnf downloader we could use to pre-populate all/most of the needed packages locally before running the build? It makes for an "interesting" Containerfile...but might drastically improve performance.


rhatdan commented Feb 3, 2021

Would love to talk to the dnf guys here to see if this is something that could be done on the local arch and mounted into the container.


cevich commented Feb 4, 2021

On F32, man dnf download shows that it DOES accept a --arch option along with a --resolve option to make it also grab dependencies.

edit: The --resolve option operates WRT the local system, so that probably won't be very helpful. So before using dnf download we would need to be able to feed it a complete dep-solved list of packages 😖

another edit: There is a --forcearch general option to dnf, but it specifically says it requires QEMU emulation to operate properly. ugh! Exactly the problem we want to avoid 😖


cevich commented Feb 4, 2021

Oh! Wait, maybe the docs are lying, because I found --downloadonly with --forcearch does seem to work as long as you also specify --installroot, no emulation required. To prove QEMU need not be involved, it can be done using an F33 (latest) container. I was able to cache over 200 dependencies for the s390x podman package (F30 release chosen at random):

[cevich@localhost ~]$
[cevich@localhost ~]$ IROOT=/tmp/s390_root
[cevich@localhost ~]$ VCD="$IROOT/var/cache/dnf"
[cevich@localhost ~]$ mkdir -p $VCD
[cevich@localhost ~]$ $PRUN dnf install -y --releasever=30 --forcearch=s390x --installroot=$IROOT --downloadonly podman
Fedora 30 openh264 (From Cisco) - s390x               5.4 kB/s | 4.3 kB     00:00
Fedora Modular 30 - s390x                             1.3 MB/s | 2.6 MB     00:01
Fedora Modular 30 - s390x - Updates                   5.3 MB/s | 4.1 MB     00:00
Fedora 30 - s390x - Updates                           8.9 MB/s |  21 MB     00:02
Fedora 30 - s390x                                     9.1 MB/s |  64 MB     00:07
Dependencies resolved.
======================================================================================
 Package                     Arch   Version                             Repo     Size
======================================================================================
Installing:
 podman                      s390x  2:1.8.0-4.fc30                      updates  13 M
Installing dependencies:
 acl                         s390x  2.2.53-3.fc30                       fedora   67 k
...cut...
 trousers                    s390x  0.3.13-12.fc30                      fedora  127 k

Transaction Summary
======================================================================================
Install  231 Packages

Total download size: 167 M
Installed size: 737 M
DNF will only download packages for the transaction.
Downloading Packages:
(1/231): audit-libs-3.0-0.15.20191104git1c2f876.fc30. 758 kB/s | 103 kB     00:00
...cut...
(231/231): tar-1.32-1.fc30.s390x.rpm                  629 kB/s | 855 kB     00:01
--------------------------------------------------------------------------------------
Total                                                 7.3 MB/s | 167 MB     00:22
warning: /tmp/s390_root/var/cache/dnf/updates-297edde6bb84db09/packages/audit-libs-3.0-0.15.20191104git1c2f876.fc30.s390x.rpm: Header V3 RSA/SHA256 Signature, key ID cfc659b9: NOKEY
Fedora 30 - s390x - Updates                           1.6 MB/s | 1.6 kB     00:00
Importing GPG key 0xCFC659B9:
 Userid     : "Fedora (30) <[email protected]>"
 Fingerprint: F1D8 EC98 F241 AAF2 0DF6 9420 EF3C 111F CFC6 59B9
 From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-30-s390x
Key imported successfully
Complete!
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
[cevich@localhost ~]$ ls $IROOT
var
[cevich@localhost ~]$ du -shxc $VCD
357M    /tmp/s390_root/var/cache/dnf
357M    total
[cevich@localhost ~]$

So, if we could do something like that to seed the cache (even w/o using a container like I did), then run the emulated buildah bud -v $VCD:$VCD:Z --arch ..., it should pick up packages from the cache. Bonus: No need to clean the dnf cache in a RUN command inside the Dockerfile, since the volume-mount will simply be absent at runtime.
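Put together, the seed-then-build flow might look like the sketch below. This is a dry-run sketch, not something tested end-to-end here: the package name, release version, and paths are placeholders, and DRY_RUN=1 only prints the dnf/buildah commands instead of running them.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the seed-then-build flow: fill the dnf cache for the
# target arch natively, then volume-mount it into the emulated build.
set -euo pipefail

DRY_RUN=1
ARCH="s390x"
IROOT="/tmp/${ARCH}_root"
VCD="$IROOT/var/cache/dnf"

run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$@"; else "$@"; fi; }

mkdir -p "$VCD"
# 1) Seed the dnf cache for the target arch on native hardware (no QEMU).
run dnf install -y --releasever=33 --forcearch="$ARCH" \
    --installroot="$IROOT" --downloadonly podman
# 2) Run the emulated build with the cache volume-mounted in, so the
#    RUN dnf steps inside the Containerfile hit the cache, not the network.
run buildah bud -v "$VCD:$VCD:Z" --arch "$ARCH" -f Containerfile .
```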


rhatdan commented Feb 4, 2021

Nice work. We have -v $VCD:$VCD:O for this use case, which should handle everything perfectly.


barthy1 commented Feb 5, 2021

From my side, I finished a sample GitHub Action to build the buildah stable version.
If you are interested, you can have a look - https://github.com/barthy1/buildah/blob/mult_arch_build/.github/workflows/multi-arch-build.yaml

Build and push for 4 arches took 36m https://github.com/barthy1/buildah/runs/1839925493?check_suite_focus=true


cevich commented Feb 10, 2021

@barthy1 my apologies for the slow attention, my attention is divided across many fronts 😞 Looking now...


barthy1 commented Feb 10, 2021

@cevich np, it's not a PR so far - just a demonstration of the build GitHub Action.


cevich commented Feb 10, 2021

...some minor suggestions:

Line 35: Using date -u --iso-8601=seconds would improve readability, I think.
Line 58: This looks like some kind of conditional to me, but it's confusing; would it be clearer if done as an if: on the step?

Overall it looks really good. The 36-minute full-build time isn't that bad, considering it would only happen on branches and on a cron schedule (i.e. nobody's waiting for it). Unless you're working on something further, this looks ready to me for a PR. I'm happy to set up the robot accounts and github-action secrets once we have a PR.


barthy1 commented Feb 10, 2021

Line 58: This looks like some kind of conditional to me, but is confusing, would it be clearer if done as an if: on the step?

This condition is actually there to push the images only when it's on branches or a cron schedule. For other cases it would just build (to make sure that the build actually works) without pushing. The end of your comment makes me think it's better to run this step only when we want to push (if:) and skip it for other cases (to avoid the waiting problem).

Unless you're working on something further, this looks ready to me for a PR. I'm happy to setup the robot accounts and github-action secrets once we have a PR.

I am going to add one more part to the current setup, to build the upstream buildah version (in addition to the stable one) in parallel. After that I will submit the PR and ping you regarding secrets.
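The if:-on-the-step idea can be sketched like this; the step names and commands are hypothetical placeholders, and checking the event name is just one common way to express "only push for branch pushes and scheduled runs":

```yaml
# Sketch: build on every trigger, but only push when the run is not a
# pull request. Step contents are placeholders, not the PR's real steps.
- name: Build images
  run: make multiarch-build   # hypothetical build entrypoint

- name: Push images
  if: github.event_name != 'pull_request'
  run: make multiarch-push    # hypothetical push entrypoint
```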


cevich commented Feb 10, 2021

and skip it for other cases (to avoid waiting problem).

Thanks for the clarification. Ya, I think that might be beneficial, people like waiting even less than machines 😄

I am going to add one more part to current setup to build upstream buildah version(in addition to stable one)

This is where things with github-actions can make people lose their minds. IIRC, some workflows (like cron) only operate from the workflow YAML on the default branch. Others (pull request) will run on branches (if the YAML is there). Regardless, when a new release of buildah happens, the workflow YAML will appear on multiple branches, but only sometimes be used 😖

I would suggest splitting up the YAML into multiple files, based on the different on: conditions. However, last I checked there's no way to re-use steps from one workflow in another, so this would lead to lots of ugly duplication 😖 Anyway, the point is, I think initially it's perfectly fine to have a PR just focus on getting latest-master working. Leave the other branches up to a future PR - it will likely need lots and lots of comments covering the necessary complications due to (IMHO) the horrible design decisions WRT branches and workflow YAML.


barthy1 commented Feb 10, 2021

hmm, I guess I was not clear enough.
I'd just like to repeat the same setup as was done for skopeo, meaning build images for the
upstream version - https://github.com/containers/buildah/tree/master/contrib/buildahimage/upstream
and push to quay.io/buildah/upstream:master
and the
stable version - https://github.com/containers/buildah/tree/master/contrib/buildahimage/stable
and push to quay.io/buildah/stable:v... and quay.io/containers/buildah:v...

This can be done in one workflow, using a matrix to reuse almost everything except the path to the Dockerfile and the tags.
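A matrix of that shape might look roughly like the sketch below; the paths, tags, and action versions are illustrative assumptions, not the PR's actual values:

```yaml
# Sketch: one job definition builds both image flavors via a matrix.
strategy:
  matrix:
    include:
      - source: stable
        file: contrib/buildahimage/stable/Dockerfile
        tags: quay.io/buildah/stable:latest
      - source: upstream
        file: contrib/buildahimage/upstream/Dockerfile
        tags: quay.io/buildah/upstream:master
steps:
  - uses: actions/checkout@v2
  - uses: docker/setup-qemu-action@v1   # registers binfmt handlers for emulation
  - uses: docker/setup-buildx-action@v1
  - name: Build and push ${{ matrix.source }}
    uses: docker/build-push-action@v2
    with:
      file: ${{ matrix.file }}
      platforms: linux/amd64,linux/arm64,linux/ppc64le,linux/s390x
      push: true
      tags: ${{ matrix.tags }}
```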


cevich commented Feb 10, 2021

Oh I gotcha, yeah that's no problem.


barthy1 commented Feb 15, 2021

The PR is created - #3006
Sorry, it took some time to figure out the challenge that docker login doesn't support having creds for 2 repos of 1 org (quay.io/containers and quay.io/buildah), so I had to have separate login and push steps for the 2 repos :(
My test workflow run https://github.com/barthy1/buildah/actions/runs/558348033
For stable (contrib/buildahimage/stable) it was - 30m
For upstream (contrib/buildahimage/upstream) - 1h12m
At least the builds run in parallel.
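Because docker's credential store keys on the registry host (quay.io), not the repo path, the second login overwrites the first, which forces a login-then-push pair per organization. A hedged sketch of those steps (the secret names are hypothetical placeholders):

```yaml
# Sketch: one login + push per Quay organization, since only one
# credential can be stored for the quay.io host at a time.
- name: Push to quay.io/buildah
  run: |
    echo "${{ secrets.BUILDAH_QUAY_PASSWORD }}" | \
      docker login -u "${{ secrets.BUILDAH_QUAY_USERNAME }}" --password-stdin quay.io
    docker push quay.io/buildah/stable:latest

- name: Push to quay.io/containers
  run: |
    echo "${{ secrets.CONTAINERS_QUAY_PASSWORD }}" | \
      docker login -u "${{ secrets.CONTAINERS_QUAY_USERNAME }}" --password-stdin quay.io
    docker push quay.io/containers/buildah:latest
```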


cevich commented Feb 15, 2021

docker login doesn't support having creds for 2 repos of 1 org

Ahh yes, "that" problem 😞

@james-crowley

Sorry I am late to the discussion. I actually worked with Cirrus CI on enabling multi-arch workers/Persistent Workers. You should be able to add any architecture (arm64, ppc64le, s390x, amd64) to their platform. Once the workers are configured, you should be able to deploy jobs.

You can see more information in this issue: cirruslabs/cirrus-ci-docs#263

The persistent workers worked very well. If buildah needs access to s390x and ppc64le workers let me know.

Thank you @barthy1 for outlining everything!

@rhatdan Let me know if you need help/want access to hardware.

@TomSweeneyRedHat

@barthy1 first off, TYVM for your fine work here.
Does podman login not let you sign in to both? I've not looked at the PR yet, so I don't know if that would be an option for it.


barthy1 commented Feb 15, 2021

@TomSweeneyRedHat hmm, on my local test machine podman login had the same behaviour as docker login; however, I have Fedora 30 with an old podman, and docker is also installed on the same machine :(( so there could be a lot of confusion..
If the latest podman resolves the login problem, I will definitely test it in the github actions workflow (yes, it could make the code better), as podman is now available among the preinstalled tools - actions/runner-images#320


barthy1 commented Feb 15, 2021

just tested the behaviour in github actions. podman login to the 2 quay repos one after another, followed by 1 docker push with tags for the 2 quay repos, fails with an unauthorized error :(


barthy1 commented Feb 15, 2021

at least I confirmed that podman can now be used with Github actions easily :)


cevich commented Feb 15, 2021

Sorry I am late to the discussion. I actually worked with Cirrus CI on enabling multi-arch workers/Persistent Workers. You should be able to add any architecture (arm64, ppc64le, s390x, amd64) to their platform. Once the workers are configured, you should be able to deploy jobs.

@james-crowley I caught that feature the other week, very cool! Thanks for helping with that! As badly designed/immature as github-actions is, and as slow as emulation is, the implementation (in this case) is quite compact. Mainly we're using Github Actions here because it happened first and @barthy1 already volunteered, not because of technical superiority. So I certainly won't write off moving this over to native hardware under Cirrus-CI eventually, esp. if hardware is volunteered 😁 I would very much prefer to stay on one common automation platform anyway.


barthy1 commented Feb 16, 2021

@cevich feel free to just stop working on my PR if you think you can get a Cirrus-CI setup with native hardware going fast.
Anyway, Cirrus-CI is the target platform in the end, and my final goal is to have multi-arch builds, not to use Github actions ;)


cevich commented Feb 16, 2021

I don't think it will be "fast". We would need to script all the build, tag, and push aspects + test that it all works at scale, using a brand-new Cirrus-CI feature. Whereas you've got something already (in your PR) completely "scripted" and (it sounds like) already tested. Despite it being non-optimized and on the non-preferred Github Actions platform, "working and tested" is nearly always better than "theoretical and untested" and most certainly better than the current single-arch "cron jobs on a laptop" situation.

I've considered the "Sunk Cost Fallacy", but I think in this case it's fine to go out with what you have in your PR. Call it a "prototype" or "proof-of-concept" if you like. Then we plan on followup development of an optimized setup on Cirrus + native hardware. The main reason is...during the development of the optimized setup, however long that takes, the community can benefit from having automated multi-arch images built. It can also act as a fall-back, in case native hardware + Cirrus doesn't work out in the end.

Does that make sense? (I'm okay with being wrong about this)

@fkorotkov

Just stumbled on this issue and not sure if my information is still relevant now, but for future researchers...

Cirrus can build multi-arch images and use buildx. @cevich here is the packer template for the image with QEMU.


cevich commented Aug 19, 2021

Thanks @fkorotkov, I'll keep this in my back-pocket.

For this specific issue, we've got this mostly working in github-actions but I'm so very much not a fan. Our container-tools image build-logic has grown well beyond what simple buildx can easily accommodate. Meanwhile, github-actions has proven itself several times, as overly complex, restrictive, unreadable, and unmaintainable by non-experts.

So I'm currently working to re-implement this as simple bash scripts that we can then run on our Cirrus-CI VMs. Since I'm going that way anyway, moving over to our own container tooling also makes sense. See also #3268

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 4, 2023