
Produce x86_64 & ARM64 fedora container images #381

Closed
wants to merge 1 commit

Conversation

cevich
Member

@cevich cevich commented Aug 16, 2024

Depends on: #380

At the time of this commit, podman's Makefile has a target to allow
validating code changes locally (validatepr). However, it's based
on a bespoke image completely unassociated with the image used in CI.
This can easily lead to a situation where validation passes in the local
environment but fails in CI. Support the podman validatepr target
use of quay.io/libpod/fedora_podman:latest images by performing
a manifest-list build that includes arm64 (a.k.a. aarch64).

The trade-off here is image build-time, since emulation is
extremely slow (over an hour). Therefore, the container_images CI
task has also been removed as a dependency from base_images CI task,
allowing them to run in parallel.
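In `.cirrus.yml` terms, that dependency change amounts to dropping the container task from the base-image task's `depends_on` list. A hypothetical sketch (task names, scripts, and image names here are illustrative, not this repo's actual config):

```yaml
# Hypothetical .cirrus.yml fragment; real task and script names may differ.
container_images_task:
  # Slow multi-arch manifest-list build (over an hour under emulation).
  build_script: ./ci/build_container_images.sh

base_images_task:
  # Previously: depends_on: [container_images]
  # Dropping that line lets the VM image builds start in parallel
  # with the container build instead of waiting on it.
  gce_instance:
    image_name: image-builder
  build_script: ./ci/build_base_images.sh
```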

Note: This will not impact pulling the image, since the client always
only pulls the layers necessary for the indicated architecture.

Ref: Podman's validatepr Makefile target
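The manifest-list build described above can be sketched with podman directly. Everything below (the qemu registration image, tags, build context) is illustrative rather than the repo's actual build tooling, so the real commands are shown commented out:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: build a multi-arch manifest list for the
# fedora_podman image, with arm64 produced under qemu emulation on an
# x86_64 builder (the slow part noted in the commit message).
set -euo pipefail

MANIFEST=quay.io/libpod/fedora_podman:latest

# Register qemu-user-static binfmt handlers so arm64 RUN steps can
# execute under emulation:
# sudo podman run --rm --privileged docker.io/multiarch/qemu-user-static --reset -p yes

# Build both architectures into a single manifest list:
# podman build --platform linux/amd64,linux/arm64 --manifest "$MANIFEST" .

# Push the manifest list along with both per-arch images:
# podman manifest push --all "$MANIFEST" "docker://$MANIFEST"

echo "Would build and push manifest list: $MANIFEST"
```

Clients pulling `$MANIFEST` then resolve the manifest list to their own architecture, which is why the pull cost is unaffected.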

@cevich cevich force-pushed the multiarch_podman_image branch 7 times, most recently from 30393ed to d417fd4 Compare August 19, 2024 18:55
Makefile: review comment (outdated, resolved)
@cevich cevich force-pushed the multiarch_podman_image branch 2 times, most recently from 2b88b98 to 4f9f677 Compare August 20, 2024 13:54
@cevich cevich requested review from baude and edsantiago August 20, 2024 15:18
@cevich cevich force-pushed the multiarch_podman_image branch from 4f9f677 to ad726ea Compare August 20, 2024 15:48
@cevich cevich marked this pull request as ready for review August 20, 2024 15:54

Cirrus CI build successful. Found built image names and IDs:

Stage   Image Name                   IMAGE_SUFFIX
base    debian                       do-not-use
base    fedora                       do-not-use
base    fedora-aws                   do-not-use
base    fedora-aws-arm64             do-not-use
base    image-builder                do-not-use
base    prior-fedora                 do-not-use
cache   build-push                   c20240820t151015z-f40f39d13
cache   debian                       c20240820t151015z-f40f39d13
cache   fedora                       c20240820t151015z-f40f39d13
cache   fedora-aws                   c20240820t151015z-f40f39d13
cache   fedora-netavark              c20240820t151015z-f40f39d13
cache   fedora-netavark-aws-arm64    c20240820t151015z-f40f39d13
cache   fedora-podman-aws-arm64      c20240820t151015z-f40f39d13
cache   fedora-podman-py             c20240820t151015z-f40f39d13
cache   prior-fedora                 c20240820t151015z-f40f39d13
cache   rawhide                      c20240820t151015z-f40f39d13
cache   win-server-wsl               c20240820t151015z-f40f39d13

@edsantiago
Member

Package version diff across the new images (second line per package shows the previous version; ⇑ marks an upgrade):

debian prior-fedora fedora fedora-aws rawhide
base 13.5 39-1.5 Generic ? 42-0
41-0 ⇑
kernel 6.10.4-1 6.5.6-300 6.8.5-301 6.8.5-301 6.8.5-301
6.9.12-1 ⇑ 6.9.12-100 ⇑ 6.9.12-200 ⇑ 6.9.12-200 ⇑
grub2-common 2.12-5 2.06-121 2.06-123 2.06-123 2.12-4
2.06-124 ⇑
aardvark-dns 1.9.0-2 1.12.1-1 1.12.1-1 1.12.1-1 1.12.1-1
1.6.0-3 ⇑ 1.11.0-1 ⇑ 1.11.0-3 ⇑
netavark 1.9.0-4 1.12.1-1 1.12.2-1 1.12.1-1 1.12.2-1
1.6.0-2.1 ⇑ 1.11.0-1 ⇑ 1.12.1-1 ⇑ 1.11.0-3 ⇑ 1.11.0-3 ⇑
buildah 1.37.1+ds1-2 1.37.0-1 1.37.1-1 1.37.0-1 1.37.1-1
1.35.3+ds1-3 ⇑ 1.36.0-1 ⇑ 1.37.0-1 ⇑ 1.36.0-1 ⇑ 1.37.0-1 ⇑
containers-common ? 1-99 0.60.1-1 0.60.0-1 0.60.1-1
0.60.0-1 ⇑ 0.60.0-1 ⇑
crun 1.16.1-1 1.15-1 1.15-1 1.15-1 1.15-2
1.15-1 ⇑
docker-ce 5:27.1.2-1~debian.12~bookworm ? ? ? ?
5:27.1.1-1~debian.12~bookworm ⇑
golang 2:1.22~3 1.21.12-1 1.22.6-1 1.22.6-1 1.23.0-2
1.22.5-1 ⇑ 1.22.5-1 ⇑ 1.23~rc2-1 ⇑
gvisor-tap-vsock ? 0.7.4-1 ? ? ?
0.7.3-1 ⇑
passt 2024-08-14 2024-06-24 2024-08-14 2024-08-14 2024-08-14
2024-07-26 ⇑ 2024-07-26 ⇑ 2024-06-24 ⇑ 2024-07-26 ⇑
podman 5.2.1+ds1-2 4.9.4-1 5.2.1-1 5.2.1-1 5.2.1-1
5.0.3+ds1-5 ⇑ 5.2.0-1 ⇑ 5.1.2-1 ⇑ 5.2.0~rc2-1 ⇑
runc 1.1.12+ds1-5 1.1.12-1 1.1.12-3 1.1.12-3 1.1.12-4
1.1.12+ds1-2 ⇑
skopeo 1.13.3+ds1-2+b2 1.16.0-1 1.16.0-1 1.16.0-1 1.16.0-1
1.13.3+ds1-2+b1 ⇑ 1.15.2-1 ⇑ 1.15.2-1 ⇑
systemd 256.5-1 254.16-1 255.10-3 255.10-3 256.4-1
256.4-2 ⇑ 255.10-1 ⇑ 255.10-1 ⇑

@cevich
Member Author

cevich commented Aug 20, 2024

Hrmmm, it looks like a bunch of tasks are still waiting for the container task to finish. We don't want that since the new multi-arch fedora container takes >1hr to build.

@cevich cevich force-pushed the multiarch_podman_image branch from ad726ea to 0b53866 Compare August 20, 2024 17:43
@cevich
Member Author

cevich commented Aug 20, 2024

Force-push: Fixed up commit message + simplified diff slightly. I'm not sure why many/most of the VM build tasks aren't running in parallel w/ container-based tasks. Maybe it's some Cirrus-CI quota/restriction.

@edsantiago
Member

Can you give context on what it is you noticed? All I saw was green CI

@cevich
Member Author

cevich commented Aug 20, 2024

Can you give context on what it is you noticed?

The commit message needed a minor tweak. I thought I could simplify the YAML and just make everything use the &IBI_VM alias, but I forgot the nested virt stuff is important too -sigh-.

@cevich
Member Author

cevich commented Aug 20, 2024

More detail: I'm afraid it "accidentally" passed. This test job failure makes me think we simply got lucky and were assigned a machine that supports nested-virt (which is required for the base image builds). I want to try to encode that requirement into the &IBI_VM alias so it's guaranteed.

@cevich cevich force-pushed the multiarch_podman_image branch from 0b53866 to 7c226ef Compare August 20, 2024 19:15
@cevich
Member Author

cevich commented Aug 20, 2024

There, I think this should be better. The last thing I want is to add a flake into these builds 😕

@cevich
Member Author

cevich commented Aug 20, 2024

Re:

I'm not sure why many/most of the VM build tasks aren't running in parallel w/ container-based tasks.

I was looking at the task scheduling sequence (green bars on the right). It seemed like a bunch of tasks were waiting for the new, slow container build. It may have just been a fluke, though; nothing in the dependency tree suggested the VM builds should block.

@cevich
Member Author

cevich commented Aug 20, 2024

Error: statfs /dev/kvm: no such file or directory

That's the "nested-virt isn't supported" problem 😞

@cevich
Member Author

cevich commented Aug 20, 2024

Hrmmm, okay, taking a step back. Let me just go back to the CI-green commit, then only fix the commit message, and add in the enable_nested_virtualization: true.
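Cirrus CI exposes that knob directly on GCE instances. A sketch of the relevant `.cirrus.yml` fragment (the task and image names are illustrative; only `enable_nested_virtualization: true` is the point):

```yaml
# Hypothetical sketch: require nested virtualization for image-build VMs,
# so /dev/kvm is guaranteed to exist rather than depending on which
# machine the scheduler happens to assign.
image_build_task:
  gce_instance:
    image_name: image-builder
    enable_nested_virtualization: true
```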

@cevich cevich force-pushed the multiarch_podman_image branch from 7c226ef to 0b13b48 Compare August 20, 2024 19:30
@cevich
Member Author

cevich commented Aug 20, 2024

I was looking at the task scheduling sequence (green-bars on the right).

Confirmed, this is misleading. VM builds are running concurrently with the container builds as intended.

@cevich
Member Author

cevich commented Aug 20, 2024

I'm going to abandon this; the emulated build is just too slow. The overall CI VM image build process is complex and lengthy enough already. Nobody wants more complexity and an even slower (1.5-hour) container build on top.

If somebody wants to take this up in the future, I'd suggest doing a native arm64 build, then (somehow) combining the two into a manifest list after the fact. This is also complex, but at least it'd run quickly (probably around 15 minutes).
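The after-the-fact combination could look roughly like this. The per-arch tags are hypothetical (they assume CI pushed architecture-specific images from native builders), so the real commands are shown commented out:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: stitch two natively-built, per-arch images into
# one manifest list after the fact.  Tags are illustrative.
set -euo pipefail

MANIFEST=quay.io/libpod/fedora_podman:latest

# Assumed to have been pushed by native x86_64 and arm64 build tasks:
AMD64_IMG=quay.io/libpod/fedora_podman:amd64
ARM64_IMG=quay.io/libpod/fedora_podman:arm64

# Create an empty manifest list, add both per-arch images, and push:
# podman manifest create "$MANIFEST"
# podman manifest add "$MANIFEST" "docker://$AMD64_IMG"
# podman manifest add "$MANIFEST" "docker://$ARM64_IMG"
# podman manifest push --all "$MANIFEST" "docker://$MANIFEST"

echo "Would combine $AMD64_IMG + $ARM64_IMG into $MANIFEST"
```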
