Build ARM docker images #4794
Conversation
Thanks @aliariff, I'm starting to look into this.
One first observation: the `go-deps` first stage in the Dockerfiles calls `./bin/install-deps` to build some dependencies and thus warm the cache. I believe that also requires using `GOARCH=...` to match the target arch for the following stages.
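Something along these lines, I think (a rough sketch on my side; it assumes buildx's automatic `TARGETARCH` arg is available and that `install-deps` honors the usual `go` env vars):

```dockerfile
FROM --platform=$BUILDPLATFORM golang:1.14.2-alpine as go-deps
WORKDIR /linkerd-build
COPY go.mod go.sum ./
COPY bin/install-deps bin/
RUN go mod download
# TARGETARCH is populated automatically by buildx for each --platform entry
ARG TARGETARCH
# warm the build cache for the arch that the following stages will target
RUN GOOS=linux GOARCH=$TARGETARCH ./bin/install-deps
```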
@alpeb Tested the Dockerfile changes and the images are building for me locally. I chatted with @aliariff and the build portions of the GitHub Actions succeed.
One thing that I had failed to make explicit is that now that we're relying on the GitHub cache for caching the stage layers, we have to try to limit the number and size of all the images, because the cache fills up at 2GB. After that point the oldest images start to get evicted. Also we'd like to keep down the gcr registry expenses 😉
With that in mind, I think it's best to rely on `go`'s excellent cross-platform abilities instead, and not use the `--platform` flag at all. If I understand correctly, whenever we use `buildx --platform` we create separate images for each platform for each stage, so we should avoid that.
So for example for `Dockerfile-proxy`, we shouldn't use `BUILDPLATFORM` and `TARGETARCH`, as they'll always point to `linux/amd64`. The `go-deps` stage remains as is, and the `fetch` stage also runs on the default platform, but fetches the appropriate proxy binary for a platform given by a new `ARG` such as `LINKERD_ARCH` that we'd have to pass as an argument to `bin/docker-build-proxy`. OTOH the final stage would start with `FROM --platform=${LINKERD_ARCH} $RUNTIME_IMAGE as runtime`, and that final stage would have to have separate images for each arch.
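A rough sketch of that shape for `Dockerfile-proxy` (this is just my reading of the proposal, not the actual file; the `fetch-proxy` helper, image names and paths are placeholders):

```dockerfile
# args declared before the first FROM can be used in any FROM line
ARG RUNTIME_IMAGE=debian:buster-slim
ARG LINKERD_ARCH=linux/amd64

# the fetch stage runs on the default (linux/amd64) platform
FROM golang:1.14.2-alpine as fetch
ARG LINKERD_ARCH
WORKDIR /build
COPY bin/fetch-proxy bin/
# hypothetical helper: downloads the proxy binary built for $LINKERD_ARCH
RUN mkdir -p /out && bin/fetch-proxy "$LINKERD_ARCH" /out/linkerd2-proxy

# only the final stage is qualified with the target platform
FROM --platform=${LINKERD_ARCH} $RUNTIME_IMAGE as runtime
COPY --from=fetch /out/linkerd2-proxy /usr/lib/linkerd/linkerd2-proxy
ENTRYPOINT ["/usr/lib/linkerd/linkerd2-proxy"]
```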
`controller/Dockerfile` would have all its stages default to `linux/amd64`. Likewise, the `golang` stage will have to specify the target arch for the `go build` command. And the final `scratch` stage doesn't require a specific architecture, from my understanding. The same applies to `cli/Dockerfile-bin`.
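Concretely, I'd expect the cross-compiling `golang` stage to look something like this (a sketch under the above assumptions; the stage layout, package path, and the `TARGET_GOARCH` arg name are illustrative, not the actual file):

```dockerfile
FROM go-deps as golang
WORKDIR /linkerd-build
COPY controller controller
COPY pkg pkg
# hypothetical build arg carrying the Go arch (amd64, arm64, arm),
# passed in by the build script rather than derived from --platform
ARG TARGET_GOARCH=amd64
# cross-compile: the stage itself stays on linux/amd64,
# only the emitted binary targets the requested arch
RUN mkdir -p /out && \
    CGO_ENABLED=0 GOOS=linux GOARCH=$TARGET_GOARCH \
    go build -mod=readonly -o /out/controller ./controller/cmd

# scratch is arch-neutral; the image just holds the cross-compiled binary
FROM scratch
COPY --from=golang /out/controller /controller
ENTRYPOINT ["/controller"]
```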
Like `Dockerfile-proxy`, `web/Dockerfile`, `cni-plugin/Dockerfile` and `Dockerfile-debug` require an arch env for the `go build` command, and only the final stage requires a platform qualifier.
For `grafana/Dockerfile`, we need to check if they publish for other archs.
Let me know if all this makes sense to you. The buildx docs aren't very clear on some parts, so I'm basing all this on my own experimentation, which can be wrong. Let me know if you'd like to talk offline to clarify things.
I had added some inline comments to the changeset, but I've just deleted them since all that will change given my previous text wall 😬
@alpeb And for
Thanks for the explanation @aliariff. It wasn't clear to me that `buildx --platform` could use the same `linux/amd64` base images in stages while iterating through the different architectures 👍
To reiterate my understanding: all intermediary images will be built off of `linux/amd64` base images and will result in 3 different images each. OTOH the final stages will be built off of each separate arch, also resulting in 3 images.
That makes sense for all the stages except for `webpack-bundle`, which I believe doesn't produce binaries. But we can leave that as an optimization for later.
Also, to avoid the image proliferation I mentioned in my previous comment, we'd only want to generate these multiplatform images for the `release.yml` workflow.
I've added some more comments below.
Dockerfile-proxy
FROM --platform=$BUILDPLATFORM golang:1.14.2-alpine as go-deps
WORKDIR /linkerd-build
COPY go.mod go.sum ./
COPY bin/install-deps bin/
RUN go mod download
RUN ./bin/install-deps
This stage will be run separately for each arch and `go install` will use that arch, so this is fine. What do you think though about using `ARG TARGETARCH` and passing that to `install-deps`, so it's clearer what's going on, like you did in the stages below?
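i.e. keeping the stage as-is but ending it with something like (whether `install-deps` accepts an arch argument is an assumption on my part):

```dockerfile
# TARGETARCH is populated automatically by buildx for each --platform entry
ARG TARGETARCH
# make the target arch explicit instead of relying on the implicit GOARCH
RUN ./bin/install-deps "$TARGETARCH"
```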
It is possible, but then it will result in 3 different `go-deps` layers. Each will get reused for its own arch only, not across archs. That is my understanding. LMKWYT
Isn't that stage already producing 3 different layers? This is an extract of the output I see currently:
=> CACHED [linux/amd64 go-deps 2/6] WORKDIR /linkerd-build 0.0s
=> CACHED [linux/amd64 go-deps 3/6] COPY go.mod go.sum ./ 0.0s
=> CACHED [linux/amd64 go-deps 4/6] COPY bin/install-deps bin/ 0.0s
=> CACHED [linux/amd64 go-deps 5/6] RUN go mod download 0.0s
=> CACHED [linux/amd64 go-deps 6/6] RUN ./bin/install-deps 0.0s
=> CACHED [linux/amd64 golang 1/7] WORKDIR /linkerd-build 0.0s
=> CACHED [linux/amd64 golang 2/7] COPY pkg/flags pkg/flags 0.0s
=> CACHED [linux/amd64 golang 3/7] COPY pkg/tls pkg/tls 0.0s
=> CACHED [linux/amd64 golang 4/7] COPY pkg/version pkg/version 0.0s
=> CACHED [linux/amd64 go-deps 2/6] WORKDIR /linkerd-build 0.0s
=> CACHED [linux/amd64 go-deps 3/6] COPY go.mod go.sum ./ 0.0s
=> CACHED [linux/amd64 go-deps 4/6] COPY bin/install-deps bin/ 0.0s
=> CACHED [linux/amd64 go-deps 5/6] RUN go mod download 0.0s
=> CACHED [linux/amd64 go-deps 6/6] RUN ./bin/install-deps 0.0s
=> CACHED [linux/amd64 golang 1/7] WORKDIR /linkerd-build 0.0s
=> CACHED [linux/amd64 golang 2/7] COPY pkg/flags pkg/flags 0.0s
=> CACHED [linux/amd64 golang 3/7] COPY pkg/tls pkg/tls 0.0s
=> CACHED [linux/amd64 golang 4/7] COPY pkg/version pkg/version 0.0s
=> [linux/amd64 golang 5/7] RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm go build -mod=readonly ./pkg/... 5.0s
=> [linux/amd64 golang 5/7] RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -mod=readonly ./pkg/... 4.9s
=> CACHED [linux/amd64 go-deps 2/6] WORKDIR /linkerd-build 0.0s
=> CACHED [linux/amd64 go-deps 3/6] COPY go.mod go.sum ./ 0.0s
=> CACHED [linux/amd64 go-deps 4/6] COPY bin/install-deps bin/ 0.0s
=> CACHED [linux/amd64 go-deps 5/6] RUN go mod download 0.0s
=> CACHED [linux/amd64 go-deps 6/6] RUN ./bin/install-deps
`go-deps` is running 3 times, with the same `linux/amd64` base, but I think each run is for a different arch. Which is fine, since the ensuing `golang` stage requires a `go` cache built for each one of the archs.
After discussion with @alpeb
Another discussion with @alpeb,
Thanks @aliariff. This tested fine in my fork triggering the `release.yml` workflow and pushing the final images to gcr.io, but only as long as the build wasn't repeated. So if the images are already published in gcr.io and another build tries to push the same images, the `docker_build` job fails with something like:
------
> exporting to image:
------
failed to solve: rpc error: code = Unknown desc = failed commit on ref "manifest-sha256:c7256921cfd892be4bd7989189478065a8679bb457ff4c29fea09a26807dcf83": unexpected status: 401 Unauthorized
(see https://github.com/alpeb/linkerd2/runs/943114279)
In the buildx repo there are mentions of this error, but only in the context of very long builds, which is not our case, especially when running with a cache hit: docker/buildx#324 and docker/buildx#327
Looks like it has to do with the manifests and not the images themselves...
Would you please dig around and see what could be going on? I see this could impact us whenever we hit some flakiness in the release build, which is not uncommon, and this problem would imply having to clear the images in gcr.io before triggering the build again.
Awesome, thanks @aliariff!
I'll re-trigger the build because it seems GitHub was having some downtime.
@aliariff can you please merge `main` into this branch again so it picks up the static checks workaround?
Overall this looks great. I have a comment about the purpose that `DOCKER_PUSH` serves, and whether we would ever use it without `DOCKER_MULTIARCH`.
Once I understand that I have no blockers.
Looks great @aliariff 👍
@cpretzer @alpeb @kleimkuhler
These changes enable us to build ARM images.
Changes:
- Add new `DOCKER_MULTIARCH` and `DOCKER_PUSH` flags. When set, the build produces multi-arch images and pushes them to the registry. See docker/buildx#59 ("Building images for multi-arch with --load parameter fails") for why they must be pushed to the registry.
- `crazy-max/ghaction-docker-buildx` is used because it comes already configured for cross-compilation (using QEMU), so we can just use it instead of setting that up manually.
- The Dockerfiles rely on the automatic platform `ARG`s that `buildx` now provides in the global scope. (See: https://docs.docker.com/engine/reference/builder/#automatic-platform-args-in-the-global-scope)

Follow-up: