
Build ARM docker images #4794

Merged · 10 commits into linkerd:main · Aug 5, 2020

Conversation

@aliariff (Contributor) commented Jul 24, 2020

@cpretzer @alpeb @kleimkuhler
These changes enable us to build ARM images.

Changes:

Follow-up:

  • Releasing the CLI binary in the ARM architecture. The Docker images resulting from these changes already build for the ARM arch; we still need further adjustments, such as retrieving those binaries and naming them correctly as part of the GitHub Release artifacts.

Signed-off-by: Ali Ariff <[email protected]>
@aliariff requested a review from a team as a code owner · July 24, 2020 18:21
@alpeb (Member) left a comment

Thanks @aliariff , I'm starting to look into this.
A first observation: the go-deps first stage in the Dockerfiles calls ./bin/install-deps to build some dependencies and thus warm the cache. I believe that also requires using GOARCH=... to match the target arch for the following stages.

@aliariff (Contributor Author)
@alpeb
So far the resulting images from these changes are working fine in an RPi cluster (without setting GOARCH in bin/install-deps). Could this be due to different behavior between go install and go build?

@cpretzer (Contributor)

Tested the Dockerfile changes and the images build for me locally. I chatted with @aliariff, and the build portions of the GitHub Actions succeed.

@alpeb (Member) left a comment

One thing that I had failed to make explicit is that now that we're relying on the GitHub cache for caching the stage layers, we have to try to limit the number and size of all the images, because the cache is capped at 2GB; after that point the oldest entries start to get evicted. Also, we'd like to keep down the gcr registry expenses 😉

With that in mind, I think it's best to rely on Go's excellent cross-platform abilities instead, and not use the --platform flag at all. If I understand correctly, whenever we use buildx --platform we create separate images for each platform for each stage, so we should avoid that.

So for example, for Dockerfile-proxy we shouldn't use BUILDPLATFORM and TARGETARCH, as they'll always point to linux/amd64. The go-deps stage remains as is; the fetch stage also runs on the default platform, but fetches the proxy binary appropriate for a platform given by a new ARG such as LINKERD_ARCH, which we'd have to pass as an argument to bin/docker-build-proxy. OTOH the final stage would start with FROM --platform=${LINKERD_ARCH} $RUNTIME_IMAGE as runtime, and that final stage would have to have separate images for each arch.
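To illustrate the proposed layout, here's a rough sketch (not the PR's actual Dockerfile: LINKERD_ARCH is the hypothetical ARG named above, and the image names, paths, and placeholder fetch step are illustrative):

ARG RUNTIME_IMAGE=debian:buster-slim
ARG LINKERD_ARCH=linux/amd64

# fetch runs on the default (linux/amd64) build platform, but would grab
# the proxy binary built for ${LINKERD_ARCH}; the echo stands in for the
# real download step.
FROM golang:1.14.2-alpine as fetch
ARG LINKERD_ARCH
RUN mkdir -p /out && echo "linkerd2-proxy for ${LINKERD_ARCH}" > /out/linkerd2-proxy

# Only the final stage is platform-qualified, so only it produces a
# separate image per arch.
FROM --platform=${LINKERD_ARCH} ${RUNTIME_IMAGE} as runtime
COPY --from=fetch /out/linkerd2-proxy /usr/lib/linkerd/linkerd2-proxy

The arch would then be selected per build with --build-arg LINKERD_ARCH=linux/arm64 (and so on for each arch).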

controller/Dockerfile would have all its stages default to linux/amd64, and its golang stage would have to specify the target arch for the go build command. The final scratch stage doesn't require a specific architecture, from my understanding. The same applies to cli/Dockerfile-bin.

Like Dockerfile-proxy, web/Dockerfile, cni-plugin/Dockerfile and Dockerfile-debug require an arch env var for the go build command, and only the final stage requires a platform qualifier.

For grafana/Dockerfile, we need to check if they publish for other archs.

Let me know if all this makes sense to you. The buildx docs aren't very clear on some parts, so I'm basing all this on my own experimentation, which could be wrong. Let me know if you'd like to talk offline to clarify things.

@alpeb (Member) commented Jul 27, 2020

I had added some inline comments to the changeset, but I've just deleted them since all that will change given my previous text wall 😬

@aliariff (Contributor Author) commented Jul 28, 2020

@alpeb
I think your approach is the reverse of what I'm doing here, but the basic idea of cross-compilation is the same.
I use buildx build --platform .... The reason I use FROM --platform=$BUILDPLATFORM in the Dockerfile is to reuse the image so that we can perform cross-compilation: in this case, the go-deps and fetch stages will only use the linux/amd64 image, and from that same linux/amd64 image the build cross-compiles to the other arches.
The runtime image, on the other hand, will be based on the target platform, which we specify when calling buildx build --platform linux/amd64,linux/arm64,.... By default, the target platform of the build request is used (ref), which is why we need to explicitly write BUILDPLATFORM. That is also why I didn't change the grafana part of the Dockerfile: by default it will retrieve the correct arch (i.e. the target platform) when we use buildx build --platform ....

And for grafana: yes, they already support ARM (ref).
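A minimal sketch of the cross-compilation pattern described here (the package path, stage names, and runtime image are illustrative, not taken from this PR):

# The build stage is pinned to the machine's own platform, so buildx pulls
# a single linux/amd64 golang image and shares its cached layers across
# every requested target platform.
FROM --platform=$BUILDPLATFORM golang:1.14.2-alpine as build
WORKDIR /build
COPY . .
# buildx populates TARGETOS/TARGETARCH from each --platform entry;
# declaring them as ARGs exposes them to this stage for cross-compilation.
ARG TARGETOS
ARG TARGETARCH
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /out/app .

# The final stage defaults to the target platform, so it yields one image
# per arch in the resulting manifest list.
FROM alpine:3.12
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]

Built with something like docker buildx build --platform linux/amd64,linux/arm64,linux/arm/v7 . (the trailing dot is the build context): the build stage runs once per arch but always from the same amd64 base, while only the final stages differ per platform.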

@aliariff (Contributor Author) commented Jul 28, 2020

This is what happens when we don't specify BUILDPLATFORM:
[screenshot of the buildx build output]
The go-deps stage tries to pull 3 different golang images, one for every arch.

But when we do specify it:
[screenshot of the buildx build output]
The golang image is pulled only once, and the other arches reuse it from the cache. Not only the image but the whole stage gets reused; in this go-deps example that means the bin/install-deps result is also cached, so the other arches simply reuse that stage.
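In Dockerfile terms, this is the single line the two screenshots are comparing (a sketch; the golang tag matches the one used elsewhere in this PR):

# Without --platform, the base image resolves against each requested target
# platform, so a three-arch build pulls three different golang images:
#   FROM golang:1.14.2-alpine as go-deps
# With $BUILDPLATFORM it always resolves to the build machine's platform,
# so a single pulled image, and the whole cached stage, is shared:
FROM --platform=$BUILDPLATFORM golang:1.14.2-alpine as go-deps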

@alpeb (Member) left a comment

Thanks for the explanation @aliariff. It wasn't clear to me that buildx --platform could use the same linux/amd64 base images in stages while iterating through the different architectures 👍

To reiterate my understanding: all intermediate stages will be built off of linux/amd64 base images and will each result in 3 different images. OTOH the final stages will be built off of each separate arch, also resulting in 3 images.

That makes sense for all the stages except webpack-bundle, which I believe doesn't produce binaries. But we can leave that as an optimization for later.

Also, to avoid the image proliferation I mentioned in my previous comment, we'd only want to generate these multi-platform images for the release.yml workflow.

I've added some more comments below.

Dockerfile-proxy (outdated) · Comment on lines 5 to 10
FROM --platform=$BUILDPLATFORM golang:1.14.2-alpine as go-deps
WORKDIR /linkerd-build
COPY go.mod go.sum ./
COPY bin/install-deps bin/
RUN go mod download
RUN ./bin/install-deps
@alpeb (Member):

This stage will be run separately for each arch, and go install will use that arch, so this is fine. What do you think, though, about using ARG TARGETARCH and passing that to install-deps, so it's clearer what's going on, like you did in the stages below?
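Concretely, the suggestion would amount to something like this (a sketch of the quoted stage, assuming bin/install-deps, like the Go toolchain it wraps, honors GOARCH from the environment):

FROM --platform=$BUILDPLATFORM golang:1.14.2-alpine as go-deps
WORKDIR /linkerd-build
COPY go.mod go.sum ./
COPY bin/install-deps bin/
RUN go mod download
# TARGETARCH is populated by buildx from each --platform entry
ARG TARGETARCH
RUN GOARCH=$TARGETARCH ./bin/install-deps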

@aliariff (Contributor Author):

It is possible, but then it will result in 3 different go-deps layers, each reused only for its own arch, not across arches. That is my understanding. LMKWYT

@alpeb (Member):

Isn't that stage already producing 3 different layers? This is an extract of the output I see currently:

=> CACHED [linux/amd64 go-deps 2/6] WORKDIR /linkerd-build                                                                                                                                                                               0.0s
 => CACHED [linux/amd64 go-deps 3/6] COPY go.mod go.sum ./                                                                                                                                                                                0.0s
 => CACHED [linux/amd64 go-deps 4/6] COPY bin/install-deps bin/                                                                                                                                                                           0.0s
 => CACHED [linux/amd64 go-deps 5/6] RUN go mod download                                                                                                                                                                                  0.0s 
 => CACHED [linux/amd64 go-deps 6/6] RUN ./bin/install-deps                                                                                                                                                                               0.0s 
 => CACHED [linux/amd64 golang 1/7] WORKDIR /linkerd-build                                                                                                                                                                                0.0s 
 => CACHED [linux/amd64 golang 2/7] COPY pkg/flags pkg/flags                                                                                                                                                                              0.0s 
 => CACHED [linux/amd64 golang 3/7] COPY pkg/tls pkg/tls                                                                                                                                                                                  0.0s 
 => CACHED [linux/amd64 golang 4/7] COPY pkg/version pkg/version                                                                                                                                                                          0.0s 
 => CACHED [linux/amd64 go-deps 2/6] WORKDIR /linkerd-build                                                                                                                                                                               0.0s 
 => CACHED [linux/amd64 go-deps 3/6] COPY go.mod go.sum ./                                                                                                                                                                                0.0s
 => CACHED [linux/amd64 go-deps 4/6] COPY bin/install-deps bin/                                                                                                                                                                           0.0s 
 => CACHED [linux/amd64 go-deps 5/6] RUN go mod download                                                                                                                                                                                  0.0s
 => CACHED [linux/amd64 go-deps 6/6] RUN ./bin/install-deps                                                                                                                                                                               0.0s
 => CACHED [linux/amd64 golang 1/7] WORKDIR /linkerd-build                                                                                                                                                                                0.0s
 => CACHED [linux/amd64 golang 2/7] COPY pkg/flags pkg/flags                                                                                                                                                                              0.0s
 => CACHED [linux/amd64 golang 3/7] COPY pkg/tls pkg/tls                                                                                                                                                                                  0.0s
 => CACHED [linux/amd64 golang 4/7] COPY pkg/version pkg/version                                                                                                                                                                          0.0s
 => [linux/amd64 golang 5/7] RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm go build -mod=readonly ./pkg/...                                                                                                                                     5.0s
 => [linux/amd64 golang 5/7] RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -mod=readonly ./pkg/...                                                                                                                                   4.9s
 => CACHED [linux/amd64 go-deps 2/6] WORKDIR /linkerd-build                                                                                                                                                                               0.0s
 => CACHED [linux/amd64 go-deps 3/6] COPY go.mod go.sum ./                                                                                                                                                                                0.0s
 => CACHED [linux/amd64 go-deps 4/6] COPY bin/install-deps bin/                                                                                                                                                                           0.0s
 => CACHED [linux/amd64 go-deps 5/6] RUN go mod download                                                                                                                                                                                  0.0s
 => CACHED [linux/amd64 go-deps 6/6] RUN ./bin/install-deps

go-deps is running 3 times with the same linux/amd64 base, but I think each run is for a different arch. Which is fine, since the ensuing golang stage requires a Go cache built for each of the archs.

@aliariff (Contributor Author)

After discussion with @alpeb, we decided to use GOARCH in the go-deps stage.
The reason is that when we don't pass the arch argument in that stage, the later golang stage (which builds on go-deps to perform the cross-compilation) only gets faster for amd64; compilation for the other arches takes a long time, because the go-deps cache is built for amd64 only.
And yes, this will make the cache layers bigger, because the go-deps stage now has three different layers. We will address that issue in a separate PR.

@aliariff (Contributor Author) commented Aug 3, 2020

Another discussion with @alpeb:
We concluded that we will not use a separate cache key for the multi-arch images, because GitHub removes any cache entries that have not been accessed in over 7 days. Most use of the release.yml workflow is the weekly edge release, so there is no point in a separate multi-arch cache key: by the time it would be used again a week later, GitHub would already have evicted those cache entries.

Ref: https://docs.github.com/en/actions/configuring-and-managing-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy

@alpeb (Member) left a comment

Thanks @aliariff. This tested fine in my fork, triggering the release.yml workflow and pushing the final images to gcr.io, but only as long as the build wasn't repeated. If the images are already published in gcr.io and another build tries to push the same images, the docker_build job fails with something like:

------
 > exporting to image:
------
failed to solve: rpc error: code = Unknown desc = failed commit on ref "manifest-sha256:c7256921cfd892be4bd7989189478065a8679bb457ff4c29fea09a26807dcf83": unexpected status: 401 Unauthorized

(see https://github.com/alpeb/linkerd2/runs/943114279)

In the buildx repo there are mentions of this error, but only in the context of very long builds, which is not our case, especially when running with a cache hit: docker/buildx#324 and docker/buildx#327

Looks like it has to do with the manifests and not the images themselves...

Would you please dig around and see what could be going on? This could impact us whenever we hit some flakiness in the release build, which is not uncommon, and it would mean having to clear the images in gcr.io before triggering the build again.

@alpeb (Member) left a comment

Awesome, thanks @aliariff!
I'll re-trigger the build, because it seems GitHub was having some downtime.

@alpeb (Member) left a comment

@aliariff, can you please merge main into this branch again so it picks up the static checks workaround?

@cpretzer (Contributor) left a comment

LGTM 🚢

Excellent work @aliariff and great reviews @alpeb

@kleimkuhler (Contributor) left a comment

Overall this looks great. I have a question about the purpose DOCKER_PUSH serves, and whether we would ever use it without DOCKER_MULTIARCH.

Once I understand that I have no blockers.

@kleimkuhler (Contributor) left a comment

Looks great @aliariff 👍

@kleimkuhler merged commit 61d7ded into linkerd:main · Aug 5, 2020