
Build arm64 main image #1487

Merged · 26 commits · Apr 16, 2023

Conversation

felixvanoost
Contributor

@felixvanoost felixvanoost commented Mar 25, 2023

Builds an arm64 variant of the main Kroki Docker image by leveraging the changes introduced in #1476. All images should now natively support the ARM architecture. This includes the following changes:

  • Creates a nightly workflow to build the multi-arch images every night at 02:00 UTC.
  • Restricts the Docker Buildx builder to 2 parallel threads. This matches the 2 cores available on the GitHub CI runners and prevents timeouts during the build process.
  • Runs a free disk space action at the start of the CI workflow to recover extra disk space from the CI runner. This is now necessary to successfully complete the build process.
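A nightly trigger of this shape is the conventional way to express the 02:00 UTC schedule described above (an illustrative fragment; the actual workflow file in this PR may differ):

```yaml
# Illustrative GitHub Actions trigger for the nightly multi-arch build
on:
  schedule:
    - cron: '0 2 * * *'   # every night at 02:00 UTC
  workflow_dispatch: {}   # allow manual runs as well
```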

@ggrossetie
Member

Could you please rebase on main? Thanks!

@felixvanoost felixvanoost force-pushed the build-arm64-main-image branch from c2364ad to 5e9964b on March 25, 2023 at 22:26
@felixvanoost
Contributor Author

The images are building locally but I've been running into several issues with the CI:

  • The pipeline was initially reporting a Docker error 317 caused by cargo install taking too long and using too much memory to fetch the crates.io index, as described in this issue. I resolved this by upgrading to the latest Rust release and enabling sparse-registry.
  • Next, we started getting some strange network issues with npm install, which I resolved by setting maxsockets=1, though that probably doesn't do the pipeline any favours in terms of speed.
  • Now it's failing because of a timeout in many of the tests. No idea yet what's causing this, but I will continue looking later.
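For reference, the first two workarounds can be expressed as config files roughly like this (a sketch based on the standard Cargo and npm options; the exact changes in this PR may differ):

```shell
# Sketch of the two workarounds (file contents assumed, not taken from the PR).

# 1. Sparse crates.io index (stabilized in Rust 1.68): cargo fetches index
#    entries over HTTP instead of cloning the full git index.
mkdir -p .cargo
cat > .cargo/config.toml <<'EOF'
[registries.crates-io]
protocol = "sparse"
EOF

# 2. Cap npm at a single connection to work around flaky networking.
echo 'maxsockets=1' > .npmrc
```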

The full pipeline now takes over an hour because it has to build both arm64 (using QEMU, which is slower than native) and amd64 builds. Not ideal, and will probably need to be resolved through more static images.

@ggrossetie
Member

The full pipeline now takes over an hour because it has to build both arm64 (using QEMU, which is slower than native) and amd64 builds. Not ideal, and will probably need to be resolved through more static images.

Indeed 🙀
However, I don't think we need to build both arm64 and amd64 on every commit. We could set up a nightly build (with both amd64 and arm64) and a CI build with only amd64.

The images are building locally but I've been running into several issues with the CI:

We were already close to the GitHub runner limit and I guess adding the arm64 build tipped the balance.
Anyway, thanks for finding workarounds and improvements 👍🏻

Also it might be possible to reduce the parallelization of buildx by running it multiple times on different targets:

docker buildx bake kroki-python
docker buildx bake kroki-js
docker buildx bake kroki-java

@felixvanoost felixvanoost force-pushed the build-arm64-main-image branch from d36bb7b to a161e3a on March 31, 2023 at 22:53
@felixvanoost
Contributor Author

I've defined a nightly CI workflow that will build the multi-arch images and also split up the docker buildx bake calls for the main vs. companion images to reduce parallelisation. All of this works fine locally, but we're still getting a timeout error for some of the tests defined in /vega/tests/test.js.

@ggrossetie I couldn't figure out where the 2000ms timeout is defined for these tests. Is there a global config somewhere? I think that the problem would be resolved by increasing the timeout slightly, because running the arm64 image on QEMU is slower than native.

@ggrossetie
Member

ggrossetie commented Apr 2, 2023

I couldn't figure out where the 2000ms timeout is defined for these tests. Is there a global config somewhere? I think that the problem would be resolved by increasing the timeout slightly, because running the arm64 image on QEMU is slower than native.

I believe this is Mocha's default value, but you can override it using this.timeout(15000) (15 seconds, in milliseconds) inside the describe function in

kroki/vega/tests/test.js

Lines 14 to 15 in b039f17

describe('#convert', function () {
it('should throw UnsafeIncludeError in secure mode when the Vega-Lite specification contains data.url', async function () {
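An alternative to overriding the timeout per-suite is a project-level Mocha config; Mocha 6+ picks up a .mocharc file automatically (a sketch of that option, not what this PR actually does):

```shell
# Sketch: raise Mocha's default 2000 ms timeout project-wide instead of
# per-suite (file name per Mocha's documented .mocharc support).
cat > .mocharc.json <<'EOF'
{
  "timeout": 15000
}
EOF
```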

@felixvanoost
Contributor Author

After resolving the parallelism issues (I think), I've now encountered issues with running out of disk space on the runners. For now I'm trying to run this action at the start of the workflow, and I'll see if that fixes things.

@ggrossetie Please let me know if you'd like me to stop this; I don't want to waste all your CI minutes. 🙃

@felixvanoost
Contributor Author

This is now good to go from my side. I've updated the PR description above with the main changes.

I tested both the regular (native-arch) and multi-arch CI workflows manually in the last two commits. Nightly and release builds are multi-arch; regular builds are native-arch.

@ggrossetie
Member

@felixvanoost perfect, I will take a closer look this weekend 👍🏻

@ggrossetie
Member

I will try to build the arm64 image on an Ampere A1 Compute machine. I heard that QEMU is ~20 times slower.

@ggrossetie
Member

Also, in the next version we will use UMLet and PlantUML as native images (using GraalVM), so we need to produce arm64-compatible native images.

Comment on lines 26 to 33
- name: Free up disk space
  uses: jlumbroso/free-disk-space@main
  with:
    android: true
    dotnet: true
    haskell: true
    large-packages: true
    swap-storage: true
Member

I think we should add a condition so this action only runs when build_multiarch is true
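Such a condition could look like this (assuming build_multiarch is available as a workflow input; the name comes from the comment above and is not verified against the workflow file):

```yaml
- name: Free up disk space
  if: ${{ inputs.build_multiarch == 'true' }}
  uses: jlumbroso/free-disk-space@main
  with:
    android: true
    dotnet: true
    haskell: true
    large-packages: true
    swap-storage: true
```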

Comment on lines 50 to 55
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v2
  with:
    config-inline: |
      [worker.oci]
      max-parallelism = 2
Member

Same here, we should only set max-parallelism to 2 when build_multiarch is true
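One way to express that (again assuming a build_multiarch workflow input) is two mutually exclusive setup steps:

```yaml
- name: Set up Docker Buildx (multi-arch, limited parallelism)
  if: ${{ inputs.build_multiarch == 'true' }}
  uses: docker/setup-buildx-action@v2
  with:
    config-inline: |
      [worker.oci]
      max-parallelism = 2

- name: Set up Docker Buildx
  if: ${{ inputs.build_multiarch != 'true' }}
  uses: docker/setup-buildx-action@v2
```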

@ggrossetie ggrossetie merged commit 9cd4a45 into yuzutech:main Apr 16, 2023
@ggrossetie
Member

Thanks for all your great work @felixvanoost 🙌🏻

@felixvanoost
Contributor Author

My pleasure!

ggrossetie added a commit that referenced this pull request Apr 16, 2023
Co-authored-by: Guillaume Grossetie <[email protected]>
@felixvanoost
Contributor Author

felixvanoost commented Apr 17, 2023

I will try to build the arm64 image on an Ampere A1 Compute machine. I heard that QEMU is ~20 times slower.

@ggrossetie A native arm64 runner would be great. I ended up using cross-compilation (see here) for most of the builds, so the CI build times should be much lower than they were initially.

QEMU is now only being used for D2, Pikchr, Mermaid, Wireviz, Blockdiag, and to put together the main Kroki image. Of those, D2, Pikchr, and Mermaid can most likely be cross-compiled with some additional effort.
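The cross-compilation pattern mentioned above generally looks like this: the compiler runs natively on the build host and targets the other architecture, so QEMU is only needed for stages that must actually execute foreign binaries (an illustrative Dockerfile, not one of Kroki's):

```shell
# Illustrative cross-compiling Dockerfile stage (Go is used as the example
# toolchain; Kroki's real Dockerfiles may differ).
cat > Dockerfile.cross <<'EOF'
# The build stage runs on the host architecture regardless of target platform
FROM --platform=$BUILDPLATFORM golang:1.20 AS build
ARG TARGETOS TARGETARCH
WORKDIR /src
COPY . .
# Go cross-compiles natively; no emulation involved
RUN GOOS=$TARGETOS GOARCH=$TARGETARCH CGO_ENABLED=0 go build -o /out/app .
EOF
# Invoked with, e.g.:
#   docker buildx build --platform linux/amd64,linux/arm64 -f Dockerfile.cross .
```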

@felixvanoost felixvanoost deleted the build-arm64-main-image branch April 17, 2023 03:10
@ggrossetie
Member

@felixvanoost I get the following errors when building images:

WARNING: local cache import at /tmp/.buildx-cache not found due to err: could not read /tmp/.buildx-cache/index.json: open /tmp/.buildx-cache/index.json: no such file or directory
(warning repeated 7 times)
ERROR: (*service).Write failed
(error repeated 16 times)
Not sure if it means that the disk is full or if the latest version of buildx has a bug...? Does this error sound familiar?

@ggrossetie
Member

Apparently, when something fails it throws this error, but it is not the root cause

@felixvanoost
Contributor Author

felixvanoost commented Apr 19, 2023

@felixvanoost I get the following errors when building images:

WARNING: local cache import at /tmp/.buildx-cache not found due to err: could not read /tmp/.buildx-cache/index.json: open /tmp/.buildx-cache/index.json: no such file or directory
(warning repeated 7 times)
ERROR: (*service).Write failed
(error repeated 16 times)

Not sure if it means that the disk is full or if the latest version of buildx has a bug...? Does this error sound familiar?

I've never experienced this before. Is this happening directly in the workflow?

Update: The latest nightly workflow just passed here.

@mariotoffia

@felixvanoost thanks for porting kroki (for which I'm ever grateful to @ggrossetie) to arm64!!

Cheers,
Mario :)
