balena build / deploy --build selects base image of wrong architecture (multiarch image manifest) #1508

Closed
MatthewCroughan opened this issue Nov 10, 2019 · 23 comments · Fixed by #2301

@MatthewCroughan

MatthewCroughan commented Nov 10, 2019

balena build and balena deploy --build may fail with "exec format error" or related mismatched architecture errors when the base image in a Dockerfile FROM line is a multiarch base image, where a single name:tag reference includes multiple images of different architectures. Example: FROM ubuntu where ubuntu is an image name that refers to multiple images of different architectures (ARM, ARM 64, 386, x86-64, PowerPC 64 LE, IBM Z, etc).

balena build and balena deploy --build have never had support for multiarch base images. Balena's original solution (which predates Docker's multiarch solution) is Dockerfile templates, which work well with balenalib base images. While balenalib has alternatives to FROM ubuntu, e.g. FROM balenalib/raspberrypi3-ubuntu, there are no alternatives to other base images like FROM nginx or FROM telegraf, so the CLI needs to implement support for such multiarch base images.

Workaround
A workaround is to append the sha256 digest to the image name on the FROM line, as detailed in #1508 (comment).

Edited from Matthew's original report:

https://forums.balena.io/t/balena-cloud-cli-builder-is-still-not-able-to-interpret-image-manifests/43203/3
I've already typed a lot here, and the references are included.

@MatthewCroughan MatthewCroughan changed the title Balena build unable to interpret image manifests Balena build unable to interpret dockerfile image manifests Nov 10, 2019
@pdcastro
Contributor

Related to the issue "cloud builder should understand and use the docker hub manifest files" (private repo).

@balena-ci
Contributor

[pdcastro] This issue has attached support thread https://jel.ly.fish/#/support-thread~d0b11b4d-1ff9-45fa-855a-453d612ff58a

@MatthewCroughan
Author

@pdcastro The builder within BalenaOS does not understand --platform whereas the Balena cloud builder and balena build do understand this argument.

The result is that I cannot support both the local and cloud methods at once; I have to pick one. This is a bit of a show-stopper for the project I'm working on.

https://github.com/DynamicDevices/ming/issues/6

@pdcastro pdcastro added this to the Sorted Backlog milestone Nov 18, 2019
@pdcastro
Contributor

@MatthewCroughan
Author

@richbayliss This is the issue I referenced tonight at IoT Liverpool.

@MatthewCroughan
Author

This is the single most annoying issue I have encountered with Balena. Resolving it would make my day and allow me to get on with things. Is there a timeline for when Docker Hub manifests will be supported?

@MatthewCroughan
Author

MatthewCroughan commented May 27, 2020

The easiest way to fix this, for now, would be to change these variables to match the Docker Hub manifest rather than some arbitrary string.

For example, why is Raspberry Pi 1 rpi? That's not an architecture. The correct definition would be arm according to Docker Hub. Maybe a second variable could be added to address this, e.g. %DOCKER_HUB_ARCH%, which would specify the architecture as defined by Docker Hub for each board, rather than an armv7hf or rpi which Docker Hub does not understand.

https://www.balena.io/docs/learn/develop/dockerfile/

| Device Name | BALENA_MACHINE_NAME | BALENA_ARCH |
|---|---|---|
| Raspberry Pi (1, Zero, Zero W) | raspberry-pi | rpi |
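
To make the suggestion concrete, here is a minimal sketch of what a lookup from BALENA_ARCH values to Docker-style platform strings could look like. The mapping itself is an assumption made for illustration (it is not taken from the balena documentation), and `docker_platform_for` is a hypothetical helper name, not part of any balena tool.

```python
# Assumed mapping from balena BALENA_ARCH values to Docker platform strings.
# These pairs are illustrative guesses, not an official balena table.
BALENA_ARCH_TO_DOCKER_PLATFORM = {
    "rpi": "linux/arm/v6",      # Raspberry Pi 1 / Zero class boards (assumption)
    "armv7hf": "linux/arm/v7",
    "aarch64": "linux/arm64",
    "amd64": "linux/amd64",
    "i386": "linux/386",
}


def docker_platform_for(balena_arch: str) -> str:
    """Return the Docker platform string assumed to match a BALENA_ARCH value."""
    try:
        return BALENA_ARCH_TO_DOCKER_PLATFORM[balena_arch]
    except KeyError:
        raise ValueError(f"unknown BALENA_ARCH value: {balena_arch!r}") from None
```

A hypothetical %DOCKER_HUB_ARCH% variable could then be filled in from `docker_platform_for("rpi")` when a template is rendered.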

@pdcastro
Contributor

pdcastro commented Jun 1, 2020

I've come across the following dependency (resin-multibuild) update and I wonder if it would help with this issue:
https://github.com/balena-io-modules/resin-multibuild/pull/74/files#diff-b49d24115cb2d977cab0072034b6c012R119

balena CLI is currently at resin-multibuild v4.5.1, and that update was to resin-multibuild v4.6.0. We should update it soon, but it needs testing first.

@jellyfish-bot

[pdcastro] This issue has attached support thread https://jel.ly.fish/ae60add5-3450-4fd0-8f5b-9194875b2919

@jellyfish-bot

[pdcastro] This issue has attached support thread https://jel.ly.fish/73473ac9-0b86-4f3c-a950-279c6ebf9753

@jellyfish-bot

[pdcastro] This issue has attached support thread https://jel.ly.fish/3f656584-1da5-4b5b-afb2-d49294e2e38c

@jellyfish-bot

[pdcastro] This issue has attached support thread https://jel.ly.fish/8364ae96-2b05-425c-aff3-c3058cf98f3c

@jellyfish-bot

[rahul-thakoor] This issue has attached support thread https://jel.ly.fish/f0910369-0b58-48f3-9214-1e6fe0d16388

@pdcastro
Contributor

pdcastro commented Mar 17, 2021

Relevant recent comments in issue #1408 -

Until this issue is resolved, a workaround is described below -- it's a copy and paste of the original post in the balena forums.

The following example regards the multiarch telegraf image.

[...] the workaround is to append a sha256 digest to the FROM line of your Dockerfile, thus "manually selecting" the base image architecture. To do so, check the different sha256 digest for each available architecture on the Dockerhub page:

https://hub.docker.com/layers/telegraf/library/telegraf/1.15.3/images/sha256-655213a2041fb9eed5e8129baac75bc460005fb1f4ead6d7f3ddafd4bc884a89?context=explore

| Dockerhub arch | Device | Dockerfile FROM line |
|---|---|---|
| arm64/v8 | RPi 4 | FROM telegraf:1.15.3@sha256:28fb4583d35fa39690bf84934165d41bb7f8d62686490fc87e149a5261abd6b7 |
| arm/v7 | RPi 3 | FROM telegraf:1.15.3@sha256:f0665c76213e129ceba8d76443011ea8332d850ebc3749577cd498f141bed7d8 |
| amd64 | Intel NUC | FROM telegraf:1.15.3@sha256:655213a2041fb9eed5e8129baac75bc460005fb1f4ead6d7f3ddafd4bc884a89 |

Choose one of the Dockerfile FROM lines from the table above, depending on your device type. The sha256 digests above were extracted from the Dockerhub page linked above, by using the dropdown box to select the target architecture.
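
For anyone scripting this workaround, the following is a small sketch of how the pinned FROM line could be generated. The digests are the ones listed for telegraf 1.15.3 in this comment; the function name `pinned_from_line` and the dictionary layout are assumptions made for the example.

```python
from typing import Mapping

# Digests copied from the telegraf 1.15.3 table in this comment,
# keyed by the Docker Hub architecture label.
TELEGRAF_1_15_3_DIGESTS = {
    "arm64/v8": "sha256:28fb4583d35fa39690bf84934165d41bb7f8d62686490fc87e149a5261abd6b7",
    "arm/v7": "sha256:f0665c76213e129ceba8d76443011ea8332d850ebc3749577cd498f141bed7d8",
    "amd64": "sha256:655213a2041fb9eed5e8129baac75bc460005fb1f4ead6d7f3ddafd4bc884a89",
}


def pinned_from_line(image: str, arch: str, digests: Mapping[str, str]) -> str:
    """Build a FROM line that pins `image` to the digest recorded for `arch`."""
    if arch not in digests:
        raise KeyError(f"no digest recorded for architecture {arch!r}")
    return f"FROM {image}@{digests[arch]}"


if __name__ == "__main__":
    # Print the FROM line for a Raspberry Pi 3 (arm/v7).
    print(pinned_from_line("telegraf:1.15.3", "arm/v7", TELEGRAF_1_15_3_DIGESTS))
```

The digest map has to be refreshed by hand whenever the image tag changes, which is the main drawback of this workaround.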

@pdcastro pdcastro changed the title Balena build unable to interpret dockerfile image manifests balena build / deploy --build selects base image of wrong architecture (multiarch image manifest) Jun 7, 2021
@jellyfish-bot

[pdcastro] This issue has attached support thread https://jel.ly.fish/0ccf5bab-fbf3-4f09-a3c5-f844a6a046d7

@toochevere toochevere self-assigned this Jul 21, 2021
toochevere pushed a commit that referenced this issue Jul 28, 2021
Bump version of balena-multibuild to the one that supports multiarch
Remove previous hack to avoid sending platform information to multibuild

Change-type: minor
Signed-off-by: Paul Jonathan <[email protected]>
See: #1508
toochevere pushed a commit that referenced this issue Sep 16, 2021
toochevere pushed a commit that referenced this issue Sep 16, 2021
toochevere pushed a commit that referenced this issue Sep 17, 2021
pdcastro pushed a commit that referenced this issue Sep 17, 2021
pdcastro pushed a commit that referenced this issue Sep 17, 2021
toochevere pushed a commit that referenced this issue Sep 23, 2021
@ghost ghost closed this as completed in #2301 Sep 23, 2021
@pdcastro
Contributor

At long last, this issue was resolved in CLI v12.49.0. 🎉

@maggie44

At long last, this issue was resolved in CLI v12.49.0. 🎉

What exactly has been implemented, and how far does it go? It now has full multi arch image support so we no longer need to specify the sha? Does this apply for cloud builds or just local pushes? Does this mean there will be a gradual transition away from using the %%BALENA_ARCH%% string and we can just specify the image name?

@toochevere
Contributor

@Maggie0002 See answers below:

What exactly has been implemented, and how far does it go?

These changes allow support for manifest lists when doing a CLI build where there is more than one architecture. What does this mean?

  • If you are targeting a base image that has only one possible architecture (application/vnd.docker.distribution.manifest.v1+prettyjws or application/vnd.docker.distribution.manifest.v2+json media type in the manifest) then things are basically the same. The architecture of the image is what it is.
  • If you are targeting a base image that has multiple possible architectures (application/vnd.docker.distribution.manifest.list.v2+json media type in the manifest), for example busybox, then the matching architecture is selected.
    Be careful about the case of mixing manifest lists with single-arch images. There are some edge cases where this could create a build that Docker does not know how to resolve correctly, and which ends up in an exec format error. At the moment, this case will produce a warning. In the future, we hope to address this case as well.
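
As an illustration of the selection step described above, here is a rough sketch that picks the matching entry out of a manifest list. It is not the balena-multibuild implementation: `select_digest` is a hypothetical name, the media type is assumed to be passed in separately (for example from the registry response's Content-Type header), and the example manifest uses made-up digests.

```python
from typing import Optional

MANIFEST_LIST = "application/vnd.docker.distribution.manifest.list.v2+json"
SINGLE_ARCH_TYPES = {
    "application/vnd.docker.distribution.manifest.v1+prettyjws",
    "application/vnd.docker.distribution.manifest.v2+json",
}


def select_digest(
    media_type: str,
    manifest: dict,
    os_name: str,
    architecture: str,
    variant: Optional[str] = None,
) -> Optional[str]:
    """Return the digest of the matching platform entry in a manifest list.

    Returns None for single-arch manifests, where there is nothing to select.
    """
    if media_type in SINGLE_ARCH_TYPES:
        return None
    if media_type != MANIFEST_LIST:
        raise ValueError(f"unsupported manifest media type: {media_type!r}")
    for entry in manifest.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("os") != os_name:
            continue
        if platform.get("architecture") != architecture:
            continue
        if variant is not None and platform.get("variant") != variant:
            continue
        return entry["digest"]
    raise LookupError(f"no entry for {os_name}/{architecture} in manifest list")


# Made-up manifest list for demonstration; the digests are placeholders.
EXAMPLE_MANIFEST_LIST = {
    "schemaVersion": 2,
    "mediaType": MANIFEST_LIST,
    "manifests": [
        {"digest": "sha256:" + "a" * 64,
         "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:" + "b" * 64,
         "platform": {"architecture": "arm", "os": "linux", "variant": "v7"}},
    ],
}
```

Calling `select_digest(MANIFEST_LIST, EXAMPLE_MANIFEST_LIST, "linux", "arm", "v7")` returns the placeholder digest of the ARM v7 entry, while any single-arch media type returns None.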

It now has full multi arch image support so we no longer need to specify the sha?

That's the idea.

Does this mean there will be a gradual transition away from using the %%BALENA_ARCH%% string and we can just specify the image name?

  • The %%BALENA_ARCH%% variable is still supported. There are no plans to deprecate it.
  • At the moment Balena is not using manifest lists. That may be something added in the future. I will bring up this possibility in a Product discussion, but it would likely not be on the near-term roadmap.

NOTE: These changes affect the CLI; they are not yet implemented in the cloud builder. That should follow within the next few weeks, since the bulk of the important changes are in code that is shared between them.

@pdcastro
Contributor

pdcastro commented Sep 24, 2021

Be careful about the case of mixing manifest lists with single arch images.

To clarify, this is the case of a multi-stage Dockerfile with multiple FROM lines, where some FROM lines use multiarch images and other FROM lines use single-arch images.

There is no problem with the case of multiple services (in a docker-compose.yml file) where some services use single-arch images and other services use multiarch images.
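
To check a Dockerfile for this mixed case by hand, something like the following sketch could be used. It is not how the CLI produces its warning: the regular expression only covers simple FROM lines, and `is_multiarch` is a hypothetical predicate the caller must supply (for example, backed by a registry lookup).

```python
import re
from typing import Callable, List

# Matches "FROM [--platform=...] image [AS stage]" on a single line.
FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?\s*$",
    re.IGNORECASE,
)


def base_images(dockerfile: str) -> List[str]:
    """List external base images, skipping 'scratch' and earlier stage names."""
    images: List[str] = []
    stages = set()
    for line in dockerfile.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image, alias = match.group(1), match.group(2)
        if image.lower() != "scratch" and image.lower() not in stages:
            images.append(image)
        if alias:
            stages.add(alias.lower())
    return images


def mixes_multiarch_and_single_arch(
    dockerfile: str, is_multiarch: Callable[[str], bool]
) -> bool:
    """True when the FROM lines combine multiarch and single-arch base images."""
    kinds = {bool(is_multiarch(image)) for image in base_images(dockerfile)}
    return len(kinds) == 2


# Example multi-stage Dockerfile; the image names are only illustrative.
EXAMPLE_DOCKERFILE = """\
FROM golang:1.16 AS build
RUN go build -o /app .
FROM balenalib/raspberrypi3-debian
COPY --from=build /app /app
"""
```

With a predicate that treats `golang` as multiarch and the balenalib image as single-arch, `mixes_multiarch_and_single_arch(EXAMPLE_DOCKERFILE, ...)` reports the mixed case.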

they are not yet implemented in the cloud builder

From balena CLI users' point of view, I understand that balena push <myFleet> (push to the balenaCloud builder) has had support for multiarch images for a year or so. What was pending was adding similar support to balena build and balena deploy (this GitHub issue). The codebase unification between the balenaCloud builder and the balena CLI is just an implementation detail. :-)

@maggie44

maggie44 commented Sep 24, 2021

Thanks for the clarifications.

To avoid leaving a question without an explanation: I am sure there are plenty of reasons why someone would want the balenalib images to be multi arch, but mine include:

  1. Navigating them on Docker Hub is a nightmare. Hunting down the dates for fixed images is a real pain point. Partly due to Docker Hub not having a useful interface for searching, but also because of the sheer number of images.
  2. There are a whole bunch of images missing for different architectures, I frequently run into issues: Missing Base Images for some Architectures balena-io-library/base-images#696 & https://forums.balena.io/t/missing-images-from-balenalib/319708/30
  3. In my development environments, I would like the developers to work within the Balena images to ensure consistency between the dev and production environments. Without multiarch images, that means specifying only one architecture in the Dockerfiles when building a development environment, so all the developers have to use the same system or manually change the Dockerfile to suit their system (a change that then gets accidentally committed to GitHub). This was always an issue, but not as bad, because the vast majority of people used MacBooks so I could specify that image. With the introduction of the new M1 MacBooks on a different architecture, people are now spread much more across two different architectures and the issue keeps growing. For now I have specified official images instead of Balena's, which are multi arch; that is good enough but not ideal.
  4. The obvious, cleaner Dockerfiles and smoother workflows.

The best-case scenario would be a shift towards using multi-arch images instead of having to specify variables like %%BALENA_ARCH%%. That now seems doable, but it is obviously a big shift in workflow for many people, so understandably a more gradual process.

Short term, it would be nice if balenalib could start including a multi-arch image alongside all the current ones. While not resolving some of the above issues, it seems like it would be an additional build process that could be added without becoming a breaking change. It could allow those willing to gradually start moving over to that system, and would resolve the missing images issue I keep coming across, as well as the dev env issue.

Some food for thought to throw into your product discussion hat.

Thanks again.

@pdcastro
Contributor

pdcastro commented Sep 24, 2021

Thanks for the feedback @Maggie0002 👍   There is something I would like to understand better. Balenalib images are available for not only multiple CPU architectures, but also multiple device types including development boards, for example:

FROM balenalib/raspberrypi3-debian
FROM balenalib/fincm3-debian
FROM balenalib/jetson-nano-debian
FROM balenalib/jetson-xavier-nx-devkit-seeed-2mic-hat-debian
etc.

These images are not only specific to a CPU architecture like ARM v7, but also specific to the on-board peripherals of the device type or development board, possibly including additional device drivers, libraries, co-processor support, etc.

So when you say you would like balenalib images to be multi arch, two alternative meanings come to my mind:

  1. A large, single FROM balenalib/debian image (to continue the example above) that included device drivers, libraries etc for all supported device types and development boards.
  2. Having the same set of balenalib images as currently exist, one for each device type / dev board, but having each of those be available for multiple CPU architectures so that you could, for example, execute code that is non-peripheral-hardware-specific on a developer's laptop (Intel or Apple silicon), cloud CI environment and the device itself. For example, balenalib/raspberrypi3-debian would be a multi arch image that supported the Raspberry Pi 3 architecture (ARM v7) but also Apple Silicon for a MacBook laptop and amd64 for a cloud server / CI environment.

I suspect option 1 would be opposed by balena's team because image sizes would grow significantly (e.g. Jetson devices require gigabytes of JetPack libraries) and might require extra logic to choose which device driver to load. As for option 2, it sounds interesting, even though it would multiply the already large number of "actual images" (counting each architecture the image is available for) by, hmm, maybe 3 (device's own architecture plus amd64 plus Apple Silicon), which I guess is doable.

Or maybe you had something else in mind altogether?

@maggie44

maggie44 commented Sep 24, 2021

I didn’t give very good descriptions there, apologies.

I choose not to use the name of the board for an image, such as raspberrypi3-debian, as it includes a bunch of device-specific drivers that aren’t of use to me. I prefer smaller image sizes, so I opt for the arch-specific images. An example from my Dockerfile is:

balenalib/%%BALENA_ARCH%%-alpine-python:3.8.10-3.13-20210603

Which would resolve on a raspberry pi 4 to something like:

balenalib/aarch64-alpine-python:3.8.10-3.13-20210603

I then use that same image for all devices that are aarch64, whether they be raspberry pi or orange pi etc etc all attached to the same fleet.
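
The substitution described here can be sketched in a few lines. This is only an illustration of the %%VARIABLE%% replacement, assuming plain string substitution; `render_template` is a hypothetical helper and not the CLI's actual template engine.

```python
from typing import Mapping


def render_template(template: str, variables: Mapping[str, str]) -> str:
    """Replace each %%NAME%% placeholder with its value from `variables`."""
    rendered = template
    for name, value in variables.items():
        rendered = rendered.replace(f"%%{name}%%", value)
    return rendered


if __name__ == "__main__":
    # Resolve the template line from this comment for an aarch64 device.
    line = "FROM balenalib/%%BALENA_ARCH%%-alpine-python:3.8.10-3.13-20210603"
    print(render_template(line, {"BALENA_ARCH": "aarch64"}))
```

With a multiarch image name there would be no placeholder to resolve; the registry's manifest list would take over the job of picking the architecture.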

With support for multi arch images in the cloud and now the CLI, an image with a name like balenalib/alpine-python:3.8.10-3.13-20210603 could now resolve to aarch64 by itself. Or on my developers' computers, to amd64, or whatever is needed. All in one tidy package.

Indeed, the thought was that it could run in parallel to the current images to avoid being a breaking change. That would initially balloon the number of images, but I imagine many people using BALENA_ARCH like me would gradually switch, and an option may present itself in the future to reduce the number of images.

Assuming my understanding of the images is correct, it’s a little surprising that images specified by name of board rather than by arch is the default considering the cost to image size and Balena’s goals of being as lean as possible.

@pdcastro
Contributor

@toochevere FYI, I've added this to our tracking system: (restricted access)
https://jel.ly.fish/pattern-user-balenalib-images-multiarch-ccc7d37

This issue was closed.