Linux ARM64 wheel build with QEMU fails because cibuildwheel tries to execute container in x86_64 mode #1771

Closed
jonded94 opened this issue Mar 1, 2024 · 10 comments · Fixed by #1961

@jonded94

jonded94 commented Mar 1, 2024

Description

I'm currently working on a pull request (chatnoir-eu/chatnoir-resiliparse#34) enabling Linux ARM64 wheel builds for a certain package.

They use a special manylinux container since they need some other dependencies available in their build environment. I already prepared containers for both x86_64 and aarch64 that are available here (built from this Dockerfile and this workflow):

Trying both of these out locally works without a problem; I'm working on a Linux x86_64 machine and ran the aarch64 docker container through QEMU.

The wheel builds on Linux x86_64 work without a problem, but unfortunately the aarch64 one crashes: https://github.com/jonded94/chatnoir-resiliparse/actions/runs/8109883876/job/22165995057

Starting container image ghcr.io/jonded94/resiliparse-manylinux_2_28_aarch64...
  
  info: This container will host the build for cp38-manylinux_aarch64, cp39-manylinux_aarch64, cp310-manylinux_aarch64, cp311-manylinux_aarch64, cp312-manylinux_aarch64...
  Unable to find image 'ghcr.io/jonded94/resiliparse-manylinux_2_28_aarch64:latest' locally
  latest: Pulling from jonded94/resiliparse-manylinux_2_28_aarch64
  no matching manifest for linux/amd64 in the manifest list entries

It correctly detects that it should build aarch64 wheels and correctly chooses the resiliparse-manylinux_2_28_aarch64 image, but apparently tries to run it in x86_64 mode: no matching manifest for linux/amd64 in the manifest list entries. This is also apparent from the executed docker command: ['docker', 'create', '--env=CIBUILDWHEEL', '--env=SOURCE_DATE_EPOCH', '--name=cibuildwheel-b6474ae9-7c52-468f-96c7-d91c73f4ae12', '--interactive', '--volume=/:/host', 'ghcr.io/jonded94/resiliparse-manylinux_2_28_aarch64', '/bin/bash']. Here, it should include --platform linux/arm64/v8, but that's missing.

The workflow that executes these wheel builds is this one here: https://github.com/jonded94/chatnoir-resiliparse/blob/develop/.github/workflows/build-wheels.yml

Build log

https://github.com/jonded94/chatnoir-resiliparse/actions/runs/8109883876/job/22165995057

CI config

https://github.com/jonded94/chatnoir-resiliparse/blob/develop/.github/workflows/build-wheels.yml

@joerick
Contributor

joerick commented Mar 1, 2024

I'm not sure why docker wouldn't try to run this image under emulation - that's what it does with the manylinux images, which are invoked the exact same way.

Others might know more.

I can offer a workaround though: CIBW_CONTAINER_ENGINE lets you specify extra arguments to the create call. This is a global option, so you'd have to invoke cibuildwheel twice, but you could do something like:

# build native wheels
cibuildwheel --archs auto .
# build aarch64
export CIBW_CONTAINER_ENGINE="docker; create_args: --platform linux/arm64/v8"
cibuildwheel --archs aarch64 .
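
The same two-pass workaround could also be driven from a small Python wrapper. This is only a sketch: the helper name `plan_invocations` is hypothetical, and it merely builds the environment and argument lists without running anything. The `CIBW_CONTAINER_ENGINE` value and the `--archs` flags are taken from the snippet above.

```python
import os


def plan_invocations(base_env=None):
    """Return (env, argv) pairs for the two cibuildwheel passes.

    Hypothetical helper: it only plans the calls, it does not run them.
    """
    env = dict(os.environ if base_env is None else base_env)

    # Pass 1: native wheels, container engine left as inherited.
    native = (dict(env), ["cibuildwheel", "--archs", "auto", "."])

    # Pass 2: aarch64 wheels, forcing the platform on `docker create`.
    emulated_env = dict(env)
    emulated_env["CIBW_CONTAINER_ENGINE"] = (
        "docker; create_args: --platform linux/arm64/v8"
    )
    emulated = (emulated_env, ["cibuildwheel", "--archs", "aarch64", "."])

    return [native, emulated]
```

Each pair could then be handed to `subprocess.run(argv, env=env, check=True)` in order.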

@jonded94
Author

jonded94 commented Mar 1, 2024

Thanks for the extremely fast response! :)

So I tried it with this bash script with the default image (compilation would fail, but I want to see if at least the docker container works properly):

#!/usr/bin/env bash

export CIBW_SKIP="*-musllinux*"
export CIBW_ARCHS_LINUX="aarch64"
export CIBW_TEST_SKIP="*-manylinux_aarch64"

pipx run cibuildwheel --platform linux --archs aarch64 fastwarc

Build options:
  platform: linux
  architectures: aarch64
  [...]
  manylinux_images: 
    aarch64: quay.io/pypa/manylinux2014_aarch64:2024-01-23-12ffabc

And even there it says for me

Status: Downloaded newer image for quay.io/pypa/manylinux2014_aarch64:2024-01-23-12ffabc
WARNING: The requested image's platform (linux/arm64/v8) does not match the detected host platform (linux/amd64/v3) and no specific platform was requested

So it's using the official image, and the correct aarch64 one, but still tries to execute it as an amd64 image? EDIT: To clarify, it was not crashing immediately as in the GitHub Action. It just won't be able to actually build the wheel later down the line because build dependencies are missing in the official image. As @joerick pointed out, it's still able to run this image because binfmt_misc will transparently run it through QEMU. But the important thing here is that the GitHub Action crashes instead.

With your hint export CIBW_CONTAINER_ENGINE="docker; create_args: --platform linux/arm64/v8" that error vanishes!

Is there any way to fix this without having to introduce separate build steps with this "hack"? I'd like that because this job actually does a 2-package (fastwarc + resiliparse), 3-OS (Linux, Windows, macOS), 2-architecture (amd64, arm64) build and I don't want to introduce verbosity/complexity here. :/

Interesting side info:

$ docker run --rm quay.io/pypa/manylinux2014_aarch64:2024-01-23-12ffabc uname -m
WARNING: The requested image's platform (linux/arm64/v8) does not match the detected host platform (linux/amd64/v3) and no specific platform was requested
aarch64
$ docker run --platform linux/arm64/v8 --rm quay.io/pypa/manylinux2014_aarch64:2024-01-23-12ffabc uname -m
aarch64

EDIT: I guess something's going wrong here? :/ (i.e. --platform linux/arm64 is missing; it shouldn't be set statically but determined dynamically depending on the image type?) https://github.com/pypa/cibuildwheel/blob/main/cibuildwheel/linux.py#L438

@joerick
Contributor

joerick commented Mar 1, 2024

> So it's using the official image, the correct aarch64 also, but still tries to execute it as an amd image?

I think that warning is saying that Docker is kinda confused that the image is a different arch to the machine arch. But it tries to run it anyway, and thanks to the binfmt_misc feature of the Linux kernel, it is emulated via QEMU.

The confusing part here is that the behaviour is different on your image.

Having said that, there is an argument that it's kinda a bug smell that Docker is producing this warning in normal use. I also don't see much downside in cibuildwheel adding a --platform linux/something argument to the docker create call - it shouldn't reduce flexibility, since we're already hardcoding the architecture through the archs option. Let's see if other maintainers agree, but a PR to manually specify this would be okay as far as I'm concerned.
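
The idea of deriving `--platform` from the selected architecture could look roughly like this. It is a sketch, not cibuildwheel's actual implementation: the `ARCH_TO_PLATFORM` table and the `create_args` function are hypothetical names, and the platform strings are the usual Docker names for these architectures.

```python
# Hypothetical mapping from cibuildwheel arch names to Docker platform strings.
ARCH_TO_PLATFORM = {
    "x86_64": "linux/amd64",
    "i686": "linux/386",
    "aarch64": "linux/arm64",
    "ppc64le": "linux/ppc64le",
    "s390x": "linux/s390x",
}


def create_args(image, arch, name):
    """Build a `docker create` argv that pins the platform explicitly."""
    platform = ARCH_TO_PLATFORM[arch]  # raises KeyError for unmapped archs
    return [
        "docker",
        "create",
        "--env=CIBUILDWHEEL",
        "--env=SOURCE_DATE_EPOCH",
        f"--platform={platform}",
        f"--name={name}",
        "--interactive",
        "--volume=/:/host",
        image,
        "/bin/bash",
    ]
```

Because the architecture is already fixed by the archs option, the lookup needs no extra user input.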

@mayeut is our resident manylinux expert, let's see what he thinks...

@jonded94
Author

jonded94 commented Mar 1, 2024

Note that my example above was just forcing a local Linux ARM64 build; in the actual repository, the single GitHub Action should ideally be able to cope with Linux/Mac/Windows + ARM64/AMD64. That's why I think one cannot easily introduce a hardcoded --platform linux/arm64/v8 parameter without affecting all the other builds and/or having to build quite a bit of verbosity around it.

I don't know exactly why the particular custom-built image is such a problem (if it is?). Adjusting the script above slightly to use that:

#!/usr/bin/env bash

export CIBW_SKIP="*-musllinux*"
export CIBW_MANYLINUX_AARCH64_IMAGE=ghcr.io/jonded94/resiliparse-manylinux_2_28_aarch64
export CIBW_ARCHS_LINUX="aarch64"
export CIBW_TEST_SKIP="*-manylinux_aarch64"

pipx run cibuildwheel --platform linux fastwarc

This will result in

Build options:
  platform: linux
  architectures: aarch64
  manylinux_images:
    aarch64: ghcr.io/jonded94/resiliparse-manylinux_2_28_aarch64
  [...]
Starting container image ghcr.io/jonded94/resiliparse-manylinux_2_28_aarch64...

info: This container will host the build for cp38-manylinux_aarch64, cp39-manylinux_aarch64, cp310-manylinux_aarch64, cp311-manylinux_aarch64, cp312-manylinux_aarch64...
WARNING: The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64/v3) and no specific platform was requested
[...]
Building cp38-manylinux_aarch64 wheel
[...]
Building wheel...
[...]
Compiling fastwarc/warc.pyx because it changed.
Compiling fastwarc/stream_io.pyx because it changed.
Compiling fastwarc/tools.pyx because it changed.

So locally, it just works. For some reason, the GitHub Action has problems.

@jonded94
Author

jonded94 commented Mar 9, 2024

Inspecting the manifest (which Docker does to determine which image to actually pull) of the image we built in a GitHub Action yields this:

$ docker manifest inspect ghcr.io/chatnoir-eu/resiliparse-manylinux_2_28_aarch64:latest
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.oci.image.index.v1+json",
   "manifests": [
      {
         "mediaType": "application/vnd.oci.image.manifest.v1+json",
         "size": 5050,
         "digest": "sha256:62402018c6a2ff5f14f144289883b38de9a197beb8ffb221b36385652ba75aa1",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      },
      [...] 
   ]
}

Doing that on the official manylinux image interestingly shows no platform information at all:

$ docker manifest inspect quay.io/pypa/manylinux_2_28_aarch64:latest
{
        "schemaVersion": 2,
        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
        "config": {
                "mediaType": "application/vnd.docker.container.image.v1+json",
                "size": 11705,
                "digest": "sha256:3fcf30885f8c2eeb0d36dfc173db53522668267b78183c35231ce7f9b8d0771e"
        },
        "layers": [
                {
                        "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                        "size": 70937092,
                        "digest": "sha256:f2f48c0b14d161531648c44c8d55fb85968c3715156e8ee10e4926fa9763d8ce"
                },
                [...] # just a bunch more layers
}

So the diff is:

  • The custom GitHub image does say which architecture and OS it belongs to
  • The official image does not, but on the other hand exposes information about its layers

I think the point about the layer information could be because we left out all the metadata annotations (https://github.com/docker/metadata-action), but I'm unsure here; it shouldn't be relevant anyway.

Could it be that using the official image works because Docker will "blindly" pull that image (given there is only one image after all, and that one does not even have platform annotations) and only at execution time, through binfmt, will execute it using the right platform? While on the other hand, for custom-built images that have the platform annotated explicitly, docker pull already fails because it cannot find any matching image?

If that's the case, this is somewhat of a big missing feature since all custom built images with platform information could then lead to problems?
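
To make this hypothesis concrete, here is a rough model of how platform selection against a manifest list might behave. It is an assumption-laden sketch, not Docker's actual resolution code: `select_manifest` is a hypothetical function, and it only compares `os` and `architecture`.

```python
import json

# Media types that denote a manifest list / OCI image index.
INDEX_MEDIA_TYPES = {
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
}


def select_manifest(manifest_json, os_name, architecture):
    """Pick the digest for the requested platform from a manifest list.

    Returns None for a single-image manifest (nothing to select from) and
    raises LookupError when a list has no entry for the platform.
    """
    doc = json.loads(manifest_json)
    if doc.get("mediaType") not in INDEX_MEDIA_TYPES:
        return None
    for entry in doc.get("manifests", []):
        platform = entry.get("platform", {})
        found = (platform.get("os"), platform.get("architecture"))
        if found == (os_name, architecture):
            return entry["digest"]
    raise LookupError(
        f"no matching manifest for {os_name}/{architecture} "
        "in the manifest list entries"
    )
```

Under this model, asking for `linux/amd64` against the custom image's index fails at selection time, while the single-image manifest of the official image never goes through selection at all.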

@joerick
Contributor

joerick commented Mar 15, 2024

I've just pushed #1792 to provide a better workaround for this issue. I think I'd like to see some evidence that it's a wider issue than just you before we make any bigger changes. The setting is pretty widely used.

@phoerious

It's certainly reproducible on my GitHub Actions pipeline.

@bryanwweber

I've also just run into this issue here with a custom manylinux image: https://github.com/Cantera/pypi-packages/actions/runs/9979506048/job/27589540410#step:6:184 Let me know if I can help debug!

@mayeut
Member

mayeut commented Jul 21, 2024

It seems the difference is that the custom images are using a manifest list but manylinux images are not.
The architecture is indeed present in manylinux images. You need the verbose flag to see it:

manylinux % docker manifest inspect --verbose quay.io/pypa/manylinux_2_28_aarch64:latest
{
	"Ref": "quay.io/pypa/manylinux_2_28_aarch64:latest",
	"Descriptor": {
		"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
		"digest": "sha256:38c1bb53de43f9067ba4eb696898dae520d153fdca62e578325a7be4248551fd",
		"size": 4499,
		"platform": {
			"architecture": "arm64",
			"os": "linux"
		}
	},
...

The issue might thus be multi-arch images (even if only one arch is listed).

@jfolz

jfolz commented Jul 22, 2024

I also just ran into this with the aarch64 image from here.
At least for me, this looks like it's an issue with the images or container engine, as trying to run the image directly confuses podman, i.e. it cannot find an amd64 arch image under what should be aarch64.

Manifest for completeness:

$ podman manifest inspect --verbose ghcr.io/musicalninjas/quay.io/pypa/manylinux2014_aarch64-rust:latest
{
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "size": 4483,
            "digest": "sha256:03a096219a06aae4c69f82b6bf2141cbedf9a0b854a9ce0ebcbc2837b8beadbf",
            "platform": {
                "architecture": "arm64",
                "os": "linux"
            }
        },
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "size": 567,
            "digest": "sha256:e71141c11662baeb0a319cc1e2d15ad06cf732aea78ece0176ef591c2e0dda2f",
            "platform": {
                "architecture": "unknown",
                "os": "unknown"
            }
        }
    ]
}
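
For reading manifest lists like the one above, a short helper can list the platforms that are actually runnable. This is a sketch with a hypothetical `runnable_platforms` function; treating `unknown/unknown` entries as non-image entries (for example build attestations) is an assumption, not something confirmed in this thread.

```python
import json


def runnable_platforms(manifest_json):
    """List "os/arch" strings from a manifest list, skipping unknown/unknown."""
    doc = json.loads(manifest_json)
    platforms = []
    for entry in doc.get("manifests", []):
        platform = entry.get("platform", {})
        os_name = platform.get("os", "unknown")
        arch = platform.get("architecture", "unknown")
        if os_name == "unknown" and arch == "unknown":
            continue  # assumed to be a non-runnable entry
        platforms.append(f"{os_name}/{arch}")
    return platforms
```

If the result does not contain the host platform, the image presumably needs an explicit `--platform` to be pulled on that host.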
