Use manylinux Docker image for aarch64-linux #68

Closed
stanhu wants to merge 5 commits from sh-use-manylinux-aarch64


@stanhu stanhu commented Apr 11, 2022

Previously, Ubuntu 20.04 was used to build aarch64-linux, which
caused native gems to require glibc 2.29 or higher. This prevented
native gems from being used on Amazon Linux 2, Debian 10, and other
distributions that ship an older glibc.

To improve this, let's use the manylinux2014 image as we do with the
x86 builds. Since this image is based on CentOS 7, which uses glibc
2.17, this significantly improves the compatibility of ARM64 builds.

Relates to sparklemotion/nokogiri#2470
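
One way to see which glibc version a built native extension actually requires is to look at the versioned symbols it imports. The sketch below is not part of this PR. It assumes GNU binutils (`objdump`) and a `sort` that supports `-V`; the helper name `max_glibc_version` and the `.so` path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from this PR): read text on stdin, for example the
# output of `objdump -T`, and print the highest GLIBC_x.y version it mentions.
max_glibc_version() {
  grep -o 'GLIBC_[0-9][0-9.]*' |  # keep only the version tags
    sed 's/^GLIBC_//' |           # strip the prefix, leaving e.g. 2.29
    sort -u -V |                  # version-aware sort (2.4 < 2.17 < 2.29)
    tail -n 1                     # the highest version is the requirement
}

# Example usage (path is illustrative):
#   objdump -T path/to/extension.so | max_glibc_version
```

A result of 2.29 or higher would explain a load failure on Amazon Linux 2 or Debian 10, while 2.17 or lower should be loadable on a CentOS 7 era system.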

@stanhu stanhu force-pushed the sh-use-manylinux-aarch64 branch 3 times, most recently from af02ff4 to 4f0a9a7 Compare April 11, 2022 05:08
@stanhu stanhu changed the title Use ManyLinux Docker image for aarch64-linux Use manylinux Docker image for aarch64-linux Apr 11, 2022
stanhu added 2 commits April 10, 2022 22:09
Previously, Ubuntu 20.04 was used to build `aarch64-linux`, which
caused native gems to require glibc 2.29 or higher. This prevented
native gems from being used on Amazon Linux 2, Debian 10, and other
distributions that ship an older glibc.

To improve this, let's use the manylinux2014 image as we do with the
x86 builds. Since this image is based on CentOS 7, which uses glibc
2.17, this significantly improves the compatibility of ARM64 builds.

Relates to sparklemotion/nokogiri#2470
1fd116a added special case handling of aarch64-linux, but now that
we're using a manylinux image we can drop this.
@stanhu stanhu force-pushed the sh-use-manylinux-aarch64 branch from 4f0a9a7 to d9adb49 Compare April 11, 2022 05:10
@flavorjones
Collaborator

I've kicked off CI.

Looking at the commit history, I think that I erred by not using manylinux when adding aarch64-linux in c5358ea (it was already in use at that time for all the other linux builds (since 8e85ee8)).

And now I wonder if we should use manylinux for arm-linux as well ... let's tackle that in a separate PR.

@flavorjones
Collaborator

So we can see the challenge in the failed build, @stanhu -- the manylinux aarch64 docker image contains aarch64 binaries, which won't run on x86 hosts (like github actions and the majority of linux dev machines as of 2022-04). All the other docker images will run on x86 hosts.

I have this same problem running on my dev machine. I'm curious if you tested this locally? Were you able to get it to work without using qemu?

I think it might be possible to package multiarch/qemu-user-static into this Docker container, if you're up for it.
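
The host/image mismatch described above can be detected before attempting a build. This is a minimal sketch, not code from this repository; the helper names `normalize_arch` and `needs_emulation` are hypothetical, and only the two architecture families discussed here are handled.

```shell
#!/bin/sh
# Hypothetical helpers (not from this PR).

# Map the different spellings of an architecture to one canonical name.
normalize_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *)             echo "$1" ;;
  esac
}

# Succeeds (exit 0) when an image built for $2 cannot run natively on a host
# whose `uname -m` output is $1, i.e. when QEMU or similar would be needed.
needs_emulation() {
  [ "$(normalize_arch "$1")" != "$(normalize_arch "$2")" ]
}

# Example usage:
#   if needs_emulation "$(uname -m)" aarch64; then
#     echo "the aarch64 image needs emulation on this host"
#   fi
```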

@stanhu
Author

stanhu commented Apr 11, 2022

@flavorjones Yeah, I wasn't able to build on my x86 Linux machine, but I think this image built fine on my Apple M1 machine?

I've attempted to build under QEMU for this platform in 53d541c. Is there any way I can kick off CI for my own testing here? UPDATE: Oh, I see you have to approve. 😄

@stanhu stanhu force-pushed the sh-use-manylinux-aarch64 branch 2 times, most recently from 44ef7c2 to 53d541c Compare April 11, 2022 17:04
      - name: Build docker image
        run: |
          docker buildx create --driver docker-container --use
-         bundle exec rake build:${PLATFORM} RCD_DOCKER_BUILD="docker buildx build --cache-from=type=local,src=tmp/build-cache --cache-to=type=local,dest=tmp/build-cache-new --load"
+         bundle exec rake build:${PLATFORM} RCD_DOCKER_BUILD="docker buildx build ${DOCKER_BUILDX_ARGS} --cache-from=type=local,src=tmp/build-cache --cache-to=type=local,dest=tmp/build-cache-new --load"

Author

@stanhu stanhu Apr 11, 2022

I guess this doesn't help developers trying to run a local bundle exec rake build. Maybe we should push this change down into the Rakefile?
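
If the platform flag were pushed down into the build tooling rather than kept in the CI workflow, the mapping could look something like the sketch below. It is written in shell for illustration only (the Rakefile itself is Ruby). The function name `buildx_args_for` is hypothetical, and the only mapping taken from this thread is `aarch64-linux` to `--platform=linux/arm64`.

```shell
#!/bin/sh
# Hypothetical helper: print extra `docker buildx build` arguments for a
# rake-compiler-dock platform name. Only the aarch64-linux mapping comes from
# this thread; every other platform is assumed to need no extra arguments.
buildx_args_for() {
  case "$1" in
    aarch64-linux) echo "--platform=linux/arm64" ;;
    *)             echo "" ;;
  esac
}

# Example usage, mirroring the workflow step:
#   PLATFORM=aarch64-linux
#   bundle exec rake "build:${PLATFORM}" \
#     RCD_DOCKER_BUILD="docker buildx build $(buildx_args_for "$PLATFORM") --load"
```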

Author

Not sure why I'm getting this locally:

$ bundle exec rake build:aarch64-linux  RCD_DOCKER_BUILD="docker buildx build --platform=linux/arm64 --cache-from=type=local,src=tmp/build-cache --cache-to=type=local,dest=tmp/build-cache-new --load"
docker buildx build --platform=linux/arm64 --cache-from=type=local,src=tmp/build-cache --cache-to=type=local,dest=tmp/build-cache-new --load -f tmp/docker/common-Dockerfile.mri.aarch64-linux .
[+] Building 3.1s (11/38)
 => [internal] load build definition from common-Dockerfile.mri.aarch64-linux                                                                                                                                                                0.0s
 => => transferring dockerfile: 6.85kB                                                                                                                                                                                                       0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                            0.0s
 => => transferring context: 2B                                                                                                                                                                                                              0.0s
 => [internal] load metadata for quay.io/pypa/manylinux2014_aarch64:latest                                                                                                                                                                   0.1s
 => [internal] load build context                                                                                                                                                                                                            0.0s
 => => transferring context: 663B                                                                                                                                                                                                            0.0s
 => [ 1/34] FROM quay.io/pypa/manylinux2014_aarch64@sha256:8fcd071d89dab8043aaf97360662d84833c2af54cb65bffcf05cd0ed2fda4b06                                                                                                                  0.0s
 => => resolve quay.io/pypa/manylinux2014_aarch64@sha256:8fcd071d89dab8043aaf97360662d84833c2af54cb65bffcf05cd0ed2fda4b06                                                                                                                    0.0s
 => CACHED [ 2/34] RUN yum install -y autoconf gcc-c++ libtool readline-devel sqlite-devel ruby openssl-devel xz cmake sudo less libffi-devel git wget                                                                                       0.0s
 => CACHED [ 3/34] RUN rm -f /usr/local/bin/sudo &&     groupadd -r sudo &&     echo "%sudo  ALL=(ALL)       ALL" >> /etc/sudoers                                                                                                            0.0s
 => CACHED [ 4/34] RUN groupadd -r rvm && useradd -r -g rvm -G sudo -p "" --create-home rvm                                                                                                                                                  0.0s
 => CACHED [ 5/34] RUN echo "source /etc/profile.d/rvm.sh" >> /etc/rubybashrc &&     echo "source /etc/rubybashrc" >> /etc/bashrc &&     echo "source /etc/rubybashrc" >> /etc/bash.bashrc                                                   0.0s
 => CACHED [ 6/34] RUN mkdir ~/.gnupg &&     chmod 700 ~/.gnupg &&     echo "disable-ipv6" >> ~/.gnupg/dirmngr.conf                                                                                                                          0.0s
 => ERROR [ 7/34] RUN gpg --keyserver hkp://keyserver.ubuntu.com --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3 7D2BAF1CF37B13E2069D6956105BD0E739499BDB &&     (curl -L http://get.rvm.io | sudo bash) &&     bash -c "         sour  2.6s
------
 > [ 7/34] RUN gpg --keyserver hkp://keyserver.ubuntu.com --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3 7D2BAF1CF37B13E2069D6956105BD0E739499BDB &&     (curl -L http://get.rvm.io | sudo bash) &&     bash -c "         source /etc/rubybashrc &&         rvm autolibs disable &&         rvmsudo rvm cleanup all ":
#0 0.225 gpg: keyring `/home/rvm/.gnupg/secring.gpg' created
#0 0.229 gpg: keyring `/home/rvm/.gnupg/pubring.gpg' created
#0 0.237 gpg: requesting key D39DC0E3 from hkp server keyserver.ubuntu.com
#0 0.237 gpg: requesting key 39499BDB from hkp server keyserver.ubuntu.com
#0 1.260 gpg: /home/rvm/.gnupg/trustdb.gpg: trustdb created
#0 1.274 gpg: key D39DC0E3: public key "Michal Papis (RVM signing) <[email protected]>" imported
#0 1.288 gpg: key 39499BDB: public key "Piotr Kuczynski <[email protected]>" imported
#0 1.303 gpg: no ultimately trusted keys found
#0 1.304 gpg: Total number processed: 2
#0 1.304 gpg:               imported: 2  (RSA: 2)
#0 1.407 /etc/rubybashrc: line 1: /etc/profile.d/rvm.sh: No such file or directory
#0 1.417   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
#0 1.423                                  Dload  Upload   Total   Spent    Left  Speed
100   194  100   194    0     0    607      0 --:--:-- --:--:-- --:--:--   664
 16 24535   16  4113    0     0   3621      0  0:00:06  0:00:01  0:00:05 27604
#0 2.529 curl: (23) Failed writing body (1354 != 1371)
------
WARNING: local cache import at tmp/build-cache not found due to err: could not read tmp/build-cache/index.json: open tmp/build-cache/index.json: no such file or directory
common-Dockerfile.mri.aarch64-linux:26
--------------------
  25 |     # install rvm, RVM 1.26.0+ has signed releases, source rvm for usage outside of package scripts
  26 | >>> RUN gpg --keyserver hkp://keyserver.ubuntu.com --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3 7D2BAF1CF37B13E2069D6956105BD0E739499BDB && \
  27 | >>>     (curl -L http://get.rvm.io | sudo bash) && \
  28 | >>>     bash -c " \
  29 | >>>         source /etc/rubybashrc && \
  30 | >>>         rvm autolibs disable && \
  31 | >>>         rvmsudo rvm cleanup all "
  32 |
--------------------
error: failed to solve: process "/bin/sh -c gpg --keyserver hkp://keyserver.ubuntu.com --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3 7D2BAF1CF37B13E2069D6956105BD0E739499BDB &&     (curl -L http://get.rvm.io | sudo bash) &&     bash -c \"         source /etc/rubybashrc &&         rvm autolibs disable &&         rvmsudo rvm cleanup all \"" did not complete successfully: exit code: 1
rake aborted!

Author

It works if I save the curl output to a file:

curl -L -o /tmp/rvm.sh
bash /tmp/rvm.sh
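
A sketch of that workaround as a reusable step, assuming the installer URL from the failing Dockerfile line (`http://get.rvm.io`). The function name `fetch_then_run` is hypothetical. One common cause of `curl: (23) Failed writing body` is that the process reading the pipe exits early; downloading to a file first separates a download failure from a failure of the command that runs the script.

```shell
#!/bin/sh
# Hypothetical helper: download a script to a file, then run it with bash,
# instead of piping curl straight into a shell.
fetch_then_run() {
  url=$1
  dest=$2
  # -f: fail on HTTP errors, -sS: quiet but still print errors, -L: follow redirects
  curl -fsSL -o "$dest" "$url" || return 1
  bash "$dest"
}

# Example usage (URL taken from the Dockerfile line in the log above):
#   fetch_then_run http://get.rvm.io /tmp/rvm.sh
```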

Author

@stanhu stanhu Apr 11, 2022

Oddly sudo is not working on arm64:

$ docker run --platform linux/arm64 -it quay.io/pypa/manylinux2014_aarch64
<snip>
[root@6462ad36c7cf /]#  rm -f /usr/local/bin/sudo && \
>     groupadd -r sudo && \
>     echo "%sudo  ALL=(ALL)       ALL" >> /etc/sudoers
[root@6462ad36c7cf /]#  groupadd -r rvm && useradd -r -g rvm -G sudo -p "" --create-home rvm
[root@6462ad36c7cf /]# su rvm
[rvm@6462ad36c7cf /]$ sudo
Error while loading /usr/bin/sudo: Permission denied
[rvm@6462ad36c7cf /]$ ls -al /usr/bin/sudo
---s--x--x 1 root root 204648 Oct 14 12:43 /usr/bin/sudo
[rvm@6462ad36c7cf /]$ id
uid=999(rvm) gid=996(rvm) groups=996(rvm),997(sudo)

Works fine on x86:

$ docker run -it quay.io/pypa/manylinux2014_x86_64 bash
<snip>
[rvm@91b718d778e2 /]$ which sudo
/opt/rh/devtoolset-10/root/usr/bin/sudo
[rvm@91b718d778e2 /]$ ls -al /usr/bin/sudo
---s--x--x 1 root root 151424 Oct 14 12:28 /usr/bin/sudo
[rvm@91b718d778e2 /]$ sudo echo "hi"
hi
[rvm@91b718d778e2 log]$ id
uid=999(rvm) gid=996(rvm) groups=996(rvm),997(sudo)

Author

On my Apple M1, the same sudo command works fine with the quay.io/pypa/manylinux2014_aarch64 image. This makes me wonder if this is just a QEMU issue.

Author

9537a81 should enable the binfmt_misc OC flags to make sudo work under aarch64-linux.
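
Whether a registered QEMU handler carries those flags can be checked from the host. The kernel exposes each registration as a file under `/proc/sys/fs/binfmt_misc/` that includes a `flags:` line. The sketch below parses that text; the helper name `has_oc_flags` is hypothetical, and the handler name `qemu-aarch64` is an assumption that depends on how QEMU was registered.

```shell
#!/bin/sh
# Hypothetical helper: read a binfmt_misc registration entry on stdin and
# succeed only if its flags include both O (open-binary) and C (credentials).
has_oc_flags() {
  # Extract whatever follows "flags:" (for example "OCF").
  flags=$(sed -n 's/^flags:[[:space:]]*//p')
  case "$flags" in
    *O*) ;;
    *)   return 1 ;;
  esac
  case "$flags" in
    *C*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Example usage (handler name is an assumption):
#   has_oc_flags < /proc/sys/fs/binfmt_misc/qemu-aarch64 \
#     && echo "setuid binaries such as sudo should keep their credentials"
```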

Author

https://github.com/rake-compiler/rake-compiler-dock/runs/5985250394?check_suite_focus=true is running, but it's pretty slow. Feels like we should be using a native arm64 machine.

Collaborator

@stanhu It's a good question (about sudo). I haven't been present for the whole lifetime of RCD, but the use of an rvm user dates back to 66f33ef. That commit also introduced most of the usage of sudo in the Dockerfile. I'm not sure why it's not being run by the root account, maybe @larskanis remembers?

Collaborator

> Feels like we should be using a native arm64 machine

Unfortunately, GitHub Actions doesn't provide native arm64 runners today.

@flavorjones
Collaborator

I kicked off CI again (because you're a first-time contributor, the repo's CI setting requires me to push a button).

@stanhu stanhu force-pushed the sh-use-manylinux-aarch64 branch from 53d541c to 07d69a3 Compare April 11, 2022 22:13
This makes it possible for sudo to work with the `rvm` user.  Without
the OC binfmt flags, `qemu-arm-static` prevents running setuid
executables for non-root users.
@flavorjones
Collaborator

flavorjones commented Apr 12, 2022

@stanhu It looks like the docker image builds OK with the current changes ... but it feels like the changeset may be larger than necessary. On my dev machine, simply running docker run --rm --privileged multiarch/qemu-user-static --reset -p yes --credential yes allows me to build the image (run everything in the Dockerfile).
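
For anyone wanting to script that registration step, a thin wrapper makes it possible to dry-run the command first. This is only a sketch around the command quoted above; the function name `register_qemu` and the `DOCKER` override variable are hypothetical.

```shell
#!/bin/sh
# Sketch: wrap the registration command quoted above so it can be dry-run.
# DOCKER is a hypothetical override; set DOCKER=echo to print the arguments
# instead of running them.
register_qemu() {
  "${DOCKER:-docker}" run --rm --privileged \
    multiarch/qemu-user-static --reset -p yes --credential yes
}

# Example usage:
#   register_qemu               # registers the QEMU handlers (needs Docker)
#   DOCKER=echo register_qemu   # prints the arguments that would be passed
```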

Is that something you tried on Github Actions in one of the iterations already? It's a much simpler change and so I assume I'm missing something, but wanted to ask.

@flavorjones
Collaborator

I want to gut-check with @larskanis about how he feels about needing to run QEMU (or equivalent) in order to build gems in the aarch64-linux container. I personally am a little concerned with how slow gem building is (I'm running a Nokogiri build on my dev machine with the manylinux container) and the additional configuration of QEMU may be challenging for some users.

@stanhu
Author

stanhu commented Apr 12, 2022

> I want to gut-check with @larskanis about how he feels about needing to run QEMU (or equivalent) in order to build gems in the aarch64-linux container. I personally am a little concerned with how slow gem building is (I'm running a Nokogiri build on my dev machine with the manylinux container) and the additional configuration of QEMU may be challenging for some users.

Yeah, the Docker image under QEMU is taking over an hour to build. This is too long. How does the Ubuntu 20.04 build work here? Is it cross-compiling for aarch64-linux?

@larskanis
Member

To be honest, I don't like the approach. Manylinux has a different philosophy: they only target Linux and don't cross-build, while RCD is all about cross-building for different target platforms, all running on x86_64-linux. Manylinux also has the downside that it uses a different distribution (CentOS) than the other target platforms (Ubuntu), so gem projects have to use different commands to do the same things (apt vs. yum) depending on the platform. Manylinux also increases the size of the images a lot, and Docker layer caching/reuse cannot be used. Now we have another downside: it's awfully slow.

So rather than increasing the use of Manylinux I'd like to remove it and instead use an approach similar to the x64-mingw-ucrt image: We build our own cross build toolchain in a separate docker image and copy the necessary files into our distributed image for x86_64 / aarch64 linux platforms.

@stanhu stanhu closed this Apr 12, 2022
@stanhu
Author

stanhu commented Apr 12, 2022

Thanks, I closed this since it's way too slow.

> So rather than increasing the use of Manylinux I'd like to remove it and instead use an approach similar to the x64-mingw-ucrt image: We build our own cross build toolchain in a separate docker image and copy the necessary files into our distributed image for x86_64 / aarch64 linux platforms.

Is x64-mingw-ucrt based on Ubuntu 20.04? How does this approach ensure that the binary is compiled against an older glibc version? Is there a reason Ubuntu 18.04 can't be the default instead of 20.04?

@stanhu
Author

stanhu commented Jul 12, 2022

@larskanis We got bitten by the glibc v2.29 issue again in sparklemotion/nokogiri#2470. Could you look at my questions above and point me at the x64-mingw-ucrt Dockerfile?

UPDATE: I see it.
