Use manylinux Docker image for aarch64-linux #68
Previously Ubuntu 20.04 was used to build `aarch64-linux`, which caused native gems to require glibc v2.29 or higher. However, this prevented native gems from being used on Amazon Linux 2, Debian 10, and more. To improve this, let's use the manylinux 2014 image as we do with the x86 builds. Since this image is based on CentOS 7, which uses glibc 2.17, this significantly improves the compatibility of ARM64 builds. Relates to sparklemotion/nokogiri#2470
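The compatibility argument in this commit message comes down to a version comparison: a gem linked against a newer glibc cannot load on a host that ships an older one. A minimal sketch of that check, assuming two-component version strings. `glibc_satisfies` is a hypothetical helper, not part of rake-compiler-dock, and the host versions in the comments are the stock glibc versions of those distributions rather than values stated in this PR.

```shell
# Hypothetical helper: succeed if the host glibc ($2) is at least the
# version a native gem requires ($1). Assumes "MAJOR.MINOR" strings.
glibc_satisfies() (
  required=$1
  host=$2
  req_major=${required%%.*}
  req_minor=${required#*.}
  host_major=${host%%.*}
  host_minor=${host#*.}
  if [ "$host_major" -ne "$req_major" ]; then
    # Different major versions: the comparison of majors decides.
    [ "$host_major" -gt "$req_major" ]
    return
  fi
  [ "$host_minor" -ge "$req_minor" ]
)

# A gem requiring glibc 2.29 fails on Amazon Linux 2 (2.26) and Debian 10 (2.28):
#   glibc_satisfies 2.29 2.26   -> non-zero exit status
#   glibc_satisfies 2.29 2.28   -> non-zero exit status
# A gem requiring glibc 2.17 loads on both:
#   glibc_satisfies 2.17 2.26   -> zero exit status
#
# On a glibc-based host, the current version can be read with getconf,
# which prints something like "glibc 2.31":
#   glibc_satisfies 2.29 "$(getconf GNU_LIBC_VERSION | cut -d' ' -f2)"
```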
1fd116a added special case handling of aarch64-linux, but now that we're using a manylinux image we can drop this.
I've kicked off CI. Looking at the commit history, I think that I erred by not using manylinux when adding And now I wonder if we should use manylinux for
So we can see the challenge in the failed build, @stanhu -- the manylinux aarch64 docker image contains aarch64 binaries, which won't run on x86 hosts (like GitHub Actions and the majority of linux dev machines as of 2022-04). All the other docker images will run on x86 hosts. I have this same problem running on my dev machine. I'm curious if you tested this locally? Were you able to get it to work without using qemu? I think it might be possible to package
@flavorjones Yeah, I wasn't able to build on my x86 Linux machine, but I think this image built fine on my Apple M1 machine? I've attempted to build under QEMU for this platform in 53d541c. Is there any way I can kick off CI for my own testing here? UPDATE: Oh, I see you have to approve. 😄
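The difference between the x86 host and the Apple M1 comes down to whether the host architecture matches the image's platform. A rough sketch of that decision, assuming only the two architectures discussed in this thread. `native_platform` and `needs_emulation` are hypothetical names used for illustration, not anything rake-compiler-dock provides.

```shell
# Map a `uname -m` value to the Docker platform that machine runs natively.
native_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *)             echo "unknown" ;;
  esac
}

# Succeed if building for Docker platform $2 on machine type $1 would need
# emulation (for example QEMU registered through binfmt_misc).
# Unrecognized machine types are treated as needing emulation.
needs_emulation() {
  [ "$(native_platform "$1")" != "$2" ]
}

# Usage:
#   if needs_emulation "$(uname -m)" linux/arm64; then
#     echo "aarch64 image will not run natively on this host"
#   fi
```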
.github/workflows/ci.yml
Outdated
   - name: Build docker image
     run: |
       docker buildx create --driver docker-container --use
-      bundle exec rake build:${PLATFORM} RCD_DOCKER_BUILD="docker buildx build --cache-from=type=local,src=tmp/build-cache --cache-to=type=local,dest=tmp/build-cache-new --load"
+      bundle exec rake build:${PLATFORM} RCD_DOCKER_BUILD="docker buildx build ${DOCKER_BUILDX_ARGS} --cache-from=type=local,src=tmp/build-cache --cache-to=type=local,dest=tmp/build-cache-new --load"
I guess this doesn't help developers trying to run a local `bundle exec rake build`. Maybe we should push this change down into the `Rakefile`?
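One way to picture "pushing this down" is a single place that maps a target platform to its extra buildx arguments, so local runs and CI share the same command line. The sketch below does this in shell purely for illustration; the same mapping could live in the Rakefile. `buildx_args_for` and `rcd_docker_build` are hypothetical names, and it is an assumption that `aarch64-linux` is the only target needing `--platform=linux/arm64`.

```shell
# Hypothetical mapping from an RCD target platform to extra buildx arguments.
buildx_args_for() {
  case "$1" in
    aarch64-linux) echo "--platform=linux/arm64" ;;
    *)             echo "" ;;
  esac
}

# Print the value to pass as RCD_DOCKER_BUILD for target platform $1.
rcd_docker_build() {
  extra=$(buildx_args_for "$1")
  # ${extra:+$extra } expands to "$extra " only when extra is non-empty.
  echo "docker buildx build ${extra:+$extra }--cache-from=type=local,src=tmp/build-cache --cache-to=type=local,dest=tmp/build-cache-new --load"
}

# Usage:
#   bundle exec rake build:aarch64-linux RCD_DOCKER_BUILD="$(rcd_docker_build aarch64-linux)"
```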
Not sure why I'm getting this locally:
$ bundle exec rake build:aarch64-linux RCD_DOCKER_BUILD="docker buildx build --platform=linux/arm64 --cache-from=type=local,src=tmp/build-cache --cache-to=type=local,dest=tmp/build-cache-new --load"
docker buildx build --platform=linux/arm64 --cache-from=type=local,src=tmp/build-cache --cache-to=type=local,dest=tmp/build-cache-new --load -f tmp/docker/common-Dockerfile.mri.aarch64-linux .
[+] Building 3.1s (11/38)
=> [internal] load build definition from common-Dockerfile.mri.aarch64-linux 0.0s
=> => transferring dockerfile: 6.85kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for quay.io/pypa/manylinux2014_aarch64:latest 0.1s
=> [internal] load build context 0.0s
=> => transferring context: 663B 0.0s
=> [ 1/34] FROM quay.io/pypa/manylinux2014_aarch64@sha256:8fcd071d89dab8043aaf97360662d84833c2af54cb65bffcf05cd0ed2fda4b06 0.0s
=> => resolve quay.io/pypa/manylinux2014_aarch64@sha256:8fcd071d89dab8043aaf97360662d84833c2af54cb65bffcf05cd0ed2fda4b06 0.0s
=> CACHED [ 2/34] RUN yum install -y autoconf gcc-c++ libtool readline-devel sqlite-devel ruby openssl-devel xz cmake sudo less libffi-devel git wget 0.0s
=> CACHED [ 3/34] RUN rm -f /usr/local/bin/sudo && groupadd -r sudo && echo "%sudo ALL=(ALL) ALL" >> /etc/sudoers 0.0s
=> CACHED [ 4/34] RUN groupadd -r rvm && useradd -r -g rvm -G sudo -p "" --create-home rvm 0.0s
=> CACHED [ 5/34] RUN echo "source /etc/profile.d/rvm.sh" >> /etc/rubybashrc && echo "source /etc/rubybashrc" >> /etc/bashrc && echo "source /etc/rubybashrc" >> /etc/bash.bashrc 0.0s
=> CACHED [ 6/34] RUN mkdir ~/.gnupg && chmod 700 ~/.gnupg && echo "disable-ipv6" >> ~/.gnupg/dirmngr.conf 0.0s
=> ERROR [ 7/34] RUN gpg --keyserver hkp://keyserver.ubuntu.com --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3 7D2BAF1CF37B13E2069D6956105BD0E739499BDB && (curl -L http://get.rvm.io | sudo bash) && bash -c " sour 2.6s
------
> [ 7/34] RUN gpg --keyserver hkp://keyserver.ubuntu.com --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3 7D2BAF1CF37B13E2069D6956105BD0E739499BDB && (curl -L http://get.rvm.io | sudo bash) && bash -c " source /etc/rubybashrc && rvm autolibs disable && rvmsudo rvm cleanup all ":
#0 0.225 gpg: keyring `/home/rvm/.gnupg/secring.gpg' created
#0 0.229 gpg: keyring `/home/rvm/.gnupg/pubring.gpg' created
#0 0.237 gpg: requesting key D39DC0E3 from hkp server keyserver.ubuntu.com
#0 0.237 gpg: requesting key 39499BDB from hkp server keyserver.ubuntu.com
#0 1.260 gpg: /home/rvm/.gnupg/trustdb.gpg: trustdb created
#0 1.274 gpg: key D39DC0E3: public key "Michal Papis (RVM signing) <[email protected]>" imported
#0 1.288 gpg: key 39499BDB: public key "Piotr Kuczynski <[email protected]>" imported
#0 1.303 gpg: no ultimately trusted keys found
#0 1.304 gpg: Total number processed: 2
#0 1.304 gpg: imported: 2 (RSA: 2)
#0 1.407 /etc/rubybashrc: line 1: /etc/profile.d/rvm.sh: No such file or directory
#0 1.417 % Total % Received % Xferd Average Speed Time Time Time Current
#0 1.423 Dload Upload Total Spent Left Speed
100 194 100 194 0 0 607 0 --:--:-- --:--:-- --:--:-- 664
16 24535 16 4113 0 0 3621 0 0:00:06 0:00:01 0:00:05 27604
#0 2.529 curl: (23) Failed writing body (1354 != 1371)
------
WARNING: local cache import at tmp/build-cache not found due to err: could not read tmp/build-cache/index.json: open tmp/build-cache/index.json: no such file or directory
common-Dockerfile.mri.aarch64-linux:26
--------------------
25 | # install rvm, RVM 1.26.0+ has signed releases, source rvm for usage outside of package scripts
26 | >>> RUN gpg --keyserver hkp://keyserver.ubuntu.com --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3 7D2BAF1CF37B13E2069D6956105BD0E739499BDB && \
27 | >>> (curl -L http://get.rvm.io | sudo bash) && \
28 | >>> bash -c " \
29 | >>> source /etc/rubybashrc && \
30 | >>> rvm autolibs disable && \
31 | >>> rvmsudo rvm cleanup all "
32 |
--------------------
error: failed to solve: process "/bin/sh -c gpg --keyserver hkp://keyserver.ubuntu.com --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3 7D2BAF1CF37B13E2069D6956105BD0E739499BDB && (curl -L http://get.rvm.io | sudo bash) && bash -c \" source /etc/rubybashrc && rvm autolibs disable && rvmsudo rvm cleanup all \"" did not complete successfully: exit code: 1
rake aborted!
It works if I save the `curl` output to a file:
curl -L -o /tmp/rvm.sh
bash /tmp/rvm.sh
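For context, curl's exit code 23 is a write error. When curl is piped into another process, that usually means the reader exited before consuming the body, which would be consistent with the `sudo` failure found later in this thread. A minimal sketch of the download-then-run pattern follows. `fetch_then_run` is a hypothetical helper, and the URL in the usage comment is the one from the Dockerfile step in the log above, assumed to be the one intended here.

```shell
# Download an installer to a file first, then run it, instead of piping
# curl straight into a shell.
fetch_then_run() {
  url=$1
  dest=$2
  # -f: fail on HTTP errors, -sS: quiet but still print errors,
  # -L: follow redirects, -o: write the body to a file
  curl -fsSL -o "$dest" "$url" || return 1
  bash "$dest"
}

# Usage:
#   fetch_then_run http://get.rvm.io /tmp/rvm.sh
```

With the script on disk, a failed download and a failed install show up as separate errors rather than a single write failure from curl.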
Oddly `sudo` is not working on arm64:
$ docker run --platform linux/arm64 -it quay.io/pypa/manylinux2014_aarch64
<snip>
[root@6462ad36c7cf /]# rm -f /usr/local/bin/sudo && \
> groupadd -r sudo && \
> echo "%sudo ALL=(ALL) ALL" >> /etc/sudoers
[root@6462ad36c7cf /]# groupadd -r rvm && useradd -r -g rvm -G sudo -p "" --create-home rvm
[root@6462ad36c7cf /]# su rvm
[rvm@6462ad36c7cf /]$ sudo
Error while loading /usr/bin/sudo: Permission denied
[rvm@6462ad36c7cf /]$ ls -al /usr/bin/sudo
---s--x--x 1 root root 204648 Oct 14 12:43 /usr/bin/sudo
[rvm@6462ad36c7cf /]$ id
uid=999(rvm) gid=996(rvm) groups=996(rvm),997(sudo)
Works fine on x86:
$ docker run -it quay.io/pypa/manylinux2014_x86_64 bash
<snip>
[rvm@91b718d778e2 /]$ which sudo
/opt/rh/devtoolset-10/root/usr/bin/sudo
[rvm@91b718d778e2 /]$ ls -al /usr/bin/sudo
---s--x--x 1 root root 151424 Oct 14 12:28 /usr/bin/sudo
[rvm@91b718d778e2 /]$ sudo echo "hi"
hi
[rvm@91b718d778e2 log]$ id
uid=999(rvm) gid=996(rvm) groups=996(rvm),997(sudo)
On my Apple M1, the same `sudo` command works fine with the `quay.io/pypa/manylinux2014_aarch64` image. This makes me wonder if this is just a QEMU issue.
I've validated that enabling the OC flags for `/proc/sys/fs/binfmt_misc/qemu-aarch64` fixes the problem. References:
9537a81 should enable the `binfmt_misc` OC flags to make `sudo` work under `aarch64-linux`.
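In the kernel's binfmt_misc interface, `O` is the open-binary flag and `C` is the credentials flag, which makes the kernel use the credentials of the emulated binary rather than those of the interpreter. That is what a setuid `sudo` needs. Each registered entry exposes its flags on a `flags:` line, so whether a host is set up this way can be checked with something like the sketch below. `binfmt_has_flags` is a hypothetical helper, not something this PR adds.

```shell
# Succeed if the binfmt_misc entry file $1 lists every flag letter in $2.
binfmt_has_flags() {
  entry=$1
  wanted=$2
  # The entry file contains a line such as "flags: OCF".
  flags=$(sed -n 's/^flags:[[:space:]]*//p' "$entry") || return 1
  while [ -n "$wanted" ]; do
    letter=${wanted%"${wanted#?}"}   # first character of $wanted
    case "$flags" in
      *"$letter"*) ;;                # flag present, keep checking
      *) return 1 ;;                 # flag missing
    esac
    wanted=${wanted#?}               # drop the character just checked
  done
}

# Usage:
#   if binfmt_has_flags /proc/sys/fs/binfmt_misc/qemu-aarch64 OC; then
#     echo "setuid binaries such as sudo should work under emulation"
#   fi
```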
https://github.com/rake-compiler/rake-compiler-dock/runs/5985250394?check_suite_focus=true is running, but it's pretty slow. Feels like we should be using a native arm64 machine.
@stanhu It's a good question (about sudo). I haven't been present for the whole lifetime of RCD, but the use of an `rvm` user dates back to 66f33ef. That commit also introduced most of the usage of `sudo` in the Dockerfile. I'm not sure why it's not being run by the `root` account, maybe @larskanis remembers?
> Feels like we should be using a native arm64 machine
Unfortunately, GitHub Actions doesn't provide a native arm64 architecture today.
I kicked off CI again (because you're a first-time contributor, the repo's CI setting requires me to push a button).
This makes it possible for sudo to work with the `rvm` user. Without the OC binfmt flags, `qemu-arm-static` prevents running setuid executables for non-root users.
@stanhu It looks like the docker image builds OK with the current changes ... but it feels like the changeset may be larger than necessary. On my dev machine, simply running Is that something you tried on GitHub Actions in one of the iterations already? It's a much simpler change and so I assume I'm missing something, but wanted to ask.
I want to gut-check with @larskanis about how he feels about needing to run QEMU (or equivalent) in order to build gems in the aarch64-linux container. I personally am a little concerned with how slow gem building is (I'm running a Nokogiri build on my dev machine with the manylinux container) and the additional configuration of QEMU may be challenging for some users.
Yeah, the Docker image under QEMU is taking over an hour to build. This is too long. How does the Ubuntu 20.04 build work here? Is it cross-compiling for
To be honest, I don't like the approach. Manylinux has a different philosophy: they only target Linux and don't cross-build, while RCD is all about cross-building for different target platforms, all running on x86_64-linux. Manylinux also has the downside that it uses a different distribution (CentOS) than the other target platforms (Ubuntu), so gem projects have to use different commands to do the same things (apt vs. yum) depending on the platform. Manylinux also increases the size of the images a lot, and docker layer caching/reusing cannot be used. Now we have another downside: it's awfully slow. So rather than increasing the use of Manylinux, I'd like to remove it and instead use an approach similar to the x64-mingw-ucrt image: we build our own cross-build toolchain in a separate docker image and copy the necessary files into our distributed image for x86_64 / aarch64 linux platforms.
Thanks, I closed this since it's way too slow.
Is
@larskanis We got bitten by the glibc v2.29 issue again in sparklemotion/nokogiri#2470. Could you look at my questions above? UPDATE: I see it.