
Segfault while compiling 24.2 release for linux/amd64 #5625

Closed
AndrewDryga opened this issue Jan 20, 2022 · 8 comments
Assignees
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM

Comments

@AndrewDryga
Contributor

AndrewDryga commented Jan 20, 2022

Describe the bug
While trying to update our container to Erlang 24.2, we found that the build fails every time with a segfault:

#10 1072.2 === Leaving application tftp
#10 1072.2 make[2]: Leaving directory '/usr/src/otp_src_24.2/lib/tftp'
#10 1072.2 make[1]: Leaving directory '/usr/src/otp_src_24.2/lib'
#10 1072.2 make[1]: Entering directory '/usr/src/otp_src_24.2/erts'
#10 1072.3 make[2]: Entering directory '/usr/src/otp_src_24.2/erts/start_scripts'
#10 1072.4  GEN	/usr/src/otp_src_24.2/erts/start_scripts/start_clean.rel
#10 1072.4  GEN	/usr/src/otp_src_24.2/erts/start_scripts/start_sasl.rel
#10 1072.5  GEN	/usr/src/otp_src_24.2/erts/start_scripts/start_all_example.rel
#10 1072.5  GEN	/usr/src/otp_src_24.2/erts/start_scripts/no_dot_erlang.rel
#10 1072.5  GEN	/usr/src/otp_src_24.2/erts/start_scripts/start_clean.script
#10 1072.8 qemu: uncaught target signal 11 (Segmentation fault) - core dumped
#10 1072.8 make[2]: *** [Makefile:84: /usr/src/otp_src_24.2/erts/start_scripts/start_clean.script] Segmentation fault
#10 1072.8 make[2]: Leaving directory '/usr/src/otp_src_24.2/erts/start_scripts'
#10 1072.8 make[1]: *** [Makefile:67: local_setup] Error 2
#10 1072.8 make[1]: Leaving directory '/usr/src/otp_src_24.2/erts'
#10 1072.8 make: *** [Makefile:1070: local_setup] Error 2
------
Dockerfile:26

The host used to compile Erlang is running on an M1 chip. The compilation only crashes for the linux/amd64 target, and it did not happen with Erlang 23.3.4.10 (everything else was the same).

To Reproduce
Dockerfile can be found here: https://github.com/Nebo15/alpine-erlang/blob/master/Dockerfile

docker buildx create --use
docker buildx build --platform=linux/386,linux/amd64,linux/arm64 --tag "nebo15/alpine-erlang:24.2" ./

Any hints on whether this is an Erlang issue, or whether we should dig into qemu issues with Alpine 3.15 on an M1 Mac? (For the official Docker container, Alpine 3.15 works just fine.)

@AndrewDryga AndrewDryga added the bug Issue is reported as a bug label Jan 20, 2022
@garazdawi
Contributor

When https://github.com/esl/packages was being built, I have a vague recollection of being asked about a similar problem which in the end turned out to be something about docker and qemu. It is a bit odd that it managed to build all .beam files but segfaults when the boot scripts are built, but from what I recall that was the same place as the other fault was.

Maybe you can have a look at esl/packages and see if there are any hints about what to do about it there.

@AndrewDryga
Contributor Author

@rnewson hi 👋, I'm sorry for pinging you out of the blue, but I checked your Dockerfiles and they don't seem to contain any tricks for working around segfaults. Maybe you can shed some light on this issue?

@rickard-green rickard-green added the team:VM Assigned to OTP team VM label Jan 24, 2022
@rnewson

rnewson commented Jan 24, 2022

Hi, sorry, I don't recall having to work around segfaults (or encountering any). The esl/packages scripts for Erlang use docker buildx and cross compilation (that is, they always build on the host architecture, for performance reasons). So perhaps avoiding emulation helps avoid some causes of segfaults during compilation?

@rnewson

rnewson commented Jan 24, 2022

Glancing at your Dockerfile, it seems you might be attempting cross compilation too. A mistake there could explain things. In yours you set --build and in mine I set --host. I think the definitions of those terms, for GCC cross compilation, are a little unintuitive:

the build platform on which the compilation is performed, and the host platform on which the resulting executable is expected to run

I found that the cross compilation toolkit did the right thing if I told it only the target system (with --host). You control the build architecture with how you invoke docker.
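
As a rough illustration of the --host suggestion above, here is a minimal sketch of a helper that maps a Docker platform string to a GNU triplet. The function name `triplet_for_platform` is hypothetical, and only `x86_64-pc-linux-musl` actually appears in the logs in this thread; the other two triplets are assumptions that would need to be checked against the toolchain in use.

```shell
#!/bin/sh
# Map a Docker platform string (for example the TARGETPLATFORM build arg)
# to a GNU triplet that could be passed to ./configure --host.
# Only the amd64 triplet is taken from the build logs; the rest are assumed.
triplet_for_platform() {
  case "$1" in
    linux/amd64) echo "x86_64-pc-linux-musl" ;;
    linux/arm64) echo "aarch64-pc-linux-musl" ;;  # assumption
    linux/386)   echo "i686-pc-linux-musl" ;;     # assumption
    *)
      echo "unsupported platform: $1" >&2
      return 1
      ;;
  esac
}

# Hypothetical usage inside a build stage that runs on the native architecture:
#   ./configure --host="$(triplet_for_platform "$TARGETPLATFORM")"
```

The build architecture is deliberately left out of the helper: as described above, it is controlled by how docker is invoked rather than by a configure flag.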

@AndrewDryga
Contributor Author

Thank you @rnewson, I'll give it a try ❤️.

@AndrewDryga
Contributor Author

Unfortunately, using --host did not work (nor did using both the host and build options). I've noticed there is another segfault earlier in the build that did not stop the compilation process (which is weird):

#18 2160.0 === Leaving application eunit
#18 2160.0 make[2]: Leaving directory '/usr/src/otp_src_24.2/lib/eunit'
#18 2160.0 make[2]: Entering directory '/usr/src/otp_src_24.2/lib/ssh'
#18 2160.1 === Entering application ssh
#18 2160.1 make[3]: Entering directory '/usr/src/otp_src_24.2/lib/ssh/src'
#18 2160.2 SED x86_64-pc-linux-musl /usr/src/otp_src_24.2
#18 2160.3  GEN	/usr/src/otp_src_24.2/lib/ssh/src/deps/ssh.d
#18 2160.6 qemu: uncaught target signal 11 (Segmentation fault) - core dumped
#18 2160.6 Segmentation fault
#18 2160.8 make[3]: Nothing to be done for 'opt'.
#18 2160.8 make[3]: Leaving directory '/usr/src/otp_src_24.2/lib/ssh/src'
#18 2160.8 make[3]: Entering directory '/usr/src/otp_src_24.2/lib/ssh/doc/src'
#18 2161.0 make[3]: Nothing to be done for 'opt'.
#18 2161.0 make[3]: Leaving directory '/usr/src/otp_src_24.2/lib/ssh/doc/src'
#18 2161.0 === Leaving application ssh
#18 2161.0 make[2]: Leaving directory '/usr/src/otp_src_24.2/lib/ssh'
#18 2161.0 make[2]: Entering directory '/usr/src/otp_src_24.2/lib/eldap'
#18 2161.1 === Entering application eldap

Later, the final segfault still occurs during the compilation of the boot scripts:

#18 2162.7 === Leaving application tftp
#18 2162.7 make[2]: Leaving directory '/usr/src/otp_src_24.2/lib/tftp'
#18 2162.7 make[1]: Leaving directory '/usr/src/otp_src_24.2/lib'
#18 2162.8 make[1]: Entering directory '/usr/src/otp_src_24.2/erts'
#18 2162.9 make[2]: Entering directory '/usr/src/otp_src_24.2/erts/start_scripts'
#18 2163.0  GEN	/usr/src/otp_src_24.2/erts/start_scripts/start_clean.rel
#18 2163.0  GEN	/usr/src/otp_src_24.2/erts/start_scripts/start_sasl.rel
#18 2163.0  GEN	/usr/src/otp_src_24.2/erts/start_scripts/start_all_example.rel
#18 2163.1  GEN	/usr/src/otp_src_24.2/erts/start_scripts/no_dot_erlang.rel
#18 2163.1  GEN	/usr/src/otp_src_24.2/erts/start_scripts/start_clean.script
#18 2163.4 qemu: uncaught target signal 11 (Segmentation fault) - core dumped
#18 2163.4 make[2]: *** [Makefile:84: /usr/src/otp_src_24.2/erts/start_scripts/start_clean.script] Segmentation fault
#18 2163.4 make[2]: Leaving directory '/usr/src/otp_src_24.2/erts/start_scripts'
#18 2163.4 make[1]: *** [Makefile:67: local_setup] Error 2
#18 2163.4 make[1]: Leaving directory '/usr/src/otp_src_24.2/erts'
#18 2163.4 make: *** [Makefile:1070: local_setup] Error 2

Should we close this issue though? It looks like an issue with docker buildx and qemu.

@garazdawi
Contributor

Should we close this issue though? It looks like an issue with docker buildx and qemu.

Yes, I think so. There is not much that we can do to help with this. If you do find a solution though, please make a note here so that anyone else who comes across this will know what to do.

@garazdawi
Contributor

As discussed in https://erlangforums.com/t/otp-25-0-rc3-release-candidate-3-is-released/1317/24, it turns out that the problem is a bug/feature of qemu. There is not much that we can do about it without compromising the security of all JIT nodes, so if you want this to work, the fix should be in qemu.
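
Since the crash is tied to the JIT running under qemu, one avenue a reader might try is building without the JIT when the build is emulated. This is only a sketch: `--disable-jit` is an OTP configure option, but whether it avoids this particular segfault is an assumption that is not confirmed anywhere in this thread. The function name `extra_configure_flags` is hypothetical, and its two arguments stand in for Docker's BUILDPLATFORM and TARGETPLATFORM build args.

```shell
#!/bin/sh
# Print extra ./configure flags depending on whether the build is emulated.
# $1: platform of the machine running the build (e.g. "$BUILDPLATFORM")
# $2: platform the image is being built for   (e.g. "$TARGETPLATFORM")
extra_configure_flags() {
  build_platform="$1"
  target_platform="$2"
  if [ "$build_platform" != "$target_platform" ]; then
    # The platforms differ, so the stage would run under emulation.
    # Assumption (unverified in this thread): disabling the JIT avoids the crash.
    echo "--disable-jit"
  fi
}

# Hypothetical usage in a Dockerfile RUN step:
#   ./configure $(extra_configure_flags "$BUILDPLATFORM" "$TARGETPLATFORM")
```

Note the trade-off: an emulator built this way falls back to the interpreter, so the resulting image would not have the JIT available at runtime either.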


4 participants