rustc distributed for riscv64 linux segfaults on almost anything #117022
Can you compile a hello world program? Anything at all? The segfault here is from the |
hello world is fine |
Hi. I think this may be related. Observed when running cargo and rustc on QEMU's virt machine emulation with an Ubuntu root filesystem:
Note that:
The backtrace seems to be triggered by a read of some VDSO symbol by cargo/rustc et al., which isn't nice. However, the hello binary is built and runs correctly. Happy to provide any more info. Thanks |
Another data point: when I try to build the Rust compiler natively, I get a similar backtrace for rustc, but this time it appears to trigger a fatal SIGSEGV:
Git hash for the rust source checkout: |
Could these be related to #117101 (comment)?
|
@saethlin: It does appear likely. Will try to dig a bit. |
I probably have the same or similar issue: I am using current stable rustc directly on riscv64 (Ubuntu 23.10). Even a simple "hello world" spits out weird stuff:
It runs, though:
System info:
|
I don't think so, but let me give the background: I just now created a QEMU RISC-V VM based on https://cdimage.ubuntu.com/releases/23.10/release/ubuntu-23.10-preinstalled-server-riscv64.img.xz following the instructions on https://wiki.ubuntu.com/RISC-V/QEMU (adjusted for the different image). I booted this image and updated it to the development version ( Now a
but a more evolved project fails with
However, #117101 (comment) claims that this should work with ld version 2.41. (Sigh, I realize now that this is redundant with the previous comment. Hopefully it adds some value.) |
It reproduces with at least 1.73.0, 1.72.0, 1.70.0, and 1.68.0. I haven't found a workaround. |
Better? If you'd prefer something else, just say so and I'll paste it in. (Up-thread someone said they could compile hello world.) |
Ha, thanks. Well, as best I can tell everything will complain and just about anything non-trivial will fail. Debian 12 with ld 2.40 doesn't appear to have an issue, but both Ubuntu 23.10 (ld 2.40) and Ubuntu 24.04 (ld 2.41) reproduce this. |
I had a try, and it was fine on Ubuntu 22.04.
ubuntu@x86:~$ qemu-system-riscv64 -machine virt -m 2048 -nographic -bios /usr/lib/riscv64-linux-gnu/opensbi/generic/fw_jump.bin -kernel /usr/lib/u-boot/qemu-riscv64_smode/uboot.elf -device virtio-net-device,netdev=eth0 -netdev user,id=eth0 -device virtio-rng-pci -drive file=ubuntu.img,format=raw,if=virtio
...
ubuntu@riscv:~$ uname -a
Linux ubuntu 5.19.0-1021-generic #23~22.04.1-Ubuntu SMP Thu Jun 22 12:49:35 UTC 2023 riscv64 riscv64 riscv64 GNU/Linux
ubuntu@riscv:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
ubuntu@riscv:~$ rustc --version
rustc 1.74.0 (79e9716c9 2023-11-13)
ubuntu@riscv:~$ cd hello/
ubuntu@riscv:~/hello$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.42s
Running `target/debug/hello`
Hello!
Hello, world!
ubuntu@riscv:~/hello$ cd ..
ubuntu@riscv:~$ ./sample-rust 1 2 3 # this is a program I compiled with Ferrocene earlier
Hello
Arg =
invalid args
Arg 0 = "./sample-rust"
Arg 1 = "1"
Arg 2 = "2"
Arg 3 = "3"
ubuntu@riscv:~$ file ./sample-rust
./sample-rust: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, BuildID[sha1]=f366420af49ba5f83b91cc4322e780655f1cb0cf, for GNU/Linux 4.15.0, with debug_info, not stripped |
I can replicate the errors on 23.10, though.
So what is rustc doing?
$ gdb `which rustc`
(gdb) run -- /home/ubuntu/hello/src/main.rs
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
process 3130 is executing new program: /home/ubuntu/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/rustc
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffef823c40 (LWP 3132)]
[New Thread 0x7fffede1bc40 (LWP 3133)]
[New Thread 0x7fffedc1ac40 (LWP 3134)]
[New Thread 0x7fffec7fec40 (LWP 3135)]
[New Thread 0x7fffec5fdc40 (LWP 3136)]
[Thread 0x7fffec7fec40 (LWP 3135) exited]
[Thread 0x7fffec5fdc40 (LWP 3136) exited]
Thread 3 "rustc" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7fffede1bc40 (LWP 3133)]
syscall (syscall_number=98, arg1=<optimized out>, arg2=137, arg3=2, arg4=0,
arg5=0, arg6=-1, arg7=98) at ../sysdeps/unix/sysv/linux/riscv/syscall.c:27
27 ../sysdeps/unix/sysv/linux/riscv/syscall.c: No such file or directory.
(gdb) where
#0 syscall (syscall_number=98, arg1=<optimized out>, arg2=137, arg3=2,
arg4=0, arg5=0, arg6=-1, arg7=98)
at ../sysdeps/unix/sysv/linux/riscv/syscall.c:27
#1 0x00007ffff7f1e09e in std::sys::unix::futex::futex_wait ()
at library/std/src/sys/unix/futex.rs:62
#2 std::sys::unix::locks::futex_condvar::Condvar::wait_optional_timeout ()
at library/std/src/sys/unix/locks/futex_condvar.rs:49
#3 std::sys::unix::locks::futex_condvar::Condvar::wait ()
at library/std/src/sys/unix/locks/futex_condvar.rs:33
#4 0x00007ffff4d2d230 in <jobserver::HelperState>::for_each_request::<jobserver::imp::spawn_helper::{closure#1}::{closure#0}> ()
from /home/ubuntu/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-32688bc33ce513ef.so
#5 0x00007ffff4d2f06e in std::sys_common::backtrace::__rust_begin_short_backtrace::<jobserver::imp::spawn_helper::{closure#1}, ()> ()
from /home/ubuntu/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-32688bc33ce513ef.so
#6 0x00007ffff4d2f0c2 in std::panicking::try::do_call::<core::panic::unwind_safe::AssertUnwindSafe<<std::thread::Builder>::spawn_unchecked_<jobserver::imp::spawn_helper::{closure#1}, ()>::{closure#1}::{closure#0}>, ()> ()
from /home/ubuntu/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-32688bc33ce513ef.so
#7 0x00007ffff4d2f8fa in __rust_try.llvm.14045574179110023769 ()
--Type <RET> for more, q to quit, c to continue without paging--
from /home/ubuntu/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-32688bc33ce513ef.so
#8 0x00007ffff4d3089e in <<std::thread::Builder>::spawn_unchecked_<jobserver::imp::spawn_helper::{closure#1}, ()>::{closure#1} as core::ops::function::FnOnce<()>>::call_once::{shim:vtable#0} ()
from /home/ubuntu/.rustup/toolchains/stable-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-32688bc33ce513ef.so
#9 0x00007ffff7f1cbf0 in alloc::boxed::{impl#47}::call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> ()
at library/alloc/src/boxed.rs:2007
#10 alloc::boxed::{impl#47}::call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> ()
at library/alloc/src/boxed.rs:2007
#11 std::sys::unix::thread::{impl#2}::new::thread_start ()
at library/std/src/sys/unix/thread.rs:108
#12 0x00007fffef908956 in start_thread (arg=<optimized out>)
at ./nptl/pthread_create.c:444
#13 0x00007fffef959bf0 in __thread_start_clone3 ()
at ../sysdeps/unix/sysv/linux/riscv/clone3.S:71
The last comment on #102155 seems to be the same issue (but not the OP; that was different). libc thinks 98 is
But syscall 98 does not appear on https://jborza.com/post/2021-05-11-riscv-linux-syscalls/. Instead, the futex syscall is given as 422. Edit: the header agrees it's 98, so that website is maybe just wrong.
ubuntu@riscv:/usr/include$ grep "__NR_futex" . -R
./riscv64-linux-gnu/bits/syscall.h:#ifdef __NR_futex
./riscv64-linux-gnu/bits/syscall.h:# define SYS_futex __NR_futex
./riscv64-linux-gnu/bits/syscall.h:#ifdef __NR_futex_time64
./riscv64-linux-gnu/bits/syscall.h:# define SYS_futex_time64 __NR_futex_time64
./riscv64-linux-gnu/bits/syscall.h:#ifdef __NR_futex_waitv
./riscv64-linux-gnu/bits/syscall.h:# define SYS_futex_waitv __NR_futex_waitv
./asm-generic/unistd.h:#define __NR_futex 98
./asm-generic/unistd.h:__SC_3264(__NR_futex, sys_futex_time32, sys_futex)
./asm-generic/unistd.h:#define __NR_futex_time64 422
./asm-generic/unistd.h:__SYSCALL(__NR_futex_time64, sys_futex)
./asm-generic/unistd.h:#define __NR_futex_waitv 449
./asm-generic/unistd.h:__SYSCALL(__NR_futex_waitv, sys_futex_waitv) |
Ubuntu's own rustc does it too:
ubuntu@riscv:~/hello$ which rustc
/usr/bin/rustc
ubuntu@riscv:~/hello$ rustc --version
rustc 1.71.1 (eb26296b5 2023-08-03) (built from a source tarball)
ubuntu@riscv:~/hello$ rustc ./src/main.rs
/lib/riscv64-linux-gnu/librustc_driver-192663414457c4de.so(+0x7d240c)[0x7fff8a3d240c]
linux-vdso.so.1(__vdso_rt_sigreturn+0x0)[0x7fff8d839800]
/lib/riscv64-linux-gnu/libc.so.6(syscall+0x16)[0x7fff899e1406]
/lib/riscv64-linux-gnu/libstd-e74a09c2d8dfb358.so(_ZN3std3sys4unix5locks13futex_condvar7Condvar4wait17h02293b5d419ad109E+0x64)[0x7fff89b44c0a]
/lib/riscv64-linux-gnu/librustc_driver-192663414457c4de.so(+0x2a92d42)[0x7fff8c692d42]
/lib/riscv64-linux-gnu/librustc_driver-192663414457c4de.so(+0x2a93234)[0x7fff8c693234]
/lib/riscv64-linux-gnu/librustc_driver-192663414457c4de.so(+0x2a95160)[0x7fff8c695160]
/lib/riscv64-linux-gnu/librustc_driver-192663414457c4de.so(+0x2a95208)[0x7fff8c695208]
/lib/riscv64-linux-gnu/librustc_driver-192663414457c4de.so(+0x2a943ae)[0x7fff8c6943ae]
/lib/riscv64-linux-gnu/libstd-e74a09c2d8dfb358.so(rust_metadata_std_4809e9723d3e348c+0x7a798)[0x7fff89b03798]
/lib/riscv64-linux-gnu/libc.so.6(+0x6a956)[0x7fff89991956]
/lib/riscv64-linux-gnu/libc.so.6(+0xbbbf0)[0x7fff899e2bf0]
ubuntu@riscv:~/hello$
And it goes pop in the same place:
$ gdb rustc
(gdb) run -- src/main.rs
syscall (syscall_number=98, arg1=<optimized out>, arg2=137, arg3=2, arg4=0,
arg5=0, arg6=-1, arg7=98) at ../sysdeps/unix/sysv/linux/riscv/syscall.c:27
27 ../sysdeps/unix/sysv/linux/riscv/syscall.c: No such file or directory.
(gdb) where
#0 syscall (syscall_number=98, arg1=<optimized out>, arg2=137, arg3=2,
arg4=0, arg5=0, arg6=-1, arg7=98)
at ../sysdeps/unix/sysv/linux/riscv/syscall.c:27
#1 0x00007ffff42f6c7e in std::sys::unix::locks::futex_mutex::Mutex::lock_contended () from /lib/riscv64-linux-gnu/libstd-e74a09c2d8dfb358.so
#2 0x00007ffff6e92d42 in ?? ()
from /lib/riscv64-linux-gnu/librustc_driver-192663414457c4de.so
#3 0x00007ffff6e93234 in ?? ()
from /lib/riscv64-linux-gnu/librustc_driver-192663414457c4de.so
#4 0x00007ffff6e95160 in ?? ()
from /lib/riscv64-linux-gnu/librustc_driver-192663414457c4de.so
#5 0x00007ffff6e95208 in ?? ()
from /lib/riscv64-linux-gnu/librustc_driver-192663414457c4de.so
#6 0x00007ffff6e943ae in ?? ()
from /lib/riscv64-linux-gnu/librustc_driver-192663414457c4de.so
#7 0x00007ffff4303798 in ?? ()
from /lib/riscv64-linux-gnu/libstd-e74a09c2d8dfb358.so
#8 0x00007ffff4191956 in start_thread (arg=<optimized out>)
at ./nptl/pthread_create.c:444
#9 0x00007ffff41e2bf0 in __thread_start_clone3 ()
at ../sysdeps/unix/sysv/linux/riscv/clone3.S:71 |
So I booted the Ubuntu 22.04 image with a default 5.19 kernel (built from source) and that was fine. Then I built a 6.6 kernel and booted it with that and ... it was also fine. So it's not "new kernel changed something and broke futexes". Then I booted Ubuntu 23.10 (with its default kernel) and that was still broken. So I tried 23.10 with my new custom 6.6 kernel and that was now fine!
ubuntu@ubuntu:~$ uname -a
Linux ubuntu 6.6.0 #3 SMP Thu Nov 23 23:22:59 GMT 2023 riscv64 riscv64 riscv64 GNU/Linux
ubuntu@ubuntu:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=23.10
DISTRIB_CODENAME=mantic
DISTRIB_DESCRIPTION="Ubuntu 23.10"
ubuntu@ubuntu:~$ rustc ./hello/src/main.rs
ubuntu@ubuntu:~$ ./main
Hello, world!
So there's something about the default 6.5 kernel in Ubuntu 23.10 that is broken, and that is fixed by building a defconfig vanilla 6.6 kernel. I guess someone could grab the 6.5 kernel source from Ubuntu and build it from scratch. Maybe there's actually a bug in the Ubuntu GCC so it miscompiles the kernel? I'm using the GCC 12.2 from Debian 12 to cross-compile my kernels because that's what this test machine has on it. It's looking fairly unlikely that this is a rustc bug, though; it would be a hell of a rustc bug if it only showed up on some kernel versions and not others. |
There's a complicating factor. I tried running it under gdb to catch the issue. I don't know if this is relevant, but I can no longer reproduce the issue with the simple (Edited to reflect retesting with a larger example.) |
@thejpster did you try running some bigger loads, say, |
Building a basic main.rs was a very reliable trigger on 23.10 with the stock kernel, but yeah, I can push it a bit and see. Perhaps you could also try a vanilla kernel and see if that helps? |
$ cargo install ripgrep
(much time passes - I only gave the VM 2GB of RAM)
Installed package `ripgrep v13.0.0` (executable `rg`)
$ rg --version
ripgrep 13.0.0
-SIMD -AVX (compiled)
$ uname -a
Linux ubuntu 6.6.0 #3 SMP Thu Nov 23 23:22:59 GMT 2023 riscv64 riscv64 riscv64 GNU/Linux
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=23.10
DISTRIB_CODENAME=mantic
DISTRIB_DESCRIPTION="Ubuntu 23.10"
$ |
Alright, I compiled a 6.5 kernel using Ubuntu's config (copied from /boot/config-6.5.0-9-generic) using the Debian compiler, and it crashes rustc just like the stock 23.10 kernel does. So there's something in Ubuntu's kernel 6.5 that isn't in the vanilla 6.6. I guess I could try making a vanilla 6.5 - maybe the bug was fixed in 6.6? Then I could look at the diffs between the two. But at this point it really doesn't look like a rustc issue. |
Ooooh, vanilla 6.5 crashes with a 23.10 disk image. So maybe it was fixed in 6.6 and it wasn't an Ubuntu-specific bug. |
Addition to my report at #117022 (comment):
|
Are there any other kernels you could try? I've found 6.4 works and 6.6 works, so it seems to be a 6.5-specific issue. |
Sorry, I don't know how to compile and install kernels, let alone cross-arch. But maybe I will try with other pre-baked images. One thing I've noticed, though: with Rust 1.73.0, as mentioned in #117022 (comment), I "just" get a warning for every compilation step, but the resulting binaries seem fine (e.g. ripgrep works).
Interestingly, I can even compile ripgrep in spite of the SIGSEGV messages, and it works. |
I could easily believe that rustc started using futexes in another part of the compilation process, and the bug seems to be in the kernel's handling of the futex syscall in 6.5 specifically. Has anyone seen it fail on anything other than kernel 6.5? |
WG-prioritization assigning priority (Zulip discussion). @rustbot label -I-prioritize +P-high |
We should probably open a bug upstream on Canonical's issue tracker, as their kernel seems pretty broken and folks are going to keep falling over this. |
Well, I bounced off their IRC support channel (who told me to read the page I had just read, and which had sent me to the IRC channel). I can only suggest running |
This may be https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2042388. I updated to 6.5.0-14 which has a fix for that bug, and the issue went away. I was able to compile hello world without issue. Now leaving it compiling ripgrep (which takes about an hour). |
Thank you. I apologize, it never occurred to me that it could have been a kernel bug. I think we can close this now as it's neither a rustc nor even a Rust bug. |
I can also confirm the bug is fixed for me on my SBC after upgrading kernel from 6.5.0-13 to 6.5.0-14 on Ubuntu 23.10. For reference, here is also a bug report on upstream kernel: https://bugzilla.kernel.org/show_bug.cgi?id=217923 Fix seems to be included in Linux 6.6: https://lwn.net/Articles/947826/ Thanks for your work and information. |
Great work, folks. Thanks for the follow-up on this issue. Going to close as not-our-bug. @rustbot label -P-high -regression-stable-to-stable |
Meta
rustc --version --verbose:
cat /proc/version:
Backtrace: