RUST_BACKTRACE=full loop with -Cpanic=abort on aarch64-unknown-linux-gnu #123733
Comments
I think this was already reported? #123686 |
It might be a duplicate of #121817, cc @wesleywiser |
#97235 adds a lint (which isn't even executed with panic=abort), and should have no codegen changes. The current behaviour doesn't surprise me since we neither force frame pointers nor force unwind tables. To generate the stack trace, the unwinder usually uses eh_frame to unwind the frames if it exists; otherwise frame pointers are used to continue unwinding. We build the standard library with Adding |
OK, I don't know. The bisected nightly works fine when I build it myself, which implies that it's sensitive to the particular configuration, but that makes it hard to try and narrow this further. Or if the bug only arose there because of happenstance code changes, the root cause might not be in that range at all. |
I can reproduce the infinite backtrace with 2020-03-01 and the target is gone in 2020-01-01 so I suspect that this bug has been in the target since we started distributing artifacts for it. I cannot reproduce the infinite backtrace printing with The behavior of the MIR inliner is sensitive to the stable crate hash, so if the behavior of the MIR inliner can impact whether the fatal optimization happens, it may not be possible to bisect this. |
To be clear, the behavior seems wrong to me even using
Buggy in a different way, but still buggy, so maybe it's not worth splitting hairs. |
Shouldn't the unwinder just give up if it reaches a frame with no unwinding information? This seems to be the issue here, since the loop starts when the unwinder reaches the first frame compiled with This target uses libgcc as the unwinder, so this might be worth looking into more deeply. A truncated backtrace is normal if everything is compiled with Also note that the call into panic is a non-returning tail call. This has caused issues in the past on the ARM target (#69231) since LLVM would assume the LR value is dead and leave it dangling. If this happens to hold a function pointer to |
Huh? I thought AArch64 code was being compiled with frame pointers? |
WG-prioritization assigning priority (Zulip discussion). @rustbot label -I-prioritize +P-high |
For backtracing, libgcc does have a fallback mechanism for when unwind info is not available. However, I just checked and it is implemented for PPC only, so it does not apply to aarch64. Unfortunately, this leaves me unable to explain why the looping behaviour disappears when we force frame pointers.
Doesn't look like it. https://github.com/rust-lang/rust/blob/master/compiler/rustc_target/src/spec/targets/aarch64_unknown_linux_gnu.rs Apparently we have frame pointers enabled for non-leaf functions on Apple platforms, but not Linux. |
Another interesting data point. When building for
Note that the last line is clearly incorrect: |
More clues:
What should have happened is that we stop the backtrace at |
…mulacrum ci: test cargo on `aarch64-gnu` Since `aarch64-unknown-linux-gnu` is a tier-1 target, we should also test cargo on it, especially since cargo's own CI doesn't cover this yet. This might have helped us discover rust-lang#123733 sooner, which is not a cargo problem but was uncovered by a new cargo test (which we'll have to skip for now). Everything else passes in my local run, so at least we'll have a guard against future regressions.
OK I figured out the root cause of the problem. It's because LLVM is clobbering the link register on the tail call:
Note how the incoming LR value is not saved anywhere and is later clobbered by the BL instruction. Also, my earlier statement was incorrect,
This unwind info is claiming that the return address is present in the link register (this is the default if not overridden), but this is clearly incorrect in this case since the link register has been clobbered. So the unwinder is behaving correctly, it's just that LLVM is emitting incorrect unwind info (or that it should be preserving the caller's link register). |
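To make the failure mode above concrete, here is a toy model of it. This is only a sketch under simplifying assumptions, not libgcc's actual algorithm: `Rule`, `unwind`, and the addresses are hypothetical, and each function is reduced to a single PC. A callee that saved LR hands control back to the address after the BL; if the unwind info for that address then claims "return address is in LR" while LR still points to that same address, the unwinder never makes progress.

```rust
use std::collections::HashMap;

/// How the toy unwinder recovers the caller's return address at a given PC.
#[derive(Clone, Copy)]
enum Rule {
    /// Default assumption: the return address is still in the link register.
    InLinkRegister,
    /// The function saved LR on entry; this is the value it saved.
    Saved(u64),
}

/// Walks frames starting at `pc` and returns every PC visited.
/// Stops at a PC with no unwind info, or after `max_frames` entries.
fn unwind(rules: &HashMap<u64, Rule>, mut pc: u64, mut lr: u64, max_frames: usize) -> Vec<u64> {
    let mut trace = Vec::new();
    while trace.len() < max_frames {
        trace.push(pc);
        match rules.get(&pc) {
            // No unwind info for this PC: give up, which truncates the trace.
            None => break,
            // LR is not restored from anywhere, so it keeps its value.
            Some(Rule::InLinkRegister) => pc = lr,
            // Restoring the saved LR also yields the caller's PC.
            Some(Rule::Saved(ra)) => {
                pc = *ra;
                lr = *ra;
            }
        }
    }
    trace
}
```

With a `Saved(0x1004)` rule for the callee and an `InLinkRegister` rule at `0x1004`, the trace repeats `0x1004` until `max_frames` is hit. Dropping the second rule makes the walk stop after two entries, which is the truncated backtrace one would expect.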
Here is the LLVM IR for this function. I think the root of the problem is that we are assigning a personality function to this function, which forces LLVM to emit DWARF frame metadata for this function. However this is a nounwind function that only calls other nounwind functions, so LLVM clobbering the link register is perfectly valid. I see 2 ways of fixing this:
; ModuleID = '<stdin>'
source_filename = "rust_out.7d368e479ffa2ca5-cgu.0"
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-musl"
%"std::panicking::begin_panic::Payload<&str>" = type { %"core::option::Option<&str>" }
%"core::option::Option<&str>" = type { ptr, [1 x i64] }
@vtable.1 = external hidden unnamed_addr constant <{ ptr, [16 x i8], ptr, ptr }>, align 8
; Function Attrs: inlinehint noreturn nounwind
define hidden fastcc void @"_ZN3std9panicking11begin_panic28_$u7b$$u7b$closure$u7d$$u7d$17h275b05efe23611e5E"(ptr noalias nocapture noundef readonly align 8 dereferenceable(24) %_1) unnamed_addr #0 personality ptr @rust_eh_personality {
start:
%_4 = alloca %"std::panicking::begin_panic::Payload<&str>", align 8
call void @llvm.lifetime.start.p0(i64 16, ptr nonnull %_4)
%inner.0 = load ptr, ptr %_1, align 8, !nonnull !3, !align !4, !noundef !3
%0 = getelementptr inbounds i8, ptr %_1, i64 8
%inner.1 = load i64, ptr %0, align 8, !noundef !3
store ptr %inner.0, ptr %_4, align 8
%1 = getelementptr inbounds i8, ptr %_4, i64 8
store i64 %inner.1, ptr %1, align 8
%2 = getelementptr inbounds i8, ptr %_1, i64 16
%_6 = load ptr, ptr %2, align 8, !nonnull !3, !align !5, !noundef !3
call void @_ZN3std9panicking20rust_panic_with_hook17h45e7e3752affffd6E(ptr noundef nonnull align 1 %_4, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @vtable.1, ptr noalias noundef readonly align 8 dereferenceable_or_null(48) null, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) %_6, i1 noundef zeroext true, i1 noundef zeroext false) #4
unreachable
}
; Function Attrs: nounwind
declare noundef i32 @rust_eh_personality(i32 noundef, i32 noundef, i64 noundef, ptr noundef, ptr noundef) unnamed_addr #1
; Function Attrs: noreturn nounwind
declare void @_ZN3std9panicking20rust_panic_with_hook17h45e7e3752affffd6E(ptr noundef nonnull align 1, ptr noalias noundef readonly align 8 dereferenceable(24), ptr noalias noundef readonly align 8 dereferenceable_or_null(48), ptr noalias noundef readonly align 8 dereferenceable(24), i1 noundef zeroext, i1 noundef zeroext) unnamed_addr #2
; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #3
attributes #0 = { inlinehint noreturn nounwind "probe-stack"="inline-asm" "target-cpu"="generic" "target-features"="+v8a" }
attributes #1 = { nounwind "probe-stack"="inline-asm" "target-cpu"="generic" "target-features"="+v8a" }
attributes #2 = { noreturn nounwind "probe-stack"="inline-asm" "target-cpu"="generic" "target-features"="+v8a" }
attributes #3 = { nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }
attributes #4 = { noreturn nounwind }
!llvm.module.flags = !{!0, !1}
!llvm.ident = !{!2}
!0 = !{i32 8, !"PIC Level", i32 2}
!1 = !{i32 7, !"PIE Level", i32 2}
!2 = !{!"rustc version 1.78.0-nightly (c67326b06 2024-03-15)"}
!3 = !{}
!4 = !{i64 1}
!5 = !{i64 8} |
I suppose we can also just emit the |
Mmm. We probably ought to be adopting frame pointers across the board for AAPCS64 reasons. It's technically acceptable for a platform to deviate, but it should be explicit. |
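For readers unfamiliar with what frame pointers buy a backtracer here: AAPCS64 frame records are a pair of 64-bit words, the caller's frame pointer followed by the saved link register. The following is a minimal sketch of walking such a chain over a simulated memory image; `walk_frame_records` and the `HashMap` standing in for stack memory are hypothetical and not how std or libgcc implement it.

```rust
use std::collections::HashMap;

/// Walks a chain of AAPCS64-style frame records in a simulated memory image.
/// Each record is two 64-bit words: [fp] = caller's fp, [fp + 8] = saved LR.
/// Returns the saved return addresses, innermost frame first.
fn walk_frame_records(memory: &HashMap<u64, u64>, mut fp: u64, max_frames: usize) -> Vec<u64> {
    let mut return_addresses = Vec::new();
    while fp != 0 && return_addresses.len() < max_frames {
        let (Some(&next_fp), Some(&lr)) = (memory.get(&fp), memory.get(&fp.wrapping_add(8)))
        else {
            break; // unreadable record: stop rather than guess
        };
        return_addresses.push(lr);
        // The stack grows down, so a caller's record must sit at a higher
        // address. Anything else (including 0) ends the walk and rules out cycles.
        if next_fp <= fp {
            break;
        }
        fp = next_fp;
    }
    return_addresses
}
```

The monotonicity check is a design choice for this sketch: it guarantees termination even if a record is corrupted to point at itself.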
I think this also affects riscv64 in Fedora/RISCV. Looking at the builds logs it seems to be |
To clarify, Fedora does force-enable frame pointers for x86_64 and aarch64, so this might be affecting Rust builds on Fedora. |
I reproduced the problem using nightlies for this report, so I don't think Fedora's configuration matters. |
@Amanieu I reproduced it. In my case, I saw that the LLVM IR output was:
; std::panicking::begin_panic::{{closure}}
; Function Attrs: inlinehint noreturn nounwind
define internal void @"_ZN3std9panicking11begin_panic28_$u7b$$u7b$closure$u7d$$u7d$17h4691076880adeaeeE"(ptr align 8 %_1) unnamed_addr #5 personality ptr @rust_eh_personality {
start:
%0 = alloca [16 x i8], align 8
%_7 = alloca [16 x i8], align 8
%_4 = alloca [16 x i8], align 8
%inner.0 = load ptr, ptr %_1, align 8
%1 = getelementptr inbounds i8, ptr %_1, i64 8
%inner.1 = load i64, ptr %1, align 8
store ptr %inner.0, ptr %_7, align 8
%2 = getelementptr inbounds i8, ptr %_7, i64 8
store i64 %inner.1, ptr %2, align 8
%3 = load ptr, ptr %_7, align 8
%4 = getelementptr inbounds i8, ptr %_7, i64 8
%5 = load i64, ptr %4, align 8
store ptr %3, ptr %_4, align 8
%6 = getelementptr inbounds i8, ptr %_4, i64 8
store i64 %5, ptr %6, align 8
%7 = getelementptr inbounds i8, ptr %_1, i64 16
%_6 = load ptr, ptr %7, align 8
; call std::panicking::rust_panic_with_hook
call void @_ZN3std9panicking20rust_panic_with_hook17h9e3397264ef8c828E(ptr align 1 %_4, ptr align 8 @vtable.1, ptr align 8 null, ptr align 8 %_6, i1 zeroext true, i1 zeroext false) #13
unreachable
bb1: ; No predecessors!
%8 = load ptr, ptr %0, align 8
%9 = getelementptr inbounds i8, ptr %0, i64 8
%10 = load i32, ptr %9, align 8
%11 = insertvalue { ptr, i32 } poison, ptr %8, 0
%12 = insertvalue { ptr, i32 } %11, i32 %10, 1
resume { ptr, i32 } %12
}
Due to the In any case, if I understand correctly, the condition that sets the personality function in this case is: https://github.com/rust-lang/rust/blob/master/compiler/rustc_codegen_ssa/src/mir/mod.rs#L182 which I don't fully understand TBH.
Also, for anyone without access to an aarch64 machine who wants to reproduce (paths may be a bit different):
echo 'fn main() { panic!() }' | rustc - -Cpanic=abort --target aarch64-unknown-linux-gnu -C "linker=aarch64-linux-gnu-gcc" && qemu-aarch64 -E LD_LIBRARY_PATH="/usr/aarch64-linux-gnu/lib64" -E RUST_BACKTRACE=full -L /usr/aarch64-linux-gnu/ ./rust_out |
@Reflexe Essentially, whenever we add a personality function we must ensure that we also add the |
Adding a personality function forces LLVM to generate unwinding info that might be incorrect. To solve it, always apply the UWTable attribute when setting a personality function. Fixes rust-lang#123733
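The invariant behind this fix can be stated as "a function with a personality always has an unwind table as well". Below is a purely illustrative sketch of enforcing that in one place; `FnAttrs`, `set_personality`, and `unwind_info_is_consistent` are hypothetical stand-ins and do not reflect rustc's actual codegen API.

```rust
/// Hypothetical, simplified stand-in for a function's codegen attributes.
#[derive(Debug, Default, PartialEq)]
struct FnAttrs {
    /// Name of the personality function, if one is attached.
    personality: Option<&'static str>,
    /// Whether an unwind table is requested for the function.
    uwtable: bool,
}

/// Attaches a personality function and, in the same step, requests an
/// unwind table, so the two can never get out of sync.
fn set_personality(attrs: &mut FnAttrs, name: &'static str) {
    attrs.personality = Some(name);
    attrs.uwtable = true;
}

/// Checks the invariant: a personality implies an unwind table.
fn unwind_info_is_consistent(attrs: &FnAttrs) -> bool {
    attrs.personality.is_none() || attrs.uwtable
}
```

Routing every personality assignment through a single helper is what keeps the invariant from being forgotten at individual call sites.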
@Amanieu Thanks! This was pretty easy to implement; I created #125844. There are two issues still
The last entry is incorrect here. Not sure if this is related or we should open another issue for this one. |
Yes, it generates an incorrect entry if the
No, the last entry here is correct: the backtrace properly ends at __rust_end_short_backtrace which comes from user code and therefore has no unwinding metadata. |
When compiled with `-Cpanic=abort`, a program as simple as `fn main() { panic!(); }` gets into a loop with `RUST_BACKTRACE=full` on `aarch64-unknown-linux-gnu`:
With `RUST_BACKTRACE=1`, it has no backtrace at all:
Current toolchain:
Using `cargo bisect-rustc`, this problem seems to be quite old:
Comparison: 46b8c23...f2d9393
#97235 looks like the most obvious candidate, cc @nbdd0121 @Amanieu
The reason I'm noticing this now is because there's a new cargo test `panic_abort_doc_tests` from rust-lang/cargo#13388, which I found hanging on Fedora aarch64 in a scratch build of 1.78-beta. After about 2 hours, rustdoc got a `SIGKILL`, which I assume is an OOM from capturing all that backtrace output. When reproducing this hang, I found that running its `RUST_BACKTRACE=full /tmp/rustdoctestGHjZyP/rust_out` myself showed the backtrace loop.
This is similar to #123686, but that proposed fix only applies to Windows.
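Since a looping backtrace can run a test harness out of memory, one option is to scan captured output for runaway repetition instead of buffering it all. The sketch below assumes frame lines look like `  12:     0xaaaa1234 - some::symbol` and that the loop shows up as the same symbol in consecutive frames; both are assumptions, and `frame_symbol` and `looping_symbol` are hypothetical helpers.

```rust
/// Extracts the symbol from a frame line such as
/// `  12:     0xaaaa1234 - core::panicking::panic` (assumed format).
/// Returns `None` for anything that does not look like a frame line.
fn frame_symbol(line: &str) -> Option<&str> {
    let (index, rest) = line.trim_start().split_once(':')?;
    if index.is_empty() || !index.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }
    let (_address, symbol) = rest.split_once(" - ")?;
    Some(symbol.trim())
}

/// Returns the first symbol that occupies more than `limit` consecutive
/// frames, which this sketch treats as evidence of a backtrace loop.
fn looping_symbol(output: &str, limit: usize) -> Option<&str> {
    let mut last: Option<&str> = None;
    let mut run = 0usize;
    for symbol in output.lines().filter_map(frame_symbol) {
        if last == Some(symbol) {
            run += 1;
        } else {
            last = Some(symbol);
            run = 1;
        }
        if run > limit {
            return Some(symbol);
        }
    }
    None
}
```

Legitimate deep recursion also repeats a symbol, so `limit` should be generous; the check is a heuristic for catching hangs, not a proof of a bug.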