Infinite backtrace, odd backtrace behavior and "corrupted stack?" in GDB #139

japaric · 2018-10-26T18:26:34Z

STR

$ cargo generate --git ~/rust-embedded/cortex-m-quickstart --name app && cd app

$ cargo add panic-abort

$ # modify examples/panic.rs to use panic-abort

$ # change memory.x to match the memory layout of the STM32F3DISCOVERY

$ # uncomment thumbv7em-none-eabi target and gdb runner in .cargo/config 

$ cargo run --example panic
(gdb) backtrace  # "corrupt stack?"
#0  DefaultPreInit ()
    at /home/japaric/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/cortex-m-rt-0.6.5/src/lib.rs:556
#1  0x08000412 in Reset ()
    at /home/japaric/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/cortex-m-rt-0.6.5/src/lib.rs:496
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) continue
Breakpoint 4, main () at examples/panic.rs:29
29          panic!("Oops")

(gdb) backtrace  # where did Reset go?
#0  main () at examples/panic.rs:29

(gdb) continue
Breakpoint 3, rust_begin_unwind (_info=0x10001fb4) at (..)
31          unsafe { intrinsics::abort() }

(gdb) backtrace  # infinite backtrace
#0  rust_begin_unwind (_info=0x10001fb4) at (..)
#1  0x080006b0 in core::panicking::panic_fmt () at libcore/panicking.rs:77
#2  0x080006b0 in core::panicking::panic_fmt () at libcore/panicking.rs:77
#3  0x080006b0 in core::panicking::panic_fmt () at libcore/panicking.rs:77
(..)

Metadata

$ rustc -V
rustc 1.31.0-nightly (f99911a4a 2018-10-23)

$ grep cortex-m-rt Cargo.lock
 "cortex-m-rt 0.6.5 (registry+https://github.com/rust-lang/crates.io-index)"

$ arm-none-eabi-gdb --version
GNU gdb (GDB) 8.2

Has anyone experienced these issues? I've seen them with previous releases of cortex-m-rt so I don't think they are related to the recent changes to HardFault. Also, in the above logs all backtrace invocations are before reaching HardFault.

The text was updated successfully, but these errors were encountered:

adamgreen · 2018-10-26T19:29:44Z

I did see similar issues when I was running through "Chapter 6 - Hello, world!" of the Discovery book. I took a quick look at the time and seem to remember that the panic related functions were modifying important registers like LR without first saving them on the stack. I didn't really dig into it since I am a Rust noob and I figured that was the expected behavior for Rust's panic handling code. If you want, I can investigate this. I don't think my HardFault change has anything to do with this but I can make sure that is the case and I think they are probably a similar type of issue.

japaric · 2018-10-26T22:38:06Z

You are correct. core::panicking::panic modifies LR without first pushing it onto the stack.

$ cd $(rustc --print sysroot)

$ arm-none-eabi-objdump -Cd lib/rustlib/thumbv7m-none-eabi/lib/libcore-*.rlib
(..)
00000000 <core::panicking::panic>:
   0:   b08c            sub     sp, #48 ; 0x30
   2:   e890 1006       ldmia.w r0, {r1, r2, ip}
   6:   e9d0 3e03       ldrd    r3, lr, [r0, #12]
   a:   6940            ldr     r0, [r0, #20]
   c:   e9cd 1206       strd    r1, r2, [sp, #24]
  10:   a906            add     r1, sp, #24
  12:   f240 0200       movw    r2, #0
  16:   9100            str     r1, [sp, #0]
  18:   2101            movs    r1, #1
  1a:   f2c0 0200       movt    r2, #0
  1e:   9101            str     r1, [sp, #4]
  20:   2100            movs    r1, #0
  22:   e9cd 1102       strd    r1, r1, [sp, #8]
  26:   e9cd 2104       strd    r2, r1, [sp, #16]
  2a:   a908            add     r1, sp, #32
  2c:   e9cd c308       strd    ip, r3, [sp, #32]
  30:   e9cd e00a       strd    lr, r0, [sp, #40]       ; 0x28
  34:   4668            mov     r0, sp
  36:   f7ff fffe       bl      0 <core::panicking::panic>
  3a:   defe            udf     #254    ; 0xfe

It seems that all divergent functions (fn(..) -> !) are given the noreturn attribute in LLVM-IR and this gives LLVM permission to thrash the LR register when optimizing the code. See example below:

#![feature(asm)]
#![no_main]
#![no_std]

extern crate panic_abort;

use cortex_m_rt::entry;

#[entry]
fn main() -> ! {
    foo();
    bar();
    baz()
}

#[inline(never)]
fn foo() {
    unsafe {
        asm!("" :: "r"(0) "r"(1) "r"(2) "r"(3) "r"(4) "r"(5) :: "volatile");
    }
}

#[inline(never)]
fn bar() {
    unsafe {
        asm!("" :: "r"(0) "r"(1) "r"(2) "r"(3) "r"(4) :: "volatile");
    }
}

#[inline(never)]
fn baz() -> ! {
    unsafe {
        asm!("" :: "r"(0) "r"(1) "r"(2) "r"(3) "r"(4) "r"(5) :: "volatile");
    }

    quux()
}

#[inline(never)]
fn quux() -> ! {
    unsafe {
        asm!("" :: "r"(0) "r"(1) "r"(2) "r"(3) "r"(4) "r"(5) "r"(6) :: "volatile");
    }

    loop {}
}

$ cargo rustc --example asm --release -- --emit=llvm-ir

$ cat $(find -name '*.ll')
; asm::foo
; Function Attrs: noinline nounwind
define internal fastcc void @_ZN3asm3foo17h42aa59da8b4484e3E() unnamed_addr #0 {
start:
  tail call void asm sideeffect "", "r,r,r,r,r,r"(i32 0, i32 1, i32 2, i32 3, i32 4, i32 5) #3, !srcloc !513
  ret void
}

; asm::bar
; Function Attrs: noinline nounwind
define internal fastcc void @_ZN3asm3bar17h6e8a9117df8b0663E() unnamed_addr #0 {
start:
  tail call void asm sideeffect "", "r,r,r,r,r"(i32 0, i32 1, i32 2, i32 3, i32 4) #3, !srcloc !514
  ret void
}

; asm::baz
; Function Attrs: noinline noreturn nounwind
define internal fastcc void @_ZN3asm3baz17h32a1e9952bcf27a1E() unnamed_addr #1 {
start:
  tail call void asm sideeffect "", "r,r,r,r,r,r"(i32 0, i32 1, i32 2, i32 3, i32 4, i32 5) #3, !srcloc !515
; call asm::quux
  tail call fastcc void @_ZN3asm4quux17h6aa0bed8e5be8684E()
  unreachable
}

; asm::quux
; Function Attrs: noinline noreturn nounwind
define internal fastcc void @_ZN3asm4quux17h6aa0bed8e5be8684E() unnamed_addr #1 {
start:
  tail call void asm sideeffect "", "r,r,r,r,r,r,r"(i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6) #3, !srcloc !516
  br label %bb1

bb1:                                              ; preds = %bb1, %start
  br label %bb1
}

$ cargo objdump --example asm --release -- -d -no-show-raw-insn -print-imm-hex
Disassembly of section .text:
asm::foo::hab8803af3cab0482:
 8000400:       push    {r7, lr}
 8000402:       mov.w   r12, #0x0
 8000406:       mov.w   lr, #0x1
 800040a:       movs    r2, #0x2
 800040c:       movs    r3, #0x3
 800040e:       movs    r0, #0x4
 8000410:       movs    r1, #0x5
 8000412:       pop     {r7, pc}

asm::bar::h0d958609aa53168d:
 8000414:       mov.w   r12, #0x0
 8000418:       movs    r1, #0x1
 800041a:       movs    r2, #0x2
 800041c:       movs    r3, #0x3
 800041e:       movs    r0, #0x4
 8000420:       bx      lr

asm::baz::h3d14c7f076829055:
 8000422:       mov.w   r12, #0x0
 8000426:       mov.w   lr, #0x1 ; <-
 800042a:       movs    r2, #0x2
 800042c:       movs    r3, #0x3
 800042e:       movs    r0, #0x4
 8000430:       movs    r1, #0x5
 8000432:       bl      #0x2
 8000436:       trap

asm::quux::h485b4bd2341abdb3:
 8000438:       mov.w   r12, #0x0
 800043c:       mov.w   lr, #0x1 ; <-
 8000440:       movs    r2, #0x2
 8000442:       movs    r3, #0x3
 8000444:       movs    r0, #0x4
 8000446:       movs    r1, #0x5
 8000448:       movs    r4, #0x6
 800044a:       b       #-0x4 <asm::quux::h485b4bd2341abdb3+0x12>

I can't think of any way to solve this issue from our side given that libcore comes pre-compiled.

thejpster · 2018-10-27T21:02:33Z

Jumping in this thread. I saw this in an Embedded Rust workshop I ran at work the other day.

adamgreen · 2018-12-10T02:54:53Z

It would be nice if the LLVM backend for ARM targets could be told to not trash LR when the noreturn attribute is used so that stack backtraces weren't broken in the debugger. What's more important, the little bit of speed/space savings from not spilling LR to the stack before modification or allowing a developer to actually be able to debug their code when such code is in the middle of their stack backtrace?

japaric · 2018-12-10T11:01:26Z

AFAICT the difference between noreturn and no-noreturn is not small. The former will not stack any registers until all registers (~10) are used up; the latter will stack registers as soon as the scratch registers (3-5) are used up. I think we should not disable this optimization (in LLVM or in cortex-m-rt), or make it opt-in (makes too big of a diff in perf to have people "forget to enable it").

It would be nice if the LLVM backend for ARM targets could be told to not trash LR when the noreturn
attribute is used

If LLVM has this feature then I think it should be opt-out (i.e. enabled by default in the compiler but can be disabled via a compiler flag) for the Cortex-M targets. I haven't seen any switch like that though; I have only seen one for the frame pointer (-fno-omit-frame-pointer).

adamgreen · 2018-12-10T23:37:37Z

I think we should not disable this optimization (in LLVM or in cortex-m-rt), or make it opt-in (makes too big of a diff in perf to have people "forget to enable it").

I wonder how big the performance hit really is as these divergent functions aren't going to be called in a hot loop since they never return to be called more than once.

To corrupt the registers so bad that a developer can't get a stack backtrace in the debugger is pretty bad behavior. It is made even worse by the fact that a lot of times these noreturn functions find themselves in the middle of a call stack that a developer would really be interested in looking at (HardFault, panics, etc).

korken89 · 2019-01-07T10:04:27Z

I will chime in and say that I have seen the same issue, which unfortunately makes postmortem debugging of embedded systems currently impossible and unteachable.

korken89 · 2019-01-08T20:15:37Z

Update from the WG meeting 2019-01-08

The main issue with this is that the lr (link register) is not pushed to the stack, and used as a scratch register for divergent functions
There is a question if this is an intentional LLVM optimization
We should collect if panic debugging worked, and when it stopped working
When we have data, we will make an upstream issue to get more input on how to solve it

eddyp · 2019-01-10T21:20:38Z

I was wondering that in the case of a secure OS which has the ability to load and execute untrusted payloads it would make sense not to corrupt these so the exception handlers can decode the data automatically, log it and kick out or restart the offending payload (depending on the OS policy a retry count could be added).

Also note that not trashing LR is essential since it might be possible that a hard fault occurs in the SVC handler because of an offending payload, so the OS should be able to go down the stack and identify the caller of the SVC.

This would make for a very good use case, supplemental to the debugging support.

Granted, probably for such a use case there is a need for stable inline asm (sorry if this is not the right name, I am unfamiliar with the feature names), and maybe even some support for custom linker symbols so the decoding can happen at runtime, automatically.

As for the resolution, at Rust code level, I agree, a per-function opt-in attribute for divergent functions would be a very elegant solution. An alternative might be that all exception handlers are, by default, not trashing LR, even if divergent, but that would probably mean some "special knowledge" in the compiler which will probably not be accepted in llvm.

adamgreig · 2019-01-15T00:03:48Z

Noting that this is presumably the same problem as #158

korken89 · 2019-01-15T08:49:30Z

Yeah, with the HardfaultHandler is a divergent function it will not protect the link register, and any function call (or other use of LR) within the handler will break backtrace..

jamesmunns · 2019-01-25T15:39:17Z

Quick note if you are using a tool like gdbgui which automatically tries to query the stack frame on a crash (and ends up stuck recursively trying to query the corrupted stack frame), gdb does support limiting the max backtrace depth, which will keep the debugger from infinitely querying. Something like the following worked for me:

set backtrace limit 32

Edit: This can be set at command line, or in a batch file like .gdbinit or -x debug.gdb.

eddyp · 2019-01-25T19:19:43Z

set backtrace limit 32

Edit: This can be set at command line, or in a batch file like .gdbinit or -x debug.gdb

Would it make sense to have such a setting enabled by default in any bare metal project?

ssendev · 2019-07-22T13:19:10Z

Isn't it possible to do some linker tricks to avoid the noreturn?

fn panic(info: &PanicInfo) -> ! {
  if trick_the_optimizer() {
    unsafe { asm!{"ret"} }
  }
  loop {}
}

And then replace trick_the_optimizer during linking and maybe disable LTO if it is too good. Or maybe it is enough to mark it as extern "C" to inhibit optimizations.

tpwrules · 2019-08-05T18:02:23Z

Isn't it possible to do some linker tricks to avoid the noreturn?

I tried various methods, including the one you suggested. Unfortunately, this does not seem to help because there are a couple other noreturn functions between fn panic(info: &PanicInfo) -> ! and the panic site, so LR ends up corrupted anyway.

I have found that one can break on core::panicking::panic instead of rust_begin_unwind and still have a consistent stack. You don't get the panic message because it hasn't been constructed yet, but you do get a complete and non-corrupted backtrace, which is much more valuable in my experience. Perhaps the .gdbinit should be changed so that this is the default.

jonas-schievink · 2019-11-22T22:11:12Z

Is there an upstream issue for this?

jamesmunns · 2019-11-22T22:21:45Z

There have been discussions about this with one of the teams (at previous all hands), not sure if there is an issue

qwerty19106 · 2020-02-17T10:18:19Z

There is a case in which break on core::panicking::panic does not help.
My code controls the PWM output for a power inverter. I run the code with only one breakpoint, as in case of an accidental stop, the device may be destructed.

The rust_begin_unwind contains code to correct stoping of device. After this code I set breakpoint and want to watch registers and memory. But now stack already corrupted!

Removing noreturn from a -> ! function seems to be the right solution for thumbv7m-none-eabi (also thumbv6m-none-eabi and et.al.)

jonas-schievink · 2020-02-17T10:31:12Z

Submitted an upstream bug report: rust-lang/rust#69231

jonas-schievink · 2020-02-17T12:54:33Z

Has nobody tried building with -Cforce-frame-pointers=true yet? Is there a minimal example showing the issue?

MabezDev · 2020-02-17T13:29:05Z

Has nobody tried building with -Cforce-frame-pointers=true yet? Is there a minimal example showing the issue?

FYI it's -Cforce-frame-pointers=yes/no, but unfortunately it doesn't seem to make a difference to the backtrace.

I wasn't aware of a minimal example, so I put one together based on the quickstart that you can use. https://github.com/MabezDev/cortex-m-quickstart

jonas-schievink · 2020-02-17T13:30:06Z

Great, thanks!

MabezDev · 2020-02-17T20:07:40Z

Reposting here for anyone not following the upstream issue: rust-lang/rust#69231 (comment)

Forcing framepointers on fixes the backtrace issue, unfortunately the precompiled core doesn't force framepointers hence you will have to build your own with xargo or -Z build-std at the moment.

Don't eliminate frame pointers on thumb targets This should hopefully fix issues rust-lang#69231 and rust-embedded/cortex-m-rt#139. ~~I couldn't test this locally as the rustc I produced does not create binaries (no idea why).~~ Resolved.

jonas-schievink · 2020-02-20T12:44:45Z

This should be fixed on the current nightly

jonas-schievink · 2020-02-26T20:07:57Z

Closing as fixed!

Note that according to rust-lang/rust#69231 (comment) there might be a cheaper way of achieving this, so if someone wants to go after that, feel free! (I won't be spending more time on this as we don't have any easy to use tooling for assessing changes like this)

therealprof mentioned this issue Oct 26, 2018

Signature of bkpt() function rust-embedded/cortex-m#119

Closed

perlindgren mentioned this issue Dec 9, 2018

Backtrace for HardFault #158

Closed

korken89 mentioned this issue Jan 7, 2019

Embedded WG meeting, January 8 rust-embedded/wg#284

Closed

japaric mentioned this issue Jan 8, 2019

Embedded WG meeting, January 15 rust-embedded/wg#291

Closed

japaric mentioned this issue Jan 17, 2019

Embedded WG meeting, January 22 rust-embedded/wg#296

Closed

japaric mentioned this issue Jan 29, 2019

Embedded WG meeting, January 29 rust-embedded/wg#300

Closed

thejpster mentioned this issue Jan 29, 2019

Update .gdbinit to prevent infinite recursion in GDB rust-embedded/cortex-m-quickstart#65

Closed

jonas-schievink mentioned this issue Jan 27, 2020

All Hands 2020: Draft Agenda Items rust-embedded/wg#381

Closed

jonas-schievink added the upstream label Feb 16, 2020

jonas-schievink mentioned this issue Feb 17, 2020

noreturn on diverging functions makes LLVM trash the link register on thumb targets rust-lang/rust#69231

Closed

jonas-schievink mentioned this issue Feb 17, 2020

Don't eliminate frame pointers on thumb targets rust-lang/rust#69248

Merged

jonas-schievink closed this as completed Feb 26, 2020

datdenkikniet mentioned this issue Feb 13, 2023

cortex-m-rt: ensure the stack is 8-byte aligned. rust-embedded/cortex-m#463

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Infinite backtrace, odd backtrace behavior and "corrupted stack?" in GDB #139

Infinite backtrace, odd backtrace behavior and "corrupted stack?" in GDB #139

japaric commented Oct 26, 2018

adamgreen commented Oct 26, 2018

japaric commented Oct 26, 2018

thejpster commented Oct 27, 2018

adamgreen commented Dec 10, 2018

japaric commented Dec 10, 2018

adamgreen commented Dec 10, 2018

korken89 commented Jan 7, 2019

korken89 commented Jan 8, 2019

eddyp commented Jan 10, 2019 •

edited

Loading

adamgreig commented Jan 15, 2019

korken89 commented Jan 15, 2019

jamesmunns commented Jan 25, 2019 •

edited

Loading

eddyp commented Jan 25, 2019

ssendev commented Jul 22, 2019 •

edited

Loading

tpwrules commented Aug 5, 2019

jonas-schievink commented Nov 22, 2019

jamesmunns commented Nov 22, 2019 •

edited

Loading

qwerty19106 commented Feb 17, 2020

jonas-schievink commented Feb 17, 2020

jonas-schievink commented Feb 17, 2020

MabezDev commented Feb 17, 2020

jonas-schievink commented Feb 17, 2020

MabezDev commented Feb 17, 2020

jonas-schievink commented Feb 20, 2020

jonas-schievink commented Feb 26, 2020

Infinite backtrace, odd backtrace behavior and "corrupted stack?" in GDB #139

Infinite backtrace, odd backtrace behavior and "corrupted stack?" in GDB #139

Comments

japaric commented Oct 26, 2018

STR

Metadata

adamgreen commented Oct 26, 2018

japaric commented Oct 26, 2018

thejpster commented Oct 27, 2018

adamgreen commented Dec 10, 2018

japaric commented Dec 10, 2018

adamgreen commented Dec 10, 2018

korken89 commented Jan 7, 2019

korken89 commented Jan 8, 2019

Update from the WG meeting 2019-01-08

eddyp commented Jan 10, 2019 • edited Loading

adamgreig commented Jan 15, 2019

korken89 commented Jan 15, 2019

jamesmunns commented Jan 25, 2019 • edited Loading

eddyp commented Jan 25, 2019

ssendev commented Jul 22, 2019 • edited Loading

tpwrules commented Aug 5, 2019

jonas-schievink commented Nov 22, 2019

jamesmunns commented Nov 22, 2019 • edited Loading

qwerty19106 commented Feb 17, 2020

jonas-schievink commented Feb 17, 2020

jonas-schievink commented Feb 17, 2020

MabezDev commented Feb 17, 2020

jonas-schievink commented Feb 17, 2020

MabezDev commented Feb 17, 2020

jonas-schievink commented Feb 20, 2020

jonas-schievink commented Feb 26, 2020

eddyp commented Jan 10, 2019 •

edited

Loading

jamesmunns commented Jan 25, 2019 •

edited

Loading

ssendev commented Jul 22, 2019 •

edited

Loading

jamesmunns commented Nov 22, 2019 •

edited

Loading