Backtracing after stack overflow does not work on macOS #356

losfair · 2020-07-01T17:15:19Z

I'm trying to get a backtrace from a SIGSEGV caused by a stack overflow (hitting the guard page). This does not seem to work on macOS.

My reproduction case:

use backtrace::Backtrace;
use std::{mem, ptr};

#[inline(never)]
fn f(x: i32) -> i32 {
    if x == 0 || x == 1 {
        1
    } else {
        f(x - 1) + f(x - 2)
    }
}

fn main() {
    unsafe {
        let mut handler: libc::sigaction = mem::zeroed();
        handler.sa_flags = libc::SA_ONSTACK;
        handler.sa_sigaction = trap_handler as usize;
        libc::sigemptyset(&mut handler.sa_mask);
        assert_eq!(libc::sigaction(libc::SIGSEGV, &handler, ptr::null_mut()), 0);

        // Backtracing from a normal SIGSEGV works
        //println!("Before invalid write");
        //ptr::write_volatile(0 as *mut u32, 0);
        //println!("After invalid write");

        // Backtracing from a stack overflow crashes
        println!("Before stack overflow");
        println!("{}", f(0xfffffff));
        println!("After stack overflow");
    }
}

unsafe extern "C" fn trap_handler(
    _: libc::c_int
) {
    println!("Backtrace begin");
    let backtrace = Backtrace::new_unresolved();
    println!("Backtrace result: {:?}", backtrace);
}

Output:

% ./target/release/backtrace-stackoverflow-bug
Before stack overflow
Backtrace begin
zsh: segmentation fault  ./target/release/backtrace-stackoverflow-bug

Rust version:

rustc 1.46.0-nightly (16957bd4d 2020-06-30)
binary: rustc
commit-hash: 16957bd4d3a5377263f76ed74c572aad8e4b7e59
commit-date: 2020-06-30
host: x86_64-apple-darwin
release: 1.46.0-nightly
LLVM version: 10.0


alexcrichton · 2020-07-01T19:02:59Z

Thanks for the report! Can you perhaps get a stack trace in a debugger for this?

One common issue I've seen is that the sigaltstack is too small, so it may be a "double" stack overflow where the trap_handler overflows the sigaltstack, causing a second segfault.

losfair · 2020-07-02T09:31:20Z

I tried to allocate a 1MB sigaltstack, but the error persists:

        let mut stack_space = vec![0u8; 1048576];
        let new_stack = libc::stack_t {
            ss_sp: stack_space.as_mut_ptr() as *mut _,
            ss_flags: 0,
            ss_size: 1048576,
        };

        assert_eq!(libc::sigaltstack(&new_stack, ptr::null_mut()), 0);

I wasn't able to get a stack trace because the debugger can't resume execution from the signal handler after an EXC_BAD_ACCESS exception, due to a Darwin kernel bug.

alexcrichton · 2020-07-02T16:34:48Z

Ah, sorry, but without the ability to reproduce or debug this I'm not really sure what's going on here, so I can't really help a whole lot :(


workingjubilee · 2023-06-28T22:21:00Z

Excerpting relevant comments from the PR that adds a test to demonstrate this:

This library is not async signal safe, but it is safe for synchronous signals. In this case generating a backtrace from a segfault handler is intended to work.

—alexcrichton

Whether a signal is generated in a synchronous or asynchronous manner doesn't change the fact that the signal handler can only use async-signal-safe functions.

Take, for example, one reason why this crate isn't safe to use from a signal handler: the use of memory allocation routines. If a signal is generated during a call to malloc, which holds an internal lock, and the signal handler then allocates memory and needs to acquire the same lock, a deadlock will occur.

—tmiasko
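The deadlock described above can be modeled without real signals. The following is a minimal sketch, not the allocator's actual code: `ALLOC_LOCK`, `handler_would_deadlock`, and `signal_during_malloc` are hypothetical names, and `try_lock` is used so that the sketch reports the would-be deadlock instead of hanging.

```rust
use std::sync::{Mutex, TryLockError};

// Hypothetical stand-in for an allocator's internal, non-reentrant lock.
static ALLOC_LOCK: Mutex<()> = Mutex::new(());

/// Models a signal handler that needs the allocator lock.
/// Returns true if a blocking `lock()` here would never return.
fn handler_would_deadlock() -> bool {
    // A real handler would block in lock(); try_lock() lets the sketch
    // observe the situation and report it instead.
    matches!(ALLOC_LOCK.try_lock(), Err(TryLockError::WouldBlock))
}

/// Models the interrupted code path: "malloc" holds its lock when the
/// "signal" arrives, and the handler runs on the same thread.
fn signal_during_malloc() -> bool {
    let _guard = ALLOC_LOCK.lock().unwrap();
    handler_would_deadlock()
}

fn main() {
    println!("signal outside malloc deadlocks: {}", handler_would_deadlock());
    println!("signal during malloc deadlocks:  {}", signal_during_malloc());
}
```

The handler is only safe when the interrupted code happened not to hold the lock, which a handler cannot know in general. That is why the restriction to async-signal-safe functions applies regardless of how the signal was raised.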

The segfault here is in the libunwind unwinder itself, and after researching a bit as to what's going on, it looks like the segfault is happening 16 bytes below the end of the stack. I believe the sequence of events can be reconstructed as:

Using libunwind we can get a handful of frames.

The segfault happens when we unwind the first frame of f.

The frame of f faulted in the middle of the function prologue.

The unwind information for f is stored in a "compact format".

The compact format does not have a way to describe how to unwind in the middle of the prologue; instead it only defines how to unwind "during" the function.

In interpreting the compact unwind information, libunwind will hit a segfault again, trying to access memory the function itself faulted trying to push.

The issue here is that a stack overflow exception can happen anywhere in the prologue of a function, but unwind tables are generally not intended to describe arbitrary points in the prologue (there's the notion of "async unwind tables" on some systems for this). This means that the unwinder can't reliably unwind frames that are interrupted in the prologue.

Oh, what I mean is that to generate a backtrace from a function that segfaulted in its prologue, libunwind needs to know how to unwind from every single instruction in the function, not just the "body" after the prologue. AFAIK that's only supported with async unwind tables (and maybe full DWARF unwind tables?), and I'm not sure how to get LLVM to generate non-compact or async unwind tables.

—alexcrichton
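The difference between a body-only unwind rule and a prologue-aware one can be illustrated with a toy model. This is a sketch under stated assumptions, not libunwind's algorithm: the frame layout (a 0x20-byte frame with one callee-saved register spilled just below the return address), the guard page boundary, and all names are hypothetical. It also assumes the prologue had already lowered the stack pointer when the register store faulted.

```rust
/// Toy stack that grows downward; addresses below `guard_top` are unmapped.
struct ToyStack {
    guard_top: u64,
}

#[derive(Debug, PartialEq)]
enum Unwind {
    /// Unwinding succeeded; the payload is the caller's stack pointer.
    CallerSp(u64),
    /// The unwinder itself read an unmapped address.
    Fault(u64),
}

/// Unwinds one frame of the hypothetical function.
/// `saved_reg_written` is what a per-instruction table would know;
/// a body-only table always behaves as if it were `true`.
fn unwind_one_frame(stack: &ToyStack, sp: u64, saved_reg_written: bool) -> Unwind {
    const FRAME_SIZE: u64 = 0x20;
    let cfa = sp + FRAME_SIZE + 8; // caller's stack pointer before the call
    let return_addr_slot = cfa - 8;
    let saved_reg_slot = cfa - 16;

    let mut slots = vec![return_addr_slot];
    if saved_reg_written {
        slots.push(saved_reg_slot);
    }
    for slot in slots {
        if slot < stack.guard_top {
            return Unwind::Fault(slot);
        }
    }
    Unwind::CallerSp(cfa)
}

fn main() {
    let stack = ToyStack { guard_top: 0x1000 };
    // The prologue lowered sp, then faulted storing the register at 0xff8.
    let sp_at_fault = 0x1000 - 0x20;
    println!("body-only rule:      {:?}", unwind_one_frame(&stack, sp_at_fault, true));
    println!("prologue-aware rule: {:?}", unwind_one_frame(&stack, sp_at_fault, false));
}
```

In this model the body-only rule reads the very slot the prologue failed to write and faults on the guard page, while the prologue-aware rule skips that slot and recovers the caller's stack pointer.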

I do not see a reason to close this issue, but to be frank, it is the sort of enhancement request that is likely to be open for a long, long time.

bjorn3 · 2023-06-28T22:51:23Z

Ignoring Apple's compact unwind info, I did expect backtraces to work in the prologue even without asynchronous unwinding support. Asynchronous unwinding is only necessary when popping stack frames and running cleanup code for faults at arbitrary instructions. I very much expect backtrace generation to unconditionally work at arbitrary locations. Sampling profilers depend on this.

bjorn3 · 2023-06-28T22:54:39Z

Also, I believe LLVM is going to stop emitting compact unwind info for Rust code, or any other code not using the C, C++ or Obj-C personality functions, as there is a limit of 3 personality functions in the compact unwind info format and these personality functions take up all the room when used in the same executable/dylib.
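The limit of 3 follows from the width of the personality field in a compact unwind encoding. A minimal sketch, assuming the `UNWIND_PERSONALITY_MASK` value from LLVM's compact_unwind_encoding.h; the shift constant and the helper functions are hypothetical names introduced for illustration:

```rust
/// Bits of a 32-bit compact unwind encoding that select a personality
/// function (UNWIND_PERSONALITY_MASK in compact_unwind_encoding.h).
const UNWIND_PERSONALITY_MASK: u32 = 0x3000_0000;
/// Hypothetical helper constant: position of the lowest bit of the mask.
const PERSONALITY_SHIFT: u32 = 28;

/// Returns the 1-based personality index, or None when the field is zero,
/// which means the function has no personality.
fn personality_index(encoding: u32) -> Option<u32> {
    match (encoding & UNWIND_PERSONALITY_MASK) >> PERSONALITY_SHIFT {
        0 => None,
        n => Some(n),
    }
}

/// Largest index the two-bit field can hold; zero is reserved for "none".
fn max_personalities() -> u32 {
    UNWIND_PERSONALITY_MASK >> PERSONALITY_SHIFT
}

fn main() {
    println!("max personalities per image: {}", max_personalities());
    println!("index in 0x2000_0000: {:?}", personality_index(0x2000_0000));
}
```

With only two bits and zero meaning "no personality", an image that already uses the C, C++ and Obj-C personalities has no index left for a fourth one.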

workingjubilee · 2023-06-28T22:56:39Z

I do think that we should try to improve the situation, FWIW, and I am aware incremental improvements may be sufficient for many use-cases. It just seems like fixing all this is a nontrivial haul.

workingjubilee · 2023-06-28T23:02:05Z

"Also I believe LLVM is going to stop emitting compact unwind info for rust code or any other code not using the C, C++ or Obj-C personality functions as there is a limit of 3 personality functions in the compact unwind info format and these personality functions take up all room when used in the same executable/dylib."

Can you confirm this and, if so, open a new issue for that?

bjorn3 · 2023-06-28T23:34:54Z

If I understand correctly, it got merged, then reverted because of a build error, and a revert of the revert has been posted but not yet merged: rust-lang/rust#102754 (comment)

losfair mentioned this issue Jul 6, 2020

Add test for backtracing after stack overflow. #357

Closed

nlewycky pushed a commit to wasmerio/wasmer that referenced this issue Aug 13, 2020

Skip stack guard page tests on darwin

03ac81c

More info: rust-lang/backtrace-rs#356 And related PR: rust-lang/backtrace-rs#357

workingjubilee added the OS-macos, enhancement, and help wanted labels Jun 22, 2023
