Backtracing after stack overflow does not work on macOS #356

Open
losfair opened this issue Jul 1, 2020 · 9 comments

@losfair

losfair commented Jul 1, 2020

I'm trying to get a backtrace from a SIGSEGV caused by a stack overflow (hitting the guard page). This does not seem to work on macOS.

My reproduction case:

use backtrace::Backtrace;
use std::{mem, ptr};

#[inline(never)]
fn f(x: i32) -> i32 {
    if x == 0 || x == 1 {
        1
    } else {
        f(x - 1) + f(x - 2)
    }
}

fn main() {
    unsafe {
        let mut handler: libc::sigaction = mem::zeroed();
        handler.sa_flags = libc::SA_ONSTACK;
        handler.sa_sigaction = trap_handler as usize;
        libc::sigemptyset(&mut handler.sa_mask);
        assert_eq!(libc::sigaction(libc::SIGSEGV, &handler, ptr::null_mut()), 0);

        // Backtracing from a normal SIGSEGV works
        //println!("Before invalid write");
        //ptr::write_volatile(0 as *mut u32, 0);
        //println!("After invalid write");

        // Backtracing from a stack overflow crashes
        println!("Before stack overflow");
        println!("{}", f(0xfffffff));
        println!("After stack overflow");
    }
}

unsafe extern "C" fn trap_handler(
    _: libc::c_int
) {
    println!("Backtrace begin");
    let backtrace = Backtrace::new_unresolved();
    println!("Backtrace result: {:?}", backtrace);
}

Output:

% ./target/release/backtrace-stackoverflow-bug
Before stack overflow
Backtrace begin
zsh: segmentation fault  ./target/release/backtrace-stackoverflow-bug

Rust version:

rustc 1.46.0-nightly (16957bd4d 2020-06-30)
binary: rustc
commit-hash: 16957bd4d3a5377263f76ed74c572aad8e4b7e59
commit-date: 2020-06-30
host: x86_64-apple-darwin
release: 1.46.0-nightly
LLVM version: 10.0
@alexcrichton
Member

Thanks for the report! Can you perhaps get a stack trace in a debugger for this?

One common issue I've seen is that the sigaltstack is too small, so it may be a "double" stack overflow where the trap_handler overflows the sigaltstack, causing a second segfault.

@losfair
Author

losfair commented Jul 2, 2020

I tried to allocate a 1MB sigaltstack, but the error persists:

        let mut stack_space = vec![0u8; 1048576];
        let new_stack = libc::stack_t {
            ss_sp: stack_space.as_mut_ptr() as *mut _,
            ss_flags: 0,
            ss_size: 1048576,
        };

        assert_eq!(libc::sigaltstack(&new_stack, ptr::null_mut()), 0);

I wasn't able to get a stack trace because the debugger can't resume execution from the signal handler after an EXC_BAD_ACCESS exception, due to a Darwin kernel bug.

@alexcrichton
Member

Ah, sorry, but without the ability to reproduce or debug this I'm not really sure what's going on here, so I can't help a whole lot :(

@workingjubilee
Member

Excerpting relevant comments from the PR that adds a test to demonstrate this:

This library is not async signal safe, but it is safe for synchronous signals. In this case generating a backtrace from a segfault handler is intended to work.

—alexcrichton

Whether a signal is generated synchronously or asynchronously doesn't change the fact that the signal handler can only use async-signal-safe functions.

Take, for example, one reason why this crate isn't safe to use from a signal handler: its use of memory allocation routines. If a signal is generated during a call to malloc, which holds an internal lock, and the signal handler then allocates memory and needs to acquire the same lock, a deadlock will occur.

—tmiasko
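
To make the allocation concern concrete, here is a minimal sketch of formatting an address without touching the heap, which is the kind of routine a signal handler can call safely. The name `format_hex` is hypothetical and is not part of the backtrace crate; the sketch only covers formatting and does not capture a backtrace.

```rust
/// Hypothetical helper (not from the backtrace crate): formats `value` as
/// `0x...` lowercase hex into a caller-provided stack buffer.
/// It performs no heap allocation and takes no locks.
/// Returns the number of bytes written to `out`.
fn format_hex(value: usize, out: &mut [u8; 18]) -> usize {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    out[0] = b'0';
    out[1] = b'x';

    // Count how many hex digits are needed (at least one, so 0 prints as "0x0").
    let mut digits: usize = 1;
    let mut rest = value >> 4;
    while rest != 0 {
        digits += 1;
        rest >>= 4;
    }

    // Emit the digits from most significant to least significant.
    for i in 0..digits {
        let shift = 4 * (digits - 1 - i);
        out[2 + i] = DIGITS[(value >> shift) & 0xf];
    }
    2 + digits
}
```

Inside a handler, the filled part of the buffer (`&buf[..n]`) could then be passed to `libc::write` on file descriptor 2, since POSIX lists `write` as async-signal-safe, whereas `println!` goes through the locked standard output handle.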

The segfault here is in the libunwind unwinder itself, and after researching a bit as to what's going on, it looks like the segfault is happening 16 bytes below the end of the stack. I believe the sequence of events can be reconstructed as:

  • Using libunwind we can get a handful of frames.
  • The segfault happens when we unwind the first frame of f.
  • The frame f faulted in the middle of the function prologue.
  • The unwind information for f is stored in a "compact format".
  • The compact format has no way to describe how to unwind in the middle of the prologue; it only defines how to unwind "during" the function.
  • While interpreting the compact unwind information, libunwind hits a segfault again, trying to access the memory that the function itself faulted on while trying to push.

The issue here is that a stack overflow exception can happen anywhere in the prologue of a function, but unwind tables are generally not designed to describe arbitrary points in the prologue (there's the notion of "async unwind tables" on some systems for this). This means that the unwinder can't reliably unwind frames that are interrupted in the prologue.

Oh, what I mean is that to generate a backtrace from a function that segfaulted in its prologue, libunwind needs to know how to unwind from every single instruction in the function, not just the "body" after the prologue. AFAIK that's only supported with async unwind tables (and maybe full DWARF unwind tables?), and I'm not sure how to get LLVM to generate non-compact or async unwind tables.

—alexcrichton
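
The address arithmetic behind "16 bytes below the end of the stack" can be sketched as follows. This is an illustration under stated assumptions, not what libunwind or this crate does: `is_guard_page_fault` and `slot_is_on_stack` are hypothetical names, and the stack bounds are assumed to come from elsewhere (on macOS, for example, `pthread_get_stackaddr_np` and `pthread_get_stacksize_np`), with the faulting address taken from `si_addr` when the handler is installed with `SA_SIGINFO`.

```rust
use std::mem::size_of;

/// Hypothetical helper: true when `fault_addr` lies in the guard region
/// directly below `stack_low`, the lowest usable stack address.
/// `guard_size` is an assumption supplied by the caller (often one page).
fn is_guard_page_fault(fault_addr: usize, stack_low: usize, guard_size: usize) -> bool {
    let guard_start = stack_low.saturating_sub(guard_size);
    fault_addr >= guard_start && fault_addr < stack_low
}

/// Hypothetical helper: true when a pointer-sized slot starting at `addr`
/// lies entirely inside the usable stack range `[stack_low, stack_high)`.
/// An unwinder that assumes a finished prologue may compute a slot address
/// below `stack_low`; a check like this rejects it instead of reading it.
fn slot_is_on_stack(addr: usize, stack_low: usize, stack_high: usize) -> bool {
    match addr.checked_add(size_of::<usize>()) {
        Some(end) => addr >= stack_low && end <= stack_high,
        None => false, // the slot would wrap around the address space
    }
}
```

Neither check makes the compact unwind information describe the prologue; they only show how a fault just below the stack limit can be told apart from an ordinary invalid access, and why reading the slot the prologue failed to push faults again.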

I do not see a reason to close this issue, but to be frank, it is the sort of enhancement request that is likely to stay open for a long, long time.

@bjorn3
Member

bjorn3 commented Jun 28, 2023

Ignoring Apple's compact unwind info, I did expect backtraces to work in the prologue even without asynchronous unwinding support. Asynchronous unwinding is only necessary when popping stack frames and running cleanup code for faults at arbitrary instructions. I very much expect backtrace generation to work unconditionally at arbitrary locations. Sampling profilers depend on this.

@bjorn3
Member

bjorn3 commented Jun 28, 2023

Also, I believe LLVM is going to stop emitting compact unwind info for Rust code, or any other code not using the C, C++ or Obj-C personality functions, as the compact unwind info format has a limit of 3 personality functions and these take up all the room when used in the same executable/dylib.

@workingjubilee
Member

I do think that we should try to improve the situation, FWIW, and I am aware incremental improvements may be sufficient for many use-cases. It just seems like fixing all this is a nontrivial haul.

@workingjubilee
Member

Also I believe LLVM is going to stop emitting compact unwind info for rust code or any other code not using the C, C++ or Obj-C personality functions as there is a limit of 3 personality functions in the compact unwind info format and these personality functions take up all room when used in the same executable/dylib.

Can you confirm this and if so, open a new issue for that?

@bjorn3
Member

bjorn3 commented Jun 28, 2023

If I understand correctly, it got merged and then reverted because of a build error, and a revert of the revert has been posted but not yet merged: rust-lang/rust#102754 (comment)
