Segfault during tests on rust 1.27.x on macOS (SIGSEGV: invalid memory reference) #52390

Closed
ejpcmac opened this issue Jul 14, 2018 · 19 comments
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@ejpcmac

ejpcmac commented Jul 14, 2018

Problem description

Running the tests for a program I’ve written leads to a segfault when compiled by Rust 1.27.0 or 1.27.1:

$ cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.09s
     Running target/debug/deps/diceware-c7f6b180dd3b52c6

running 3 tests
error: process didn't exit successfully: `/***/diceware/target/debug/deps/diceware-c7f6b180dd3b52c6` (signal: 11, SIGSEGV: invalid memory reference)

The tests work well on Rust 1.26.2, 1.28-beta.10 and today’s nightly. Only the 1.27.x releases seem impacted by this bug.

Meta

$ rustc --version --verbose
rustc 1.27.1 (5f2b325f6 2018-07-07)
binary: rustc
commit-hash: 5f2b325f64ed6caa7179f3e04913db437656ec7e
commit-date: 2018-07-07
host: x86_64-apple-darwin
release: 1.27.1
LLVM version: 6.0

Note that the debug and release binaries seem to work properly. Only the test runner seems to be impacted.

@Mark-Simulacrum
Member

I can't reproduce on macOS or Ubuntu; what version of macOS do you have? Can you get a backtrace on the segfault? Is the segfault persistent, or is it transient?

@Mark-Simulacrum Mark-Simulacrum added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. C-bug Category: This is a bug. labels Jul 14, 2018
@ejpcmac
Author

ejpcmac commented Jul 14, 2018

I am on OS X 10.11.6. Running RUST_BACKTRACE=1 cargo test I get nothing more than the trace I’ve put in my first message. The segfault is persistent, at least on 100% of the 10-20 runs I’ve done.

@Mark-Simulacrum
Member

You'll need to run with something like lldb or gdb to get the segfault or look for a core dump (I believe those are generated on macOS, though you may need to do something to get them).

I am testing on 10.13.5 so this is possibly related to #51828, though that claims to only be a problem on 10.10.

@kennytm
Member

kennytm commented Jul 14, 2018

Could you run target/debug/deps/diceware-c7f6b180dd3b52c6 in lldb and get the stack trace from there?

@Mark-Simulacrum
Member

I do get repeated assertion failures in makes_a_passphrase_with_special_char, but even those don't happen on every run.

@ejpcmac
Author

ejpcmac commented Jul 14, 2018

Running in lldb I get:

(lldb) run
Process 46736 launched: './diceware-c7f6b180dd3b52c6' (x86_64)

running 3 tests
Process 46736 stopped
* thread #2: tid = 0x959310, 0x00000001000cc9d2 diceware-c7f6b180dd3b52c6`std::panicking::panicking::h01f4a398b1d2259a + 66 at ptr.rs:237, name = 'tests::returns_an_error_if_number_of_words_is_zero', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001000cc9d2 diceware-c7f6b180dd3b52c6`std::panicking::panicking::h01f4a398b1d2259a + 66 at ptr.rs:237

Edit note: running multiple times gets to the very same failure.

@ejpcmac
Author

ejpcmac commented Jul 14, 2018

I do get repeated assertion failures in makes_a_passphrase_with_special_char, but even those don't happen on every run.

@Mark-Simulacrum Regarding this, I think it is due to the test itself lacking a check. If the special char is a digit and inserted in a digit, it can match a dictionary word and the test fails. I have not yet figured out how to write a better test to catch this. The segfault, on the other hand, seems to occur very early in the process, and in returns_an_error_if_number_of_words_is_zero.

@Mark-Simulacrum
Copy link
Member

Can you run where or maybe backtrace in lldb to get the full trace?

@kennytm
Member

kennytm commented Jul 15, 2018

ptr.rs:237 is inside swap_nonoverlapping_bytes(). I'm pretty sure it is the same TLS issue, though I don't know why it affects 10.11.

@ejpcmac
Author

ejpcmac commented Jul 15, 2018

@Mark-Simulacrum Sorry for the delay, it was already late here in Europe.

Here is a new run with its backtrace:

(lldb) run
Process 48199 launched: './diceware-c7f6b180dd3b52c6' (x86_64)

running 3 tests
Process 48199 stopped
* thread #2: tid = 0x95c6e0, 0x00000001000cc9d2 diceware-c7f6b180dd3b52c6`std::panicking::panicking::h01f4a398b1d2259a + 66 at ptr.rs:237, name = 'tests::returns_an_error_if_number_of_words_is_zero', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001000cc9d2 diceware-c7f6b180dd3b52c6`std::panicking::panicking::h01f4a398b1d2259a + 66 at ptr.rs:237
(lldb) bt
* thread #2: tid = 0x95c6e0, 0x00000001000cc9d2 diceware-c7f6b180dd3b52c6`std::panicking::panicking::h01f4a398b1d2259a + 66 at ptr.rs:237, name = 'tests::returns_an_error_if_number_of_words_is_zero', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x00000001000cc9d2 diceware-c7f6b180dd3b52c6`std::panicking::panicking::h01f4a398b1d2259a + 66 at ptr.rs:237
    frame #1: 0x000000010003dec3 diceware-c7f6b180dd3b52c6`std::sys_common::backtrace::__rust_begin_short_backtrace::hdb024ac408a215ad (.llvm.10125672080075898413) + 563 at mod.rs:650
    frame #2: 0x0000000100054f78 diceware-c7f6b180dd3b52c6`std::panicking::try::do_call::hf7d4215061b5619b (.llvm.18293538534938544574) + 40 at mod.rs:409
    frame #3: 0x00000001000d865f diceware-c7f6b180dd3b52c6`__rust_maybe_catch_panic + 31 at lib.rs:105
    frame #4: 0x000000010003f7b5 diceware-c7f6b180dd3b52c6`_$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$::call_box::hbf42c8a8f5d699cc + 165 at panicking.rs:289
    frame #5: 0x00000001000c9018 diceware-c7f6b180dd3b52c6`std::sys_common::thread::start_thread::hf39c8bd91f08cd93 + 136 at boxed.rs:648
    frame #6: 0x00000001000b95d9 diceware-c7f6b180dd3b52c6`std::sys::unix::thread::Thread::new::thread_start::h27c7af0fa85baf64 + 9 at thread.rs:90
    frame #7: 0x00007fff93c5299d libsystem_pthread.dylib`_pthread_body + 131
    frame #8: 0x00007fff93c5291a libsystem_pthread.dylib`_pthread_start + 168
    frame #9: 0x00007fff93c50351 libsystem_pthread.dylib`thread_start + 13

@pnkfelix
Member

tagging as T-compiler under the assumption that this is in our wheelhouse, at least until we have seen evidence to the contrary

@pnkfelix pnkfelix added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority and removed I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. labels Jul 19, 2018
@ejpcmac
Author

ejpcmac commented Jul 19, 2018

If I can help in some way please let me know.

@kennytm
Member

kennytm commented Jul 19, 2018

@ejpcmac In the debugger, when it has crashed, could you execute disas and reg read to show the disassembly and register dump?

@ejpcmac
Author

ejpcmac commented Jul 19, 2018

@kennytm For sure. I’ll do it as soon as I’m home.

@ejpcmac
Author

ejpcmac commented Jul 19, 2018

@kennytm Here I am:

(lldb) disas
diceware-c7f6b180dd3b52c6`std::panicking::panicking::h01f4a398b1d2259a:
    0x1000cc990 <+0>:  pushq  %rbp
    0x1000cc991 <+1>:  movq   %rsp, %rbp
    0x1000cc994 <+4>:  subq   $0x10, %rsp
    0x1000cc998 <+8>:  leaq   0xebc09(%rip), %rdi       ; std::panicking::update_panic_count::PANIC_COUNT::__getit::__KEY::hdcfcbe1f636ae13f
    0x1000cc99f <+15>: callq  *(%rdi)
    0x1000cc9a1 <+17>: cmpq   $0x1, (%rax)
    0x1000cc9a5 <+21>: jne    0x1000cc9b6               ; <+38>
    0x1000cc9a7 <+23>: leaq   0xebbfa(%rip), %rdi       ; std::panicking::update_panic_count::PANIC_COUNT::__getit::__KEY::hdcfcbe1f636ae13f
    0x1000cc9ae <+30>: callq  *(%rdi)
    0x1000cc9b0 <+32>: movq   0x8(%rax), %rcx
    0x1000cc9b4 <+36>: jmp    0x1000cc9d7               ; <+71>
    0x1000cc9b6 <+38>: movl   $0x1, %eax
    0x1000cc9bb <+43>: movd   %rax, %xmm0
    0x1000cc9c0 <+48>: movdqa %xmm0, -0x10(%rbp)
    0x1000cc9c5 <+53>: leaq   0xebbdc(%rip), %rdi       ; std::panicking::update_panic_count::PANIC_COUNT::__getit::__KEY::hdcfcbe1f636ae13f
    0x1000cc9cc <+60>: callq  *(%rdi)
    0x1000cc9ce <+62>: movaps -0x10(%rbp), %xmm0
->  0x1000cc9d2 <+66>: movaps %xmm0, (%rax)
    0x1000cc9d5 <+69>: xorl   %ecx, %ecx
    0x1000cc9d7 <+71>: leaq   0xebbca(%rip), %rdi       ; std::panicking::update_panic_count::PANIC_COUNT::__getit::__KEY::hdcfcbe1f636ae13f
    0x1000cc9de <+78>: callq  *(%rdi)
    0x1000cc9e0 <+80>: movq   %rcx, 0x8(%rax)
    0x1000cc9e4 <+84>: testq  %rcx, %rcx
    0x1000cc9e7 <+87>: setne  %al
    0x1000cc9ea <+90>: addq   $0x10, %rsp
    0x1000cc9ee <+94>: popq   %rbp
    0x1000cc9ef <+95>: retq   

(lldb) reg read
General Purpose Registers:
       rax = 0x0000000100500358
       rbx = 0x000000010180d008
       rcx = 0x0000000000000000
       rdx = 0x0000000000000000
       rdi = 0x00000001001b85a8  diceware-c7f6b180dd3b52c6`std::panicking::update_panic_count::PANIC_COUNT::__getit::__KEY::hdcfcbe1f636ae13f
       rsi = 0x0000000000000103
       rbp = 0x00007000006068e0
       rsp = 0x00007000006068d0
        r8 = 0x0000000101217358
        r9 = 0x0000000000a45e09
       r10 = 0x0000000101217360
       r11 = 0xffffffff00000000
       r12 = 0x0000000000000001
       r13 = 0x0000000000001003
       r14 = 0x0000000000000000
       r15 = 0x0000000101217380
       rip = 0x00000001000cc9d2  diceware-c7f6b180dd3b52c6`std::panicking::panicking::h01f4a398b1d2259a + 66
    rflags = 0x0000000000010202
        cs = 0x000000000000002b
        fs = 0x0000000000000000
        gs = 0x0000000000000000

@kennytm
Member

kennytm commented Jul 19, 2018

Thanks @ejpcmac! As the debugger error shows, the segfault is indeed caused by unaligned TLS access (movaps %xmm0, (%rax) but %rax is not 16-byte-aligned).
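To make the alignment claim concrete, here is a minimal sketch that checks the %rax value from the register dump above against the 16-byte boundary that movaps requires for a memory operand. The helper name misalignment is purely illustrative and does not come from the Rust standard library.

```rust
/// Returns how many bytes `addr` lies past the previous `align`-byte boundary.
/// `align` must be a power of two. (Illustrative helper, not a std API.)
fn misalignment(addr: u64, align: u64) -> u64 {
    assert!(align.is_power_of_two());
    addr & (align - 1)
}

fn main() {
    // Value of %rax taken from the `reg read` output in this thread.
    let rax: u64 = 0x0000_0001_0050_0358;

    // movaps faults unless its memory operand is 16-byte aligned.
    println!("rax % 16 = {}", misalignment(rax, 16)); // prints "rax % 16 = 8"
    println!("rax % 8  = {}", misalignment(rax, 8)); // prints "rax % 8  = 0"
}
```

The address is 8-byte aligned but not 16-byte aligned, which matches the EXC_I386_GPFLT stop reason raised on the movaps store.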

As mentioned in the issue, there's no bug in 1.28-beta. We may work around this in 1.27 by forcing a link_section in the thread_local! macro (not tested)...

diff --git a/src/libstd/thread/local.rs b/src/libstd/thread/local.rs
index a170abb262..79293acc2c 100644
--- a/src/libstd/thread/local.rs
+++ b/src/libstd/thread/local.rs
@@ -177,6 +177,7 @@ macro_rules! __thread_local_inner {
                     $crate::thread::__StaticLocalKeyInner::new();
 
                 #[thread_local]
+                #[cfg_attr(target_os = "macos", link_section = "__DATA,__thread_data")]
                 #[cfg(all(target_thread_local, not(target_arch = "wasm32")))]
                 static __KEY: $crate::thread::__FastLocalKeyInner<$t> =
                     $crate::thread::__FastLocalKeyInner::new();

... but this won't affect other #[thread_local] variables, if any exist (and it also bloats the executable size, since the key can't be placed in the BSS), and I'm not sure this is worth a 1.27.3.
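For anyone who wants to check whether a given toolchain and OS combination hands out misaligned thread-local slots, a small probe along the following lines might help. This is only a sketch: the Aligned16 type, the PROBE key and the probe_misalignment function are hypothetical names made up for this example, and on an affected build the probe may itself crash instead of reporting a non-zero offset.

```rust
use std::cell::Cell;
use std::mem;
use std::thread;

/// Hypothetical payload that demands 16-byte alignment, used only by this probe.
#[repr(align(16))]
struct Aligned16(Cell<u64>);

thread_local! {
    // Hypothetical thread-local key; each thread gets its own slot.
    static PROBE: Aligned16 = Aligned16(Cell::new(0));
}

/// Offset of the current thread's PROBE slot from the alignment its type
/// requires. 0 means the slot is correctly aligned.
fn probe_misalignment() -> usize {
    PROBE.with(|p| {
        p.0.set(1); // write to the slot so it is really initialized and used
        (p as *const Aligned16 as usize) % mem::align_of::<Aligned16>()
    })
}

fn main() {
    // The crash in this issue happened on a test thread, so probe a spawned
    // thread as well as the main one.
    let main_offset = probe_misalignment();
    let child_offset = thread::spawn(probe_misalignment)
        .join()
        .expect("probe thread panicked");

    println!("main thread offset:    {}", main_offset);
    println!("spawned thread offset: {}", child_offset);
}
```

A non-zero offset on either thread would point at the same class of problem as the unaligned movaps store shown earlier.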

@Mark-Simulacrum
Member

I don't think it's worth 1.27.3. We already decided that we're not going to backport the original TLS patch -- and 1.28 will be out in two weeks anyway.

@pnkfelix
Member

visiting for triage. I concur with @Mark-Simulacrum's assessment. The only question is whether to close this bug today, or close it after 1.28 is released.

@nikomatsakis
Contributor

Visiting in compiler team meeting. Inclination is to close now in favor of using 1.28 (released on Aug 2)

5 participants