
[MacOS] nightly causes runtime segfault when crate binary is compiled with --release #50586

Closed
jjpe opened this issue May 9, 2018 · 20 comments
Labels: A-thread-locals (Area: thread local storage (TLS)), C-bug (Category: this is a bug), O-macos (Operating system: macOS), P-high (High priority), regression-from-stable-to-stable (Performance or correctness regression from one stable version to another)

Comments

@jjpe

jjpe commented May 9, 2018

On OS X 10.10, when I compile a crate I wrote for work and run it with cargo run --release, the resulting binary segfaults when I try certain actions. Oddly enough, debug builds do not seem to exhibit the segfault at all.

This bug is tricky to describe, as I have no clue what causes the generated binary to segfault.
Just to rule out that factor: the crate I'm building has no unsafe code.
However, I do know that with compilers up to and including rustc 1.26.0-nightly (392645394 2018-03-15), the generated crate binary runs fine.

Starting with rustc 1.26.0-nightly (55c984ee5 2018-03-16), something changed that causes this segfault on OS X.

For context and some additional information, please see the discussion on users.rust-lang.org.

@kennytm kennytm added O-macos Operating system: macOS regression-from-stable-to-stable Performance or correctness regression from one stable version to another. C-bug Category: This is a bug. labels May 9, 2018
@kennytm
Member

kennytm commented May 9, 2018

PRs in the range 3926453...55c984e:

* 55c984ee5d Auto merge of #48097 - oli-obk:did_you_mean_to_run_rls_with_clippy, r=nrc
* 3b6412b943 Auto merge of #48896 - alexcrichton:bitcode-in-object, r=michaelwoerister
* cc34ca1c97 Auto merge of #48818 - michaelwoerister:issue-47309, r=eddyb
* 5f3996c3ec Auto merge of #48813 - sinkuu:build_in_assert_macro, r=alexcrichton
* a7170b0412 Auto merge of #48524 - abonander:check-macro-stability, r=petrochenkov
* 36b6687318 Auto merge of #49051 - kennytm:rollup, r=kennytm
* db2f0ae1f4 Rollup merge of #48972 - mark-i-m:rustdoc_readme, r=QuietMisdreavus
* fb49ae0260 Rollup merge of #49049 - Centril:fix/flatten-fusediterator, r=bluss
* a199fb2319 Rollup merge of #49037 - estebank:coherence-tweaks, r=nikomatsakis
* 64490ff883 Rollup merge of #48990 - ExpHP:dont-drop-the-bomb, r=estebank
* 6fbd03393b Rollup merge of #48853 - Songbird0:addrparseerror_documentation_improvement, r=GuillaumeGomez
* 90230ce372 Rollup merge of #49050 - snf:fix_oom_asmjs, r=alexcrichton
* 55a0075a20 Rollup merge of #49042 - kennytm:fix-e0307-typo, r=rkruppe
* 77cf02de18 Rollup merge of #49024 - draganmladjenovic:mips64_cabi_sret, r=sanxiyn
* 5a7aa6cae5 Rollup merge of #49007 - estebank:follow-up-47574, r=oli-obk
* bf270bb66a Rollup merge of #48965 - alexcrichton:add-sha-feature, r=eddyb
* 97b489ef39 Rollup merge of #48961 - Songbird0:rustdoc_doctests_assertions, r=GuillaumeGomez
* 292c6ca041 Rollup merge of #48959 - alexcrichton:signext, r=eddyb
* 35b008cb92 Rollup merge of #48957 - Eijebong:rand, r=michaelwoerister
* e1d19df9a5 Rollup merge of #48922 - petrochenkov:asunder, r=nikomatsakis
* 68a602efa9 Rollup merge of #48892 - alexcrichton:thinlto-again, r=Mark-Simulacrum
* da88827267 Rollup merge of #48875 - jcowgill:mips-test-fixes, r=sanxiyn
* a45b79ddb7 Rollup merge of #48706 - ehuss:main-not-found-in-crate, r=estebank

@jjpe
Author

jjpe commented May 9, 2018

@kennytm Does the regression-from-stable-to-stable label apply to regressions on nightly as well?

@kennytm
Member

kennytm commented May 9, 2018

@jjpe It is "stable-to-stable" since the regression happened in 1.26, the upcoming stable version.

@nikomatsakis
Contributor

Hmm, I have a "nagging feeling" this may fall under the heading of "LLVM codegen regression" (#50422). Without a way to reproduce it, this is going to be hard to track down. Here are some ideas you might try, though:

  • See if enabling or disabling lto in your "release" configuration makes any difference, perhaps?
    • I forget how to do that :P but can figure it out if needed; add something like lto = false to the [profile.release] section in Cargo.toml, probably (see the sketch after this list)
  • Try to get a test case you can give to us :)
  • Or, bisect yourself through the commits in this range. I think, though, that because this regression is so old, we'll have to do the builds from source by hand.
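For reference, the setting lives under [profile.release] in Cargo.toml; the accepted boolean syntax is lto = true / lto = false. A minimal sketch (the values here are illustrative, not taken from the affected crate):

# Cargo.toml
[profile.release]
# Flip between `true` and `false` to see whether LTO affects the segfault.
lto = false

cargo build --release and cargo run --release pick up these profile settings automatically.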

@nikomatsakis nikomatsakis added the P-high High priority label May 10, 2018
@nikomatsakis
Contributor

triage: P-high

However, this is not assigned to anyone yet, as I don't know of any immediate steps forward available to us.

@alexcrichton
Member

One thing this may be, which I've seen a lot of lately, is a misaligned memory access due to misaligned TLS slots. To diagnose that, though, we'd need to take a look at a gdb backtrace as well as the faulting instruction.
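A sketch of how one might capture both on macOS using lldb (the binary path is a placeholder; gdb works similarly with run, bt, and x/i $pc):

$ lldb target/release/<binary>
(lldb) run                # run until the crash
(lldb) bt                 # backtrace at the faulting frame
(lldb) disassemble --pc   # show the faulting instruction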

@kennytm
Member

kennytm commented May 11, 2018

TLS... that reminds me of #48866. Any chance you're using Xcode 7?

@daschl

daschl commented May 15, 2018

I think I may have more info on this one, since I may have run into it as well. I'm on OS X 10.11.6.

The code works fine on 1.25 but immediately segfaults on 1.26. It's also using the regex crate; here is the valgrind info:

==75691== Process terminating with default action of signal 11 (SIGSEGV)
==75691==  General Protection Fault
==75691==    at 0x100151B0B: thread_local::thread_id::get (in target/release/oida)
==75691==    by 0x1000FC3E5: regex::re_unicode::Regex::captures (in target/release/oida)
==75691==    by 0x100032CEB: oida::index::EventIndex::feed (in target/release/oida)
==75691==    by 0x10003C742: oida::main (in target/release/oida)
==75691==    by 0x10002ECE5: std::rt::lang_start::{{closure}} (in target/release/oida)
==75691==    by 0x100172477: _ZN3std9panicking3try7do_call17hc7b9e6190a1d9f3eE.llvm.7628423903506348692 (rt.rs:59)
==75691==    by 0x10018655E: __rust_maybe_catch_panic (lib.rs:102)
==75691==    by 0x100170E01: std::rt::lang_start_internal (panicking.rs:285)
==75691==    by 0x10003E16B: main (in target/release/oida)

A couple more data points:

  • It works in a debug build but fails with release
  • The code depends on regex 1.0, which itself depends on thread_local 0.3.5; the latter seems to be the trigger
  • It works with lto = true on 1.26 but fails with lto = false

I'll see if I can come up with an isolated test.

@daschl

daschl commented May 15, 2018

Actually, that was easier than expected; it boils down to this, which segfaults:

extern crate thread_local; // the thread_local crate (0.3.5 at the time, per the data points above)

use thread_local::ThreadLocal;

fn main() {
    // Merely creating a per-object thread local and reading it back
    // is enough to segfault in a release build on the affected setup.
    let tls: ThreadLocal<u32> = ThreadLocal::new();
    assert_eq!(tls.get(), None);
}
$ valgrind target/release/examples/repro
==80783== Memcheck, a memory error detector
==80783== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==80783== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==80783== Command: target/release/examples/repro
==80783== 
--80783-- run: /usr/bin/dsymutil "target/release/examples/repro"
==80783== 
==80783== Process terminating with default action of signal 11 (SIGSEGV)
==80783==  General Protection Fault
==80783==    at 0x100002CAB: thread_local::thread_id::get (in target/release/examples/repro)
==80783==    by 0x1000020DB: repro::main (in target/release/examples/repro)
==80783==    by 0x100001CE5: std::rt::lang_start::{{closure}} (in target/release/examples/repro)
==80783==    by 0x10000B227: _ZN3std9panicking3try7do_call17hc7b9e6190a1d9f3eE.llvm.7628423903506348692 (rt.rs:59)
==80783==    by 0x10001BFAE: __rust_maybe_catch_panic (lib.rs:102)
==80783==    by 0x100009D51: std::rt::lang_start_internal (panicking.rs:285)
==80783==    by 0x10000222B: main (in target/release/examples/repro)
==80783== 
==80783== HEAP SUMMARY:
==80783==     in use at exit: 22,839 bytes in 193 blocks
==80783==   total heap usage: 259 allocs, 66 frees, 28,607 bytes allocated
==80783== 
==80783== LEAK SUMMARY:
==80783==    definitely lost: 0 bytes in 0 blocks
==80783==    indirectly lost: 0 bytes in 0 blocks
==80783==      possibly lost: 2,064 bytes in 1 blocks
==80783==    still reachable: 184 bytes in 2 blocks
==80783==         suppressed: 20,591 bytes in 190 blocks
==80783== Rerun with --leak-check=full to see details of leaked memory
==80783== 
==80783== For counts of detected and suppressed errors, rerun with: -v
==80783== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Segmentation fault: 11

Let me know if there is more I can do to help debug this.

@jjpe
Author

jjpe commented May 15, 2018

I tried turning on LTO, and that definitely makes the segfault go away on OS X.
In addition, I built the code with --release on Linux and Windows 10, and there are no regressions there that I can detect at the moment.

As for where the issue occurs, the code sample I mentioned on rust-lang.org is this:

    pub fn read_regex(&mut self, regex: &Regex) -> LexResult<Token> {
        self.ensure_has_input()?;
        let (m_start, m_end) = match regex.find(self.remaining_source_code()) {
            Some(regex_match) if regex_match.start() == 0 =>
                (regex_match.start(),  regex_match.end()),
            _ => return Err(LexErr::ExpectedRegexMatch {
                position: self.position()
            }),
        };
        let start = self.position() + m_start;
        let end   = self.position() + m_end;
        self.position = end;
        Ok(Token::new(start..end))
    }

The segfault here happens in the regex.find(self.remaining_source_code()) method call (as opposed to its argument self.remaining_source_code(), which works as it should). So it doesn't only happen in Regex::captures().

@daschl

daschl commented May 15, 2018

@jjpe I'm curious: can you run the binary under valgrind so we can see where the stack ends up when it segfaults? I'd like to check whether thread locals are also the culprit here.

@kennytm kennytm added the A-thread-locals Area: Thread local storage (TLS) label May 15, 2018
@jjpe
Author

jjpe commented May 15, 2018

I redacted some details because this is proprietary software, but I think the parts relevant to this thread (no pun intended) are still in there. Specifically, it looks like you're correct about thread locals being the issue:

$ sudo valgrind ./target/release/sxi
Password:
==17563== Memcheck, a memory error detector
==17563== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17563== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==17563== Command: ./target/release/sxi
==17563==
--17563-- run: /usr/bin/dsymutil "./target/release/sxi"
Press Ctrl-D or enter "quit" to exit.
>> "foo"
==17563==
==17563== Process terminating with default action of signal 11 (SIGSEGV)
==17563==  General Protection Fault
==17563==    at 0x1004BDD4B: thread_local::thread_id::get (in ./target/release/sxi)
==17563==    by 0x100471960: regex::re_unicode::Regex::find_at (in ./target/release/sxi)
==17563==    by 0x10046EBB0: regex::re_unicode::Regex::find (in ./target/release/sxi)
==17563==    by 0x100409777: PARSER_CRATE::lexer::Lexer::read_regex (in ./target/release/sxi)
==17563==    by 0x10040A47B: PARSER_CRATE::lex_rules::whitespace (in ./target/release/sxi)
==17563==    by 0x10030CE3E: core::ops::function::Fn::call (in ./target/release/sxi)
                          [redacted]
==17563==    by 0x100407BF0: core::ops::function::Fn::call (in ./target/release/sxi)
==17563==
==17563== HEAP SUMMARY:
==17563==     in use at exit: 42,555 bytes in 432 blocks
==17563==   total heap usage: 520 allocs, 88 frees, 52,289 bytes allocated
==17563==
==17563== LEAK SUMMARY:
==17563==    definitely lost: 0 bytes in 0 blocks
==17563==    indirectly lost: 0 bytes in 0 blocks
==17563==      possibly lost: 0 bytes in 0 blocks
==17563==    still reachable: 7,575 bytes in 15 blocks
==17563==         suppressed: 34,980 bytes in 417 blocks
==17563== Rerun with --leak-check=full to see details of leaked memory
==17563==
==17563== For counts of detected and suppressed errors, rerun with: -v
==17563== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@alexcrichton
Member

From what we know so far, the issue here is a bug in older linkers on OS X which seems to be fixed in more recent versions. AFAIK we don't know what the bug is, only that newer versions of the linker do not produce binaries that fault, whereas older versions do.

One alternative is to compile Rust code with MACOSX_DEPLOYMENT_TARGET=10.6, which will cause libstd to use a different TLS strategy that shouldn't hit this linker bug.
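Concretely, that means setting the variable when compiling; a sketch (the clean step forces a full rebuild so the new deployment target actually takes effect):

$ cargo clean
$ MACOSX_DEPLOYMENT_TARGET=10.6 cargo build --release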

@daschl

daschl commented May 16, 2018

@alexcrichton do you have an idea why it still works with 1.25 then? Is it because of the LLVM upgrade which triggered this linker bug?

@alexcrichton
Member

@daschl LLVM upgrades bring a huge amount of changes, so my guess is that something entirely unrelated is triggering a preexisting bug. I'd doubt that an LLVM change which caused this to worsen would be the actual cause of the underlying issue.

@jjpe
Author

jjpe commented May 16, 2018

Well, at least for now enabling LTO is the "workaround of least resistance", because I don't need to care about env vars, and it doesn't seem to bring any performance regressions in my case either. So until I get a chance to upgrade the OS, I'll just use that.

I'm not sure what this means for the bug, though.

@pnkfelix
Member

Visiting for triage. Assigning myself to evaluate the current status and what our options are.

@kennytm
Member

kennytm commented Jul 12, 2018

If it is TLS, it might be fixed by #51828. Someone needs to confirm.
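One way to confirm, assuming the minimal reproduction from above: update to the latest nightly and rebuild without any lto override, e.g.

$ rustup update nightly
$ cargo +nightly clean
$ cargo +nightly run --release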

@jjpe
Author

jjpe commented Jul 13, 2018

I just built my crate using cargo +nightly run --release, with lto = true removed from Cargo.toml.
cargo +nightly -V reports cargo 1.29.0-nightly (af9e40c26 2018-07-05) (which is a tad odd, since I ran rustup update right before that, and rustup reported having updated nightly to rustc 1.29.0-nightly (64f7de921 2018-07-12)).

I cannot reproduce the issue anymore.

@SimonSapin
Contributor

Thank you @jjpe for confirming that this is fixed!

(The Cargo version that you get together with Nightly is based on the src/tools/cargo submodule in this repository, which is only updated occasionally. This is why cargo +nightly -V can show a date older than rustc +nightly -V. In both cases the date is that of the latest merge commit made by @bors in the corresponding repository.)
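For illustration, the two dates side by side (output reconstructed from the version strings in the comment above):

$ cargo +nightly -V
cargo 1.29.0-nightly (af9e40c26 2018-07-05)
$ rustc +nightly -V
rustc 1.29.0-nightly (64f7de921 2018-07-12)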
