Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM generates incorrect code with -Zprofile #70148

Closed
Lesiuk opened this issue Mar 19, 2020 · 22 comments · Fixed by #70527
Closed

LLVM generates incorrect code with -Zprofile #70148

Lesiuk opened this issue Mar 19, 2020 · 22 comments · Fixed by #70527
Labels
A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows C-bug Category: This is a bug. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ requires-nightly This issue requires a nightly compiler in some way. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Lesiuk
Copy link

Lesiuk commented Mar 19, 2020

Instructions to reproduce:

  • git clone [email protected]:servo/html5ever.git
  • cd html5ever/markup5ever
  • CARGO_INCREMENTAL=0 RUSTFLAGS="-Zprofile -Ccodegen-units=1 -Cinline-threshold=0" cargo check
  • The build script of markup5ever segfaults.

The actual function with the bug is an instance std::panicking::try in the proc_macro2 crate.

The cause seems to be an LLVM bug.

full gist

This LLVM IR is generated:

catch.i:                                          ; preds = %.noexc
  %120 = phi i64* [ getelementptr inbounds ([24 x i64], [24 x i64]* @__llvm_gcov_ctr.27, i64 0, i64 11), %.noexc ], !dbg !2861
  %121 = landingpad { i8*, i32 }
          catch i8* null, !dbg !2861
  %122 = load i64, i64* %120, !dbg !2861
  %123 = add i64 %122, 1, !dbg !2861
  store i64 %123, i64* %120, !dbg !2861```

Notice how the phi is inserted before the landingpad instruction. This causes the following asm to be generated:

.LBB27_14: // This is never executed
	.loc	27 0 15 is_stmt 0
	movq	160(%rsp), %rcx
	movl	$1, %esi
.Ltmp379: // Landing pad points to here!!!
	leaq	__llvm_gcov_ctr.27(%rip), %rdi
	addq	$120, %rdi
	.loc	27 274 15
	movq	(%rcx), %r8
	addq	$1, %r8
	movq	%r8, (%rcx)

So basically the initialization of %rcx is getting skipped by the incorrect landing pad, which in turn causes the crash.

Edit by @Amanieu, original bug report follows.


Just updated nightly on my CI machine

nightly-aarch64-unknown-linux-gnu updated - rustc 1.44.0-nightly (f509b26 2020-03-18) (from rustc 1.43.0-nightly (c20d7ee 2020-03-11))

and found out that tests stopped compiling few of dependencies like cssparser or string_cache or html5ever.

It probably happens because of my RUSTFLAGS

CARGO_INCREMENTAL=0;
RUSTFLAGS="-Zprofile -Ccodegen-units=1 -Cinline-threshold=0 -Coverflow-checks=off -Zno-landing-pads";

I created repository with reproduction
https://github.com/Lesiuk/rust-nightly-issue-reproduction

Bissect found that this PR introduced this issue #67502

searched toolchains c20d7ee through 3c6f982
regression in be055d9

Log from test run

Compiling html5ever v0.25.1
error: failed to run custom build command for html5ever v0.25.1

Caused by:
process didn't exit successfully: /Users/XXXXXX/CLionProjects/issue/target/debug/build/html5ever-1a979961379450d7/build-script-build (signal: 6, SIGABRT: process abort signal)
--- stdout
cargo:rerun-if-changed=/Users/XXXXXX/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/html5ever-0.25.1/src/tree_builder/rules.rs

--- stderr
fatal runtime error: failed to initiate panic, error 5

warning: build failed, waiting for other jobs to finish...
error: build failed

@Lesiuk Lesiuk added C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 19, 2020
@Mark-Simulacrum
Copy link
Member

cc @Amanieu, perhaps related to #67502

@Mark-Simulacrum
Copy link
Member

Is there a reason you're passing -Zno-landing-pads? I believe in ~all cases you should either be doing -Cpanic=abort today, no-landing-pads is likely to lead to problems... but it would be odd for that to lead to trouble in the compiler I think

@Lesiuk
Copy link
Author

Lesiuk commented Mar 19, 2020

Yes. I just updated my description of the issue. Bissect found this PR.

@Lesiuk
Copy link
Author

Lesiuk commented Mar 19, 2020

Those flags are recommended by mozilla's grcov tool:
https://github.com/mozilla/grcov

@Amanieu
Copy link
Member

Amanieu commented Mar 19, 2020

Does this problem still happen with -C panic=abort instead of -Z no-landing-pads?

@Amanieu
Copy link
Member

Amanieu commented Mar 19, 2020

-Z no-landing-pads will cause catch_unwind to be optimized away with the new implementation. This is why the runtime can't initiate a panic: it can't find a catch for the exception.

I would recommend either using -C panic=abort or just removing -Z no-landing-pads.

@jonas-schievink jonas-schievink added A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Mar 19, 2020
@Lesiuk
Copy link
Author

Lesiuk commented Mar 19, 2020

Changed my RUSTFLAGS to

-Zprofile -Ccodegen-units=1 -Cinline-threshold=0 -Coverflow-checks=off -C panic=abort

and still crashes.

@Mark-Simulacrum Mark-Simulacrum added the E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example label Mar 19, 2020
@Amanieu
Copy link
Member

Amanieu commented Mar 19, 2020

Minimal reproduction:

  • Clone rust-lang/libc
  • RUSTFLAGS=-Zprofile CARGO_INCREMENTAL=0 cargo check

Output:

   Compiling libc v0.2.68 (/home/amanieu/code/rust-libc)
error: failed to run custom build command for `libc v0.2.68 (/home/amanieu/code/rust-libc)`

Caused by:
  process didn't exit successfully: `/home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build-script-build` (signal: 11, SIGSEGV: invalid memory reference)
--- stdout
cargo:rustc-cfg=freebsd11
cargo:rustc-cfg=libc_priv_mod_use
cargo:rustc-cfg=libc_union
cargo:rustc-cfg=libc_const_size_of
cargo:rustc-cfg=libc_align
cargo:rustc-cfg=libc_core_cvoid
cargo:rustc-cfg=libc_packedN

--- stderr
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x73693724)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x646e6148)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x34616639)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x6124544c)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous run count: corrupt object tag (0x636f6c6c)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x735f646c)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0xf7ebb268)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000001)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x65727573)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)

cargo-bisect-rustc points to e2223c9 as the source of the regression, which seems a bit surprising.

@Mark-Simulacrum
Copy link
Member

I'm not familiar with -Zprofile -- but I agree that seems like a surprising trace. It's not implausible that there's some bug introduced by that, though, given the heavy use of unsafe code in BTree...

Maybe we could get a stack trace on the SIGSEGV?

@Amanieu
Copy link
Member

Amanieu commented Mar 19, 2020

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7edd328 in __memmove_avx_unaligned_erms () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff7edd328 in __memmove_avx_unaligned_erms () from /usr/lib/libc.so.6
#1  0x000055555558856b in write_bytes (s=0x5555555b7296 "_ZN5alloc11collections5btree4node25Handle$LT$Node$C$Type$GT$9into_node17hf4012ff21ae14a7aE", len=90) at /rustc/f509b26a7730d721ef87423a72b3fdf8724b4afa/src/llvm-project/compiler-rt/lib/profile/GCDAProfiling.c:178
#2  write_string (s=0x5555555b7296 "_ZN5alloc11collections5btree4node25Handle$LT$Node$C$Type$GT$9into_node17hf4012ff21ae14a7aE") at /rustc/f509b26a7730d721ef87423a72b3fdf8724b4afa/src/llvm-project/compiler-rt/lib/profile/GCDAProfiling.c:202
#3  llvm_gcda_emit_function (ident=<optimized out>, function_name=0x5555555b7296 "_ZN5alloc11collections5btree4node25Handle$LT$Node$C$Type$GT$9into_node17hf4012ff21ae14a7aE", func_checksum=<optimized out>, use_extra_checksum=0 '\000', cfg_checksum=<optimized out>)
    at /rustc/f509b26a7730d721ef87423a72b3fdf8724b4afa/src/llvm-project/compiler-rt/lib/profile/GCDAProfiling.c:455
#4  0x0000555555568670 in __llvm_gcov_writeout ()
#5  0x00005555555894b3 in llvm_writeout_files () at /rustc/f509b26a7730d721ef87423a72b3fdf8724b4afa/src/llvm-project/compiler-rt/lib/profile/GCDAProfiling.c:606
#6  0x00007ffff7db7537 in __run_exit_handlers () from /usr/lib/libc.so.6
#7  0x00007ffff7db76ee in exit () from /usr/lib/libc.so.6
#8  0x00007ffff7da002a in __libc_start_main () from /usr/lib/libc.so.6
#9  0x000055555555f24e in _start ()

It does look related to the BTree stuff.

@jonas-schievink
Copy link
Contributor

cc @ssomers

@ssomers
Copy link
Contributor

ssomers commented Mar 19, 2020

#69776 introduced extra calls to Handle::into_node (the function that the profiler is trying to talk about) in case of panic, that would come into play in should_panic test cases (that are excluded by Miri, and don't count on me knowing exactly what happens there). So you might want to try excluding the bunch of should_panic test cases in liballoc/tests/btree/map.rs

PS That doesn't add up - the problem fixed was pointed out by Miri, so it must be able to run should_panic tests now. Or something.

@Amanieu
Copy link
Member

Amanieu commented Mar 19, 2020

Nevermind that's a false positive. It seems that -Zprofile requires -Ccodegen-units=1 to work properly. The issue is unrelated to the btree code.

@Amanieu
Copy link
Member

Amanieu commented Mar 19, 2020

OK I think I've narrowed it down to an LLVM bug.

full gist

This LLVM IR is generated:

catch.i:                                          ; preds = %.noexc
  %120 = phi i64* [ getelementptr inbounds ([24 x i64], [24 x i64]* @__llvm_gcov_ctr.27, i64 0, i64 11), %.noexc ], !dbg !2861
  %121 = landingpad { i8*, i32 }
          catch i8* null, !dbg !2861
  %122 = load i64, i64* %120, !dbg !2861
  %123 = add i64 %122, 1, !dbg !2861
  store i64 %123, i64* %120, !dbg !2861```

Notice how the phi is inserted before the landingpad instruction. This causes the following asm to be generated:

.LBB27_14: // This is never executed
	.loc	27 0 15 is_stmt 0
	movq	160(%rsp), %rcx
	movl	$1, %esi
.Ltmp379: // Landing pad points to here!!!
	leaq	__llvm_gcov_ctr.27(%rip), %rdi
	addq	$120, %rdi
	.loc	27 274 15
	movq	(%rcx), %r8
	addq	$1, %r8
	movq	%r8, (%rcx)

So basically the initialization of %rcx is getting skipped by the incorrect landing pad, which in turn causes the crash.

@jonas-schievink jonas-schievink added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. requires-nightly This issue requires a nightly compiler in some way. and removed T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. I-nominated labels Mar 19, 2020
@jonas-schievink
Copy link
Contributor

Unnominating since this requires an unstable compiler flag to trigger

@Amanieu
Copy link
Member

Amanieu commented Mar 19, 2020

Instructions to reproduce:

  • git clone [email protected]:servo/html5ever.git
  • cd html5ever/markup5ever
  • CARGO_INCREMENTAL=0 RUSTFLAGS="-Zprofile -Ccodegen-units=1 -Cinline-threshold=0" cargo check (NOTE: -Cdebuginfo=0 will cause the bug to no longer trigger)

The actual function with the bug is an instance std::panicking::try in the proc_macro2 crate.

@Lesiuk
Copy link
Author

Lesiuk commented Mar 20, 2020

-Cdebuginfo=0 basically disables -Zprofile because no gcda files are generated. This miscompilation is kinda sad because mozilla's grcov is the most reliable coverage tool for rust. And if you want to have your rust application very well tested coverage is kinda important.

I will probably stick to nightly-2020-03-11 until It's not fixed but I can't use it indefinitely.

@Amanieu Amanieu changed the title fatal runtime error: failed to initiate panic, error 5 LLVM generates incorrect code with -Zprofile Mar 20, 2020
@Amanieu
Copy link
Member

Amanieu commented Mar 20, 2020

@RalfJung
Copy link
Member

that would come into play in should_panic test cases (that are excluded by Miri

Miri has supported panics for a while now. If there are still panic tests excluded by Miri, that's a bug. Do you have an example of such a test?

@ssomers
Copy link
Contributor

ssomers commented Mar 20, 2020

Never mind, I excluded should_panic tests when testing myself, since I don't know how to run them successfully.
PS and yes, with today's nightly Miri these tests succeed just like any other. I thought they didn't a month ago or so (memory leaks).

@briansmith
Copy link
Contributor

Unnominating since this requires an unstable compiler flag to trigger

AFAICT using these flags is the only way to get code coverage reporting working. A lot of projects (all that I've been associated with) build everything with stable Rust but then do code coverage using nightly Rust + these unstable flags. Temporarily I've reverted back to a 2020-03-11 nightly for this as a workaround, but that workaround will stop working effectively in once Stable Rust is no longer a subset of the nightly-2020-03-11 feature set.

@Amanieu
Copy link
Member

Amanieu commented Mar 28, 2020

Fixed in LLVM: rust-lang/llvm-project@e5bf503

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-code-coverage Area: Source-based code coverage (-Cinstrument-coverage) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows C-bug Category: This is a bug. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ requires-nightly This issue requires a nightly compiler in some way. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants