Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc -Clink-dead-code causes MSVC linker to produce invalid binaries when LLVM InstrProf counters are enabled #76038

Open
richkadel opened this issue Aug 28, 2020 · 9 comments
Labels
C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ link-dead-code Linkage: using -Clink-dead-code O-windows-msvc Toolchain: MSVC, Operating system: Windows T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@richkadel
Copy link
Contributor

This reports a bug with the -C link-dead-code option is enabled, when targeting MSVC and when LLVM InstrProf counter reports are enabled. The bug is probably not specific to InstrProf, but InstrProf sets the conditions for exposing the bug.

I hope someone with better familiarity with the -C link-dead-code option, and the MSVC linker options it enables and/or interacts with, can help determine the root cause of the issue, and provide a fix or better workaround that allows InstrProf and -Z instrument-coverage to work with -C link-dead-code on MSVC.

Brief summary of error: When running a program compiled with -Z instrument-coverage and -C link-dead-code, the program should generate an InstrProf counter report upon program exit, but instead it generates a zero-length file and almost always crashes with a Segmentation fault.

Background: The InstrProf counter report (a file generated at program exit, called default.profraw unless overridden by setting LLVM_PROFILE_FILE) is generated by the LLVM compiler-rt source that is compiled into a rust library by rust-lang's rust/library/profiler-builtins/.

The -Z instrument-coverage flag injects LLVM intrinsics that increment global counter variables during program execution, and the profiler-builtins library sets up atexit() callbacks that gather the final counter values and write them to the default.profraw file.

I've documented my tests and observations, in case they are helpful to someone with more knowledge in this area.

I compared working InstrProf-enabled binaries in a C program compiled by Clang to the failing Rust-based binaries, stepping through the counter report generation code of both programs using Visual Studio debugger, the debugger's disassembled view, and by inspecting the instructions and call graphs in IDA Pro for both.

I observed that the Rust binary had some notable (and clearly incorrect) differences, such as:

  • Some jmp instructions appear to be jumping to the wrong offset (4 bytes off)
  • Some instructions and variables (such as instructions initializing fields of a Header structure for the counter report) were simply not present in the Rust binary. They are present in the C binary, and map to sequential assignments in the compiler-rt source, but they are not shown in the disassembled views. The debugger skips over the source for these statements, and the variables involved cannot be evaluated. The debugger complains that certain variables were "optimized out".

My best guess is the MSVC linker is first miscalculating the jmp addresses for some functions, and then, when the miscalculated addresses don't line up to known functions, an optimization pass may be deciding that some variables are not used (because their functions don't appear to be called), so code that updates those variables is optimized out.

Note that PR #76002 overcomes what was originally thought to be a bug related to the -Zinstrument-coverage implementation. Prior to PR #76002, the -Zinstrument-coverage option always automatically enabled -Clink-dead-code by default, to ensure dead code was still counted, as zero executions, in coverage reports. PR #76002 works around the problem by only enabling -Clink-dead-code by default for non-MSVC platforms.

Since PR #76002, the bug can still be reproduced on MSVC by enabling -Clink-dead-code explicitly (as shown in the Code section below).

Code

fn testfunc() {                                                                 
}                                                                               
                                                                                
fn main() {                                                                     
    let mut i = 0;                                                              
    loop {                                                                      
        if i >= 10 {                                                            
            break;                                                              
        }                                                                       
        testfunc();                                                             
        i += 1;                                                                 
    }                                                                           
}    

Meta

rustc --version --verbose:

binary: rustc                                                                   
commit-hash: unknown                                                            
commit-date: unknown                                                            
host: x86_64-pc-windows-msvc                                                    
release: 1.47.0-dev                                                             
LLVM version: 11.0 

Error output

The following example explicitly enables or disables -Clink-dead-code. The default value was to enable this setting, prior to PR #76002, and as of this PR the default is now to not enable this setting.

From an msys2 terminal:

$ build/x86_64-pc-windows-msvc/stage1/bin/rustc -Zinstrument-coverage -Clink-dead-code=no basic.rs
$ ./basic
Hello world
$ ls -l default.profraw
-rw-r--r-- 1 richkadel Domain Users 168 Aug 28 10:43 default.profraw

$ build/x86_64-pc-windows-msvc/stage1/bin/rustc -Zinstrument-coverage -Clink-dead-code basic.rs
$ ./basic
Hello world
Segmentation fault

and also generates an empty default.profraw file.

@richkadel richkadel added C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 28, 2020
@richkadel
Copy link
Contributor Author

Note: This issue was also called out in Issue #64685 because there are some similarities (both involve coverage profiling, and exposed a bug related to -C link-dead-code); however the symptoms and the targets are actually different, with this issue being specific to InstrProf counters, built into Rust via profiler-builtins, and the failure of only the MSVC linker (as far as I know).

richkadel added a commit to richkadel/rust that referenced this issue Sep 1, 2020
Found that -C link-dead-code (which was enabled automatically
under -Z instrument-coverage) was causing the linking error that
resulted in segmentation faults in coverage instrumented binaries. Link
dead code is now disabled under MSVC, allowing `-Z instrument-coverage`
to be enabled under MSVC for the first time.

More details are included in Issue rust-lang#76038.

(This PR was broken out from PR rust-lang#75828)
tmandry added a commit to tmandry/rust that referenced this issue Sep 1, 2020
…3, r=tmandry

Fix `-Z instrument-coverage` on MSVC

Found that `-C link-dead-code` (which was enabled automatically
under `-Z instrument-coverage`) was causing the linking error that
resulted in segmentation faults in coverage instrumented binaries. Link
dead code is now disabled under MSVC, allowing `-Z instrument-coverage`
to be enabled under MSVC for the first time.

More details are included in Issue rust-lang#76038 .

Note this PR makes it possible to support `Z instrument-coverage` but
does not enable instrument coverage for MSVC in existing tests. It will be
enabled in another PR to follow this one (both PRs coming from original
PR rust-lang#75828).

r? @tmandry
FYI: @wesleywiser
@richkadel
Copy link
Contributor Author

richkadel commented Sep 2, 2020

I just noticed this recent PR (rust-lang/llvm-project#72 ) by @mati865 that seems to be fixing a similar problem on Windows binaries. That PR fixes a problem for the MinGW GCC target, not MSVC, but considering the clues documented in this issue point to a function alignment problem, I have to wonder if the MSVC fix would be similar.

Note, my detailed notes include a dump of the target-specific values for #define’s used in compiler-rt when compiling for MSVC. COMPILER_RT_ALIGNAS and COMPILER_RT_FTRUNCATE (for example) were "not defined", but these were among the changed values made in the MinGW GCC-related PR. I wonder if there are equivalent settings that also need to be updated for MSVC.

@mati865
Copy link
Contributor

mati865 commented Sep 2, 2020

@richkadel that PR only fixes linking to libprofiler built by MinGW GCC, it segfaults when running instrumented executable: rust-lang/llvm-project#72 (comment)
Clang didn't have that issue with or without that PR.

EDIT: The only meaningful change was regarding to COMPILER_RT_WEAK other changes were made to silence warnings.
COMPILER_RT_ALIGNAS didn't changed at all, I just moved it outside #ifdef.

@richkadel
Copy link
Contributor Author

that PR only fixes linking to libprofiler built by MinGW GCC

@mati865 - Yes, agreed. I did say that in my comment. I assume your fix would not resolve the MSVC issue directly, but I'm wondering if there is a similar fix that should be made for MSVC.

it segfaults when running instrumented executable

Yes, the MSVC bug described here has the same symptom. The instrumented executable segfaults.

I wanted to link to your PR in case it gives some insights that might help someone solve this issue for MSVC.

I have no experience with these compiler attributes and annotations, so I don't know for sure what's related or what's not, but there are a lot of similarities.

@mati865
Copy link
Contributor

mati865 commented Sep 2, 2020

@richkadel do you have backtrace for MSVC segafaults? I don't think GCC and MSVC segfaults are the same but maybe there is the same issue underneath...

@richkadel
Copy link
Contributor Author

The best I can do is show you the comparison between debugging Clang (on the left) and debugging rustc (on the right), where things first go wrong:

https://github.com/richkadel/rust/blob/coverage-msvc-segfault-discuss/coverage-msvc-segfault-notes.md#debugging-differences-in-1-first-available-breakpoint-line-in-__llvm_profile_write_file-2-locals-shown-by-rust-as-optimized-away-and-3-available-call-stack

The segfault happens later, when trying to access some data fields, but the actual problem starts from the point highlighted in the above link.

I know the notes doc is daunting, but there's a lot of info here that might reveal the problem.

In particular, there appear to be Jump (jmp) instructions that jump to functions that are 4 bytes off from where the function's symbol says it's supposed to be. That sounded like an alignment issue related to the function's linked position in the binary.

@mati865
Copy link
Contributor

mati865 commented Sep 2, 2020

Unfortunately I have no clue on what could have caused it and don't have skills to investigate it that far.

@richkadel
Copy link
Contributor Author

No worries. Thanks for taking a look!

Hopefully someone with those skills (that I also lack!) will be helped by both your PR and these notes.

Thanks!

richkadel added a commit to richkadel/rust that referenced this issue Dec 3, 2020
Fixes multiple issue with counters, with simplification

  Includes a change to the implicit else span in ast_lowering, so coverage
  of the implicit else no longer spans the `then` block.

  Adds coverage for unused closures and async function bodies.

  Fixes: rust-lang#78542

Adding unreachable regions for known MIR missing from coverage map

Cleaned up PR commits, and removed link-dead-code requirement and tests

  Coverage no longer depends on Issue rust-lang#76038 (`-C link-dead-code` is
  no longer needed or enforced, so MSVC can use the same tests as
  Linux and MacOS now)

Restrict adding unreachable regions to covered files

  Improved the code that adds coverage for uncalled functions (with MIR
  but not-codegenned) to avoid generating coverage in files not already
  included in the files with covered functions.

Resolved last known issue requiring --emit llvm-ir workaround

  Fixed bugs in how unreachable code spans were added.
@richkadel
Copy link
Contributor Author

Follow-up note: Rust Coverage via -Zinstrument-coverage no longer defaults to -Clink-dead-code on any platform. -Clink-dead-code is no longer required or recommended when enabling coverage instrumentation.

This only avoids the apparent bug with -Clink-dead-code on MSVC, but I believe there is still a bug, somewhere, that may affect other use cases where -Clink-dead-code is required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ link-dead-code Linkage: using -Clink-dead-code O-windows-msvc Toolchain: MSVC, Operating system: Windows T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants