Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llvm.dbg.value should be used instead of llvm.dbg.declare wherever possible. #68817

Open
eddyb opened this issue Feb 4, 2020 · 6 comments
Open
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@eddyb
Copy link
Member

eddyb commented Feb 4, 2020

(I'm opening this issue because I can't find a pre-existing one)

llvm.dbg.declare describes the location in memory of a debuginfo variable, whereas llvm.dbg.value describes the value itself directly. Today we only use the former, which means that in debug mode, all named variables are placed on the stack, even if they're simple scalars (e.g. integers).

Ideally, we'd use llvm.dbg.value in the same situations where we skip creating a stack slot (e.g. LLVM alloca), i.e. for scalars, vectors and pairs of scalars.


However, this is much harder than I expected, mostly due to LLVM being too eager to throw away llvm.dbg.value if computing the value can be avoided (i.e. it's not used by anything else).

I think llvm.dbg.declare only fares better because at opt-level=0, the lack of SROA means allocas are kept around pretty much completely intact.

FastISel (used at opt-level=0) throws away any non-trivial llvm.dbg.value, but disabling it with -C llvm-args=-fast-isel=0 only helps some simple cases (such as a reference to another stack variable - IMO FastISel should be improved to handle those, I don't see why it couldn't).

In general, it looks like Instruction Selection ("ISel") in LLVM ignores llvm.dbg.values while building all of the machine instructions for a BB, and only afterwards does it come back to the llvm.dbg.values and lower them to DBG_VALUE, using the value only if it already exists (i.e. something else needs that same value, at runtime), and otherwise it leads to <optimized out> vars.

Maybe the upcoming GlobalISel approach handles llvm.dbg.value better, but I would need to target AArch64 to even try it out, from what I hear (I might still do it out of curiosity).


While investigating this I also came across #8855 (comment) - but setting AlwaysPreserve to true only keeps the debuginfo variable around, it doesn't help at all with keeping any value alive (so you end up with <optimized out> in the debugger, instead of the variable missing).

I haven't seen anything that would make llvm.dbg.value keep the value alive so it's guaranteed to be available to the debugger, but if there is such a thing, it's probably the way to go, if we want to avoid relying on the stack so much.

cc @nikomatsakis @michaelwoerister @hanna-kruppe

@eddyb
Copy link
Member Author

eddyb commented Feb 4, 2020

Here's a more detailed example of how even simple cases break down:

let stack_val_ref: &SomeStruct = &stack_val;
let stack_val_interior_ref_1: &isize = &stack_val.x;
let stack_val_interior_ref_2: &f64 = &stack_val.y;

That snippet results in this (LLVM) Machine IR (with -fast-isel=0):

DBG_VALUE %stack.0, $noreg, !"stack_val_ref",
    !DIExpression(),
    debug-location !31; src/test/debuginfo/borrowed-struct.rs:77:8 line no:77

DBG_VALUE $noreg, $noreg, !"stack_val_interior_ref_1",
     !DIExpression(),
     debug-location !33; src/test/debuginfo/borrowed-struct.rs:78:8 line no:78

DBG_VALUE $noreg, $noreg, !"stack_val_interior_ref_2",
     !DIExpression(DW_OP_plus_uconst, 8, DW_OP_stack_value),
     debug-location !35; src/test/debuginfo/borrowed-struct.rs:79:8 line no:79

(NB: $noreg is effectively undef, and %stack.0 is stack_val's stack slot)

stack_val_interior_ref_1 should be the same value as stack_val_ref, but LLVM somehow manages to lose track of %stack.0, even though it would need no extra machine instructions, just %stack.0 instead of $noreg for the first DBG_VALUE operand.

LLVM even went through the trouble of turning an offset of 8 bytes (to get to the y field) into !DIExpression(DW_OP_plus_uconst, 8, DW_OP_stack_value) for stack_val_interior_ref_2, but it's all for naught without %stack.0 as the base of that offset.

The result is that stack_val_interior_ref_{1,2} have no DW_AT_location in DWARF (dumped by llvm-dwarfdump), and in a debugger they would show up as <optimized out>, and/or cause errors.

@jonas-schievink jonas-schievink added A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 5, 2020
@eddyb
Copy link
Member Author

eddyb commented Feb 6, 2020

I tried an experiment, namely using inline assembly to keep the input to llvm.dbg.value alive.

However, what seems to happen is that a register is used to hold the value, but as soon as that register is needed for anything else (e.g. the return value of a call), the value is gone, and it's not saved anywhere.

Relying on the stack with llvm.dbg.declare, as we do today, appears to be the only foolproof way of keeping arbitrary values around, sadly (kind of obvious in retrospect).

There are some exceptions, such as constants, pointers to globals or to other variables on the stack, which should be encodable in DWARF (and indeed LLVM does so, but it's somewhat unreliable).

I suspect I can emit the inline assembly around the point of the StorageDead or so, but I've spent enough time on this already, and it would be somewhat tricky to do so.

I've opened #68902 with the branch I experimented on, to get some perf numbers, but it's only as a rough estimate and unlikely to lead to anything.

@eddyb
Copy link
Member Author

eddyb commented Feb 6, 2020

There are some exceptions, such as constants, pointers to globals or to other variables on the stack, which should be encodable in DWARF (and indeed LLVM does so, but it's somewhat unreliable).

Long term we might want to handle some of these ourselves, up to and including encoding them into mir::VarDebugInfo (constants would be easy to justify, no need to use a MIR local).

@eddyb
Copy link
Member Author

eddyb commented Feb 8, 2020

One way we might be able to get some of the wins from llvm.dbg.value is, for cases where we wouldn't use the stack if we could use llvm.dbg.value, to keep the stack usage but instead of loading from the stack every time the value is used, refer to the original value that was stored on the stack.

So instead of:

%x = alloca i32
llvm.dbg.declare %x, !DIVariable("x")
; ...
%3 = ...
store %3, %x
; ...
%10 = load %x
%11 = add %10, 42
; ...
%20 = load %x
%21 = add %20, 42
; ...
%30 = load %x
%31 = add %30, 42
; ...

we'd be able to get this:

%x = alloca i32
llvm.dbg.declare %x, !DIVariable("x")
; ...
%3 = ...
store %3, %x
%4 = add %3, 42
; ...

(more realistically, this would be helpful for e.g. accessing a field through a reference)

bors added a commit that referenced this issue Feb 11, 2020
rustc_codegen_ssa: only "spill" SSA-like values to the stack for debuginfo.

This is an implementation of the idea described in #68817 (comment).

In short, instead of debuginfo forcing otherwise-SSA-like MIR locals into `alloca`s, and requiring a `load` for each use (or two, for scalar pairs), the `alloca` is now *only* used for attaching debuginfo with `llvm.dbg.declare`: the `OperandRef` is stored to the `alloca`, but *never loaded* from it.

Outside of `debug_introduce_local`, nothing cares about the debuginfo-only `alloca`, and instead works with `OperandRef` the same as MIR locals without debuginfo before this PR.

This should have some of the benefits of `llvm.dbg.value`, while working today.

cc @nagisa @nikomatsakis
@eddyb
Copy link
Member Author

eddyb commented Feb 20, 2020

@hanna-kruppe pointed me to https://lists.llvm.org/pipermail/llvm-dev/2020-February/139376.html, which looks really promising (even if it doesn't solve everything, it's a step in the right direction).

Also, something in there that peaked my attention was:

Of these categories, the first 3 will become salvageable after this work is implemented, and Select Insts will also be salvageable with some follow-up work to enable conditional branching in complex expressions.

This is exciting, because the lack of conditional branching makes fragmenting enums variables that have debuginfo attached to them impossible in #48300.
(Or more accurately, the most straightforward implementation of fragmented enum debuginfo would branch on the discriminant - I have an alternative that involves crafting type debuginfo in a way that variants don't overlap from the perspective of the debugger, but that's harder and more dubious)

@programmerjake
Copy link
Member

LLVM is redoing debug info so hopefully the issues go away with the new IR representation for debug info: https://llvm.org/docs/RemoveDIsDebugInfo.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants