Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary alloca Without Optimization Flags #129282

Open
CJacob314 opened this issue Aug 19, 2024 · 4 comments
Open

Unnecessary alloca Without Optimization Flags #129282

CJacob314 opened this issue Aug 19, 2024 · 4 comments
Labels
A-codegen Area: Code generation C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@CJacob314
Copy link

CJacob314 commented Aug 19, 2024

The following code, when compiled without optimization flags (or with -Copt-level=0) emits LLVM IR for an 8192-byte alloca, which can easily cause 100% unnecessary stack overflow errors in a running executable.

fn test<const SIZE: usize>() {
  if SIZE < 4096 {
    let arr = [0u8; SIZE];
    std::hint::black_box(&arr);
  } else {
    let vec = vec![0u8; SIZE];
    std::hint::black_box(&vec);
  }
}

fn main() {
    test::<8192>();
}

The following is the relevant LLVM IR:

define internal void @example::test::hb882c86c9d7a582d() unnamed_addr #1 personality ptr @rust_eh_personality !dbg !546 {
start:
  %0 = alloca [16 x i8], align 8
  %vec = alloca [24 x i8], align 8
  %arr = alloca [8192 x i8], align 1
  br label %bb2, !dbg !548

%arr only ends up being used by bb1 below, but bb1 has no predecessors:

bb1:                                              ; No predecessors!
  call void @llvm.memset.p0.i64(ptr align 1 %arr, i8 0, i64 8192, i1 false), !dbg !555
; call core::hint::black_box
  %_3 = call align 1 ptr @core::hint::black_box::h61dc8dd57d9b7993(ptr align 1 %arr), !dbg !556
  br label %bb5, !dbg !556
}

Here's a Godbolt Compiler Explorer link with all of the IR.

@CJacob314 CJacob314 added the C-bug Category: This is a bug. label Aug 19, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 19, 2024
@CJacob314
Copy link
Author

@saethlin informed me that this is an issue with his mono-reachable traversal implementation.

@saethlin saethlin added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such labels Aug 19, 2024
@saethlin
Copy link
Member

I know that when I initially worked on this I searched the generated LLVM IR for the special "No predecessors!" comment that LLVM leaves on such orphaned basic blocks. There are a lot and I think I concluded they were all not actionable.

Clearly this case is.

There are actually two bugs here:

  1. The mono-reachable traverasal only looks for a constant SwitchInt param coming from Rvalue::Use and Rvalue::UbChecks. In this case, we have Rvalue::BinOp. GVN could fix this, but only if test were inlined or we had post-mono MIR optimizations. So I lean towards it being worthwhile to teach mono-reachable traversal about this.

  2. Even if we get the traversal right (this can be simulated by wrapping the comparison in an inline const) we still have an alloca for the argument, because we don't set up locals based on reachability.

@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 19, 2024
@saethlin saethlin self-assigned this Aug 19, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Aug 19, 2024
Don't alloca for unused locals

This fixes the second problem in rust-lang#129282

r? `@ghost`
bors added a commit to rust-lang-ci/rust that referenced this issue Aug 20, 2024
Don't alloca for unused locals

This fixes the second problem in rust-lang#129282

r? `@ghost`
bors added a commit to rust-lang-ci/rust that referenced this issue Aug 20, 2024
Let MonoReachable traversal evaluate BinOps

This fixes the first bug in rust-lang#129282

I don't know what the right organization is for the code here, but something probably has to get rearranged because this traversal now depends on `rustc_const_eval` so that it can use an `InterpCx` internally, so it can't live in `rustc_middle`.

r? `@ghost`
@scottmcm
Copy link
Member

I think that if you want this you should write

 fn test<const SIZE: usize>() {
-  if SIZE < 4096 {
+  if const { SIZE < 4096 } {

so that rust will know it's a constant, and be more likely to give you the behaviour you want in unoptimized builds.

(Or make your own trait so it can be <[u8; 4096]>::IS_SMALL or something if you need an older MSRV.)

Anything else will just mean exactly the same bug comes back if you write it as SIZE.ilog2() < 12 instead, but if you write const { SIZE.ilog2() < 12 } that'd be fine again.

@CJacob314
Copy link
Author

@scottmcm That doesn't work, either. See version 1.80.0 or version 1.82.0-nightly on Compiler Explorer.

bors added a commit to rust-lang-ci/rust that referenced this issue Aug 20, 2024
Let MonoReachable traversal evaluate BinOps

This fixes the first bug in rust-lang#129282

I don't know what the right organization is for the code here, but something probably has to get rearranged because this traversal now depends on `rustc_const_eval` so that it can use an `InterpCx` internally, so it can't live in `rustc_middle`.

r? `@ghost`
bors added a commit to rust-lang-ci/rust that referenced this issue Aug 21, 2024
Don't alloca for unused locals

We already have a concept of mono-unreachable basic blocks; this is primarily useful for ensuring that we do not compile code under an `if false`. But since we never gave locals the same analysis, a large local only used under an `if false` will still have stack space allocated for it.

There are 3 places we traverse MIR during monomorphization: Inside the collector, `non_ssa_locals`, and the walk to generate code. Unfortunately, rust-lang#129283 (comment) indicates that we cannot afford the expense of tracking reachable locals during the collector's traversal, so we do need at least two mono-reachable traversals. And of course caching is of no help here because the benchmarks that regress are incr-unchanged; they don't do any codegen.

This fixes the second problem in rust-lang#129282, and brings us anther step toward `const if` at home.
@saethlin saethlin added A-codegen Area: Code generation and removed C-bug Category: This is a bug. labels Aug 21, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 21, 2024
…tmcm

Don't alloca for unused locals

We already have a concept of mono-unreachable basic blocks; this is primarily useful for ensuring that we do not compile code under an `if false`. But since we never gave locals the same analysis, a large local only used under an `if false` will still have stack space allocated for it.

There are 3 places we traverse MIR during monomorphization: Inside the collector, `non_ssa_locals`, and the walk to generate code. Unfortunately, rust-lang#129283 (comment) indicates that we cannot afford the expense of tracking reachable locals during the collector's traversal, so we do need at least two mono-reachable traversals. And of course caching is of no help here because the benchmarks that regress are incr-unchanged; they don't do any codegen.

This fixes the second problem in rust-lang#129282, and brings us anther step toward `const if` at home.
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 21, 2024
Don't alloca for unused locals

We already have a concept of mono-unreachable basic blocks; this is primarily useful for ensuring that we do not compile code under an `if false`. But since we never gave locals the same analysis, a large local only used under an `if false` will still have stack space allocated for it.

There are 3 places we traverse MIR during monomorphization: Inside the collector, `non_ssa_locals`, and the walk to generate code. Unfortunately, rust-lang#129283 (comment) indicates that we cannot afford the expense of tracking reachable locals during the collector's traversal, so we do need at least two mono-reachable traversals. And of course caching is of no help here because the benchmarks that regress are incr-unchanged; they don't do any codegen.

This fixes the second problem in rust-lang#129282, and brings us anther step toward `const if` at home.

try-job: test-various
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 21, 2024
Don't alloca for unused locals

We already have a concept of mono-unreachable basic blocks; this is primarily useful for ensuring that we do not compile code under an `if false`. But since we never gave locals the same analysis, a large local only used under an `if false` will still have stack space allocated for it.

There are 3 places we traverse MIR during monomorphization: Inside the collector, `non_ssa_locals`, and the walk to generate code. Unfortunately, rust-lang#129283 (comment) indicates that we cannot afford the expense of tracking reachable locals during the collector's traversal, so we do need at least two mono-reachable traversals. And of course caching is of no help here because the benchmarks that regress are incr-unchanged; they don't do any codegen.

This fixes the second problem in rust-lang#129282, and brings us anther step toward `const if` at home.

try-job: test-various
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 21, 2024
…tmcm

Don't alloca for unused locals

We already have a concept of mono-unreachable basic blocks; this is primarily useful for ensuring that we do not compile code under an `if false`. But since we never gave locals the same analysis, a large local only used under an `if false` will still have stack space allocated for it.

There are 3 places we traverse MIR during monomorphization: Inside the collector, `non_ssa_locals`, and the walk to generate code. Unfortunately, rust-lang#129283 (comment) indicates that we cannot afford the expense of tracking reachable locals during the collector's traversal, so we do need at least two mono-reachable traversals. And of course caching is of no help here because the benchmarks that regress are incr-unchanged; they don't do any codegen.

This fixes the second problem in rust-lang#129282, and brings us anther step toward `const if` at home.
@saethlin saethlin removed their assignment Sep 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants