Don't alloca for unused locals #129283

saethlin · 2024-08-19T23:06:42Z

We already have a concept of mono-unreachable basic blocks; this is primarily useful for ensuring that we do not compile code under an `if false`. But since we never gave locals the same analysis, a large local only used under an `if false` will still have stack space allocated for it.
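As a rough illustration of the situation described above, here is a minimal sketch. The `Config` trait, the `Small`/`Large` types and `process` are hypothetical names, and it is an assumption that a branch on an associated constant is one of the conditions the mono-reachability analysis can evaluate. In `process::<Small>` the `scratch` array is only used in a block that is unreachable after monomorphization, so it is the kind of local that would previously still have received an alloca.

```rust
/// Hypothetical trait: the constant is only known once `C` is substituted.
trait Config {
    const USE_SCRATCH: bool;
}

struct Small;
struct Large;

impl Config for Small {
    const USE_SCRATCH: bool = false;
}

impl Config for Large {
    const USE_SCRATCH: bool = true;
}

fn process<C: Config>(input: u8) -> usize {
    if C::USE_SCRATCH {
        // For `C = Small` this block is mono-unreachable, but `scratch`
        // is still a local of the function's MIR body.
        let mut scratch = [0u8; 64 * 1024];
        scratch[0] = input;
        scratch.len()
    } else {
        usize::from(input)
    }
}
```

Calling `process::<Small>(7)` only ever takes the `else` branch, so ideally the 64 KiB array would cost no stack space in that instantiation.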

There are three places we traverse MIR during monomorphization: inside the collector, in `non_ssa_locals`, and in the walk to generate code. Unfortunately, #129283 (comment) indicates that we cannot afford the expense of tracking reachable locals during the collector's traversal, so we do need at least two mono-reachable traversals. And of course caching is of no help here because the benchmarks that regress are incr-unchanged; they don't do any codegen.
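The idea of a mono-reachable traversal that also tracks locals can be sketched with a toy model. This is not rustc's MIR or its actual implementation: `Terminator`, `Block` and `reachable_locals` are hypothetical names, and locals and blocks are plain indices. The walk follows only the branch that a constant condition selects and records the locals used by the blocks it visits.

```rust
use std::collections::BTreeSet;

/// Toy terminator, not rustc's `TerminatorKind`.
enum Terminator {
    Goto(usize),
    /// A branch whose condition is already known after monomorphization.
    ConstSwitch { cond: bool, then_bb: usize, else_bb: usize },
    Return,
}

/// Toy basic block.
struct Block {
    /// Indices of the locals read or written by this block.
    used_locals: Vec<usize>,
    terminator: Terminator,
}

/// Visit only the blocks reachable from block 0 and collect the locals they use.
fn reachable_locals(blocks: &[Block]) -> BTreeSet<usize> {
    let mut locals = BTreeSet::new();
    if blocks.is_empty() {
        return locals;
    }
    let mut seen = vec![false; blocks.len()];
    let mut worklist = vec![0usize];
    while let Some(bb) = worklist.pop() {
        // Skip blocks that were already visited.
        if std::mem::replace(&mut seen[bb], true) {
            continue;
        }
        let block = &blocks[bb];
        locals.extend(block.used_locals.iter().copied());
        match block.terminator {
            Terminator::Goto(target) => worklist.push(target),
            Terminator::ConstSwitch { cond, then_bb, else_bb } => {
                // Only the branch selected by the constant is followed.
                worklist.push(if cond { then_bb } else { else_bb });
            }
            Terminator::Return => {}
        }
    }
    locals
}
```

Any local missing from the returned set is used only in unreachable blocks, so in this model it would not need stack space.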

This fixes the second problem in #129282, and brings us another step toward `const if` at home.

saethlin · 2024-08-19T23:13:56Z

@bors try @rust-timer queue

bors · 2024-08-19T23:15:07Z

⌛ Trying commit 8df3ccc with merge f383908...

Don't alloca for unused locals This fixes the second problem in rust-lang#129282 r? `@ghost`

bors · 2024-08-19T23:28:26Z

💔 Test failed - checks-actions

saethlin · 2024-08-20T00:04:03Z

@bors try @rust-timer queue

bors · 2024-08-20T00:05:13Z

⌛ Trying commit f98c04d with merge 2d91c0b...

Don't alloca for unused locals This fixes the second problem in rust-lang#129282 r? `@ghost`

bors · 2024-08-20T01:59:40Z

☀️ Try build successful - checks-actions
Build commit: 2d91c0b (2d91c0b2cd023747941053983f33c4f2753436cd)

rust-timer · 2024-08-20T04:08:48Z

Finished benchmarking commit (2d91c0b): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|:--|:--|:--|:--|
| Regressions ❌ (primary) | 0.5% | [0.2%, 0.9%] | 29 |
| Regressions ❌ (secondary) | 1.0% | [0.4%, 2.4%] | 4 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.5% | [0.2%, 0.9%] | 29 |

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results (primary 1.5%, secondary 1.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|:--|:--|:--|:--|
| Regressions ❌ (primary) | 1.5% | [1.5%, 1.5%] | 1 |
| Regressions ❌ (secondary) | 1.7% | [1.7%, 1.7%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.5% | [1.5%, 1.5%] | 1 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 751.143s -> 748.981s (-0.29%)
Artifact size: 338.64 MiB -> 338.75 MiB (0.03%)

compiler/rustc_codegen_ssa/src/mir/mod.rs

saethlin · 2024-08-21T01:29:56Z

The code definitely needs a lot of cleaning-up, but I want to know if this is faster. It should be.

@bors try @rust-timer queue

bors · 2024-08-21T01:31:07Z

⌛ Trying commit 66653dc with merge 6eddbec...

saethlin · 2024-09-21T03:05:24Z

@bors try

bors · 2024-09-21T03:06:38Z

⌛ Trying commit 338965b with merge 2f48442...

try-job: test-various

bors · 2024-09-21T03:52:02Z

💔 Test failed - checks-actions

saethlin · 2024-09-21T05:23:28Z

@bors try

bors · 2024-09-21T05:24:40Z

⌛ Trying commit aa28ee1 with merge bcce48a...

try-job: test-various

bors · 2024-09-21T06:28:42Z

☀️ Try build successful - checks-actions
Build commit: bcce48a (bcce48ade5140371744b528196923f97b58a3d1c)

saethlin · 2024-09-21T13:26:38Z

@bors r=scottmcm

bors · 2024-09-21T13:26:40Z

📌 Commit aa28ee1 has been approved by scottmcm

It is now in the queue for this repository.

bors · 2024-09-21T13:48:16Z

⌛ Testing commit aa28ee1 with merge 2836482...

bors · 2024-09-21T16:19:02Z

☀️ Test successful - checks-actions
Approved by: scottmcm
Pushing 2836482 to master...

rust-timer · 2024-09-21T17:45:47Z

Finished benchmarking commit (2836482): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results (secondary -1.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|:--|:--|:--|:--|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -1.9% | [-1.9%, -1.9%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results (primary -0.1%, secondary -0.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|:--|:--|:--|:--|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.1% | [-0.1%, -0.0%] | 35 |
| Improvements ✅ (secondary) | -0.3% | [-0.4%, -0.1%] | 6 |
| All ❌✅ (primary) | -0.1% | [-0.1%, -0.0%] | 35 |

Bootstrap: 768.677s -> 767.545s (-0.15%)
Artifact size: 341.36 MiB -> 341.42 MiB (0.02%)

rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.) labels on Aug 19, 2024