Remove assignments to ZST places instead of marking ZST return place as unused #83177
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 90562b4 with merge 7ac77ab9f463f60282360fd96138f4c09eb263e8... |
☀️ Try build successful - checks-actions |
Queued 7ac77ab9f463f60282360fd96138f4c09eb263e8 with parent 4c10c84, future comparison URL. |
Finished benchmarking try commit (7ac77ab9f463f60282360fd96138f4c09eb263e8): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
Pushed a change to cache layouts, let's see if it gets better or worse. Also moved it to a separate pass, since it's a bit different than the other opts in instcombine...let me know if you have a preference for where it should live. |
Looking at MIR diffs of some real-world projects, this implementation is definitely more effective at removing ZST assignments than the previous one was. However, none of the existing mir-opt tests demonstrates this, so if we want to land this, adding an extra one would be nice. The perf results, both those here and the earlier ones, are quite hard to interpret. Unfortunately, the most significant impact of this change is the one on the size estimates. In a few benchmarks I looked at, the CGU partitioning was changed. This almost surely applies to rustc itself as well. In fact, I suspect that the -3.0% change in the ctfe-stress-4 benchmark from the earlier perf run was entirely because of this (code that is hot in those benchmarks is optimized differently, and CTFE evaluates unoptimized MIR). The layout computation uses the query system, so the computation should be cached already, but we can of course try again: |
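For readers unfamiliar with why a second cache on top of an already-cached computation might not help, here is a minimal sketch of memoized size computation. It is a toy model only: `ToyTy`, `LayoutCache`, and the `computations` counter are hypothetical names invented for illustration, and this is not rustc's query system or its layout code.

```rust
use std::collections::HashMap;

/// Toy type representation (hypothetical; not rustc's `Ty`).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum ToyTy {
    Unit,
    U8,
    U32,
    Tuple(Vec<ToyTy>),
}

/// Memoizing size calculator that counts how many sizes were actually computed.
#[derive(Default)]
pub struct LayoutCache {
    sizes: HashMap<ToyTy, u64>,
    pub computations: usize,
}

impl LayoutCache {
    /// Size in bytes, ignoring alignment and padding to keep the sketch small.
    pub fn size_of(&mut self, ty: &ToyTy) -> u64 {
        if let Some(&size) = self.sizes.get(ty) {
            return size; // cache hit: nothing is recomputed
        }
        self.computations += 1;
        let size = match ty {
            ToyTy::Unit => 0,
            ToyTy::U8 => 1,
            ToyTy::U32 => 4,
            ToyTy::Tuple(fields) => {
                let mut total = 0;
                for field in fields {
                    total += self.size_of(field);
                }
                total
            }
        };
        self.sizes.insert(ty.clone(), size);
        size
    }

    /// A type is zero-sized when its computed size is 0 bytes.
    pub fn is_zst(&mut self, ty: &ToyTy) -> bool {
        self.size_of(ty) == 0
    }
}
```

Asking `is_zst` twice for the same type leaves `computations` unchanged on the second call. In this toy model, wrapping `LayoutCache` in yet another cache would only save the hash lookup.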
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit b6d5b72 with merge 39cf6bc137798a38f205e17dc9994bdb2205ba41... |
☀️ Try build successful - checks-actions |
Queued 39cf6bc137798a38f205e17dc9994bdb2205ba41 with parent 0c34122, future comparison URL. |
Finished benchmarking try commit (39cf6bc137798a38f205e17dc9994bdb2205ba41): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
Added a check to skip When compiling std (or whatever gets built during a stage 1 build), the RemoveZsts pass now sees:
I didn't add a fast path for known ZSTs because they make up <10% of the remaining |
so... 90% of zsts are aggregates or user defined? I would have thought a large portion is @bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 46fd49c with merge bd5d1b96f0c64c9938feea831789e1b5bb2cd4a2... |
No, >90% of types we call What I assume you meant by "fast path" is:
if ty == unit {
    // fast path
} else {
    // slow path
    if let Ok(layout) = layout_of(ty) && layout.is_zst() {
        // slow path success
    } else {
        // slow path failure
    }
}
In my test, the fast path could be hit at most 24k times, if every ZST is Unless you meant "add a fast path and remove the slow path entirely", i.e. the optimization only works for |
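The fast-path/slow-path structure in the comment above can be made concrete with a small, self-contained sketch. Everything here is a toy stand-in: `ToyTy`, `LayoutError`, `layout_size`, `Path`, and `classify` are hypothetical names, and `layout_size` only mimics the shape of a fallible layout query.

```rust
/// Toy type kinds (hypothetical; a stand-in for the compiler's type representation).
#[derive(Clone, Debug, PartialEq)]
pub enum ToyTy {
    Unit,
    U32,
    /// A struct with the given field types.
    Struct(Vec<ToyTy>),
    /// A type whose layout cannot be computed in this toy model.
    Opaque,
}

/// Error returned when the toy layout computation fails.
#[derive(Debug, PartialEq)]
pub struct LayoutError;

/// Which branch of the check produced the answer.
#[derive(Debug, PartialEq)]
pub enum Path {
    Fast,
    SlowSuccess,
    SlowFailure,
}

/// Toy layout computation: size in bytes, or an error if it is unknown.
pub fn layout_size(ty: &ToyTy) -> Result<u64, LayoutError> {
    match ty {
        ToyTy::Unit => Ok(0),
        ToyTy::U32 => Ok(4),
        // Summing `Result`s stops at the first error.
        ToyTy::Struct(fields) => fields.iter().map(layout_size).sum(),
        ToyTy::Opaque => Err(LayoutError),
    }
}

/// Returns whether `ty` is zero-sized, and which path decided it.
pub fn classify(ty: &ToyTy) -> (bool, Path) {
    if *ty == ToyTy::Unit {
        // Fast path: known ZST, no layout computation needed.
        (true, Path::Fast)
    } else {
        // Slow path: compute the layout and inspect its size.
        match layout_size(ty) {
            Ok(0) => (true, Path::SlowSuccess),
            _ => (false, Path::SlowFailure),
        }
    }
}
```

In this sketch the fast path only short-circuits types that the slow path would accept anyway. Every non-ZST type still pays for the full layout computation before being rejected.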
☀️ Try build successful - checks-actions |
Queued bd5d1b96f0c64c9938feea831789e1b5bb2cd4a2 with parent 41b315a, future comparison URL. |
I did not mean that. My brain just took a wrong turn somewhere. You're completely right. Though... we could enable the optimization for FnDef and unit in debug builds and for everything in release builds, but let's look at perf before we resort to such schemes. |
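A rough sketch of the scheme floated above, where only `FnDef` and unit are handled in debug builds and every ZST in release builds, might look like the following. The `BuildMode` enum, the `ToyTy` variants, and both helper functions are assumptions made up for illustration; the thread does not show how such a switch would actually be wired into the compiler.

```rust
/// Hypothetical build-mode switch; a real compiler would consult its own options.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BuildMode {
    Debug,
    Release,
}

/// Toy type kinds (hypothetical).
#[derive(Clone, Debug, PartialEq)]
pub enum ToyTy {
    Unit,
    /// The zero-sized type of a specific function item.
    FnDef(&'static str),
    /// A user-defined struct with no fields.
    EmptyStruct,
    U32,
}

/// Types recognizable as zero-sized without computing any layout.
fn is_trivially_zst(ty: &ToyTy) -> bool {
    matches!(ty, ToyTy::Unit | ToyTy::FnDef(_))
}

/// Stand-in for a full, more expensive layout-based check.
fn is_zst_by_layout(ty: &ToyTy) -> bool {
    !matches!(ty, ToyTy::U32)
}

/// Cheap check in debug builds, full check in release builds.
pub fn should_remove_assignment(ty: &ToyTy, mode: BuildMode) -> bool {
    match mode {
        BuildMode::Debug => is_trivially_zst(ty),
        BuildMode::Release => is_zst_by_layout(ty),
    }
}
```

Under this policy an `EmptyStruct` assignment would survive in debug builds and be removed in release builds. That is the trade-off being weighed: less work per build in debug mode, at the cost of a less thorough optimization.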
Finished benchmarking try commit (bd5d1b96f0c64c9938feea831789e1b5bb2cd4a2): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
Perf looks very promising. While there's still some regression in servo, that is entirely in LLVM, so we may be optimizing more stuff now; there's no way to tell without runtime perf tests. Also the LLVM perf test shows a 60% reduction in
@erikdesjardins this looks really good. All that is left is to add a mir-opt-level 3 check in the opt so it doesn't run by default. I think we should do that here and not immediately stabilize, even if I see no reason not to stabilize. The opt doesn't affect anything UB related and is trivial to review. So my proposal is to merge this PR quickly with a level 3 check, and then open a PR removing that check and pinging wg-mir-opt so that everyone can have their say. |
@bors r+ |
📌 Commit 6960bc9 has been approved by |
☀️ Test successful - checks-actions |
…i-obk Run RemoveZsts pass at mir-opt-level=1 per rust-lang#83177 (comment) This pass removes assignments to ZST places. Perf (from rust-lang#83177 (comment)): https://perf.rust-lang.org/compare.html?start=41b315a470d583f6446599984ff9ad3bd61012b2&end=bd5d1b96f0c64c9938feea831789e1b5bb2cd4a2 r? `@oli-obk`
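To illustrate what "removes assignments to ZST places" means, here is a minimal sketch over a toy statement list. It is not the actual `RemoveZsts` implementation: `ToyTy`, `Statement`, `Body`, and `remove_zst_assignments` are hypothetical, places are simplified to plain local indices, and replacing a statement with `Nop` is an assumption of this sketch.

```rust
/// Toy type representation (hypothetical).
#[derive(Clone, Debug, PartialEq)]
pub enum ToyTy {
    Unit,
    U32,
    Tuple(Vec<ToyTy>),
}

impl ToyTy {
    /// A toy type is zero-sized if it is unit or a tuple of zero-sized types.
    pub fn is_zst(&self) -> bool {
        match self {
            ToyTy::Unit => true,
            ToyTy::U32 => false,
            ToyTy::Tuple(fields) => fields.iter().all(ToyTy::is_zst),
        }
    }
}

/// Toy statements; the right-hand side is kept as an opaque string.
#[derive(Clone, Debug, PartialEq)]
pub enum Statement {
    /// `_dest = <rvalue>`
    Assign { dest: usize, rvalue: String },
    Nop,
}

/// Toy function body: one type per local, plus a flat statement list.
pub struct Body {
    pub local_tys: Vec<ToyTy>,
    pub statements: Vec<Statement>,
}

/// Replace every assignment whose destination local is zero-sized with `Nop`.
pub fn remove_zst_assignments(body: &mut Body) {
    let Body { local_tys, statements } = body;
    for stmt in statements.iter_mut() {
        let remove = match &*stmt {
            Statement::Assign { dest, .. } => local_tys[*dest].is_zst(),
            Statement::Nop => false,
        };
        if remove {
            *stmt = Statement::Nop;
        }
    }
}
```

Given locals of type `()`, `u32`, and `((),)`, the assignments to the first and third locals become `Nop`, while the `u32` assignment is left untouched. A zero-sized value carries no data, so in this model dropping the store loses no information.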
partially reverts #83118
requested by @tmiasko in #83118 (comment)
r? @oli-obk