-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add optimization to avoid load of address #76683
Conversation
r? @oli-obk (rust_highfive has picked a reviewer for you, use r? to override) |
This comment has been minimized.
This comment has been minimized.
62de698
to
71219e7
Compare
Hi @jonas-schievink, thanks for the review.
@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This optimization is very scary, it has lots of edge cases that we need to think about
} | ||
} | ||
} | ||
_ => {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if local_being_derefed
is used on the rhs of an assignment to another local, your optimization may misfire I think.
let x = 42;
let a = 99;
let mut y = &x;
let z = &mut y;
*z = &a;
println!("{}", *y);
will not print 99
after your optimization but 42
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test for this. It does not seem like it will misfire. I also added more clarifying comments on the matches. I have persuaded myself that since we are only applying this optimization on immutable references, we can't have a mutable reference at the same time, so nothing (besides asm) can break this optimization. My reasoning may be wrong though..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct. Setting the lookback higher does indeed cause a miscompilation. I'll look into fixing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general way to look for mutation is to define a visitor and visit the construct you want to analyze (Statement
in your case) and implement https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/visit/trait.Visitor.html#method.visit_local to and check the PlaceContext
if the local matches the local you want to check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I saw your comment a tad too late. I already pushed a fix, but i'll see if it is cleaner/less adhoc to use the visitor. Thanks for the hint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest commits now check that local_being_derefed is not mutated using a visitor. I have verified that setting the lookback to 10000 causes a misoptimization without the fix, and fixes it with the fix. I have set it back to to avoid bad runtime complexity. I'm not sure how best to represent that the fix is actually working. For now I separated it into two commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Splitting it into individual commits made this great to review.
I also don't know how to test it. A bogus idea (Don't do this 😆): instead of 6
use 6 + mir_opt_level
, then we can do -Zmir-opt-level=99999
on that test.
An alternative to having this magic number would be to implement the optimization as a feed forward (am I using words correctly here?) optimization. What I mean is that we don't go through every deref and look back, but we walk forward through each block and keep a set of "deref-optimizable" locals that we update as we go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I'm fine with merging this optimization as long as there's a tracking issue for exploring such a change to the optimization. I'm not sure whether that change is feasible in practice, but we should explore it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. Yeah it should be possible to do this optimization without the lookback, such that a single statement is only visited once. I'll open a new pr for that.
Can we do a perf run on the current pr? I'm curious if this has any impact
@bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit 116283b910a7f8809260ba91cb9b9b647a8900a3 with merge 7d835e12834e0230779883b269523b01a940c2df... |
☀️ Try build successful - checks-actions, checks-azure |
Queued 7d835e12834e0230779883b269523b01a940c2df with parent 10b3595, future comparison URL. |
Finished benchmarking try commit (7d835e12834e0230779883b269523b01a940c2df): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
perf looks ok. Please squash so we get all the back and forth out of the timeline |
116283b
to
4dedb76
Compare
Squashed and rebased on master |
@bors r+ |
📌 Commit 4dedb76 has been approved by |
⌛ Testing commit 4dedb76 with merge 40fe3db7186781f895de3f6d860dbd5d570ec21c... |
💔 Test failed - checks-actions |
Looks like you need to rebase and rebless again? |
…heckedAdd This makes the test run deterministic regardless of noopt testruns
4dedb76
to
dfc469d
Compare
Rebased on master and added a new commit such that Add is always generated instead of CheckedAdd, which broke the test diff on noopt testruns. |
@bors r+ |
📌 Commit dfc469d has been approved by |
☀️ Test successful - checks-actions, checks-azure |
@rust-lang/wg-mir-opt I think this would be better suited to constant-folding with a notion of "symbolic" values. But also I'm a bit worried this kind of optimization may break stacked borrows type assumptions if the indirection was "weakening" the semantics of the access, and making it direct "strengthens" it too much - though this may only be a problem with raw pointers. |
Look for the sequence
in which we can replace the last statement with
_5 = _1
to avoid the load of _2