-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] mir-opt for generators/futures: copy propagate upvar into locals #108590
[WIP] mir-opt for generators/futures: copy propagate upvar into locals #108590
Conversation
r? @cjgillot (rustbot has picked a reviewer for you, use r? to override) |
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
Oh, I just remembered: the PR as it currently stands has these two optimizations turned off by default, but my actual plan/hope here is to have them turned on by default (even when other MIR optimizations are not running). I probably need to talk to a slew of people to get approval to do that. :) |
I'm going to add a commit that turns the optimizations on by default, for a timer run (and ... maybe a crater run...?) |
ugh, something's wrong. I had thought I had bootstrapped with these on by default locally, but clearly not recently enough. I'll look at it again tomorrow. |
We have Is there particular reasoning you think these optimizations should be treated differently? Is there some reason you're sure these will never remove/hide UB from Miri? |
This comment has been minimized.
This comment has been minimized.
The reason I am inclined to treat these However, maybe |
I'll switch this to S-waiting-on-author until I've addressed the comments so far. When I've finished with that, I'll remove the "[WIP]" from the title (and put the labels back the way they were) @rustbot label: +S-waiting-on-author -S-waiting-on-review |
Okay, I just read the code in That is, if I understand correctly:
And mir-opt-level > 0 sounds like exactly like what I want, assuming that its reasonable to potentially force miri to deal with the future size explosion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few hints on prior work that may help simplify the implementation. I haven't started reading the UpvarToLocalProp
pass, as it's quite a big one.
The big question is the implications of the MIR opsem for generators. On this, @JakobDegen will be the better person to ask.
If we are going to add passes to optimize generators prior to the state transform, we should eventually consider moving the state transform back to running on optimized MIR.
mir_source_def_id, | ||
local_decls, | ||
}; | ||
v.visit_body(body); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't really need a visitor here, a simple loop over basic blocks looking at terminators should be enough.
let Operand::Constant(c) = func else { return None; }; | ||
let ConstantKind::Val(val_const, const_ty) = c.literal else { return None; }; | ||
let ConstValue::ZeroSized = val_const else { return None; }; | ||
let ty::FnDef(fn_def_id, substs) = const_ty.kind() else { return None; }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need to check that func.ty(local_decls, tcx).kind()
is a FnDef
. We don't need to check that it's actually a constant.
let arg0_ty = args.get(0).map(|arg0| arg0.ty(&self.local_decls, self.tcx())); | ||
trace!("InlineFutureIntoFuture substs:{substs:?} args:{args:?} arg0 ty:{arg0_ty:?}"); | ||
let Some(arg0_ty) = arg0_ty else { return None; }; | ||
found.0 = self.does_ty_impl_future(arg0_ty); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code looks like we are re-implementing a simplified version of instance resolution.
What about doing the same thing as the Inliner
optimization:
let substs = tcx.try_normalize_erasing_regions(param_env, substs).ok()?;
let callee = Instance::resolve(tcx, param_env, def_id, substs).flatten()?;
and check that callee.def
is InstanceDef::Item(<def_id of the default impl of IntoFuture::into_future>)
.
} else { | ||
// otherwise, fall back to scanning code to find "upvar" count. | ||
struct MaxUpvarField(usize); | ||
impl<'tcx> Visitor<'tcx> for MaxUpvarField { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we restrict this pass to only generators or closures?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy to restrict it to generators, at least for turning it on by default.
(but it certainly has been useful to be able to at least opt-into the more general case, in terms of finding new bugs.)
// If we are looking at a generator, we have a fast path | ||
if body.yield_ty().is_some() { | ||
// The first argument is the generator type passed by value | ||
let gen_ty = body.local_decls.raw[SELF_IDX as usize].ty; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may need to peel refs to fetch the actual generator type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean because the generator might be receiving its self
argument by something other than by-value Self
, right?
I should look into that.
//! about array indices nor enum variants). | ||
//! | ||
//! * transitively apply the replacement to subsequent copies of the local(s). | ||
//! e.g. `_3 = _1.0; ... _4 = _3; ...; _5 = _4;`, then might replace `_5` with `_1.0`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a lot like what CopyProp
does. (That pass is very poorly named, as it merges locals more than anything.)
There may be a way to simplify the implementation by using the same analysis to unify locals, and a second pass to replace the locals by the upvars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, its possible there's opportunity for simplification there.
Part of my goal here was to have this be an inherently cheap pass. (For example, I wanted to avoid any question of whether there might be a fixed point iteration going on.)
But if CopyProp
is sufficiently cheap to run that we might consider turning it on for mir-opt-level > 0
, I'd be willing to consider unifying these things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I also was hoping for this to be an "obviously correct" pass, by building in limitations like the fact that it solely handles trivial chains A -> B -> C, no branching allowed for any of the chains. Though the fact that I encountered a number of issues in its development has led me to think that maybe there's no such thing as "obviously correct" when it comes to MIR transformations... 🤔 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI CopyProp
is already so enabled.
It sounds like you are realizing that MIR is not a good IR to write optimizations for, which is a very common observation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like you are realizing that MIR is not a good IR to write optimizations for, which is a very common observation.
Well... I've seen much much worse. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if CopyProp is sufficiently cheap to run that we might consider turning it on for mir-opt-level > 0, I'd be willing to consider unifying these things.
I've added investigation of this option as a to-do for the PR. (I.e., I'm going to attempt to revise upvar_to_local_prop
so that it leverages CopyProp+SSA code in some way. Either it works and this PR gets vastly simpler, or it doesn't work and I provide an explanation for why to continue using the code as written here.)
I think it is reasonable. Miri already executes code without optimizations that are always present in a debug build, so categorically this is not new. Additionally, anyone who has run Miri on a program beyond a unit test is familiar with the idea that it is very slow and memory-hungry. Even on unit tests, it's common for users to And with regard to this optimization in particular: One potential situation that could arise is that a library starts to produce generators which rely on this optimization, to the degree that it encounters stack overflow if this optimization is not applied. Such a situation does not apply in Miri; the interpreter doesn't have separate stack and heap memory and even if it did, there's no reason it needs to have an artificial stack memory usage limit like normal platforms have. So: I think this is quite reasonable as a MIR opt level 1 optimization. |
// This verifies that `ty` implements `Future`, according to the where | ||
// clauses (i.e. predicates) attached to the source code identified by | ||
// `mir_source_def_id`). | ||
fn does_ty_impl_future(&self, ty: Ty<'tcx>) -> FoundImplFuture { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tcx
has a type_implements_trait
util function, can we invoke it here? Not sure whether type_implements_trait(future,...)
and does_ty_impl_future
are same or not.
cc
let impls_future = self.type_implements_trait( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now added as a to-do for the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot reuse the type_implements_trait
function; that is a method on the InferCtxt
, not on TyCtxt
(and I only have a TyCtxt
in the context of this code).
Just a quick note that this PR is intended to address at least some of the cases listed on issue #59087, though I still need to dig into whether the misgivings about unsafe code are relevant even for the limited transformation being applied here. (T-opsem has certainly expressed some misgivings.) |
4f9e527
to
020d08b
Compare
☔ The latest upstream changes (presumably #108106) made this pull request unmergeable. Please resolve the merge conflicts. |
…r_transform crate. incorporated review feedback: simplify replace_base via Place::project_deeper
Problem Overview ---------------- Consider: `async fn(x: param) { a(&x); b().await; c(&c); }`. It desugars to: `fn(x: param) -> impl Future { async { let x = x; a(&x); b().await; c(&c); }` and within that desugared form, the `let x = x;` ends up occupying *two* distinct slots in the generator: one for the upvar (the right-hand side) and one for the local (the left-hand side). The UpvarToLocalProp MIR transformation tries to detect the scenario where we have a generator with upvars (which look like `self.field` in the MIR) that are *solely* moved/copied to non-self locals (and those non-self locals may likewise be moved/copied to other locals). After identifying the full set of locals that are solely moved/copied from `self.field`, the transformation replaces them all with `self.field` itself. Note 1: As soon as you have a use local L that *isn't* a local-to-local assignment, then that is something you need to respect, and the propagation will stop trying to propagate L at that point. (And likewise, writes to the self-local `_1`, or projections thereof, need to be handled with care as well.) Note 2: `_0` is the special "return place" and should never be replaced with `self.field`. In addition, the UpvarToLocalProp transformation removes all the silly `self.field = self.field;` assignments that result from the above replacement (apparently some later MIR passes try to double-check that you don't have any assignments with overlapping memory, so it ends up being necessary to do this no-op transformation to avoid assertions later). Note 3: This transformation is significantly generalized past what I demonstrated on youtube; the latter was focused on matching solely `_3 = _1.0`, because it was a proof of concept to demostrate that a MIR transformation prepass even *could* address the generator layout problem. Furthermore, the UpvarToLocalProp transformation respects optimization fuel: you can use `-Z fuel=$CRATE=$FUEL` and when the fuel runs out, the transformation will stop being applied, or be applied only partially. Note 4: I did not put the optimization fuel check in the patching code for UpvarToLocalProp: once you decide to replace `_3` with `_1.0` in `_3 = _1.0;`, you are committed to replacing all future reads of `_3` with `_1.0`, and it would have complicated the patch transformation to try to use fuel with that level of control there. Instead, the way I used the fuel was to have it control how many local variables are added to the `local_to_root_upvar_and_ty` table, which is the core database that informs the patching process, and thus gets us the same end effect (of limiting the number of locals that take part in the transformation) in a hopefully sound manner. Note 5: Added check that we do not ever call `visit_local` on a local that is being replaced. This way we hopefully ensure that we will ICE if we ever forget to replace one. But also: I didnt think I needed to recur on place, but failing to do so meant I overlooked locals in the projection. So now I recur on place. Satisfying above two changes did mean we need to be more aggressive about getting rid of now useless StorageLive and StorageDead on these locals. Note 6: Derefs invalidate replacement attempts in any context, not just mutations. Updates ------- rewrote saw_upvar_to_local. Namely, unified invalidation control paths (because I realized that when looking at `_l = _1.field`, if you need to invalidate either left- or right-hand side, you end up needing to invalidate both). Also made logic for initializing the upvar_to_ty map more robust: Instead of asserting that we encounter each upvar at most once (because, when chains stop growing, we cannot assume that), now just ensure that the types we end up inserting are consistent. (Another alternative would be to bail out of the routine if the chain is not marked as growing; I'm still debating about which of those two approaches yields better code here.) Fixed a bug in how I described an invariant on `LocalState::Ineligible`. Updated to respect -Zmir_opt_level=0
…calProp. Problem Overview ---------------- When `async fn` is desugared, there's a whole bunch of local-to-local moves that are easily identified and eliminated. However, there is one exception: the sugaring of `.await` does `a = IntoFuture::into_future(b);`, and that is no longer obviously a move from the viewpoint of the analysis. However, for all F: Future, `<F as IntoFuture>::into_future(self)` is "guaranteed" to be the identity function that returns `self`. So: this matches `a = <impl Future as IntoFuture>::into_future(b);` and replaces it with `a = b;`, based on reasoning that libcore's blanket implementation of IntoFuture for impl Future is an identity function that takes `self` by value. This transformation, in tandem with UpvarToLocalProp, is enough to address both case 1 and case 2 of Rust issue 62958. InlineFutureIntoFuture respects optimization fuel, same as UpvarToLocalProp (much simpler to implement in this case). inline-future-into-future: improved comments during code walk for a rubber duck. MERGEME inline_future_into_future revised internal instrumentation to print out arg0 type (because that is what is really going to matter and I should be doing more to let it drive the analysis.) Updates ------- respect -Zmir_opt_level=0
…egardless of their default settings) and then updated the numbers to reflect the improvements those transformations now yield.
transformations added here (regardless of their default settings) and updated the numbers in the output to reflect the improvements those transformations yield.
… as I identify them. issue-62958-c.rs was reduced from the tracing-attributes proc-macro crate. issue-62958-d.rs was reduced from the doc test attached to `AtomicPtr::from_mut_slice`. issue-62958-e.rs covers some important operational characteristics.
020d08b
to
f4a1d4b
Compare
The job Click to see the possible cause of the failure (guessed by this bot)
|
(I have decided that time would be better spent trying to fix the problem in a more general manner rather than trying to apply band-aids like what this PR was proposing.) |
This is a pair of targeted MIR optimizations that are intended to address issue #62958, at least partially.
There remain scenarios where one can see upvar/local duplication that are still not covered by this
PR, but this catches the easy ones in a relatively straight-forward way.
Todo:
-Z mir-opt-level=0
(comment)-Z mir-enable-passes
for toggling, rather than adding new-Z
flags (comment)upvar_to_local_prop
can be removed or simplified by leveraging the existingcopy_prop
mir transform that is already turned on (but at a different phase, IIUC). (comment)replace_base
viaPlace::project_deeper
(comment)leveragetcx.type_implements_trait
method (comment)