Notes about MIR generation for match expressions #943
Labels
A-match
Area: `match` expressions
A-MIR
Area: mid-level intermediate representation (MIR)
C-enhancement
Category: enhancement
E-hard
Difficulty: might require advanced knowledge
T-compiler
Relevant to compiler team
I wrote these up while working on rust-lang/rust#72680
I'm not an expert on this code (this is the fist time I'm reading it), so take this with a grain of salt.
Background
First, my assumption in the last post is incorrect. or-patterns aren't desugared into different patterns. IIUC, this is to avoid exponential blowup both in the size of the MIR and of the generated code, as well as the computational time to check compile. Here is a quick overview of the code as it stands today (to the best of my understanding).
All of the relevant code is in
rustc_mir_build::build::matches
. The comments throughout this module are pretty helpful, but kind of high-level.We have a
match
expression, which consists of aplace
to match on (the "scrutinee") and collection ofArm
s. Each arm has a pattern and possible a guard expression, and each pattern may be composed of arbitrarily nested subpatterns. Given this, we want to generate the MIR that checks all of the relevant patterns to find the first matching arm.The main entry point is
match_expr
https://github.com/rust-lang/rust/blob/0256d065d4901b63def6c04013da4f781d0752bb/compiler/rustc_mir_build/src/build/matches/mod.rs#L87 The comments on this function are helpful. They talk about the overall algorithm.The heart of the algorithm is the
match_candidates
method, which also has great comments. https://github.com/rust-lang/rust/blob/0256d065d4901b63def6c04013da4f781d0752bb/compiler/rustc_mir_build/src/build/matches/mod.rs#L890 This is where the MIR is actually built for the matching itself.match_candidates
takes a list ofCandidate
s and a few references toBasicBlock
s. TheBasicBlock
s are where we branch to in different cases, and we will be generating more blocks as we match on the scrutinee. We will recursively callmatch_candidates
as we evaluate the candidates.The argument (of
match_candidates
)candidates: &mut [&mut Candidate]
is roughly equivalent to an arm (or part of an arm). Specifically: https://github.com/rust-lang/rust/blob/0256d065d4901b63def6c04013da4f781d0752bb/compiler/rustc_mir_build/src/build/matches/mod.rs#L687-L712 A particularCandidate
is satisfied if all of itsmatch_pairs
are satisfied. EachMatchPair
inmatch_pairs
is a single pattern match against a single place. We would extend thematch_pairs
of theCandidate
recursively to get nested matches. So for example if we had an arm likeFoo { x: Some(0), y: false }
, we would first have aMatchPair
matching the scrutinee place withFoo
; then, we would add a couple more to match the placesscrutinee.x
andscrutinee.y
withSome
andfalse
, respectively; then, we would add one more to match placescrutinee.x.0
to0
. So by the end of this process, all of theseMatchPair
s need to match for theCandidate
to match.Notably, throughout the algorithm, there are recursive calls to
match_candidates
with subsets ofcandidates
. This simplifies the logic significantly, but it means thatcandidates
may be non-exhaustive (even though Rustmatch
es are exhaustive).match_candidates
does a bunch of stuff, but most relevant for this issue is that it callstest_candidates_with_or
. In the terminology of the code, a "test" is a check of whether a place matches part of a pattern (e.g.scrutinee.x
matchesSome
). (Note: we are not actually executing this test right now; it's still compile time! Rather we are generating the code that does the test into the user's program. For some reason, this mindset was hard for me to stay in).test_candidates_with_or
generates the MIR for the tests for each candidate incandidates
by recursively callingmatch_candidates
. Specifically, we pop the first candidate fromcandidates
and generate it's MIR; then, we recursivelymatch_candidates
on the tail.Or-patterns at the top-level of a pattern are handled specially in
test_candidates_with_or
. If the first pattern incandidates
is an or-pattern, we calltest_or_pattern
https://github.com/rust-lang/rust/blob/0256d065d4901b63def6c04013da4f781d0752bb/compiler/rustc_mir_build/src/build/matches/mod.rs#L1202 This method splits the candidate into two or moreCandidate
s temporarily and callsmatch_candidates
on this small list of temporary candidates. Afterwards, if the candidates are all simple, we can generate some glue MIR that OR's them all together; otherwise, we just leave them as a collection of different candidates and we just have a larger MIR.We continue recursively calling the above chain of methods until all or-patterns have been removed from the top level patterns. So we are left with only non-or-patterns at the top level.
When we have only non-or-patterns,
test_candidates_with_or
callstest_candidates
https://github.com/rust-lang/rust/blob/0256d065d4901b63def6c04013da4f781d0752bb/compiler/rustc_mir_build/src/build/matches/mod.rs#L1377test_candidates
has a lot of vary useful comments explaining what's going on. Basically, we only try to process one candidate, but in processing it, we may be able to share work for some of the other candidates. Processing a candidate requires figuring out what kind of test to apply for eachMatchPair
in a candidate. For example, if we are matching against a variant, we do aSwitch
test; if we are matching against an integer, we do aSwitchInt
test, etc...During
test_candidates
, we may add moreMatchPair
s to a candidate to test for nested subpatterns (more in the next few bullets). At the end oftest_candidates
, we recursively callmatch_candidates
on the remaining candidates, which ensures that we process all subpatterns.test_candidates
callstest::test
to generate an actual test for aMatchPair
. https://github.com/rust-lang/rust/blob/0256d065d4901b63def6c04013da4f781d0752bb/compiler/rustc_mir_build/src/build/matches/test.rs#L27 This function assumes that it will not have to generate a test for an or-pattern (they should all have been dealt with bytest_or_pattern
), and this is where the ICE is (more later).test_candidates
then tries to share the test with all candidates it can. For example, if more than one candidate need a test forscrutinee.x
matchesSome
, then both candidates can branch from the basic block where that test is done, effectively combining two (or more) basic blocks in an otherwise exponential set of basic blocks. To determine which candidates can share the test,test_candidates
callstest::sort_candidates
https://github.com/rust-lang/rust/blob/0256d065d4901b63def6c04013da4f781d0752bb/compiler/rustc_mir_build/src/build/matches/test.rs#L485 (which also has good comments).sort_candidates
will find all candidates that can share the test and simplify them by addingMatchPair
s to the relevantCandidates
for the immediate child subpatterns of the top-level pattern (if there are subpatterns). These subpatterns are later evaluated whenevermatch_candidates
is recursively called later (and they may expand to more subpatterns, etc) until we have processed all theMatchPair
s in all the candidates and generated tests for them.The text was updated successfully, but these errors were encountered: