-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redundant Branch Opts Enhancements #48115
Comments
Phi-case mentioned above: ; Assembly listing for method System.Type:get_IsInterface():bool:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; rsp based frame
; fully interruptible
; PGO data available, but JitDisablePGO != 0
; Final local variable assignments
;
; V00 this [V00,T01] ( 5, 4 ) ref -> rcx this class-hnd
; V01 loc0 [V01,T02] ( 3, 2.50) ref -> rax class-hnd
; V02 OutArgs [V02 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
; V03 tmp1 [V03,T00] ( 5, 6.75) ref -> rax class-hnd "spilling QMark2"
;
; Lcl frame size = 40
G_M8876_IG01: ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref, nogc <-- Prolog IG
sub rsp, 40
;; bbWeight=1 PerfScore 0.25
G_M8876_IG02: ; gcrefRegs=00000002 {rcx}, byrefRegs=00000000 {}, byref, isz
; gcrRegs +[rcx]
mov rax, rcx
; gcrRegs +[rax]
test rax, rax
je SHORT G_M8876_IG05
;; bbWeight=1 PerfScore 1.50
G_M8876_IG03: ; gcrefRegs=00000003 {rax rcx}, byrefRegs=00000000 {}, byref, isz
mov rdx, 0xD1FFAB1E
cmp qword ptr [rax], rdx
je SHORT G_M8876_IG05
;; bbWeight=0.25 PerfScore 0.81
G_M8876_IG04: ; gcrefRegs=00000002 {rcx}, byrefRegs=00000000 {}, byref
; gcrRegs -[rax]
xor rax, rax
; gcrRegs +[rax]
;; bbWeight=0.12 PerfScore 0.03
G_M8876_IG05: ; gcrefRegs=00000003 {rax rcx}, byrefRegs=00000000 {}, byref, isz
test rax, rax
je SHORT G_M8876_IG08
;; bbWeight=1 PerfScore 1.25
G_M8876_IG06: ; gcrefRegs=00000001 {rax}, byrefRegs=00000000 {}, byref
; gcrRegs -[rcx]
mov rcx, rax
; gcrRegs +[rcx]
;; bbWeight=0.50 PerfScore 0.12
G_M8876_IG07: ; , epilog, nogc, extend
add rsp, 40
jmp System.RuntimeTypeHandle:IsInterface()
; gcr arg pop 0
;; bbWeight=0.50 PerfScore 1.12
G_M8876_IG08: ; gcVars=0000000000000000 {}, gcrefRegs=00000002 {rcx}, byrefRegs=00000000 {}, gcvars, byref
; gcrRegs -[rax]
mov rax, qword ptr [rcx]
mov rax, qword ptr [rax+120]
call qword ptr [rax]hackishModuleName:hackishMethodName()
; gcrRegs -[rcx]
; gcr arg pop 0
test al, 32
setne al
movzx rax, al
;; bbWeight=0.50 PerfScore 4.25
G_M8876_IG09: ; , epilog, nogc, extend
add rsp, 40
ret
;; bbWeight=0.50 PerfScore 0.62 The test in IG05 is redundant; there are 3 preds, two have If we were to chase the phi defs we'd find that substituting those defs the VNs into the
Logically we could also envision this as a a PHI-EQ interchange, that is instead of (EQ(PHI(...)) we equivalently have PHI(EQ(...)) and those inner EQs are the ones with VN matches. |
If the current relop has PHI inputs, see if any of those inputs would produce the same relop VN as a dominating compare; if so the current relop is partially redundant, and we may be able to optimize some of the paths through the relop block via jump threading. Addresses cases like the one seen in dotnet#48115, though that particular case is not optimized as the current relop block has a side effect.
Handle cases where the dominating compare is the reverse of the compare we're trying to optimize. For example, if `(x > y)` dominates `(y <= x)` we may be able to optimize the dominated compare. Addresses aspects of dotnet#48115.
Handle cases where the dominating compare is the reverse of the compare we're trying to optimize. For example, if `(x > y)` dominates `(y <= x)` we may be able to optimize the dominated compare. Addresses aspects of #48115.
Another example of "handle cases where the redundant compare block has side effects": static bool Test(string s)
{
if (s == null)
{
Console.WriteLine("11");
}
Console.WriteLine("22");
if (s == null)
{
Console.WriteLine("33");
}
return true;
}
|
In some cases the value of a block's branch predicate is correlated with the predecessor of the block. Often this correlation is hinted at by the presence of phis in the predicate's tree and/or phi VNs in in the predicate's VN graph. For each predecessor of a block, we evaluate the predicate value number using the values brought to the block by that predecessor. If we find correlations, we use them to drive the existing jump threading optimization. Also, if we end up partially disambiguating such that there is just one remaining predecessor, update the value number of the predicate to reflect the values that flow in from that predecessor. Fixes dotnet#75944. Contributes to dotnet#48115.
In some cases the value of a block's branch predicate is correlated with the predecessor of the block. Often this correlation is hinted at by the presence of phis in the predicate's tree and/or phi VNs in in the predicate's VN graph. For each predecessor of a block, we evaluate the predicate value number using the values brought to the block by that predecessor. If we find correlations, we use them to drive the existing jump threading optimization. Make sure that when we search local PHIs we also match the ssa def number to ensure we're looking at the right PHI. Also, if we end up partially disambiguating such that there is just one remaining predecessor, update the value number of the predicate to reflect the values that flow in from that predecessor. Fixes #75944. Contributes to #48115.
There is often a single-def single use temp consuming a QMARK (which in turn often comes from expansion of `isinst` and friends). This temp assignment tends to stay around and can inhibit redundant branch opts in a number of interesting cases. It may also serve as an attractive nuisance for copy prop. While it would be ideal to generalize RBO to handle assignment side effects, doing so appears quite challenging, as we would need to rewrite possibly large chunks of the SSA and VN graphs. Forward sub eliminates the temp and so clears the way for the existing RBO to remove more branches. Addresses aspects of the "side effect" enhancements for RBO (see dotnet#48115).
There is often a single-def single use temp consuming a QMARK (which in turn often comes from expansion of `isinst` and friends). This temp assignment tends to stay around and can inhibit redundant branch opts in a number of interesting cases. It may also serve as an attractive nuisance for copy prop. While it would be ideal to generalize RBO to handle assignment side effects, doing so appears quite challenging, as we would need to rewrite possibly large chunks of the SSA and VN graphs. Forward sub eliminates the temp and so clears the way for the existing RBO to remove more branches. Addresses aspects of the "side effect" enhancements for RBO (see #48115).
Revise SSA to add one phi arg per pred instead of one phi arg per ssa def. This unlocks some cases for redundant branch opts. In some pathological cases we may see extremely long phi arg lists. Will keep an eye out for that and perhaps modify the strategy if we end up with hundreds or thousands of phi args. Addresses an issue noted in dotnet#48115.
Revise SSA to add one phi arg per pred instead of one phi arg per ssa def. This unlocks some cases for redundant branch opts. In some pathological cases we may see extremely long phi arg lists. Will keep an eye out for that and perhaps modify the strategy if we end up with hundreds or thousands of phi args. Addresses an issue noted in #48115.
#43811 introduced a redundant branch optimization, further enhanced in #46237 and #46257.
There are still more improvements possible. Here's a partial list of ideas, issues, and improvements:
x > 0
dominatingx == 0
).fgOptimizeUncondBranchToSimpleCond
x > 5
dominatingx > 3
)QMARK
. We see this inType.IsInstance
, for instance.AND
,OR
(would guessNOT
is already handled by VN?).optJumpThread
on the basic block epochoptReachable
on the visited BB flagsCheck
method and elsewhere eliminate most of the places where an exception could sneak past...?category:cq
theme:redundant-branches
skill-level:expert
cost:large
impact:medium
The text was updated successfully, but these errors were encountered: