Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redundant Branch Opts Enhancements #48115

Open
AndyAyersMS opened this issue Feb 10, 2021 · 3 comments
Open

Redundant Branch Opts Enhancements #48115

AndyAyersMS opened this issue Feb 10, 2021 · 3 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Feb 10, 2021

#43811 introduced a redundant branch optimization, further enhanced in #46237 and #46257.

There are still more improvements possible. Here's a partial list of ideas, issues, and improvements:

category:cq
theme:redundant-branches
skill-level:expert
cost:large
impact:medium

@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Feb 10, 2021
@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Feb 10, 2021
@AndyAyersMS AndyAyersMS added this to the Future milestone Feb 10, 2021
@AndyAyersMS
Copy link
Member Author

Phi-case mentioned above:

; Assembly listing for method System.Type:get_IsInterface():bool:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; rsp based frame
; fully interruptible
; PGO data available, but JitDisablePGO != 0
; Final local variable assignments
;
;  V00 this         [V00,T01] (  5,  4   )     ref  ->  rcx         this class-hnd
;  V01 loc0         [V01,T02] (  3,  2.50)     ref  ->  rax         class-hnd
;  V02 OutArgs      [V02    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;  V03 tmp1         [V03,T00] (  5,  6.75)     ref  ->  rax         class-hnd "spilling QMark2"
;
; Lcl frame size = 40

G_M8876_IG01:        ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref, nogc <-- Prolog IG
       sub      rsp, 40
						;; bbWeight=1    PerfScore 0.25
G_M8876_IG02:        ; gcrefRegs=00000002 {rcx}, byrefRegs=00000000 {}, byref, isz
       ; gcrRegs +[rcx]
       mov      rax, rcx
       ; gcrRegs +[rax]
       test     rax, rax
       je       SHORT G_M8876_IG05
						;; bbWeight=1    PerfScore 1.50
G_M8876_IG03:        ; gcrefRegs=00000003 {rax rcx}, byrefRegs=00000000 {}, byref, isz
       mov      rdx, 0xD1FFAB1E
       cmp      qword ptr [rax], rdx
       je       SHORT G_M8876_IG05
						;; bbWeight=0.25 PerfScore 0.81
G_M8876_IG04:        ; gcrefRegs=00000002 {rcx}, byrefRegs=00000000 {}, byref
       ; gcrRegs -[rax]
       xor      rax, rax
       ; gcrRegs +[rax]
						;; bbWeight=0.12 PerfScore 0.03
G_M8876_IG05:        ; gcrefRegs=00000003 {rax rcx}, byrefRegs=00000000 {}, byref, isz
       test     rax, rax
       je       SHORT G_M8876_IG08
						;; bbWeight=1    PerfScore 1.25
G_M8876_IG06:        ; gcrefRegs=00000001 {rax}, byrefRegs=00000000 {}, byref
       ; gcrRegs -[rcx]
       mov      rcx, rax
       ; gcrRegs +[rcx]
						;; bbWeight=0.50 PerfScore 0.12
G_M8876_IG07:        ; , epilog, nogc, extend
       add      rsp, 40
       jmp      System.RuntimeTypeHandle:IsInterface()
       ; gcr arg pop 0
						;; bbWeight=0.50 PerfScore 1.12
G_M8876_IG08:        ; gcVars=0000000000000000 {}, gcrefRegs=00000002 {rcx}, byrefRegs=00000000 {}, gcvars, byref
       ; gcrRegs -[rax]
       mov      rax, qword ptr [rcx]
       mov      rax, qword ptr [rax+120]
       call     qword ptr [rax]hackishModuleName:hackishMethodName()
       ; gcrRegs -[rcx]
       ; gcr arg pop 0
       test     al, 32
       setne    al
       movzx    rax, al
						;; bbWeight=0.50 PerfScore 4.25
G_M8876_IG09:        ; , epilog, nogc, extend
       add      rsp, 40
       ret      
						;; bbWeight=0.50 PerfScore 0.62

The test in IG05 is redundant; there are 3 preds, two have EAX == null and the third EAX != null. But the local test VN is based on a PHI and so does not match any dominating VN.

If we were to chase the phi defs we'd find that substituting those defs the VNs into the EQ we'd find matching dominating compare VNs and would attempt jump threading, which would succeed and all preds could bypass IG05 (targeting IG06 or IG08 as appropriate).

N001 [000020]   LCL_VAR   V03 tmp1         u:2 (last use) => $1c0 {PhiDef($3, $2, $143)}
N002 [000021]   CNS_INT   null => $VN.Null
N003 [000022]   EQ        => $103 {EQ($1c0, $0)}

***** BB04, STMT00002(after)
N004 (  5,  5) [000023] ------------              *  JTRUE     void  
N003 (  3,  3) [000022] J------N----              \--*  EQ        int    $103
N001 (  1,  1) [000020] ------------                 +--*  LCL_VAR   ref    V03 tmp1         u:2 (last use) $1c0
N002 (  1,  1) [000021] ------------                 \--*  CNS_INT   ref    null $VN.Null

Logically we could also envision this as a a PHI-EQ interchange, that is instead of (EQ(PHI(...)) we equivalently have PHI(EQ(...)) and those inner EQs are the ones with VN matches.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Mar 16, 2021
If the current relop has PHI inputs, see if any of those inputs would
produce the same relop VN as a dominating compare; if so the current
relop is partially redundant, and we may be able to optimize some of
the paths through the relop block via jump threading.

Addresses cases like the one seen in dotnet#48115, though that particular
case is not optimized as the current relop block has a side effect.
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Oct 21, 2021
Handle cases where the dominating compare is the reverse of the compare
we're trying to optimize. For example, if `(x > y)` dominates `(y <= x)`
we may be able to optimize the dominated compare.

Addresses aspects of dotnet#48115.
AndyAyersMS added a commit that referenced this issue Oct 23, 2021
Handle cases where the dominating compare is the reverse of the compare
we're trying to optimize. For example, if `(x > y)` dominates `(y <= x)`
we may be able to optimize the dominated compare.

Addresses aspects of #48115.
@AndyAyersMS
Copy link
Member Author

Looking at handling side effects in jump threading.

Here's the distribution of costs for blocks that have jump threading across SPMI.

image

A cost limit of 20 (in units of GetCostSz) would get most occurrences.

Recall for various reasons we want to be able to run this optimization without introducing new basic blocks, if possible.

Seemingly in most cases we could get by with making just one copy of the code. Assuming no ambiguous preds, then one set of preds could continue to target the current block and use the copy that's already there; for the other set we could put a copy at the start of the relevant successor block (if viable) or if not, if there's just one pred in that set, at the end of that pred (if viable). I haven't done this bit of the screening yet so it may prove overly limiting. For example in the below we duplicate side effect S and add a copy to the C.

We could remove the (if viable) by ensuring that we split all critical edges before running the optimizer; then we would always be able to place the copy before the appropriate successor.

@EgorBo
Copy link
Member

EgorBo commented Jun 17, 2022

Another example of "handle cases where the redundant compare block has side effects":

static bool Test(string s)
{
    if (s == null)
    {
        Console.WriteLine("11");
    }

    Console.WriteLine("22");

    if (s == null)
    {
        Console.WriteLine("33");
    }
    return true;
}

Console.WriteLine("22") is considered as a side-effect and brakes jump-threading

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Sep 27, 2022
In some cases the value of a block's branch predicate is correlated with the
predecessor of the block. Often this correlation is hinted at by the presence
of phis in the predicate's tree and/or phi VNs in in the predicate's VN graph.

For each predecessor of a block, we evaluate the predicate value number using
the values brought to the block by that predecessor. If we find correlations,
we use them to drive the existing jump threading optimization.

Also, if we end up partially disambiguating such that there is just one
remaining predecessor, update the value number of the predicate to reflect
the values that flow in from that predecessor.

Fixes dotnet#75944.
Contributes to dotnet#48115.
AndyAyersMS added a commit that referenced this issue Sep 29, 2022
In some cases the value of a block's branch predicate is correlated with the
predecessor of the block. Often this correlation is hinted at by the presence
of phis in the predicate's tree and/or phi VNs in in the predicate's VN graph.

For each predecessor of a block, we evaluate the predicate value number using
the values brought to the block by that predecessor. If we find correlations,
we use them to drive the existing jump threading optimization.

Make sure that when we search local PHIs we also match the ssa def
number to ensure we're looking at the right PHI.

Also, if we end up partially disambiguating such that there is just one
remaining predecessor, update the value number of the predicate to reflect
the values that flow in from that predecessor.

Fixes #75944.
Contributes to #48115.
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Oct 1, 2022
There is often a single-def single use temp consuming a QMARK (which in
turn often comes from expansion of `isinst` and friends). This temp
assignment tends to stay around and can inhibit redundant branch opts
in a number of interesting cases. It may also serve as an attractive
nuisance for copy prop.

While it would be ideal to generalize RBO to handle assignment side
effects, doing so appears quite challenging, as we would need to rewrite
possibly large chunks of the SSA and VN graphs. Forward sub eliminates
the temp and so clears the way for the existing RBO to remove more branches.

Addresses aspects of the "side effect" enhancements for RBO (see dotnet#48115).
AndyAyersMS added a commit that referenced this issue Oct 1, 2022
There is often a single-def single use temp consuming a QMARK (which in
turn often comes from expansion of `isinst` and friends). This temp
assignment tends to stay around and can inhibit redundant branch opts
in a number of interesting cases. It may also serve as an attractive
nuisance for copy prop.

While it would be ideal to generalize RBO to handle assignment side
effects, doing so appears quite challenging, as we would need to rewrite
possibly large chunks of the SSA and VN graphs. Forward sub eliminates
the temp and so clears the way for the existing RBO to remove more branches.

Addresses aspects of the "side effect" enhancements for RBO (see #48115).
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Apr 28, 2023
Revise SSA to add one phi arg per pred instead of one phi arg per ssa def.
This unlocks some cases for redundant branch opts.

In some pathological cases we may see extremely long phi arg lists. Will
keep an eye out for that and perhaps modify the strategy if we end up
with hundreds or thousands of phi args.

Addresses an issue noted in dotnet#48115.
AndyAyersMS added a commit that referenced this issue May 1, 2023
Revise SSA to add one phi arg per pred instead of one phi arg per ssa def.
This unlocks some cases for redundant branch opts.

In some pathological cases we may see extremely long phi arg lists. Will
keep an eye out for that and perhaps modify the strategy if we end up
with hundreds or thousands of phi args.

Addresses an issue noted in #48115.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

2 participants