Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid hoisting of indirections proven to be loop invariant #54118

Closed
jakobbotsch opened this issue Jun 13, 2021 · 35 comments · Fixed by #55936
Closed

Invalid hoisting of indirections proven to be loop invariant #54118

jakobbotsch opened this issue Jun 13, 2021 · 35 comments · Fixed by #55936
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@jakobbotsch
Copy link
Member

// Generated by Fuzzlyn v1.2 on 2021-06-13 00:28:44
// Seed: 3271777432388482561
// Reduced from 81.8 KiB to 0.3 KiB in 00:01:03
// Debug: Outputs 1
// Release: Outputs 0
public class Program
{
    public static void Main()
    {
        uint[] vr1 = new uint[] { 0 };
        for (int vr2 = 0; vr2 < 2; vr2++)
        {
            vr1[0] = 1;
            var vr3 = vr1[0];
            System.Console.WriteLine(vr3);
        }
    }
}

We hoist vr1[0] out of the loop because it's constant. Seems to be a variant of #13328 which was fixed for static fields in dotnet/coreclr#26952.

Another variant:

// Generated by Fuzzlyn v1.2 on 2021-06-13 01:10:47
// Seed: 15404436673887137956
// Reduced from 66.4 KiB to 0.3 KiB in 00:00:53
// Debug: Outputs 1
// Release: Outputs 0
public class Program
{
    static uint[] s_16 = new uint[]{0};
    public static void Main()
    {
        ref uint vr0 = ref s_16[0];
        for (int vr1 = 0; vr1 < 2; vr1++)
        {
            vr0 = 1;
            vr0 = vr0;
        }

        System.Console.WriteLine(s_16[0]);
    }
}

cc @dotnet/jit-contrib

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Jun 13, 2021
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@jakobbotsch jakobbotsch added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 13, 2021
@sandreenko
Copy link
Contributor

We hoist vr1[0] out of the loop because it's constant.

which tree do we hoist?

@jakobbotsch
Copy link
Member Author

which tree do we hoist?

JIT dump: https://gist.github.com/jakobbotsch/3a71d78c99c3c267d697bbb267ddeb27

optHoistLoopCode for loop L00 <BB02..BB02>:
  Loop body contains a call
  Loop has single exit

  USEDEF  (2)={V00 V01}
  INOUT   (2)={V00 V01}
  LOOPVARS(2)={V00 V01}
    optHoistLoopBlocks BB02 (weight=  4   ) of loop L00 <BB02..BB02>, firstBlock is true

Hoisting a copy of [000019] into PreHeader for loop L00 <BB02..BB02>:
N006 (  4,  4) [000019] a---G-------              *  IND       int    <l:$44, c:$441>
N005 (  2,  2) [000047] -------N----              \--*  ADD       byref  $400
N003 (  1,  1) [000039] ------------                 +--*  LCL_VAR   ref    V00 loc0         u:2 $240
N004 (  1,  1) [000046] ------------                 \--*  CNS_INT   long   16 Fseq[#FirstElem] $141

New Basic Block BB04 [0005] created.

Created PreHeader (BB04) for loop L00 (BB02 - BB02), with weight = 1   
Setting edge weights for BB01 -> BB04 to [0 .. 3.402823e+38]
Setting edge weights for BB01 -> BB04 to [100 .. 100]
Setting edge weights for BB04 -> BB02 to [0 .. 3.402823e+38]
Setting edge weights for BB04 -> BB02 to [100 .. 100]
This hoisted copy placed in PreHeader (BB04):
               [000064] ----G-------              *  COMMA     void  
     (  4,  4) [000059] a---G--H----              +--*  IND       int    <l:$44, c:$441>
     (  2,  2) [000060] -------N----              |  \--*  ADD       byref  $400
     (  1,  1) [000061] -------H----              |     +--*  LCL_VAR   ref    V00 loc0         u:2 $240
     (  1,  1) [000062] -------H----              |     \--*  CNS_INT   long   16 Fseq[#FirstElem] $141
               [000063] ------------              \--*  NOP       void  
Blocks/Trees after optHoistLoopCode() modified flowgraph

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       [000..00B)-> BB03 ( cond )                     i hascall new[] 
BB04 [0005]  1       BB01                  1       [00B..???)                                     internal LoopPH 
BB02 [0001]  2       BB02,BB04             4     0 [00B..01F)-> BB02 ( cond )                     i Loop hascall gcsafe idxlen bwd bwd-target 
BB03 [0003]  2       BB01,BB02             1       [01F..020)        (return)                     i 
-----------------------------------------------------------------------------------------------------------------------------------------

------------ BB01 [000..00B) -> BB03 (cond), preds={} succs={BB04,BB03}

***** BB01
STMT00000 (IL 0x000...0x006)
N007 ( 17, 18) [000004] -ACXG---R---              *  ASG       ref    $1c2
N006 (  1,  1) [000003] D------N----              +--*  LCL_VAR   ref    V00 loc0         d:2 $240
N005 ( 17, 18) [000002] --CXG-------              \--*  CALL help ref    HELPER.CORINFO_HELP_NEWARR_1_VC $1c2
N003 (  2, 10) [000001] H----------- arg0 in rcx     +--*  CNS_INT(h) long   0x7ffa2f2f19b8 class $100
N004 (  1,  1) [000000] ------------ arg1 in rdx     \--*  CNS_INT   long   1 $140

***** BB01
STMT00001 (IL 0x007...0x008)
N003 (  1,  3) [000007] -A------R---              *  ASG       int    $40
N002 (  1,  1) [000006] D------N----              +--*  LCL_VAR   int    V01 loc1         d:2 $40
N001 (  1,  1) [000005] ------------              \--*  CNS_INT   int    0 $40

***** BB01
STMT00007 (IL 0x01B...  ???)
N004 (  5,  5) [000050] ------------              *  JTRUE     void  
N003 (  3,  3) [000051] J------N----              \--*  GE        int    $40
N001 (  1,  1) [000052] ------------                 +--*  LCL_VAR   int    V01 loc1         u:2 $40
N002 (  1,  1) [000053] ------------                 \--*  CNS_INT   int    2 $42

------------ BB04 [00B..???), preds={BB01} succs={BB02}

***** BB04
STMT00009 (IL   ???...  ???)
N006 (  4,  4) [000064] ----G-------              *  COMMA     void  
N004 (  4,  4) [000059] a---G--H----              +--*  IND       int    <l:$44, c:$441>
N003 (  2,  2) [000060] -------N----              |  \--*  ADD       byref  $400
N001 (  1,  1) [000061] -------H----              |     +--*  LCL_VAR   ref    V00 loc0         u:2 $240
N002 (  1,  1) [000062] -------H----              |     \--*  CNS_INT   long   16 Fseq[#FirstElem] $141
N005 (  0,  0) [000063] ------------              \--*  NOP       void  

------------ BB02 [00B..01F) -> BB02 (cond), preds={BB02,BB04} succs={BB03,BB02}

***** BB02
STMT00008 (IL   ???...  ???)
N005 (  0,  0) [000056] -A------R---              *  ASG       int   
N004 (  0,  0) [000054] D------N----              +--*  LCL_VAR   int    V01 loc1         d:3
N003 (  0,  0) [000055] ------------              \--*  PHI       int   
N001 (  0,  0) [000058] ------------ pred BB02       +--*  PHI_ARG   int    V01 loc1         u:4
N002 (  0,  0) [000057] ------------ pred BB04       \--*  PHI_ARG   int    V01 loc1         u:2 $40

***** BB02
STMT00003 (IL 0x00B...0x00E)
N008 (  6,  6) [000016] -A--G-------              *  ASG       int    $VN.Void
N006 (  4,  4) [000038] ----G--N----              +--*  COMMA     int    $VN.Void
N001 (  0,  0) [000032] ------------              |  +--*  NOP       void   $380
N005 (  4,  4) [000015] a---G--N----              |  \--*  IND       int    $44
N004 (  2,  2) [000037] -------N----              |     \--*  ADD       byref  $400
N002 (  1,  1) [000029] ------------              |        +--*  LCL_VAR   ref    V00 loc0         u:2 $240
N003 (  1,  1) [000036] ------------              |        \--*  CNS_INT   long   16 Fseq[#FirstElem] $141
N007 (  1,  1) [000014] ------------              \--*  CNS_INT   int    1 $44

***** BB02
STMT00004 (IL 0x00F...0x01A)
N008 ( 18, 10) [000020] --CXG-------              *  CALL      void   System.Console.WriteLine $VN.Void
N007 (  4,  4) [000048] ----G--N---- arg0 in rcx  \--*  COMMA     int    <l:$44, c:$441>
N002 (  0,  0) [000042] ------------                 +--*  NOP       void   $381
N006 (  4,  4) [000019] a---G-------                 \--*  IND       int    <l:$44, c:$441>
N005 (  2,  2) [000047] -------N----                    \--*  ADD       byref  $400
N003 (  1,  1) [000039] ------------                       +--*  LCL_VAR   ref    V00 loc0         u:2 $240
N004 (  1,  1) [000046] ------------                       \--*  CNS_INT   long   16 Fseq[#FirstElem] $141

***** BB02
STMT00005 (IL 0x017...  ???)
N005 (  3,  3) [000025] -A------R---              *  ASG       int    $2c2
N004 (  1,  1) [000024] D------N----              +--*  LCL_VAR   int    V01 loc1         d:4 $2c2
N003 (  3,  3) [000023] ------------              \--*  ADD       int    $2c2
N001 (  1,  1) [000021] ------------                 +--*  LCL_VAR   int    V01 loc1         u:3 (last use) $300
N002 (  1,  1) [000022] ------------                 \--*  CNS_INT   int    1 $44

***** BB02
STMT00002 (IL 0x01B...0x01D)
N004 (  5,  5) [000011] ------------              *  JTRUE     void  
N003 (  3,  3) [000010] J------N----              \--*  LT        int    $2c3
N001 (  1,  1) [000008] ------------                 +--*  LCL_VAR   int    V01 loc1         u:4 $2c2
N002 (  1,  1) [000009] ------------                 \--*  CNS_INT   int    2 $42

------------ BB03 [01F..020) (return), preds={BB01,BB02} succs={}

***** BB03
STMT00006 (IL 0x01F...0x01F)
N001 (  0,  0) [000026] ------------              *  RETURN    void   $480

-------------------------------------------------------------------------------------------------------------------

It seems like the compiler reasons that since vr1[0] is constant it must be loop invariant, but it is only constant because it was just assigned inside the loop. There's more information over in #13328.

@JulieLeeMSFT
Copy link
Member

@sandreenko assigning this to you for now.

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Jun 15, 2021
@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Jun 15, 2021
@jakobbotsch
Copy link
Member Author

I'll try to take a look at this.

@jakobbotsch
Copy link
Member Author

Value numbering seems a bit inconsistent with regards to by-refs. For

public static int Main()
{
    uint[] arr = new uint[1];
    ref uint r = ref arr[0];
    for (int vr1 = 0; vr1 < 2; vr1++)
    {
        r = 1;
        if (r != 1)
            return 101;
    }

    return 100;
}

we get for the second read of r:

N002 [000020]   IND       => <l:$384 {norm=$84 {IntCns 1}, exc=$207 {NullPtrExc($340)}}, c:$383 {norm=$4c0 {4c0}, exc=$207 {NullPtrExc($340)}}>
N003 [000021]   CNS_INT   1 => $84 {IntCns 1}
N004 [000022]   NE        => <l:$387 {norm=$80 {IntCns 0}, exc=$207 {NullPtrExc($340)}}, c:$386 {norm=$385 {NE($4c0, $84)}, exc=$207 {NullPtrExc($340)}}>

so value numbering knows that the indirection is constant 1, and subsequently concludes that it is invariant and hoists it out.
If we use a local instead:

public static int Main()
{
    uint loc = 0;
    ref uint r = ref loc;
    for (int vr1 = 0; vr1 < 2; vr1++)
    {
        r = 1;
        if (r != 1)
            return 101;
    }

    return 100;
}

then we get

N002 [000022]   IND       => <l:$342 {norm=$381 {ByrefExposedLoad($85, $2c0, $400)}, exc=$203 {NullPtrExc($2c0)}}, c:$341 {norm=$440 {440}, exc=$203 {NullPtrExc($2c0)}}>
N003 [000023]   CNS_INT   1 => $84 {IntCns 1}
N004 [000024]   NE        => <l:$346 {norm=$344 {NE($381, $84)}, exc=$203 {NullPtrExc($2c0)}}, c:$345 {norm=$343 {NE($440, $84)}, exc=$203 {NullPtrExc($2c0)}}>

and ByrefExposedLoad is not considered invariant.

@jakobbotsch
Copy link
Member Author

The problem remains that we conclude that an indirection is constant (ok), conclude that constants are invariant (ok) and then hoist out the indirection (not ok).
One potential solution is to make hoisting a bit more conservative in the same way that #26952 fixed this problem for static variables. That is, avoid hoisting GT_IND nodes that are constants because they may be constant since they were assigned inside the loop. If we do that, we get a handful of negative size diffs:

benchmarks.run.windows.x64.checked.mch:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 407
Total bytes of diff: 399
Total bytes of delta: -8 (-1,97% of base)
    diff is an improvement.
Detail diffs


Top file improvements (bytes):
          -8 : 27630.dasm (-1,97% of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
          -8 (-1,97% of base) : 27630.dasm - V8.Crypto.RSAKey:pkcs1pad2(System.String,int):V8.Crypto.BigInteger

Top method improvements (percentages):
          -8 (-1,97% of base) : 27630.dasm - V8.Crypto.RSAKey:pkcs1pad2(System.String,int):V8.Crypto.BigInteger

1 total methods with Code Size differences (1 improved, 0 regressed), 0 unchanged.


coreclr_tests.pmi.windows.x64.checked.mch:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 1891
Total bytes of diff: 1883
Total bytes of delta: -8 (-0,42% of base)
    diff is an improvement.
Detail diffs


Top file improvements (bytes):
          -8 : 219692.dasm (-1,97% of base)

1 total files with Code Size differences (1 improved, 0 regressed), 1 unchanged.

Top method improvements (bytes):
          -8 (-1,97% of base) : 219692.dasm - V8.Crypto.RSAKey:pkcs1pad2(System.String,int):V8.Crypto.BigInteger

Top method improvements (percentages):
          -8 (-1,97% of base) : 219692.dasm - V8.Crypto.RSAKey:pkcs1pad2(System.String,int):V8.Crypto.BigInteger

1 total methods with Code Size differences (1 improved, 0 regressed), 1 unchanged.


libraries.pmi.windows.x64.checked.mch:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 2075
Total bytes of diff: 2075
Total bytes of delta: 0 (0,00% of base)
Detail diffs


0 total files with Code Size differences (0 improved, 0 regressed), 2 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed), 2 unchanged.


libraries_tests.pmi.windows.x64.checked.mch:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 7704
Total bytes of diff: 7439
Total bytes of delta: -265 (-3,44% of base)
    diff is an improvement.
Detail diffs


Top file improvements (bytes):
        -176 : 93415.dasm (-10,92% of base)
         -22 : 1141.dasm (-1,49% of base)
         -13 : 93426.dasm (-2,54% of base)
         -13 : 93425.dasm (-2,79% of base)
         -13 : 93417.dasm (-2,54% of base)
          -6 : 93418.dasm (-2,30% of base)
          -6 : 93456.dasm (-1,74% of base)
          -6 : 93459.dasm (-1,75% of base)
          -5 : 93457.dasm (-1,68% of base)
          -5 : 93574.dasm (-1,62% of base)

10 total files with Code Size differences (10 improved, 0 regressed), 1 unchanged.

Top method improvements (bytes):
        -176 (-10,92% of base) : 93415.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetNumbersExpectedString(bool,System.String,System.Int32[],System.UInt32[],System.Int64[],System.UInt64[],System.Single[],System.Double[],System.Decimal[],bool):System.String
         -22 (-1,49% of base) : 1141.dasm - Microsoft.Build.Construction.SolutionFile:GetSolutionFileAndVisualStudioMajorVersions(System.String,byref,byref)
         -13 (-2,54% of base) : 93426.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetDatesExpectedString(bool,System.String,System.DateTimeOffset[],bool):System.String
         -13 (-2,79% of base) : 93425.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetDatesExpectedString(bool,System.String,System.DateTime[],bool):System.String
         -13 (-2,54% of base) : 93417.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetGuidsExpectedString(bool,System.String,System.Guid[],bool):System.String
          -6 (-2,30% of base) : 93418.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetNumbersExpectedString(bool,int,System.__Canon):System.String
          -6 (-1,74% of base) : 93456.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetCommentExpectedString(bool,System.String):System.String
          -6 (-1,75% of base) : 93459.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetCustomExpectedString(bool):System.String
          -5 (-1,68% of base) : 93457.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetStringsExpectedString(bool,System.String):System.String
          -5 (-1,62% of base) : 93574.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetExpectedLargeString(bool):System.String

Top method improvements (percentages):
        -176 (-10,92% of base) : 93415.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetNumbersExpectedString(bool,System.String,System.Int32[],System.UInt32[],System.Int64[],System.UInt64[],System.Single[],System.Double[],System.Decimal[],bool):System.String
         -13 (-2,79% of base) : 93425.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetDatesExpectedString(bool,System.String,System.DateTime[],bool):System.String
         -13 (-2,54% of base) : 93426.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetDatesExpectedString(bool,System.String,System.DateTimeOffset[],bool):System.String
         -13 (-2,54% of base) : 93417.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetGuidsExpectedString(bool,System.String,System.Guid[],bool):System.String
          -6 (-2,30% of base) : 93418.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetNumbersExpectedString(bool,int,System.__Canon):System.String
          -6 (-1,75% of base) : 93459.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetCustomExpectedString(bool):System.String
          -6 (-1,74% of base) : 93456.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetCommentExpectedString(bool,System.String):System.String
          -5 (-1,68% of base) : 93457.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetStringsExpectedString(bool,System.String):System.String
          -5 (-1,62% of base) : 93574.dasm - System.Text.Json.Tests.Utf8JsonWriterTests:GetExpectedLargeString(bool):System.String
         -22 (-1,49% of base) : 1141.dasm - Microsoft.Build.Construction.SolutionFile:GetSolutionFileAndVisualStudioMajorVersions(System.String,byref,byref)

10 total methods with Code Size differences (10 improved, 0 regressed), 1 unchanged.


The diff in benchmarks is an actual miscompilation where we hoist the first array access out in this loop: here. Thankfully, it has no effect here since the value is 0 and the loop condition check ends up reloading it from the array (diff after change).
coreclr_tests is the same function which is duplicated.

The biggest regressions are when we no longer hoist out an indirection into a register:

-       mov      r12, qword ptr [(reloc)]
-                                               ;; bbWeight=1.00 PerfScore 27.50
+                                               ;; bbWeight=1.00 PerfScore 25.50
 G_M57024_IG04:        ; gcrefRegs=0000C000 {r14 r15}, byrefRegs=00000000 {}, byref, isz
        mov      rcx, gword ptr [rbp-100H]
        ; gcrRegs +[rcx]
+       mov      r12, qword ptr [(reloc)]
        call     gword ptr [r12+56]hackishModuleName:hackishMethodName()
        ; gcrRegs -[rcx] +[rax]
        ; gcr arg pop 0
        inc      edi
        cmp      edi, esi
        jl       SHORT G_M57024_IG04

But in many cases these hoists also seem costly, since often we end up needing spills and reloads in the loop because of it:

        jle      SHORT G_M2715_IG07
-       mov      r8, qword ptr [(reloc)]
-       mov      r10, r8
-                                               ;; bbWeight=1    PerfScore 17.00
+                                               ;; bbWeight=1    PerfScore 14.75
 G_M2715_IG06:        ; gcrefRegs=0000F068 {rbx rbp rsi r12 r13 r14 r15}, byrefRegs=00000000 {}, byref, isz
        movzx    r8, dil
        mov      rcx, gword ptr [rsp+50H]
        ; gcrRegs +[rcx]
        mov      rdx, rsi
        ; gcrRegs +[rdx]
-       mov      qword ptr [rsp+80H], r10
+       mov      r10, qword ptr [(reloc)]
        call     qword ptr [r10+16]hackishModuleName:hackishMethodName()
        ; gcrRegs -[rcx rdx]
        ; gcr arg pop 0
@@ -220,29 +217,26 @@ G_M2715_IG06:        ; gcrefRegs=0000F068 {rbx rbp rsi r12 r13 r14 r15}, byrefRe
        ; gcr arg pop 0
        mov      eax, dword ptr [rsp+B4H]
        inc      eax
-       cmp      dword ptr [rsp+74H], eax
+       cmp      dword ptr [rsp+78H], eax
        mov      dword ptr [rsp+B4H], eax
-       mov      r10, qword ptr [rsp+80H]
        jg       SHORT G_M2715_IG06

jakobbotsch added a commit to jakobbotsch/runtime that referenced this issue Jul 2, 2021
It is possible they are constant because they were assigned inside the
loop, so we cannot hoist them out in this situation.

Fix dotnet#54118
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jul 2, 2021
@AndyAyersMS
Copy link
Member

I think the fix is that to consider an indir tree invariant we must also show that (the right kind of) memory is also invariant in the loop, eg:

iff --git a/src/coreclr/jit/optimizer.cpp b/src/coreclr/jit/optimizer.cpp
index a7df5b95905..f4be1d3f70f 100644
--- a/src/coreclr/jit/optimizer.cpp
+++ b/src/coreclr/jit/optimizer.cpp
@@ -6102,6 +6102,24 @@ void Compiler::optHoistLoopBlocks(unsigned loopNum, ArrayStack<BasicBlock*>* blo
                 {
                     return false;
                 }
+
+                // For indirections, memory must also be invariant in the loop.
+                //
+                if (tree->OperIsIndir())
+                {
+                    BasicBlock* const loopEntry = m_compiler->optLoopTable[m_loopNum].lpEntry;
+
+                    // Todo: use knowledge of where address points to avoid considering some memory kinds
+                    //
+                    for (MemoryKind memoryKind : allMemoryKinds())
+                    {
+                        ValueNum loopMemoryVN = m_compiler->GetMemoryPerSsaData(loopEntry->bbMemorySsaNumIn[memoryKind])->m_vnPair.GetLiberal();
+                        if (!m_compiler->optVNIsLoopInvariant(loopMemoryVN, m_loopNum, &m_hoistContext->m_curLoopVnInvariantCache))
+                        {
+                            return false;
+                        }
+                    }
+                }
             }

@AndyAyersMS
Copy link
Member

The change above (which is in IsTreeVNInvariant) fixes the 3 cases here and has just one SPMI diff.

          -8 (-2.07% of base) : 219751.dasm - V8.Crypto.RSAKey:pkcs1pad2(System.String,int):V8.Crypto.BigInteger

@jakobbotsch
Copy link
Member Author

That fix sounds much better conceptually, thanks @AndyAyersMS. I think we need to do the same for GT_CLS_VAR and GT_OBJ as well. I'll verify that it fixes the other examples I've collected and update #55081 with it.

@jakobbotsch
Copy link
Member Author

Hmm, I'm seeing quite a few SPMI diffs with that patch applied. For example:

diff --git "a/.\\base\\10172.dasm" "b/.\\diff\\10172.dasm"
index 6100b53..44568ad 100644
--- "a/.\\base\\10172.dasm"
+++ "b/.\\diff\\10172.dasm"
@@ -1,206 +1,187 @@
 ; Assembly listing for method System.Net.Primitives.Tests.CredentialCacheTests:CreateCredentialCache(int,int):System.Net.CredentialCache
 ; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
 ; optimized code
 ; rsp based frame
 ; partially interruptible
 ; No matching PGO data
 ; 0 inlinees with PGO data; 4 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
-;  V00 arg0         [V00,T14] (  4,  7   )     int  ->  rsi        
-;  V01 arg1         [V01,T15] (  4,  7   )     int  ->  rdi        
+;  V00 arg0         [V00,T13] (  4,  7   )     int  ->  rsi        
+;  V01 arg1         [V01,T14] (  4,  7   )     int  ->  rdi        
 ;  V02 loc0         [V02,T12] (  4, 10   )     ref  ->  rbx         class-hnd exact
 ;  V03 loc1         [V03,T01] (  5, 17   )     int  ->  rbp         ld-addr-op
 ;* V04 loc2         [V04    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact
-;  V05 loc3         [V05,T02] (  5, 17   )     int  ->  r14         ld-addr-op
-;  V06 loc4         [V06,T16] (  2,  8   )     ref  ->  rbp         class-hnd
+;  V05 loc3         [V05,T02] (  5, 17   )     int  ->  rsi         ld-addr-op
+;  V06 loc4         [V06,T15] (  2,  8   )     ref  ->  rbp         class-hnd
 ;  V07 OutArgs      [V07    ] (  1,  1   )  lclBlk (40) [rsp+00H]   "OutgoingArgSpace"
-;  V08 tmp1         [V08,T19] (  2,  4   )     ref  ->  rbx         class-hnd exact "NewObj constructor temp"
-;  V09 tmp2         [V09,T00] (  3, 24   )     ref  ->  r12         class-hnd exact "NewObj constructor temp"
+;  V08 tmp1         [V08,T16] (  2,  4   )     ref  ->  rbx         class-hnd exact "NewObj constructor temp"
+;  V09 tmp2         [V09,T00] (  3, 24   )     ref  ->  r14         class-hnd exact "NewObj constructor temp"
 ;  V10 tmp3         [V10,T03] (  2, 16   )     ref  ->  rdx         "argument with side effect"
-;  V11 tmp4         [V11,T04] (  2, 16   )     ref  ->  r13         "argument with side effect"
+;  V11 tmp4         [V11,T04] (  2, 16   )     ref  ->  r15         "argument with side effect"
 ;  V12 tmp5         [V12,T05] (  2, 16   )     ref  ->  rdx         "argument with side effect"
 ;  V13 tmp6         [V13,T06] (  2, 16   )     ref  ->   r9         "argument with side effect"
-;  V14 tmp7         [V14,T07] (  2, 16   )     ref  ->  r13         "argument with side effect"
+;  V14 tmp7         [V14,T07] (  2, 16   )     ref  ->  r15         "argument with side effect"
 ;  V15 tmp8         [V15,T08] (  2, 16   )     ref  ->  rdx         "argument with side effect"
 ;  V16 tmp9         [V16,T09] (  2, 16   )     ref  ->  rbp         "argument with side effect"
 ;  V17 tmp10        [V17,T10] (  2, 16   )     ref  ->   r9         "argument with side effect"
-;  V18 tmp11        [V18,T11] (  2, 16   )     ref  ->  r12         "argument with side effect"
-;  V19 cse0         [V19,T13] (  4, 10   )     ref  ->  r15         "CSE - aggressive"
-;  V20 cse1         [V20,T17] (  2,  5   )     ref  ->  r14         "CSE - moderate"
-;  V21 cse2         [V21,T18] (  2,  5   )     ref  ->  rsi         "CSE - moderate"
+;  V18 tmp11        [V18,T11] (  2, 16   )     ref  ->  r14         "argument with side effect"
 ;
 ; Lcl frame size = 40
 
 G_M38214_IG01:        ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG
        push     r15
        push     r14
-       push     r13
-       push     r12
        push     rdi
        push     rsi
        push     rbp
        push     rbx
        sub      rsp, 40
        mov      esi, ecx
        mov      edi, edx
-						;; bbWeight=1    PerfScore 8.75
+						;; bbWeight=1    PerfScore 6.75
 G_M38214_IG02:        ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
        mov      rcx, 0xD1FFAB1E
        call     CORINFO_HELP_NEWSFAST
        ; gcrRegs +[rax]
        ; gcr arg pop 0
        mov      rbx, rax
        ; gcrRegs +[rbx]
        xor      ebp, ebp
        test     esi, esi
        jle      G_M38214_IG04
-       mov      rcx, 0xD1FFAB1E
-       mov      r14, gword ptr [rcx]
-       ; gcrRegs +[r14]
-       mov      rcx, 0xD1FFAB1E
-       mov      r15, gword ptr [rcx]
-       ; gcrRegs +[r15]
-						;; bbWeight=1    PerfScore 7.50
-G_M38214_IG03:        ; gcrefRegs=0000C008 {rbx r14 r15}, byrefRegs=00000000 {}, byref, isz
+						;; bbWeight=1    PerfScore 3.00
+G_M38214_IG03:        ; gcrefRegs=00000008 {rbx}, byrefRegs=00000000 {}, byref, isz
        ; gcrRegs -[rax]
        mov      rcx, 0xD1FFAB1E
        call     CORINFO_HELP_NEWSFAST
        ; gcrRegs +[rax]
        ; gcr arg pop 0
-       mov      r12, rax
-       ; gcrRegs +[r12]
-       mov      r13, r14
-       ; gcrRegs +[r13]
+       mov      r14, rax
+       ; gcrRegs +[r14]
+       mov      rcx, 0xD1FFAB1E
+       mov      r15, gword ptr [rcx]
+       ; gcrRegs +[r15]
        mov      ecx, ebp
        call     System.Number:Int32ToDecStr(int):System.String
        ; gcr arg pop 0
        mov      rdx, rax
        ; gcrRegs +[rdx]
-       mov      rcx, r13
+       mov      rcx, r15
        ; gcrRegs +[rcx]
        call     System.String:Concat(System.String,System.String):System.String
-       ; gcrRegs -[rcx rdx r13]
+       ; gcrRegs -[rcx rdx r15]
        ; gcr arg pop 0
        mov      rdx, rax
        ; gcrRegs +[rdx]
-       mov      rcx, r12
+       mov      rcx, r14
        ; gcrRegs +[rcx]
        call     System.Uri:.ctor(System.String):this
        ; gcrRegs -[rax rcx rdx]
        ; gcr arg pop 0
-       mov      r13, r15
-       ; gcrRegs +[r13]
+       mov      rcx, 0xD1FFAB1E
+       mov      r15, gword ptr [rcx]
+       ; gcrRegs +[r15]
        mov      rcx, 0xD1FFAB1E
        mov      edx, 553
        call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
        ; gcr arg pop 0
        mov      r9, 0xD1FFAB1E
        mov      r9, gword ptr [r9]
        ; gcrRegs +[r9]
-       mov      r8, r13
+       mov      r8, r15
        ; gcrRegs +[r8]
-       mov      rdx, r12
+       mov      rdx, r14
        ; gcrRegs +[rdx]
        mov      rcx, rbx
        ; gcrRegs +[rcx]
        call     System.Net.CredentialCache:Add()
-       ; gcrRegs -[rcx rdx r8-r9 r12-r13]
+       ; gcrRegs -[rcx rdx r8-r9 r14-r15]
        ; gcr arg pop 0
        inc      ebp
        cmp      ebp, esi
        jl       SHORT G_M38214_IG03
-						;; bbWeight=4    PerfScore 53.00
+						;; bbWeight=4    PerfScore 69.00
 G_M38214_IG04:        ; gcrefRegs=00000008 {rbx}, byrefRegs=00000000 {}, byref, isz
-       ; gcrRegs -[r14-r15]
-       xor      r14d, r14d
+       xor      esi, esi
        test     edi, edi
        jle      SHORT G_M38214_IG06
+						;; bbWeight=1    PerfScore 1.50
+G_M38214_IG05:        ; gcrefRegs=00000008 {rbx}, byrefRegs=00000000 {}, byref, isz
        mov      rcx, 0xD1FFAB1E
-       mov      rsi, gword ptr [rcx]
-       ; gcrRegs +[rsi]
-       mov      rcx, 0xD1FFAB1E
-       mov      r15, gword ptr [rcx]
-       ; gcrRegs +[r15]
-						;; bbWeight=1    PerfScore 6.00
-G_M38214_IG05:        ; gcrefRegs=00008048 {rbx rsi r15}, byrefRegs=00000000 {}, byref, isz
-       mov      rbp, rsi
+       mov      rbp, gword ptr [rcx]
        ; gcrRegs +[rbp]
-       mov      ecx, r14d
+       mov      ecx, esi
        call     System.Number:Int32ToDecStr(int):System.String
        ; gcrRegs +[rax]
        ; gcr arg pop 0
        mov      rdx, rax
        ; gcrRegs +[rdx]
        mov      rcx, rbp
        ; gcrRegs +[rcx]
        call     System.String:Concat(System.String,System.String):System.String
        ; gcrRegs -[rcx rdx rbp]
        ; gcr arg pop 0
        mov      rbp, rax
        ; gcrRegs +[rbp]
-       mov      r12, r15
-       ; gcrRegs +[r12]
+       mov      rcx, 0xD1FFAB1E
+       mov      r14, gword ptr [rcx]
+       ; gcrRegs +[r14]
        mov      rcx, 0xD1FFAB1E
        mov      edx, 553
        call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
        ; gcrRegs -[rax]
        ; gcr arg pop 0
        mov      r9, 0xD1FFAB1E
        mov      r9, gword ptr [r9]
        ; gcrRegs +[r9]
        mov      gword ptr [rsp+20H], r9
        ; gcr arg write
-       mov      r9, r12
+       mov      r9, r14
        mov      rcx, rbx
        ; gcrRegs +[rcx]
        mov      rdx, rbp
        ; gcrRegs +[rdx]
        mov      r8d, 80
        call     System.Net.CredentialCache:Add()
-       ; gcrRegs -[rcx rdx rbp r9 r12]
+       ; gcrRegs -[rcx rdx rbp r9 r14]
        ; gcr arg pop 0
-       inc      r14d
-       cmp      r14d, edi
+       inc      esi
+       cmp      esi, edi
        jl       SHORT G_M38214_IG05
-						;; bbWeight=4    PerfScore 47.00
+						;; bbWeight=4    PerfScore 63.00
 G_M38214_IG06:        ; gcrefRegs=00000008 {rbx}, byrefRegs=00000000 {}, byref
-       ; gcrRegs -[rsi r15]
        mov      rax, rbx
        ; gcrRegs +[rax]
 						;; bbWeight=1    PerfScore 0.25
 G_M38214_IG07:        ; , epilog, nogc, extend
        add      rsp, 40
        pop      rbx
        pop      rbp
        pop      rsi
        pop      rdi
-       pop      r12
-       pop      r13
        pop      r14
        pop      r15
        ret      
-						;; bbWeight=1    PerfScore 5.25
+						;; bbWeight=1    PerfScore 4.25
 
-; Total bytes of code 327, prolog size 16, PerfScore 160.45, instruction count 86, allocated bytes for code 327 (MethodHash=ed236ab9) for method System.Net.Primitives.Tests.CredentialCacheTests:CreateCredentialCache(int,int):System.Net.CredentialCache
+; Total bytes of code 303, prolog size 12, PerfScore 178.05, instruction count 78, allocated bytes for code 303 (MethodHash=ed236ab9) for method System.Net.Primitives.Tests.CredentialCacheTests:CreateCredentialCache(int,int):System.Net.CredentialCache
 ; ============================================================

That seems expected since the loops have function calls, doesn't it? Did you do something more or know what I'm missing?

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jul 8, 2021
@jakobbotsch
Copy link
Member Author

Ah, I see now that the patch is only doing it for constant VNs. This is also a problem for InitVal, phis and casts. Here's a test that has more cases:
https://gist.github.com/jakobbotsch/74a751527786c76fc9b7d1e53ffe45d5

@AndyAyersMS
Copy link
Member

Ok, I'll look at the bigger set of cases.

We might also be able to exempt GTF_IND_INVARIANT cases too.

@AndyAyersMS
Copy link
Member

Suspect there's a broader problem where we can lose track of the (implicit) memory dependence. That is, if VN can deduce an in-loop indir has a relatively simple VN, and then that VN propagates to other computations later in the loop yielding more relatively simple VNs, those second-level VNs won't be attached to scary-looking IR nodes and changes like the one I came up with will fail to detect the memory dependence. So if those second-level IR nodes look like attractive hosting candidates we could be in trouble.

We might get lucky for now depending on whether trees are simplified, etc.

@AndyAyersMS
Copy link
Member

Here's a more general version, it causes quite a few diffs, but fixes the all the cases in the more involved test:

        bool IsTreeVNInvariant(GenTree* tree)
        {
            // The value number must itself be loop invariant.
            //
            ValueNum vn = tree->gtVNPair.GetLiberal();
            if (!m_compiler->optVNIsLoopInvariant(vn, m_loopNum, &m_hoistContext->m_curLoopVnInvariantCache))
            {
                return false;
            }

            // If the operation accesses memory, the memory must also be loop invariant.
            // Exempt indirs that are known to be invariant.
            //
            if (tree->OperIs(GT_IND) && (tree->gtFlags & GTF_IND_INVARIANT) != 0)
            {
                return true;
            }

            // Otherwise check that the state of memory at loop entry is defined outside the loop.
            //
            if ((tree->gtFlags & GTF_GLOB_REF) != 0)
            {
                BasicBlock* const loopEntry = m_compiler->optLoopTable[m_loopNum].lpEntry;

                // Todo: use knowledge of where tree points to avoid considering some memory kinds
                //
                for (MemoryKind memoryKind : allMemoryKinds())
                {
                    ValueNum loopMemoryVN = m_compiler->GetMemoryPerSsaData(loopEntry->bbMemorySsaNumIn[memoryKind])->m_vnPair.GetLiberal();
                    if (!m_compiler->optVNIsLoopInvariant(loopMemoryVN, m_loopNum, &m_hoistContext->m_curLoopVnInvariantCache))
                    {
                        return false;
                    }
                }
            }

            return true;
        }

@jakobbotsch
Copy link
Member Author

I tried it in https://github.com/dotnet/runtime/compare/main...jakobbotsch:hoist-opt-trees?expand=1.
However, there are other trees where this is a problem e.g. casts/phis/initial values (see test case above). So I wasn't sure if it was the right way to fix it.

@AndyAyersMS
Copy link
Member

@briansull as Jakob noted the general problem is that there's a tree with a VN that is loop invariant, but the tree will not evaluate to the same VN if it's hoisted out of the loop.

I don't think we can const/copy prop all these cases away, nor should we have one optimization depend on another to avoid unsafe behavior. Hoisting needs to be able to prove that the tree will have the same value once hoisted. And it seemingly can't do that without more information.

@jakobbotsch
Copy link
Member Author

To clarify a bit, the fix above does not actually rely on const/copy prop to clean things up. Rather, it recognizes when we are hoisting a constant by liberal VN and creates the constant directly. So instead of hoisting out an indirection it hoists out a constant node. Similarly, it tries to do the same for InitVal and phis, creating loads of local variables. I'm not sure if this part works correctly as I'm not sure if loading locals outside the loop will give the right values. Also, the fix has some weird side effects, for example sometimes we hoist the constant out only to end up spilling it and reloading it from the stack in the loop.

The current reason that things do not get cleaned by const prop/copy prop seems to be that in these cases we end up with conservative/liberal VNs that do not match, and const/copy prop works on conservative VNs. Which brings me to a couple of questions:

  1. I get that the liberal/conservative VN split is meant to ensure that we do not break type safety/memory safety in multithreaded scenarios. However I still don't understand when const prop would be unsafe with liberal VNs. Is there a simple example where things go wrong?
  2. If const prop cannot work with liberal VNs, then is my fix above even ok? It seems weird that hoisting should be allowed to optimize an indirection to a constant based on liberal VNs while const propagation is not allowed to.

@briansull
Copy link
Contributor

An indirection that reads a known constant from memory will get different ValueNumbers for liberal/conservative***. A simple constant node will (or should) have the same value for liberal/conservative ValueNumbers.

If you create a new GT_CONST node because you recognize that the liberal ValueNumber is a constant, you also should update the ValueNumberPair so that both liberal/conservative have the constant ValueNumber. That should fix the const prop issues that you saw.

*** - The reason for this behavior is that the liberal ValueNumber is allowed to assume that the method executes single-threaded, where as the conservative ValueNumber considers that a different thread can modify the memory being read, so that each indirection gets a different value number.

@briansull
Copy link
Contributor

briansull commented Jul 15, 2021

@AndyAyersMS I agree that we need to insure correctness when hoisting loop-invariant indirections.
We probably need to pessimize the current hoisting logic for correctness. But I haven't yet looked into the side table stuff that you are working on.

If the indirection loads a known constant we can transform the hoisted node into a constant and try to make it into a CSE. Another alternative would be to add a pass that goes through all the nodes and replaces the ones that have a constant ValueNumber with their actual constant, thus removing all of the indirections. This would also allow some additional folding/tree reordering optimizations to occur.

@AndyAyersMS
Copy link
Member

Here's what I have been working on: main...AndyAyersMS:FixLicmMemoryDependence

I think the optimizer side captures what I was thinking of pretty well. It should give the "right" level of dependence for loop hoisting. It's not fine-grained enough for other kinds of code motion; that's ok for now.

But I'm not super happy with the VN side. I was thinking of revising it to make clearer distinctions between reads and writes of memory, so that there aren't so many call site nullptrs floating about. Or maybe rely more on ambient state (we use compCurBB this way currently) -- we could do something similar for the tree we're trying to tag, so we don't have to pass it around everywhere.

It also seems fragile; if we miss one update on the VN side we'll draw the wrong conclusion and possibly do an invalid hoist, and all parties need to implicitly agree on which trees read memory (or at least read memory that can be disambiguated the way VN does it).

One other angle to pursue here is to see how far back this bug goes; my guess is it has been there for a while now. Perhaps that reduces the urgency of a fix...?

@briansull
Copy link
Contributor

briansull commented Jul 15, 2021

I don't think that I implemented any of this hoisting code
I think this was introduced by Joe, in this change that you pointed at: dotnet/coreclr#6747

@jakobbotsch
Copy link
Member Author

@briansull

If you create a new GT_CONST node because you recognize that the liberal ValueNumber is a constant, you also should update the ValueNumberPair so that both liberal/conservative have the constant ValueNumber. That should fix the const prop issues that you saw.

The code does give the same VN number for both numbers of the pair, I didn't have any const prop issues. But I'm confused why const prop needs to use the conservative number instead of the liberal number, and why we should be allowed to use the liberal number in hoisting to do this optimization if const prop cannot. Is there a simple example where const prop would break memory safety if we allowed it to use liberal VNs? Or is it simply that we don't want to do the optimization because users might rely on current behavior in multi-threaded scenarios?

If the indirection loads a known constant we can transform the hoisted node into a constant and try to make it into a CSE. Another alternative would be to add a pass that goes through all the nodes and replaces the ones that have a constant ValueNumber with their actual constant, thus removing all of the indirections. This would also allow some additional folding/tree reordering optimizations to occur.

This does not cover the other cases that fail. For example:

[MethodImpl(MethodImplOptions.NoInlining)]
static bool TestPhiArr()
{
    int[] arr = { -1 };
    int val = -1;
    for (int i = 0; i < 2; i++)
    {
        for (int j = 0; j < 2; j++)
        {
            arr[0] = i;
            val = arr[0];
        }
    }

    return val == 1;
}

@AndyAyersMS

One other angle to pursue here is to see how far back this bug goes; my guess is it has been there for a while now. Perhaps that reduces the urgency of a fix...?

In .NET core 1.0 all the cases pass. in .NET core 1.1 the byref cases start to fail. In .NET core 2.0 the cls var cases (except the constant one, covered by dotnet/coreclr#26952) start failing. And finally in .NET 5, the array ones start failing. So it seems like a pretty old problem at this point.

@AndyAyersMS
Copy link
Member

Using an ambient tree (compCurTree) for VN greatly simplifies the plumbing needed, but there's another issue -- VNF_MapStore VNs do not record their loop dependence; they go in the CEA_Func3 chunks which do not support recording loops.

This seems fixable, but possibly requires a new range of VN chunks just for VNF_MapStore, unless I can find some clever way to share the CEA_None chunk used by the opaque VNs...

@AndyAyersMS
Copy link
Member

I suppose another option is to put VNF_MapStore into CEA_Func4 and use the extra arg to record the loop number. But this might cause other problems.

@AndyAyersMS
Copy link
Member

FWIW here's the latest

main...AndyAyersMS:FixLicmMemoryDependenceAmbientVnTree

As you can see the plumbing in VN is minimal.

It works on cases where the loop-dependent memory update is the first update in the loop... so most of the tests above. More complex patterns (with multiple disambiguable updates) or with statics aren't fixed yet because of the MapStore issue above.

@briansull
Copy link
Contributor

@AndyAyersMS The latest changes look good. It is best to keep the VN changes as small as possible.

@briansull briansull changed the title Invalid hoisting with refs/array elements proven to be constant Invalid hoisting with refs/array elements proven to be loop invariant Jul 16, 2021
@briansull briansull changed the title Invalid hoisting with refs/array elements proven to be loop invariant Invalid hoisting of indirections proven to be loop invariant Jul 16, 2021
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Jul 16, 2021
Use a unary function to capture opaque VN loop dependence, rather than factoring
that out as part of the owning chunk. As a result the VN chunk data no longer
has a per-loop component (no dependence on MAX_LOOP_NUM).

This gives us the flexibility to address dotnet#54118 by doing something similar for
`MapUpdate` VNs.
@AndyAyersMS
Copy link
Member

I've updated my fix

main...AndyAyersMS:FixLicmMemoryDependenceAmbientVnTree

to build upon #55853; it passes all the test cases here with minimal diffs per SPMI (5 methods total with diffs). Will put up a PR for this once #55853 is in and I've added the test cases.

AndyAyersMS added a commit that referenced this issue Jul 19, 2021
Use a unary function to capture opaque VN loop dependence, rather than factoring
that out as part of the owning chunk. As a result the VN chunk data no longer
has a per-loop component (no dependence on MAX_LOOP_NUM).

This gives us the flexibility to address #54118 by doing something similar for
`MapUpdate` VNs.
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Jul 19, 2021
Leverage value numbering's alias analysis to annotate trees with the loop
memory dependence of the tree's value number.

First, refactor the `mapStore` value number so that it also tracks the loop
number where the store occurs. This is done via an extra non-value-num arg,
so add appropriate bypasses to logic in the jit that expect to only find
value number args. Also update the dumping to display the loop information.

Next, during VN computation, record loop memory dependence from `mapStores`
with the tree currently being value numbered, whenever a value number comes
from a particular map. There may be multiple such recording events per tree,
so add logic on the recording side to track the most constraining dependence.
Note value numbering happens in execution order, so there is an unambiguous
current tree being value numbered.

This dependence info is tracked via a side map.

Finally, during hoisting, for each potentially hoistable tree, consult the side
map to recover the loop memory dependence of a tree, and if that dependence is
at or within the loop that we're hoisting from, block the hoist.

I've also absorbed the former class var (static field) hosting exclusion into
this new logic. This gives us slightly more relaxed dependence in some cases.

Resolves dotnet#54118.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jul 19, 2021
AndyAyersMS added a commit that referenced this issue Jul 22, 2021
…5936)

Leverage value numbering's alias analysis to annotate trees with the loop
memory dependence of the tree's value number.

First, refactor the `mapStore` value number so that it also tracks the loop
number where the store occurs. This is done via an extra non-value-num arg,
so add appropriate bypasses to logic in the jit that expect to only find
value number args. Also update the dumping to display the loop information.

Next, during VN computation, record loop memory dependence from `mapStores`
with the tree currently being value numbered, whenever a value number comes
from a particular map. There may be multiple such recording events per tree,
so add logic on the recording side to track the most constraining dependence.
Note value numbering happens in execution order, so there is an unambiguous
current tree being value numbered.

This dependence info is tracked via a side map.

Finally, during hoisting, for each potentially hoistable tree, consult the side
map to recover the loop memory dependence of a tree, and if that dependence is
at or within the loop that we're hoisting from, block the hoist.

I've also absorbed the former class var (static field) hosting exclusion into
this new logic. This gives us slightly more relaxed dependence in some cases.

Resolves #54118.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jul 22, 2021
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Jul 27, 2021
If a loop is removed (because of unrolling) then the loop dependence
tracking introduced in dotnet#55936 and dotnet#56184 may not properly update.

So when a loop is removed, walk up the chain of parent loops looking
for one that is not removed, and record the dependence on that parent.

Addresses last part of dotnet#54118.
AndyAyersMS added a commit that referenced this issue Jul 28, 2021
…56436)

If a loop is removed (because of unrolling) then the loop dependence
tracking introduced in #55936 and #56184 may not properly update.

So when a loop is removed, walk up the chain of parent loops looking
for one that is not removed, and record the dependence on that parent.

Addresses last part of #54118.
@ghost ghost locked as resolved and limited conversation to collaborators Aug 21, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
5 participants