Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: track memory loop dependence of trees during value numbering #55936

Merged

Conversation

AndyAyersMS
Copy link
Member

Leverage value numbering's alias analysis to annotate trees with the loop
memory dependence of the tree's value number.

First, refactor the mapStore value number so that it also tracks the loop
number where the store occurs. This is done via an extra non-value-num arg,
so add appropriate bypasses to logic in the jit that expect to only find
value number args. Also update the dumping to display the loop information.

Next, during VN computation, record loop memory dependence from mapStores
with the tree currently being value numbered, whenever a value number comes
from a particular map. There may be multiple such recording events per tree,
so add logic on the recording side to track the most constraining dependence.
Note value numbering happens in execution order, so there is an unambiguous
current tree being value numbered.

This dependence info is tracked via a side map.

Finally, during hoisting, for each potentially hoistable tree, consult the side
map to recover the loop memory dependence of a tree, and if that dependence is
at or within the loop that we're hoisting from, block the hoist.

I've also absorbed the former class var (static field) hosting exclusion into
this new logic. This gives us slightly more relaxed dependence in some cases.

Resolves #54118.

Leverage value numbering's alias analysis to annotate trees with the loop
memory dependence of the tree's value number.

First, refactor the `mapStore` value number so that it also tracks the loop
number where the store occurs. This is done via an extra non-value-num arg,
so add appropriate bypasses to logic in the jit that expect to only find
value number args. Also update the dumping to display the loop information.

Next, during VN computation, record loop memory dependence from `mapStores`
with the tree currently being value numbered, whenever a value number comes
from a particular map. There may be multiple such recording events per tree,
so add logic on the recording side to track the most constraining dependence.
Note value numbering happens in execution order, so there is an unambiguous
current tree being value numbered.

This dependence info is tracked via a side map.

Finally, during hoisting, for each potentially hoistable tree, consult the side
map to recover the loop memory dependence of a tree, and if that dependence is
at or within the loop that we're hoisting from, block the hoist.

I've also absorbed the former class var (static field) hosting exclusion into
this new logic. This gives us slightly more relaxed dependence in some cases.

Resolves dotnet#54118.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 19, 2021
@AndyAyersMS
Copy link
Member Author

@briansull @jakobbotsch PTAL
cc @dotnet/jit-contrib

Passes the more elaborate test case from #54118 (added here).

5 methods with SPMI diffs (not counting the new test). Will say more about them in a follow-up note a bit later today.

benchmarks.run.windows.x64.checked.mch:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 407
Total bytes of diff: 399
Total bytes of delta: -8 (-1.97% of base)
Total relative delta: -0.02
    diff is an improvement.
    relative diff is an improvement.
Detail diffs


Top file improvements (bytes):
          -8 : 26074.dasm (-1.97% of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
          -8 (-1.97% of base) : 26074.dasm - V8.Crypto.RSAKey:pkcs1pad2(System.String,int):V8.Crypto.BigInteger

Top method improvements (percentages):
          -8 (-1.97% of base) : 26074.dasm - V8.Crypto.RSAKey:pkcs1pad2(System.String,int):V8.Crypto.BigInteger

1 total methods with Code Size differences (1 improved, 0 regressed), 0 unchanged.


coreclr_tests.pmi.windows.x64.checked.mch:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 2649
Total bytes of diff: 2638
Total bytes of delta: -11 (-0.42% of base)
Total relative delta: -0.07
    diff is an improvement.
    relative diff is an improvement.
Detail diffs


Top file regressions (bytes):
           3 : 86555.dasm (1.76% of base)
           2 : 85392.dasm (0.99% of base)

Top file improvements (bytes):
          -8 : 219848.dasm (-1.97% of base)
          -5 : 86566.dasm (-6.33% of base)
          -3 : 86074.dasm (-1.68% of base)

5 total files with Code Size differences (3 improved, 2 regressed), 1 unchanged.

Top method regressions (bytes):
           3 ( 1.76% of base) : 86555.dasm - T:Main():int
           2 ( 0.99% of base) : 85392.dasm - T:Main():int

Top method improvements (bytes):
          -8 (-1.97% of base) : 219848.dasm - V8.Crypto.RSAKey:pkcs1pad2(System.String,int):V8.Crypto.BigInteger
          -5 (-6.33% of base) : 86566.dasm - test:Main():int
          -3 (-1.68% of base) : 86074.dasm - ILGEN_0x981b6a55:Method_0x9d35bca7():float

Top method regressions (percentages):
           3 ( 1.76% of base) : 86555.dasm - T:Main():int
           2 ( 0.99% of base) : 85392.dasm - T:Main():int

Top method improvements (percentages):
          -5 (-6.33% of base) : 86566.dasm - test:Main():int
          -8 (-1.97% of base) : 219848.dasm - V8.Crypto.RSAKey:pkcs1pad2(System.String,int):V8.Crypto.BigInteger
          -3 (-1.68% of base) : 86074.dasm - ILGEN_0x981b6a55:Method_0x9d35bca7():float

5 total methods with Code Size differences (3 improved, 2 regressed), 1 unchanged.


@AndyAyersMS
Copy link
Member Author

;; 86566 (source unknown) -- we now hoist load of a static var out of a loop. Since there are no opaque memory accesses / synchronization points in the loop, this seems ok. Similar diffs in 85392.

BeforeAfter
G_M57269_IG02:
       mov      dword ptr [CLASS-VAR], 5
       align    [13 bytes]
 
 
G_M57269_IG03:
       mov      eax, dword ptr [CLASS-VAR]
       mov      edx, eax
       inc      ecx
       jne      SHORT G_M57269_IG05
       movsxd   rax, eax
       mov      esi, dword ptr [rax]
  
G_M57269_IG05:
       cmp      ecx, 7
       jl       SHORT G_M57269_IG03
G_M57269_IG02:
       mov      dword ptr [CLASS-VAR], 5
       mov      eax, dword ptr [CLASS-VAR]
       align    [7 bytes]
  
G_M57269_IG03:
       mov      edx, eax
       inc      ecx
       jne      SHORT G_M57269_IG05
       movsxd   r8, eax
       mov      esi, dword ptr [r8]
  
  
G_M57269_IG05:
       cmp      ecx, 7
       jl       SHORT G_M57269_IG03

;; 219848 -- V8.Crypto.RSAKey:pkcs1pad2

We previously moved a read past a write in a fully unrolled loop, now we don't.

BeforeAfter
       movzx    r15, byte  ptr [rbx+16]
 
G_M31322_IG07:
       mov      byte  ptr [rbx+16], 0
       test     r15d, r15d
       jne      SHORT G_M31322_IG11
 
 
G_M31322_IG07:
       mov      byte  ptr [rbx+16], 0
       cmp      byte  ptr [rbx+16], 0
       jne      SHORT G_M31322_IG11

@AndyAyersMS
Copy link
Member Author

x86 test failure seems unrelated; looks like an instance of #54469.

tracing\\eventpipe\\reverseouter\\reverseouter\\reverseouter.cmd

          7.3s: Assert failure(PID 5424 [0x00001530], Thread: 3364 [0x0d24]): Consistency check failed: AV in clr at this callstack:
          7.3s: ------
          7.3s: CORECLR! AllocMemTracker::~AllocMemTracker + 0xFE (0x70a3a1d5)
          7.3s: CORECLR! IBCLoggerAwareAllocMemTracker::~IBCLoggerAwareAllocMemTracker + 0xDC (0x707b0799)
          7.3s: CORECLR! ClassLoader::DoIncrementalLoad + 0x437 (0x70576fda)
          7.3s: CORECLR! ClassLoader::LoadTypeHandleForTypeKey_Body + 0x894 (0x7057fd27)
          7.3s: CORECLR! ClassLoader::LoadTypeHandleForTypeKey + 0x19A (0x7057ef6b)
          7.3s: CORECLR! ClassLoader::LoadConstructedTypeThrowing + 0x63D (0x7057bb38)
          7.3s: CORECLR! ClassLoader::LoadGenericInstantiationThrowing + 0x4DF (0x7057c4e1)
          7.3s: CORECLR! SigPointer::GetTypeHandleThrowing + 0xE92 (0x706aedf5)

@AndyAyersMS AndyAyersMS requested a review from briansull July 19, 2021 19:46
// The map provides the entry block of the most closely enclosing loop that
// defines the memory region accessed when defining the nodes's VN.
//
// This information should consulted when considering hoisting node out of a loop, as the VN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should => should be

//
bool IsTreeLoopMemoryInvariant(GenTree* tree)
{
if (tree->OperIsIndir() && ((tree->gtFlags & GTF_IND_INVARIANT) != 0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add a comment, stating that we early out returning true for any GT_IND marked as Invariant

Copy link
Member Author

@AndyAyersMS AndyAyersMS Jul 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we probably don't need this bail out as we should be recording that invariant VNs indirs are dependent on an invariant memory state. So I'll likely just delete this and verify we still get the same results.

return true;
}

// Todo: other operators that read memory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to have a ToDo here, then I think that we have a convention on how we write ToDo's

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a vestige from an earlier version where VN wasn't annotating all trees. But now it does.

So here I think we can just check all trees. However I believe calls are handled specially during hoisting, so we may still need an early bail-out for calls.

@AndyAyersMS
Copy link
Member Author

cc @dotnet/jit-contrib it might be interesting for more of you to look this one over, given our ongoing discussions of VN.

Copy link
Contributor

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks Good

@AndyAyersMS
Copy link
Member Author

Forgot to mention that the revised version has the same set of SPMI diffs as the first version.

@AndyAyersMS AndyAyersMS merged commit 1b4f786 into dotnet:main Jul 22, 2021
@AndyAyersMS AndyAyersMS deleted the FixLicmMemoryDependenceAmbientVnTree branch July 22, 2021 01:03
@jakobbotsch
Copy link
Member

I just rechecked my examples with newest fixes from master and it looks like there are still some unhandled cases:

// Generated by Fuzzlyn v1.2 on 2021-07-22 13:37:14
// Seed: 14815563263006255362
// Reduced from 12.9 KiB to 0.4 KiB in 00:00:17
// Debug: Outputs 1
// Release: Outputs 0
public class Program
{
    static short[] s_2;
    public static void Main()
    {
        byte[] vr7 = new byte[]{0};
        bool vr11 = default(bool);
        for (int vr9 = 0; vr9 < 2; vr9++)
        {
            if (vr11)
            {
                s_2[0] = 0;
            }

            vr7[0] = 1;
            byte vr10 = vr7[0];
            System.Console.WriteLine(vr10);
        }
    }
}

Looks like they all involve some form of control flow before the pattern.

@AndyAyersMS
Copy link
Member Author

Thanks, will take a look.

@AndyAyersMS
Copy link
Member Author

Seems like we need to add similar loop dependence tracking to VNF_PhiDef.

int[] arr = { -1 };
ref int r = ref arr[0];
int val = -1;
for (int i = 0; i < 2; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To test loop-dependent VN, these tests all depend on these loops remaining as loops. But Checked JIT loop unrolling under stress will probably fully unroll them. And what if we change the loop unrolling heuristics? Maybe the upper bound should be make an argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #56184

@@ -2136,7 +2136,7 @@ ValueNum ValueNumStore::VNForFunc(
assert(arg0VN == VNNormalValue(arg0VN));
assert(arg1VN == VNNormalValue(arg1VN));
assert(arg2VN == VNNormalValue(arg2VN));
assert(arg3VN == VNNormalValue(arg3VN));
assert((func == VNF_MapStore) || (arg3VN == VNNormalValue(arg3VN)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment above: "// Note: Currently the only four operand func is the VNF_PtrToArrElem operation" is no longer true

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed in #56184.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this pull request Jul 22, 2021
Specify `Overwrite` when setting loop dependence map entries, as we may
refine the initial result.

Fixes dotnet#56174.

Extract loop dependence of `VNF_PhiMemoryDef`.

Fixes new case noted in dotnet#55936, and 13/16 or so other cases Jakob sent
me privately. Also update a comment and fix tests to work better with
jitstress per other notes on that PR.
AndyAyersMS added a commit that referenced this pull request Jul 27, 2021
Specify `Overwrite` when setting loop dependence map entries, as we may
refine the initial result.

Fixes #56174.

Extract loop dependence of `VNF_PhiMemoryDef`.

Fixes new case noted in #55936, and 13/16 or so other cases Jakob sent
me privately. Also update a comment and fix tests to work better with
jitstress per other notes on that PR.
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this pull request Jul 27, 2021
If a loop is removed (because of unrolling) then the loop dependence
tracking introduced in dotnet#55936 and dotnet#56184 may not properly update.

So when a loop is removed, walk up the chain of parent loops looking
for one that is not removed, and record the dependence on that parent.

Addresses last part of dotnet#54118.
AndyAyersMS added a commit that referenced this pull request Jul 28, 2021
…56436)

If a loop is removed (because of unrolling) then the loop dependence
tracking introduced in #55936 and #56184 may not properly update.

So when a loop is removed, walk up the chain of parent loops looking
for one that is not removed, and record the dependence on that parent.

Addresses last part of #54118.
@ghost ghost locked as resolved and limited conversation to collaborators Aug 21, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Invalid hoisting of indirections proven to be loop invariant
4 participants