Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve struct inits. #52292

Merged
merged 2 commits into from
May 8, 2021
Merged

Improve struct inits. #52292

merged 2 commits into from
May 8, 2021

Conversation

sandreenko
Copy link
Contributor

@sandreenko sandreenko commented May 5, 2021

There is 1 change:

  1. Keep ASG struct(LCL_VAR, 0) as STORE_LCL_VAR struct(0).
Small positive diffs (spmi x64):

benchmarks.run.windows.x64.checked.mch
Total bytes of delta: -86 (-0.07% of base)
88 total files with Code Size differences (87 improved, 1 regressed), 13 unchanged.

libraries.crossgen.windows.x64.checked.mch
Total bytes of delta: -152 (-0.16% of base)
142 total files with Code Size differences (141 improved, 1 regressed), 26 unchanged.

libraries.crossgen2.windows.x64.checked.mch
Total bytes of delta: -182 (-0.03% of base)
196 total files with Code Size differences (177 improved, 19 regressed), 864 unchanged.

libraries.pmi.windows.x64.checked.mch
Total bytes of delta: -576 (-0.08% of base)
573 total files with Code Size differences (565 improved, 8 regressed), 20 unchanged.

tests.pmi.windows.x64.checked.mch
Total bytes of delta: -11384 (-0.34% of base)
1951 total files with Code Size differences (1926 improved, 25 regressed), 255 unchanged.

a typical improvement looks like:

before:
N001 (  1,  1) [000020] ------------        t20 =    CNS_INT   int    0 $41
               [000201] Dc-----N----       t201 =    LCL_VAR_ADDR byref  V02 tmp2         
                                                  *    ubyte  V02.m_value (offs=0x00) -> V13 tmp13        
                                                  /--*  t201   byref  
                                                  +--*  t20    int    
N003 (  5,  4) [000021] sA----------              *  STORE_BLK struct<System.Data.SqlTypes.SqlBoolean, 1> (init) (Unroll)

codegen:
IN001a: 000070 xor      ecx, ecx
IN001b: 000072 mov      byte  ptr [rsp+30H], cl

after:
N001 (  1,  1) [000020] -c----------        t20 =    CNS_INT   int    0 $41
                                                  /--*  t20    int    
N003 (  5,  4) [000021] DA--G-------              *  STORE_LCL_VAR struct<System.Data.SqlTypes.SqlBoolean, 1>(AX)(P) V02 tmp2 

codegen:
IN0017: 00006D mov      byte  ptr [rsp+30H], 0

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 5, 2021
Copy link
Contributor Author

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL @echesakovMSFT @tannergooding , cc @dotnet/jit-contrib


// Check if type1 matches any type from the list.
template <typename... T>
bool TypeIsInList(var_types type1, var_types type2, T... rest)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndyAyersMS was right in #43386 (comment).
I am looking for a better name for this method, could not keep TypeIs because that would be a conflict inside GenTree methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior art here is GenTree::StaticOperIs, I personally do not like that name.

I would be voting for V/varTypeIs (both cases are in use today...).

var_types simdType = location->TypeGet();
GenTree* initVal = assignment->AsOp()->gtOp2;
CorInfoType simdBaseJitType = comp->getBaseJitTypeOfSIMDLocal(location);
if (simdBaseJitType != CORINFO_TYPE_UNDEF)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have never had simdBaseJitType == CORINFO_TYPE_UNDEF on x64, otherwise it would be an assert later on

codegenxarch.cpp:4439:
assert(varTypeUsesFloatReg(targetType) == varTypeUsesFloatReg(op1Type));

from a tree like:

STORE_LCL_VAR simd16 V01
\--* CNS_INT 0

Copy link
Member

@tannergooding tannergooding May 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is possible, but we handle this in register allocation by lying about the type:
https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lsra.cpp#L6995-L7000

    if (simdNode->GetSimdBaseJitType() == CORINFO_TYPE_UNDEF)
    {
        // There are a few scenarios where we can get a LCL_VAR which
        // doesn't know the underlying baseType. In that scenario, we
        // will just lie and say it is a float. Codegen doesn't actually
        // care what the type is but this avoids an assert that would
        // otherwise be fired from the more general checks that happen.
        simdNode->SetSimdBaseJitType(CORINFO_TYPE_FLOAT);
    }

Namely there are a variety of scenarios where the base type might not get copied around for things like SIMD locals and where we lose the class handle and type information.

Ideally, we would preserve this 100% of the time, since its needed by things like CSE, but we don't today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The potential issue was:
ASG SIMD(LCL_VAR , 0) with simdBaseJitType == CORINFO_TYPE_UNDEF comes to rationalize and does not create STORE_LCL_VAR(SIMD(0)) leaving STORE_LCL_VAR(0).
code in lsra does not matter, it won't create a new node because it is after rationalize.
codegenxarch fails.

These lyings could be also deleted now (will push a change soon), maybe it will require to add if ((simdNode->gtSIMDIntrinsicID == SIMDIntrinsicUpperSave) || (simdNode->gtSIMDIntrinsicID != SIMDIntrinsicUpperRestore)) to simdcodegenxarch.cpp:genSIMDIntrinsic as we have this condition for arm64.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ASG SIMD(LCL_VAR , 0)

Is an interesting tree. In my testing (wrote a checker that looks for all ASG SIMD trees with mismatched types of LHS and RHS and ran it through SPMI), it only comes up when you initobj/initblk a SIMD local, and even then morph transforms it into ASG SIMD(LCL_VAR, SIMD(0)). I am curious to know if you know of other sources because my (WIP) fix for #51500 is to never import such ASGs in the first place.

Copy link
Contributor Author

@sandreenko sandreenko May 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, this code exists here to catch what morph did not for some reason, for example:

repro-7984.zip
unzip it and run like

"D:\Sergey\git\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\superpmi.exe" D:\Sergey\git\runtime\artifacts\obj\coreclr\windows.x64.Checked\ide\jit\Debug\clrjit.dll(or any jit that you want to use) "repro-7984.mc" -jitoption force AltJit= -jitoption force AltJitNgen= -jitoption force JitDump=*   -jitoption force NgenDump=* 

to see the dump, it will show you that in this case the issue is in fgMorphBlock that does not call morphTree for the trees that it creates, @kunalspathak recently pointed me to another issue caused by the same source.
I have decided not to hack fgMorphBlock anymore to squeeze positive diffs(and support struct enreg), so I am rewriting it now. However, it will take time and for this PR I want to keep this "catch morph failures" block in place.

fgMorphInitBlock: using field by field initialization.
GenTreeNode creates assertion:
               [000079] IA----------              *  ASG       simd12 (init)
In BB01 New Local Constant Assertion: V06 == 0 index=#01, mask=0000000000000001
GenTreeNode creates assertion:
               [000082] -A----------              *  ASG       float
In BB01 New Local Constant Assertion: V07 == 0.000000 index=#02, mask=0000000000000002
fgMorphInitBlock (after):
               [000083] -A---+------              *  COMMA     void
               [000079] IA----------              +--*  ASG       simd12 (init) <- forgot to call fgMorphTree :-( 
               [000077] D------N----              |  +--*  LCL_VAR   simd12<System.Numerics.Vector3> V06 tmp4
               [000078] ------------              |  \--*  CNS_INT   simd12 0
               [000082] -A----------              \--*  ASG       float
               [000080] D------N----                 +--*  LCL_VAR   float  V07 tmp5
               [000081] ------------                 \--*  CNS_DBL   float  0.00000000000000000

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have decided not to hack fgMorphBlock anymore to squeeze positive diffs(and support struct enreg), so I am rewriting it now. However, it will take time.

I believe you 😄. Good luck!

it will show you that in this case the issue is in fgMorphBlock that does not call morphTree for the trees that it creates

Thanks!

}
else if (!src->OperIs(GT_LCL_VAR) || (varDsc->GetLayout()->GetRegisterType() == TYP_UNDEF))
else if (!varDsc->IsEnregisterable())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main goal of this change is to keep ASG(LCL_VAR, 0) as STORE_LCL_VAR(0) without taking the address of the lcl.

@sandreenko sandreenko marked this pull request as ready for review May 5, 2021 16:13
// these implemented over lclVars created by CSE without full handle information (and
// therefore potentially without a base type).
if ((simdNode->gtSIMDIntrinsicID != SIMDIntrinsicUpperSave) &&
if (!TypeIsInList(simdNode->GetSimdBaseType(), TYP_INT, TYP_LONG, TYP_FLOAT, TYP_DOUBLE, TYP_USHORT, TYP_UBYTE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is probably better served by one that does a value >= TYP_BYTE && value <= TYP_ULONG (I think we have an existing helper for this already), since these values are sequentially ordered in the enum.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the relevant helper is varTypeIsArithmetic, although it also allows TYP_BOOL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

varTypeIsArithmetic is the one that the rest of the SIMD checks use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have tests for TYP_BOOL SIMD intrinsic? why don't they fail nowadays on this check?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we filter it out by resolving the actual handle for all generic types during import: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/simd.cpp#L128

If the CorInfoType isn't recognized here, then we return CORINFO_TYPE_UNDEF and that is ultimately filtered out and treated as "unsupported":

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yes, we have tests (such as https://github.com/dotnet/runtime/tree/main/src/tests/JIT/HardwareIntrinsics/General/NotSupported) that covers types like bool and nint/nuint (the latter two of which are only supported on Vector<T> and not yet on Vector64/128/256<T>)

}

const GenTree* op1 = simdNode->gtGetOp1();
if ((simdNode->gtSIMDIntrinsicID == SIMDIntrinsicInit) && (op1->IsIntegralConst(0) || op1->IsFPZero()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have GenTree::IsSIMDZero that should be preferred here, since it will be updated to include the get_Zero HWIntrinsics in the future: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/gentree.h#L7061

@@ -1042,30 +1030,30 @@ bool Compiler::impCheckImplicitArgumentCoercion(var_types sigType, var_types nod
return true;
}

if (TypeIs(sigType, TYP_BOOL, TYP_UBYTE, TYP_BYTE, TYP_USHORT, TYP_SHORT, TYP_UINT, TYP_INT))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypeIs matches the names we have elsewhere, like OperIs. I don't think its worth changing without doing considering a standard approach for the entire JIT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that TypeIs is declared for GenTree as a member function, so if we add a global TypeIs (we can't declare methods inside enum, even if it is a enum class) for each GenTree:method() { TypeIs() } we will deal with c++ overload resolution that will choose between GenTree:TypeIs() and global ::TypeIs() that could lead to errors.

@@ -446,7 +446,7 @@ void CodeGen::genSIMDScalarMove(
}
else
{
genSIMDZero(targetType, TYP_FLOAT, targetReg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly worried about how this might impact CSE, particularly since (AFAIR) it relies on having the handle available (which requires the baseType to resolve)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are (and I think the intent for them is to stay that way) all contained to codegen, so this shouldn't be a problem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be fine for this case, but it does lose the ability to generate different instructions based on type (which we somewhat do for hwintrinsics).

I guess a bit of it will be moot when I eventually port this over to the SimdAsHWIntrinsic path...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be fine for this case, but it does lose the ability to generate different instructions based on type (which we somewhat do for hwintrinsics).

My understanding is that some intrinsic generate different code depending on their base type, for example, init(SIMD16, 1) will need to know the base type to know how many elements are in the SIMD16.
The other intrinsics in some cases do not depend on base type, like SIMDIntrinsicInit(0), so we should not include baseType in CSE/VN considerations.

It might be fine for this case, but it does lose the ability to generate different instructions based on type (which we somewhat do for hwintrinsics).

I don't see what is lost here, the generated code does not depend on the base type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple instructions for zeroing and while they are treated the same on most modern CPUs, not all CPUs do.

And so, it may be that certain hardware we want to generate pxor or xorpd instead of xorps. Likewise, if we ever add AVX-512 or SVE support, we may want different logic here for various masking or other operations that can occur.
and so carrying the type through to codegen is goodness IMO, even if today we happen to ignore it.

Copy link
Member

@tannergooding tannergooding May 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For both TYP_SIMD16 and TYP_SIMD32 there are the following "simd zeroing" instructions:

  • xorps - float
  • xorpd - double
  • pxor - integer
    Where xmm is used for TYP_SIMD16 and ymm is used for TYP_SIMD32

On modern Intel/AMD (the architecture manuals list Silvermont/Ryzen and newer), all three of these are dependency breaking/zeroing idioms and should actually be replaced in the renamer and so should be equivalent/interchangeable.

  • Noting that Silvermont is listed under the Intel Atom section, I didn't see anything listed under the non-Atom sections

However, on some hardware (particularly "older" hardware, such as Intel Core microarchitecture), the recommendation is to not mix floating-point and integer code:

3.5.1.10 Mixing SIMD Data Type
Previous microarchitectures (before Intel Core microarchitecture) do not have explicit restrictions on mixing integer and floating-point (FP) operations on XMM registers. For Intel Core microarchitecture, mixing integer and floating-point operations on the content of an XMM register can degrade perfor�mance. Software should avoid mixed-use of integer/FP operation on XMM registers. Specifically:
• Use SIMD integer operations to feed SIMD integer operations. Use PXOR for idiom.
• Use SIMD floating-point operations to feed SIMD floating-point operations. Use XORPS for idiom.
• When floating-point operations are bitwise equivalent, use PS data type instead of PD data type.
MOVAPS and MOVAPD do the same thing, but MOVAPS takes one less byte to encode the instruction.

Likewise, once you get to AVX-512, there are additional instructions such as pxord and pxorq, which get used instead because it is impactful for masked instructions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the explanation, so would you prefer to lie about the type when it is not available to better support a future possibility of an arch where void genSIMDZero(var_types targetType, var_types baseType, regNumber targetReg); will use base_type?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer we always pass the "correct" type for the operation here and to finish plumbing through things like GenTreeJitIntrinsic->m_classLayout, but short of that the existing behavior of lying works well enough today so we can still generate something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I did not clearly describe the scenario that I want to support:

struct RandomStructWith16ByteSize
{
  int a;
  float b;
  int c;
  float d;
}

and we do:

RandomStructWith16ByteSize a = new RandomStructWith16ByteSize(); // zero init

if a can be put into an xmm register I want it to be there and generate a zero-init when there is no existing "correct type".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should just explicitly support TYP_UNDEF and we can handle it as appropriate (which is currently treating it as TYP_FLOAT) in codegen ?

For context, I've been slowly doing work to get rid of the older SIMD support (like SimdIntrinsicInit) and replacing it with the newer HWIntrinsic support. When that happens, genSimdZero will be replaced with NI_Vector64_get_Zero, NI_Vector128_get_Zero, and NI_Vector256_get_zero so everything can be centrally handled. -- #35421 and #37882 are the two past PRs and #52288 is the latest

At that point, the codegen will generate different instructions based on the type by default and we'll need to handle TYP_UNDEF somehow anyways (different HWIntrinsics have different contracts for what instruction they'll generate, since they are mostly a 1-to-1 mapping with the hardware).

@sandreenko
Copy link
Contributor Author

The regressions are covered by #52286

@sandreenko
Copy link
Contributor Author

The PR was updated based on the previous discussion.

@sandreenko
Copy link
Contributor Author

@dotnet/jit-contrib Could somebody please take a look?

}
GenTreeSIMD* simdTree =
comp->gtNewSIMDNode(regType, src, SIMDIntrinsicInit, simdBaseJitType, varDsc->lvExactSize);
BlockRange().InsertAfter(src, simdTree);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we lower the newly created simdTree so containment and other transforms can happen as appropriate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, thank you! It fixes a todo that I added because now we have SIMD16 instead of SIMD12.

@sandreenko
Copy link
Contributor Author

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@sandreenko sandreenko merged commit d0b9864 into dotnet:main May 8, 2021
@sandreenko sandreenko deleted the improveSIMDInit branch May 8, 2021 00:34
@sandreenko sandreenko mentioned this pull request May 10, 2021
10 tasks
@karelz karelz added this to the 6.0.0 milestone May 20, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jun 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants