Improve struct inits. #52292
PTAL @echesakovMSFT @tannergooding , cc @dotnet/jit-contrib
src/coreclr/jit/vartype.h
Outdated
// Check if type1 matches any type from the list.
template <typename... T>
bool TypeIsInList(var_types type1, var_types type2, T... rest)
@AndyAyersMS was right in #43386 (comment).
I am looking for a better name for this method; I could not keep TypeIs because that would conflict with the GenTree methods.
Prior art here is GenTree::StaticOperIs, although I personally do not like that name.
I would vote for V/varTypeIs (both cases are in use today...).
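For readers following the naming discussion, here is a minimal sketch of what a variadic "type is in list" helper can look like. The enum below is a simplified stand-in for the JIT's var_types, and the body is an assumption for illustration, not the code from the PR.

```cpp
#include <cassert>

// Simplified stand-in for the JIT's var_types enum (hypothetical subset).
enum var_types
{
    TYP_UNDEF,
    TYP_INT,
    TYP_LONG,
    TYP_FLOAT,
    TYP_DOUBLE
};

// Base case: compare against a single candidate.
constexpr bool TypeIsInList(var_types type1, var_types type2)
{
    return type1 == type2;
}

// Recursive case: check the first candidate, then the remaining ones.
template <typename... T>
constexpr bool TypeIsInList(var_types type1, var_types type2, T... rest)
{
    return TypeIsInList(type1, type2) || TypeIsInList(type1, rest...);
}

// Compile-time checks of the sketch.
static_assert(TypeIsInList(TYP_FLOAT, TYP_INT, TYP_FLOAT), "TYP_FLOAT is in the list");
static_assert(!TypeIsInList(TYP_UNDEF, TYP_INT, TYP_FLOAT), "TYP_UNDEF is not in the list");
```

Whatever name is chosen, a call site would read along the lines of `TypeIsInList(type, TYP_INT, TYP_LONG)`.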
var_types simdType = location->TypeGet();
GenTree* initVal = assignment->AsOp()->gtOp2;
CorInfoType simdBaseJitType = comp->getBaseJitTypeOfSIMDLocal(location);
if (simdBaseJitType != CORINFO_TYPE_UNDEF)
We have never had simdBaseJitType == CORINFO_TYPE_UNDEF on x64; otherwise it would hit an assert later on, at codegenxarch.cpp:4439:
assert(varTypeUsesFloatReg(targetType) == varTypeUsesFloatReg(op1Type));
from a tree like:
STORE_LCL_VAR simd16 V01
\--* CNS_INT 0
This is possible, but we handle this in register allocation by lying about the type:
https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lsra.cpp#L6995-L7000
if (simdNode->GetSimdBaseJitType() == CORINFO_TYPE_UNDEF)
{
// There are a few scenarios where we can get a LCL_VAR which
// doesn't know the underlying baseType. In that scenario, we
// will just lie and say it is a float. Codegen doesn't actually
// care what the type is but this avoids an assert that would
// otherwise be fired from the more general checks that happen.
simdNode->SetSimdBaseJitType(CORINFO_TYPE_FLOAT);
}
Namely, there are a variety of scenarios where the base type might not get copied around for things like SIMD locals, and where we lose the class handle and type information.
Ideally, we would preserve this 100% of the time, since it's needed by things like CSE, but we don't today.
The potential issue was:
ASG SIMD(LCL_VAR, 0) with simdBaseJitType == CORINFO_TYPE_UNDEF comes to rationalize, which does not create STORE_LCL_VAR(SIMD(0)), leaving STORE_LCL_VAR(0).
The code in lsra does not matter: it won't create a new node because it runs after rationalize.
codegenxarch then fails.
These lies could also be deleted now (I will push a change soon); that may require adding if ((simdNode->gtSIMDIntrinsicID == SIMDIntrinsicUpperSave) || (simdNode->gtSIMDIntrinsicID != SIMDIntrinsicUpperRestore)) to simdcodegenxarch.cpp:genSIMDIntrinsic, as we have this condition for arm64.
ASG SIMD(LCL_VAR, 0) is an interesting tree. In my testing (I wrote a checker that looks for all ASG SIMD trees with mismatched types of LHS and RHS and ran it through SPMI), it only comes up when you initobj/initblk a SIMD local, and even then morph transforms it into ASG SIMD(LCL_VAR, SIMD(0)). I am curious whether you know of other sources, because my (WIP) fix for #51500 is to never import such ASGs in the first place.
Oh, this code exists here to catch what morph did not handle for some reason, for example:
repro-7984.zip
Unzip it and run it like this (clrjit.dll can be replaced with any JIT that you want to use):
"D:\Sergey\git\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\superpmi.exe" D:\Sergey\git\runtime\artifacts\obj\coreclr\windows.x64.Checked\ide\jit\Debug\clrjit.dll "repro-7984.mc" -jitoption force AltJit= -jitoption force AltJitNgen= -jitoption force JitDump=* -jitoption force NgenDump=*
The dump will show you that in this case the issue is in fgMorphBlock, which does not call morphTree for the trees that it creates. @kunalspathak recently pointed me to another issue with the same cause.
I have decided not to hack fgMorphBlock anymore to squeeze out positive diffs (and support struct enreg), so I am rewriting it now. However, that will take time, and for this PR I want to keep this "catch morph failures" block in place.
fgMorphInitBlock: using field by field initialization.
GenTreeNode creates assertion:
[000079] IA---------- * ASG simd12 (init)
In BB01 New Local Constant Assertion: V06 == 0 index=#01, mask=0000000000000001
GenTreeNode creates assertion:
[000082] -A---------- * ASG float
In BB01 New Local Constant Assertion: V07 == 0.000000 index=#02, mask=0000000000000002
fgMorphInitBlock (after):
[000083] -A---+------ * COMMA void
[000079] IA---------- +--* ASG simd12 (init) <- forgot to call fgMorphTree :-(
[000077] D------N---- | +--* LCL_VAR simd12<System.Numerics.Vector3> V06 tmp4
[000078] ------------ | \--* CNS_INT simd12 0
[000082] -A---------- \--* ASG float
[000080] D------N---- +--* LCL_VAR float V07 tmp5
[000081] ------------ \--* CNS_DBL float 0.00000000000000000
> I have decided not to hack fgMorphBlock anymore to squeeze positive diffs(and support struct enreg), so I am rewriting it now. However, it will take time.
I believe you 😄. Good luck!
> it will show you that in this case the issue is in fgMorphBlock that does not call morphTree for the trees that it creates
Thanks!
}
else if (!src->OperIs(GT_LCL_VAR) || (varDsc->GetLayout()->GetRegisterType() == TYP_UNDEF))
else if (!varDsc->IsEnregisterable())
The main goal of this change is to keep ASG(LCL_VAR, 0) as STORE_LCL_VAR(0) without taking the address of the lcl.
src/coreclr/jit/codegenarm64.cpp
Outdated
// these implemented over lclVars created by CSE without full handle information (and
// therefore potentially without a base type).
if ((simdNode->gtSIMDIntrinsicID != SIMDIntrinsicUpperSave) &&
if (!TypeIsInList(simdNode->GetSimdBaseType(), TYP_INT, TYP_LONG, TYP_FLOAT, TYP_DOUBLE, TYP_USHORT, TYP_UBYTE,
This check is probably better served by one that does a value >= TYP_BYTE && value <= TYP_ULONG
(I think we have an existing helper for this already), since these values are sequentially ordered in the enum.
I believe the relevant helper is varTypeIsArithmetic, although it also allows TYP_BOOL.
varTypeIsArithmetic is the one that the rest of the SIMD checks use.
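To illustrate the range-check suggestion, the sketch below contrasts it with an explicit list. The enum is a simplified stand-in whose ordering is assumed for illustration, and IsInIntegralRange is a hypothetical name, not an existing JIT helper.

```cpp
#include <cassert>

// Simplified stand-in for var_types. The ordering here is an assumption for
// illustration; the range check only works if the real enum is ordered this way.
enum var_types
{
    TYP_UNDEF,
    TYP_VOID,
    TYP_BOOL,
    TYP_BYTE,
    TYP_UBYTE,
    TYP_SHORT,
    TYP_USHORT,
    TYP_INT,
    TYP_UINT,
    TYP_LONG,
    TYP_ULONG,
    TYP_FLOAT,
    TYP_DOUBLE
};

// Hypothetical helper: one pair of comparisons instead of a long candidate list.
constexpr bool IsInIntegralRange(var_types value)
{
    return (value >= TYP_BYTE) && (value <= TYP_ULONG);
}

// Unlike a check that also admits TYP_BOOL, this range excludes it.
static_assert(IsInIntegralRange(TYP_USHORT), "integral types are inside the range");
static_assert(!IsInIntegralRange(TYP_BOOL), "TYP_BOOL sorts before TYP_BYTE");
static_assert(!IsInIntegralRange(TYP_FLOAT), "floating-point types need a wider range");
```

A range check like this silently changes meaning if the enum is ever reordered, which is one reason to prefer an existing, centrally maintained helper.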
Do we have tests for SIMD intrinsics with TYP_BOOL? If so, why don't they fail on this check today?
Because we filter it out by resolving the actual handle for all generic types during import: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/simd.cpp#L128
If the CorInfoType isn't recognized here, then we return CORINFO_TYPE_UNDEF and that is ultimately filtered out and treated as "unsupported":
And yes, we have tests (such as https://github.com/dotnet/runtime/tree/main/src/tests/JIT/HardwareIntrinsics/General/NotSupported) that cover types like bool and nint/nuint (the latter two of which are only supported on Vector<T> and not yet on Vector64/128/256<T>).
src/coreclr/jit/codegenarm64.cpp
Outdated
}

const GenTree* op1 = simdNode->gtGetOp1();
if ((simdNode->gtSIMDIntrinsicID == SIMDIntrinsicInit) && (op1->IsIntegralConst(0) || op1->IsFPZero()))
We have GenTree::IsSIMDZero that should be preferred here, since it will be updated to include the get_Zero HWIntrinsics in the future: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/gentree.h#L7061
src/coreclr/jit/importer.cpp
Outdated
@@ -1042,30 +1030,30 @@ bool Compiler::impCheckImplicitArgumentCoercion(var_types sigType, var_types nod
return true;
}

if (TypeIs(sigType, TYP_BOOL, TYP_UBYTE, TYP_BYTE, TYP_USHORT, TYP_SHORT, TYP_UINT, TYP_INT))
TypeIs matches the names we have elsewhere, like OperIs. I don't think it's worth changing without considering a standard approach for the entire JIT.
The problem is that TypeIs is declared for GenTree as a member function. If we add a global TypeIs (we can't declare methods inside an enum, even if it is an enum class), then in every GenTree::method() { TypeIs() } we will be relying on C++ name lookup and overload resolution to choose between GenTree::TypeIs() and the global ::TypeIs(), and that could lead to errors.
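The hazard can be sketched in isolation. In standard C++, an unqualified call inside a member function finds the member name first, which hides a global function of the same name. The Node type and both TypeIs helpers below are simplified, hypothetical stand-ins rather than the JIT's declarations.

```cpp
#include <cassert>

// Simplified stand-in for var_types (hypothetical subset).
enum var_types
{
    TYP_UNDEF,
    TYP_INT,
    TYP_LONG,
    TYP_FLOAT
};

// Global helper: does `type` match any of the candidates?
constexpr bool TypeIs(var_types type, var_types candidate)
{
    return type == candidate;
}

template <typename... T>
constexpr bool TypeIs(var_types type, var_types candidate, T... rest)
{
    return TypeIs(type, candidate) || TypeIs(type, rest...);
}

// Stand-in for GenTree with a member of the same name.
struct Node
{
    var_types gtType;

    // Member helper: does this node's own type match any of the candidates?
    constexpr bool TypeIs(var_types candidate) const
    {
        return gtType == candidate;
    }

    template <typename... T>
    constexpr bool TypeIs(var_types candidate, T... rest) const
    {
        return TypeIs(candidate) || TypeIs(rest...);
    }

    // Unqualified call: the member hides the global helper, so `other` is
    // silently treated as one more candidate for gtType.
    constexpr bool UnqualifiedCall(var_types other) const
    {
        return TypeIs(other, TYP_INT, TYP_LONG);
    }

    // Qualified call: explicitly selects the global helper.
    constexpr bool QualifiedCall(var_types other) const
    {
        return ::TypeIs(other, TYP_INT, TYP_LONG);
    }
};
```

Both calls compile without diagnostics, yet for a node of type TYP_FLOAT and `other == TYP_FLOAT` the unqualified call returns true while the qualified call returns false. A distinct name for the free function avoids this class of mistake.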
src/coreclr/jit/simdcodegenxarch.cpp
Outdated
@@ -446,7 +446,7 @@ void CodeGen::genSIMDScalarMove(
}
else
{
genSIMDZero(targetType, TYP_FLOAT, targetReg);
I'm slightly worried about how this might impact CSE, particularly since (AFAIR) it relies on having the handle available (which requires the baseType to resolve)
These changes are (and I think the intent for them is to stay that way) all contained to codegen, so this shouldn't be a problem?
It might be fine for this case, but it does lose the ability to generate different instructions based on type (which we somewhat do for hwintrinsics).
I guess a bit of it will be moot when I eventually port this over to the SimdAsHWIntrinsic path...
> It might be fine for this case, but it does lose the ability to generate different instructions based on type (which we somewhat do for hwintrinsics).
My understanding is that some intrinsics generate different code depending on their base type; for example, init(SIMD16, 1) needs the base type to know how many elements are in the SIMD16.
Other intrinsics do not depend on the base type in some cases, like SIMDIntrinsicInit(0), so we should not include baseType in CSE/VN considerations for them.
> It might be fine for this case, but it does lose the ability to generate different instructions based on type (which we somewhat do for hwintrinsics).
I don't see what is lost here; the generated code does not depend on the base type.
There are multiple instructions for zeroing, and while most modern CPUs treat them the same, not all CPUs do.
So on certain hardware we may want to generate pxor or xorpd instead of xorps. Likewise, if we ever add AVX-512 or SVE support, we may want different logic here for various masking or other operations that can occur.
Carrying the type through to codegen is therefore goodness IMO, even if today we happen to ignore it.
For both TYP_SIMD16 and TYP_SIMD32 there are the following "simd zeroing" instructions:
- xorps - float
- xorpd - double
- pxor - integer
Where xmm is used for TYP_SIMD16 and ymm is used for TYP_SIMD32.
On modern Intel/AMD (the architecture manuals list Silvermont/Ryzen and newer), all three of these are dependency breaking/zeroing idioms and should actually be replaced in the renamer and so should be equivalent/interchangeable.
- Noting that Silvermont is listed under the Intel Atom section, I didn't see anything listed under the non-Atom sections
However, on some hardware (particularly "older" hardware, such as Intel Core microarchitecture), the recommendation is to not mix floating-point and integer code:
3.5.1.10 Mixing SIMD Data Type
Previous microarchitectures (before Intel Core microarchitecture) do not have explicit restrictions on mixing integer and floating-point (FP) operations on XMM registers. For Intel Core microarchitecture, mixing integer and floating-point operations on the content of an XMM register can degrade performance. Software should avoid mixed-use of integer/FP operation on XMM registers. Specifically:
• Use SIMD integer operations to feed SIMD integer operations. Use PXOR for idiom.
• Use SIMD floating-point operations to feed SIMD floating-point operations. Use XORPS for idiom.
• When floating-point operations are bitwise equivalent, use PS data type instead of PD data type.
MOVAPS and MOVAPD do the same thing, but MOVAPS takes one less byte to encode the instruction.
Likewise, once you get to AVX-512, there are additional instructions such as vpxord and vpxorq, which get used instead because the element size is impactful for masked instructions.
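The per-type choice described above could be sketched as follows. ZeroingMnemonic is a hypothetical helper, not the JIT's genSIMDZero, the enum is a simplified stand-in for var_types, and the fallback for an unknown base type mirrors the "treat it as float" behavior discussed in this thread.

```cpp
#include <cassert>
#include <cstring>

// Simplified stand-in for the JIT's var_types enum (hypothetical subset).
enum var_types
{
    TYP_UNDEF,
    TYP_INT,
    TYP_LONG,
    TYP_FLOAT,
    TYP_DOUBLE
};

// Hypothetical helper: pick a zeroing idiom that matches the domain of the
// data, so integer and floating-point operations are not mixed on one register.
inline const char* ZeroingMnemonic(var_types baseType)
{
    switch (baseType)
    {
        case TYP_DOUBLE:
            return "xorpd";

        case TYP_INT:
        case TYP_LONG:
            return "pxor";

        case TYP_FLOAT:
        case TYP_UNDEF: // Unknown base type: fall back to the float idiom.
        default:
            return "xorps";
    }
}
```

On hardware where all three idioms are equivalent, the choice has no effect; the point is only that the decision needs a base type to be available at codegen.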
I see, thanks for the explanation. So would you prefer to lie about the type when it is not available, to better support a future architecture where void genSIMDZero(var_types targetType, var_types baseType, regNumber targetReg); will use baseType?
I'd prefer we always pass the "correct" type for the operation here and to finish plumbing through things like GenTreeJitIntrinsic->m_classLayout, but short of that the existing behavior of lying works well enough today so we can still generate something.
I guess I did not clearly describe the scenario that I want to support:
struct RandomStructWith16ByteSize
{
int a;
float b;
int c;
float d;
}
and we do:
RandomStructWith16ByteSize a = new RandomStructWith16ByteSize(); // zero init
If a can be put into an xmm register, I want it to be there and to generate a zero-init even when there is no existing "correct type".
Maybe we should just explicitly support TYP_UNDEF and handle it as appropriate in codegen (which currently means treating it as TYP_FLOAT)?
For context, I've been slowly doing work to get rid of the older SIMD support (like SIMDIntrinsicInit) and replace it with the newer HWIntrinsic support. When that happens, genSIMDZero will be replaced with NI_Vector64_get_Zero, NI_Vector128_get_Zero, and NI_Vector256_get_Zero so everything can be centrally handled. -- #35421 and #37882 are the two past PRs and #52288 is the latest
At that point, the codegen will generate different instructions based on the type by default and we'll need to handle TYP_UNDEF somehow anyways (different HWIntrinsics have different contracts for what instruction they'll generate, since they are mostly a 1-to-1 mapping with the hardware).
The regressions are covered by #52286
02bacd1 to bb7356e
The PR was updated based on the previous discussion.
@dotnet/jit-contrib Could somebody please take a look?
src/coreclr/jit/lower.cpp
Outdated
}
GenTreeSIMD* simdTree =
comp->gtNewSIMDNode(regType, src, SIMDIntrinsicInit, simdBaseJitType, varDsc->lvExactSize);
BlockRange().InsertAfter(src, simdTree);
Shouldn't we lower the newly created simdTree so containment and other transforms can happen as appropriate?
Nice catch, thank you! It fixes a todo that I added because now we have SIMD16 instead of SIMD12.
/azp run runtime-coreclr outerloop, runtime-coreclr jitstress
Azure Pipelines successfully started running 2 pipeline(s).
There is 1 change: ASG struct(LCL_VAR, 0) as STORE_LCL_VAR struct(0).
A typical improvement looks like: