Scalar/Packed conversions for floating point to integer (dotnet#97529)
* merging with main
Initial changes for scalar conversion double -> ulong

* Basic working version of double -> ulong saturation

* Moving the code in a do-while with proper checks to make sure we are adding the fixup node in all cases

* adjusting comments

* Merging with main
Saturating NaN to 0 and also adding Dbl2Ulng implementation in MathHelpers. Adding vector conversion support for double/float -> ulong conversion

* removing conflicts from gentree.h flags
merging with main
double to uint conversion

* float to uint conversion verified. removing commented code

* merging with main. Making changes to simdashwintrinsic.cpp and
listxarch.h
float -> uint packed conversion

* progress on double to long morphing

* another attempt at double to long conversion

* Merge with main
Merge with main

adding a new helper function for float to uint scalar conversion for SSE2.

* adding handling for scalar conversion cases for SSE2. Remaining float/double -> long/int for AVX512.

* partial changes for float to int conversion using double to int for avx512. vfixup not working. next step is to fix the vfixup instruction and get it working

* adding float to int working scalar conversion case. Working on vector case from here on.

* partial work on float to int packed conversion

* partial version of float to int conversion

* working version of float to int scalar/packed for avx512

* complete conversions code for floating point to integral conversions for scalar/packed for SSE / avx512

* Merging with main.
fixing out of range test case and adding conversion changes to simdashwintrinsic

* fixing debug checks hitting asserts for TYP_ULONG and TYP_UINT at IR level

* adding JIT_Dbl2Int for target_x86 and other architectures.

* Supporting x86 for saturating conversions as well

* fixing errors in packed conversion

* accommodate unsigned in IR

* adding evex support for cvttss2si

* Merge with main
defining nativeaot helpers for x86

* Catch divide by zero exception

* Handle overflow cases

* Fix tests to check saturating behavior

* Correct mapping of instructions

* Convert float -> ulong / long as float -> double -> ulong / long

* Merging with main
Initial changes for scalar conversion double -> ulong

* Merging with main
adjusting comments

* removing conflicts from gentree.h flags
merging with main
double to uint conversion

* merging with main. Making changes to simdashwintrinsic.cpp and
listxarch.h
float -> uint packed conversion

* adding a new helper function for float to uint scalar conversion for SSE2.

* Merging with main

adding handling for scalar conversion cases for SSE2. Remaining float/double -> long/int for AVX512.

* partial changes for float to int conversion using double to int for avx512. vfixup not working. next step is to fix the vfixup instruction and get it working

* partial version of float to int conversion

* working version of float to int scalar/packed for avx512

* Merging with main.
fixing out of range test case and adding conversion changes to simdashwintrinsic

* Changing the way helper functions are handled in morph
fixing debug checks hitting asserts for TYP_ULONG and TYP_UINT at IR level

* adding JIT_Dbl2Int for target_x86 and other architectures.

* Supporting x86 for saturating conversions as well

* fixing errors in packed conversion

* Correct mapping of instructions

* delete extra files

* Merging main
review changes

* Merge with main and adding new helpers in nativeaot
Rebasing with main

* changing type of cast node as signed when making cast nodes

* Avoiding removing extra element from the stack

* Fix formatting, Change comp->IsaSupportedDebugOnly to IsBaselineVector512SupportedDebugOnly

* Reverting some changes to maintain uniformity in code

* Handling cases where AVX512 is not supported in simdashwintrinsic.cpp

* fixing exit conditions for ConvertVectorT_ToDouble

* Check for AVX512 support for TARGET_XARCH

* Avoid avx512 path for x86

* Enable AVX512F codepath for conversions in x86 arch. Move x86 to using c++ helpers

* Add SSE41 path for scalar conversions and 128 bit float to int packed conversions

* Adding SSE41 path for floating point to UINT scalar conversions

* Add AVX path for ConvertToInt32

* Adding comments and cleaning the code

* Fix errors in double to ulong

* Addressing review comments

* Fix tests

* Reverse val < 0 check in dbltoUint and dbltoUlng helpers

* Add overflow conversions for x86/x64, remove FastDbl2Lng and inline it

* Apply suggestions from code review

Co-authored-by: Jan Kotas <[email protected]>

* Correct Dbl2UlngOvf

* Apply suggestions from code review

* Apply suggestions from code review

* Update src/coreclr/vm/jithelpers.cpp

* Disable failing mono tests

* Working version of saturating logic moved to lowering for x86/x64

* Making changes for pre SSE41

* Apply suggestions from code review

Co-authored-by: Jan Kotas <[email protected]>

* Removing dead code

* Fix formatting

* Address review comments, add proper docstrings

---------

Co-authored-by: Jan Kotas <[email protected]>
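
The commit message above describes saturating conversions: NaN becomes 0, tests check saturating behavior, and float -> ulong / long goes through double. A minimal sketch of those semantics for the unsigned 64-bit case follows. The function names are hypothetical and this is not the runtime's helper code; it only assumes that negative input clamps to 0 and that input at or above 2^64 clamps to the maximum value.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not the runtime's JIT_Dbl2ULng): saturating double -> uint64_t.
inline std::uint64_t SaturatingDoubleToUInt64(double val)
{
    // 2^64 as a double. The largest uint64_t value is not exactly representable
    // as a double, so the upper bound is checked against 2^64 instead.
    constexpr double two_pow_64 = 18446744073709551616.0;

    if (std::isnan(val))
    {
        return 0; // NaN saturates to 0
    }
    if (val <= 0.0)
    {
        return 0; // negative values (and zero) map to 0
    }
    if (val >= two_pow_64)
    {
        return std::numeric_limits<std::uint64_t>::max(); // too large: clamp to max
    }
    return static_cast<std::uint64_t>(val); // in range: truncate toward zero
}

// Hypothetical float entry point: widen to double first (the widening is exact),
// then reuse the double path.
inline std::uint64_t SaturatingFloatToUInt64(float val)
{
    return SaturatingDoubleToUInt64(static_cast<double>(val));
}
```

Checking the range before the cast matters because a plain `static_cast` from an out-of-range or NaN floating-point value to an integer type is undefined behavior in C++.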
2 people authored and matouskozak committed Apr 30, 2024
1 parent 63d62fc commit 4077459
Showing 30 changed files with 987 additions and 597 deletions.
6 changes: 3 additions & 3 deletions src/coreclr/inc/jithelpers.h
@@ -55,11 +55,11 @@
 JITHELPER(CORINFO_HELP_ULMOD, JIT_ULMod, CORINFO_HELP_SIG_16_STACK)
 JITHELPER(CORINFO_HELP_LNG2DBL, JIT_Lng2Dbl, CORINFO_HELP_SIG_8_STACK)
 JITHELPER(CORINFO_HELP_ULNG2DBL, JIT_ULng2Dbl, CORINFO_HELP_SIG_8_STACK)
-DYNAMICJITHELPER(CORINFO_HELP_DBL2INT, JIT_Dbl2Lng, CORINFO_HELP_SIG_8_STACK)
+JITHELPER(CORINFO_HELP_DBL2INT, JIT_Dbl2Int, CORINFO_HELP_SIG_8_STACK)
 JITHELPER(CORINFO_HELP_DBL2INT_OVF, JIT_Dbl2IntOvf, CORINFO_HELP_SIG_8_STACK)
-DYNAMICJITHELPER(CORINFO_HELP_DBL2LNG, JIT_Dbl2Lng, CORINFO_HELP_SIG_8_STACK)
+JITHELPER(CORINFO_HELP_DBL2LNG, JIT_Dbl2Lng, CORINFO_HELP_SIG_8_STACK)
 JITHELPER(CORINFO_HELP_DBL2LNG_OVF, JIT_Dbl2LngOvf, CORINFO_HELP_SIG_8_STACK)
-DYNAMICJITHELPER(CORINFO_HELP_DBL2UINT, JIT_Dbl2Lng, CORINFO_HELP_SIG_8_STACK)
+JITHELPER(CORINFO_HELP_DBL2UINT, JIT_Dbl2UInt, CORINFO_HELP_SIG_8_STACK)
 JITHELPER(CORINFO_HELP_DBL2UINT_OVF, JIT_Dbl2UIntOvf, CORINFO_HELP_SIG_8_STACK)
 JITHELPER(CORINFO_HELP_DBL2ULNG, JIT_Dbl2ULng, CORINFO_HELP_SIG_8_STACK)
 JITHELPER(CORINFO_HELP_DBL2ULNG_OVF, JIT_Dbl2ULngOvf, CORINFO_HELP_SIG_8_STACK)
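
Each plain helper in the table above is paired with an `_OVF` variant. As a rough illustration of the difference, the sketch below shows a checked conversion that rejects out-of-range input instead of saturating. The function name and the use of `std::overflow_error` are assumptions for this example; the runtime's `JIT_Dbl2IntOvf` may differ in detail.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical checked conversion, not the runtime's implementation.
inline std::int32_t CheckedDoubleToInt32(double val)
{
    // Accept exactly those values whose truncation toward zero fits in int32_t:
    // -2^31 - 1 < val < 2^31. NaN fails both comparisons and is rejected.
    if (val > -2147483649.0 && val < 2147483648.0)
    {
        return static_cast<std::int32_t>(val);
    }
    throw std::overflow_error("double value does not fit in int32");
}
```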
11 changes: 7 additions & 4 deletions src/coreclr/jit/codegenxarch.cpp
@@ -7602,21 +7602,24 @@ void CodeGen::genFloatToIntCast(GenTree* treeNode)
 noway_assert((dstSize == EA_ATTR(genTypeSize(TYP_INT))) || (dstSize == EA_ATTR(genTypeSize(TYP_LONG))));

 // We shouldn't be seeing uint64 here as it should have been converted
-// into a helper call by either front-end or lowering phase.
-assert(!varTypeIsUnsigned(dstType) || (dstSize != EA_ATTR(genTypeSize(TYP_LONG))));
+// into a helper call by either front-end or lowering phase, unless we have AVX512F
+// accelerated conversions.
+assert(!varTypeIsUnsigned(dstType) || (dstSize != EA_ATTR(genTypeSize(TYP_LONG))) ||
+compiler->compIsaSupportedDebugOnly(InstructionSet_AVX512F));

 // If the dstType is TYP_UINT, we have 32-bits to encode the
 // float number. Any of 33rd or above bits can be the sign bit.
 // To achieve it we pretend as if we are converting it to a long.
-if (varTypeIsUnsigned(dstType) && (dstSize == EA_ATTR(genTypeSize(TYP_INT))))
+if (varTypeIsUnsigned(dstType) && (dstSize == EA_ATTR(genTypeSize(TYP_INT))) &&
+!compiler->compOpportunisticallyDependsOn(InstructionSet_AVX512F))
 {
 dstType = TYP_LONG;
 }

 // Note that we need to specify dstType here so that it will determine
 // the size of destination integer register and also the rex.w prefix.
 genConsumeOperands(treeNode->AsOp());
-instruction ins = ins_FloatConv(TYP_INT, srcType, emitTypeSize(srcType));
+instruction ins = ins_FloatConv(dstType, srcType, emitTypeSize(srcType));
 GetEmitter()->emitInsBinary(ins, emitTypeSize(dstType), treeNode, op1);
 genProduceReg(treeNode);
 }
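
The comment in the hunk above explains why, without AVX512F, a `TYP_UINT` destination is treated as `TYP_LONG`: a signed 32-bit conversion cannot represent values in [2^31, 2^32), while a signed 64-bit conversion can, and the low 32 bits of that result are the wanted unsigned value. A minimal sketch of the idea in plain C++ follows. The function name is hypothetical, and the sketch assumes in-range input and does no saturation or NaN handling.

```cpp
#include <cstdint>

// Hypothetical illustration, not JIT code: produce a uint32_t result using only
// a signed 64-bit truncating conversion.
inline std::uint32_t FloatToUInt32ViaInt64(float val)
{
    // Assumes 0 <= val < 2^32, so the truncated value fits in int64_t
    // and its low 32 bits hold the full unsigned result.
    const std::int64_t wide = static_cast<std::int64_t>(val); // truncates toward zero
    return static_cast<std::uint32_t>(wide);                  // keep the low 32 bits
}
```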
8 changes: 8 additions & 0 deletions src/coreclr/jit/compiler.h
@@ -3204,6 +3204,14 @@ class Compiler
 CorInfoType simdBaseJitType,
 unsigned simdSize);

+#if defined(TARGET_XARCH)
+GenTree* gtNewSimdCvtNode(var_types type,
+GenTree* op1,
+CorInfoType simdTargetBaseJitType,
+CorInfoType simdSourceBaseJitType,
+unsigned simdSize);
+#endif //TARGET_XARCH
+
 GenTree* gtNewSimdCreateBroadcastNode(
 var_types type, GenTree* op1, CorInfoType simdBaseJitType, unsigned simdSize);

6 changes: 4 additions & 2 deletions src/coreclr/jit/emit.h
@@ -4012,7 +4012,8 @@ emitAttr emitter::emitGetBaseMemOpSize(instrDesc* id) const
 case INS_comiss:
 case INS_cvtss2sd:
 case INS_cvtss2si:
-case INS_cvttss2si:
+case INS_cvttss2si32:
+case INS_cvttss2si64:
 case INS_divss:
 case INS_extractps:
 case INS_insertps:
@@ -4055,7 +4056,8 @@ emitAttr emitter::emitGetBaseMemOpSize(instrDesc* id) const
 case INS_comisd:
 case INS_cvtsd2si:
 case INS_cvtsd2ss:
-case INS_cvttsd2si:
+case INS_cvttsd2si32:
+case INS_cvttsd2si64:
 case INS_divsd:
 case INS_maxsd:
 case INS_minsd:
41 changes: 23 additions & 18 deletions src/coreclr/jit/emitxarch.cpp
@@ -1522,9 +1522,11 @@ bool emitter::TakesRexWPrefix(const instrDesc* id) const
 switch (ins)
 {
 case INS_cvtss2si:
-case INS_cvttss2si:
+case INS_cvttss2si32:
+case INS_cvttss2si64:
 case INS_cvtsd2si:
-case INS_cvttsd2si:
+case INS_cvttsd2si32:
+case INS_cvttsd2si64:
 case INS_movd:
 case INS_movnti:
 case INS_andn:
@@ -1544,7 +1546,6 @@ bool emitter::TakesRexWPrefix(const instrDesc* id) const
 #endif // TARGET_AMD64
 case INS_vcvtsd2usi:
 case INS_vcvtss2usi:
-case INS_vcvttsd2usi:
 {
 if (attr == EA_8BYTE)
 {
@@ -2723,8 +2724,10 @@ bool emitter::emitInsCanOnlyWriteSSE2OrAVXReg(instrDesc* id)
 case INS_blsmsk:
 case INS_blsr:
 case INS_bzhi:
-case INS_cvttsd2si:
-case INS_cvttss2si:
+case INS_cvttsd2si32:
+case INS_cvttsd2si64:
+case INS_cvttss2si32:
+case INS_cvttss2si64:
 case INS_cvtsd2si:
 case INS_cvtss2si:
 case INS_extractps:
@@ -2748,7 +2751,8 @@ bool emitter::emitInsCanOnlyWriteSSE2OrAVXReg(instrDesc* id)
 #endif
 case INS_vcvtsd2usi:
 case INS_vcvtss2usi:
-case INS_vcvttsd2usi:
+case INS_vcvttsd2usi32:
+case INS_vcvttsd2usi64:
 case INS_vcvttss2usi32:
 case INS_vcvttss2usi64:
 {
@@ -11605,22 +11609,20 @@ void emitter::emitDispIns(
 break;
 }

-case INS_cvttsd2si:
+case INS_cvttsd2si32:
+case INS_cvttsd2si64:
 case INS_cvtss2si:
 case INS_cvtsd2si:
-case INS_cvttss2si:
+case INS_cvttss2si32:
+case INS_cvttss2si64:
 case INS_vcvtsd2usi:
 case INS_vcvtss2usi:
-case INS_vcvttsd2usi:
-{
-printf(" %s, %s", emitRegName(id->idReg1(), attr), emitRegName(id->idReg2(), EA_16BYTE));
-break;
-}
-
+case INS_vcvttsd2usi32:
+case INS_vcvttsd2usi64:
 case INS_vcvttss2usi32:
 case INS_vcvttss2usi64:
 {
-printf(" %s, %s", emitRegName(id->idReg1(), attr), emitRegName(id->idReg2(), EA_4BYTE));
+printf(" %s, %s", emitRegName(id->idReg1(), attr), emitRegName(id->idReg2(), EA_16BYTE));
 break;
 }

@@ -19048,7 +19050,8 @@ emitter::insExecutionCharacteristics emitter::getInsExecutionCharacteristics(ins
 break;
 }

-case INS_cvttsd2si:
+case INS_cvttsd2si32:
+case INS_cvttsd2si64:
 case INS_cvtsd2si:
 case INS_cvtsi2sd32:
 case INS_cvtsi2ss32:
@@ -19057,7 +19060,8 @@ emitter::insExecutionCharacteristics emitter::getInsExecutionCharacteristics(ins
 case INS_vcvtsd2usi:
 case INS_vcvtusi2ss32:
 case INS_vcvtusi2ss64:
-case INS_vcvttsd2usi:
+case INS_vcvttsd2usi32:
+case INS_vcvttsd2usi64:
 case INS_vcvttss2usi32:
 result.insThroughput = PERFSCORE_THROUGHPUT_1C;
 result.insLatency += PERFSCORE_LATENCY_7C;
@@ -19069,7 +19073,8 @@ emitter::insExecutionCharacteristics emitter::getInsExecutionCharacteristics(ins
 result.insLatency += PERFSCORE_LATENCY_5C;
 break;

-case INS_cvttss2si:
+case INS_cvttss2si32:
+case INS_cvttss2si64:
 case INS_cvtss2si:
 case INS_vcvtss2usi:
 result.insThroughput = PERFSCORE_THROUGHPUT_1C;