-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARM64-SVE: gathervector extends #103370
ARM64-SVE: gathervector extends #103370
Conversation
Note regarding the
|
1 similar comment
Note regarding the
|
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics |
7dd8a48
to
6b722db
Compare
@dotnet/arm64-contrib @kunalspathak : This one is ready. |
stress test results. Failures look like predicate issues and the same as those for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for writing reusable test templates :) One comment about the memory load flag.
@@ -76,6 +76,16 @@ HARDWARE_INTRINSIC(Sve, FusedMultiplySubtract, | |||
HARDWARE_INTRINSIC(Sve, FusedMultiplySubtractBySelectedScalar, -1, 4, true, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_fmls, INS_sve_fmls}, HW_Category_SIMDByIndexedElement, HW_Flag_Scalable|HW_Flag_HasImmediateOperand|HW_Flag_HasRMWSemantics|HW_Flag_FmaIntrinsic|HW_Flag_LowVectorOperation) | |||
HARDWARE_INTRINSIC(Sve, FusedMultiplySubtractNegated, -1, -1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_fnmls, INS_sve_fnmls}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation|HW_Flag_FmaIntrinsic|HW_Flag_SpecialCodeGen) | |||
HARDWARE_INTRINSIC(Sve, GatherVector, -1, -1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_ld1w, INS_sve_ld1w, INS_sve_ld1d, INS_sve_ld1d, INS_sve_ld1w, INS_sve_ld1d}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_SpecialCodeGen|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation) | |||
HARDWARE_INTRINSIC(Sve, GatherVectorByteZeroExtend, -1, -1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_ld1b, INS_sve_ld1b, INS_sve_ld1b, INS_sve_ld1b, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_SpecialCodeGen|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize earlier but all the load/gatherload need to be marked as HW_Category_MemoryLoad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize earlier but all the load/gatherload need to be marked as
HW_Category_MemoryLoad
That's going to require some debugging, as they don't fit a normal style of a load API (I get an assert failure).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hwintrinsic.cpp
has a if ((category == HW_Category_MemoryLoad) && op1->OperIs(GT_CAST))
check. This is only checked for memory loads with a single argument. All other types of memory loads are unchecked.
I moved this code so that it is checked for all HW_Category_MemoryLoad
nodes. I reused OperIsMemoryLoad()
but that means it needs doing after the ret node is created.
Inside OperIsMemoryLoad()
I'm not sure what to do for the cases where the load is a vector of addresses. My thought was that we should not set paddr
in these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hwintrinsic.cpp has a if ((category == HW_Category_MemoryLoad) && op1->OperIs(GT_CAST)) check. This is only checked for memory loads with a single argument. All other types of memory loads are unchecked.
can we leave it as is then and not touch it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we leave it as is then and not touch it?
My assumption was this was a bug. The check needs to be added for all loads where the address is not in arg1.
What happens if the address is a GT_CAST and it is not removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the GT_CAST
changes for now. I'll raise a bug so we can double check it later. PR is all still working.
/ba-g the failures are known and test time out. |
if (varTypeIsSIMD(addr->gtType)) | ||
{ | ||
// The address is a vector of addresses. | ||
// Return true, but do not set pAddr. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@a74nh - do you remember why we do not set pAddr
here? We get AV in value numbering because of that.
Sve_GatherVector_Bases_double_ulong
was failing for me with below call stack:
> clrjit_universal_arm64_x64.dll!Compiler::fgValueNumberAddExceptionSetForIndirection(GenTree * tree, GenTree * baseAddr) Line 13618 C++
clrjit_universal_arm64_x64.dll!Compiler::fgValueNumberHWIntrinsic(GenTreeHWIntrinsic * tree) Line 12461 C++
clrjit_universal_arm64_x64.dll!Compiler::fgValueNumberTree(GenTree * tree) Line 12056 C++
clrjit_universal_arm64_x64.dll!Compiler::fgValueNumberBlock(BasicBlock * blk) Line 10554 C++
clrjit_universal_arm64_x64.dll!Compiler::fgValueNumber() Line 10340 C++
clrjit_universal_arm64_x64.dll!CompilerPhaseWithStatus::DoPhase() Line 132 C++
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #103606
Does not include GatherVectorWithByteOffsets.
There were handful of reviewed APIs that did not make sense, as they were all zero/sign extending from 32bits to 32bits.
eg:
Vector<int> GatherVectorUInt32ZeroExtend(Vector<int> mask, uint* address, Vector<int> indices);
For these, there was no matching C++ API and the SVE instruction it pointed to was invalid (as you can't extend 32bits to 32bits). Not quite sure how they got into the API. I did not include any of these in this PR.
As per #103159, all 32bit address APIs have been commented out.