-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARM64-SVE: Add Not, InsertIntoShiftedVector #103725
Conversation
Note regarding the
|
1 similar comment
Note regarding the
|
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checks if the instruction is scalable before checking if it has RMW semantics (which this one does), and thus ends up calling the wrong emitIns* method.
Can you confirm the exact place where this is happening?
/// INSR Ztied1.D, Xop2 | ||
/// INSR Ztied1.D, Dop2 | ||
/// </summary> | ||
public static unsafe Vector<double> InsertIntoShiftedVector(Vector<double> left, double right) { throw new PlatformNotSupportedException(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API, as per the docs is implementing both INSR - SIMD&FP and INSR - scalar, but it is not clear to me, how we decide which one to pick. @a74nh - any idea?
if (targetReg != op1Reg) | ||
{ | ||
assert(targetReg != op2Reg); | ||
GetEmitter()->emitIns_Mov(INS_mov, emitTypeSize(node), targetReg, op1Reg, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work with both op1Reg
being gpr or SIMD/FP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, though I'm struggling to get the JIT to use a gpr for op1Reg
under the stress modes -- I'm only ever getting the "easy" case, where when op1Reg
and targetReg
differ, op1Reg
is already a vector register, so we're moving from a vector reg to a vector reg. Since the first argument in Sve.InsertIntoShiftedVector<T>
is of type Vector<T>
, we'd expect op1Reg
to always be a vector register, right? I could add an assert here clarify this. (Though looking at emitIns_Mov
, it does have a path for emitting mov
instructions from a gpr to a vector register.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I meant that for op2Reg
. Sorry. So ideally, where you have double
or float
as 2nd argument, we should have op2Reg
as SIMD/floating point and otherwise should be gpr. Can you verify that please? For op1Reg
it should always be scalable register and we already assert that in emitter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, I've added an assert to check this. Stress tests for unoptimized and optimized tests are still passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind sharing small section of disassembly for both the categories?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at all. Here's a snippet from the double
tests:
ldr q16, [x0]
ldr d17, [fp, #0x30] // [V01 loc0]
insr z16.d, d17
And from the uint
tests:
ldr q16, [x0]
ldr w0, [fp, #0x34] // [V01 loc0]
insr z16.s, w0
Sure; right here. Notice how in the if-else statement, we check if the instruction is scalable before we check if it has RMW semantics. |
I made a small change to |
@kunalspathak @a74nh does this PR need anything else? |
Your change looks good except that I realized that after #103620, we do not need condition anymore for below code because both branches are doing same thing. if (HWIntrinsicInfo::IsExplicitMaskedOperation(intrin.id))
{
GetEmitter()->emitIns_R_R_R(ins, emitSize, targetReg, op1Reg, op2Reg, opt);
}
else
{
// This generates an unpredicated version
// Implicitly predicated should be taken care above `intrin.op2->IsEmbMaskOp()`
GetEmitter()->emitIns_R_R_R(ins, emitSize, targetReg, op1Reg, op2Reg, opt);
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Got it, I'll fix that in my next PR. |
/ba-g Build Analysis blocked by timeouts |
Part of #99957.
InsertIntoShiftedVector
requires a new test template,_SveVecAndScalarOpTest.template
. This template is a clone of_SveMasklessUnaryOpTestTemplate.template
, except that a second argument of typeT
is passed toSve.InsertIntoShiftedVector<T>
and handled elsewhere accordingly. I could've implementedInsertIntoShiftedVector
without any special codegen, but the path for handling unpredicated instructions with two operands inCodeGen::genHWIntrinsic
checks if the instruction is scalable before checking if it has RMW semantics (which this one does), and thus ends up calling the wrongemitIns*
method. Existing instructions seem to be dependent on this order of checking, so I cannot flip the order of the if-elseif-else statement without breaking existing tests, and I wasn't sure if duplicating the RMW logic on the scalable path would be ideal if it's only for one instruction -- if we prefer to do that now, I can get rid of the special path for this intrinsic, and handle the scalable RMW case on the general path.Not tests:
InsertIntoShiftedVector tests:
cc @dotnet/arm64-contrib