[VectorCombine][RISCV] Convert VPIntrinsics with splat operands to splats #65706
I've pointed out a few cases that we may not want to do this optimization. Wondering if anyone has any feedback.
Force-pushed from f118ff1 to d8b25ea.
@ChunyuLiao pointed out that the VectorCombine pass already scalarizes binary ops with splatted operands at the IR level: https://godbolt.org/z/MPvvTG5dT. Would it make sense to do this in VectorCombine::scalarizeBinopOrCmp? It seems to have some cost modelling in there, which I'd imagine would be good to take advantage of.
It isn't quite similar enough to fit right into VectorCombine::scalarizeBinopOrCmp, but I've put it in right around that same area. Having the cost model from VectorCombine there is a great idea.
Force-pushed from 823e887 to 95d9fe3.
Force-pushed from 2150a6d to 1cfeeec.
// is a poison value. For now, only do this simplification if all lanes
// are active.
// TODO: Relax the condition that all lanes are active by using insertelement
// on inactive lanes.
Not for this patch, but for reinserting the inactive lanes later maybe we could do something like
%x = scalar
%v = splat
%res = @llvm.vp.merge.v16i32(%mask, %v, poison, %evl)
This is a good idea. I will work on this after this patch lands.
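To make the vp.merge suggestion concrete, here is a rough plain-C++ model (not LLVM code) of how merging a splat with poison would leave inactive lanes as poison. The names `Lane`, `vpMerge`, and `splatThenMerge` are hypothetical, poison is modelled as `std::nullopt`, and the lane selection rule is an assumption based on the LangRef description of llvm.vp.merge.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model: a lane is either a concrete value or poison (nullopt).
using Lane = std::optional<int>;

// Models the lane selection of llvm.vp.merge: lane I takes OnTrue[I] when
// I < EVL and Mask[I] is set, and OnFalse[I] otherwise.
std::vector<Lane> vpMerge(const std::vector<bool> &Mask,
                          const std::vector<Lane> &OnTrue,
                          const std::vector<Lane> &OnFalse, std::size_t EVL) {
  std::vector<Lane> Result(Mask.size());
  for (std::size_t I = 0; I < Mask.size(); ++I)
    Result[I] = (I < EVL && Mask[I]) ? OnTrue[I] : OnFalse[I];
  return Result;
}

// Splat the scalar result, then merge with poison so that inactive lanes
// stay poison, matching the result of the original VP intrinsic.
std::vector<Lane> splatThenMerge(int Scalar, const std::vector<bool> &Mask,
                                 std::size_t EVL) {
  std::vector<Lane> Splat(Mask.size(), Lane(Scalar));
  std::vector<Lane> Poison(Mask.size(), Lane{});
  return vpMerge(Mask, Splat, Poison, EVL);
}
```

In this model, a lane that is masked off or at or beyond EVL ends up as poison, while every active lane holds the scalar result.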
This is looking pretty good. Just a note, not related to your patch, but about a missed scalarization in the existing non-vp scalarization: It only catches binops where both operands aren't constant, e.g. like this:
define <vscale x 1 x i64> @f(i64 %x, i64 %y) {
%head.x = insertelement <vscale x 1 x i64> poison, i64 %x, i32 0
%splat.x = shufflevector <vscale x 1 x i64> %head.x, <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
%head.y = insertelement <vscale x 1 x i64> poison, i64 %y, i32 0
%splat.y = shufflevector <vscale x 1 x i64> %head.y, <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
%v = add <vscale x 1 x i64> %splat.x, %splat.y
ret <vscale x 1 x i64> %v
}
Because this happens to get transformed by instcombine into:
define <vscale x 1 x i64> @f(i64 %x, i64 %y) #0 {
%head.x = insertelement <vscale x 1 x i64> poison, i64 %x, i64 0
%head.y = insertelement <vscale x 1 x i64> poison, i64 %y, i64 0
%1 = add <vscale x 1 x i64> %head.x, %head.y
%v = shufflevector <vscale x 1 x i64> %1, <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
ret <vscale x 1 x i64> %v
}
And scalarizeBinopOrCmp only looks for insertelements.
But if one of the operands of the binop is a constant:
define <vscale x 1 x i64> @g(i64 %x) {
%head.x = insertelement <vscale x 1 x i64> poison, i64 %x, i64 0
%splat.x = shufflevector <vscale x 1 x i64> %head.x, <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
%splat.y = shufflevector <vscale x 1 x i64> insertelement(<vscale x 1 x i64> poison, i64 42, i32 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
%v = add <vscale x 1 x i64> %splat.x, %splat.y
ret <vscale x 1 x i64> %v
}
Then the above transformation doesn't happen, and it stays in the shufflevector %x, poison, zeroinitializer form, which scalarizeBinopOrCmp doesn't handle.
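The equivalence that scalarization relies on can be sketched outside LLVM: a lane-wise add of two splats produces the same vector as a splat of one scalar add. This is a plain-C++ model with hypothetical helper names (`splat`, `vectorAdd`, `scalarizedAdd`) and a fixed lane count standing in for the scalable vector type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a splat: every lane holds the same scalar.
std::vector<std::int64_t> splat(std::int64_t X, std::size_t Lanes) {
  return std::vector<std::int64_t>(Lanes, X);
}

// Lane-wise add, standing in for a vector `add` of two operands.
std::vector<std::int64_t> vectorAdd(const std::vector<std::int64_t> &A,
                                    const std::vector<std::int64_t> &B) {
  std::vector<std::int64_t> Result(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    Result[I] = A[I] + B[I];
  return Result;
}

// Scalarized form: one scalar add, then a single splat of the result.
std::vector<std::int64_t> scalarizedAdd(std::int64_t X, std::int64_t Y,
                                        std::size_t Lanes) {
  return splat(X + Y, Lanes);
}
```

Both the two-variable case (@f) and the constant case (@g) fit this shape; the difference described above is only in which IR pattern the pass happens to match.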
llvm/test/Transforms/VectorCombine/RISCV/vpintrin-scalarization.ll
Force-pushed from 7eae810 to 31d1880.
No problem, I learnt lots about UB too :)
Force-pushed from 949275c to 256f2ad.
ScalarIntrID = VPI.getFunctionalIntrinsicID();
if (!ScalarIntrID)
  return false;
ScalarIsIntr = true;
Do we need this flag? Can we check if ScalarIntrID has a value?
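A minimal sketch of the reviewer's point, assuming the ID is held in a `std::optional`: the optional already records whether a functional intrinsic exists, so a separate boolean carries no extra information. `lookupFunctionalID` and its mapping are made up for illustration and are not the LLVM API.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for an intrinsic ID lookup: returns an ID when a
// functional (non-VP) intrinsic exists, and std::nullopt otherwise.
std::optional<unsigned> lookupFunctionalID(unsigned VPID) {
  if (VPID == 1)
    return 100u; // made-up mapping, for illustration only
  return std::nullopt;
}

// The optional itself answers "is the scalar op an intrinsic?", so a
// separate ScalarIsIntr flag would be redundant.
bool scalarOpIsIntrinsic(unsigned VPID) {
  std::optional<unsigned> ScalarIntrID = lookupFunctionalID(VPID);
  return ScalarIntrID.has_value();
}
```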
Updated.
bool MustHaveNonZeroVL =
    IntrID == Intrinsic::vp_sdiv || IntrID == Intrinsic::vp_udiv ||
    IntrID == Intrinsic::vp_srem || IntrID == Intrinsic::vp_urem ||
    IntrID == Intrinsic::vp_fdiv || IntrID == Intrinsic::vp_frem;
fp isn't an issue. only integer.
Updated. Do you mind explaining why?
FP division by 0 produces infinity or negative infinity unless the numerator is 0, in which case it's NaN.
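The floating-point behaviour can be checked with a small standalone C++ sketch. It assumes IEEE 754 doubles (enforced with a `static_assert`); `fpDivide` is a hypothetical helper, and the `volatile` locals are only there to stop the compiler from folding or diagnosing the division.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Assumes IEEE 754 (IEC 60559) doubles; on other platforms the results
// described here are not guaranteed.
static_assert(std::numeric_limits<double>::is_iec559,
              "example assumes IEEE 754 floating point");

// volatile keeps the compiler from folding or diagnosing the division.
double fpDivide(double Num, double Den) {
  volatile double N = Num;
  volatile double D = Den;
  return N / D;
}
```

Under that assumption, a non-zero numerator divided by zero yields an infinity with the numerator's sign, and 0.0 / 0.0 yields NaN, so no undefined behaviour is introduced by executing the scalar operation.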
For integer division: The quotient of division by zero has all bits set, i.e. 2^XLEN − 1 for unsigned division or −1 for signed division [Source, page 48] (https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf).
Why do we want to avoid doing integer division but okay with fp division?
This is not RISC-V specific code so the RISC-V spec does not apply. https://llvm.org/docs/LangRef.html#udiv-instruction "Division by zero is undefined behavior."
Ahh, I see, my bad. I've updated to remove check for fp here.
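One way to picture why integer division needs a known non-zero vector length: with EVL equal to 0 the VP operation touches no lanes, so a zero divisor is never used, whereas the scalarized form divides unconditionally. The following is a plain-C++ model under that reading; `vpUDivSplats` and `scalarizationIsSafe` are hypothetical names, and lanes at or beyond EVL are modelled as poison via `std::nullopt`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of vp.udiv on two splats: only lanes below EVL divide,
// so with EVL == 0 the divisor is never used and no division happens.
// Lanes at or beyond EVL are modelled as poison (nullopt).
std::vector<std::optional<unsigned>> vpUDivSplats(unsigned X, unsigned Y,
                                                  std::size_t Lanes,
                                                  std::size_t EVL) {
  std::vector<std::optional<unsigned>> Result(Lanes);
  for (std::size_t I = 0; I < Lanes && I < EVL; ++I)
    Result[I] = X / Y; // reached only for active lanes
  return Result;
}

// The scalarized form divides unconditionally, so it is only a safe
// replacement when the divide would have executed anyway (EVL != 0).
bool scalarizationIsSafe(bool IsIntegerDivOrRem, bool EVLKnownNonZero) {
  return !IsIntegerDivOrRem || EVLKnownNonZero;
}
```

Calling `vpUDivSplats(8, 0, 4, 0)` performs no division at all in this model, which is the case a scalar `udiv` would turn into undefined behaviour.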
    IntrID == Intrinsic::vp_srem || IntrID == Intrinsic::vp_urem ||
    IntrID == Intrinsic::vp_fdiv || IntrID == Intrinsic::vp_frem;

if ((MustHaveNonZeroVL && IsKnownNonZeroVL) || !MustHaveNonZeroVL) {
This is just !MustHaveNonZeroVL || IsKnownNonZeroVL
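The two forms can be compared exhaustively, since there are only four input combinations. This is a standalone sketch; the function names are hypothetical and only the two boolean expressions come from the review.

```cpp
#include <cassert>

// The condition as originally written in the patch.
bool originalCondition(bool MustHaveNonZeroVL, bool IsKnownNonZeroVL) {
  return (MustHaveNonZeroVL && IsKnownNonZeroVL) || !MustHaveNonZeroVL;
}

// The reviewer's simplified form.
bool simplifiedCondition(bool MustHaveNonZeroVL, bool IsKnownNonZeroVL) {
  return !MustHaveNonZeroVL || IsKnownNonZeroVL;
}

// Exhaustively compares both forms over all four input combinations.
bool formsAgree() {
  for (int M = 0; M < 2; ++M)
    for (int K = 0; K < 2; ++K)
      if (originalCondition(M, K) != simplifiedCondition(M, K))
        return false;
  return true;
}
```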
Updated.
ElementCount EC = cast<VectorType>(Op0->getType())->getElementCount();
Value *EVL = VPI.getArgOperand(3);
const DataLayout &DL = VPI.getModule()->getDataLayout();
bool IsKnownNonZeroVL = isKnownNonZero(EVL, DL, 0, &AC, &VPI, &DT);
We should only call isKnownNonZero if we need it. It's expensive.
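One way to defer the expensive query is to rely on short-circuit evaluation, so it runs only for intrinsics that need a non-zero vector length. The sketch below is standalone C++; `expensiveIsKnownNonZero` is a hypothetical stand-in for the real analysis, and the call counter exists only to make the laziness observable.

```cpp
#include <cassert>

// Hypothetical stand-in for an expensive analysis query; the counter only
// exists so the example can show how often the query runs.
static int ExpensiveCalls = 0;
bool expensiveIsKnownNonZero(unsigned EVL) {
  ++ExpensiveCalls;
  return EVL != 0;
}

// Short-circuit evaluation: the expensive query runs only when the intrinsic
// actually requires a non-zero vector length.
bool canScalarize(bool MustHaveNonZeroVL, unsigned EVL) {
  return !MustHaveNonZeroVL || expensiveIsKnownNonZero(EVL);
}
```

With this shape, operations such as add or mul never pay for the query, and the division and remainder cases pay for it once.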
Updated.
Force-pushed from 408253e to be0847f.
of the scalar operation
VP Intrinsics whose vector operands are both splat values may be simplified into the scalar version of the operation and the result is splatted. If this simplification occurs, then it can lead to scalarization during CodeGen.
This issue is the intrinsic dual of llvm#65072. This issue scalarizes non-legal types when the operations are VP Intrinsics.
…to splats of the scalar operation
…to splats of the scalar operation Use getFunctionalIntrinsicID
…to splats of the scalar operation Add zvfh and VEC-COMBINE-64/32
…to splats of the scalar operation Respond to craigs comments
Force-pushed from be0847f to 5cc6e53.
…lvm#66190) This adds a helper method to get the ID of the functionally equivalent intrinsic, similar to the existing getFunctionalOpcodeForVP and getConstrainedIntrinsicIDForVP methods. Not sure if it's notable or not, but I can't find any existing uses of VP_PROPERTY_FUNCTIONAL_INTRINSIC? It could potentially be used in llvm#65706 to scalarize VP intrinsics.
VPIntrinsics with the VP_PROPERTY_BINARYOP property should be queryable with VPBinOpIntrinsic::isVPBinOp, similar to how intrinsics with the VP_PROPERTY_REDUCTION property can be queried with VPReductionIntrinsic::isVPReduction. This will be used in llvm#65706, where the usage of this class is tested.
LGTM
This directory was missing a lit.local.cfg, which was causing some build bots to fail when #65706 was committed.