
[RISCV] Shrink vslideup's LMUL when lowering fixed insert_subvector #65997

Merged

merged 1 commit into llvm:main from lukel97:insert-subvector-shrink on Sep 21, 2023

Conversation

lukel97
Contributor

@lukel97 lukel97 commented Sep 11, 2023

Similar to #65598, if we're using a vslideup to insert a fixed length vector into another vector, then we can work out the minimum number of registers it will need to slide up across given the minimum VLEN, and shrink the type operated on to reduce LMUL accordingly.
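
To make the arithmetic concrete, here is an illustrative sketch of the register-count computation (a hand-written approximation, not the patch's code; the real entry point is getSmallestVTForIndex, whose implementation is not part of this diff):

// Illustrative sketch only, assuming SEW divides MinVLen: given the highest
// element index the slideup writes, compute how many LMUL=1 registers are
// touched and round up to a legal LMUL (1, 2, 4 or 8).
#include <cstdio>

unsigned smallestLMUL(unsigned LastIdx, unsigned SEW, unsigned MinVLen) {
  unsigned ElemsPerReg = MinVLen / SEW;        // elements per LMUL=1 register
  unsigned RegsTouched = LastIdx / ElemsPerReg + 1;
  unsigned LMUL = 1;
  while (LMUL < RegsTouched)                   // LMUL is a power of two
    LMUL *= 2;
  return LMUL;
}

int main() {
  // From the test diff below: a <2 x i32> inserted at index 6 of a nxv8i32
  // (m4) vector writes elements up to index 7. With VLEN >= 128 and SEW = 32
  // that spans only 2 registers, so the slideup can run at m2 instead of m4.
  std::printf("LMUL = m%u\n", smallestLMUL(7, 32, 128));
}

For the inserts at indices 0 and 2 in the same test, the last written element index is 1 or 3, which fits in a single register, matching the m4 to m1 changes in the check lines.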

This is somewhat dependent on #66211, since it introduces a subregister copy that triggers a crash with -early-live-intervals in one of the tests.

Stacked upon #66211

@lukel97 lukel97 requested review from preames and topperc September 11, 2023 18:55
@lukel97 lukel97 requested a review from a team as a code owner September 11, 2023 18:56
@lukel97 lukel97 changed the title [RISCV] Shrine vslideup's LMUL when lowering fixed insert_subvector [RISCV] Shrink vslideup's LMUL when lowering fixed insert_subvector Sep 11, 2023
@llvmbot
Member

llvmbot commented Sep 11, 2023

@llvm/pr-subscribers-backend-risc-v

Changes

Similar to #65598, if we're using a vslideup to insert a fixed length vector into another vector, then we can work out the minimum number of registers it will need to slide up across given the minimum VLEN, and shrink the type operated on to reduce LMUL accordingly.

This is somewhat dependent on #65916, since it introduces a subregister copy that triggers a crash with -early-live-intervals in one of the tests.

Patch is 47.04 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/65997.diff

6 Files Affected:

  • (modified) llvm/lib/CodeGen/TwoAddressInstructionPass.cpp (+21-2)
  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector.ll (+21-24)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll (+40-40)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fpclamptosat_vec.ll (+150-180)
  • (added) llvm/test/CodeGen/RISCV/rvv/twoaddressinstruction-subreg-liveness-update.mir (+42)
diff --git a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
index 45f61262faf9391..e36bffc91b91d95 100644
--- a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
+++ b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
@@ -1871,11 +1871,30 @@ bool TwoAddressInstructionPass::runOnMachineFunction(MachineFunction &Func) {
             SlotIndex Idx = LIS->getInstructionIndex(*mi);
             for (auto &S : LI.subranges()) {
               if ((S.LaneMask & LaneMask).none()) {
+                // If Idx is 160B, and we have a subrange that isn't in
+                // %reg.subidx like so:
+                //
+                // [152r,160r)[160r,256r)
+                //
+                // Merge the two segments together so the subrange becomes:
+                //
+                // [152r,256r)
                 LiveRange::iterator UseSeg = S.FindSegmentContaining(Idx);
-                LiveRange::iterator DefSeg = std::next(UseSeg);
-                S.MergeValueNumberInto(DefSeg->valno, UseSeg->valno);
+                if (UseSeg != S.end()) {
+                  LiveRange::iterator DefSeg = std::next(UseSeg);
+                  assert(DefSeg != S.end());
+                  S.MergeValueNumberInto(DefSeg->valno, UseSeg->valno);
+                }
+                // Otherwise, it should have only one segment that starts at
+                // 160r which we should remove.
+                else {
+                  assert(S.containsOneValue());
+                  assert(S.begin()->start == Idx.getRegSlot());
+                  S.removeSegment(S.begin());
+                }
               }
             }
+            LI.removeEmptySubRanges();
 
             // The COPY no longer has a use of %reg.
             LIS->shrinkToUses(&LI);
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 4ff264635cda248..3a81179d1cfc120 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -8606,6 +8606,18 @@ SDValue RISCVTargetLowering::lowerINSERT_SUBVECTOR(SDValue Op,
       ContainerVT = getContainerForFixedLengthVector(VecVT);
       Vec = convertToScalableVector(ContainerVT, Vec, DAG, Subtarget);
     }
+
+    // Shrink down Vec so we're performing the slideup on a smaller LMUL.
+    unsigned LastIdx = OrigIdx + SubVecVT.getVectorNumElements() - 1;
+    MVT OrigContainerVT = ContainerVT;
+    SDValue OrigVec = Vec;
+    if (auto ShrunkVT =
+            getSmallestVTForIndex(ContainerVT, LastIdx, DL, DAG, Subtarget)) {
+      ContainerVT = *ShrunkVT;
+      Vec = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ContainerVT, Vec,
+                        DAG.getVectorIdxConstant(0, DL));
+    }
+
     SubVec = DAG.getNode(ISD::INSERT_SUBVECTOR, DL, ContainerVT,
                          DAG.getUNDEF(ContainerVT), SubVec,
                          DAG.getConstant(0, DL, XLenVT));
@@ -8636,6 +8648,12 @@ SDValue RISCVTargetLowering::lowerINSERT_SUBVECTOR(SDValue Op,
                            SlideupAmt, Mask, VL, Policy);
     }
 
+    // If we performed the slideup on a smaller LMUL, insert the result back
+    // into the rest of the vector.
+    if (ContainerVT != OrigContainerVT)
+      SubVec = DAG.getNode(ISD::INSERT_SUBVECTOR, DL, OrigContainerVT, OrigVec,
+                           SubVec, DAG.getVectorIdxConstant(0, DL));
+
     if (VecVT.isFixedLengthVector())
       SubVec = convertFromScalableVector(VecVT, SubVec, DAG, Subtarget);
     return DAG.getBitcast(Op.getValueType(), SubVec);
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector.ll
index 1d6a45ed36f335c..6a9212ed309a8ef 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector.ll
@@ -14,7 +14,7 @@ define <vscale x 8 x i32> @insert_nxv8i32_v2i32_0(<vscale x 8 x i32> %vec, ptr %
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
 ; CHECK-NEXT:    vle32.v v12, (a0)
-; CHECK-NEXT:    vsetivli zero, 2, e32, m4, tu, ma
+; CHECK-NEXT:    vsetivli zero, 2, e32, m1, tu, ma
 ; CHECK-NEXT:    vmv.v.v v8, v12
 ; CHECK-NEXT:    ret
   %sv = load <2 x i32>, ptr %svp
@@ -27,7 +27,7 @@ define <vscale x 8 x i32> @insert_nxv8i32_v2i32_2(<vscale x 8 x i32> %vec, ptr %
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
 ; CHECK-NEXT:    vle32.v v12, (a0)
-; CHECK-NEXT:    vsetivli zero, 4, e32, m4, tu, ma
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, tu, ma
 ; CHECK-NEXT:    vslideup.vi v8, v12, 2
 ; CHECK-NEXT:    ret
   %sv = load <2 x i32>, ptr %svp
@@ -40,7 +40,7 @@ define <vscale x 8 x i32> @insert_nxv8i32_v2i32_6(<vscale x 8 x i32> %vec, ptr %
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
 ; CHECK-NEXT:    vle32.v v12, (a0)
-; CHECK-NEXT:    vsetivli zero, 8, e32, m4, tu, ma
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
 ; CHECK-NEXT:    vslideup.vi v8, v12, 6
 ; CHECK-NEXT:    ret
   %sv = load <2 x i32>, ptr %svp
@@ -51,22 +51,19 @@ define <vscale x 8 x i32> @insert_nxv8i32_v2i32_6(<vscale x 8 x i32> %vec, ptr %
 define <vscale x 8 x i32> @insert_nxv8i32_v8i32_0(<vscale x 8 x i32> %vec, ptr %svp) {
 ; LMULMAX2-LABEL: insert_nxv8i32_v8i32_0:
 ; LMULMAX2:       # %bb.0:
-; LMULMAX2-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
-; LMULMAX2-NEXT:    vle32.v v12, (a0)
-; LMULMAX2-NEXT:    vsetivli zero, 8, e32, m4, tu, ma
-; LMULMAX2-NEXT:    vmv.v.v v8, v12
+; LMULMAX2-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; LMULMAX2-NEXT:    vle32.v v8, (a0)
 ; LMULMAX2-NEXT:    ret
 ;
 ; LMULMAX1-LABEL: insert_nxv8i32_v8i32_0:
 ; LMULMAX1:       # %bb.0:
+; LMULMAX1-NEXT:    addi a1, a0, 16
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
-; LMULMAX1-NEXT:    vle32.v v12, (a0)
-; LMULMAX1-NEXT:    addi a0, a0, 16
-; LMULMAX1-NEXT:    vle32.v v16, (a0)
-; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m4, tu, ma
-; LMULMAX1-NEXT:    vmv.v.v v8, v12
-; LMULMAX1-NEXT:    vsetivli zero, 8, e32, m4, tu, ma
-; LMULMAX1-NEXT:    vslideup.vi v8, v16, 4
+; LMULMAX1-NEXT:    vle32.v v12, (a1)
+; LMULMAX1-NEXT:    vsetvli zero, zero, e32, m1, tu, ma
+; LMULMAX1-NEXT:    vle32.v v8, (a0)
+; LMULMAX1-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; LMULMAX1-NEXT:    vslideup.vi v8, v12, 4
 ; LMULMAX1-NEXT:    ret
   %sv = load <8 x i32>, ptr %svp
  %v = call <vscale x 8 x i32> @llvm.vector.insert.v8i32.nxv8i32(<vscale x 8 x i32> %vec, <8 x i32> %sv, i64 0)
@@ -84,14 +81,14 @@ define <vscale x 8 x i32> @insert_nxv8i32_v8i32_8(<vscale x 8 x i32> %vec, ptr %
 ;
 ; LMULMAX1-LABEL: insert_nxv8i32_v8i32_8:
 ; LMULMAX1:       # %bb.0:
-; LMULMAX1-NEXT:    addi a1, a0, 16
 ; LMULMAX1-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
-; LMULMAX1-NEXT:    vle32.v v12, (a1)
+; LMULMAX1-NEXT:    vle32.v v12, (a0)
+; LMULMAX1-NEXT:    addi a0, a0, 16
 ; LMULMAX1-NEXT:    vle32.v v16, (a0)
 ; LMULMAX1-NEXT:    vsetivli zero, 12, e32, m4, tu, ma
-; LMULMAX1-NEXT:    vslideup.vi v8, v16, 8
+; LMULMAX1-NEXT:    vslideup.vi v8, v12, 8
 ; LMULMAX1-NEXT:    vsetivli zero, 16, e32, m4, tu, ma
-; LMULMAX1-NEXT:    vslideup.vi v8, v12, 12
+; LMULMAX1-NEXT:    vslideup.vi v8, v16, 12
 ; LMULMAX1-NEXT:    ret
   %sv = load <8 x i32>, ptr %svp
  %v = call <vscale x 8 x i32> @llvm.vector.insert.v8i32.nxv8i32(<vscale x 8 x i32> %vec, <8 x i32> %sv, i64 8)
@@ -166,7 +163,7 @@ define void @insert_v8i32_v2i32_0(ptr %vp, ptr %svp) {
 ; LMULMAX2-NEXT:    vle32.v v8, (a1)
 ; LMULMAX2-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
 ; LMULMAX2-NEXT:    vle32.v v10, (a0)
-; LMULMAX2-NEXT:    vsetivli zero, 2, e32, m2, tu, ma
+; LMULMAX2-NEXT:    vsetivli zero, 2, e32, m1, tu, ma
 ; LMULMAX2-NEXT:    vmv.v.v v10, v8
 ; LMULMAX2-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
 ; LMULMAX2-NEXT:    vse32.v v10, (a0)
@@ -197,7 +194,7 @@ define void @insert_v8i32_v2i32_2(ptr %vp, ptr %svp) {
 ; LMULMAX2-NEXT:    vle32.v v8, (a1)
 ; LMULMAX2-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
 ; LMULMAX2-NEXT:    vle32.v v10, (a0)
-; LMULMAX2-NEXT:    vsetivli zero, 4, e32, m2, tu, ma
+; LMULMAX2-NEXT:    vsetivli zero, 4, e32, m1, tu, ma
 ; LMULMAX2-NEXT:    vslideup.vi v10, v8, 2
 ; LMULMAX2-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
 ; LMULMAX2-NEXT:    vse32.v v10, (a0)
@@ -508,9 +505,9 @@ define void @insert_v2i64_nxv16i64(ptr %psv0, ptr %psv1, <vscale x 16 x i64>* %o
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; CHECK-NEXT:    vle64.v v8, (a0)
-; CHECK-NEXT:    vle64.v v16, (a1)
-; CHECK-NEXT:    vsetivli zero, 6, e64, m8, tu, ma
-; CHECK-NEXT:    vslideup.vi v8, v16, 4
+; CHECK-NEXT:    vle64.v v12, (a1)
+; CHECK-NEXT:    vsetivli zero, 6, e64, m4, tu, ma
+; CHECK-NEXT:    vslideup.vi v8, v12, 4
 ; CHECK-NEXT:    vs8r.v v8, (a2)
 ; CHECK-NEXT:    ret
   %sv0 = load <2 x i64>, ptr %psv0
@@ -539,7 +536,7 @@ define void @insert_v2i64_nxv16i64_lo2(ptr %psv, <vscale x 16 x i64>* %out) {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; CHECK-NEXT:    vle64.v v8, (a0)
-; CHECK-NEXT:    vsetivli zero, 4, e64, m8, ta, ma
+; CHECK-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
 ; CHECK-NEXT:    vslideup.vi v16, v8, 2
 ; CHECK-NEXT:    vs8r.v v16, (a1)
 ; CHECK-NEXT:    ret
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll
index f52ba6f51d5c897..805557905117add 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll
@@ -27,13 +27,13 @@ define void @widen_3xv4i16(ptr %x, ptr %z) {
 ; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
 ; CHECK-NEXT:    vle16.v v8, (a0)
 ; CHECK-NEXT:    addi a2, a0, 8
-; CHECK-NEXT:    vle16.v v10, (a2)
+; CHECK-NEXT:    vle16.v v9, (a2)
 ; CHECK-NEXT:    addi a0, a0, 16
-; CHECK-NEXT:    vle16.v v12, (a0)
-; CHECK-NEXT:    vsetivli zero, 8, e16, m2, tu, ma
-; CHECK-NEXT:    vslideup.vi v8, v10, 4
+; CHECK-NEXT:    vle16.v v10, (a0)
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, tu, ma
+; CHECK-NEXT:    vslideup.vi v8, v9, 4
 ; CHECK-NEXT:    vsetivli zero, 12, e16, m2, tu, ma
-; CHECK-NEXT:    vslideup.vi v8, v12, 8
+; CHECK-NEXT:    vslideup.vi v8, v10, 8
 ; CHECK-NEXT:    vse16.v v8, (a1)
 ; CHECK-NEXT:    ret
   %a = load <4 x i16>, ptr %x
@@ -75,17 +75,17 @@ define void @widen_4xv4i16_unaligned(ptr %x, ptr %z) {
 ; CHECK-NO-MISALIGN-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
 ; CHECK-NO-MISALIGN-NEXT:    vle8.v v8, (a0)
 ; CHECK-NO-MISALIGN-NEXT:    addi a2, a0, 8
-; CHECK-NO-MISALIGN-NEXT:    vle8.v v10, (a2)
+; CHECK-NO-MISALIGN-NEXT:    vle8.v v9, (a2)
 ; CHECK-NO-MISALIGN-NEXT:    addi a2, a0, 16
-; CHECK-NO-MISALIGN-NEXT:    vle8.v v12, (a2)
+; CHECK-NO-MISALIGN-NEXT:    vle8.v v10, (a2)
 ; CHECK-NO-MISALIGN-NEXT:    addi a0, a0, 24
-; CHECK-NO-MISALIGN-NEXT:    vle8.v v14, (a0)
-; CHECK-NO-MISALIGN-NEXT:    vsetivli zero, 8, e16, m2, tu, ma
-; CHECK-NO-MISALIGN-NEXT:    vslideup.vi v8, v10, 4
+; CHECK-NO-MISALIGN-NEXT:    vle8.v v12, (a0)
+; CHECK-NO-MISALIGN-NEXT:    vsetvli zero, zero, e16, m1, tu, ma
+; CHECK-NO-MISALIGN-NEXT:    vslideup.vi v8, v9, 4
 ; CHECK-NO-MISALIGN-NEXT:    vsetivli zero, 12, e16, m2, tu, ma
-; CHECK-NO-MISALIGN-NEXT:    vslideup.vi v8, v12, 8
+; CHECK-NO-MISALIGN-NEXT:    vslideup.vi v8, v10, 8
 ; CHECK-NO-MISALIGN-NEXT:    vsetivli zero, 16, e16, m2, ta, ma
-; CHECK-NO-MISALIGN-NEXT:    vslideup.vi v8, v14, 12
+; CHECK-NO-MISALIGN-NEXT:    vslideup.vi v8, v12, 12
 ; CHECK-NO-MISALIGN-NEXT:    vse16.v v8, (a1)
 ; CHECK-NO-MISALIGN-NEXT:    ret
 ;
@@ -188,17 +188,17 @@ define void @strided_constant_mismatch_4xv4i16(ptr %x, ptr %z) {
 ; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
 ; CHECK-NEXT:    vle16.v v8, (a0)
 ; CHECK-NEXT:    addi a2, a0, 2
-; CHECK-NEXT:    vle16.v v10, (a2)
+; CHECK-NEXT:    vle16.v v9, (a2)
 ; CHECK-NEXT:    addi a2, a0, 6
-; CHECK-NEXT:    vle16.v v12, (a2)
+; CHECK-NEXT:    vle16.v v10, (a2)
 ; CHECK-NEXT:    addi a0, a0, 8
-; CHECK-NEXT:    vle16.v v14, (a0)
-; CHECK-NEXT:    vsetivli zero, 8, e16, m2, tu, ma
-; CHECK-NEXT:    vslideup.vi v8, v10, 4
+; CHECK-NEXT:    vle16.v v12, (a0)
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, tu, ma
+; CHECK-NEXT:    vslideup.vi v8, v9, 4
 ; CHECK-NEXT:    vsetivli zero, 12, e16, m2, tu, ma
-; CHECK-NEXT:    vslideup.vi v8, v12, 8
+; CHECK-NEXT:    vslideup.vi v8, v10, 8
 ; CHECK-NEXT:    vsetivli zero, 16, e16, m2, ta, ma
-; CHECK-NEXT:    vslideup.vi v8, v14, 12
+; CHECK-NEXT:    vslideup.vi v8, v12, 12
 ; CHECK-NEXT:    vse16.v v8, (a1)
 ; CHECK-NEXT:    ret
   %a = load <4 x i16>, ptr %x
@@ -258,17 +258,17 @@ define void @strided_runtime_mismatch_4xv4i16(ptr %x, ptr %z, i64 %s, i64 %t) {
 ; RV32-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
 ; RV32-NEXT:    vle16.v v8, (a0)
 ; RV32-NEXT:    add a0, a0, a2
-; RV32-NEXT:    vle16.v v10, (a0)
+; RV32-NEXT:    vle16.v v9, (a0)
 ; RV32-NEXT:    add a0, a0, a4
-; RV32-NEXT:    vle16.v v12, (a0)
+; RV32-NEXT:    vle16.v v10, (a0)
 ; RV32-NEXT:    add a0, a0, a2
-; RV32-NEXT:    vle16.v v14, (a0)
-; RV32-NEXT:    vsetivli zero, 8, e16, m2, tu, ma
-; RV32-NEXT:    vslideup.vi v8, v10, 4
+; RV32-NEXT:    vle16.v v12, (a0)
+; RV32-NEXT:    vsetivli zero, 8, e16, m1, tu, ma
+; RV32-NEXT:    vslideup.vi v8, v9, 4
 ; RV32-NEXT:    vsetivli zero, 12, e16, m2, tu, ma
-; RV32-NEXT:    vslideup.vi v8, v12, 8
+; RV32-NEXT:    vslideup.vi v8, v10, 8
 ; RV32-NEXT:    vsetivli zero, 16, e16, m2, ta, ma
-; RV32-NEXT:    vslideup.vi v8, v14, 12
+; RV32-NEXT:    vslideup.vi v8, v12, 12
 ; RV32-NEXT:    vse16.v v8, (a1)
 ; RV32-NEXT:    ret
 ;
@@ -277,17 +277,17 @@ define void @strided_runtime_mismatch_4xv4i16(ptr %x, ptr %z, i64 %s, i64 %t) {
 ; RV64-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
 ; RV64-NEXT:    vle16.v v8, (a0)
 ; RV64-NEXT:    add a0, a0, a2
-; RV64-NEXT:    vle16.v v10, (a0)
+; RV64-NEXT:    vle16.v v9, (a0)
 ; RV64-NEXT:    add a0, a0, a3
-; RV64-NEXT:    vle16.v v12, (a0)
+; RV64-NEXT:    vle16.v v10, (a0)
 ; RV64-NEXT:    add a0, a0, a2
-; RV64-NEXT:    vle16.v v14, (a0)
-; RV64-NEXT:    vsetivli zero, 8, e16, m2, tu, ma
-; RV64-NEXT:    vslideup.vi v8, v10, 4
+; RV64-NEXT:    vle16.v v12, (a0)
+; RV64-NEXT:    vsetivli zero, 8, e16, m1, tu, ma
+; RV64-NEXT:    vslideup.vi v8, v9, 4
 ; RV64-NEXT:    vsetivli zero, 12, e16, m2, tu, ma
-; RV64-NEXT:    vslideup.vi v8, v12, 8
+; RV64-NEXT:    vslideup.vi v8, v10, 8
 ; RV64-NEXT:    vsetivli zero, 16, e16, m2, ta, ma
-; RV64-NEXT:    vslideup.vi v8, v14, 12
+; RV64-NEXT:    vslideup.vi v8, v12, 12
 ; RV64-NEXT:    vse16.v v8, (a1)
 ; RV64-NEXT:    ret
 ;
@@ -296,17 +296,17 @@ define void @strided_runtime_mismatch_4xv4i16(ptr %x, ptr %z, i64 %s, i64 %t) {
 ; ZVE64F-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
 ; ZVE64F-NEXT:    vle16.v v8, (a0)
 ; ZVE64F-NEXT:    add a0, a0, a2
-; ZVE64F-NEXT:    vle16.v v10, (a0)
+; ZVE64F-NEXT:    vle16.v v9, (a0)
 ; ZVE64F-NEXT:    add a0, a0, a3
-; ZVE64F-NEXT:    vle16.v v12, (a0)
+; ZVE64F-NEXT:    vle16.v v10, (a0)
 ; ZVE64F-NEXT:    add a0, a0, a2
-; ZVE64F-NEXT:    vle16.v v14, (a0)
-; ZVE64F-NEXT:    vsetivli zero, 8, e16, m2, tu, ma
-; ZVE64F-NEXT:    vslideup.vi v8, v10, 4
+; ZVE64F-NEXT:    vle16.v v12, (a0)
+; ZVE64F-NEXT:    vsetivli zero, 8, e16, m1, tu, ma
+; ZVE64F-NEXT:    vslideup.vi v8, v9, 4
 ; ZVE64F-NEXT:    vsetivli zero, 12, e16, m2, tu, ma
-; ZVE64F-NEXT:    vslideup.vi v8, v12, 8
+; ZVE64F-NEXT:    vslideup.vi v8, v10, 8
 ; ZVE64F-NEXT:    vsetivli zero, 16, e16, m2, ta, ma
-; ZVE64F-NEXT:    vslideup.vi v8, v14, 12
+; ZVE64F-NEXT:    vslideup.vi v8, v12, 12
 ; ZVE64F-NEXT:    vse16.v v8, (a1)
 ; ZVE64F-NEXT:    ret
   %a = load <4 x i16>, ptr %x
diff --git a/llvm/test/CodeGen/RISCV/rvv/fpclamptosat_vec.ll b/llvm/test/CodeGen/RISCV/rvv/fpclamptosat_vec.ll
index 31e7e7be76c89b1..f598118c18aff9f 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fpclamptosat_vec.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fpclamptosat_vec.ll
@@ -460,54 +460,49 @@ define <4 x i32> @stest_f16i32(<4 x half> %x) {
 ; CHECK-V-NEXT:    sub sp, sp, a1
 ; CHECK-V-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x30, 0x22, 0x11, 0x04, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 48 + 4 * vlenb
 ; CHECK-V-NEXT:    lhu s0, 24(a0)
-; CHECK-V-NEXT:    lhu s1, 16(a0)
-; CHECK-V-NEXT:    lhu s2, 0(a0)
-; CHECK-V-NEXT:    lhu a0, 8(a0)
+; CHECK-V-NEXT:    lhu s1, 0(a0)
+; CHECK-V-NEXT:    lhu s2, 8(a0)
+; CHECK-V-NEXT:    lhu a0, 16(a0)
 ; CHECK-V-NEXT:    fmv.w.x fa0, a0
 ; CHECK-V-NEXT:    call __extendhfsf2@plt
 ; CHECK-V-NEXT:    fcvt.l.s a0, fa0, rtz
 ; CHECK-V-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
 ; CHECK-V-NEXT:    vmv.s.x v8, a0
-; CHECK-V-NEXT:    addi a0, sp, 16
-; CHECK-V-NEXT:    vs2r.v v8, (a0) # Unknown-size Folded Spill
-; CHECK-V-NEXT:    fmv.w.x fa0, s2
-; CHECK-V-NEXT:    call __extendhfsf2@plt
-; CHECK-V-NEXT:    fcvt.l.s a0, fa0, rtz
-; CHECK-V-NEXT:    vsetivli zero, 2, e64, m2, tu, ma
-; CHECK-V-NEXT:    vmv.s.x v8, a0
-; CHECK-V-NEXT:    addi a0, sp, 16
-; CHECK-V-NEXT:    vl2r.v v10, (a0) # Unknown-size Folded Reload
-; CHECK-V-NEXT:    vslideup.vi v8, v10, 1
 ; CHECK-V-NEXT:    csrr a0, vlenb
 ; CHECK-V-NEXT:    slli a0, a0, 1
 ; CHECK-V-NEXT:    add a0, sp, a0
 ; CHECK-V-NEXT:    addi a0, a0, 16
 ; CHECK-V-NEXT:    vs2r.v v8, (a0) # Unknown-size Folded Spill
+; CHECK-V-NEXT:    fmv.w.x fa0, s2
+; CHECK-V-NEXT:    call __extendhfsf2@plt
+; CHECK-V-NEXT:    fcvt.l.s a0, fa0, rtz
+; CHECK-V-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
 ; CHECK-V-NEXT:    fmv.w.x fa0, s1
+; CHECK-V-NEXT:    vmv.s.x v8, a0
+; CHECK-V-NEXT:    addi a0, sp, 16
+; CHECK-V-NEXT:    vs1r.v v8, (a0) # Unknown-size Folded Spill
 ; CHECK-V-NEXT:    call __extendhfsf2@plt
 ; CHECK-V-NEXT:    fcvt.l.s a0, fa0, rtz
+; CHECK-V-NEXT:    vsetivli zero, 2, e64, m1, tu, ma
+; CHECK-V-NEXT:    vmv.s.x v10, a0
+; CHECK-V-NEXT:    addi a0, sp, 16
+; CHECK-V-NEXT:    vl1r.v v8, (a0) # Unknown-size Folded Reload
+; CHECK-V-NEXT:    vslideup.vi v10, v8, 1
 ; CHECK-V-NEXT:    vsetivli zero, 3, e64, m2, tu, ma
-; CHECK-V-NEXT:    vmv.s.x v8, a0
 ; CHECK-V-NEXT:    csrr a0, vlenb
 ; CHECK-V-NEXT:    slli a0, a0, 1
 ; CHECK-V-NEXT:    add a0, sp, a0
 ; CHECK-V-NEXT:    addi a0, a0, 16
-; CHECK-V-NEXT:    vl2r.v v10, (a0) # Unknown-size Folded Reload
+; CHECK-V-NEXT:    vl2r.v v8, (a0) # Unknown-size Folded Reload
 ; CHECK-V-NEXT:    vslideup.vi v10, v8, 2
-; CHECK-V-NEXT:    csrr a0, vlenb
-; CHECK-V-NEXT:    slli a0, a0, 1
-; CHECK-V-NEXT:    add a0, sp, a0
-; CHECK-V-NEXT:    addi a0, a0, 16
+; CHECK-V-NEXT:    addi a0, sp, 16
 ; CHECK-V-NEXT:    vs2r.v v10, (a0) # Unknown-size Folded Spill
 ; CHECK-V-NEXT:    fmv.w.x fa0, s0
 ; CHECK-V-NEXT:    call __extendhfsf2@plt
 ; CHECK-V-NEXT:    fcvt.l.s a0, fa0, rtz
 ; CHECK-V-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
 ; CHECK-V-NEXT:    vmv.s.x v8, a0
-; CHECK-V-NEXT:    csrr a0, vlenb
-; CHECK-V-NEXT:    slli a0, a0, 1
-; CHECK-V-NEXT:    add a0, sp, a0
-; CHECK-V-NEXT:    addi a0, a0, 16
+; CHECK-V-NEXT:    addi a0, sp, 16
 ; CHECK-V-NEXT:    vl2r.v v10, (a0) # Unknown-size Folded Reload
 ; CHECK-V-NEXT:    vslideup.vi v10, v8, 3
 ; CHECK-V-NEXT:    lui a0, 524288
@@ -632,54 +627,49 @@ define <4 x i32> @utesth_f16i32(<4 x half> %x) {
 ; CHECK-V-NEXT:    sub sp, sp, a1
 ; CHECK-V-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x30, 0x22, 0x11, 0x04, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 48 + 4 * vlenb
 ; CHECK-V-NEXT:    lhu s0, 24(a0)
-; CHECK-V-NEXT:    lhu s1, 16(a0)
-; CHECK-V-NEXT:    lhu s2, 0(a0)
-; CHECK-V-NEXT:    lhu a0, 8(a0)
+; CHECK-V-NEXT:    lhu s1, 0(a0)
+; CHECK-V-NEXT:    lhu s2, 8(a0)
+; CHECK-V-NEXT:    lhu a0, 16(a0)
 ; CHECK-V-NEXT:    fmv.w.x fa0, a0
 ; CHECK-V-NEXT:    call __extendhfsf2@plt
 ; CHECK-V-NEXT:    fcvt.lu.s a0, fa0, rtz
 ; CHECK-V-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
 ; CHECK-V-NEXT:    vmv.s....

@jayfoad
Contributor

jayfoad commented Sep 13, 2023

This is somewhat dependent on #66211, since it introduces a subregister copy that triggers a crash with -early-live-intervals in one of the tests.

I think #65916 and #66211 are two different fixes for the same problem. Does #65916 also fix the crash? Or do you really need #66211?

@lukel97
Contributor Author

lukel97 commented Sep 13, 2023

This is somewhat dependent on #66211, since it introduces a subregister copy that triggers a crash with -early-live-intervals in one of the tests.

I think #65916 and #66211 are two different fixes for the same problem. Does #65916 also fix the crash? Or do you really need #66211?

Both fix the crash, but I closed #65916 in favour of #66211 since my understanding of LiveIntervals is limited.

lukel97 added a commit to lukel97/llvm-project that referenced this pull request Sep 18, 2023
Similar to llvm#66267, we can perform a vslideup_vl on a smaller type if we know
the highest lane that will be written to, which can be determined from VL.

This is an alternative to llvm#65997 and llvm#66087
Collaborator

@preames preames left a comment


LGTM

@preames
Collaborator

preames commented Sep 20, 2023

Warning: Given that #66211 has landed, I am only reviewing the second change in this stack. Make sure you rebase!

Similar to llvm#65598, if we're using a vslideup to insert a fixed length vector
into another vector, then we can work out the minimum number of registers it
will need to slide up across given the minimum VLEN, and shrink the type
operated on to reduce LMUL accordingly.

This is somewhat dependent on llvm#65916, since it introduces a subregister copy
that triggers a crash with -early-live-intervals in one of the tests.
@lukel97 lukel97 force-pushed the insert-subvector-shrink branch from 5621ded to 0645568 on September 21, 2023 11:56
@lukel97 lukel97 merged commit b5ff71e into llvm:main Sep 21, 2023
@topperc
Collaborator

topperc commented Sep 27, 2023

I think we're starting to see failures from this change. The diff below fixes the failures I've seen, but I think there may be some ordering issues with when we decide to shrink: if we enter this particular if statement, it doesn't seem like we should have done any shrinking in the first place.

diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index c4942f9c637b..ee9957795f97 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -8641,6 +8641,9 @@ SDValue RISCVTargetLowering::lowerINSERT_SUBVECTOR(SDValue Op,
                          DAG.getUNDEF(ContainerVT), SubVec,
                          DAG.getConstant(0, DL, XLenVT));
     if (OrigIdx == 0 && Vec.isUndef() && VecVT.isFixedLengthVector()) {
+      if (ContainerVT != OrigContainerVT)
+        SubVec = DAG.getNode(ISD::INSERT_SUBVECTOR, DL, OrigContainerVT, OrigVec,
+                             SubVec, DAG.getVectorIdxConstant(0, DL));
       SubVec = convertFromScalableVector(VecVT, SubVec, DAG, Subtarget);
       return DAG.getBitcast(Op.getValueType(), SubVec);
     }

Here's one failure that I think is related to this: iree-org/iree#15038. I also have an internal build error, but I don't have a reduced reproducer yet.

Here's a fragment of the debug output from my internal reproducer.

Legalizing: t218: v2f32 = extract_subvector t217, Constant:i64<0>                
Trying custom legalization                                                       
                                                                                 
Legalizing: t217: nxv1f32,ch = llvm.riscv.vle<(dereferenceable load (s64) from %ir.this, align 4, !tbaa !13)> t0, TargetConstant:i64<10003>, undef:nxv1f32, t4, Constant:i64<2>
Trying custom legalization                                                       
Could not custom legalize node                                                   
Trying to expand node                                                            
                                                                                 
Legalizing: t216: v8f32 = insert_subvector undef:v8f32, t218, Constant:i64<0>    
Trying custom legalization                                                       
Creating new node: t220: nxv2f32 = undef                                         
Creating new node: t221: nxv2f32 = insert_subvector undef:nxv2f32, t218, Constant:i64<0>
Creating new node: t222: v8f32 = extract_subvector t221, Constant:i64<0>         
Successfully custom legalized node                                               
 ... replacing: t216: v8f32 = insert_subvector undef:v8f32, t218, Constant:i64<0>
     with:      t222: v8f32 = extract_subvector t221, Constant:i64<0>

MinVLen is 128. The issue is with the t222 extract_subvector: t221 is an LMUL=1 type holding 128 bits, while v8f32 is 256 bits, so we can't extract it from an LMUL=1 type.
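
Spelling those numbers out (a back-of-the-envelope check, assuming LLVM's RISC-V convention that vscale corresponds to VLEN / 64, i.e. RVVBitsPerBlock = 64):

// Sketch of the size mismatch described above; not code from the patch.
#include <cstdio>

int main() {
  unsigned MinVLen = 128;
  unsigned VScale = MinVLen / 64;        // RVVBitsPerBlock = 64 -> vscale = 2
  unsigned ShrunkBits = 2 * 32 * VScale; // t221: nxv2f32 -> 128 bits (LMUL=1)
  unsigned ResultBits = 8 * 32;          // t222: v8f32   -> 256 fixed bits
  std::printf("extracting %u bits from a %u-bit container is invalid\n",
              ResultBits, ShrunkBits);
}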

@lukel97
Contributor Author

lukel97 commented Sep 27, 2023

@topperc whoops, thanks for catching that. Got a reproducer at #67535

lukel97 added a commit to lukel97/llvm-project that referenced this pull request Sep 27, 2023
Continuing on from llvm#65997, if the index of insert_vector_elt is a constant
then we can work out the minimum number of registers that will be needed for
the slideup and choose a smaller type to operate on. This reduces the LMUL for
not just the slideup but also for the scalar insert.
lukel97 added a commit that referenced this pull request Sep 27, 2023
…lt (#66087)

Continuing on from #65997, if the index of insert_vector_elt is a
constant then we can work out the minimum number of registers that will
be needed for the slideup and choose a smaller type to operate on.

This reduces the LMUL for not just the slideup but also for the scalar
insert.
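
For context, the shape shared by this PR and that follow-up is: extract the low part of the vector at a smaller LMUL, perform the insert there, then splice the result back. A toy model of that flow (plain C++ with std::vector standing in for a register group; purely illustrative, not SelectionDAG code):

// Toy model of shrink -> operate -> splice back; none of this is DAG code.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
  std::vector<int> OrigVec(16, 0);                  // an m4 group of e32 lanes
  // EXTRACT_SUBVECTOR at index 0: only the low m2's worth will be written.
  std::vector<int> Shrunk(OrigVec.begin(), OrigVec.begin() + 8);
  Shrunk[6] = Shrunk[7] = 1;                        // the cheap m2 slideup
  // INSERT_SUBVECTOR at index 0: splice the shrunken result back over OrigVec.
  std::copy(Shrunk.begin(), Shrunk.end(), OrigVec.begin());
  for (int V : OrigVec)
    std::printf("%d", V);
  std::printf("\n");                                // prints 0000001100000000
}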
legrosbuffle pushed a commit to legrosbuffle/llvm-project that referenced this pull request Sep 29, 2023
…lt (llvm#66087)

Continuing on from llvm#65997, if the index of insert_vector_elt is a
constant then we can work out the minimum number of registers that will
be needed for the slideup and choose a smaller type to operate on.

This reduces the LMUL for not just the slideup but also for the scalar
insert.
preames added a commit that referenced this pull request Oct 10, 2023
…vector (#65997)"

This reverts commit b5ff71e.  As described in
#68730, this appears to have exposed
an existing liveness issue.  Revert to green until we can figure out how to
address the root cause.

Note: This was not a clean revert.  I ended up doing it by hand.
preames added a commit to preames/llvm-project that referenced this pull request Oct 26, 2023