[RISCV] Clear vill for whole vector register moves in vsetvli insertion #118283
base: main
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

This is an alternative to #117866 that works by demanding a valid vtype instead of using a separate pass. The main advantage of this is that it allows coalesceVSETVLIs to just reuse an existing vsetvli later in the block.

To do this we need to first transfer the vsetvli info to some arbitrary valid state in transferBefore when we encounter a vector copy. Then we add a new vill demanded field that will happily accept any other known vtype, which allows us to coalesce these where possible.

Note we also need to check for vector copies in computeVLVTYPEChanges, otherwise the pass will completely skip over functions that only have vector copies and nothing else.

This is one part of a fix for #114518. We still need to check whether there are other cases where vector copies/whole register moves are inserted after vsetvli insertion.

Patch is 1.29 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118283.diff

175 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
index 052b4a61298223..20b4a40ecaa8f3 100644
--- a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
@@ -195,6 +195,27 @@ static bool hasUndefinedPassthru(const MachineInstr &MI) {
return UseMO.getReg() == RISCV::NoRegister || UseMO.isUndef();
}
+/// Return true if \p MI is a copy that will be lowered to one or more vmvNr.vs.
+static bool isVecCopy(const MachineInstr &MI) {
+ static const TargetRegisterClass *RVVRegClasses[] = {
+ &RISCV::VRRegClass, &RISCV::VRM2RegClass, &RISCV::VRM4RegClass,
+ &RISCV::VRM8RegClass, &RISCV::VRN2M1RegClass, &RISCV::VRN2M2RegClass,
+ &RISCV::VRN2M4RegClass, &RISCV::VRN3M1RegClass, &RISCV::VRN3M2RegClass,
+ &RISCV::VRN4M1RegClass, &RISCV::VRN4M2RegClass, &RISCV::VRN5M1RegClass,
+ &RISCV::VRN6M1RegClass, &RISCV::VRN7M1RegClass, &RISCV::VRN8M1RegClass};
+ if (!MI.isCopy())
+ return false;
+
+ Register DstReg = MI.getOperand(0).getReg();
+ Register SrcReg = MI.getOperand(1).getReg();
+ for (const auto &RegClass : RVVRegClasses) {
+ if (RegClass->contains(DstReg, SrcReg)) {
+ return true;
+ }
+ }
+ return false;
+}
+
/// Which subfields of VL or VTYPE have values we need to preserve?
struct DemandedFields {
// Some unknown property of VL is used. If demanded, must preserve entire
@@ -221,10 +242,13 @@ struct DemandedFields {
bool SEWLMULRatio = false;
bool TailPolicy = false;
bool MaskPolicy = false;
+ // If this is true, we demand that VTYPE is set to some legal state, i.e. that
+ // vill is unset.
+ bool VILL = false;
// Return true if any part of VTYPE was used
bool usedVTYPE() const {
- return SEW || LMUL || SEWLMULRatio || TailPolicy || MaskPolicy;
+ return SEW || LMUL || SEWLMULRatio || TailPolicy || MaskPolicy || VILL;
}
// Return true if any property of VL was used
@@ -239,6 +263,7 @@ struct DemandedFields {
SEWLMULRatio = true;
TailPolicy = true;
MaskPolicy = true;
+ VILL = true;
}
// Mark all VL properties as demanded
@@ -263,6 +288,7 @@ struct DemandedFields {
SEWLMULRatio |= B.SEWLMULRatio;
TailPolicy |= B.TailPolicy;
MaskPolicy |= B.MaskPolicy;
+ VILL |= B.VILL;
}
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
@@ -308,7 +334,8 @@ struct DemandedFields {
OS << ", ";
OS << "SEWLMULRatio=" << SEWLMULRatio << ", ";
OS << "TailPolicy=" << TailPolicy << ", ";
- OS << "MaskPolicy=" << MaskPolicy;
+ OS << "MaskPolicy=" << MaskPolicy << ", ";
+ OS << "VILL=" << VILL;
OS << "}";
}
#endif
@@ -503,6 +530,16 @@ DemandedFields getDemanded(const MachineInstr &MI, const RISCVSubtarget *ST) {
}
}
+ // In §32.16.6, whole vector register moves have a dependency on SEW. At the
+ // MIR level though we don't encode the element type, and it gives the same
+ // result whatever the SEW may be.
+ //
+ // However it does need valid SEW, i.e. vill must be cleared. The entry to a
+ // function, calls and inline assembly may all set it, so make sure we clear
+ // it for whole register copies.
+ if (isVecCopy(MI))
+ Res.VILL = true;
+
return Res;
}
@@ -1208,6 +1245,17 @@ static VSETVLIInfo adjustIncoming(VSETVLIInfo PrevInfo, VSETVLIInfo NewInfo,
// legal for MI, but may not be the state requested by MI.
void RISCVInsertVSETVLI::transferBefore(VSETVLIInfo &Info,
const MachineInstr &MI) const {
+ if (isVecCopy(MI) &&
+ (Info.isUnknown() || !Info.isValid() || Info.hasSEWLMULRatioOnly())) {
+ // Use an arbitrary but valid AVL and VTYPE so vill will be cleared. It may
+ // be coalesced into another vsetvli since we won't demand any fields.
+ VSETVLIInfo NewInfo; // Need a new VSETVLIInfo to clear SEWLMULRatioOnly
+ NewInfo.setAVLImm(0);
+ NewInfo.setVTYPE(RISCVII::VLMUL::LMUL_1, 8, true, true);
+ Info = NewInfo;
+ return;
+ }
+
if (!RISCVII::hasSEWOp(MI.getDesc().TSFlags))
return;
@@ -1296,7 +1344,8 @@ bool RISCVInsertVSETVLI::computeVLVTYPEChanges(const MachineBasicBlock &MBB,
for (const MachineInstr &MI : MBB) {
transferBefore(Info, MI);
- if (isVectorConfigInstr(MI) || RISCVII::hasSEWOp(MI.getDesc().TSFlags))
+ if (isVectorConfigInstr(MI) || RISCVII::hasSEWOp(MI.getDesc().TSFlags) ||
+ isVecCopy(MI))
HadVectorOp = true;
transferAfter(Info, MI);
@@ -1426,6 +1475,12 @@ void RISCVInsertVSETVLI::emitVSETVLIs(MachineBasicBlock &MBB) {
PrefixTransparent = false;
}
+ if (isVecCopy(MI) &&
+ !PrevInfo.isCompatible(DemandedFields::all(), CurInfo, LIS)) {
+ insertVSETVLI(MBB, MI, MI.getDebugLoc(), CurInfo, PrevInfo);
+ PrefixTransparent = false;
+ }
+
uint64_t TSFlags = MI.getDesc().TSFlags;
if (RISCVII::hasSEWOp(TSFlags)) {
if (!PrevInfo.isCompatible(DemandedFields::all(), CurInfo, LIS)) {
diff --git a/llvm/test/CodeGen/RISCV/inline-asm-v-constraint.ll b/llvm/test/CodeGen/RISCV/inline-asm-v-constraint.ll
index c04e4fea7b2c29..45bc1222c0677c 100644
--- a/llvm/test/CodeGen/RISCV/inline-asm-v-constraint.ll
+++ b/llvm/test/CodeGen/RISCV/inline-asm-v-constraint.ll
@@ -45,6 +45,7 @@ define <vscale x 1 x i8> @constraint_vd(<vscale x 1 x i8> %0, <vscale x 1 x i8>
define <vscale x 1 x i1> @constraint_vm(<vscale x 1 x i1> %0, <vscale x 1 x i1> %1) nounwind {
; RV32I-LABEL: constraint_vm:
; RV32I: # %bb.0:
+; RV32I-NEXT: vsetivli zero, 0, e8, m1, ta, ma
; RV32I-NEXT: vmv1r.v v9, v0
; RV32I-NEXT: vmv1r.v v0, v8
; RV32I-NEXT: #APP
@@ -54,6 +55,7 @@ define <vscale x 1 x i1> @constraint_vm(<vscale x 1 x i1> %0, <vscale x 1 x i1>
;
; RV64I-LABEL: constraint_vm:
; RV64I: # %bb.0:
+; RV64I-NEXT: vsetivli zero, 0, e8, m1, ta, ma
; RV64I-NEXT: vmv1r.v v9, v0
; RV64I-NEXT: vmv1r.v v0, v8
; RV64I-NEXT: #APP
diff --git a/llvm/test/CodeGen/RISCV/rvv/abs-vp.ll b/llvm/test/CodeGen/RISCV/rvv/abs-vp.ll
index 163d9145bc3623..ee0016ec080e24 100644
--- a/llvm/test/CodeGen/RISCV/rvv/abs-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/abs-vp.ll
@@ -567,6 +567,7 @@ define <vscale x 16 x i64> @vp_abs_nxv16i64(<vscale x 16 x i64> %va, <vscale x 1
; CHECK-NEXT: slli a1, a1, 4
; CHECK-NEXT: sub sp, sp, a1
; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 16 * vlenb
+; CHECK-NEXT: vsetvli a1, zero, e8, mf4, ta, ma
; CHECK-NEXT: vmv1r.v v24, v0
; CHECK-NEXT: csrr a1, vlenb
; CHECK-NEXT: slli a1, a1, 3
@@ -576,7 +577,6 @@ define <vscale x 16 x i64> @vp_abs_nxv16i64(<vscale x 16 x i64> %va, <vscale x 1
; CHECK-NEXT: csrr a1, vlenb
; CHECK-NEXT: srli a2, a1, 3
; CHECK-NEXT: sub a3, a0, a1
-; CHECK-NEXT: vsetvli a4, zero, e8, mf4, ta, ma
; CHECK-NEXT: vslidedown.vx v0, v0, a2
; CHECK-NEXT: sltu a2, a0, a3
; CHECK-NEXT: addi a2, a2, -1
diff --git a/llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll b/llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll
index 66a1178cddb66c..10d24927d9b783 100644
--- a/llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll
@@ -3075,6 +3075,7 @@ define <vscale x 64 x i16> @vp_bitreverse_nxv64i16(<vscale x 64 x i16> %va, <vsc
; CHECK-NEXT: slli a1, a1, 4
; CHECK-NEXT: sub sp, sp, a1
; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 16 * vlenb
+; CHECK-NEXT: vsetvli a1, zero, e8, m1, ta, ma
; CHECK-NEXT: vmv1r.v v24, v0
; CHECK-NEXT: csrr a1, vlenb
; CHECK-NEXT: slli a1, a1, 3
@@ -3086,7 +3087,6 @@ define <vscale x 64 x i16> @vp_bitreverse_nxv64i16(<vscale x 64 x i16> %va, <vsc
; CHECK-NEXT: lui a2, 3
; CHECK-NEXT: srli a4, a3, 1
; CHECK-NEXT: slli a3, a3, 2
-; CHECK-NEXT: vsetvli a5, zero, e8, m1, ta, ma
; CHECK-NEXT: vslidedown.vx v0, v0, a4
; CHECK-NEXT: sub a4, a0, a3
; CHECK-NEXT: sltu a5, a0, a4
@@ -3158,11 +3158,11 @@ define <vscale x 64 x i16> @vp_bitreverse_nxv64i16(<vscale x 64 x i16> %va, <vsc
;
; CHECK-ZVBB-LABEL: vp_bitreverse_nxv64i16:
; CHECK-ZVBB: # %bb.0:
+; CHECK-ZVBB-NEXT: vsetvli a1, zero, e8, m1, ta, ma
; CHECK-ZVBB-NEXT: vmv1r.v v24, v0
; CHECK-ZVBB-NEXT: csrr a1, vlenb
; CHECK-ZVBB-NEXT: srli a2, a1, 1
; CHECK-ZVBB-NEXT: slli a1, a1, 2
-; CHECK-ZVBB-NEXT: vsetvli a3, zero, e8, m1, ta, ma
; CHECK-ZVBB-NEXT: vslidedown.vx v0, v0, a2
; CHECK-ZVBB-NEXT: sub a2, a0, a1
; CHECK-ZVBB-NEXT: sltu a3, a0, a2
diff --git a/llvm/test/CodeGen/RISCV/rvv/bswap-vp.ll b/llvm/test/CodeGen/RISCV/rvv/bswap-vp.ll
index 1c95ec8fafd4f1..0dc1d0c32ac449 100644
--- a/llvm/test/CodeGen/RISCV/rvv/bswap-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/bswap-vp.ll
@@ -1584,6 +1584,7 @@ define <vscale x 64 x i16> @vp_bswap_nxv64i16(<vscale x 64 x i16> %va, <vscale x
; CHECK-NEXT: slli a1, a1, 4
; CHECK-NEXT: sub sp, sp, a1
; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 16 * vlenb
+; CHECK-NEXT: vsetvli a1, zero, e8, m1, ta, ma
; CHECK-NEXT: vmv1r.v v24, v0
; CHECK-NEXT: csrr a1, vlenb
; CHECK-NEXT: slli a1, a1, 3
@@ -1593,7 +1594,6 @@ define <vscale x 64 x i16> @vp_bswap_nxv64i16(<vscale x 64 x i16> %va, <vscale x
; CHECK-NEXT: csrr a1, vlenb
; CHECK-NEXT: srli a2, a1, 1
; CHECK-NEXT: slli a1, a1, 2
-; CHECK-NEXT: vsetvli a3, zero, e8, m1, ta, ma
; CHECK-NEXT: vslidedown.vx v0, v0, a2
; CHECK-NEXT: sub a2, a0, a1
; CHECK-NEXT: sltu a3, a0, a2
@@ -1631,11 +1631,11 @@ define <vscale x 64 x i16> @vp_bswap_nxv64i16(<vscale x 64 x i16> %va, <vscale x
;
; CHECK-ZVKB-LABEL: vp_bswap_nxv64i16:
; CHECK-ZVKB: # %bb.0:
+; CHECK-ZVKB-NEXT: vsetvli a1, zero, e8, m1, ta, ma
; CHECK-ZVKB-NEXT: vmv1r.v v24, v0
; CHECK-ZVKB-NEXT: csrr a1, vlenb
; CHECK-ZVKB-NEXT: srli a2, a1, 1
; CHECK-ZVKB-NEXT: slli a1, a1, 2
-; CHECK-ZVKB-NEXT: vsetvli a3, zero, e8, m1, ta, ma
; CHECK-ZVKB-NEXT: vslidedown.vx v0, v0, a2
; CHECK-ZVKB-NEXT: sub a2, a0, a1
; CHECK-ZVKB-NEXT: sltu a3, a0, a2
diff --git a/llvm/test/CodeGen/RISCV/rvv/calling-conv-fastcc.ll b/llvm/test/CodeGen/RISCV/rvv/calling-conv-fastcc.ll
index a4e5ab661c5285..d824426727fd9a 100644
--- a/llvm/test/CodeGen/RISCV/rvv/calling-conv-fastcc.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/calling-conv-fastcc.ll
@@ -336,6 +336,7 @@ define fastcc <vscale x 32 x i32> @ret_nxv32i32_call_nxv32i32_nxv32i32_i32(<vsca
; RV32-NEXT: add a1, a3, a1
; RV32-NEXT: li a3, 2
; RV32-NEXT: vs8r.v v16, (a1)
+; RV32-NEXT: vsetivli zero, 0, e8, m1, ta, ma
; RV32-NEXT: vmv8r.v v8, v0
; RV32-NEXT: vmv8r.v v16, v24
; RV32-NEXT: call ext2
@@ -374,6 +375,7 @@ define fastcc <vscale x 32 x i32> @ret_nxv32i32_call_nxv32i32_nxv32i32_i32(<vsca
; RV64-NEXT: add a1, a3, a1
; RV64-NEXT: li a3, 2
; RV64-NEXT: vs8r.v v16, (a1)
+; RV64-NEXT: vsetivli zero, 0, e8, m1, ta, ma
; RV64-NEXT: vmv8r.v v8, v0
; RV64-NEXT: vmv8r.v v16, v24
; RV64-NEXT: call ext2
@@ -451,6 +453,7 @@ define fastcc <vscale x 32 x i32> @ret_nxv32i32_call_nxv32i32_nxv32i32_nxv32i32_
; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a1, a1, 128
; RV32-NEXT: vl8r.v v8, (a1) # Unknown-size Folded Reload
+; RV32-NEXT: vsetivli zero, 0, e8, m1, ta, ma
; RV32-NEXT: vmv8r.v v16, v0
; RV32-NEXT: call ext3
; RV32-NEXT: addi sp, s0, -144
@@ -523,6 +526,7 @@ define fastcc <vscale x 32 x i32> @ret_nxv32i32_call_nxv32i32_nxv32i32_nxv32i32_
; RV64-NEXT: add a1, sp, a1
; RV64-NEXT: addi a1, a1, 128
; RV64-NEXT: vl8r.v v8, (a1) # Unknown-size Folded Reload
+; RV64-NEXT: vsetivli zero, 0, e8, m1, ta, ma
; RV64-NEXT: vmv8r.v v16, v0
; RV64-NEXT: call ext3
; RV64-NEXT: addi sp, s0, -144
diff --git a/llvm/test/CodeGen/RISCV/rvv/calling-conv.ll b/llvm/test/CodeGen/RISCV/rvv/calling-conv.ll
index 9b27116fef7cae..8d56a76dc8eb80 100644
--- a/llvm/test/CodeGen/RISCV/rvv/calling-conv.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/calling-conv.ll
@@ -103,6 +103,7 @@ define target("riscv.vector.tuple", <vscale x 16 x i8>, 2) @caller_tuple_return(
; RV32-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
; RV32-NEXT: .cfi_offset ra, -4
; RV32-NEXT: call callee_tuple_return
+; RV32-NEXT: vsetivli zero, 0, e8, m1, ta, ma
; RV32-NEXT: vmv2r.v v6, v8
; RV32-NEXT: vmv2r.v v8, v10
; RV32-NEXT: vmv2r.v v10, v6
@@ -119,6 +120,7 @@ define target("riscv.vector.tuple", <vscale x 16 x i8>, 2) @caller_tuple_return(
; RV64-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
; RV64-NEXT: .cfi_offset ra, -8
; RV64-NEXT: call callee_tuple_return
+; RV64-NEXT: vsetivli zero, 0, e8, m1, ta, ma
; RV64-NEXT: vmv2r.v v6, v8
; RV64-NEXT: vmv2r.v v8, v10
; RV64-NEXT: vmv2r.v v10, v6
@@ -144,6 +146,7 @@ define void @caller_tuple_argument(target("riscv.vector.tuple", <vscale x 16 x i
; RV32-NEXT: .cfi_def_cfa_offset 16
; RV32-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
; RV32-NEXT: .cfi_offset ra, -4
+; RV32-NEXT: vsetivli zero, 0, e8, m1, ta, ma
; RV32-NEXT: vmv2r.v v6, v8
; RV32-NEXT: vmv2r.v v8, v10
; RV32-NEXT: vmv2r.v v10, v6
@@ -160,6 +163,7 @@ define void @caller_tuple_argument(target("riscv.vector.tuple", <vscale x 16 x i
; RV64-NEXT: .cfi_def_cfa_offset 16
; RV64-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
; RV64-NEXT: .cfi_offset ra, -8
+; RV64-NEXT: vsetivli zero, 0, e8, m1, ta, ma
; RV64-NEXT: vmv2r.v v6, v8
; RV64-NEXT: vmv2r.v v8, v10
; RV64-NEXT: vmv2r.v v10, v6
diff --git a/llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll b/llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll
index 7d0b0118a72725..5bad25aab7aa29 100644
--- a/llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll
@@ -117,8 +117,8 @@ declare <vscale x 4 x bfloat> @llvm.vp.ceil.nxv4bf16(<vscale x 4 x bfloat>, <vsc
define <vscale x 4 x bfloat> @vp_ceil_vv_nxv4bf16(<vscale x 4 x bfloat> %va, <vscale x 4 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_ceil_vv_nxv4bf16:
; CHECK: # %bb.0:
-; CHECK-NEXT: vmv1r.v v9, v0
; CHECK-NEXT: vsetvli a1, zero, e16, m1, ta, ma
+; CHECK-NEXT: vmv1r.v v9, v0
; CHECK-NEXT: vfwcvtbf16.f.f.v v10, v8
; CHECK-NEXT: lui a1, 307200
; CHECK-NEXT: vsetvli zero, a0, e32, m2, ta, ma
@@ -169,8 +169,8 @@ declare <vscale x 8 x bfloat> @llvm.vp.ceil.nxv8bf16(<vscale x 8 x bfloat>, <vsc
define <vscale x 8 x bfloat> @vp_ceil_vv_nxv8bf16(<vscale x 8 x bfloat> %va, <vscale x 8 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_ceil_vv_nxv8bf16:
; CHECK: # %bb.0:
-; CHECK-NEXT: vmv1r.v v10, v0
; CHECK-NEXT: vsetvli a1, zero, e16, m2, ta, ma
+; CHECK-NEXT: vmv1r.v v10, v0
; CHECK-NEXT: vfwcvtbf16.f.f.v v12, v8
; CHECK-NEXT: lui a1, 307200
; CHECK-NEXT: vsetvli zero, a0, e32, m4, ta, ma
@@ -221,8 +221,8 @@ declare <vscale x 16 x bfloat> @llvm.vp.ceil.nxv16bf16(<vscale x 16 x bfloat>, <
define <vscale x 16 x bfloat> @vp_ceil_vv_nxv16bf16(<vscale x 16 x bfloat> %va, <vscale x 16 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_ceil_vv_nxv16bf16:
; CHECK: # %bb.0:
-; CHECK-NEXT: vmv1r.v v12, v0
; CHECK-NEXT: vsetvli a1, zero, e16, m4, ta, ma
+; CHECK-NEXT: vmv1r.v v12, v0
; CHECK-NEXT: vfwcvtbf16.f.f.v v16, v8
; CHECK-NEXT: lui a1, 307200
; CHECK-NEXT: vsetvli zero, a0, e32, m8, ta, ma
@@ -279,9 +279,9 @@ define <vscale x 32 x bfloat> @vp_ceil_vv_nxv32bf16(<vscale x 32 x bfloat> %va,
; CHECK-NEXT: slli a1, a1, 3
; CHECK-NEXT: sub sp, sp, a1
; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 8 * vlenb
+; CHECK-NEXT: vsetvli a1, zero, e16, m4, ta, ma
; CHECK-NEXT: vmv1r.v v7, v0
; CHECK-NEXT: csrr a2, vlenb
-; CHECK-NEXT: vsetvli a1, zero, e16, m4, ta, ma
; CHECK-NEXT: vfwcvtbf16.f.f.v v24, v12
; CHECK-NEXT: lui a3, 307200
; CHECK-NEXT: slli a1, a2, 1
@@ -582,8 +582,8 @@ define <vscale x 4 x half> @vp_ceil_vv_nxv4f16(<vscale x 4 x half> %va, <vscale
;
; ZVFHMIN-LABEL: vp_ceil_vv_nxv4f16:
; ZVFHMIN: # %bb.0:
-; ZVFHMIN-NEXT: vmv1r.v v9, v0
; ZVFHMIN-NEXT: vsetvli a1, zero, e16, m1, ta, ma
+; ZVFHMIN-NEXT: vmv1r.v v9, v0
; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v8
; ZVFHMIN-NEXT: lui a1, 307200
; ZVFHMIN-NEXT: vsetvli zero, a0, e32, m2, ta, ma
@@ -649,6 +649,7 @@ declare <vscale x 8 x half> @llvm.vp.ceil.nxv8f16(<vscale x 8 x half>, <vscale x
define <vscale x 8 x half> @vp_ceil_vv_nxv8f16(<vscale x 8 x half> %va, <vscale x 8 x i1> %m, i32 zeroext %evl) {
; ZVFH-LABEL: vp_ceil_vv_nxv8f16:
; ZVFH: # %bb.0:
+; ZVFH-NEXT: vsetivli zero, 0, e8, m1, ta, ma
; ZVFH-NEXT: vmv1r.v v10, v0
; ZVFH-NEXT: lui a1, %hi(.LCPI18_0)
; ZVFH-NEXT: flh fa5, %lo(.LCPI18_0)(a1)
@@ -668,8 +669,8 @@ define <vscale x 8 x half> @vp_ceil_vv_nxv8f16(<vscale x 8 x half> %va, <vscale
;
; ZVFHMIN-LABEL: vp_ceil_vv_nxv8f16:
; ZVFHMIN: # %bb.0:
-; ZVFHMIN-NEXT: vmv1r.v v10, v0
; ZVFHMIN-NEXT: vsetvli a1, zero, e16, m2, ta, ma
+; ZVFHMIN-NEXT: vmv1r.v v10, v0
; ZVFHMIN-NEXT: vfwcvt.f.f.v v12, v8
; ZVFHMIN-NEXT: lui a1, 307200
; ZVFHMIN-NEXT: vsetvli zero, a0, e32, m4, ta, ma
@@ -735,6 +736,7 @@ declare <vscale x 16 x half> @llvm.vp.ceil.nxv16f16(<vscale x 16 x half>, <vscal
define <vscale x 16 x half> @vp_ceil_vv_nxv16f16(<vscale x 16 x half> %va, <vscale x 16 x i1> %m, i32 zeroext %evl) {
; ZVFH-LABEL: vp_ceil_vv_nxv16f16:
; ZVFH: # %bb.0:
+; ZVFH-NEXT: vsetivli zero, 0, e8, m1, ta, ma
; ZVFH-NEXT: vmv1r.v v12, v0
; ZVFH-NEXT: lui a1, %hi(.LCPI20_0)
; ZVFH-NEXT: flh fa5, %lo(.LCPI20_0)(a1)
@@ -754,8 +756,8 @@ define <vscale x 16 x half> @vp_ceil_vv_nxv16f16(<vscale x 16 x half> %va, <vsca
;
; ZVFHMIN-LABEL: vp_ceil_vv_nxv16f16:
; ZVFHMIN: # %bb.0:
-; ZVFHMIN-NEXT: vmv1r.v v12, v0
; ZVFHMIN-NEXT: vsetvli a1, zero, e16, m4, ta, ma
+; ZVFHMIN-NEXT: vmv1r.v v12, v0
; ZVFHMIN-NEXT: vfwcvt.f.f.v v16, v8
; ZVFHMIN-NEXT: lui a1, 307200
; ZVFHMIN-NEXT: vsetvli zero, a0, e32, m8, ta, ma
@@ -821,6 +823,7 @@ declare <vscale x 32 x half> @llvm.vp.ceil.nxv32f16(<vscale x 32 x half>, <vscal
define <vscale x 32 x half> @vp_ceil_vv_nxv32f16(<vscale x 32 x half> %va, <vscale x 32 x i1> %m, i32 zeroext %evl) {
; ZVFH-LABEL: vp_ceil_vv_nxv32f16:
; ZVFH: # %bb.0:
+; ZVFH-NEXT: vsetivli zero, 0, e8, m1, ta, ma
; ZVFH-NEXT: vmv1r.v v16, v0
; ZVFH-NEXT: lui a1, %hi(.LCPI22_0)
; ZVFH-NEXT: flh fa5, %lo(.LCPI22_0)(a1)
@@ -846,9 +849,9 @@ define <vscale x 32 x half> @vp_ceil_vv_nxv32f16(<vscale x 32 x half> %va, <vsca
; ZVFHMIN-NEXT: slli a1, a1, 3
; ZVFHMIN-NEXT: sub sp, sp, a1
; ZVFHMIN-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 8 * vlenb
+; ZVFHMIN-NEXT: vsetvli a1, zero, e16, m4, ta, ma
; ZVFHMIN-NEXT: vmv1r.v v7, v0
; ZVFHMIN-NEXT: csrr a2, vlenb
-; ZVFHMIN-NEXT: vsetvli a1, zero, e16, m4, ta, ma
; ZVFHMIN-NEXT: vfwcvt.f.f.v v24, v12
; ZVFHMIN-NEXT: lui a3, 307200
; ZVFHMIN-NEXT: slli a1, a2, 1
@@ -1068,6 +1071,7 @@ declare <vscale x 4 x float> @llvm.vp.ceil.nxv4f32(<vscale x 4 x float>, <vscale
define <vscale x 4 x float> @vp_ceil_vv_nxv4f32(<vscale x 4 x float> %va, <vscale x 4 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_ceil_vv_nxv4f32:
; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 0, e8, m1, ta, ma
; CHECK-NEXT: vmv1r.v v10, v0
; CHECK-NEXT: vsetvli zero, a0, e32, m2, ta, ma
; CHECK-NEXT: vfabs.v v12, v8, v0.t
@@ -1112,6 +1116,7 @@ declare <vscale x 8 x float> @llvm.vp.ceil.nxv8f32(<vscale x 8 x float>, <vscale
define <vscale x 8 x float> @vp_ceil_vv_nxv8f32(<vscale x 8 x float> %va, <vscale x 8 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_ceil_vv_nxv8f32:
; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 0, e8, m1, ta, ma
; CHECK-NEXT: vmv1r.v v12, v0
; CHECK-NEXT: vsetvli zero, a0, e32, m4, ta, ma
; CHECK-NEXT: vfabs.v v16, v8, v0.t
@@ -1156,6 +1161,7 @@ declare <vscale x 16 x float> @llvm.vp.ceil.nxv16f32(<vscale x 16 x float>, <vsc
define <vscale x 16 x float> @vp_ceil_vv_nxv16f32(<vscale x 16 x float> %va, <vscale x 16 x i1...
[truncated]
In order to coalesce a vsetvli with a register AVL into a previous vsetvli, we need to make sure that the AVL register is reachable at the previous vsetvli. Back in pre-RA vsetvli insertion we just checked to see if the two virtual registers were the same virtual register, and then this was hacked around in the move to post-RA. We can instead use live intervals to check that the reaching definition is the same at both instructions. On its own this doesn't have much of an impact, but helps a lot in llvm#118283 and enables coalescing in about 60 of the test cases from that PR.
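A minimal sketch of the kind of live-interval query that change describes, assuming a pre-RA setting with virtual registers; the helper name and exact query points are illustrative, not the actual code from #118285:

#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

// Return true if both instructions observe the same value number of Reg,
// i.e. the same reaching definition, according to Reg's live interval.
static bool haveSameReachingDef(Register Reg, const MachineInstr &A,
                                const MachineInstr &B,
                                const LiveIntervals &LIS) {
  if (!Reg.isVirtual() || !LIS.hasInterval(Reg))
    return false;
  const LiveInterval &LI = LIS.getInterval(Reg);
  // Look up the value that is live when each instruction reads Reg.
  const VNInfo *VNA =
      LI.getVNInfoBefore(LIS.getInstructionIndex(A).getRegSlot());
  const VNInfo *VNB =
      LI.getVNInfoBefore(LIS.getInstructionIndex(B).getRegSlot());
  return VNA && VNA == VNB;
}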
; CHECK-NEXT: vsetivli zero, 0, e8, m1, ta, ma
; CHECK-NEXT: vmv1r.v v9, v0
; CHECK-NEXT: vmv1r.v v0, v8
; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, ma |
This is a case where we can't coalesce because the AVL is a register, which #118285 would fix
I think this way is neater.
Register DstReg = MI.getOperand(0).getReg();
Register SrcReg = MI.getOperand(1).getReg();
for (const auto &RegClass : RVVRegClasses) {
if (RegClass->contains(DstReg, SrcReg)) { |
See my previous comment? #117866 (comment)
Oh whoops, added now. That previous code looks like it was copied from RISCVInstrInfo::copyPhysReg, I wonder if we could reuse it there later.
I created #118435 for that.
A heads up, I'm currently doing a run of SPEC with this. Will report back when it's done.
// Use an arbitrary but valid AVL and VTYPE so vill will be cleared. It may
// be coalesced into another vsetvli since we won't demand any fields.
VSETVLIInfo NewInfo; // Need a new VSETVLIInfo to clear SEWLMULRatioOnly
NewInfo.setAVLImm(0); |
We might also want to use 1 as the AVL. I have a feeling some microarchitectures might special-case vl=0; can anyone confirm?
Setting it to 1 actually matches what we do for vmv.x.s/undef VLs, and also allows us to coalesce it into vmv.s.x so I've gone ahead and switched it:
@@ -16750,10 +15950,8 @@ define <512 x i8> @test_expandload_v512i8_vlen512(ptr %base, <512 x i1> %mask, <
; CHECK-RV64-NEXT: j .LBB61_233
; CHECK-RV64-NEXT: .LBB61_747: # %cond.load901
; CHECK-RV64-NEXT: lbu a2, 0(a0)
-; CHECK-RV64-NEXT: li a3, 512
-; CHECK-RV64-NEXT: vsetivli zero, 0, e8, m1, ta, ma
+; CHECK-RV64-NEXT: vsetivli zero, 1, e8, m1, ta, ma
; CHECK-RV64-NEXT: vmv8r.v v16, v8
-; CHECK-RV64-NEXT: vsetvli zero, a3, e8, m1, ta, ma
; CHECK-RV64-NEXT: vmv.s.x v12, a2
; CHECK-RV64-NEXT: li a2, 227
; CHECK-RV64-NEXT: li a3, 226
Minor comments. This approach looks entirely reasonable, will be happy to LGTM if the perf results look good.
Note that I remain concerned that something after VSETVLI insertion will move/insert a copy. This probably covers most of the cases, but I'm not convinced this completely closes the correctness hole.
// be coalesced into another vsetvli since we won't demand any fields.
VSETVLIInfo NewInfo; // Need a new VSETVLIInfo to clear SEWLMULRatioOnly
NewInfo.setAVLImm(1);
NewInfo.setVTYPE(RISCVII::VLMUL::LMUL_1, 8, true, true); |
Can you add comment labels for the 8 and true values here.
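Something along these lines would do it (a sketch only; the parameter names below come from the usual VSETVLIInfo::setVTYPE signature and may not match the final revision exactly):

// Label the magic constants so readers don't have to look up the signature.
NewInfo.setAVLImm(1);
NewInfo.setVTYPE(RISCVII::VLMUL::LMUL_1, /*SEW=*/8,
                 /*TailAgnostic=*/true, /*MaskAgnostic=*/true);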
Do we need to add
I think so. I was also thinking that we probably need to somehow add a use of VTYPE to the COPY instructions too. I don't think anything is preventing them from being shuffled about at the moment. Maybe we need to eagerly lower the copies in RISCVInsertVSETVLI? We could do it in this PR or in a separate one. EDIT: Turns out you can just add implicit uses onto COPYs and things seem to be OK. I'll add the uses in this PR
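A sketch of what adding that implicit use could look like; the helper name is hypothetical and the final patch may wire this up differently:

#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"

using namespace llvm;

// Mark a lowered whole-register COPY as reading VTYPE so later passes cannot
// hoist or sink it across a vsetvli. RISCV::VTYPE comes from the backend's
// generated register definitions.
static void addImplicitVTypeUse(MachineInstr &Copy, MachineFunction &MF) {
  Copy.addOperand(MF, MachineOperand::CreateReg(RISCV::VTYPE, /*isDef=*/false,
                                                /*isImp=*/true));
}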
The whole register move instructions use
Isn't the
Oh whoops, I completely missed that. I'll open up another PR to fix that.
This was pointed out in llvm#118283 (comment). We cannot move these between vtype definitions as they depend on SEW and require vill to be clear.
The run on SPEC finished on the spacemit-x60; there were no significant changes. This was with -mcpu=spacemit-x60 -O3 -flto on the train dataset. 523.xalancbmk_r is a bit noisy and I didn't see any significant vector-related codegen changes in the diff of the binary.
LGTM w/minor comments only.
I am convinced by the performance and insertion count data provided.
@@ -1426,6 +1468,15 @@ void RISCVInsertVSETVLI::emitVSETVLIs(MachineBasicBlock &MBB) {
PrefixTransparent = false;
}

if (isVectorCopy(ST->getRegisterInfo(), MI)) {
if (!PrevInfo.isCompatible(DemandedFields::all(), CurInfo, LIS)) { |
As a follow-up, we can reduce the number of vsetvlis emitted by integrating this with the needVSETVLIPHI logic below. Definitely non-blocking.
// However it does need valid SEW, i.e. vill must be cleared. The entry to a
// function, calls and inline assembly may all set it, so make sure we clear
// it for whole register copies. Do this by leaving VILL demanded.
if (isVectorCopy(ST->getRegisterInfo(), MI)) { |
Can you add a cl::opt for testing? I think simply having it conditional here would effectively disable this change, and having a quick way to rule this change in/out might be useful going forward.
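For illustration, a gating flag could look roughly like this; the option name and default are assumptions rather than the final spelling in the patch:

#include "llvm/Support/CommandLine.h"

using namespace llvm;

// Hypothetical escape hatch so the new behaviour can be toggled when triaging.
static cl::opt<bool> EnsureWholeVectorRegisterMoveValidVType(
    "riscv-ensure-whole-vector-register-move-valid-vtype", cl::Hidden,
    cl::init(true),
    cl::desc("Insert a vsetvli before whole vector register moves to clear "
             "vill"));

// ...then in getDemanded():
//   if (EnsureWholeVectorRegisterMoveValidVType &&
//       isVectorCopy(ST->getRegisterInfo(), MI))
//     Res.VILL = true;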
This is an alternative to #117866 that works by demanding a valid vtype instead of using a separate pass.
The main advantage of this is that it allows coalesceVSETVLIs to just reuse an existing vsetvli later in the block.
To do this we need to first transfer the vsetvli info to some arbitrary valid state in transferBefore when we encounter a vector copy. Then we add a new vill demanded field that will happily accept any other known vtype, which allows us to coalesce these where possible.
Note we also need to check for vector copies in computeVLVTYPEChanges, otherwise the pass will completely skip over functions that only have vector copies and nothing else.
This is one part of a fix for #114518. We still need to check whether there are other cases where vector copies/whole register moves are inserted after vsetvli insertion.
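For readers who skipped the truncated diff above, a condensed sketch of the three hooks this description refers to, paraphrased from the patch hunks earlier on this page (an early revision; later revisions rename isVecCopy to isVectorCopy and use an AVL of 1 rather than 0):

// (1) getDemanded(): a whole-register copy only demands that vill is clear.
if (isVecCopy(MI))
  Res.VILL = true;

// (2) transferBefore(): if the incoming state is unknown or invalid, pick an
// arbitrary but legal AVL/VTYPE so vill gets cleared; since no other field is
// demanded, the inserted vsetvli can later be coalesced away.
if (isVecCopy(MI) &&
    (Info.isUnknown() || !Info.isValid() || Info.hasSEWLMULRatioOnly())) {
  VSETVLIInfo NewInfo; // fresh info to clear SEWLMULRatioOnly
  NewInfo.setAVLImm(1);
  NewInfo.setVTYPE(RISCVII::VLMUL::LMUL_1, /*SEW=*/8, /*TailAgnostic=*/true,
                   /*MaskAgnostic=*/true);
  Info = NewInfo;
  return;
}

// (3) computeVLVTYPEChanges(): treat copies as vector ops so functions that
// contain only vector copies are not skipped.
if (isVectorConfigInstr(MI) || RISCVII::hasSEWOp(MI.getDesc().TSFlags) ||
    isVecCopy(MI))
  HadVectorOp = true;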