
[AArch64] Peephole optimization to remove redundant csel instructions #101483

Merged: 2 commits merged into llvm:main on Aug 5, 2024

Conversation

citymarina (Contributor)
Given a sequence such as

%8:gpr64 = COPY $xzr
%10:gpr64 = COPY $xzr
%11:gpr64 = CSELXr %8:gpr64, %10:gpr64, 0, implicit $nzcv

PeepholeOptimizer::foldRedundantCopy led to the creation of select instructions where both inputs were the same register:

%11:gpr64 = CSELXr %8:gpr64, %8:gpr64, 0, implicit $nzcv

This change adds a later peephole optimization that replaces such selects with unconditional moves.

llvmbot (Member) commented Aug 1, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Marina (citymarina)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/101483.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp (+28)
  • (modified) llvm/test/CodeGen/AArch64/peephole-csel.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/peephole-csel.mir (+2-2)
diff --git a/llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp b/llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
index f61de8ff1a4a6..5c5a9df82d7b5 100644
--- a/llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
+++ b/llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
@@ -61,6 +61,9 @@
 //   %6:fpr128 = IMPLICIT_DEF
 //   %7:fpr128 = INSERT_SUBREG %6:fpr128(tied-def 0), killed %1:fpr64, %subreg.dsub
 //
+// 8. Remove redundant CSELs that select between identical registers, by
+//    replacing them with unconditional moves.
+//
 //===----------------------------------------------------------------------===//
 
 #include "AArch64ExpandImm.h"
@@ -124,6 +127,7 @@ struct AArch64MIPeepholeOpt : public MachineFunctionPass {
   template <typename T>
   bool visitAND(unsigned Opc, MachineInstr &MI);
   bool visitORR(MachineInstr &MI);
+  bool visitCSEL(MachineInstr &MI);
   bool visitINSERT(MachineInstr &MI);
   bool visitINSviGPR(MachineInstr &MI, unsigned Opc);
   bool visitINSvi64lane(MachineInstr &MI);
@@ -283,6 +287,26 @@ bool AArch64MIPeepholeOpt::visitORR(MachineInstr &MI) {
   return true;
 }
 
+bool AArch64MIPeepholeOpt::visitCSEL(MachineInstr &MI) {
+  // Replace CSEL with MOV when both inputs are the same register.
+  if (MI.getOperand(1).getReg() != MI.getOperand(2).getReg())
+    return false;
+
+  auto ZeroReg =
+      MI.getOpcode() == AArch64::CSELXr ? AArch64::XZR : AArch64::WZR;
+  auto OrOpcode =
+      MI.getOpcode() == AArch64::CSELXr ? AArch64::ORRXrs : AArch64::ORRWrs;
+
+  BuildMI(*MI.getParent(), MI, MI.getDebugLoc(), TII->get(OrOpcode))
+      .addReg(MI.getOperand(0).getReg(), RegState::Define)
+      .addReg(ZeroReg)
+      .addReg(MI.getOperand(1).getReg())
+      .addImm(0);
+
+  MI.eraseFromParent();
+  return true;
+}
+
 bool AArch64MIPeepholeOpt::visitINSERT(MachineInstr &MI) {
   // Check this INSERT_SUBREG comes from below zero-extend pattern.
   //
@@ -788,6 +812,10 @@ bool AArch64MIPeepholeOpt::runOnMachineFunction(MachineFunction &MF) {
             visitADDSSUBS<uint64_t>({AArch64::SUBXri, AArch64::SUBSXri},
                                     {AArch64::ADDXri, AArch64::ADDSXri}, MI);
         break;
+      case AArch64::CSELWr:
+      case AArch64::CSELXr:
+        Changed |= visitCSEL(MI);
+        break;
       case AArch64::INSvi64gpr:
         Changed |= visitINSviGPR(MI, AArch64::INSvi64lane);
         break;
diff --git a/llvm/test/CodeGen/AArch64/peephole-csel.ll b/llvm/test/CodeGen/AArch64/peephole-csel.ll
index 3f92943b11eb1..868b9f1f2f6ac 100644
--- a/llvm/test/CodeGen/AArch64/peephole-csel.ll
+++ b/llvm/test/CodeGen/AArch64/peephole-csel.ll
@@ -6,7 +6,7 @@ define void @peephole_csel(ptr %dst, i1 %0, i1 %cmp) {
 ; CHECK:       // %bb.0: // %entry
 ; CHECK-NEXT:    tst w2, #0x1
 ; CHECK-NEXT:    mov w8, #1 // =0x1
-; CHECK-NEXT:    csel x9, xzr, xzr, eq
+; CHECK-NEXT:    mov x9, xzr
 ; CHECK-NEXT:    tst w1, #0x1
 ; CHECK-NEXT:    csel x8, x8, x9, eq
 ; CHECK-NEXT:    str x8, [x0]
diff --git a/llvm/test/CodeGen/AArch64/peephole-csel.mir b/llvm/test/CodeGen/AArch64/peephole-csel.mir
index 5077441a33788..d424dc05c801c 100644
--- a/llvm/test/CodeGen/AArch64/peephole-csel.mir
+++ b/llvm/test/CodeGen/AArch64/peephole-csel.mir
@@ -19,7 +19,7 @@ body:             |
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr64 = COPY $x1
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr64 = COPY $x0
     ; CHECK-NEXT: $xzr = ANDSXri [[COPY]], 0, implicit-def $nzcv
-    ; CHECK-NEXT: [[CSELXr:%[0-9]+]]:gpr64 = CSELXr [[COPY1]], [[COPY1]], 0, implicit $nzcv
+    ; CHECK-NEXT: [[ORRXrs:%[0-9]+]]:gpr64 = ORRXrs $xzr, [[COPY1]], 0
     ; CHECK-NEXT: RET_ReallyLR
     %3:gpr64 = COPY $x1
     %4:gpr64 = COPY $x0
@@ -46,7 +46,7 @@ body:             |
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32 = COPY $w1
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32 = COPY $w0
     ; CHECK-NEXT: $wzr = ANDSWri [[COPY]], 0, implicit-def $nzcv
-    ; CHECK-NEXT: [[CSELWr:%[0-9]+]]:gpr32 = CSELWr [[COPY1]], [[COPY1]], 0, implicit $nzcv
+    ; CHECK-NEXT: [[ORRWrs:%[0-9]+]]:gpr32 = ORRWrs $wzr, [[COPY1]], 0
     ; CHECK-NEXT: RET_ReallyLR
     %3:gpr32 = COPY $w1
     %4:gpr32 = COPY $w0

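The updated test expectations above rely on `mov` being an alias for `orr` with the zero register on AArch64. A quick model of the semantics (a sketch of the instruction behavior, not tied to any real tooling):

```python
def csel(cond: bool, a: int, b: int) -> int:
    """CSEL: select a if the condition holds, else b."""
    return a if cond else b

def orr_zero(src: int) -> int:
    """ORR dst, zr, src: bitwise OR with the zero register, i.e. a move."""
    return 0 | src

# When both CSEL inputs are the same value, the condition no longer
# matters, so an unconditional move produces the identical result.
for cond in (True, False):
    assert csel(cond, 42, 42) == orr_zero(42) == 42
```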

github-actions bot commented Aug 1, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

citymarina (Contributor, Author)

The clang-format issue is with a pre-existing comment, and the suggested fix makes the code worse. Should I just mark that whole comment block (or perhaps just paragraph 7?) as clang-format off?

@tschuett tschuett requested a review from davemgreen August 1, 2024 14:00
davemgreen (Collaborator) left a comment

The patch LGTM, with a question related to constraining the regclass.

> The clang-format issue is with a pre-existing comment, and the suggested fix makes the code worse. Should I just mark that whole comment block (or perhaps just paragraph 7?) as clang-format off?

Can you rebase over c7c5e05? That should hopefully be enough to fix the formatting.

      .addReg(MI.getOperand(0).getReg(), RegState::Define)
      .addReg(ZeroReg)
      .addReg(MI.getOperand(1).getReg())
      .addImm(0);
davemgreen (Collaborator):
I never know if we need it or not, but should this do a MRI->constrainRegClass on the Regs?

citymarina (Contributor, Author):
As far as I understand, the ORR should be ok with the same sets of registers as the CSEL.

davemgreen (Collaborator) left a comment
OK, thanks. LGTM then.

Are you happy for me to hit submit?

citymarina (Contributor, Author)
That would be great - thank you!

@davemgreen davemgreen merged commit 29763aa into llvm:main Aug 5, 2024
7 checks passed
banach-space pushed a commit to banach-space/llvm-project that referenced this pull request Aug 7, 2024
citymarina added a commit to swiftlang/llvm-project that referenced this pull request Aug 14, 2024
kstoimenov pushed a commit to kstoimenov/llvm-project that referenced this pull request Aug 15, 2024
@citymarina citymarina deleted the csel2 branch August 30, 2024 12:29