Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LoopUnroll] Ignore inlinable calls if unrolling is forced #88149

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

gburgessiv
Copy link
Member

@gburgessiv gburgessiv commented Apr 9, 2024

NumInlineCandidates counts candidates that are very likely to be inlined. This is a useful metric, but causes linker warnings if:

  • the loop to be unrolled has had unrolling forced by the user, and
  • the inliner fails to inline the call (e.g., because it's considered a very cold callsite)

Fixes #88141

@llvmbot
Copy link
Member

llvmbot commented Apr 9, 2024

@llvm/pr-subscribers-llvm-transforms

Author: George Burgess IV (gburgessiv)

Changes

NumInlineCandidates counts candidates that are very likely to be inlined. This is a useful metric, but causes linker warnings if:

  • the loop to be unrolled has had unrolling forced by the user, and
  • the inliner fails to inline the call (e.g., because it's considered a very cold callsite)

Full diff: https://github.com/llvm/llvm-project/pull/88149.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp (+8-3)
  • (added) llvm/test/Transforms/LoopUnroll/unroll-calls-if-forced.ll (+35)
diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
index 75fb8765061edf..a026abc3267ca1 100644
--- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
@@ -1154,9 +1154,11 @@ tryToUnrollLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, ScalarEvolution &SE,
   // automatic unrolling from interfering with the user requested
   // transformation.
   Loop *ParentL = L->getParentLoop();
+  const bool UnrollIsForcedByUser =
+      hasUnrollTransformation(L) == TM_ForcedByUser;
   if (ParentL != nullptr &&
       hasUnrollAndJamTransformation(ParentL) == TM_ForcedByUser &&
-      hasUnrollTransformation(L) != TM_ForcedByUser) {
+      !UnrollIsForcedByUser) {
     LLVM_DEBUG(dbgs() << "Not unrolling loop since parent loop has"
                       << " llvm.loop.unroll_and_jam.\n");
     return LoopUnrollResult::Unmodified;
@@ -1166,7 +1168,7 @@ tryToUnrollLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, ScalarEvolution &SE,
   // loop has an explicit unroll-and-jam pragma. This is to prevent automatic
   // unrolling from interfering with the user requested transformation.
   if (hasUnrollAndJamTransformation(L) == TM_ForcedByUser &&
-      hasUnrollTransformation(L) != TM_ForcedByUser) {
+      !UnrollIsForcedByUser) {
     LLVM_DEBUG(
         dbgs()
         << "  Not unrolling loop since it has llvm.loop.unroll_and_jam.\n");
@@ -1216,7 +1218,10 @@ tryToUnrollLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, ScalarEvolution &SE,
   if (OptForSize)
     UP.Threshold = std::max(UP.Threshold, LoopSize + 1);
 
-  if (UCE.NumInlineCandidates != 0) {
+  // Ignore the potential for inlining if unrolling is forced by the user. It's
+  // only _very likely_ that `NumInlineCandidates` functions will be inlined;
+  // things like profile data can make this significantly less likely.
+  if (!UnrollIsForcedByUser && UCE.NumInlineCandidates != 0) {
     LLVM_DEBUG(dbgs() << "  Not unrolling loop with inlinable calls.\n");
     return LoopUnrollResult::Unmodified;
   }
diff --git a/llvm/test/Transforms/LoopUnroll/unroll-calls-if-forced.ll b/llvm/test/Transforms/LoopUnroll/unroll-calls-if-forced.ll
new file mode 100644
index 00000000000000..37548920358147
--- /dev/null
+++ b/llvm/test/Transforms/LoopUnroll/unroll-calls-if-forced.ll
@@ -0,0 +1,35 @@
+; RUN: opt -passes=loop-unroll -disable-verify -S < %s | FileCheck %s
+;
+; Ensure that loop unrolling occurs if the user forces it, even if there are
+; calls that may be inlined.
+
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+
+define internal void @foo() {
+  ret void
+}
+
+; CHECK-LABEL: @forced(
+; CHECK: call {{.*}}void @foo
+; CHECK: call {{.*}}void @foo
+define void @forced(ptr nocapture %a) {
+entry:
+  br label %for.body
+
+for.body:
+  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds i32, ptr %a, i64 %indvars.iv
+  call void @foo()
+  %0 = load i32, ptr %arrayidx, align 4
+  %inc = add nsw i32 %0, 1
+  store i32 %inc, ptr %arrayidx, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond = icmp eq i64 %indvars.iv.next, 64
+  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
+
+for.end:
+  ret void
+}
+
+!0 = distinct !{!0, !{!"llvm.loop.unroll.enable"},
+                    !{!"llvm.loop.unroll.count", i32 2}}

@gburgessiv gburgessiv requested a review from nikic April 10, 2024 14:47
@gburgessiv
Copy link
Member Author

Hey Nikita, I see you've done a few reviews in this area. Would you kindly be able to take a look at this CL?

@fhahn
Copy link
Contributor

fhahn commented Apr 10, 2024

This is a useful metric, but causes linker warnings if:

From the description, it's not clear what kind of warnings this would yield. Looking at the linked issue, it's remarks that get printed by default (loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering).

Would be good to clarify in the description.

Regardless of the patch at hand, I don't think we really should print those messages by default; there are other reasons why the compiler won't unroll a loop even if forced by the user.

Could you also add a link to #88141 in the description, not in the title as (#88141)? This could be confusing as Github will add the PR ID in a similar fashion when landing.

@@ -0,0 +1,35 @@
; RUN: opt -passes=loop-unroll -disable-verify -S < %s | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this -disable-verify?

Please also use UTC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this -disable-verify?

Copy-paste from another test; I removed this.

Please also use UTC.

I assume for commit timestamps? Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UTC as in llvm/utils/update_test_checks.py to automatically generate CHECK lines

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes more sense. Thanks

@nikic
Copy link
Contributor

nikic commented Apr 15, 2024

I think the change makes sense conceptually -- pragma unroll should override pretty much all inlining heuristics, and I don't see a strong reason why it wouldn't override this one as well.

A possible caveat is the interaction with the pre-link pipeline: Are we okay with unrolling the calls there, or would it be better to wait until post-link (in which case the call has likely been inlined)?

@aeubanks
Copy link
Contributor

aeubanks commented May 9, 2024

reverse ping, Chromium is seeing something similar that may be fixed by this

@gburgessiv
Copy link
Member Author

Ah, sorry for the delay here.

A possible caveat is the interaction with the pre-link pipeline: Are we okay with unrolling the calls there, or would it be better to wait until post-link (in which case the call has likely been inlined)?

I added a commit that tries to funnel PrepareForLTO through properly. I'm not confident it caught everything, but it should be a step toward addressing this, at least. I agree pushing this out of prelink is good, since the prelink pass manager seems to favor transformations that reduce IR size.

Copy link
Contributor

@aeubanks aeubanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the PrepareForLTO portion needs testing. you'll need to add parsing code for something like -passes='loop-unroll-full<prepare-for-lto>', grep for parseLoopRotateOptions for an example

also, isn't this still going to result in the remark getting printed in the pre-link optimization pipeline? I think we should treat the PrepareForLTO and the call heuristic as two separate issues and go back to the original PR, splitting out the PrepareForLTO change

please move the issue being fixed from the commit title to the commit message like Fixes #88141, otherwise it's confusing with the PR number

LPM2.addPass(LoopFullUnrollPass(
Level.getSpeedupLevel(),
/* OnlyWhenForced= */ !PTO.LoopUnrolling, PTO.ForgetAllSCEVInLoopUnroll,
/* PrepareForLTO= */ isLTOPreLink(ThinOrFullLTOPhase::ThinLTOPreLink)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isLTOPreLink(Phase), ditto below

@gburgessiv gburgessiv changed the title [LoopUnroll] Ignore inlinable calls if unrolling is forced (#88141) [LoopUnroll] Ignore inlinable calls if unrolling is forced May 20, 2024
@gburgessiv gburgessiv force-pushed the bug-88141 branch 2 times, most recently from 18de714 to e7eb490 Compare May 20, 2024 17:32
@gburgessiv
Copy link
Member Author

also, isn't this still going to result in the remark getting printed in the pre-link optimization pipeline?

I only observed the remarks during LTO, so I'd assumed that the prelink pipeline has a way of bypassing these checks (which passes my smell test, at least, since it's reasonable to expect opts to be in an unfinished state after pre-link).

I think we should treat the PrepareForLTO and the call heuristic as two separate issues and go back to the original PR, splitting out the PrepareForLTO change

Yeah, fixing the bug first SGTM.

please move the issue being fixed from the commit title to the commit message like Fixes #88141, otherwise it's confusing with the PR number

Done

@LeszekSwirski
Copy link

Is there any progress here? We've had to disable this annotation in V8 for now because of this issue (crbug.com/377144577)

@gburgessiv
Copy link
Member Author

I agree we should ideally push this through; let me rebase and check tests, and I'll go through this thread to see if AIs are blocked on me.

`NumInlineCandidates` counts candidates that are _very likely_ to be
inlined. This is a useful metric, but causes linker warnings if:
- the loop to be unrolled has had unrolling forced by the user, and
- the inliner fails to inline the call (e.g., because it's considered a
  very cold callsite)

Fixes llvm#88141
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LoopUnroll fails to unroll forced loop due to inlinable calls, leading to linker errors
6 participants