Loop Unrolling is not Enabled in Release Build #41063
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
I am not sure why you are looking at the IL here, though: it is produced by Roslyn, which is not an optimizing compiler. The JIT output is perhaps more interesting, in that it appears to contain two (?) loops:
Sample.SumOfFirstThreeElements(Int32[])
L0000: sub rsp, 0x28
L0004: xor eax, eax
L0006: xor edx, edx
L0008: test rcx, rcx
L000b: je short L0024
L000d: cmp dword ptr [rcx+8], 3
L0011: jl short L0024
L0013: movsxd r8, edx
L0016: add eax, [rcx+r8*4+0x10]
L001b: inc edx
L001d: cmp edx, 3
L0020: jl short L0013
L0022: jmp short L0038
L0024: cmp edx, [rcx+8]
L0027: jae short L003d
L0029: movsxd r8, edx
L002c: add eax, [rcx+r8*4+0x10]
L0031: inc edx
L0033: cmp edx, 3
L0036: jl short L0024
L0038: add rsp, 0x28
L003c: ret
L003d: call 0x00007ffe9888fc00
L0042: int3
I also confirmed that this is still the codegen on a recent-ish (month-old) version of the runtime. |
@SingleAccretion Good point. I have now looked at the JIT output too. GCC 9.2 produces assembly output like this:
I hope that in .NET we can produce a similar output to this with bounds checking. |
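For reference, the shape of code being asked for can be written by hand. The following is only a sketch, not JIT output: the class and method names are hypothetical, and whether the JIT then drops the three per-element checks after the explicit length test is an assumption that should be verified in a disassembly.

```csharp
using System;

public static class UnrollSketch
{
    // Hypothetical hand-unrolled equivalent of the three-iteration loop.
    // A single up-front length test stands in for the per-iteration bounds
    // checks; a null array still fails with NullReferenceException on
    // arr.Length, as it would on the first element access in the loop.
    public static int SumOfFirstThreeElementsUnrolled(int[] arr)
    {
        if (arr.Length < 3)
            throw new IndexOutOfRangeException();

        // Straight-line code: no induction variable and no backward jump.
        return arr[0] + arr[1] + arr[2];
    }
}
```

Because the partial sum is a local that is never observed when the exception is thrown, this version behaves the same as the loop for any caller.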
I am more or less sure we can try to bend the JIT to do our bidding in this particular case. E.g., this version produces better ASM:
int sum = 0;
for (int i = 0; i <= 2; i++)
sum += arr[i];
return sum;
G_M22200_IG01:
sub rsp, 40
;; bbWeight=1 PerfScore 0.25
G_M22200_IG02:
xor eax, eax
xor edx, edx
mov r8d, dword ptr [rcx+8]
;; bbWeight=1 PerfScore 2.50
G_M22200_IG03:
cmp edx, r8d
jae SHORT G_M22200_IG05
movsxd r9, edx
add eax, dword ptr [rcx+4*r9+16]
inc edx
cmp edx, 2
jle SHORT G_M22200_IG03
;; bbWeight=4 PerfScore 20.00
G_M22200_IG04:
add rsp, 40
ret
;; bbWeight=1 PerfScore 1.25
G_M22200_IG05:
call CORINFO_HELP_RNGCHKFAIL
int3 |
Interesting. But most of the code I've seen in loops compares with
I tried the unsafe version of this code, and it is not unrolled either.
x86 JIT Assembly.
|
Thanks for raising this issue. You are observing a number of quirks in the jit's loop optimization strategy. We hope to address some of these in the not-too-distant future.
Probably so. The jit can unroll loops, but it's not clear whether the criteria it uses are well tuned. See e.g. #4248, #8107.
This is a result of loop cloning -- because of the potential for exceptions the jit needs to produce a version of the loop that bounds checks each iteration. But an up-front test can determine that such a bounds check will always pass. So the jit generates a second loop with no bounds checking, and runs one version of the loop or the other, depending.
As with unrolling, the cloning heuristics are not well tuned. In this three-element case, I would expect cloning to kick in and the version without bounds checks to then get unrolled, but no... see #4929, #8558.
The jit's loop analysis doesn't understand unsafe accesses very well, so unsafe codegen won't trigger some of the optimizations that one sees with normal array accesses. |
Very nice trick! So, as far as I understand, it looks like this:
public int SumOfFirstThreeElements(int[] arr)
{
int sum = 0;
for (int i = 0; i < 3; i++)
sum += arr[i]; // bounds check each iteration
return sum;
}
// is optimized to:
public int SumOfFirstThreeElements(int[] arr)
{
int sum = 0;
if (arr.Length < 3)
goto unsafe_loop;
for (int i = 0; i < 3; i++)
sum += arr[i]; // no bounds check
return sum;
unsafe_loop:
for (int i = 0; i < 3; i++)
sum += arr[i]; // bounds check each iteration
return sum;
}
But does it make sense to do this for 3 iterations (I mean, for any small constant)?
public int SumOfFirstThreeElements(int[] arr)
{
int sum = 0;
int i = 0;
for (; i < Math.Min(3, arr.Length); i++)
sum += arr[i]; // no bounds check
for (; i < 3; i++)
sum += arr[i]; // unsafe area, bounds checks
return sum;
}
(inspired by https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp#L24-L41) |
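The split-loop idea above can be checked against the plain loop with a small runnable sketch. The class name, method names, and the `count` parameter are hypothetical, introduced only for illustration; this models the transformation at source level and does not claim to be what the JIT or LLVM emits.

```csharp
using System;

public static class SplitLoopSketch
{
    // Reference version: every access is bounds checked.
    public static int SumChecked(int[] arr, int count)
    {
        int sum = 0;
        for (int i = 0; i < count; i++)
            sum += arr[i];
        return sum;
    }

    // Split version: the first loop only visits indices known to be in
    // range, so its checks are redundant; the second loop covers whatever
    // is left, where the first access is out of range and throws.
    public static int SumSplit(int[] arr, int count)
    {
        int sum = 0;
        int i = 0;
        int safeEnd = Math.Min(count, arr.Length);
        for (; i < safeEnd; i++)
            sum += arr[i]; // provably in range
        for (; i < count; i++)
            sum += arr[i]; // out of range: throws on the first access
        return sum;
    }
}
```

When the array is long enough, the second loop runs zero times and both methods return the same sum; when it is too short, both throw IndexOutOfRangeException.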
I have seen quite a few examples where I believe the loop cloner's profitability heuristic is suspect. #8558 is the master issue for this. |
Given that we have several existing issues covering loop unrolling and cloning, I'm going to close this one. |
I'm trying the following code in SharpLab. In GCC this loop is unrolled and there are no jump statements in generated assembly.
As we can see, the generated code is a loop with jump instructions. I think performance would be better if the loop were unrolled.
SharpLab link.
Thanks.