-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Math.Abs is slow #24626
Comments
This is probably because public static int Abs(int value)
{
return (value >= 0) ? value : AbsHelper(value);
}
private static int AbsHelper(int value)
{
Debug.Assert(value < 0, "AbsHelper should only be called for negative values! (workaround for JIT inlining)");
if (value == int.MinValue)
{
throw new OverflowException(SR.Overflow_NegateTwosCompNum);
}
return -value;
} Based on the assert, I would guess that having it all in one part would either prevent inlining or result in too much code bloat when it is inlined (possibly due to the thrown exception) |
|
Yeah. Something like: public static int Abs(int value)
{
if (value >= 0)
{
return value;
}
else if (value == int.MinValue)
{
AbsHelper();
}
return -value;
}
private static void AbsHelper()
{
throw new OverflowException(SR.Overflow_NegateTwosCompNum);
} |
@AndyAyersMS is it still necessary to move "throw" out of a method to make it inlinable? Is it simply that it is slightly less IL to call a method rather than load the string, construct the exception and throw it? |
The main reason to move the |
We should probably change static int Abs(int value)
{
if (value < 0)
{
value = -value;
if (value < 0)
{
AbsHelper();
}
}
return value;
} This produces the smallest amount of code with the current JIT and there's some room for improvement as well. The current implementation is curious. Someone automagically decided that negative numbers are uncommon and thus only the positive case should be handled efficiently 😕 |
The general guidance is to separate out throws into helper methods that do the work of creating the exception object and any related data (eg formatted exception messages) and then unconditionally throw. The jit's inline performance heuristic will then block inlining of the helper. This has a number of performance benefits:
Native codegen for exception throws that use resource based strings is surprisingly large. There is no "correctness" reason preventing methods with throws from being inlined, and methods that conditionally throw (like the original |
Another funny thing about the current implementation is that it doesn't quite save code size as one may hope. static int Test(int[] a)
{
int s = 0;
foreach (int x in a)
s += Math.Abs(x);
return s;
} generates G_M33786_IG01:
57 push rdi
56 push rsi
55 push rbp
53 push rbx
4883EC28 sub rsp, 40
G_M33786_IG02:
33F6 xor esi, esi
488BF9 mov rdi, rcx
33DB xor ebx, ebx
8B6F08 mov ebp, dword ptr [rdi+8]
85ED test ebp, ebp
7E1C jle SHORT G_M33786_IG06
G_M33786_IG03:
4863CB movsxd rcx, ebx
8B4C8F10 mov ecx, dword ptr [rdi+4*rcx+16]
; begin abs
85C9 test ecx, ecx
7D07 jge SHORT G_M33786_IG04
E802E26D5C call System.Math:AbsHelper(int):int
EB02 jmp SHORT G_M33786_IG05
G_M33786_IG04:
8BC1 mov eax, ecx
G_M33786_IG05:
; end abs
03F0 add esi, eax
FFC3 inc ebx
3BEB cmp ebp, ebx
7FE4 jg SHORT G_M33786_IG03
G_M33786_IG06:
8BC6 mov eax, esi
G_M33786_IG07:
4883C428 add rsp, 40
5B pop rbx
5D pop rbp
5E pop rsi
5F pop rdi
C3 ret
; Total bytes of code 61, prolog size 8 for method C:Test(ref):int The G_M33787_IG01:
4883EC28 sub rsp, 40
G_M33787_IG02:
33C0 xor eax, eax
488BD1 mov rdx, rcx
33C9 xor ecx, ecx
448B4208 mov r8d, dword ptr [rdx+8]
4585C0 test r8d, r8d
7E1F jle SHORT G_M33787_IG05
G_M33787_IG03:
4C63C9 movsxd r9, ecx
468B4C8A10 mov r9d, dword ptr [rdx+4*r9+16]
; begin abs
4585C9 test r9d, r9d
7D08 jge SHORT G_M33787_IG04
41F7D9 neg r9d
4585C9 test r9d, r9d
7C0F jl SHORT G_M33787_IG06
; end abs
G_M33787_IG04:
4103C1 add eax, r9d
FFC1 inc ecx
443BC1 cmp r8d, ecx
7FE1 jg SHORT G_M33787_IG03
G_M33787_IG05:
4883C428 add rsp, 40
C3 ret
G_M33787_IG06:
; begin abs
E8B3FBFFFF call C:AbsHelper()
CC int3
; end abs
; Total bytes of code 62, prolog size 4 for method C:Test(ref):int And I think it's possible to get rid of the |
Added PR dotnet/coreclr#15823 |
@mikedn Is there there an open issue already for that unnecessary |
Hmm, afaik there's nothing related to this specific pattern. It's problematic because the result of |
Such patterns often occur when the first computation (neg/inc/dec/sub...) is stored in a variable and then followed by a related test/cmp. Otherwise the IL would only contain a cmp/test generating instruction in the first place (assuming properly optimized code). |
Apart from the issue with the intermediary variables there's another issue with these patterns - many are not valid when integer overflow has well defined behavior as it has in C# and IL. I already screwed up this once and reverted the change, see https://github.com/dotnet/coreclr/issues/14321 for more details. And I'm pretty sure Feel free to open an issue in the coreclr repository. But it would be nice to include some actual code examples where these optimizations are valid and potentially useful. Note that the JIT already handles cases involving |
@mikedn @benaadams It was mentioned earlier that this code path is only optimized for the non-negative input case. Would branchless code like the below address this? public static int Abs(int value) {
int sign = (value >> 31); // -1 if negative; 0 if non-negative
value ^= sign; // ~value if negative; value if non-negative
value -= sign; // ~value + 1 (= -value) if negative; value if non-negative
if (value < 0) { /* throw - should pretty much always evaluate to false in branch predictor */ }
return value;
} In my own testing this has a slight perf hit over the merged implementation of |
That's a complicated problem. Yes, if you run into branch mispredictions then the performance goes down the drain. Yes, in many cases the perf hit of the branchless version is small (though it's there pretty much all the time). But it's also likely that you can construct examples where the branchless version perf hit is higher. I've seen that with CMOV - 1.5x slower and even 2x slower. This already makes thing problematic. And then the branch version is also small. And it may be even smaller with an additional JIT optimization. And it's more likely that the JIT can understand it and do something with it (whatever that is, including applying some kind of if-conversion). It's less likely that the JIT will be able to do anything useful with the branchless version. It certainly won't be able to convert it back to the branch version if it somehow figures out that the branch version won't suffer from mispredictions. In the best case the JIT learns how to do proper PGO and decides to if-convert the branch code based on that (though even then you may have cases where the input data is random now and non-random next time). Failing that, the JIT might blindly decide to if-convert the branch version, at least that will produce slightly shorter code than the simulated version (I think, I haven't actually checked). |
Hmm, maybe it's not that bad. 1.5x - 2x tends to happen with In any case, this needs to be approached from the JIT side first since it could probably generate a shorter sequence such as |
Math.Abs is slow when the value is negative
The text was updated successfully, but these errors were encountered: