Arm64: In mod operation happening inside the loop, if divisor is an invariant, hoist the divisor checks #64795
Comments
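Tagging subscribers to this area: @JulieLeeMSFT
Issue Details
Arm64 doesn't have a modulo instruction, so we convert the operation to div/mul/sub form. In the example below, when both dividend (x) and divisor (y) are invariant, we hoist the checks properly. When the dividend is not invariant (result) but the divisor is invariant (y), the divisor checks stay inside the loop even though they could be hoisted.
public static int issue2(int x, int y, int z)
{
    int result = 0;
    for (int i = 0; i < z; i++)
    {
        //result = x % y; <-- this hoists things properly because both dividend and divisor are invariant.
        result = result % y;
    }
    return result;
}
...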
G_M61875_IG03:
cmp w0, #0 ; Check# 1 divisor == 0, can be hoisted
beq G_M61875_IG08
cmn w0, #1 ; Check# 2 divisor == -1, can be hoisted
bne G_M61875_IG04
adds wzr, w1, w1
bne G_M61875_IG04
bvs G_M61875_IG07
;; bbWeight=4 PerfScore 22.00
G_M61875_IG04:
sdiv w4, w1, w0
mul w4, w4, w0
sub w1, w1, w4
add w3, w3, #1
cmp w3, w2
blt G_M61875_IG03
;; bbWeight=4 PerfScore 62.00
G_M61875_IG05:
mov w0, w1
;; bbWeight=1 PerfScore 0.50
G_M61875_IG06:
ldp fp, lr, [sp],#16
ret lr
;; bbWeight=1 PerfScore 2.00
G_M61875_IG07:
bl CORINFO_HELP_OVERFLOW
;; bbWeight=0 PerfScore 0.00
G_M61875_IG08:
bl CORINFO_HELP_THROWDIVZERO
brk_windows #0
|
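To make the request concrete, here is a minimal source-level sketch of what "hoisting the divisor checks" would amount to. It is only an illustration, not what the JIT does today: the class and method names are hypothetical, and the `seed` parameter is an assumption added so the loop does real work (the issue's example starts from `result = 0`). The C# `/` operator still carries its own checks, so this shows the intended shape rather than the generated code.

```csharp
using System;

// Hypothetical names; a sketch of the transformation discussed in this issue.
public static class HoistSketch
{
    // Original shape: both divisor checks are conceptually repeated every iteration.
    public static int Original(int seed, int y, int z)
    {
        int result = seed;
        for (int i = 0; i < z; i++)
        {
            result = result % y;
        }
        return result;
    }

    // Hand-hoisted shape: checks that depend only on the invariant divisor run once.
    public static int Hoisted(int seed, int y, int z)
    {
        int result = seed;
        if (z <= 0)
        {
            return result; // the loop never runs, so no check may fire
        }
        if (y == 0)
        {
            throw new DivideByZeroException(); // check #1, done once
        }
        bool divisorIsMinusOne = (y == -1); // check #2, evaluated once
        for (int i = 0; i < z; i++)
        {
            // Only the dividend-dependent half of the overflow check stays in the loop.
            if (divisorIsMinusOne && result == int.MinValue)
            {
                throw new OverflowException();
            }
            result = result - (result / y) * y; // the div/mul/sub expansion
        }
        return result;
    }
}
```

The `z <= 0` guard matters: the original loop throws only if it executes at least once, so a hoisted check must not fire for a loop that never runs.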
@dotnet/jit-contrib |
Not sure I understand this. Early in morph we transform it to:
loop
{
-   loopVariant = loopVariant % loopInvariant;
+   loopVariant = loopVariant - (loopVariant / loopInvariant) * loopInvariant;
}
I don't see what can be hoisted here. In theory, for loops like this (on both x64 and Arm64) we can use a super smart trick by @lemire that we already use for Dictionaries, e.g.:
public static uint issue2(uint x, uint y, uint z)
{
    uint result = 0;
    ulong cachedMul = GetFastModMultiplier(y); // loop-invariant, computed once
    for (int i = 0; i < z; i++)
    {
        result = FastMod(x, y, cachedMul); // no 'div' in the loop body!
    }
    return result;
}

// Precomputes the multiplier for a given divisor.
public static ulong GetFastModMultiplier(uint divisor) =>
    ulong.MaxValue / divisor + 1;

// Needs: using System.Runtime.CompilerServices;
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static uint FastMod(uint value, uint divisor, ulong multiplier) =>
    (uint)(((((multiplier * value) >> 32) + 1) * divisor) >> 32); |
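A quick way to build confidence in the multiplier trick is to compare it against `%` over a range of values. The sketch below reuses the two helpers from the comment above; the class name and `CountMismatches` are hypothetical additions. It assumes the default unchecked arithmetic context and a divisor no larger than `int.MaxValue`, which as far as I know is the precondition the .NET implementation asserts.

```csharp
using System;

// Hypothetical harness around the helpers quoted in this thread.
public static class FastModSketch
{
    public static ulong GetFastModMultiplier(uint divisor) =>
        ulong.MaxValue / divisor + 1;

    // Assumes divisor <= int.MaxValue and unchecked (wrapping) 64-bit multiplication.
    public static uint FastMod(uint value, uint divisor, ulong multiplier) =>
        (uint)(((((multiplier * value) >> 32) + 1) * divisor) >> 32);

    // Returns how many of the values start, start+1, ..., start+count-1
    // give a different answer from the plain `%` operator.
    public static int CountMismatches(uint divisor, uint start, uint count)
    {
        ulong multiplier = GetFastModMultiplier(divisor); // invariant: computed once
        int mismatches = 0;
        for (uint i = 0; i < count; i++)
        {
            uint value = unchecked(start + i);
            if (FastMod(value, divisor, multiplier) != value % divisor)
            {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

For example, `FastModSketch.CountMismatches(7, 0, 100000)` should report no mismatches. Only the multiplier computation contains a division, which is why the approach pays off only if that part really is hoisted out of the loop.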
so e.g. you can try to morph
to
and hope LICM will manage to hoist the multiplier part. This can only be done for loops, and only when you're sure the invariant part will be hoisted. It also works only for unsigned types and only on 64-bit. NOTE: if |
The way to fix this is to introduce a new node like (One will also need to add support for "non-faulting" Obviously, a new node for this narrow optimization is not a small price to pay in complexity, so one will have to make sure it is worth it. |
The
I agree. An entirely new pipeline would have to be added to support this. A quick search on the internet reveals the pattern being used in popular code.
|
Related: #46010 |
I spent some time looking at this. A few comments:
I'm trying to get familiar with this area, so forgive me if it turns out the JIT can handle a lot of this already. |
We already take care of that. Refer: runtime/src/coreclr/jit/optimizer.cpp Lines 6646 to 6649 in 21f1a2b
and runtime/src/coreclr/jit/optimizer.cpp Lines 6826 to 6837 in 21f1a2b
|
Great! Thank you for linking to that, that's really helpful. |
Looking back at my old comment, I did introduce similar flags in #82924:
GTF_DIV_MOD_NO_BY_ZERO = 0x20000000, // GT_DIV, GT_MOD -- Div or mod definitely does not divide-by-zero.
GTF_DIV_MOD_NO_OVERFLOW = 0x40000000, // GT_DIV, GT_MOD -- Div or mod definitely does not overflow.
I agree that introducing new nodes and hoisting them out would be the proper way to resolve this. |
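As a rough mental model of how such flags interact with the checks in the Arm64 listing, here is a toy C# sketch. It is not JIT code: the enum, the class and `RequiredChecks` are hypothetical, and only the two flag values are taken from the comment above. The assumption is simply that a set flag means the corresponding check need not be emitted.

```csharp
using System;
using System.Collections.Generic;

// Toy model only; the real flags live on JIT tree nodes, not in a C# enum.
public static class DivModCheckSketch
{
    [Flags]
    public enum DivModFlags : uint
    {
        None = 0,
        NoDivByZero = 0x20000000, // mirrors GTF_DIV_MOD_NO_BY_ZERO
        NoOverflow = 0x40000000,  // mirrors GTF_DIV_MOD_NO_OVERFLOW
    }

    // Lists the runtime checks that would still have to guard the division.
    public static List<string> RequiredChecks(DivModFlags flags)
    {
        var checks = new List<string>();
        if ((flags & DivModFlags.NoDivByZero) == 0)
        {
            checks.Add("divisor == 0");
        }
        if ((flags & DivModFlags.NoOverflow) == 0)
        {
            checks.Add("divisor == -1 && dividend == MinValue");
        }
        return checks;
    }
}
```

In this model, a division proven safe on both counts needs no guard at all, which is the state a hoisted check node would try to establish for every iteration of the loop.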
Arm64 doesn't have a modulo instruction, so we convert the operation to div/mul/sub form. However, we don't do much to the tree node until codegen, where we generate code for it. By codegen, it is too late to use the fact that the divisor was invariant; had we known it earlier, we could have hoisted some extra checks such as divisor == 0 and divisor == -1.
In the issue2 example, when both dividend (x) and divisor (y) are invariant, we hoist the checks properly. When the dividend is not invariant (result) but the divisor is invariant (y), we could still hoist the divisor checks out of the loop, but currently we don't.
category:cq
theme:div-mod-rem
skill-level:expert
cost:medium
impact:medium