-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARM64: Eliminate redundant opposite mov #38179
Conversation
Added IsOppositeOfPrevMov() that will skip generating redundant mov.
@dotnet/jit-contrib |
//cc : @TamarChristinaArm |
src/coreclr/src/jit/emitarm64.cpp
Outdated
@@ -4025,6 +4025,11 @@ void emitter::emitIns_R_R( | |||
} | |||
} | |||
|
|||
if (IsOppositeOfPrevMov(ins, reg1, reg2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like you might need to worry about the EA_4BYTE case here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I think I will try to move the one above reg1 == reg2
inside the new IsOppositeOfPrevMov
so we have all the checks in one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, generalize to "is unnecessary mov" and catch all 3 cases:
mov RX, RX
mov RX, RY
mov RY, RX
mov RX, RY
mov RX, RY
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we want to check OptimizationEnabled()
before doing this optimization and mov RX, RX
was earlier done without this check, will it be fine to make this optimization only when opts is enabled?
For the IG boundary check, you can follow xarch's runtime/src/coreclr/src/jit/emitxarch.cpp Lines 165 to 171 in 80a7935
|
Ah sweet, didn't notice that when I referred |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall structure looks good.
I assume those 2 regressions are cases where we're generating minopts code and now don't optimize?
For the new cases I think the size check needs to be more comprehensive. You need to verify that the data size in the current and prior instructions match.
That's right. I am surprised that only 2 methods were affected by this.
|
- Change IG boundary check to exclude extended IGs. - Check the operand size before removing redundant movs.
Thanks @AndyAyersMS for pointing it out. Turns out String.IsAscii was generating following code: G_M65212_IG01:
A9BE7BFD stp fp, lr, [sp,#-32]!
F9000FF3 str x19, [sp,#24]
910003FD mov fp, sp
;; bbWeight=1 PerfScore 2.50
G_M65212_IG02:
91003001 add x1, x0, #12
F9000BA1 str x1, [fp,#16] // [V02 loc1]
F9400BAB ldr x11, [fp,#16] // [V02 loc1]
B9400801 ldr w1, [x0,#8]
2A0103F3 mov w19, w1
AA1303E1 mov x1, x19
AA0B03E0 mov x0, x11
9000000B adrp x11, [RELOC #0x2357246b588]
9100016B add x11, x11, #0
F9400162 ldr x2, [x11]
D63F0040 blr x2
EB00027F cmp x19, x0
9A9F17E0 cset x0, eq
;; bbWeight=1 PerfScore 14.00
G_M65212_IG03:
F9400FF3 ldr x19, [sp,#24]
A8C27BFD ldp fp, lr, [sp],#32
D65F03C0 ret lr And I was optimizing this portion of code: 2A0103F3 mov w19, w1
AA1303E1 mov x1, x19 ; <-- Was wrongly removing this. I have now fixed the case and also relaxed a check about IG boundary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need a size check in case (2) as well: mov X, Y; mov X.w, Y.w
is not the same as mov X, Y
.
The IG extends check is clever, but we now need to make sure we're always processing IGs in order. I think that's the case today, but not sure if its guaranteed. If we're ok taking that dependence, then we should make a similar change to AreUpper32BitsZero
for xarch.
Actually yes, I added an assert and didn't see the assert hitting during crossgen atleast, but that might not be the case while JITing. I will add the check.
Yes, Bruce and Egor pointed me out this trick and I will definitely add it inside |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One (hopefully last) comment -- in case 3 do we need to match the size check with the register class like we do for case 1?
In a separate PR. |
As far as my understanding, I didn't think it should be necessary to add the check because I was assuming if (size == EA_8BYTE && isVectorRegister(dst) != isVectorRegister(src)) {
return false;
} |
I found an issue where we were eliminating a Here,
|
Probably wouldn't hurt to kick off outerloop and jitstress runs on this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Left a few minor suggestions.
Perform peephole optimization to skip
mov
instruction if the previous instruction was the opposite mov.Fixes: #35252
Looking at the size improvement, it matches closely to what was originally estimated in #35252.
Update: JIT code size diff collected using PMI.