JitArm64_Integer: cmp/cmpl optimizations #13324

Open
wants to merge 8 commits into
base: master
@Sintendo Sintendo commented Feb 2, 2025

The main optimizations here take advantage of ARM64's ADD/SUB instructions, which can encode a (shifted) 12-bit immediate value, allowing us to skip materializing the immediate in a register. This means we either save an instruction or, if the immediate still needs to be materialized for a subsequent use, end up with an equal number of instructions. I didn't implement the (shifted) 12-bit additions for cmpl, as I was unable to find games where this occurred.
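To make the "(shifted) 12-bit immediate" condition concrete, here is a minimal sketch of the encodability check. The names `AddSubImm` and `EncodeAddSubImm` are hypothetical and only for illustration; they are not taken from Dolphin's emitter.

```cpp
#include <cstdint>

// Hypothetical result type, for illustration only (not Dolphin's API).
struct AddSubImm
{
  bool encodable;  // true if the value fits an ADD/SUB immediate
  uint32_t imm12;  // the 12-bit payload
  bool shift12;    // true if the payload is applied as "lsl #12"
};

// An ARM64 ADD/SUB immediate is a 12-bit unsigned value, optionally
// shifted left by 12 bits. Anything else has to live in a register.
inline AddSubImm EncodeAddSubImm(uint64_t value)
{
  // Plain form: only bits [11:0] are set.
  if ((value & ~uint64_t{0xFFF}) == 0)
    return AddSubImm{true, static_cast<uint32_t>(value), false};

  // Shifted form: only bits [23:12] are set.
  if ((value & ~uint64_t{0xFFF000}) == 0)
    return AddSubImm{true, static_cast<uint32_t>(value >> 12), true};

  return AddSubImm{false, 0, false};
}
```

With this check, 0x20 maps to `#0x20`, 0x40000 maps to `#0x40, lsl #12`, and a value such as 0x10000000 is rejected and still needs a register.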

The second optimization consists of skipping the sign extension step in cmp if possible.

Finally, I fixed a const correctness issue I accidentally introduced in #13120.


cmp - Subtract 12-bit constant

Before:

0x52800416   mov    w22, #0x20                ; =32
0x93407f78   sxtw   x24, w27
0xcb36c318   sub    x24, x24, w22, sxtw

After:

0x93407f78   sxtw   x24, w27
0xd1008318   sub    x24, x24, #0x20

cmp - Subtract shifted 12-bit constant

Before:

0x52a00099   mov    w25, #0x40000             ; =262144
0x93407f7a   sxtw   x26, w27
0xcb39c35a   sub    x26, x26, w25, sxtw

After:

0x93407f7a   sxtw   x26, w27
0xd141035a   sub    x26, x26, #0x40, lsl #12  ; =0x40000

cmp - Add 12-bit constant

Before:

0x12800019   mov    w25, #-0x1                ; =-1
0x93407f5b   sxtw   x27, w26
0xcb39c37b   sub    x27, x27, w25, sxtw

After:

0x93407f5b   sxtw   x27, w26
0x9100077b   add    x27, x27, #0x1

cmp - Add shifted 12-bit constant

Before:

0x52bff01a   mov    w26, #-0x800000           ; =-8388608
0x93407f1b   sxtw   x27, w24
0xcb3ac37b   sub    x27, x27, w26, sxtw

After:

0x93407f1b   sxtw   x27, w24
0x9160037b   add    x27, x27, #0x800, lsl #12 ; =0x800000

cmp - Skip sign extension for a

Before:

0x5280003a   mov    w26, #0x1                 ; =1
0x93407f5b   sxtw   x27, w26
0xcb38c37b   sub    x27, x27, w24, sxtw

After:

0x5280003a   mov    w26, #0x1                 ; =1
0xcb38c35b   sub    x27, x26, w24, sxtw

cmp - Skip sign extension for b

Before:

0x52a20018   mov    w24, #0x10000000          ; =268435456
0x93407f79   sxtw   x25, w27
0xcb38c339   sub    x25, x25, w24, sxtw

After:

0x52a20018   mov    w24, #0x10000000          ; =268435456
0x93407f79   sxtw   x25, w27
0xcb180339   sub    x25, x25, x24

cmpl - Subtract 12-bit constant

Before:

0x5280003a   mov    w26, #0x1                 ; =1
0xcb1a033b   sub    x27, x25, x26

After:

0xd100073b   sub    x27, x25, #0x1

cmpl - Subtract shifted 12-bit constant

Before:

0x52a00218   mov    w24, #0x100000            ; =1048576
0xcb180379   sub    x25, x27, x24

After:

0xd1440379   sub    x25, x27, #0x100, lsl #12 ; =0x100000

You can encode a 12-bit immediate in a SUB instruction on ARM64.
Constants in this range do not need to be sign extended, so we can
exploit this to avoid materializing the immediate.

This approach saves an instruction if it does not need to be
materialized in a register afterwards. Otherwise, we just materialize
it later and the total number of instructions stays the same.

Before:
0x52800416   mov    w22, #0x20                ; =32
0x93407f78   sxtw   x24, w27
0xcb36c318   sub    x24, x24, w22, sxtw

After:
0x93407f78   sxtw   x24, w27
0xd1008318   sub    x24, x24, #0x20

You can encode a shifted 12-bit immediate in a SUB instruction on ARM64.
Constants in this range do not need to be sign extended, so we can
exploit this to avoid materializing the immediate.

This approach saves an instruction if it does not need to be
materialized in a register afterwards. Otherwise, we just materialize
it later and the total number of instructions stays the same.

Before:
0x52a00099   mov    w25, #0x40000             ; =262144
0x93407f7a   sxtw   x26, w27
0xcb39c35a   sub    x26, x26, w25, sxtw

After:
0x93407f7a   sxtw   x26, w27
0xd141035a   sub    x26, x26, #0x40, lsl #12  ; =0x40000

You can encode a 12-bit immediate in an ADD instruction on ARM64. If the
negated constant fits in this range, we can exploit this to avoid
materializing the immediate.
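
As a rough sketch of the selection logic for a constant right-hand operand of cmp (which computes sext(a) - sext(b)), the code below picks SUB when the constant itself is encodable and ADD when its negation is. It covers both the plain and the shifted form. `CmpImmOp`, `CmpImmPlan`, `FitsAddSubImm` and `PlanCmpImmediate` are hypothetical names used only for this illustration and do not come from the Dolphin source.

```cpp
#include <cstdint>

// Hypothetical types for illustration only (not Dolphin's API).
enum class CmpImmOp { None, Sub, Add };

struct CmpImmPlan
{
  CmpImmOp op;     // which instruction can take the immediate, if any
  uint32_t imm12;  // 12-bit payload
  bool shift12;    // true if the payload is applied as "lsl #12"
};

// Writes the encoding and returns true if value fits an ADD/SUB immediate.
inline bool FitsAddSubImm(uint64_t value, uint32_t* imm12, bool* shift12)
{
  if ((value & ~uint64_t{0xFFF}) == 0)
  {
    *imm12 = static_cast<uint32_t>(value);
    *shift12 = false;
    return true;
  }
  if ((value & ~uint64_t{0xFFF000}) == 0)
  {
    *imm12 = static_cast<uint32_t>(value >> 12);
    *shift12 = true;
    return true;
  }
  return false;
}

// a - b with b >= 0 becomes SUB #b; a - b with b < 0 becomes ADD #(-b).
inline CmpImmPlan PlanCmpImmediate(int32_t b)
{
  // Widen first so that negating INT32_MIN cannot overflow.
  const int64_t wide = b;
  CmpImmPlan plan{CmpImmOp::None, 0, false};

  if (wide >= 0)
  {
    if (FitsAddSubImm(static_cast<uint64_t>(wide), &plan.imm12, &plan.shift12))
      plan.op = CmpImmOp::Sub;
  }
  else if (FitsAddSubImm(static_cast<uint64_t>(-wide), &plan.imm12, &plan.shift12))
  {
    plan.op = CmpImmOp::Add;
  }
  return plan;
}
```

For example, a constant of -1 yields `add ..., #0x1` and -0x800000 yields `add ..., #0x800, lsl #12`, while a constant that fits neither form falls back to the register path.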

This approach saves an instruction if it does not need to be
materialized in a register afterwards. Otherwise, we just materialize
it later and the total number of instructions stays the same.

Before:
0x12800019   mov    w25, #-0x1                ; =-1
0x93407f5b   sxtw   x27, w26
0xcb39c37b   sub    x27, x27, w25, sxtw

After:
0x93407f5b   sxtw   x27, w26
0x9100077b   add    x27, x27, #0x1

You can encode a shifted 12-bit immediate in an ADD instruction on
ARM64. If the negated constant fits in this range, we can exploit this
to avoid materializing the immediate.

This approach saves an instruction if it does not need to be
materialized in a register afterwards. Otherwise, we just materialize
it later and the total number of instructions stays the same.

Before:
0x52bff01a   mov    w26, #-0x800000           ; =-8388608
0x93407f1b   sxtw   x27, w24
0xcb3ac37b   sub    x27, x27, w26, sxtw

After:
0x93407f1b   sxtw   x27, w24
0x9160037b   add    x27, x27, #0x800, lsl #12 ; =0x800000

While we cannot always avoid materializing immediates, we can still
inspect the most significant bit and potentially skip sign extension.
This can sometimes save an instruction.
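
The most-significant-bit check can be sketched as follows. The idea, as far as it can be read from the listings, is that a 32-bit write such as `mov w26, #0x1` zeroes the upper half of the 64-bit register, so when bit 31 of the constant is clear the register already holds the sign-extended value. The helper names are hypothetical and not taken from Dolphin.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only (not Dolphin's API).

// A 32-bit constant whose bit 31 is clear has identical sign and zero
// extensions, so no SXTW is needed on the 64-bit register that holds it.
inline bool CanSkipSignExtension(uint32_t imm)
{
  return (imm >> 31) == 0;
}

// What SXTW computes: replicate bit 31 into bits [63:32].
inline int64_t SignExtend32(uint32_t value)
{
  return static_cast<int64_t>(static_cast<int32_t>(value));
}

// What a write to a W register leaves in the X register: upper half zero.
inline uint64_t ZeroExtend32(uint32_t value)
{
  return static_cast<uint64_t>(value);
}
```

For 0x1 and 0x10000000 both extensions agree, which matches the two listings in this commit; for a constant like 0xFFFFFFFF they differ, so the extension has to stay.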

Before:
0x5280003a   mov    w26, #0x1                 ; =1
0x93407f5b   sxtw   x27, w26
0xcb38c37b   sub    x27, x27, w24, sxtw

After:
0x5280003a   mov    w26, #0x1                 ; =1
0xcb38c35b   sub    x27, x26, w24, sxtw

Before:
0x52a20018   mov    w24, #0x10000000          ; =268435456
0x93407f79   sxtw   x25, w27
0xcb38c339   sub    x25, x25, w24, sxtw

After:
0x52a20018   mov    w24, #0x10000000          ; =268435456
0x93407f79   sxtw   x25, w27
0xcb180339   sub    x25, x25, x24

You can encode a 12-bit immediate in a SUB instruction on ARM64. We can
exploit this to avoid materializing the immediate.

This approach saves an instruction if it does not need to be
materialized in a register afterwards. Otherwise, we just materialize
it later and the total number of instructions stays the same.

Before:
0x5280003a   mov    w26, #0x1                 ; =1
0xcb1a033b   sub    x27, x25, x26

After:
0xd100073b   sub    x27, x25, #0x1

You can encode a shifted 12-bit immediate in a SUB instruction on ARM64.
We exploit this to avoid materializing the immediate.

This approach saves an instruction if it does not need to be
materialized in a register afterwards. Otherwise, we just materialize
it later and the total number of instructions stays the same.

Before:
0x52a00218   mov    w24, #0x100000            ; =1048576
0xcb180379   sub    x25, x27, x24

After:
0xd1440379   sub    x25, x27, #0x100, lsl #12 ; =0x100000