JitArm64_Integer: cmp/cmpl optimizations #13324
Open: Sintendo wants to merge 8 commits into dolphin-emu:master from Sintendo:jitarm64-cmp-imm
You can encode a 12-bit immediate in a SUB instruction on ARM64. Constants in this range do not need to be sign extended, so we can exploit this to avoid materializing the immediate. This approach saves an instruction if it does not need to be materialized in a register afterwards. Otherwise, we just materialize it later and the total number of instructions stays the same.

Before:
    0x52800416  mov   w22, #0x20          ; =32
    0x93407f78  sxtw  x24, w27
    0xcb36c318  sub   x24, x24, w22, sxtw
After:
    0x93407f78  sxtw  x24, w27
    0xd1008318  sub   x24, x24, #0x20
You can encode a shifted 12-bit immediate in a SUB instruction on ARM64. Constants in this range do not need to be sign extended, so we can exploit this to avoid materializing the immediate. This approach saves an instruction if it does not need to be materialized in a register afterwards. Otherwise, we just materialize it later and the total number of instructions stays the same.

Before:
    0x52a00099  mov   w25, #0x40000       ; =262144
    0x93407f7a  sxtw  x26, w27
    0xcb39c35a  sub   x26, x26, w25, sxtw
After:
    0x93407f7a  sxtw  x26, w27
    0xd141035a  sub   x26, x26, #0x40, lsl #12 ; =0x40000
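Both commits rely on the same encodability test. A minimal sketch of it follows; `EncodeAddSubImm` and `AddSubImm` are hypothetical names, not Dolphin's emitter API. The rule modeled is the AArch64 ADD/SUB (immediate) form: an unsigned 12-bit value, optionally shifted left by 12 bits.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical result type: the 12-bit field plus the optional LSL #12 flag.
struct AddSubImm {
  uint32_t imm12;  // 0..4095
  bool shift12;    // true: the encoded value is imm12 << 12
};

// Returns the ADD/SUB immediate encoding of `value`, or nullopt if the value
// would have to be materialized in a register instead.
std::optional<AddSubImm> EncodeAddSubImm(uint64_t value) {
  // Plain form: the value itself fits in 12 bits.
  if (value <= 0xFFF)
    return AddSubImm{static_cast<uint32_t>(value), false};
  // Shifted form: low 12 bits are zero and the remaining bits fit in 12 bits.
  if ((value & 0xFFF) == 0 && (value >> 12) <= 0xFFF)
    return AddSubImm{static_cast<uint32_t>(value >> 12), true};
  return std::nullopt;
}
```

Under this rule, 0x20 encodes as `#0x20`, 0x40000 as `#0x40, lsl #12`, and 0x1001 fits neither form, so it would still need a `mov` first.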
You can encode a 12-bit immediate in an ADD instruction on ARM64. If the negated constant fits in this range, we can exploit this to avoid materializing the immediate. This approach saves an instruction if it does not need to be materialized in a register afterwards. Otherwise, we just materialize it later and the total number of instructions stays the same.

Before:
    0x12800019  mov   w25, #-0x1          ; =-1
    0x93407f5b  sxtw  x27, w26
    0xcb39c37b  sub   x27, x27, w25, sxtw
After:
    0x93407f5b  sxtw  x27, w26
    0x9100077b  add   x27, x27, #0x1
You can encode a shifted 12-bit immediate in an ADD instruction on ARM64. If the negated constant fits in this range, we can exploit this to avoid materializing the immediate. This approach saves an instruction if it does not need to be materialized in a register afterwards. Otherwise, we just materialize it later and the total number of instructions stays the same.

Before:
    0x52bff01a  mov   w26, #-0x800000     ; =-8388608
    0x93407f1b  sxtw  x27, w24
    0xcb3ac37b  sub   x27, x27, w26, sxtw
After:
    0x93407f1b  sxtw  x27, w24
    0x9160037b  add   x27, x27, #0x800, lsl #12 ; =0x800000
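Taken together, the four `cmp` commits amount to a three-way choice for a sign-extended constant `b`: SUB the constant, ADD its negation, or fall back to materializing it. The sketch below assumes the same 12-bit (optionally LSL #12) immediate rule; `PlanCmpWithConstant`, `CmpPlan` and `CmpOp` are illustrative names only, not Dolphin code. The negation is done in 64 bits so that `INT32_MIN` cannot overflow.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical types for illustration; not part of Dolphin's emitter API.
struct AddSubImm {
  uint32_t imm12;  // 0..4095
  bool shift12;    // true: the encoded value is imm12 << 12
};

enum class CmpOp {
  SubImm,             // sub xd, xa, #imm   computes a - b
  AddImm,             // add xd, xa, #imm   computes a + (-b) == a - b
  MaterializeAndSub,  // load b into a register, then a register SUB
};

struct CmpPlan {
  CmpOp op;
  AddSubImm imm;  // only meaningful for SubImm and AddImm
};

// AArch64 ADD/SUB (immediate): unsigned 12 bits, optionally LSL #12.
static std::optional<AddSubImm> EncodeAddSubImm(uint64_t value) {
  if (value <= 0xFFF)
    return AddSubImm{static_cast<uint32_t>(value), false};
  if ((value & 0xFFF) == 0 && (value >> 12) <= 0xFFF)
    return AddSubImm{static_cast<uint32_t>(value >> 12), true};
  return std::nullopt;
}

// cmp treats b as signed, so b is sign extended to 64 bits before a - b.
CmpPlan PlanCmpWithConstant(int32_t b) {
  const int64_t wide = b;  // sign-extended constant
  if (wide >= 0) {
    if (auto imm = EncodeAddSubImm(static_cast<uint64_t>(wide)))
      return {CmpOp::SubImm, *imm};
  } else {
    // Negate in 64 bits: -INT32_MIN does not fit in int32_t, but fits here.
    if (auto imm = EncodeAddSubImm(static_cast<uint64_t>(-wide)))
      return {CmpOp::AddImm, *imm};
  }
  return {CmpOp::MaterializeAndSub, {0, false}};
}
```

With the constants from the listings, 0x20 and 0x40000 yield `SubImm`, while -1 and -0x800000 yield `AddImm` with immediates 0x1 and 0x800 (LSL #12).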
e3ac54d to f3e7aeb
JosJuice reviewed on Feb 2, 2025
While we cannot always avoid materializing immediates, we can still inspect the constant's most significant bit and, if it is clear, skip sign extension. This can sometimes save an instruction.

Before:
    0x5280003a  mov   w26, #0x1           ; =1
    0x93407f5b  sxtw  x27, w26
    0xcb38c37b  sub   x27, x27, w24, sxtw
After:
    0x5280003a  mov   w26, #0x1           ; =1
    0xcb38c35b  sub   x27, x26, w24, sxtw

Before:
    0x52a20018  mov   w24, #0x10000000    ; =268435456
    0x93407f79  sxtw  x25, w27
    0xcb38c339  sub   x25, x25, w24, sxtw
After:
    0x52a20018  mov   w24, #0x10000000    ; =268435456
    0x93407f79  sxtw  x25, w27
    0xcb180339  sub   x25, x25, x24
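The reasoning behind skipping the extension can be checked in a few lines. This is a minimal sketch, assuming the AArch64 rule that any write to a W register clears bits 63..32 of the corresponding X register. A constant written with `mov wN, #imm` is therefore already zero-extended, and when its bit 31 is clear that is identical to the sign-extended value. The helper names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only.

// X-register contents after "mov wN, #value": the upper 32 bits are cleared,
// so the result is the zero-extended value.
constexpr uint64_t AfterWRegisterWrite(uint32_t value) {
  return value;
}

// What an SXTW (or a ", sxtw" operand extend) would produce.
// The uint32_t -> int32_t conversion is modular in C++20 and on mainstream
// two's-complement compilers before that.
constexpr uint64_t SignExtended(uint32_t value) {
  return static_cast<uint64_t>(
      static_cast<int64_t>(static_cast<int32_t>(value)));
}

// Sign extension only changes the value when bit 31 (the MSB) is set.
constexpr bool NeedsSignExtension(uint32_t value) {
  return (value >> 31) != 0;
}

// Constants from the listings: MSB clear, so the extend is redundant.
static_assert(!NeedsSignExtension(0x1));
static_assert(AfterWRegisterWrite(0x1) == SignExtended(0x1));
static_assert(!NeedsSignExtension(0x10000000));
static_assert(AfterWRegisterWrite(0x10000000) == SignExtended(0x10000000));
// MSB set: zero and sign extension disagree, so the extend must stay.
static_assert(NeedsSignExtension(0xFFFFFFFF));
static_assert(AfterWRegisterWrite(0xFFFFFFFF) != SignExtended(0xFFFFFFFF));
```

The same check covers both operands: for `a` the `sxtw` instruction can be dropped, and for `b` the `, sxtw` operand extend on the `sub` can be dropped.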
You can encode a 12-bit immediate in a SUB instruction on ARM64. We can exploit this to avoid materializing the immediate. This approach saves an instruction if it does not need to be materialized in a register afterwards. Otherwise, we just materialize it later and the total number of instructions stays the same.

Before:
    0x5280003a  mov   w26, #0x1           ; =1
    0xcb1a033b  sub   x27, x25, x26
After:
    0xd100073b  sub   x27, x25, #0x1
You can encode a shifted 12-bit immediate in a SUB instruction on ARM64. We exploit this to avoid materializing the immediate. This approach saves an instruction if it does not need to be materialized in a register afterwards. Otherwise, we just materialize it later and the total number of instructions stays the same.

Before:
    0x52a00218  mov   w24, #0x100000      ; =1048576
    0xcb180379  sub   x25, x27, x24
After:
    0xd1440379  sub   x25, x27, #0x100, lsl #12 ; =0x100000
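Why a single 64-bit SUB serves both `cmp` and `cmpl` can be modeled as below. This is a simplified sketch, not Dolphin's condition-register encoding, and `Cmp`, `Cmpl` and `CmpResult` are hypothetical names. Both 32-bit operands are widened (sign extension for `cmp`, zero extension for `cmpl`); the difference of two widened 32-bit values always fits in 64 bits, so its sign and zero-ness give LT, GT or EQ. For `cmpl`, zero extension is exactly what a W-register write already leaves in the X register, which is why the `cmpl` listings subtract with no extend at all.

```cpp
#include <cstdint>

// Simplified comparison outcome; Dolphin's CR representation is not modeled.
enum class CmpResult { LT, GT, EQ };

constexpr CmpResult FromDifference(int64_t diff) {
  return diff < 0 ? CmpResult::LT : (diff > 0 ? CmpResult::GT : CmpResult::EQ);
}

// cmp: signed compare. Operands are sign extended (SXTW) to 64 bits; the
// difference lies in [-(2^32 - 1), 2^32 - 1], so it cannot overflow.
constexpr CmpResult Cmp(uint32_t a, uint32_t b) {
  const int64_t wa = static_cast<int32_t>(a);
  const int64_t wb = static_cast<int32_t>(b);
  return FromDifference(wa - wb);
}

// cmpl: unsigned compare. Operands are zero extended, matching what a
// W-register write produces, so "sub xd, xa, #imm" works directly whenever
// b is an encodable immediate.
constexpr CmpResult Cmpl(uint32_t a, uint32_t b) {
  const int64_t wa = a;  // zero-extended
  const int64_t wb = b;  // zero-extended
  return FromDifference(wa - wb);
}
```

For example, `Cmp(0xFFFFFFFF, 1)` yields LT (-1 < 1) while `Cmpl(0xFFFFFFFF, 1)` yields GT; the subtraction is the same, only the widening differs.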
Const correctness fix: forgot this when I added it in dolphin-emu#13120.
f3e7aeb to 755c003
JosJuice approved these changes on Feb 2, 2025
The main optimizations here take advantage of ARM64's ADD/SUB instructions, which can encode a (shifted) 12-bit immediate value, allowing us to skip materializing the immediate in a register. This either saves an instruction or, if the immediate needs to be materialized for a subsequent use, results in the same number of instructions. I didn't implement the (shifted) 12-bit additions for cmpl, as I was unable to find games where this occurred.

The second optimization consists of skipping the sign extension step in cmp if possible.

Finally, I fixed a const correctness issue I accidentally introduced in #13120.
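cmp - Subtract 12-bit constant
Before:
    0x52800416  mov   w22, #0x20          ; =32
    0x93407f78  sxtw  x24, w27
    0xcb36c318  sub   x24, x24, w22, sxtw
After:
    0x93407f78  sxtw  x24, w27
    0xd1008318  sub   x24, x24, #0x20

cmp - Subtract shifted 12-bit constant
Before:
    0x52a00099  mov   w25, #0x40000       ; =262144
    0x93407f7a  sxtw  x26, w27
    0xcb39c35a  sub   x26, x26, w25, sxtw
After:
    0x93407f7a  sxtw  x26, w27
    0xd141035a  sub   x26, x26, #0x40, lsl #12 ; =0x40000

cmp - Add 12-bit constant
Before:
    0x12800019  mov   w25, #-0x1          ; =-1
    0x93407f5b  sxtw  x27, w26
    0xcb39c37b  sub   x27, x27, w25, sxtw
After:
    0x93407f5b  sxtw  x27, w26
    0x9100077b  add   x27, x27, #0x1

cmp - Add shifted 12-bit constant
Before:
    0x52bff01a  mov   w26, #-0x800000     ; =-8388608
    0x93407f1b  sxtw  x27, w24
    0xcb3ac37b  sub   x27, x27, w26, sxtw
After:
    0x93407f1b  sxtw  x27, w24
    0x9160037b  add   x27, x27, #0x800, lsl #12 ; =0x800000

cmp - Skip sign extension for a
Before:
    0x5280003a  mov   w26, #0x1           ; =1
    0x93407f5b  sxtw  x27, w26
    0xcb38c37b  sub   x27, x27, w24, sxtw
After:
    0x5280003a  mov   w26, #0x1           ; =1
    0xcb38c35b  sub   x27, x26, w24, sxtw

cmp - Skip sign extension for b
Before:
    0x52a20018  mov   w24, #0x10000000    ; =268435456
    0x93407f79  sxtw  x25, w27
    0xcb38c339  sub   x25, x25, w24, sxtw
After:
    0x52a20018  mov   w24, #0x10000000    ; =268435456
    0x93407f79  sxtw  x25, w27
    0xcb180339  sub   x25, x25, x24

cmpl - Subtract 12-bit constant
Before:
    0x5280003a  mov   w26, #0x1           ; =1
    0xcb1a033b  sub   x27, x25, x26
After:
    0xd100073b  sub   x27, x25, #0x1

cmpl - Subtract shifted 12-bit constant
Before:
    0x52a00218  mov   w24, #0x100000      ; =1048576
    0xcb180379  sub   x25, x27, x24
After:
    0xd1440379  sub   x25, x27, #0x100, lsl #12 ; =0x100000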