-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Add an emitter peephole for post-indexed addressing #105181
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 3 pipeline(s). |
@@ -5844,6 +5844,12 @@ void emitter::emitIns_R_R_I(instruction ins, | |||
return; | |||
} | |||
|
|||
if ((reg1 == reg2) && (EA_SIZE(attr) == EA_PTRSIZE) && emitComp->opts.OptimizationEnabled() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't necessarily have to be EA_PTRSIZE
, right? Rather, we just need the offset to be a multiple of 8
in range (looks to be in the range of -4096
to 4032
, inclusive)?
-- Not something I think that needs to be handled in this PR, but rather that might be a possible future improvement on top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it depends on the instruction. Some are multiples of 4/8/16 and some might be raw offsets. The ranges vary based on instruction too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has to be EA_PTRSIZE
-- the register that is written is the address that was used for the load, so it is always pointer sized.
The offset is an unscaled 9-bit signed immediate for the (single register) loads that support the write-back. So -256 to 255 is supported for the combined form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has to be EA_PTRSIZE -- the register that is written is the address that was used for the load, so it is always pointer sized.
👍
The offset is an unscaled 9-bit signed immediate for the (single register) loads that support the write-back. So -256 to 255 is supported for the combined form.
Yeah, I was looking at the wrong instruction here. There's a few different post-indexing
forms. For ldp
, as an example, it's:
For the 32-bit post-index and 32-bit pre-index variant: is the signed immediate byte offset, a multiple of 4 in the range -256 to 252, encoded in the "imm7" field as /4.
For the 64-bit post-index and 64-bit pre-index variant: is the signed immediate byte offset, a multiple of 8 in the range -512 to 504, encoded in the "imm7" field as /8.
There's then ldapr
which is fixed 4
or 8
, ldiapp
which is fixed 8
or 16
, and ldpsw
which has the ... a multiple of 4 in the range -256 to 252 ...
text
Seems we're not handling these ones in this PR, which is fine, just had a mental disconnect due to the differences between them 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, some of those forms we could definitely handle in the future. One complication is that some of those forms allow redefining the GC ness of up to 3 registers which instrDesc
does not support today, so we would need to expand it in some way.
return false; | ||
} | ||
|
||
if ((emitLastIns->idInsFmt() != IF_LS_2A) || emitLastIns->idIsTlsGD()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any load/store instructions this doesn't cover for the initial work being done?
There's a lot of different instructions that support post-indexing, but not sure if all of them are IF_LS_2A
or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this doesn't support ldp
and stp
. Those instructions can have 7-bit scaled offsets.
There's a lot of different instructions that support post-indexing, but not sure if all of them are IF_LS_2A or not
I don't think there are any other instructions that load or stores that support the post-indexed writeback form, but I could be wrong. ldp
and stp
have support for post-indexing with writeback that we aren't supporting here yet, so that's something we could look into adding in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM.
Would be nice to log an issue for LDP to be covered as well, but I don't think its important to handle in this PR
Good idea, I opened #105192 for that. |
This transforms sequences like
into the equivalent
The second half of this change will be having lowering and strength reduction set up the IR such that this transformation kicks in.
Example codegen:
No pre-indexing support yet.