StoreIND/Store_BLK/Store_OBJ improvements. #38316
Conversation
PTAL @erozenfeld @dotnet/jit-contrib
LGTM with minor suggested changes.
src/coreclr/src/jit/rationalize.cpp (Outdated)
@@ -700,10 +700,14 @@ Compiler::fgWalkResult Rationalizer::RewriteNode(GenTree** useEdge, Compiler::Ge
        case GT_BLK:
            // We should only see GT_BLK for TYP_STRUCT or for InitBlocks.
            assert((node->TypeGet() == TYP_STRUCT) || use.User()->OperIsInitBlkOp());
            // Clear the `GTF_IND_ASG_LHS` flag, which overlaps with `GTF_IND_REQ_ADDR_IN_REG`.
            node->gtFlags &= ~GTF_IND_ASG_LHS;
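For readers less familiar with the flag reuse here, a minimal standalone sketch of why the clear matters when two flags share the same bit; the `DEMO_*` values below are invented for illustration and are not the real `GTF_*` definitions:

```cpp
#include <cstdint>
#include <cstdio>

// Invented flag values for illustration: like the real GenTree flags, two
// flags that are never valid on the same node at the same time can share a bit.
constexpr uint32_t DEMO_IND_ASG_LHS         = 0x0100; // meaningful only before rationalization
constexpr uint32_t DEMO_IND_REQ_ADDR_IN_REG = 0x0100; // same bit, meaningful only afterwards

int main()
{
    uint32_t nodeFlags = DEMO_IND_ASG_LHS; // set while the store was the LHS of an assignment

    // Without this clear, a later phase would read the stale bit as
    // DEMO_IND_REQ_ADDR_IN_REG and needlessly force the address into a register.
    nodeFlags &= ~DEMO_IND_ASG_LHS;

    std::printf("REQ_ADDR_IN_REG seen by later phases: %d\n",
                (nodeFlags & DEMO_IND_REQ_ADDR_IN_REG) != 0);
    return 0;
}
```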
I would recommend moving the `GT_BLK` and `GT_OBJ` cases next to the `GT_IND` cases, so the common characteristics are clearer. You might even consider extracting them all to a common method, e.g. `RewriteIndir(GenTreeIndir* indir)`.
Thank you, fixed. I will need to make more changes there if I want to fix the missed `TGT_ANYWHERE` flag, so it will be useful to have a separate method.
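For context, a rough standalone sketch of the kind of shared helper being discussed; the types, names, and flag below are simplified stand-ins, not the actual `Rationalizer` code:

```cpp
#include <cstdio>

// Simplified stand-ins for illustration only.
enum DemoOper { DEMO_IND, DEMO_STOREIND, DEMO_BLK, DEMO_OBJ };
enum DemoFlags : unsigned { DEMO_IND_ASG_LHS = 0x1 };

struct DemoIndir
{
    DemoOper oper;
    unsigned flags;
};

// One helper that performs the bookkeeping shared by IND/STOREIND and
// BLK/OBJ nodes, instead of repeating it in every switch case.
void RewriteIndirDemo(DemoIndir* indir)
{
    // Clear flags that only had meaning in the front end.
    indir->flags &= ~DEMO_IND_ASG_LHS;

    if (indir->oper == DEMO_BLK || indir->oper == DEMO_OBJ)
    {
        // Block-store-specific fixups would go here.
    }
}

int main()
{
    DemoIndir blk{DEMO_BLK, DEMO_IND_ASG_LHS};
    RewriteIndirDemo(&blk);
    std::printf("flags after rewrite: 0x%x\n", blk.flags);
    return 0;
}
```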
Excellent - thanks!
void LowerStoreIndir(GenTreeIndir* node);
GenTree* LowerAdd(GenTreeOp* node);
bool LowerUnsignedDivOrMod(GenTreeOp* divMod);
GenTree* LowerConstIntDivOrMod(GenTree* node);
GenTree* LowerSignedDivOrMod(GenTree* node);
void LowerBlockStore(GenTreeBlk* blkNode);
void LowerBlockStoreCommon(GenTreeBlk* blkNode);
Unless I am missing something, these could be merged; I don't understand the reason for a separate method.
Do you mean `LowerBlockStore` and `LowerBlockStoreCommon`? The first is defined separately for armarch and xarch.
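A schematic of that split, with invented names; the real methods operate on `GenTreeBlk` nodes and the per-target versions live in the armarch/xarch lowering files:

```cpp
#include <cstdio>

struct DemoBlkStore { unsigned size; };

// Per-target half: in the JIT this is compiled from different source files
// for arm64 and xarch; here a macro stands in for the target selection.
#if defined(DEMO_TARGET_ARM64)
static void LowerBlockStoreTarget(DemoBlkStore* blk) { std::printf("arm64 lowering, size %u\n", blk->size); }
#else
static void LowerBlockStoreTarget(DemoBlkStore* blk) { std::printf("xarch lowering, size %u\n", blk->size); }
#endif

// Target-independent half: runs shared transformations (e.g. trying to turn
// the block store into a plain indirect store) before the per-target code.
static void LowerBlockStoreCommonDemo(DemoBlkStore* blk)
{
    LowerBlockStoreTarget(blk);
}

int main()
{
    DemoBlkStore blk{16};
    LowerBlockStoreCommonDemo(&blk);
    return 0;
}
```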
Oh I see - I missed that; thanks!
src/coreclr/src/jit/lower.cpp (Outdated)
}
if (varTypeIsSIMD(regType))
{
    // TODO: support STORE_IND SIMD16(SIMD16, CNT_INT 0).
This and the `TODO` below should probably be `TODO-CQ`.
Done
Co-authored-by: Carol Eidt <[email protected]>
LGTM - thanks!
LGTM.
I left a couple of minor comments.
This is a two-part change. As usual, I recommend reviewing by commits, because there are several extraction/refactoring changes.

First part:

- ceec9c8: Arm64: Support contained `GT_LCL_VAR_ADDR, GT_LCL_FLD_ADDR` under `IND, STOREIND`.
- eefeb7e: XARCH: Support contained `GT_LCL_FLD_ADDR` under `GT_STORE_IND, GT_IND`.
- ef2709d: Clear `GTF_IND_ASG_LHS` after Rationalize for STORE_OBJ/BLK. We were doing this for `STOREIND`, but forgetting for `STORE_OBJ/BLK`.
- eecb41a: Extract `LowerStoreIndirCommon`, `LowerIndir`. Extracted without changes; I will need to add additional calls to them later.
- 6f8575e: Call the extracted functions. Fix a few regressions with `NoRetyping`.

Second part:

- d8821cc: Create `LEA` of complex addr expr. Gives a few improvements when the addr has an index.
- f642505: Extract `LowerBlockStoreCommon`.
- 0624264: Transform STORE_BLK/OBJ into STOREIND, when it is possible and profitable (see the sketch after this list).
- 428a86b: Don't transform for GC and small types. For GC types we were not trying to contain the addr (why?) when a barrier was needed, and it looked dangerous to me to make such a change (for 5.0). @erozenfeld helped me to pass the tests with the GC-types transformation, but I still was not confident enough and saw strange asm diffs, like a byref copy replaced by a ref copy in some cases. For small types we were generating `movzx rax` for `IND byte` instead of `mov ax`.
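As a rough illustration of the "possible and profitable" check mentioned for 0624264 above: the sketch below is a toy, self-contained model, not the real lowering code; the enum, helper names, and exact rules are invented, and the actual JIT works on `GenTree` nodes and class layouts.

```cpp
#include <cstdio>

// Toy model: decide whether a fixed-size block copy could become a single
// register-sized indirect store. All names here are invented for illustration.
enum class DemoType { None, Byte, Short, Int, Long, Simd16 };

static DemoType RegTypeForSize(unsigned size)
{
    switch (size)
    {
        case 1:  return DemoType::Byte;
        case 2:  return DemoType::Short;
        case 4:  return DemoType::Int;
        case 8:  return DemoType::Long;
        case 16: return DemoType::Simd16;
        default: return DemoType::None; // not exactly one register-sized chunk
    }
}

// Mirrors the restrictions described in the PR: no transform when the block
// contains GC pointers (a write barrier is needed), and, for now, none for
// small types either, since the small-typed IND was widened with movzx
// instead of using a narrow mov.
static bool CanTransformBlockStoreToStoreInd(unsigned size, bool hasGCPtrs)
{
    DemoType regType = RegTypeForSize(size);
    if (regType == DemoType::None)
        return false;
    if (hasGCPtrs)
        return false;
    if (regType == DemoType::Byte || regType == DemoType::Short)
        return false; // small types excluded in this change
    return true;
}

int main()
{
    std::printf("%d\n", CanTransformBlockStoreToStoreInd(8, false));  // 1: plain 8-byte copy
    std::printf("%d\n", CanTransformBlockStoreToStoreInd(8, true));   // 0: GC pointer needs a barrier
    std::printf("%d\n", CanTransformBlockStoreToStoreInd(2, false));  // 0: small type excluded
    std::printf("%d\n", CanTransformBlockStoreToStoreInd(12, false)); // 0: not a register size
    return 0;
}
```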
Diffs (sorry, markdown doesn't understand merged word cells, so I have pasted it as an image).
I am planning to open two follow-up issues if this goes in:

1. We mark `CNT_INT int 0` as contained, but have a `0` available for reuse already (containment happens in lowering, reuse happens later in LSRA). A simple fix is to check the reuse possibility even for contained operands; it will fix `xor rax, rax; mov byte ptr [esi], 0;`, but not `mov byte ptr [esi], 0; xor rax, rax;`.
2. `STOREIND` should always generate code not worse than `STOREOBJ` for the same copy block; right now that doesn't hold for GC refs and small types.

I need these changes for `NoRetyping`, because there we have more structs, and for `STORE_LCL_VAR(src)` when `src` is not a `LCL_VAR` we generate `STORE_OBJ`, which caused many regressions.