Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StoreIND/Store_BLK/Store_OBJ improvements. #38316

Merged
merged 13 commits into from
Jun 28, 2020

Conversation

sandreenko
Copy link
Contributor

@sandreenko sandreenko commented Jun 24, 2020

This is a two parts change:

As usual, I recommend reviewing by commits, because there are several extracting/refactoring changes.

First part:

ceec9c8: Arm64: Support contained GT_LCL_VAR_ADDR, GT_LCL_FLD_ADDR under IND, STOREIND.

eefeb7e: XARCH: Support contained GT_LCL_FLD_ADDR under GT_STORE_IND, GT_IND.

ef2709d: Clear GTF_IND_ASG_LHS after Rationalize for STORE_OBJ/BLK.
We were doing this for STOREIND, but forgetting for STORE_OBJ/BLK.

eecb41a: Extract LowerStoreIndirCommon, LowerIndir.
Extract without changes, I will need to add additional calls to them later.

6f8575e: Call the extracted funtions.
Fix a few regressions with NoRetyping.

Second part:

d8821cc: Create LEA of comples addr expr.
Gives a few improvements when addr has an index.

f642505: Extract LowerBlockStoreCommon.

0624264: Tranform STORE_BLK/OBJ into STOREIND.
When it is possible and profitable.

428a86b: Don't tranform for GC and small types.
For GC types we were not trying to contain addr(why?) when needed a barrier and it looked dangerous for me to do such a change(for 5.0). @erozenfeld helped me to pass tests with GC types transformation, but I still was not confident enough and saw strange asm diffs like byref copy replaced by ref copy in some cases.

For small types we were generating movzx rax for IND byte instead of mov ax .

Diffs (sorry markdown doesn't understand merged word cells, so I have pasted it as an image)

image

I am planning to open two follow up issues if this goes in:

  1. combine reuse reg and containment analysis, there are many methods that decide to mark CNT_INT int 0 as contained, but have a 0 available for reuse already (contained happens in lowering, reusing happens after in LSRA, a simple fix is to check reuse possibility even for contained, will fix xor rax, rax; mov byte ptr [esi], 0;, but not mov byte ptr [esi], 0; xor rax, rax;
  2. STOREIND should always generate code not worse than STOREOBJ for the same copy block, right now it doesn't hold for GCRefs and small types.

I need these changes for NoRetyping, because there we have move structs and for STORE_LCL_VAR(src) when src is not
a LCL_VAR we are generating STORE_OBJ and it caused many regressions.

@sandreenko sandreenko added arch-arm32 arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Jun 24, 2020
@sandreenko sandreenko force-pushed the storeindImprovements branch from 3370f8f to 383144f Compare June 24, 2020 07:52
@sandreenko sandreenko force-pushed the storeindImprovements branch 2 times, most recently from 2325c40 to 29b2de5 Compare June 25, 2020 02:17
Sergey Andreenko added 9 commits June 25, 2020 13:35
Support contained `GT_LCL_VAR_ADDR, GT_LCL_FLD_ADDR` under `IND, STOREIND`.
Support contained `GT_LCL_FLD_ADDR` under `GT_STORE_IND, GT_IND`.
We were doing this for `STOREIND`, but forgetting for `STORE_OBJ/BLK`.
Extract without changes, I will need to add additional calls to them later.
Fix a few regressions with `NoRetyping`.
Gives a few improvements when addr has an index.
When it is possible and profitable.
For GC types we were not trying to contain addr(why?) when needed a barrier and it looked dangerous for me to do such a change(for 5.0).

For small types we were generating `movzx rax` for `IND byte` instead of `mov ax` .
@sandreenko sandreenko force-pushed the storeindImprovements branch from 29b2de5 to 428a86b Compare June 25, 2020 21:35
@sandreenko
Copy link
Contributor Author

PTAL @erozenfeld @dotnet/jit-contrib

@sandreenko sandreenko marked this pull request as ready for review June 25, 2020 21:43
Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor suggested changes.

@@ -700,10 +700,14 @@ Compiler::fgWalkResult Rationalizer::RewriteNode(GenTree** useEdge, Compiler::Ge
case GT_BLK:
// We should only see GT_BLK for TYP_STRUCT or for InitBlocks.
assert((node->TypeGet() == TYP_STRUCT) || use.User()->OperIsInitBlkOp());
// Clear the `GTF_IND_ASG_LHS` flag, which overlaps with `GTF_IND_REQ_ADDR_IN_REG`.
node->gtFlags &= ~GTF_IND_ASG_LHS;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend moving the GT_BLK and GT_OBJ cases next to the GT_IND cases, so the common characteristics are clearer. You might even consider extracting them all to a common method, e.g. RewriteIndir(GenTreeIndir* indir).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, fixed. I will need to do more changes there if I want to fix missed TGT_ANYWHERE flag, so it will be useful to have a separate method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent - thanks!

src/coreclr/src/jit/lower.cpp Outdated Show resolved Hide resolved
void LowerStoreIndir(GenTreeIndir* node);
GenTree* LowerAdd(GenTreeOp* node);
bool LowerUnsignedDivOrMod(GenTreeOp* divMod);
GenTree* LowerConstIntDivOrMod(GenTree* node);
GenTree* LowerSignedDivOrMod(GenTree* node);
void LowerBlockStore(GenTreeBlk* blkNode);
void LowerBlockStoreCommon(GenTreeBlk* blkNode);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I am missing something, these could be merged; I don't understand the reason for a separate method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean LowerBlockStore and LowerBlockStoreCommon? The first is defined separately for armarch and xarch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see - I missed that; thanks!

}
if (varTypeIsSIMD(regType))
{
// TODO: support STORE_IND SIMD16(SIMD16, CNT_INT 0).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the TODO below should probably be TODO-CQ

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks!

Copy link
Member

@erozenfeld erozenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I left a couple of minor comments.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm64 arch-x64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants