Never use heap for return buffers #112060

Merged
merged 20 commits into from
Feb 11, 2025

EgorBo
Member

@EgorBo EgorBo commented Feb 1, 2025

CI experiment for #111127

MyStruct Foo(string name, int age)
{
    return new MyStruct(name, age);
}

record struct MyStruct(string Name, int Age);

Was:

; Method Prog:Foo(System.String,int):MyStruct:this (FullOpts)
       push     rsi
       push     rbx
       mov      rbx, rdx
       mov      rdx, r8
       mov      esi, r9d
       mov      rcx, rbx
       call     CORINFO_HELP_CHECKED_ASSIGN_REF
       mov      dword ptr [rbx+0x08], esi
       mov      rax, rbx
       pop      rbx
       pop      rsi
       ret      
; Total bytes of code: 28

Now:

; Method Prog:Foo(System.String,int):MyStruct:this (FullOpts)
       mov      gword ptr [rdx], r8
       mov      dword ptr [rdx+0x08], r9d
       mov      rax, rdx
       ret      
; Total bytes of code: 11

The write barrier is now emitted at the call site when needed (presumably, this happens rarely).
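
The trade-off described above can be modeled outside the JIT. The following C++ sketch is only an illustration under stated assumptions: `MyStruct`, `Foo`, `CopyWithBarrier`, the two caller functions, and the barrier counter are hypothetical stand-ins, not runtime or JIT APIs. It shows a callee writing through its return buffer with plain stores, while a caller whose destination might be on the GC heap calls into a stack temporary and then copies the result with a barrier.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the C# record struct in the example above.
struct MyStruct
{
    const std::string* Name; // models a GC reference field
    int                Age;
};

// Counts simulated write barriers; in the runtime the barrier is a JIT helper call.
static int g_barrierCount = 0;

// Callee: the return buffer is assumed to never point into the heap,
// so both fields are written with plain stores and no barrier.
void Foo(MyStruct* retBuf, const std::string* name, int age)
{
    assert(retBuf != nullptr);
    retBuf->Name = name;
    retBuf->Age  = age;
}

// Models a barriered struct copy into a destination that may be on the heap.
void CopyWithBarrier(MyStruct* heapDest, const MyStruct& src)
{
    g_barrierCount++; // one simulated barrier for the copied reference field
    *heapDest = src;
}

// Caller whose destination may live on the heap: call into a stack
// temporary first, then copy the result with a barrier.
void CallerStoringToHeap(MyStruct* heapDest, const std::string* name, int age)
{
    MyStruct tmp{};
    Foo(&tmp, name, age);
    CopyWithBarrier(heapDest, tmp);
}

// Caller whose destination is a stack local: pass its address directly.
MyStruct CallerStoringToLocal(const std::string* name, int age)
{
    MyStruct local{};
    Foo(&local, name, age);
    return local;
}
```

In this model only `CallerStoringToHeap` pays for a barrier, which mirrors the idea that the cost moves from every callee to the (presumably rare) call sites that store into the heap.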

Updated stats for write barriers after #112227 was merged (it is supposed to help reduce the number of bulk barriers):

aspnet-win-x64 SPMI collection:

CORINFO_HELP_ASSIGN_REF:          -0
CORINFO_HELP_ASSIGN_BYREF:      -123
CORINFO_HELP_CHECKED_ASSIGN_REF: -64
CORINFO_HELP_BULK_WRITEBARRIER:  -31

Looks like the aspnet collection has too many missed contexts currently (so the actual numbers are likely 5-10% higher)

MihuBot (PMI for BCL):

CORINFO_HELP_ASSIGN_REF:           -0
CORINFO_HELP_ASSIGN_BYREF:       -342
CORINFO_HELP_CHECKED_ASSIGN_REF: -838
CORINFO_HELP_BULK_WRITEBARRIER:  -300

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 1, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@EgorBo
Member Author

EgorBo commented Feb 3, 2025

/azp run Fuzzlyn

Azure Pipelines successfully started running 1 pipeline(s).

@EgorBo
Member Author

EgorBo commented Feb 4, 2025

/azp run runtime-coreclr jitstress, runtime-coreclr gcstress0x3-gcstress0xc, runtime-coreclr gcstress-extra, runtime-coreclr libraries-jitstress, runtime-coreclr libraries-pgo, Fuzzlyn, runtime-coreclr pgostress, runtime-coreclr outerloop

Azure Pipelines successfully started running 8 pipeline(s).

@jakobbotsch
Member

Can you also update

GenTree* destAddr = comp->gtNewLclVarAddrNode(tmpNum, TYP_BYREF);
NewCallArg newArg = NewCallArg::Primitive(destAddr).WellKnown(WellKnownArg::RetBuffer);
call->gtArgs.InsertAfterThisOrFirst(comp, newArg);

to TYP_I_IMPL?

Also, if you want to, you can fix up most of the pointer -> byref changes made in #72720

However, I'm also ok with leaving that as is and I can clean it up some other time. But can you please change this part of the docs:

directly pass along its own return buffer parameter to DispatchTailCalls. It is
possible that this return buffer is pointing into GC heap, so the result is
always tracked as a byref in the mechanism.

@EgorBo
Member Author

EgorBo commented Feb 7, 2025

Final diffs (slightly better when I added the "a call is returning a byref struct -> no copy needed" case).

I tried to benchmark the PR against Fortunes and FortunesEF but unfortunately, RPS is just too flaky to be able to detect changes (+/-5%).

If there are no other concerns or feedback, can it be approved then? @jkotas @jakobbotsch

@@ -10793,6 +10793,7 @@ class Compiler
STRESS_MODE(POISON_IMPLICIT_BYREFS) \
STRESS_MODE(STORE_BLOCK_UNROLLING) \
STRESS_MODE(THREE_OPT_LAYOUT) \
STRESS_MODE(NONHEAP_RET_BUFFER) \
Member

@jkotas jkotas Feb 7, 2025

Do we run any of these JIT stress modes with naot? If yes, we may need the helper implemented for naot too.

Also, I am not sure about the durable value of this stress mode and helper. I understand that the helper was useful when implementing the change. Do you think that there is a high enough probability that we will be passing heap pointers for return buffers by mistake without noticing it in other ways?

Member Author

Do we run any of these JIT stress modes with naot? If yes, we may need the helper implemented for naot too.

Can't find any evidence that we run jitstress for NAOT even in outerloop and we definitely have no GCStress for it (#107850)

Also, I am not sure about the durable value of this stress mode and helper. I understand that the helper was useful when implementing the change. Do you think that there is a high enough probability that we will be passing heap pointers for return buffers by mistake without noticing it in other ways?

I had it locally for R2R too. It seems my test apps fail badly if I remove the importer code that makes local copies instead of passing a heap pointer, even without any explicit stress mode (and the helper), so I presume I can delete it.

Member

@jkotas jkotas left a comment

Thanks!

@EgorBo
Member Author

EgorBo commented Feb 8, 2025

@jakobbotsch @dotnet/jit-contrib does the JIT side look good (besides leaving a few things to follow-up PRs as improvements on top of it)?

Comment on lines 12687 to 12690
if (op->OperIsScalarLocal() && (op->AsLclVarCommon()->GetLclNum() == impInlineRoot()->info.compRetBuffArg))
{
    return true;
}
Member

I don't think it's ok for this to return true without assigning lclVarTreeOut.
I think it would be better to add a new function that checks for the property we want, e.g. PointsOutsideHeap or similar. GenTreeIndir::IsAddressNotOnHeap could probably be switched to use it as well.

Member Author

@jakobbotsch ah good idea, addressed

Comment on lines 959 to 969
if (op->OperIs(GT_ADD))
{
    // If we have (base + offset), inspect the base. We assume someone else normalized the tree
    // so the constant offset is always on the right.
    GenTree* op2 = op->gtGetOp2();
    if (op2->TypeIs(TYP_I_IMPL) && op2->IsCnsIntOrI() && !op2->IsIconHandle() &&
        !fgIsBigOffset(op2->AsIntCon()->IconValue()))
    {
        return fgAddrCouldBeHeap(op->gtGetOp1());
    }
}
Member

Could use gtPeelOffsets here.

Member

Probably best to do it up front, before the check for op->OperIs(GT_LCL_ADDR); it might also catch some cases on the retbuffer.

Member Author

I've already checked that it doesn't find anything new, but I guess it wouldn't hurt

Member Author

addressed
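
The suggestion in this thread, peeling constant offsets first and then classifying the base address, can be sketched on a toy tree. This is a minimal illustration under stated assumptions: `Node`, `Oper`, `PeelOffsets`, `AddrCouldBeHeap`, and the `kMaxSmallOffset` threshold are all hypothetical and much simpler than the JIT's `GenTree`, `gtPeelOffsets`, `fgIsBigOffset`, and `fgAddrCouldBeHeap`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified address tree; not the JIT's GenTree.
enum class Oper
{
    LocalAddr, // address of a stack local
    RetBufArg, // the method's return buffer argument
    Add,       // base + offset
    ConstInt,  // integer constant
    Other      // anything else, e.g. an arbitrary byref
};

struct Node
{
    Oper         oper;
    const Node*  op1;   // base, for Add
    const Node*  op2;   // offset, for Add
    std::int64_t value; // payload, for ConstInt
};

// Assumed threshold standing in for a "big offset" check.
constexpr std::int64_t kMaxSmallOffset = 4096;

// Strips (base + smallConst) layers and returns the innermost base.
const Node* PeelOffsets(const Node* addr)
{
    while (addr->oper == Oper::Add && addr->op2 != nullptr && addr->op2->oper == Oper::ConstInt &&
           addr->op2->value >= 0 && addr->op2->value < kMaxSmallOffset)
    {
        addr = addr->op1;
    }
    return addr;
}

// Returns false only when the address is known to be outside the heap.
bool AddrCouldBeHeap(const Node* addr)
{
    const Node* base = PeelOffsets(addr);
    if (base->oper == Oper::LocalAddr)
    {
        return false; // stack local
    }
    if (base->oper == Oper::RetBufArg)
    {
        return false; // return buffers are assumed to never be on the heap
    }
    return true; // conservative answer for everything else
}
```

Peeling before the classification means that both a bare local address and a `(retbuf + 8) + 8` chain reach the same checks, which is the point of doing it up front.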

Comment on lines 854 to 859
GenTree* spilledCall = gtNewStoreLclVarNode(tmp, srcCall);
GenTree* comma = gtNewOperNode(GT_COMMA, store->TypeGet(), spilledCall,
                               gtNewLclvNode(tmp, lvaGetDesc(tmp)->TypeGet()));
store->Data() = comma;
comma->AsOp()->gtOp1 = impStoreStruct(spilledCall, curLevel, pAfterStmt, di, block);
return impStoreStruct(store, curLevel, pAfterStmt, di, block);
Member

We still have the problem here that this reorders the LHS of the store with the RHS. I think if the LHS has side effects/ordering effects we need to introduce a local and another comma for it to evaluate it before the call.

Member Author

@jakobbotsch can you elaborate? we spill the destination to a local (even before my change)

Member

Where is the destination spilled to a local? I think if you call impStoreStruct with a store like STORE_BLK(Foo(), Bar()), then the code here will reorder Foo() so that it happens after Bar().

Member Author

@jakobbotsch I think I've addressed it in 4a48e64. Presumably, GT_RET_EXPR doesn't need special treatment, as we don't spill the call there by hand

Member

For inlining we already reorder things because of #112053, so regardless it's probably fine. Once that is fixed we can look into if anything is necessary here to keep the LHS before the call as well.
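
The ordering concern raised in this thread, a store shaped like STORE_BLK(Foo(), Bar()) where spilling only the call makes Bar() run before Foo(), can be shown with a small model. This is a sketch of the concern only, not of the importer code: `Foo`, `Bar`, `g_log`, `g_slot`, and both store functions are hypothetical, and plain `int` stands in for the struct being stored.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which simulated side effects run.
static std::vector<std::string> g_log;

// Storage that the destination address points at.
static int g_slot = 0;

// Hypothetical destination-address expression with a side effect (the LHS).
int* Foo()
{
    g_log.push_back("Foo");
    return &g_slot;
}

// Hypothetical call producing the stored value (the RHS).
int Bar()
{
    g_log.push_back("Bar");
    return 42;
}

// Spilling only the call to a temp: the call now runs before the LHS.
void StoreSpillingCallOnly()
{
    int tmp = Bar();
    *Foo()  = tmp;
}

// Spilling the destination address first keeps the LHS before the call.
void StoreSpillingAddressFirst()
{
    int* addr = Foo();
    int  tmp  = Bar();
    *addr     = tmp;
}
```

Both functions store the same value; they differ only in the order of the recorded side effects, which is why the fix discussed is an extra local for the address rather than a change to the store itself.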

((store->AsIndir()->Addr()->gtFlags & GTF_ALL_EFFECT) != 0))
{
unsigned lclNum = lvaGrabTemp(true DEBUGARG("fgMakeTemp is creating a new local variable"));
impStoreToTemp(lclNum, store->AsIndir()->Addr(), curLevel, pAfterStmt, di, block);
Member

I don't think we can use impStoreStruct here since this function is called from outside import via gtNewTempStore. I think it needs to create a comma, or only call this function in some cases (see the checks below for GT_COMMA)

Member Author

um.. what impStoreStruct are you referring to here? Did you mean impStoreToTemp? Also, it seems like this function already appends stuff to statements, so it's weird to expect it to not?

Member Author

I guess it's just an implicit contract that when gtNewTempStore calls it - it does not?

Member

Yes, I meant impStoreToTemp

seems like this function already appends stuff to statements so it's weird to expect it to not?

Where does it do that? I think we only do that for the GT_COMMA case, and it has guards to ensure that only happens during import

Member

We spoke offline and Egor convinced me that actually no reordering is happening here, so we don't need to do any spilling here. Sorry about that.

Member

@jakobbotsch jakobbotsch left a comment

LGTM

@EgorBo EgorBo merged commit b1ab309 into dotnet:main Feb 11, 2025
110 of 112 checks passed
@EgorBo EgorBo deleted the non-heap-retbuf branch February 11, 2025 23:32
grendello added a commit to grendello/runtime that referenced this pull request Feb 12, 2025
* main:
  [Android] Run CoreCLR functional tests on Android (dotnet#112283)
  [LoongArch64] Fix some assertion failures for Debug ILC building Debug NativeAOT testcases. (dotnet#112229)
  Fix suspicious code fragments (dotnet#112384)
  `__ComObject` doesn't support dynamic interface map (dotnet#112375)
  Native DLLs: only load imported DLLs from System32 (dotnet#112359)
  [main] Update dependencies from dotnet/roslyn (dotnet#112314)
  Update SVE instructions that writes to GC regs (dotnet#112389)
  Bring up android+coreclr windows build.  (dotnet#112256)
  Never use heap for return buffers (dotnet#112060)
  Wait to complete the test before releasing the agile reference. (dotnet#112387)
  Prevent returning disposed HTTP/1.1 connections to the pool (dotnet#112383)
  Fingerprint dotnet.js if writing import map to html is enabled (dotnet#112407)
  Remove duplicate definition of CORECLR_HOSTING_API_LINKAGE (dotnet#112096)
  Update the exception message to reflect current behavior. (dotnet#112355)
  Use enum for frametype not v table (dotnet#112166)
  Enable AltJits build for LoongArch64 and RiscV64 (dotnet#110282)
  Guard members of MonoType union & fix related bugs (dotnet#111645)
  Add optional hooks for debugging OpenSSL memory allocations (dotnet#111539)
  JIT: Optimize struct parameter register accesses in the backend (dotnet#110819)
  NativeAOT: Cover more opcodes in type preinitializer (dotnet#112073)