Never use heap for return buffers #112060
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
a72ec93 to e00043b
34edd40 to f9506f7
f9506f7 to d379d38
/azp run Fuzzlyn
Azure Pipelines successfully started running 1 pipeline(s).
/azp run runtime-coreclr jitstress, runtime-coreclr gcstress0x3-gcstress0xc, runtime-coreclr gcstress-extra, runtime-coreclr libraries-jitstress, runtime-coreclr libraries-pgo, Fuzzlyn, runtime-coreclr pgostress, runtime-coreclr outerloop
Azure Pipelines successfully started running 8 pipeline(s).
Can you also update runtime/src/coreclr/jit/rationalize.cpp Lines 160 to 163 in cee8434 to TYP_I_IMPL?
Also, if you want to, you can fix up most of the pointer -> byref changes made in … However, I'm also OK with leaving that as is; I can clean it up some other time. But can you please change this part of the docs: runtime/docs/design/features/tailcalls-with-helpers.md Lines 262 to 264 in cee8434
Final diffs (slightly better after I added "a call is returning a byref struct -> no copy needed"). I tried to benchmark the PR against Fortunes and FortunesEF, but unfortunately RPS is just too noisy (+/-5%) to detect changes. If there are no other concerns or feedback, can it be approved? @jkotas @jakobbotsch
src/coreclr/jit/compiler.h (outdated)
```diff
@@ -10793,6 +10793,7 @@ class Compiler
    STRESS_MODE(POISON_IMPLICIT_BYREFS) \
    STRESS_MODE(STORE_BLOCK_UNROLLING) \
    STRESS_MODE(THREE_OPT_LAYOUT) \
    STRESS_MODE(NONHEAP_RET_BUFFER) \
```
Do we run any of these JIT stress modes with naot? If yes, we may need the helper implemented for naot too.
Also, I am not sure about the durable value of this stress mode and helper. I understand that the helper was useful when implementing the change. Do you think there is a high enough probability that we will pass heap pointers for return buffers by mistake without noticing it in other ways?
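As a rough illustration of the kind of check a helper behind a stress mode like NONHEAP_RET_BUFFER could perform, here is a minimal C++ sketch. It assumes a simulated GC heap (a single fixed arena) and hypothetical names (SimHeap, RetBufIsAllowed); the actual helper is not shown in this thread, and the author later proposes deleting it.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for the GC heap: one fixed-size arena. The real heap
// is not a single array; this only models "is this address on the heap?".
struct SimHeap {
    alignas(16) unsigned char bytes[4096];

    // True if p points into the arena. std::less gives a total order over
    // pointers, so comparing unrelated pointers is well-defined here.
    bool Contains(const void* p) const {
        std::less<const void*> lt;
        const void* lo = bytes;
        const void* hi = bytes + sizeof(bytes);
        return !lt(p, lo) && lt(p, hi);
    }
};

// Hypothetical check a stress-mode helper could run before each call that
// takes a return buffer: the buffer must not live on the (simulated) heap.
bool RetBufIsAllowed(const SimHeap& heap, const void* retBuf) {
    return !heap.Contains(retBuf);
}
```

A stack address passes the check, while any address inside the arena fails it.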
Do we run any of these JIT stress modes with naot? If yes, we may need the helper implemented for naot too.
I can't find any evidence that we run JitStress for NAOT, even in outerloop, and we definitely have no GCStress for it (#107850).
Also, I am not sure about the durable value of this stress mode and helper. I understand that the helper was useful when implementing the change. Do you think there is a high enough probability that we will pass heap pointers for return buffers by mistake without noticing it in other ways?
I had it locally for R2R too. My test apps fail badly if I remove the importer code that makes local copies instead of passing a heap pointer, even without any explicit stress mode (or the helper), so I presume I can delete it.
7b17d37 to 0764818
Thanks!
@jakobbotsch @dotnet/jit-contrib does the JIT side look good (aside from a few improvements on top of it that are left to follow-up PRs)?
src/coreclr/jit/importer.cpp (outdated)
```cpp
if (op->OperIsScalarLocal() && (op->AsLclVarCommon()->GetLclNum() == impInlineRoot()->info.compRetBuffArg))
{
    return true;
}
```
I don't think it's OK for this to return true without assigning lclVarTreeOut. I think it would be better to add a new function that checks for the property we want, e.g. PointsOutsideHeap or similar. GenTreeIndir::IsAddressNotOnHeap could probably be switched to use it as well.
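A minimal sketch of the separation suggested above: a predicate that only answers whether an address points outside the GC heap, with no out-parameter for callers to rely on. The Node type, the Oper values, and this PointsOutsideHeap signature are hypothetical simplifications, not the JIT's GenTree API.

```cpp
#include <cassert>

// Hypothetical, heavily simplified IR node; the real JIT uses GenTree.
enum class Oper { LclAddr, LclVar, Indir };

struct Node {
    Oper oper;
    unsigned lclNum = 0;          // used by LclAddr / LclVar
    const Node* op1 = nullptr;    // used by Indir
};

// Hypothetical predicate in the spirit of the suggested PointsOutsideHeap:
// it answers a single yes/no question and has no out-parameter that a
// caller could mistakenly assume was filled in.
bool PointsOutsideHeap(const Node* addr, unsigned retBufLclNum) {
    // The address of a local is in the stack frame, not on the GC heap.
    if (addr->oper == Oper::LclAddr) {
        return true;
    }
    // With this change, the incoming return-buffer argument is assumed never
    // to point into the GC heap, so reading that local yields a non-heap address.
    if (addr->oper == Oper::LclVar && addr->lclNum == retBufLclNum) {
        return true;
    }
    // Anything else is conservatively treated as possibly on the heap.
    return false;
}
```

Callers that also need the local node can find it separately; the predicate itself stays side-effect free.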
@jakobbotsch ah good idea, addressed
src/coreclr/jit/flowgraph.cpp (outdated)
```cpp
if (op->OperIs(GT_ADD))
{
    // If we have (base + offset), inspect the base. We assume someone else normalized the tree
    // so the constant offset is always on the right.
    GenTree* op2 = op->gtGetOp2();
    if (op2->TypeIs(TYP_I_IMPL) && op2->IsCnsIntOrI() && !op2->IsIconHandle() &&
        !fgIsBigOffset(op2->AsIntCon()->IconValue()))
    {
        return fgAddrCouldBeHeap(op->gtGetOp1());
    }
}
```
Could use gtPeelOffsets here.
Probably best to do it up front, before the check for op->OperIs(GT_LCL_ADDR); it might also catch some cases on the ret buffer.
I've already checked that it doesn't find anything new, but I guess it wouldn't hurt.
addressed
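The suggested ordering (peel constant offsets first, then classify the base) can be sketched on a toy IR. PeelOffsets and AddrCouldBeHeap below are hypothetical stand-ins for gtPeelOffsets and fgAddrCouldBeHeap, and bigOffsetLimit is an assumed threshold playing the role of the fgIsBigOffset guard in the quoted code.

```cpp
#include <cassert>

// Hypothetical, simplified IR node (not the JIT's GenTree).
enum class Oper { LclAddr, LclVar, Add, CnsInt };

struct Node {
    Oper oper;
    long long value = 0;          // used by CnsInt
    const Node* op1 = nullptr;    // base operand of Add
    const Node* op2 = nullptr;    // offset operand of Add
};

// Hypothetical stand-in for gtPeelOffsets: strip chains of (base + constant)
// and accumulate the constant part. Like the quoted code, it assumes the
// tree was normalized so constants sit on the right.
const Node* PeelOffsets(const Node* addr, long long* offset) {
    *offset = 0;
    while (addr->oper == Oper::Add && addr->op2->oper == Oper::CnsInt) {
        *offset += addr->op2->value;
        addr = addr->op1;
    }
    return addr;
}

// Peel offsets up front, then classify the base. Negative or large offsets
// are conservatively treated as "could be heap"; LclAddr is the only
// non-heap base this sketch recognizes.
bool AddrCouldBeHeap(const Node* addr, long long bigOffsetLimit) {
    long long offset = 0;
    const Node* base = PeelOffsets(addr, &offset);
    if (offset < 0 || offset >= bigOffsetLimit) {
        return true;
    }
    return base->oper != Oper::LclAddr;
}
```

Peeling first means a check on the base (such as the GT_LCL_ADDR test) also sees addresses like LCL_ADDR + 8.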
src/coreclr/jit/importer.cpp (outdated)
```cpp
GenTree* spilledCall = gtNewStoreLclVarNode(tmp, srcCall);
GenTree* comma = gtNewOperNode(GT_COMMA, store->TypeGet(), spilledCall,
                               gtNewLclvNode(tmp, lvaGetDesc(tmp)->TypeGet()));
store->Data() = comma;
comma->AsOp()->gtOp1 = impStoreStruct(spilledCall, curLevel, pAfterStmt, di, block);
return impStoreStruct(store, curLevel, pAfterStmt, di, block);
```
We still have the problem here that this reorders the LHS of the store with the RHS. I think if the LHS has side effects/ordering effects we need to introduce a local and another comma for it to evaluate it before the call.
@jakobbotsch can you elaborate? We spill the destination to a local (even before my change).
Where is the destination spilled to a local? I think if you call impStoreStruct with a store like STORE_BLK(Foo(), Bar()), then the code here will reorder Foo() so that it happens after Bar().
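To make the ordering concern concrete, here is a small C++ sketch contrasting a rewrite that spills the call result first with one that evaluates the destination address into its own temp before the call. Foo and Bar are hypothetical functions mirroring the STORE_BLK(Foo(), Bar()) example, and the log stands in for observable side effects; this models the concern, not the importer's actual code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which side effects run.
std::vector<std::string> g_log;

struct Big { long long a, b, c, d; };
Big g_slot;

// Stands in for the destination address expression Foo() in
// STORE_BLK(Foo(), Bar()); it has an observable side effect.
Big* Foo() { g_log.push_back("Foo"); return &g_slot; }

// Stands in for the struct-returning call Bar().
Big Bar() { g_log.push_back("Bar"); return Big{1, 2, 3, 4}; }

// Problematic shape: spill the call to a temp first, then evaluate the
// destination. Bar() now runs before Foo(), reversing the source order.
void SpillCallFirst() {
    Big tmp = Bar();
    *Foo() = tmp;
}

// Order-preserving shape: evaluate the destination address into a temp
// before the call, then store the call result through it.
void SpillAddressFirst() {
    Big* addr = Foo();
    Big tmp = Bar();
    *addr = tmp;
}
```

Only the second shape keeps Foo()'s side effects ahead of Bar()'s.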
@jakobbotsch I think I've addressed it in 4a48e64. Presumably, GT_RET_EXPR doesn't need special treatment, as we don't spill those calls by hand there.
For inlining we already reorder things because of #112053, so regardless it's probably fine. Once that is fixed we can look into if anything is necessary here to keep the LHS before the call as well.
src/coreclr/jit/importer.cpp (outdated)
```cpp
    ((store->AsIndir()->Addr()->gtFlags & GTF_ALL_EFFECT) != 0))
{
    unsigned lclNum = lvaGrabTemp(true DEBUGARG("fgMakeTemp is creating a new local variable"));
    impStoreToTemp(lclNum, store->AsIndir()->Addr(), curLevel, pAfterStmt, di, block);
```
I don't think we can use impStoreStruct here, since this function is called from outside import via gtNewTempStore. I think it needs to create a comma, or only call this function in some cases (see the checks below for GT_COMMA).
Um, which impStoreStruct are you referring to here? Did you mean impStoreToTemp? Also, this function already seems to append stuff to statements, so isn't it weird to expect it not to?
I guess it's just an implicit contract that when gtNewTempStore calls it, it does not?
Yes, I meant impStoreToTemp.
seems like this function already appends stuff to statements so it's weird expect it to not?
Where does it do that? I think we only do that for the GT_COMMA case, and it has guards to ensure that only happens during import.
We spoke offline and Egor convinced me that actually no reordering is happening here, so we don't need to do any spilling here. Sorry about that.
LGTM
* main: [Android] Run CoreCLR functional tests on Android (dotnet#112283) [LoongArch64] Fix some assertion failures for Debug ILC building Debug NativeAOT testcases. (dotnet#112229) Fix suspicious code fragments (dotnet#112384) `__ComObject` doesn't support dynamic interface map (dotnet#112375) Native DLLs: only load imported DLLs from System32 (dotnet#112359) [main] Update dependencies from dotnet/roslyn (dotnet#112314) Update SVE instructions that writes to GC regs (dotnet#112389) Bring up android+coreclr windows build. (dotnet#112256) Never use heap for return buffers (dotnet#112060) Wait to complete the test before releasing the agile reference. (dotnet#112387) Prevent returning disposed HTTP/1.1 connections to the pool (dotnet#112383) Fingerprint dotnet.js if writing import map to html is enabled (dotnet#112407) Remove duplicate definition of CORECLR_HOSTING_API_LINKAGE (dotnet#112096) Update the exception message to reflect current behavior. (dotnet#112355) Use enum for frametype not v table (dotnet#112166) Enable AltJits build for LoongArch64 and RiscV64 (dotnet#110282) Guard members of MonoType union & fix related bugs (dotnet#111645) Add optional hooks for debugging OpenSSL memory allocations (dotnet#111539) JIT: Optimize struct parameter register accesses in the backend (dotnet#110819) NativeAOT: Cover more opcodes in type preinitializer (dotnet#112073)
CI experiment for #111127
Was:
Now:
where the write barrier is put at the callsite if needed (presumably, it happens rarely)
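A minimal sketch of the arrangement described above, assuming the callee's return buffer is never on the GC heap: the callee fills it with plain stores, and the caller adds a barrier only when copying into a destination that may be on the heap. WriteBarrier here is a hypothetical counter, not the runtime's barrier helper, and the "may be heap" decision is passed in as a flag rather than derived by the JIT.

```cpp
#include <cassert>

// Counts calls to the simulated write barrier.
int g_barrierCalls = 0;

struct Obj { int id; };

// A struct returned via a hidden return buffer that contains a GC reference.
struct WithRef {
    Obj* ref;
    long long payload;
};

// Hypothetical write barrier: a real GC barrier would also record the store
// (for example, by marking a card); this one only stores and counts.
void WriteBarrier(Obj** dst, Obj* value) {
    *dst = value;
    ++g_barrierCalls;
}

// Callee: the return buffer is assumed never to be on the GC heap, so it is
// filled with plain stores and no barriers.
void Produce(WithRef* retBuf, Obj* o) {
    retBuf->ref = o;
    retBuf->payload = 42;
}

// Caller: the callee writes into a stack temp; copying to the real
// destination uses a barrier for the GC-ref field only if that destination
// may be on the heap.
void StoreResult(WithRef* dst, bool dstMayBeHeap, Obj* o) {
    WithRef tmp;
    Produce(&tmp, o);
    if (dstMayBeHeap) {
        WriteBarrier(&dst->ref, tmp.ref);
    } else {
        dst->ref = tmp.ref;
    }
    dst->payload = tmp.payload;
}
```

Storing into a stack destination costs no barrier; storing into a heap destination costs one, at the call site.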
Updated stats for write barriers after #112227 was merged (it is supposed to help reduce the number of bulk barriers):
aspnet-win-x64 SPMI collection:
It looks like the aspnet collection currently has too many missed contexts (so the actual numbers are likely 5-10% higher).
MihuBot (PMI for BCL):