Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Never use heap for return buffers #112060

Merged
merged 20 commits into from
Feb 11, 2025
Merged
Show file tree
Hide file tree
Changes from 19 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 2 additions & 6 deletions docs/design/coreclr/botr/clr-abi.md
Original file line number Diff line number Diff line change
Expand Up @@ -108,9 +108,7 @@ Passing/returning structs according to hardware floating-point calling conventio

## Return buffers

The same applies to some return buffers. See `MethodTable::IsStructRequiringStackAllocRetBuf()`. When that returns `false`, the return buffer might be on the heap, either due to reflection/remoting code paths mentioned previously or due to a JIT optimization where a call with a return buffer that then assigns to a field (on the GC heap) are changed into passing the field reference as the return buffer. Conversely, when it returns true, the JIT does not need to use a write barrier when storing to the return buffer, but it is still not guaranteed to be a compiler temp, and as such the JIT should not introduce spurious writes to the return buffer.

NOTE: This optimization is now disabled for all platforms (`IsStructRequiringStackAllocRetBuf()` always returns `false`).
Since .NET 10, return buffers must always be allocated on the stack by the caller. After the call, the caller is responsible for copying the return buffer to the final destination using write barriers if necessary. The JIT can assume that the return buffer is always on the stack and may optimize accordingly, such as by omitting write barriers when writing GC pointers to the return buffer. In addition, the buffer is allowed to be used for temporary storage within the method since its content must not be aliased or cross-thread visible.

ARM64-only: When a method returns a structure that is larger than 16 bytes the caller reserves a return buffer of sufficient size and alignment to hold the result. The address of the buffer is passed as an argument to the method in `R8` (defined in the JIT as `REG_ARG_RET_BUFF`). The callee isn't required to preserve the value stored in `R8`.

Expand Down Expand Up @@ -778,9 +776,7 @@ The return value is handled as follows:
1. Floating-point values are returned on the top of the hardware FP stack.
2. Integers up to 32 bits long are returned in EAX.
3. 64-bit integers are passed with EAX holding the least significant 32 bits and EDX holding the most significant 32 bits.
4. All other cases require the use of a return buffer, through which the value is returned.

In addition, there is a guarantee that if a return buffer is used a value is stored there only upon ordinary exit from the method. The buffer is not allowed to be used for temporary storage within the method and its contents will be unaltered if an exception occurs while executing the method.
4. All other cases require the use of a return buffer, through which the value is returned. See [Return buffers](#return-buffers).

# Control Flow Guard (CFG) support on Windows

Expand Down
6 changes: 3 additions & 3 deletions docs/design/features/tailcalls-with-helpers.md
Original file line number Diff line number Diff line change
Expand Up @@ -259,9 +259,9 @@ stack, even on ARM architectures, due to its return address hijacking mechanism.

When the result is returned by value the JIT will introduce a local and pass a
pointer to it in the second argument. For ret bufs the JIT will typically
directly pass along its own return buffer parameter to DispatchTailCalls. It is
possible that this return buffer is pointing into GC heap, so the result is
always tracked as a byref in the mechanism.
directly pass along its own return buffer parameter to DispatchTailCalls. The
return buffer is always located on the stack, so the result does not need to be
tracked as a byref.

In certain cases the target function pointer is also stored. For some targets
this might require the JIT to perform the equivalent of `ldvirtftn` or `ldftn`
Expand Down
5 changes: 3 additions & 2 deletions src/coreclr/inc/readytorun.h
Original file line number Diff line number Diff line change
Expand Up @@ -19,10 +19,10 @@
// src/coreclr/nativeaot/Runtime/inc/ModuleHeaders.h
// If you update this, ensure you run `git grep MINIMUM_READYTORUN_MAJOR_VERSION`
// and handle pending work.
#define READYTORUN_MAJOR_VERSION 11
#define READYTORUN_MAJOR_VERSION 12
#define READYTORUN_MINOR_VERSION 0x0000

#define MINIMUM_READYTORUN_MAJOR_VERSION 11
#define MINIMUM_READYTORUN_MAJOR_VERSION 12

// R2R Version 2.1 adds the InliningInfo section
// R2R Version 2.2 adds the ProfileDataInfo section
Expand All @@ -39,6 +39,7 @@
// R2R Version 10.0 adds support for the statics being allocated on a per type basis instead of on a per module basis, disable support for LogMethodEnter helper
// R2R Version 10.1 adds Unbox_TypeTest helper
// R2R Version 11 uses GCInfo v4, which encodes safe points without -1 offset and does not track return kinds in GCInfo
// R2R Version 12 requires all return buffers to be always on the stack

struct READYTORUN_CORE_HEADER
{
Expand Down
7 changes: 4 additions & 3 deletions src/coreclr/jit/compiler.h
Original file line number Diff line number Diff line change
Expand Up @@ -4938,7 +4938,7 @@ class Compiler
Statement** pAfterStmt = nullptr,
const DebugInfo& di = DebugInfo(),
BasicBlock* block = nullptr);
GenTree* impStoreStructPtr(GenTree* destAddr, GenTree* value, unsigned curLevel);
GenTree* impStoreStructPtr(GenTree* destAddr, GenTree* value, unsigned curLevel, GenTreeFlags indirFlags = GTF_EMPTY);

GenTree* impGetNodeAddr(GenTree* val, unsigned curLevel, GenTreeFlags* pDerefFlags);

Expand Down Expand Up @@ -5228,8 +5228,8 @@ class Compiler
static LONG jitNestingLevel;
#endif // DEBUG

static bool impIsInvariant(const GenTree* tree);
static bool impIsAddressInLocal(const GenTree* tree, GenTree** lclVarTreeOut = nullptr);
bool impIsInvariant(const GenTree* tree);
bool impIsAddressInLocal(const GenTree* tree, GenTree** lclVarTreeOut = nullptr);

void impMakeDiscretionaryInlineObservations(InlineInfo* pInlineInfo, InlineResult* inlineResult);

Expand Down Expand Up @@ -6793,6 +6793,7 @@ class Compiler

public:
bool fgAddrCouldBeNull(GenTree* addr);
bool fgAddrCouldBeHeap(GenTree* addr);
void fgAssignSetVarDef(GenTree* tree);

private:
Expand Down
36 changes: 36 additions & 0 deletions src/coreclr/jit/flowgraph.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -928,6 +928,42 @@ bool Compiler::fgAddrCouldBeNull(GenTree* addr)
return true; // default result: addr could be null.
}

//------------------------------------------------------------------------------
// fgAddrCouldBeHeap: Check whether the address tree may represent a heap address.
//
// Arguments:
// addr - Address to check
//
// Return Value:
// True if address could be a heap address; false otherwise (i.e. stack, native memory, etc.)
//
bool Compiler::fgAddrCouldBeHeap(GenTree* addr)
{
GenTree* op = addr;
while (op->OperIs(GT_FIELD_ADDR) && op->AsFieldAddr()->IsInstance())
{
op = op->AsFieldAddr()->GetFldObj();
}

target_ssize_t offset;
gtPeelOffsets(&op, &offset);

// Ignore the offset for locals

if (op->OperIs(GT_LCL_ADDR))
{
return false;
}

if (op->OperIsScalarLocal() && (op->AsLclVarCommon()->GetLclNum() == impInlineRoot()->info.compRetBuffArg))
{
// RetBuf is known to be on the stack
return false;
}

return true;
}

//------------------------------------------------------------------------------
// fgOptimizeDelegateConstructor: try and optimize construction of a delegate
//
Expand Down
2 changes: 1 addition & 1 deletion src/coreclr/jit/gentree.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -18301,7 +18301,7 @@ bool GenTreeIndir::IsAddressNotOnHeap(Compiler* comp)
return true;
}

if (HasBase() && Base()->gtSkipReloadOrCopy()->OperIs(GT_LCL_ADDR))
if (HasBase() && !comp->fgAddrCouldBeHeap(Base()->gtSkipReloadOrCopy()))
{
return true;
}
Expand Down
66 changes: 57 additions & 9 deletions src/coreclr/jit/importer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -843,7 +843,32 @@ GenTree* Compiler::impStoreStruct(GenTree* store,
// TODO-Bug?: verify if flags matter here
GenTreeFlags indirFlags = GTF_EMPTY;
GenTree* destAddr = impGetNodeAddr(store, CHECK_SPILL_ALL, &indirFlags);
NewCallArg newArg = NewCallArg::Primitive(destAddr).WellKnown(wellKnownArgType);

// Make sure we don't pass something other than a local address to the return buffer arg.
// It is allowed to pass current's method return buffer as it is a local too.
if (fgAddrCouldBeHeap(destAddr) && !eeIsByrefLike(srcCall->gtRetClsHnd))
{
unsigned tmp = lvaGrabTemp(false DEBUGARG("stack copy for value returned via return buffer"));
lvaSetStruct(tmp, srcCall->gtRetClsHnd, false);

// Make sure the address is evaluated first if needed.
if (store->OperIs(GT_STOREIND, GT_STORE_BLK) &&
((store->AsIndir()->Addr()->gtFlags & GTF_ALL_EFFECT) != 0))
{
unsigned lclNum = lvaGrabTemp(true DEBUGARG("fgMakeTemp is creating a new local variable"));
impStoreToTemp(lclNum, store->AsIndir()->Addr(), curLevel, pAfterStmt, di, block);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can use impStoreStruct here since this function is called from outside import via gtNewTempStore. I think it needs to create a comma, or only call this function in some cases (see the checks below for GT_COMMA)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

um.. what impStoreStruct are your referring here? did you mean impStoreToTemp? Also, seems like this function already appends stuff to statements so it's weird expect it to not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's just an implicit contract that when gtNewTempStore calls it - it does not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I meant impStoreTemp

seems like this function already appends stuff to statements so it's weird expect it to not?

Where does it do that? I think we only do that for the GT_COMMA case, and it has guards to ensure that only happens during import

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We spoke offline and Egor convinced me that actually no reordering is happening here, so we don't need to do any spilling here. Sorry about that.

store->AsIndir()->Addr() = gtNewLclvNode(lclNum, genActualType(store->AsIndir()->Addr()));
}

GenTree* spilledCall = gtNewStoreLclVarNode(tmp, srcCall);
spilledCall = impStoreStruct(spilledCall, curLevel, pAfterStmt, di, block);
store->Data() = gtNewOperNode(GT_COMMA, store->TypeGet(), spilledCall,
gtNewLclvNode(tmp, lvaGetDesc(tmp)->TypeGet()));

return impStoreStruct(store, curLevel, pAfterStmt, di, block);
}

NewCallArg newArg = NewCallArg::Primitive(destAddr).WellKnown(wellKnownArgType);

if (destAddr->OperIs(GT_LCL_ADDR))
{
Expand Down Expand Up @@ -953,6 +978,27 @@ GenTree* Compiler::impStoreStruct(GenTree* store,
// TODO-Bug?: verify if flags matter here
GenTreeFlags indirFlags = GTF_EMPTY;
GenTree* destAddr = impGetNodeAddr(store, CHECK_SPILL_ALL, &indirFlags);

// Make sure we don't pass something other than a local address to the return buffer arg.
// It is allowed to pass current's method return buffer as it is a local too.
if (fgAddrCouldBeHeap(destAddr) && !eeIsByrefLike(call->gtRetClsHnd))
{
unsigned tmp = lvaGrabTemp(false DEBUGARG("stack copy for value returned via return buffer"));
lvaSetStruct(tmp, call->gtRetClsHnd, false);
destAddr = gtNewLclVarAddrNode(tmp, TYP_I_IMPL);

// Insert address of temp into existing call
NewCallArg retBufArg = NewCallArg::Primitive(destAddr).WellKnown(WellKnownArg::RetBuffer);
call->gtArgs.InsertAfterThisOrFirst(this, retBufArg);

// Now the store needs to copy from the new temp instead.
call->gtType = TYP_VOID;
src->gtType = TYP_VOID;
var_types tmpType = lvaGetDesc(tmp)->TypeGet();
store->Data() = gtNewOperNode(GT_COMMA, tmpType, src, gtNewLclvNode(tmp, tmpType));
return impStoreStruct(store, CHECK_SPILL_ALL, pAfterStmt, di, block);
}

call->gtArgs.InsertAfterThisOrFirst(this,
NewCallArg::Primitive(destAddr).WellKnown(WellKnownArg::RetBuffer));

Expand Down Expand Up @@ -1010,21 +1056,22 @@ GenTree* Compiler::impStoreStruct(GenTree* store,
// impStoreStructPtr: Store (copy) the structure from 'src' to 'destAddr'.
//
// Arguments:
// destAddr - address of the destination of the store
// value - value to store
// curLevel - stack level for which a spill may be being done
// destAddr - address of the destination of the store
// value - value to store
// curLevel - stack level for which a spill may be being done
// indirFlags - flags to be used on the store node
//
// Return Value:
// The tree that should be appended to the statement list that represents the store.
//
// Notes:
// Temp stores may be appended to impStmtList if spilling is necessary.
//
GenTree* Compiler::impStoreStructPtr(GenTree* destAddr, GenTree* value, unsigned curLevel)
GenTree* Compiler::impStoreStructPtr(GenTree* destAddr, GenTree* value, unsigned curLevel, GenTreeFlags indirFlags)
{
var_types type = value->TypeGet();
ClassLayout* layout = (type == TYP_STRUCT) ? value->GetLayout(this) : nullptr;
GenTree* store = gtNewStoreValueNode(type, layout, destAddr, value);
GenTree* store = gtNewStoreValueNode(type, layout, destAddr, value, indirFlags);
store = impStoreStruct(store, curLevel);

return store;
Expand Down Expand Up @@ -11247,18 +11294,19 @@ bool Compiler::impReturnInstruction(int prefixFlags, OPCODE& opcode)

if (info.compRetBuffArg != BAD_VAR_NUM)
{
var_types retBuffType = lvaGetDesc(info.compRetBuffArg)->TypeGet();
// Assign value to return buff (first param)
GenTree* retBuffAddr =
gtNewLclvNode(info.compRetBuffArg, TYP_BYREF DEBUGARG(impCurStmtDI.GetLocation().GetOffset()));
gtNewLclvNode(info.compRetBuffArg, retBuffType DEBUGARG(impCurStmtDI.GetLocation().GetOffset()));

op2 = impStoreStructPtr(retBuffAddr, op2, CHECK_SPILL_ALL);
op2 = impStoreStructPtr(retBuffAddr, op2, CHECK_SPILL_ALL, GTF_IND_TGT_NOT_HEAP);
impAppendTree(op2, CHECK_SPILL_NONE, impCurStmtDI);

// There are cases where the address of the implicit RetBuf should be returned explicitly.
//
if (compMethodReturnsRetBufAddr())
{
op1 = gtNewOperNode(GT_RETURN, TYP_BYREF, gtNewLclvNode(info.compRetBuffArg, TYP_BYREF));
op1 = gtNewOperNode(GT_RETURN, retBuffType, gtNewLclvNode(info.compRetBuffArg, retBuffType));
}
else
{
Expand Down
2 changes: 1 addition & 1 deletion src/coreclr/jit/lclvars.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -485,7 +485,7 @@ void Compiler::lvaInitRetBuffArg(InitVarDscInfo* varDscInfo, bool useFixedRetBuf
info.compRetBuffArg = varDscInfo->varNum;

LclVarDsc* varDsc = varDscInfo->varDsc;
varDsc->lvType = TYP_BYREF;
varDsc->lvType = TYP_I_IMPL;
varDsc->lvIsParam = 1;
varDsc->lvIsRegArg = 0;

Expand Down
2 changes: 1 addition & 1 deletion src/coreclr/jit/rationalize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -157,7 +157,7 @@ void Rationalizer::RewriteNodeAsCall(GenTree** use,
tmpNum = comp->lvaGrabTemp(true DEBUGARG("return buffer for hwintrinsic"));
comp->lvaSetStruct(tmpNum, sig->retTypeClass, false);

GenTree* destAddr = comp->gtNewLclVarAddrNode(tmpNum, TYP_BYREF);
GenTree* destAddr = comp->gtNewLclVarAddrNode(tmpNum, TYP_I_IMPL);
NewCallArg newArg = NewCallArg::Primitive(destAddr).WellKnown(WellKnownArg::RetBuffer);

call->gtArgs.InsertAfterThisOrFirst(comp, newArg);
Expand Down
2 changes: 1 addition & 1 deletion src/coreclr/nativeaot/Runtime/inc/ModuleHeaders.h
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ struct ReadyToRunHeaderConstants
{
static const uint32_t Signature = 0x00525452; // 'RTR'

static const uint32_t CurrentMajorVersion = 11;
static const uint32_t CurrentMajorVersion = 12;
static const uint32_t CurrentMinorVersion = 0;
};

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ internal struct ReadyToRunHeaderConstants
{
public const uint Signature = 0x00525452; // 'RTR'

public const ushort CurrentMajorVersion = 11;
public const ushort CurrentMajorVersion = 12;
public const ushort CurrentMinorVersion = 0;
}
#if READYTORUN
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -182,13 +182,6 @@ private void FakeGcScanRoots(MethodDesc method, ArgIterator argit, CORCOMPILE_GC
return;
}

// Also if the method has a return buffer, then it is the first argument, and could be an interior ref,
// so always promote it.
if (argit.HasRetBuffArg())
{
frame[_transitionBlock.GetRetBuffArgOffset(argit.HasThis)] = CORCOMPILE_GCREFMAP_TOKENS.GCREFMAP_INTERIOR;
}

//
// Now iterate the arguments
//
Expand Down
14 changes: 0 additions & 14 deletions src/coreclr/vm/frames.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1376,13 +1376,6 @@ void TransitionFrame::PromoteCallerStackHelper(promote_func* fn, ScanContext* sc
(fn)(PTR_PTR_Object(pThis), sc, CHECK_APP_DOMAIN);
}

if (argit.HasRetBuffArg())
{
PTR_PTR_VOID pRetBuffArg = dac_cast<PTR_PTR_VOID>(pTransitionBlock + argit.GetRetBuffArgOffset());
LOG((LF_GC, INFO3, " ret buf Argument promoted from" FMT_ADDR "\n", DBG_ADDR(*pRetBuffArg) ));
PromoteCarefully(fn, PTR_PTR_Object(pRetBuffArg), sc, GC_CALL_INTERIOR|CHECK_APP_DOMAIN);
}

int argOffset;
while ((argOffset = argit.GetNextOffset()) != TransitionBlock::InvalidOffset)
{
Expand Down Expand Up @@ -1982,13 +1975,6 @@ void FakeGcScanRoots(MetaSig& msig, ArgIterator& argit, MethodDesc * pMD, BYTE *
return;
}

// Also if the method has a return buffer, then it is the first argument, and could be an interior ref,
// so always promote it.
if (argit.HasRetBuffArg())
{
FakePromote((Object **)(pFrame + argit.GetRetBuffArgOffset()), &sc, GC_CALL_INTERIOR);
}

//
// Now iterate the arguments
//
Expand Down
31 changes: 29 additions & 2 deletions src/coreclr/vm/reflectioninvocation.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -492,6 +492,9 @@ extern "C" void QCALLTYPE RuntimeMethodHandle_InvokeMethod(
// If an exception occurs a gc may happen but we are going to dump the stack anyway and we do
// not need to protect anything.

// Allocate a local buffer for the return buffer if necessary
PVOID pLocalRetBuf = nullptr;

{
BEGINFORBIDGC();
#ifdef _DEBUG
Expand All @@ -501,8 +504,19 @@ extern "C" void QCALLTYPE RuntimeMethodHandle_InvokeMethod(
// Take care of any return arguments
if (fHasRetBuffArg)
{
PVOID pRetBuff = gc.retVal->GetData();
*((LPVOID*) (pTransitionBlock + argit.GetRetBuffArgOffset())) = pRetBuff;
_ASSERT(hasValueTypeReturn);
PTR_MethodTable pMT = retTH.GetMethodTable();
size_t localRetBufSize = retTH.GetSize();

// Allocate a local buffer. The invoked method will write the return value to this
// buffer which will be copied to gc.retVal later.
pLocalRetBuf = _alloca(localRetBufSize);
ZeroMemory(pLocalRetBuf, localRetBufSize);
*((LPVOID*) (pTransitionBlock + argit.GetRetBuffArgOffset())) = pLocalRetBuf;
if (pMT->ContainsGCPointers())
{
pValueClasses = new (_alloca(sizeof(ValueClassInfo))) ValueClassInfo(pLocalRetBuf, pMT, pValueClasses);
}
}

// copy args
Expand Down Expand Up @@ -572,6 +586,19 @@ extern "C" void QCALLTYPE RuntimeMethodHandle_InvokeMethod(
// Call the method
CallDescrWorkerWithHandler(&callDescrData);

if (fHasRetBuffArg)
{
// Copy the return value from the return buffer to the object
if (retTH.GetMethodTable()->ContainsGCPointers())
{
memmoveGCRefs(gc.retVal->GetData(), pLocalRetBuf, retTH.GetSize());
}
else
{
memcpyNoGCRefs(gc.retVal->GetData(), pLocalRetBuf, retTH.GetSize());
}
}

// It is still illegal to do a GC here. The return type might have/contain GC pointers.
if (fConstructor)
{
Expand Down
Loading