Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GT_STORE_BLK - do not call memset for blocks containg gc pointers on heap #95530

Merged
merged 27 commits into from
Jan 4, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
4ecfc65
Loop-based impl
EgorBo Dec 3, 2023
cb23e5c
Add asserts
EgorBo Dec 3, 2023
063cd10
Add another test case
EgorBo Dec 3, 2023
90e53b1
Update StructWithGC_Zeroing.csproj
EgorBo Dec 3, 2023
24a6046
Update StructWithGC_Zeroing.cs
EgorBo Dec 3, 2023
31dc4a8
clean up
EgorBo Dec 3, 2023
2e72231
Fix arm32
EgorBo Dec 3, 2023
8cdf5fa
Merge branch 'main' of github.com:dotnet/runtime into fix-memset-bug
EgorBo Dec 4, 2023
5d9a1db
Address feedback
EgorBo Dec 5, 2023
3b5d1a9
Add a small note
EgorBo Dec 5, 2023
1b3208d
Add RISC-V and LA64
EgorBo Dec 6, 2023
a129a55
Address feedback
EgorBo Dec 6, 2023
a86ecf5
fix build
EgorBo Dec 6, 2023
133c140
fix build
EgorBo Dec 6, 2023
4349a0a
CI test
EgorBo Dec 6, 2023
1a335cb
Merge branch 'main' of github.com:dotnet/runtime into fix-memset-bug
EgorBo Dec 6, 2023
2f135b7
Fix build
EgorBo Dec 6, 2023
3489223
Clean up
EgorBo Dec 6, 2023
2da97d5
Merge branch 'main' of github.com:dotnet/runtime into fix-memset-bug
EgorBo Dec 7, 2023
e4df08a
Apply suggestion
EgorBo Dec 7, 2023
32425a9
Use REG_R0 on RISC-V and LA64, use ZR on ARM64
EgorBo Dec 8, 2023
52119d1
fix build
EgorBo Dec 8, 2023
98a4096
Clean up
EgorBo Dec 8, 2023
e0f50a4
Merge branch 'main' of github.com:dotnet/runtime into fix-memset-bug
EgorBo Jan 4, 2024
73b2066
Update src/coreclr/jit/codegenloongarch64.cpp
EgorBo Jan 4, 2024
4cfcea9
Apply suggestions for risc-v
EgorBo Jan 4, 2024
ef044b8
Merge branch 'fix-memset-bug' of github.com:EgorBo/runtime-1 into fix…
EgorBo Jan 4, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 35 additions & 9 deletions docs/design/coreclr/botr/clr-abi.md
Original file line number Diff line number Diff line change
Expand Up @@ -207,7 +207,7 @@ For non-rude thread abort, the VM walks the stack, running any catch handler tha

For example:

```
```cs
try { // try 1
try { // try 2
System.Threading.Thread.CurrentThread.Abort();
Expand All @@ -223,7 +223,7 @@ L:

In this case, if the address returned in catch 2 corresponding to label L is outside try 1, then the ThreadAbortException re-raised by the VM will not be caught by catch 1, as is expected. The JIT needs to insert a block such that this is the effective code generation:

```
```cs
try { // try 1
try { // try 2
System.Threading.Thread.CurrentThread.Abort();
Expand All @@ -240,7 +240,7 @@ L:

Similarly, the automatic re-raise address for a ThreadAbortException can't be within a finally handler, or the VM will abort the re-raise and swallow the exception. This can happen due to call-to-finally thunks marked as "cloned finally", as described above. For example (this is pseudo-assembly-code, not C#):

```
```cs
try { // try 1
try { // try 2
System.Threading.Thread.CurrentThread.Abort();
Expand All @@ -256,7 +256,7 @@ L:

This would generate something like:

```
```asm
// beginning of 'try 1'
// beginning of 'try 2'
System.Threading.Thread.CurrentThread.Abort();
Expand All @@ -281,7 +281,7 @@ Finally1:

Note that the JIT must already insert a "step" block so the finally will be called. However, this isn't sufficient to support ThreadAbortException processing, because "L1" is marked as "cloned finally". In this case, the JIT must insert another step block that is within "try 1" but outside the cloned finally block, that will allow for correct re-raise semantics. For example:

```
```asm
// beginning of 'try 1'
// beginning of 'try 2'
System.Threading.Thread.CurrentThread.Abort();
Expand Down Expand Up @@ -399,7 +399,7 @@ To implement this requirement, for any function with EH, we create a frame-local

Note that the since a slot on x86 is 4 bytes, the minimum size is 16 bytes. The idea is to have 1 slot for each handler that could be possibly be invoked at the same time. For example, for:

```
```cs
try {
...
} catch {
Expand All @@ -419,7 +419,7 @@ When calling a finally, we set the appropriate level to 0xFC (aka "finally call"

Thus, calling a finally from JIT generated code looks like:

```
```asm
mov dword ptr [L_02+0x4 ebp-10H], 0 // This must happen before the 0xFC is written
mov dword ptr [L_02+0x8 ebp-0CH], 252 // 0xFC
push G_M52300_IG07
Expand All @@ -430,7 +430,7 @@ In this case, `G_M52300_IG07` is not the address after the 'jmp', so a simple 'c

The code this finally returns to looks like this:

```
```asm
mov dword ptr [L_02+0x8 ebp-0CH], 0
jmp SHORT G_M52300_IG05
```
Expand Down Expand Up @@ -479,7 +479,7 @@ Because a main function body will **always** be on the stack when one of its fun

There is one "corner case" in the VM implementation of WantsReportOnlyLeaf model that has implications for the code the JIT is allowed to generate. Consider this function with nested exception handling:

```
```cs
public void runtest() {
try {
try {
Expand Down Expand Up @@ -804,3 +804,29 @@ In addition to the usual registers it also preserves all float registers and `rc
`CORINFO_HELP_DISPATCH_INDIRECT_CALL` takes the call address in `rax` and it reserves the right to use and trash `r10` and `r11`.
The JIT uses the dispatch helper on x64 whenever possible as it is expected that the code size benefits outweighs the less accurate branch prediction.
However, note that the use of `r11` in the dispatcher makes it incompatible with VSD calls where the JIT must fall back to the validator and a manual call.

# Notes on Memset/Memcpy

Generally, `memset` and `memcpy` do not provide any guarantees of atomicity. This implies that they should only be used when the memory being modified by `memset`/`memcpy` is not observable by any other thread (including GC), or when there are no atomicity requirements according to our [Memory Model](../../specs/Memory-model.md). It's especially important when we modify heap containing managed pointers - those must be updated atomically, e.g. using pointer-sized `mov` instruction (managed pointers are always aligned) - see [Atomic Memory Access](../../specs/Memory-model.md#Atomic-memory-accesses). It's worth noting that by "update" it's implied "set to zero", otherwise, we need a write barrier.

Examples:

```cs
struct MyStruct
{
long a;
string b;
}

void Test1(ref MyStruct m)
{
// We're not allowed to use memset here
m = default;
}

MyStruct Test2()
{
// We can use memset here
return default;
}
```
1 change: 1 addition & 0 deletions src/coreclr/jit/codegen.h
Original file line number Diff line number Diff line change
Expand Up @@ -1240,6 +1240,7 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
#ifndef TARGET_X86
void genCodeForInitBlkHelper(GenTreeBlk* initBlkNode);
#endif
void genCodeForInitBlkLoop(GenTreeBlk* initBlkNode);
void genCodeForInitBlkRepStos(GenTreeBlk* initBlkNode);
void genCodeForInitBlkUnroll(GenTreeBlk* initBlkNode);
void genJumpTable(GenTree* tree);
Expand Down
71 changes: 71 additions & 0 deletions src/coreclr/jit/codegenarmarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3243,6 +3243,72 @@ void CodeGen::genCodeForInitBlkHelper(GenTreeBlk* initBlkNode)
genEmitHelperCall(CORINFO_HELP_MEMSET, 0, EA_UNKNOWN);
}

//------------------------------------------------------------------------
// genCodeForInitBlkLoop - Generate code for an InitBlk using an inlined for-loop.
// It's needed for cases when size is too big to unroll and we're not allowed
// to use memset call due to atomicity requirements.
//
// Arguments:
// initBlkNode - the GT_STORE_BLK node
//
void CodeGen::genCodeForInitBlkLoop(GenTreeBlk* initBlkNode)
{
GenTree* const dstNode = initBlkNode->Addr();
genConsumeReg(dstNode);
const regNumber dstReg = dstNode->GetRegNum();

#ifndef TARGET_ARM64
GenTree* const zeroNode = initBlkNode->Data();
genConsumeReg(zeroNode);
const regNumber zeroReg = zeroNode->GetRegNum();
#else
const regNumber zeroReg = REG_ZR;
#endif

if (initBlkNode->IsVolatile())
{
// issue a full memory barrier before a volatile initBlock Operation
instGen_MemoryBarrier();
}

// str xzr, [dstReg]
// mov offsetReg, <block size>
//.LOOP:
// str xzr, [dstReg, offsetReg]
// subs offsetReg, offsetReg, #8
// bne .LOOP

const unsigned size = initBlkNode->GetLayout()->GetSize();
assert((size >= TARGET_POINTER_SIZE) && ((size % TARGET_POINTER_SIZE) == 0));

// The loop is reversed - it makes it smaller.
// Although, we zero the first pointer before the loop (the loop doesn't zero it)
// it works as a nullcheck, otherwise the first iteration would try to access
// "null + potentially large offset" and hit AV.
GetEmitter()->emitIns_R_R(INS_str, EA_PTRSIZE, zeroReg, dstReg);
if (size > TARGET_POINTER_SIZE)
{
// Extend liveness of dstReg in case if it gets killed by the store.
gcInfo.gcMarkRegPtrVal(dstReg, dstNode->TypeGet());

const regNumber offsetReg = initBlkNode->GetSingleTempReg();
instGen_Set_Reg_To_Imm(EA_PTRSIZE, offsetReg, size - TARGET_POINTER_SIZE);

BasicBlock* loop = genCreateTempLabel();
genDefineTempLabel(loop);

GetEmitter()->emitIns_R_R_R(INS_str, EA_PTRSIZE, zeroReg, dstReg, offsetReg);
#ifdef TARGET_ARM64
GetEmitter()->emitIns_R_R_I(INS_subs, EA_PTRSIZE, offsetReg, offsetReg, TARGET_POINTER_SIZE);
#else
GetEmitter()->emitIns_R_R_I(INS_sub, EA_PTRSIZE, offsetReg, offsetReg, TARGET_POINTER_SIZE, INS_FLAGS_SET);
#endif
inst_JMP(EJ_ne, loop);

gcInfo.gcMarkRegSetNpt(genRegMask(dstReg));
}
}

//------------------------------------------------------------------------
// genCall: Produce code for a GT_CALL node
//
Expand Down Expand Up @@ -4513,6 +4579,11 @@ void CodeGen::genCodeForStoreBlk(GenTreeBlk* blkOp)
genCodeForCpObj(blkOp->AsBlk());
break;

case GenTreeBlk::BlkOpKindLoop:
assert(!isCopyBlk);
genCodeForInitBlkLoop(blkOp);
break;

case GenTreeBlk::BlkOpKindHelper:
assert(!blkOp->gtBlkOpGcUnsafe);
if (isCopyBlk)
Expand Down
53 changes: 53 additions & 0 deletions src/coreclr/jit/codegenloongarch64.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6351,6 +6351,54 @@ void CodeGen::genCodeForInitBlkHelper(GenTreeBlk* initBlkNode)
genEmitHelperCall(CORINFO_HELP_MEMSET, 0, EA_UNKNOWN);
}

//------------------------------------------------------------------------
// genCodeForInitBlkLoop - Generate code for an InitBlk using an inlined for-loop.
// It's needed for cases when size is too big to unroll and we're not allowed
// to use memset call due to atomicity requirements.
//
// Arguments:
// initBlkNode - the GT_STORE_BLK node
//
void CodeGen::genCodeForInitBlkLoop(GenTreeBlk* initBlkNode)
{
GenTree* const dstNode = initBlkNode->Addr();
genConsumeReg(dstNode);
const regNumber dstReg = dstNode->GetRegNum();

if (initBlkNode->IsVolatile())
{
// issue a full memory barrier before a volatile initBlock Operation
instGen_MemoryBarrier();
}

const unsigned size = initBlkNode->GetLayout()->GetSize();
assert((size >= TARGET_POINTER_SIZE) && ((size % TARGET_POINTER_SIZE) == 0));

// The loop is reversed - it makes it smaller.
// Although, we zero the first pointer before the loop (the loop doesn't zero it)
// it works as a nullcheck, otherwise the first iteration would try to access
// "null + potentially large offset" and hit AV.
GetEmitter()->emitIns_R_R_I(INS_st_d, EA_PTRSIZE, REG_R0, dstReg, 0);
if (size > TARGET_POINTER_SIZE)
{
// Extend liveness of dstReg in case if it gets killed by the store.
gcInfo.gcMarkRegPtrVal(dstReg, dstNode->TypeGet());

const regNumber offsetReg = initBlkNode->GetSingleTempReg();
instGen_Set_Reg_To_Imm(EA_PTRSIZE, offsetReg, size - TARGET_POINTER_SIZE);

// loop begin:
// *(dstReg + offsetReg) = 0
GetEmitter()->emitIns_R_R_R(INS_stx_d, EA_PTRSIZE, REG_R0, dstReg, offsetReg);
// offsetReg = offsetReg - 8
GetEmitter()->emitIns_R_R_I(INS_addi_d, EA_PTRSIZE, offsetReg, offsetReg, -8);
// if (offsetReg != 0) goto loop;
GetEmitter()->emitIns_R_I(INS_bnez, EA_8BYTE, offsetReg, -2 << 2);

gcInfo.gcMarkRegSetNpt(genRegMask(dstReg));
}
}

// Generate code for a load from some address + offset
// base: tree node which can be either a local address or arbitrary node
// offset: distance from the base from which to load
Expand Down Expand Up @@ -7262,6 +7310,11 @@ void CodeGen::genCodeForStoreBlk(GenTreeBlk* blkOp)
genCodeForCpObj(blkOp->AsBlk());
break;

case GenTreeBlk::BlkOpKindLoop:
assert(!isCopyBlk);
genCodeForInitBlkLoop(blkOp);
break;

case GenTreeBlk::BlkOpKindHelper:
if (isCopyBlk)
{
Expand Down
60 changes: 60 additions & 0 deletions src/coreclr/jit/codegenriscv64.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6029,6 +6029,61 @@ void CodeGen::genCodeForCpBlkUnroll(GenTreeBlk* cpBlkNode)
}
}

//------------------------------------------------------------------------
// genCodeForInitBlkLoop - Generate code for an InitBlk using an inlined for-loop.
// It's needed for cases when size is too big to unroll and we're not allowed
// to use memset call due to atomicity requirements.
//
// Arguments:
// initBlkNode - the GT_STORE_BLK node
//
void CodeGen::genCodeForInitBlkLoop(GenTreeBlk* initBlkNode)
{
GenTree* const dstNode = initBlkNode->Addr();
genConsumeReg(dstNode);
const regNumber dstReg = dstNode->GetRegNum();

if (initBlkNode->IsVolatile())
{
// issue a full memory barrier before a volatile initBlock Operation
instGen_MemoryBarrier();
}

const unsigned size = initBlkNode->GetLayout()->GetSize();
assert((size >= TARGET_POINTER_SIZE) && ((size % TARGET_POINTER_SIZE) == 0));

// The loop is reversed - it makes it smaller.
// Although, we zero the first pointer before the loop (the loop doesn't zero it)
// it works as a nullcheck, otherwise the first iteration would try to access
// "null + potentially large offset" and hit AV.
GetEmitter()->emitIns_R_R_I(INS_sd, EA_PTRSIZE, REG_R0, dstReg, 0);
if (size > TARGET_POINTER_SIZE)
{
// Extend liveness of dstReg in case if it gets killed by the store.
gcInfo.gcMarkRegPtrVal(dstReg, dstNode->TypeGet());

const regNumber tempReg = initBlkNode->GetSingleTempReg();
instGen_Set_Reg_To_Imm(EA_PTRSIZE, tempReg, size - TARGET_POINTER_SIZE);

// tempReg = dstReg + tempReg (a new interior pointer, but in a nongc region)
GetEmitter()->emitIns_R_R_R(INS_add, EA_PTRSIZE, tempReg, dstReg, tempReg);

BasicBlock* loop = genCreateTempLabel();
genDefineTempLabel(loop);
GetEmitter()->emitDisableGC(); // TODO: add gcinfo to tempReg and remove nogc

// *tempReg = 0
GetEmitter()->emitIns_R_R_I(INS_sd, EA_PTRSIZE, REG_R0, tempReg, 0);
// tempReg = tempReg - 8
GetEmitter()->emitIns_R_R_I(INS_addi, EA_PTRSIZE, tempReg, tempReg, -8);
// if (tempReg != dstReg) goto loop;
GetEmitter()->emitIns_J(INS_bne, loop, (int)tempReg | ((int)dstReg << 5));
GetEmitter()->emitEnableGC();

gcInfo.gcMarkRegSetNpt(genRegMask(dstReg));
}
}

//------------------------------------------------------------------------
// genCodeForInitBlkHelper - Generate code for an InitBlk node by the means of the VM memcpy helper call
//
Expand Down Expand Up @@ -6918,6 +6973,11 @@ void CodeGen::genCodeForStoreBlk(GenTreeBlk* blkOp)
genCodeForCpObj(blkOp->AsBlk());
break;

case GenTreeBlk::BlkOpKindLoop:
assert(!isCopyBlk);
genCodeForInitBlkLoop(blkOp);
break;

case GenTreeBlk::BlkOpKindHelper:
if (isCopyBlk)
{
Expand Down
Loading