Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comment out a false-positive assert that is firing and preventing jit-diffs #52462

Closed
wants to merge 3 commits into from

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented May 7, 2021

Putting this up as suggested by @AndyAyersMS on Discord.

As per the comment in code and the analysis in #51728, we have many scenarios where a pointer is converted to a byref. These conversions can range from something as simple as ref *x to things like ref T Unsafe.AsRef<T>(void*) to even new Span<T>(void*, int).

This conversion is correctly tracked and represented in metadata. However, if the pointer is enregistered and "last use" the register allocator may decide the byref will exist in the same register. There are many places in codegen that have logic similar to:

if (tgtReg != srcReg)
{
    emit(INS_mov, tgtReg, srcReg);
}

Due to how many locations this check exists, it is non trivial to go and fix this to be handled correctly everywhere today. Additionly, for most scenarios, this is a "safe" optimization. However, when that move is for a pointer<->byref conversion, eliding this move means that the liveness update will never occur and so asserts such as the one listed will fire due to the mismatch.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 7, 2021
@AndyAyersMS
Copy link
Member

cc @dotnet/jit-contrib

I think we should take this to unblock other work.

@BruceForstall
Copy link
Member

So a byref can be constructed "out of thin air" and needs to be reported to the GC? In the case above, where the mov is elided, if a byref is going live, do we report it going live to the GC? Or, conversely, if it is going dead (because a byref is being converted to a non-GC object pointer?), do we report it going dead?

@tannergooding
Copy link
Member Author

tannergooding commented May 7, 2021

So a byref can be constructed "out of thin air" and needs to be reported to the GC?

@BruceForstall That's my understanding. We have a pointer (possibly to the stack or to native memory) and we turn it into a byref. In the case of Unsafe.AsRef<T> this is often simply done via ldarg.0; ret.

When we have inlining, struct promotion, and other things at play, we will end up with something like the following, where V01 arg1 long (V01 is a pointer) and V40 tmp31 byref V14._value(offs=0x00) P-INDEP "field V14._value (fldOffset=0x0)" (V40 is the promoted byref field for Span<T>):

Generating: N015 (  1,  1) [000085] ------------        t85 =    LCL_VAR   byref  V01 arg1         u:1 rdx (last use) REG rdx $c0
                                                              /--*  t85    byref
Generating: N017 (  5,  4) [000065] DA----------              *  STORE_LCL_VAR byref  V40 tmp31        d:1 rdx REG rdx

Since they are both assigned rdx, one of the codegen paths decides to not emit the move.

In the case above, where the mov is elided, if a byref is going live, do we report it going live to the GC? Or, conversely, if it is going dead (because a byref is being converted to a non-GC object pointer?), do we report it going dead?

We only do liveness updates for emitted code, eliding the move in codegen means that liveness update never happens.

Updating codegen to still do the liveness update for such moves is non-trivial because we do not centrally elide the moves today. Instead, we have dozens of if (tgtReg != srcReg) checks scattered throughout each codegen file and sometimes in the emitter itself.

  • I started a prototype to centralize the move elision handling: tannergooding@875cb1f, but it isn't done and still needs more work

@BruceForstall
Copy link
Member

I guess I'm having a hard time understanding if this is really a "false-positive" assert. From the scenario you describe, a byref goes live but we don't tell the GC about it. If there's a GC at the object pointed to is not otherwise rooted, then the GC can collect it, and if we don't report the GC byref, the GC doesn't need to update it. So it seems like it's actually a GC hole.

(How byrefs "appear out of thin air" I guess is not worth considering since this is all unsafe code and the user sure better know what they're doing?)

@saucecontrol
Copy link
Member

saucecontrol commented May 7, 2021

You don't have to be unsafe to create a byref out of thin air. e.g. Span<byte> span = stackalloc byte[64]

Edit: never mind, that could still be a real ref 🤦 , but yeah, we magic byrefs out unmanaged pointers with Span all the time.

@tannergooding
Copy link
Member Author

tannergooding commented May 7, 2021

I guess I'm having a hard time understanding if this is really a "false-positive" assert. From the scenario you describe, a byref goes live but we don't tell the GC about it. If there's a GC at the object pointed to is not otherwise rooted, then the GC can collect it, and if we don't report the GC byref, the GC doesn't need to update it. So it seems like it's actually a GC hole.

AFAICT, this scenario can only happen when going from a pointer to a byref or vice-versa. For the first scenario, going from a pointer to a byref, since pointers are not GC tracked, either the user introduced their own GC-hole (such as from void* Unsafe.AsPointer(ref T) where the ref T is not already pinned) or it is not a real byref and so it doesn't actually matter if its reported, its just good for us to correctly track it. Likewise for the latter, going from a byref to a pointer, this is only valid if the byref is pinned.

Now, I could conceive a possible GC hole in a scenario where you go from pinned byref->pointer->byref and then kill the pinned byref. Imagine for example something like:

var a = new int[30];               // a is a live gcref
ref var x = ref a[0];              // x is a live byref
a = null;                          // a is a dead gcref, the backing array is kept alive due to the x byref

fixed (int* p = &x)                // p is a live pinned byref
{
    x = ref Unsafe.NullRef<int>(); // x is a dead byref, the backing array is kept alive due to the p pinned byref
    int* t = p;                    // t is a pointer, not a pinned byref
    x = ref Unsafe.AsRef<int>(t);  // x is a live byref, it still points to the pinned backing array
} // p is a dead pinned byref, x should be keeping the backing array alive

GC.Collect();
Console.WriteLine(x);

Now, if t is given a register (say edx) and then x is given the same register (also edx) the JIT would elide the move and not update the liveness and so the "last tracked reference" to the created array would go dead at the end of the fixed block. While in practice, it should be kept alive by x since that's a byref and should be considered live.

It's possible that this isn't happening in practice due to register allocation or a number of other scenarios. But I could conceive it being an actual hole in some edge case scenario like this. But maybe someone more knowledgeable here could weigh in.

@AndyAyersMS
Copy link
Member

@BruceForstall I have many of the same concerns from you. It is not easy to understand how a reg-reg mov can be the source of a GC ref. But presumably we're already "handling" this case today in cases where the registers are different, it's just when they are the same that we don't do the GC liveness updates.

Tanner has a preliminary set of changes where we make the GC updates in the same reg-reg mov case, but they're extensive and still don't resolve all the cases of this issue. I've also seen cases where the allocator uses commutativity to reorder a last-use add to a byref and introduces a small window where there is no live byref, and the assert fires (see #10821 (comment)).

So I think the assert is intrinsically flawed and is now giving us a lot of false positive cases and so we should disable it for now as we work on a better solution.

My thinking is that the only comfortable resolution to this problem is to walk back up the data flow, and we'll find either pointer being retyped as a byref or a pin, or else something we can't track further (say an arg); once we have that we can retype the entire lifetime as pointer or byref as appropriate.

My understanding is this sort of "gc rooting" flow analysis is akin to how NUTC did (does) gc reporting, instead of the gc pointer kind distinctions we try and maintain throughout the jit today.

But all that requires dataflow. Figuring out where this can/should happen is not straightforward. Since most of these issues seem to arise from inlining, the natural place for this analysis is perhaps the object stack allocation phase; it happens to already run a very similar analysis since it needs to change GC types.

@tannergooding
Copy link
Member Author

@BruceForstall, @AndyAyersMS; #52661 is an alternative to this.

It centralizes the move management and results in zero PMI diffs for x64 S.P.Corelib (still validating for tests and other platforms). It's relatively small (+854, -820 lines) and ensures that the relevant liveness updates happen so also resolves the assert.

@tannergooding
Copy link
Member Author

tannergooding commented Jun 3, 2021

Closing this in favor of #53684, which is the second part to #52661

@ghost ghost locked as resolved and limited conversation to collaborators Jul 3, 2021
@tannergooding tannergooding deleted the workaround-51728 branch November 11, 2022 15:34
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants