JIT: fix bug where a gc struct is not zero initialized #67825
This fixes an unexpected interaction between the zero-init optimization and dead stores.
We have a local gc struct that is tracked but not promoted (and so on the frame) with an explicit zero init.
The zero-init opt determines that in-prolog init is not needed because the local is tracked and has a live explicit initializer. So it marks the local as `lvHasExplicitInit`. But subsequent control flow optimizations end up making the explicit zero init dead, and it is removed by dead stores.
Later on when we report GC info for the struct we report it as untracked. This leads to the GC seeing an uninitialized stack slot as a GC ref.
The fix is to inhibit dead stores of zero initializers for `lvHasExplicitInit` (restricted to GC locals with multiple references).
[some alternative fixes were considered, see notes in the PR].
Fixes dotnet#65694.
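The phase ordering described above can be sketched with a small toy model. This is not JIT code: `ToyLocal`, `ToyStore`, `prologMustZero`, `removeDeadStore`, and `slotGetsZeroed` are hypothetical names invented for illustration, and the model assumes the only two things that can zero the slot are the prolog and the explicit initializer.

```cpp
#include <cassert>

// Toy model only: these types are hypothetical and are not the JIT's data structures.
struct ToyLocal {
    bool hasGcFields     = true;  // struct on the frame that contains GC references
    bool hasExplicitInit = false; // stands in for lvHasExplicitInit
    unsigned refCount    = 2;     // stands in for lvRefCnt()
};

struct ToyStore {
    bool isZeroInit = true;  // the store writes zeros over the whole local
    bool isDead     = false; // liveness found no later read of the stored value
};

// Zero-init optimization: a live explicit zero init lets the prolog skip zeroing.
// Returns true when the prolog still has to zero the local.
bool prologMustZero(ToyLocal& local, const ToyStore& store) {
    if (store.isZeroInit && !store.isDead) {
        local.hasExplicitInit = true;
        return false;
    }
    return true;
}

// Dead-store removal. With applyFix set, a dead zero init of an explicit-init
// GC local that has other references is kept. Returns true if the store is removed.
bool removeDeadStore(const ToyLocal& local, const ToyStore& store, bool applyFix) {
    if (!store.isDead) return false;
    const bool keep = applyFix && local.hasExplicitInit && local.refCount > 1 &&
                      store.isZeroInit && local.hasGcFields;
    return !keep;
}

// Runs the phases in order and reports whether anything still zeroes the slot.
bool slotGetsZeroed(bool applyFix) {
    ToyLocal local;
    ToyStore store;
    const bool prologZeroes = prologMustZero(local, store);
    assert(!prologZeroes); // the explicit init was live when the decision was made
    store.isDead = true;   // later control flow optimizations make the init dead
    const bool removed = removeDeadStore(local, store, applyFix);
    return prologZeroes || !removed;
}
```

Calling `slotGetsZeroed(false)` models the bug (nothing zeroes the slot), while `slotGetsZeroed(true)` models the guarded dead-store pass. The sketch deliberately ignores where in the method body the surviving store sits.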
We will want to backport this to .NET 6.
@BruceForstall PTAL
There are a few problematic aspects. First is that we can do untracked GC reporting for a tracked local. Second is that, because of optimizations, liveness can change, so deductions and expectations based on earlier phases might end up getting broken.
In this case the problematic control flow comes from
`if (_d?.TryGetValue(k, out p) == true && (p.x == 33))`
Initially the jit sees a path where [...]
The control flow snippet below is from the original repro case. The call to [...]
But it turns out that taking the BB94->BB95 path will then result in always taking the BB95->BB102 path; the jit proves this during subsequent optimization phases and removes the path. As a result the upstream zero initializer becomes dead.
Some alternative fixes I considered: [...]
Diffs with this change are minimal -- just one method has diffs in SPMI, and the case with diffs does not have the bug (instead it has a local struct with multiple dead references). Likely the odd control flow setup here is somewhat rare.
// If this is a zero init of an explicit zero init gc local
// that has at least one other reference, we will keep the zero init.
//
Maybe expand a bit on why? This just seems to be a restatement of the code below.
Arm64 failures: #67821 and one run that failed with no progress on DDARM64-046.
Have you considered a solution where you mark an explicit zero initialization with some sort of side effect to prevent its removal if zero-init optimization makes a decision based on the presence of the explicit zero init?
Or will that still lead to a problem if we report as untracked?
I think what you have is a good solution.
Yeah, that would be more surgical, but I'm not confident I can find a suitable flag -- we seem to widely assume stores to locals can't have operator/lhs side effects. I suppose I could also insert a fake use à la keep alive.
Since the JIT decided to report the struct as untracked (it would be useful to understand under what conditions this happens), doesn't that invalidate the zeroing optimization assumptions that allowed it to use the in-body zeroing in the first place? I.e., even if you leave the dead in-body zeroing, isn't there a range between the end of the prolog and the in-body zeroing where the struct local is uninitialized and GC could occur? I guess [...]
runtime/src/coreclr/jit/gcencode.cpp Lines 4122 to 4260 in aff3c18
Here we have TYP_STRUCT, which is not a gc type, so we end up at line 4217 and report the GC fields of the struct as untracked.
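As a rough illustration of the reporting decision described here, the following is a simplified sketch and not the logic in `gcencode.cpp`. `ToyType`, `GcReport`, `isGcType`, and `classifyFrameLocal` are hypothetical names, and the sketch assumes only three outcomes: no report, a tracked lifetime, or an untracked slot.

```cpp
#include <cassert>

// Hypothetical stand-ins; the real logic lives in gcencode.cpp and is more involved.
enum class ToyType { Ref, Byref, Struct, Int };
enum class GcReport { None, TrackedLifetime, UntrackedSlot };

// Only object references and byrefs count as GC types; a struct does not,
// even when it contains GC fields.
bool isGcType(ToyType t) { return t == ToyType::Ref || t == ToyType::Byref; }

GcReport classifyFrameLocal(ToyType type, bool isTracked, bool structHasGcFields) {
    // structHasGcFields is only meaningful for struct locals.
    assert(type == ToyType::Struct || !structHasGcFields);
    if (isGcType(type)) {
        return isTracked ? GcReport::TrackedLifetime : GcReport::UntrackedSlot;
    }
    if (type == ToyType::Struct && structHasGcFields) {
        // The struct is not a GC type, so its GC fields are reported as
        // untracked even if the local itself is tracked.
        return GcReport::UntrackedSlot;
    }
    return GcReport::None;
}
```

Under these assumptions, `classifyFrameLocal(ToyType::Struct, true, true)` yields `GcReport::UntrackedSlot`, which is the combination (tracked local, untracked reporting) that the discussion identifies as problematic.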
Not sure. Empirically that doesn't seem to be the case for partially interruptible methods (if so, the repro above should fail under GCStress=3, but it passes). The zero init opt "remove prolog" flavor won't kick in for fully interruptible methods.
I asked both Jans about this and neither of them thought offset 0 was special either.
const LclVarDsc& varDsc = lvaTable[node->AsLclVarCommon()->GetLclNum()];
const bool isExplicitInitLocal = varDsc.lvHasExplicitInit;
const bool isReferencedLocal = varDsc.lvRefCnt() > 1;
const bool isZeroInit = store->OperIsInitBlkOp();
Don't you also need to check that the init value is actually zero if you want to avoid deleting only zero inits? E.g., check IsConstInitVal and then the constant init value?
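A sketch of the stricter guard this comment suggests might look like the following. It does not use the JIT's node types: `ToyInitStore`, `ToyLocalInfo`, `isZeroInitStore`, and `shouldKeepDeadStore` are hypothetical, and `hasGcFields` is assumed to play the role of the `isGCInit` condition in the PR, whose definition is not shown on this page.

```cpp
#include <cassert>

// Hypothetical, simplified views of a block-init store and its target local.
struct ToyInitStore {
    bool isInitBlk;          // the store is a block initialization
    bool valueIsConstant;    // the fill value is a compile-time constant
    long long constantValue; // the fill value, valid when valueIsConstant is true
};

struct ToyLocalInfo {
    bool hasExplicitInit; // stands in for lvHasExplicitInit
    unsigned refCount;    // stands in for lvRefCnt()
    bool hasGcFields;     // assumed meaning of the PR's isGCInit condition
};

// Stricter than "is a block init": the fill value must be a known zero.
bool isZeroInitStore(const ToyInitStore& store) {
    return store.isInitBlk && store.valueIsConstant && store.constantValue == 0;
}

// A dead store is kept only when every condition from the PR holds.
bool shouldKeepDeadStore(const ToyLocalInfo& local, const ToyInitStore& store) {
    return local.hasExplicitInit && local.refCount > 1 && local.hasGcFields &&
           isZeroInitStore(store);
}

// Small usage example exercising the zero and non-zero fill cases.
void toyGuardExample() {
    const ToyLocalInfo local{true, 2, true};
    assert(shouldKeepDeadStore(local, ToyInitStore{true, true, 0}));
    assert(!shouldKeepDeadStore(local, ToyInitStore{true, true, 0xFF}));
}
```

With a guard of this shape, a dead block init that fills the local with a non-zero constant would still be removed, whereas a check on the block-init operator alone would keep it.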
Couldn't this problem occur for a normal untracked ref/byref lclVar and not just for struct field gc refs reported as untracked to the gc?
I.e., should we stop marking all gc vars or structs with gc fields as explicit init in `optRemoveRedundantZeroInits`?
> Don't you also need to check the init value is actually zero
I suppose so, but this kicks in so rarely it won't matter in practice.
> Couldn't this problem occur for normal untracked ref/byref lclVar
We won't have liveness info for these, so we won't dead store them.
data->SetUnusedValue();
if (data->isIndir())
if (isExplicitInitLocal && isReferencedLocal && isZeroInit && isGCInit)
We don't know if this is the first dead store. E.g., what if the lclVar has multiple zeroing stores that are dead? All of them will be kept even though (probably) only the first was the "explicit init" one as determined by `optRemoveRedundantZeroInits`.
Right, if there are multiple dead stores we will keep them all.
@erozenfeld were you going to take another look?
Yeah, I'll take another look tomorrow.
I reviewed this again and I think this is the right fix.
Thanks @erozenfeld.
/backport to release/6.0
Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/2169354977
@AndyAyersMS backporting to release/6.0 failed, the patch most likely resulted in conflicts:
$ git am --3way --ignore-whitespace --keep-non-patch changes.patch
Applying: JIT: fix bug where a gc struct is not zero initialized
.git/rebase-apply/patch:131: trailing whitespace.
public static void F()
.git/rebase-apply/patch:164: new blank line at EOF.
+
warning: 2 lines add whitespace errors.
Using index info to reconstruct a base tree...
M src/coreclr/jit/liveness.cpp
M src/coreclr/jit/optimizer.cpp
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/jit/optimizer.cpp
CONFLICT (content): Merge conflict in src/coreclr/jit/optimizer.cpp
Auto-merging src/coreclr/jit/liveness.cpp
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 JIT: fix bug where a gc struct is not zero initialized
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128
Please backport manually!
Port of dotnet#67825 to release/6.0.
This fixes an unexpected interaction between the zero-init optimization and
dead stores.
We have a local gc struct that is tracked but not promoted (and so on the
frame) with an explicit zero init.
The zero-init opt determines that in-prolog init is not needed because the
local is tracked and has a live explicit initializer. So it marks the local
as `lvHasExplicitInit`. But subsequent control flow optimizations end up
making the explicit zero init dead, and it is removed by dead stores.
Later on when we report GC info for the struct we report it as untracked. This
leads to the GC seeing an uninitialized stack slot as a GC ref.
The fix is to inhibit dead stores of zero initializers for
`lvHasExplicitInit` (restricted to GC locals with multiple references).
[some alternative fixes were considered, see notes in the PR].
Addresses #65694 (needs backporting for a full fix).