Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A better fix for #66242 #66335

Merged
merged 1 commit into from
Mar 14, 2022
Merged

Conversation

SingleAccretion
Copy link
Contributor

@SingleAccretion SingleAccretion commented Mar 8, 2022

Lowering was moving nodes without checking legality first.

The conservative forward substitution condition was causing some regressions for TYP_STRUCT LCL_FLDs.

Diffs.

Lowering was moving nodes without checking legality first.
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Mar 8, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 8, 2022
@ghost
Copy link

ghost commented Mar 8, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Lowering was moving nodes without checking legality first.

Author: SingleAccretion
Assignees: -
Labels:

area-CodeGen-coreclr, community-contribution

Milestone: -

@SingleAccretion SingleAccretion changed the title Better fix for #66242 A better fix for #66242 Mar 8, 2022
@SingleAccretion SingleAccretion marked this pull request as ready for review March 8, 2022 17:29
@SingleAccretion
Copy link
Contributor Author

CI failure looks like #65914.

@dotnet/jit-contrib

@@ -636,7 +636,7 @@ bool Compiler::fgForwardSubStatement(Statement* stmt)
// If fwdSubNode is an address-exposed local, forwarding it may lose optimizations.
// (maybe similar for dner?)
//
if (fwdSubNode->IsLocal())
if (fwdSubNode->OperIs(GT_LCL_VAR))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the check for GT_LCL_VAR still necessary?

Copy link
Contributor Author

@SingleAccretion SingleAccretion Mar 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above states:

If fwdSubNode is an address-exposed local, forwarding it may lose optimizations.

Forwarding address-exposed things "poisons" the whole tree with GTF_GLOB_REF, and that can have negative downstream consequences.

For local fields we're apparently better off doing it anyway, so I just trust @AndyAyersMS's initial analysis here that for LCL_VAR we're not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I had not noticed that this was not for correctness.

@jakobbotsch
Copy link
Member

/azp run Fuzzlyn

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS
Copy link
Member

I just trust @AndyAyersMS's initial analysis

Probably worth revisiting, sometime?

That check was supposed to be just a heuristic. Didn't realize that we overloaded it correctness.

@SingleAccretion
Copy link
Contributor Author

Probably worth revisiting, sometime?

SPMI diffs on Windows x64 when removing the condition do tend to agree, though there is one regression there that won't be easy to solve.

(Not in this change though)

@AndyAyersMS
Copy link
Member

Seems like the failures could be related.

@SingleAccretion
Copy link
Contributor Author

The x64 Fuzzlyn failure turned out to be unrelated, fix is in #66385.

Looking through the logs of Linux ARM Fuzzlyn failures, I see quite a few LSRA asserts:

Found example with seed 14147555465640047749 that hits error
JIT assert failed:
Assertion failed 'interval->isSpilled' in 'Program:M58(int):S0' during 'LSRA allocate' (IL size 1793; hash 0xbd51a200; FullOpts)

File: /__w/1/s/src/coreclr/jit/lsra.cpp Line: 5924

Found example with seed 18374301469232985840 that hits error
JIT assert failed:
Assertion failed 'inVarToRegMap[varIndex] == REG_STK' in 'Program:M2(byref,byref,C0):long' during 'LSRA allocate' (IL size 1800; hash 0xfa0899f6; FullOpts)

File: /__w/1/s/src/coreclr/jit/lsra.cpp Line: 4050

Found example with seed 14560778836117885446 that hits error
JIT assert failed:
Assertion failed '!foundMismatch' in 'C0:M6(short,bool,byref,byref,C0,long,short):long:this' during 'LSRA allocate' (IL size 1136; hash 0x6a8edd6d; FullOpts)

File: /__w/1/s/src/coreclr/jit/lsra.cpp Line: 8123

00:58:35/01:00:00 elapsed, 4300 programs generated, 2 examples found
Unhandled exception. System.AggregateException: One or more errors occurred. ('F' is an invalid start of a value. Path: $ | LineNumber: 0 | BytePositionInLine: 0.) ('V' is an invalid start of a value. Path: $ | LineNumber: 0 | BytePositionInLine: 1.)
---> System.Text.Json.JsonException: 'F' is an invalid start of a value. Path: $ | LineNumber: 0 | BytePositionInLine: 0.
---> System.Text.Json.JsonReaderException: 'F' is an invalid start of a value. LineNumber: 0 | BytePositionInLine: 0.
at System.Text.Json.ThrowHelper.ThrowJsonReaderException(Utf8JsonReader& json, ExceptionResource resource, Byte nextByte, ReadOnlySpan`1 bytes)

^ Presumably the reason some of the uploading failed

I looked through the logs of the previous Fuzzlyn run, and see the first two are pre-existing, the last one is not however. Unfortunately, the "issues" zips didn't contain the source for the example, so I was not able to investigate further.

@jakobbotsch
Copy link
Member

The ARM32 fuzzlyn runs tend to have its processes killed/die randomly. Unfortunately I have not yet managed to track down why, but I pushed a change that should at least log the malformed response so there should be more information the next time it happens.

Found example with seed 14560778836117885446 that hits error
JIT assert failed:
Assertion failed '!foundMismatch' in 'C0:M6(short,bool,byref,byref,C0,long,short):long:this' during 'LSRA allocate' (IL size 1136; hash 0x6a8edd6d; FullOpts)

I was not able to repro this on my Surface Pro X or my Raspberry Pi 400. Unclear why that would be, perhaps different IR due to CSE of addresses.

@AndyAyersMS
Copy link
Member

I was not able to repro this

Reminds me that I wanted to have some kind of lightweight SPMI always running in CI to catch cases like this.

Maybe you could try full SPMI + Fuzzlyn and get a repro?

@jakobbotsch
Copy link
Member

jakobbotsch commented Mar 10, 2022

Maybe you could try full SPMI + Fuzzlyn and get a repro?

Not a bad idea, however due to #53773 I suppose that might not be enough. But I'll add it to my list of Fuzzlyn-related ideas.
FWIW, I hit the same code path that hits the !foundMismatch assert in my run at https://dev.azure.com/dnceng/public/_build/results?buildId=1653728&view=results, so I don't think it's related to this PR.

@jakobbotsch
Copy link
Member

jakobbotsch commented Mar 14, 2022

Sunday's Fuzzlyn run hit the same three asserts (and now are reporting them correctly), I opened issues for them in #66578, #66579, #66580. This one LGTM, thanks!

@jakobbotsch jakobbotsch merged commit 22a3c4e into dotnet:main Mar 14, 2022
@SingleAccretion SingleAccretion deleted the Better-Fix-For-66242 branch March 14, 2022 10:29
@ghost ghost locked as resolved and limited conversation to collaborators Apr 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants