
Occasionally hitting error MSB6006: "csc.dll" exited with code 139 on linux (w/ GCDynamicAdaptationMode=1) #104123

Closed
LoopedBard3 opened this issue Jun 27, 2024 · 42 comments · Fixed by #105551
Assignees
Labels
area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), blocking-release, tenet-reliability (Reliability/stability related issue (stress, load problems, etc.))
Milestone

Comments

@LoopedBard3
Member

Description

In the dotnet-runtime-perf pipeline, we are seeing multiple Linux jobs hit the error dotnet/x64/sdk/9.0.100-preview.7.24323.5/Roslyn/Microsoft.CSharp.Core.targets(85,5): error MSB6006: "csc.dll" exited with code 139. when building our MicroBenchmarks.csproj file for BDN testing. The error occurs on 0-3 of the 30 Helix work items we send out for each job, with no consistency in which of the 30 work items is affected or which agent machine hits the error. I'm pretty sure I have a core dump from some of these failed runs if that would be useful.

Potentially related to: #57558

Reproduction Steps

This needs more testing, but the following should work for reproducing the error, though as mentioned in the description, hitting it is not consistent.

Steps (high level):

  1. Clone dotnet/performance.
  2. From the top level of the performance repo, run python3 ./scripts/benchmarks_ci.py --csproj ./src/benchmarks/micro/MicroBenchmarks.csproj --incremental no --architecture x64 -f net9.0 --dotnet-versions 9.0.100-preview.6.24320.9 --bdn-arguments="--anyCategories Libraries Runtime --logBuildOutput --generateBinLog --partition-count 30 --partition-index 29"
  3. If the BDN tests start running successfully, you did not hit the error.

Steps (inner command; this should match, but ping me if it seems to be missing a step; a consolidated shell sketch follows the list):

  1. Clone dotnet/performance.
  2. Install a dotnet version equal to or newer than 9.0.100-preview.6.24320.9 with dotnet-install.sh: dotnet-install.sh -InstallDir ./performance/tools/dotnet/x64 -Architecture x64 -Version 9.0.100-preview.6.24320.9
  3. From the top level of the performance repo, run dotnet run --project ./src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net9.0 --no-restore --no-build -- --anyCategories Libraries Runtime "" --logBuildOutput --generateBinLog --partition-count 30 --partition-index 29 --artifacts ./artifacts/BenchmarkDotNet.Artifacts --packages ./artifacts/packages --buildTimeout 1200
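
For convenience, the three inner-command steps collapse into roughly the following shell sketch. It mirrors the steps as written (so it may share whatever missing step is noted above); the DOTNET_ROOT/PATH exports are my assumption about how the freshly installed SDK gets picked up, not part of the original steps.

# Rough consolidation of the inner-command repro steps (bash).
# dotnet-install.sh can be fetched from https://dot.net/v1/dotnet-install.sh if it is not already on hand.
git clone https://github.com/dotnet/performance.git
./dotnet-install.sh -InstallDir ./performance/tools/dotnet/x64 -Architecture x64 -Version 9.0.100-preview.6.24320.9
export DOTNET_ROOT="$PWD/performance/tools/dotnet/x64"   # assumption: use the just-installed SDK
export PATH="$DOTNET_ROOT:$PATH"
cd performance
dotnet run --project ./src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net9.0 --no-restore --no-build -- --anyCategories Libraries Runtime "" --logBuildOutput --generateBinLog --partition-count 30 --partition-index 29 --artifacts ./artifacts/BenchmarkDotNet.Artifacts --packages ./artifacts/packages --buildTimeout 1200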

Expected behavior

Build is successful and continues to run the BenchmarkDotNet tests.

Actual behavior

The build fails

dotnet build /home/helixbot/work/B45E09D9/w/AC2C09AE/e/performance/src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net9.0 --no-restore /p:NuGetPackageRoot=/home/helixbot/work/B45E09D9/w/AC2C09AE/e/performance/artifacts/packages /p:RestorePackagesPath=/home/helixbot/work/B45E09D9/w/AC2C09AE/e/performance/artifacts/packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1
   Reporting -> /home/helixbot/work/B45E09D9/w/AC2C09AE/e/performance/artifacts/bin/Reporting/Release/netstandard2.0/Reporting.dll
   BenchmarkDotNet.Extensions -> /home/helixbot/work/B45E09D9/w/AC2C09AE/e/performance/artifacts/bin/BenchmarkDotNet.Extensions/Release/netstandard2.0/BenchmarkDotNet.Extensions.dll
/home/helixbot/work/B45E09D9/w/AC2C09AE/e/performance/tools/dotnet/x64/sdk/9.0.100-preview.7.24323.5/Roslyn/Microsoft.CSharp.Core.targets(85,5): error MSB6006: "csc.dll" exited with code 139. 

Full logs from an example run with the error are available: dotnet-runtime-perf Run 20240620.3. The specific partitions are Partition 2 and Partition 6 from the job 'Performance linux x64 release coreclr JIT micro perfowl NoJS False False False net9.0'.

Regression?

This started occurring between our runs dotnet-runtime-perf Run 20240620.2 and dotnet-runtime-perf Run 20240620.3.

The runtime repo comparison between these two jobs is 4a7fe65...b0c4728.
Our performance repo also took one update, but it seems highly unlikely to be related: dotnet/performance#4279.
Version difference information available in the information section below.

Known Workarounds

None

Configuration

.NET Version information:
Information from first run with error dotnet-runtime-perf Run 20240620.3:

$ dotnet --info
.NET SDK:
 Version:           9.0.100-preview.6.24320.9
 Commit:            7822425c3e
 Workload version:  9.0.100-manifests.cc027b4d
 MSBuild version:   17.11.0-preview-24318-05+4a45d5633

Runtime Environment:
 OS Name:     ubuntu
 OS Version:  22.04
 OS Platform: Linux
 RID:         linux-x64
 Base Path:   <Path>/performance/tools/dotnet/x64/sdk/9.0.100-preview.6.24320.9/

.NET workloads installed:
Configured to use loose manifests when installing new manifests.
There are no installed workloads to display.

Host:
  Version:      9.0.0-preview.6.24319.11
  Architecture: x64
  Commit:       static

.NET SDKs installed:
  9.0.100-preview.6.24320.9 [<Path>/performance/tools/dotnet/x64/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 9.0.0-preview.6.24320.4 [<Path>/performance/tools/dotnet/x64/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 9.0.0-preview.6.24319.11 [<Path>/performance/tools/dotnet/x64/shared/Microsoft.NETCore.App]

Information from run before error dotnet-runtime-perf Run 20240620.2:

$ dotnet --info
.NET SDK:
 Version:           9.0.100-preview.6.24319.5
 Commit:            f3ebfb5ccb
 Workload version:  9.0.100-manifests.bae61ee5
 MSBuild version:   17.11.0-preview-24318-02+0a3683cf7

Runtime Environment:
 OS Name:     ubuntu
 OS Version:  22.04
 OS Platform: Linux
 RID:         linux-x64
 Base Path:   <path>/performance/tools/dotnet/x64/sdk/9.0.100-preview.6.24319.5/

.NET workloads installed:
Configured to use loose manifests when installing new manifests.
There are no installed workloads to display.

Host:
  Version:      9.0.0-preview.6.24307.2
  Architecture: x64
  Commit:       static

.NET SDKs installed:
  9.0.100-preview.6.24319.5 [<path>/performance/tools/dotnet/x64/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 9.0.0-preview.6.24309.2 [<path>/performance/tools/dotnet/x64/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 9.0.0-preview.6.24307.2 [<path>/performance/tools/dotnet/x64/shared/Microsoft.NETCore.App]

This is happening across multiple different machine hardware configurations.

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jun 27, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jun 27, 2024
@jkotas
Member

jkotas commented Jun 27, 2024

I'm pretty sure I have a core dump from some of these failed runs if that would be useful.

Yes, that would be useful. Are you able to extract the stack traces from the core dumps? It would help with routing this issue.

(https://learn.microsoft.com/en-us/troubleshoot/developer/webapps/aspnetcore/practice-troubleshoot-linux/lab-1-2-analyze-core-dumps-lldb-debugger has the steps.)
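
For reference, one minimal way to pull a stack out of one of those Linux core dumps is with the dotnet-dump global tool (a sketch; the dump file path is a placeholder, and lldb per the linked lab works just as well):

# Sketch: inspect a Linux core dump with dotnet-dump (dump file path is a placeholder).
dotnet tool install --global dotnet-dump
dotnet-dump analyze ./coredump.12345
# At the analyze prompt:
#   clrstack -all    (managed stacks for all threads)
#   verifyheap       (check GC heap consistency)
#   exit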

@jkotas jkotas added area-VM-coreclr tenet-reliability Reliability/stability related issue (stress, load problems, etc.) and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jun 28, 2024
@jkotas
Member

jkotas commented Jun 28, 2024

Example of a crash: https://dev.azure.com/dnceng/internal/_build/results?buildId=2478580&view=ms.vss-test-web.build-test-results-tab&runId=53769489&resultId=100053&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab

Crash during GC at:

0:000> k
 # Child-SP          RetAddr               Call Site
00 (Inline Function) --------`--------     libcoreclr!MethodTable::GetFlag [/__w/1/s/src/coreclr/vm/methodtable.h @ 3655] 
01 (Inline Function) --------`--------     libcoreclr!MethodTable::HasComponentSize [/__w/1/s/src/coreclr/vm/../gc/gcinterface.h @ 1699] 
02 (Inline Function) --------`--------     libcoreclr!SVR::my_get_size+0x7 [/crossrootfs/x64/usr/include/stdint.h @ 11552] 
03 (Inline Function) --------`--------     libcoreclr!SVR::gc_heap::add_to_promoted_bytes+0x7 [/crossrootfs/x64/usr/include/stdint.h @ 26299] 
04 00007a14`cb50b840 00007a15`58e5ad7a     libcoreclr!SVR::gc_heap::mark_object_simple1+0xab7 [/crossrootfs/x64/usr/include/stdint.h @ 27115] 
05 00007a14`cb50b8d0 00007a15`58e61cf0     libcoreclr!SVR::gc_heap::mark_object_simple+0x30a [/__w/1/s/src/coreclr/gc/gc.cpp @ 15732480] 
06 (Inline Function) --------`--------     libcoreclr!SVR::gc_heap::mark_through_cards_helper+0xba [/__w/1/s/src/coreclr/gc/gc.cpp @ 41065] 
07 00007a14`cb50b940 00007a15`58e4b6d4     libcoreclr!SVR::gc_heap::mark_through_cards_for_uoh_objects+0xbd0 [/crossrootfs/x64/usr/include/stdint.h @ 46548] 
08 00007a14`cb50ba90 00007a15`58e45949     libcoreclr!SVR::gc_heap::mark_phase+0xe94 [/__w/1/s/src/coreclr/gc/gc.cpp @ 29669] 
09 00007a14`cb50bb70 00007a15`58e2b465     libcoreclr!SVR::gc_heap::gc1+0x2c9 [/__w/1/s/src/coreclr/gc/gc.cpp @ 15732480] 
0a 00007a14`cb50bc40 00007a15`58e27e8d     libcoreclr!SVR::gc_heap::garbage_collect+0xa85 [/__w/1/s/src/coreclr/gc/gc.cpp @ 24361] 
0b 00007a14`cb50bce0 00007a15`58e26906 (T) libcoreclr!SVR::gc_heap::gc_thread_function+0x157d [/__w/1/s/src/coreclr/gc/gc.cpp @ 7175] 
0c 00007a14`cb50bd60 00007a15`58d4583e     libcoreclr!SVR::gc_heap::gc_thread_stub+0x31 [/__w/1/s/src/coreclr/gc/gc.cpp @ 37262] 

The GC heap is corrupted:

0:000> !verifyheap
*** WARNING: Unable to verify timestamp for doublemapper (deleted)
Heap Segment          Object           Failure                          Reason
1    79d443483540     79d4dc35f8f0     InvalidObjectReference           Object 79d4dc35f8f0 has a bad member at offset 8: 79d4e0600a98
3    79d443483de0     79d4df004068     InvalidObjectReference           Object 79d4df004068 has a bad member at offset 10: 79d4e0600a98
3    79d443483de0     79d4df0040b8     InvalidObjectReference           Object 79d4df0040b8 has a bad member at offset 8: 79d4e0600a98

@jkotas
Member

jkotas commented Jun 28, 2024

This is likely a duplicate of #102919, fixed by #103301.

@jkotas
Member

jkotas commented Jun 28, 2024

@LoopedBard3 Could you please let us know whether you still see it crashing after picking up a build that includes #103301?

@jkotas jkotas added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jun 28, 2024
@LoopedBard3
Member Author

Yup, I'll watch to see if the update fixes the issue 👍.

@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jun 28, 2024
@jkotas jkotas added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jun 28, 2024
@LoopedBard3
Member Author

LoopedBard3 commented Jul 1, 2024

Looking at one of the recent failing runs, #103301 does not seem to have fixed the issue. The SDK version used in this recent build that still hit the failure had commit dotnet/sdk@e18cfb7 and a Microsoft.NETCore.App.Ref commit of a900bbf (from Version.Details.xml#L19-L20). If there is a different version/link I should be looking at to make sure we have the update, let me know.

@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jul 1, 2024
@jkotas
Member

jkotas commented Jul 1, 2024

Looking at one of the recent failing runs, #103301

Would it be possible to set the DOTNET_GCDynamicAdaptationMode=0 environment variable in your build and see whether it still reproduces the crashes? It would be a very useful data point for us.
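
For anyone trying the same experiment, a minimal sketch of what that looks like in a shell (the build command is the one from the failure log above, with the machine-specific paths trimmed, which is my simplification):

# Sketch: disable DATAS via the environment before the build that crashes (bash).
export DOTNET_GCDynamicAdaptationMode=0
dotnet build ./src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net9.0 --no-restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1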

@jkotas jkotas added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jul 1, 2024
@JulieLeeMSFT
Member

@LoopedBard3
Member Author

LoopedBard3 commented Jul 3, 2024

Looking at one of the recent failing runs, #103301

Would it be possible to set the DOTNET_GCDynamicAdaptationMode=0 environment variable in your build and see whether it still reproduces the crashes? It would be a very useful data point for us.

I ran a test building both with and without the env var set here: https://dev.azure.com/dnceng/internal/_build/results?buildId=2487207&view=results; jobs with gcdynamicadaptationmodeoff in the name have DOTNET_GCDynamicAdaptationMode=0 set. This is only one test, but turning off GCDynamicAdaptationMode does seem to have fixed the issue, as all three jobs with the env var set succeeded while the other three failed.

@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jul 3, 2024
@LoopedBard3 LoopedBard3 removed their assignment Jul 3, 2024
@jkotas jkotas changed the title Occasionally hitting error MSB6006: "csc.dll" exited with code 139 on linux Occasionally hitting error MSB6006: "csc.dll" exited with code 139 on linux (w/ GCDynamicAdaptationMode=1) Jul 3, 2024
@jkotas
Member

jkotas commented Jul 3, 2024

@dotnet/gc Could you please take a look, given that this crash does not repro with DATAS disabled?

Note that these builds run on machines with many cores. That may explain why we do not see more instances of this crash.

I have looked at a number of the crash dumps. The only common pattern I have observed was that DATAS scaled up the number of GC heaps multiple times, but the nature of the GC heap corruption was very different each time.

@mangod9
Member

mangod9 commented Jul 3, 2024

Yeah we will take a look. @LoopedBard3, does this repro locally for you or only on the build machines?

@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Jul 3, 2024
@mangod9 mangod9 added this to the 9.0.0 milestone Jul 3, 2024
@LoopedBard3
Member Author

Yeah we will take a look. @LoopedBard3, does this repro locally for you or only on the build machines?

I have not tried locally yet as I don't have a personal dedicated Linux box, but I can see if I can get it to repro manually on one of the pipeline runners as they are actual machines.

@mrsharm mrsharm reopened this Jul 26, 2024
@jakobbotsch
Member

@mrsharm I reproed it using your instructions and script (thanks!) and looked at the dump. I think it is the same issue -- I see a call from Microsoft.CodeAnalysis.Diagnostics.AnalyzerExecutor.ExecuteAndCatchIfThrows_NoLock into a prejitted method Microsoft.CodeAnalysis.Diagnostics.AnalyzerExecutor+<>c.<ExecuteSymbolActions>b__45_1 that has a tailcall in it when going through the stack trace. The full workaround needs DOTNET_ReadyToRun=0 as well:

DOTNET_ReadyToRun=0
DOTNET_TailCallOpt=0

I have not been able to repro it with these yet. Sadly DOTNET_ReadyToRun=0 is quite a big hammer and I expect it is going to cause some significant degradation in build speed if we have to apply that in CI.
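
Expressed as commands, the full workaround would be along these lines (a sketch; it uses the MicroBenchmarks build from the issue description as the example, with machine-specific paths trimmed, and as noted, DOTNET_ReadyToRun=0 will slow the build noticeably):

# Sketch: apply the full workaround before rerunning the same build/repro command (bash).
export DOTNET_ReadyToRun=0
export DOTNET_TailCallOpt=0
dotnet build ./src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net9.0 --no-restore /p:UseSharedCompilation=false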

@cshung
Member

cshung commented Jul 26, 2024

@jakobbotsch
Can the bug lead to a reference to an instance of Microsoft.CodeAnalysis.CSharp.Symbols.PublicModel.FieldSymbol not being reported to the GC, so that the object got moved but the reference was not relocated, leaving a stale reference in some ConcurrentDictionary?

hoyosjs pushed a commit that referenced this issue Jul 26, 2024
…calls in face of bulk copy with write barrier calls (#105572)

* JIT: Fix placement of `GT_START_NOGC` for tailcalls in face of bulk copy with write barrier calls

When the JIT generates code for a tailcall it must generate code to
write the arguments into the incoming parameter area. Since the GC ness
of the arguments of the tailcall may not match the GC ness of the
parameters, we have to disable GC before we start writing these. This is
done by finding the earliest `GT_PUTARG_STK` node and placing the start
of the NOGC region right before it.

In addition, there is logic to take care of potential overlap between
the arguments and parameters. For example, if the call has an operand
that uses one of the parameters, then we must take care that we do not
override that parameter with the tailcall argument before the use of it.
To do so, we sometimes may need to introduce copies from the parameter
locals to locals on the stack frame.

This used to work fine, however, with #101761 we started transforming
block copies into managed calls in certain scenarios. It was possible
for the JIT to decide to introduce a copy to a local and for this
transformation to then kick in. This would cause us to end up with the
managed helper call after starting the nogc region. In checked builds
this would hit an assert during GC scan; in release builds, it would end
up with corrupted data.

The fix here is to make sure we insert the `GT_START_NOGC` after all the
potential temporary copies we may introduce as part of the tailcall
logic.

There was an additional assumption that the first `PUTARG_STK` operand
was the earliest one in execution order. That is not guaranteed, so this
change stops relying on that as well by introducing a new
`LIR::FirstNode` and using that to determine the earliest `PUTARG_STK`
node.

Fix #102370
Fix #104123
Fix #105441
---------

Co-authored-by: Jakob Botsch Nielsen <[email protected]>
@mrsharm
Member

mrsharm commented Jul 27, 2024

As a heads up, I have also been running the repro with both:

DOTNET_ReadyToRun=0
DOTNET_TailCallOpt=0

for the past 2-3 hours and so far, we haven't observed a repro.

@jakobbotsch
Member

@jakobbotsch Can the bug lead to a reference to an instance of Microsoft.CodeAnalysis.CSharp.Symbols.PublicModel.FieldSymbol not being reported to the GC, so that the object got moved but the reference was not relocated, leaving a stale reference in some ConcurrentDictionary?

I am not entirely sure what the result is in release builds when the VM comes across the IP inside the nogc region (@VSadov can tell us) -- however, my understanding is that this results in generic GC hole-like behavior, with certain GC references not being updated during relocation. If so, then it can definitely result in what you say and more general arbitrary heap corruption.

@VSadov
Member

VSadov commented Jul 27, 2024

I am not entirely sure what the result is in release builds when the VM comes across the IP inside the nogc region (@VSadov can tell us) -- however, my understanding is that this results in generic GC hole-like behavior,

That is correct.
We will not stop a thread for GC if its leaf method frame is in a no-GC region, but if any of the upper frames are in no-GC regions (i.e., calls to managed methods were made inside a no-GC region), we would not know that, and that should not happen. If it does happen, we may not be able to correctly report GC roots for those call frames when we do stack walks for GC purposes, since a no-GC region typically exists precisely because such reporting cannot be done reliably for locations within that instruction range.

Incorrect root reporting will likely lead to heap corruption: something may get collected while still reachable, something may get moved without the reference being updated, ... and sometimes you may get lucky and nothing wrong will happen.

In a few cases in checked builds this will cause asserts in GcInfoDecoder::EnumerateLiveSlots, but even in checked builds we may not always detect this situation and would just trust that all call sites are GC-safe points.

@jakobbotsch
Member

If it happens we may not be able to correctly report GC roots for those call frames when we do stackwalks for GC purposes since typically No-GC region is there for a reason that such reporting cannot be done reliably for locations within particular instruction range.

In this particular case it's not that the GC reporting from this location is problematic or that the GC information is wrong, so I think the VM must be skipping the reporting entirely in release builds when it comes across this situation. Otherwise I think we wouldn't have seen the issue here.

The fix does not change any GC information reporting, it just starts the nogc region from a later location.

@AndyAyersMS
Member

Would it make sense for the runtime to handle this situation (non-leaf frame at a non-gc safe point during GC) as a fatal error?

@VSadov
Member

VSadov commented Jul 28, 2024

Only fully interruptible methods keep the interruptibility info. And this is a case that we can detect and currently assert.
I wonder if the scenario is easier to detect at JIT time.

@mrsharm
Member

mrsharm commented Jul 30, 2024

@LoopedBard3 - feel free to change the tags and assignees, but this doesn't seem like a GC-related issue.

@mrsharm mrsharm added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-GC-coreclr labels Jul 30, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@LoopedBard3
Member Author

LoopedBard3 commented Jul 30, 2024

Sounds good. The error seems to no longer be occurring in our pipeline, or at least far less often. A new error with the same exit code does seem to be happening in our wasm runs though:

/home/helixbot/work/A2290968/w/9DE2085C/e/performance/artifacts/packages/microsoft.net.illink.tasks/9.0.0-rc.1.24377.4/build/Microsoft.NET.ILLink.targets(143,5): error MSB6006: "dotnet" exited with code 139. [/home/helixbot/work/A2290968/w/9DE2085C/e/performance/artifacts/bin/for-running/MicroBenchmarks/Job-ALGJUV/BenchmarkDotNet.Autogenerated.csproj]
/home/helixbot/work/A2290968/w/9DE2085C/e/performance/artifacts/packages/microsoft.net.illink.tasks/9.0.0-rc.1.24377.4/build/Microsoft.NET.ILLink.targets(96,5): error NETSDK1144: Optimizing assemblies for size failed. [/home/helixbot/work/A2290968/w/9DE2085C/e/performance/artifacts/bin/for-running/MicroBenchmarks/Job-ALGJUV/BenchmarkDotNet.Autogenerated.csproj]

The error seems to have last been a primary problem in build 20240727.1, with the following build, 20240727.2, no longer hitting the issue.

The compare range for these builds is 7e429c2...dc7d7bc, but that doesn't show any obvious runtime change that would account for the fix.

The dotnet sdk versions for each run were:
broken:

.NET SDK:
Version:           9.0.100-rc.1.24377.4
Commit:            74dafbfb0c
Workload version:  9.0.100-manifests.54e3c8aa
MSBuild version:   17.12.0-preview-24376-06+59c2ff861

fixed:

.NET SDK:
Version:           9.0.100-rc.1.24377.5
Commit:            1b87a11061
Workload version:  9.0.100-manifests.66cfb043
MSBuild version:   17.12.0-preview-24376-06+59c2ff861

@ellahathaway
Member

ellahathaway commented Jul 31, 2024

Just encountered this while source-building preview 7 after rebootstrapping to consume #105572. The failure occurred in this build:

/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(469,5): error : Unhandled exception. ILCompiler.CodeGenerationFailedException: Code generation failed for method '[System.Reflection.Emit]System.Reflection.Emit.MethodBuilderInstantiation.get_MemberType()' [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]
/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(469,5): error :  ---> System.NullReferenceException: Object reference not set to an instance of an object. [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]
/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(469,5): error :    at Internal.JitInterface.CorInfoImpl._beginInlining(IntPtr thisHandle, IntPtr* ppException, CORINFO_METHOD_STRUCT_* inlinerHnd, CORINFO_METHOD_STRUCT_* inlineeHnd) in /_/src/runtime/src/coreclr/tools/Common/JitInterface/CorInfoImpl_generated.cs:line 154 [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]
/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(469,5): error :    --- End of inner exception stack trace --- [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]
/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(469,5): error :    at Internal.JitInterface.CorInfoImpl.CompileMethodInternal(IMethodNode methodCodeNodeNeedingCode, MethodIL methodIL) in /_/src/runtime/src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs:line 381 [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]
/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(469,5): error :    at Internal.JitInterface.CorInfoImpl.CompileMethod(MethodWithGCInfo methodCodeNodeNeedingCode, Logger logger) in /_/src/runtime/src/coreclr/tools/aot/ILCompiler.ReadyToRun/JitInterface/CorInfoImpl.ReadyToRun.cs:line 810 [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]
/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(469,5): error :    at ILCompiler.ReadyToRunCodegenCompilation.<>c__DisplayClass50_0.<ComputeDependencyNodeDependencies>g__CompileOneMethod|5(DependencyNodeCore`1 dependency, Int32 compileThreadId) in /_/src/runtime/src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/ReadyToRunCodegenCompilation.cs:line 898 [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]
/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(469,5): error :    at ILCompiler.ReadyToRunCodegenCompilation.<>c__DisplayClass50_0.<ComputeDependencyNodeDependencies>g__CompileOnThread|4(Int32 compilationThreadId) in /_/src/runtime/src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/ReadyToRunCodegenCompilation.cs:line 833 [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]
/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(469,5): error :    at ILCompiler.ReadyToRunCodegenCompilation.<>c__DisplayClass50_0.<ComputeDependencyNodeDependencies>g__CompilationThread|3(Object objThreadId) in /_/src/runtime/src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/ReadyToRunCodegenCompilation.cs:line 811 [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]
/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(469,5): error : Unhandled exception. System.ArgumentNullException: Value cannot be null. (Parameter 'array') [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]
/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(469,5): error :    at System.Array.Clear(Array array, Int32 index, Int32 length) [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]
/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(469,5): error :    at Internal.JitInterface.CorInfoImpl.CompileMethodCleanup() in /_/src/runtime/src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs:line 700 [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]
/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(469,5): error :    at Internal.JitInterface.CorInfoImpl.CompileMethod(MethodWithGCInfo methodCodeNodeNeedingCode, Logger logger) in /_/src/runtime/src/coreclr/tools/aot/ILCompiler.ReadyToRun/JitInterface/CorInfoImpl.ReadyToRun.cs:line 826 [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]
/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(469,5): error :    at ILCompiler.ReadyToRunCodegenCompilation.<>c__DisplayClass50_0.<ComputeDependencyNodeDependencies>g__CompileOneMethod|5(DependencyNodeCore`1 dependency, Int32 compileThreadId) in /_/src/runtime/src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/ReadyToRunCodegenCompilation.cs:line 898 [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]
/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(469,5): error :    at ILCompiler.ReadyToRunCodegenCompilation.<>c__DisplayClass50_0.<ComputeDependencyNodeDependencies>g__CompileOnThread|4(Int32 compilationThreadId) in /_/src/runtime/src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/ReadyToRunCodegenCompilation.cs:line 833 [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]
/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(469,5): error :    at ILCompiler.ReadyToRunCodegenCompilation.<>c__DisplayClass50_0.<ComputeDependencyNodeDependencies>g__CompilationThread|3(Object objThreadId) in /_/src/runtime/src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/ReadyToRunCodegenCompilation.cs:line 811 [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]
/vmr/src/runtime/artifacts/bin/Crossgen2Tasks/Release/net9.0/Microsoft.NET.CrossGen.targets(357,5): error NETSDK1096: Optimizing assemblies for performance failed. You can either exclude the failing assemblies from being optimized, or set the PublishReadyToRun property to false. [/vmr/src/runtime/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Runtime.sfxproj]

FWIW the error more closely aligns with what's described in #105441

cc @dotnet/source-build-internal

@AndyAyersMS
Member

FWIW the error more closely aligns with what's described in #105441

@ellahathaway it looks like the last several builds of main have been fine, and the error you saw was on arm64, so I'm guessing this might have been fixed by #105832.

In those builds, is crossgen2 being run with a live-built dotnet (or something relatively close)?

@jakobbotsch if the above is true it might give us a bit of reassurance that the fix from #105832 worked, though I don't know how often this particular crossgen2 error surfaced in the weeks before, so it might not...

@jakobbotsch
Member

I think we have good confidence that this issue is fixed, but I'm going to add blocking-release to this and keep it open and in .NET 9 for tracking purposes until we pick up the new SDK.

@eerhardt
Member

We just hit this error in https://dev.azure.com/dnceng-public/public/_build/results?buildId=774394&view=logs&j=5ac7b393-e840-5549-7fb4-a4479af8e7e3&t=29df2fa2-0d20-51bd-e85a-8b546e86c529

  Microsoft.Extensions.Caching.Hybrid -> /mnt/vss/_work/1/s/artifacts/bin/Microsoft.Extensions.Caching.Hybrid/Release/netstandard2.0/Microsoft.Extensions.Caching.Hybrid.dll
/mnt/vss/_work/1/s/.dotnet/sdk/9.0.100-preview.7.24371.4/Roslyn/Microsoft.CSharp.Core.targets(89,5): error MSB6006: "csc.dll" exited with code 139. [/mnt/vss/_work/1/s/src/Components/Components/src/Microsoft.AspNetCore.Components.csproj]
##[error].dotnet/sdk/9.0.100-preview.7.24371.4/Roslyn/Microsoft.CSharp.Core.targets(89,5): error MSB6006: (NETCORE_ENGINEERING_TELEMETRY=Build) "csc.dll" exited with code 139.

But it looks like we are using an older SDK (9.0.100-preview.7.24371.4). I'm not sure if the fix came in after, but logging the instance here for tracking purposes.

@jakobbotsch
Member

Yeah, I believe the fix was not included until 9.0.100-preview.7.24380.1 (see this PR).

#106333 updated this repo to a new enough preview 7 SDK, so I will close this issue given that the fix is now in.

@ellahathaway
Member

dotnet/source-build#4576 - I suspect that source-build just encountered this error in one of our 9.0 builds.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 23, 2024