-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release/9.0] bgc deadlock fix #108774
[release/9.0] bgc deadlock fix #108774
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. we will take for consideration in 9 GA
This is a correct fix. The build break is due to calling |
Backport of #108773 to release/9.0
/cc @Maoni0
Customer Impact
This was found as a hang during the build process, specifically always when it executes this command -
"some_dir\dotnet.exe" "repo_root\artifacts\bin\ILLink.Tasks\Release\net9.0\illink.dll" @"C:\Users\andya\AppData\Local\Temp\MSBuildTemp\tmp.rsp"
some_dir could be
repo_root\.dotnet
orC:\Program Files\dotnet
.This will using the runtime from the 9.0 RC1 build.
This issue can cause applications to deadlock on startup when using DATAS (on by default in 9.0 for SVR workloads).
Regression
this was a regression introduced in #105545. this would only manifest during the beginning of a process when we needed to grow the # of heaps (and can grow) and we haven't done a gen2 GC yet so we set it to doing one.
the problem is the check I made did not include the ephemeral GC that may happen at the beginning of a BGC before we set gc_background_running to true. so at the end of that eph GC, we are in calculate_new_heap_count and set trigger_initial_gen2_p to true, not realizing we are already in a BGC.
then during the joined_generation_to_condemn at the beginning of the next GC, if our conclusion was still doing an eph GC we'd make it a BGC due to trigger_initial_gen2_p which obviously would cause problem if a BGC is already in progress (I do have an assert for this but we haven't seen this in a dbg build...).
the fix is to simply use the right check - is_bgc_in_progress() instead of background_running_p() which includes that eph GC case.
Testing
#105545 went through the GC team's stress testing but this particular issue was not observed (this manifests as a hang in retail and an assert in debug but neither was observed). this basically only happens during the beginning of a process. I could repro it with that particular illink command so I tested it using that command. will be adding more testing specifically for this area in .NET 10.
Risk
Low, this missed a case where we should be checking during a possible ephemeral GC at the beginning of a BGC and the fix is to add that check.
IMPORTANT: If this backport is for a servicing release, please verify that:
release/X.0-staging
, notrelease/X.0
.