Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Mono] Fix deadlock during gcdump when using interp full AOT fallback. #89726

Merged
merged 4 commits into from
Aug 3, 2023

Conversation

lateralusX
Copy link
Member

@lateralusX lateralusX commented Jul 31, 2023

GC thread doing gcdump will dump the EventPipe events after world has restarted but before releasing GC lock. There was one case where we logged a bulk type during that phase where a type didn't have its finalizer data initialized and at the same time main thread running interpreter held loader lock and tried to acquire GC lock, that triggers a deadlock since the GC thread (still holding the GC lock) would trigger logic to initialize the finalizer data, but that in turn requires the GC lock.

Fix delays the fire of GC dump events until after we completed GC. All GC dump events have been cached into a temp file and will be written into EventPipe, the only potential issue with this is that we keep vtable pointers in cache that will be resolved when emitting EventPipe event, after releasing GC lock, but since we currently won't unload vtables, that is not an issue, but needs to be addressed if/when we implement ability to unload vtables. We would then need to root the vtables while stored in temporary cache.

@lateralusX
Copy link
Member Author

lateralusX commented Aug 1, 2023

@lambdageek possible to take a look so it can get into RC1?

@lateralusX
Copy link
Member Author

Will need to adjust fix a little. Need to move the logging of EventPipe events until after releasing GC lock since there are to many scenarios where we can deadlock holding GC lock (and attempting to acquire loader lock). In theory we would need to root the vtables stored in cache as well to make sure they are not unloaded/freed while keeping them in cache, but since we currently doesn't fully support unloading of class/vtables, I mark it as a future TODO.

GC thread doing gcdump will dump the EventPipe events after world
has restarted but before releasing GC lock. There was one case
where we logged a bulk type during that face where a type didn't
have its finalizer data initialized and at the same time main thread
running interpreter held loader lock and tried to acquire GC lock,
that triggers a deadlock since the GC thread (still holding the GC lock)
would trigger logic to initialize the finalizer data, but that in turn
requires the GC lock.

Fix delays the fire of GC dump events until after we completed GC.
All GC dump events have been cached into a temp file and will be
written into EventPipe, the only potential issue with this is that
we keep vtable pointers in cache that will be resolved when emitting
EventPipe event, after releasing GC lock, but since we currently
won't unload vtables, that is not an issue, but needs to be addressed
if/when we implement ability to unload vtables. We would then need
to root the vtables while stored in temporary cache.
@lateralusX lateralusX force-pushed the lateralusX/fix-deadlock-gcdump branch from ccfa932 to c53bd9a Compare August 1, 2023 11:00
@lateralusX lateralusX changed the title [Mono] Fix rare deadlock during gcdump when using interp full AOT fallback. [Mono] Fix deadlock during gcdump when using interp full AOT fallback. Aug 1, 2023
@lateralusX
Copy link
Member Author

/azp run runtime-extra-platforms

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@lateralusX
Copy link
Member Author

lateralusX commented Aug 3, 2023

State of runtime-extra-platforms is currently unstable over several different platforms. Will merge this PR and once we start to fix up runtime-extra-platforms and if we have issues with the gcdump runtime tests added in this PR on some Mono platforms in runtime-extra-platforms, we should disable the test only on identified platform and log issues to investigate/fix.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants