Fix a deadlock in NonGC + Profiler API #90847
/azp run runtime-coreclr outerloop
Azure Pipelines successfully started running 1 pipeline(s).
64b9fd0 to cec6257
Co-authored-by: Jan Kotas <[email protected]>
src/coreclr/vm/frozenobjectheap.cpp
@@ -221,11 +266,11 @@ Object* FrozenObjectSegment::GetNextObject(Object* obj) const
 {
     // Input must not be null and should be within the segment
     _ASSERT(obj != nullptr);
-    _ASSERT((uint8_t*)obj >= m_pStart + sizeof(ObjHeader) && (uint8_t*)obj < m_pCurrent);
+    _ASSERT((uint8_t*)obj >= m_pStart + sizeof(ObjHeader) && (uint8_t*)obj < m_pCurrentRegistered);

     // FOH doesn't support objects with non-DATA_ALIGNMENT alignment yet.
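For readers unfamiliar with the frozen object heap, here is a minimal sketch of the idea behind the bound change in this hunk, assuming a simplified bump-pointer segment: allocation advances one cursor, while enumeration only walks up to a separately published cursor (the role `m_pCurrentRegistered` plays above). `Segment`, `FakeObject`, `TryAllocate`, `Publish`, and `CountPublished` are hypothetical names, not the real `FrozenObjectSegment` API. The sketch also writes the array component count during allocation, before anything is published, which is what the review comment below asks about.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <new>

// Hypothetical object header. numComponents plays the role of m_NumComponents:
// an enumerator must read it to compute the object's size and find the next one.
struct FakeObject {
    std::uint32_t numComponents;  // element count (0 for non-arrays)
    std::uint32_t componentSize;  // bytes per element
};

constexpr std::size_t kAlignment = 8;

constexpr std::size_t AlignUp(std::size_t n) { return (n + kAlignment - 1) & ~(kAlignment - 1); }

constexpr std::size_t ObjectSize(std::uint32_t numComponents, std::uint32_t componentSize) {
    return AlignUp(sizeof(FakeObject) + std::size_t(numComponents) * componentSize);
}

class Segment {
public:
    explicit Segment(std::size_t capacity)
        : m_buffer(new std::uint8_t[capacity]()),
          m_end(m_buffer.get() + capacity),
          m_pCurrent(m_buffer.get()),
          m_pCurrentRegistered(m_buffer.get()) {}

    // Allocates under the lock and writes the whole header, including the
    // component count, before returning. Nothing is visible to enumerators yet.
    FakeObject* TryAllocate(std::uint32_t numComponents, std::uint32_t componentSize) {
        std::lock_guard<std::mutex> lock(m_lock);
        std::size_t size = ObjectSize(numComponents, componentSize);
        if (size > std::size_t(m_end - m_pCurrent))
            return nullptr;
        FakeObject* obj = new (m_pCurrent) FakeObject{numComponents, componentSize};
        m_pCurrent += size;
        return obj;
    }

    // Advances the published bound to cover everything allocated so far.
    // The release store orders the header writes before the new bound.
    void Publish() {
        std::lock_guard<std::mutex> lock(m_lock);
        m_pCurrentRegistered.store(m_pCurrent, std::memory_order_release);
    }

    // Walks only up to the published bound, so it never reads a header
    // that is still being initialized.
    std::size_t CountPublished() const {
        std::uint8_t* const limit = m_pCurrentRegistered.load(std::memory_order_acquire);
        std::uint8_t* p = m_buffer.get();
        std::size_t count = 0;
        while (p < limit) {
            const auto* obj = reinterpret_cast<const FakeObject*>(p);
            p += ObjectSize(obj->numComponents, obj->componentSize);
            ++count;
        }
        assert(p == limit);  // published objects tile the range exactly
        return count;
    }

private:
    std::unique_ptr<std::uint8_t[]> m_buffer;
    std::uint8_t* const m_end;
    std::uint8_t* m_pCurrent;                         // allocation cursor, guarded by m_lock
    std::atomic<std::uint8_t*> m_pCurrentRegistered;  // enumeration bound
    std::mutex m_lock;
};
```

In this sketch an object allocated after the last `Publish()` call is invisible to `CountPublished()`, so an enumerator never observes a header whose component count has not been written yet.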
Do we need to set m_NumComponents for arrays as part of TryAllocateObject? We are setting it too late, and we can end up enumerating arrays without m_NumComponents set, which is not going to end well.
Good point! It also allowed me to simplify the PublishObject logic a bit. The final API could be simplified further with a C++ template to allow capturing lambdas, but that needed a few more changes.
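The template idea mentioned above can be sketched like this, assuming a C-style allocation API that takes a function pointer plus an untyped context pointer. `FakeHeap`, `FakeArray`, `TryAllocate`, and `TryAllocateWith` are hypothetical names, not the runtime's API; the point is only that a thin function template can accept capturing lambdas by forwarding them through a non-capturing trampoline.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical fixed-capacity heap of small arrays.
struct FakeArray {
    std::uint32_t numComponents = 0;
    std::int32_t data[8] = {};
};

class FakeHeap {
public:
    // C-style API: the initializer is a plain function pointer, so any state it
    // needs has to be passed through the untyped `param`.
    FakeArray* TryAllocate(void (*init)(FakeArray*, void*), void* param) {
        std::lock_guard<std::mutex> lock(m_lock);
        if (m_used == kCapacity)
            return nullptr;
        FakeArray* obj = &m_slots[m_used++];
        *obj = FakeArray{};
        if (init != nullptr)
            init(obj, param);  // runs under the lock, before the pointer is handed back
        return obj;
    }

    // Template wrapper: accepts any callable, including capturing lambdas.
    // The non-capturing lambda converts to the function pointer above and
    // calls the real callable, whose address travels through `param`.
    template <typename Init>
    FakeArray* TryAllocateWith(Init init) {
        return TryAllocate(
            [](FakeArray* obj, void* param) { (*static_cast<Init*>(param))(obj); },
            &init);
    }

    std::size_t Used() const {
        std::lock_guard<std::mutex> lock(m_lock);
        return m_used;
    }

private:
    static constexpr std::size_t kCapacity = 4;
    FakeArray m_slots[kCapacity];
    std::size_t m_used = 0;
    mutable std::mutex m_lock;
};
```

In this sketch the initializer runs while the heap lock is held, so it must stay short and must not wait on anything that might itself need that lock.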
Looks good to me otherwise. Thank you!
Thanks for the help! I wish I could spot all possible race conditions and corner cases as easily as you do 🙂
/backport to release/8.0
Started backporting to release/8.0: https://github.com/dotnet/runtime/actions/runs/5979164788
Fixes #90830
Quick explanation of how the deadlock happens:
Thread1:
Someone (typically, the JIT) tries to allocate an object on the NonGC heap. FrozenObjectHeapManager (FOHM) acquires its lock and calls the GC's API RegisterNewSegment. Internally, that API can hit the case where a GC is in progress, so it has to wait for the GC to complete.
Thread2 (GC's):
The GC is executing a callback (e.g. GarbageCollectionFinished or *Started), and the profiler uses that callback to enumerate objects on the NonGC heap via ICorProfilerInfo14::GetNonGCHeapBounds, so it also tries to acquire FOHM's lock (to be able to enumerate the objects safely). As a result, the GC's thread (Thread2) is waiting for FOHM's lock to be released (it is held by Thread1), while Thread1 is waiting for the GC to finish.
The fix is described in #90830 (comment).
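The cycle above is a classic lock-order inversion: Thread1 holds FOHM's lock and waits for the GC, while the GC thread waits for FOHM's lock. A generic way to break such a cycle is to never wait for the GC while holding the heap lock. The sketch below models that with standard C++ primitives; `FakeRuntime`, `RunGc`, `AllocateSegment`, and `RunConcurrently` are hypothetical stand-ins, and this is not necessarily how the PR itself restructures FrozenObjectHeapManager.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical model of the two parties in the deadlock:
//  - m_heapLock stands in for the FrozenObjectHeapManager lock;
//  - m_gcInProgress + m_gcDone stand in for "wait until the GC finishes".
class FakeRuntime {
public:
    // Thread2: a GC whose profiler callback enumerates the heap under the heap lock.
    void RunGc() {
        {
            std::lock_guard<std::mutex> gc(m_gcStateLock);
            m_gcInProgress = true;
        }
        {
            // Stand-in for a profiler callback that walks registered segments.
            std::lock_guard<std::mutex> heap(m_heapLock);
            m_lastEnumerated = m_registered;
        }
        {
            std::lock_guard<std::mutex> gc(m_gcStateLock);
            m_gcInProgress = false;
        }
        m_gcDone.notify_all();
    }

    // Thread1: allocates a new segment, then registers it with the "GC".
    void AllocateSegment() {
        {
            std::lock_guard<std::mutex> heap(m_heapLock);
            ++m_allocated;  // bookkeeping that really needs the lock
        }
        // The heap lock is NOT held while waiting; holding it here would
        // recreate the cycle between the two threads.
        WaitForGcToFinish();
        std::lock_guard<std::mutex> heap(m_heapLock);
        ++m_registered;  // only now does the segment become enumerable
    }

    std::size_t Allocated() const { std::lock_guard<std::mutex> heap(m_heapLock); return m_allocated; }
    std::size_t Registered() const { std::lock_guard<std::mutex> heap(m_heapLock); return m_registered; }
    std::size_t LastEnumerated() const { std::lock_guard<std::mutex> heap(m_heapLock); return m_lastEnumerated; }

private:
    void WaitForGcToFinish() {
        std::unique_lock<std::mutex> gc(m_gcStateLock);
        m_gcDone.wait(gc, [this] { return !m_gcInProgress; });
    }

    mutable std::mutex m_heapLock;
    std::mutex m_gcStateLock;
    std::condition_variable m_gcDone;
    bool m_gcInProgress = false;
    std::size_t m_allocated = 0;
    std::size_t m_registered = 0;
    std::size_t m_lastEnumerated = 0;
};

// Runs both threads concurrently; with the wait outside the heap lock,
// this always returns.
inline void RunConcurrently(FakeRuntime& rt, int iterations) {
    std::thread gc([&rt, iterations] { for (int i = 0; i < iterations; ++i) rt.RunGc(); });
    std::thread alloc([&rt, iterations] { for (int i = 0; i < iterations; ++i) rt.AllocateSegment(); });
    gc.join();
    alloc.join();
}
```

Because no code path holds both mutexes at once, and the condition-variable wait never happens while `m_heapLock` is held, the two threads cannot block each other forever.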