Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a deadlock in NonGC + Profiler API #90847

Merged
merged 21 commits into from
Aug 25, 2023
Merged

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Aug 19, 2023

Fixes #90830

Quick explanation how's the dead-lock happening:
Thread1:
Someone (typically, JIT) tries to allocate an object on NonGC heap. FrozenObjectHeapManager (FOHM) acquires its lock and calls GC's API RegisterNewSegment. That API internally can hit a case when a GC is happening so it has to wait for GC to complete.

Thread2 (GC's):
GC is executing a callback (e.g. GarbageCollectionFinished or *Started) and Profiler uses that callback to enumerate objects on NonGC heap via ICorProfilerInfo14::GetNonGCHeapBounds thus, it also tries to acquire FOHM's lock (to be able to safely enumerate the objects). Thus, GC's thread (Thread2) is wating for FOHM's lock to release (it's taken by Thread1) while Thread1 is waiting for GC to finish.

The fix is #90830 (comment)

@EgorBo
Copy link
Member Author

EgorBo commented Aug 19, 2023

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@EgorBo EgorBo force-pushed the fix-deadlock-nongc branch from 64b9fd0 to cec6257 Compare August 20, 2023 13:21
@EgorBo EgorBo marked this pull request as ready for review August 20, 2023 13:59
@EgorBo
Copy link
Member Author

EgorBo commented Aug 20, 2023

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

src/coreclr/vm/profilingenumerators.cpp Show resolved Hide resolved
src/coreclr/vm/frozenobjectheap.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/frozenobjectheap.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/frozenobjectheap.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/appdomain.hpp Outdated Show resolved Hide resolved
@EgorBo
Copy link
Member Author

EgorBo commented Aug 20, 2023

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@davmason davmason requested a review from a team August 23, 2023 01:02
@EgorBo
Copy link
Member Author

EgorBo commented Aug 23, 2023

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@@ -221,11 +266,11 @@ Object* FrozenObjectSegment::GetNextObject(Object* obj) const
{
// Input must not be null and should be within the segment
_ASSERT(obj != nullptr);
_ASSERT((uint8_t*)obj >= m_pStart + sizeof(ObjHeader) && (uint8_t*)obj < m_pCurrent);
_ASSERT((uint8_t*)obj >= m_pStart + sizeof(ObjHeader) && (uint8_t*)obj < m_pCurrentRegistered);

// FOH doesn't support objects with non-DATA_ALIGNMENT alignment yet.
Copy link
Member

@jkotas jkotas Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to set m_NumComponents for arrays as part of TryAllocateObject?

We are setting it too late and we can end up enumerating arrays without m_NumComponents set that is not going to end wel..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! It also allowed to simplify the PublishObject logic a bit. The final API might be simplified a bit with C++ template to allow use of capturing lambdas for simplicity but that needed a bit more changes

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me otherwise. Thank you!

@EgorBo
Copy link
Member Author

EgorBo commented Aug 25, 2023

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@EgorBo
Copy link
Member Author

EgorBo commented Aug 25, 2023

Looks good to me otherwise. Thank you!

Thanks for the help! I wish I could easily spot all possible race conditions/corner cases just like you 🙂

@EgorBo EgorBo merged commit f9382df into dotnet:main Aug 25, 2023
@EgorBo EgorBo deleted the fix-deadlock-nongc branch August 25, 2023 18:19
@EgorBo EgorBo restored the fix-deadlock-nongc branch August 25, 2023 18:19
@EgorBo
Copy link
Member Author

EgorBo commented Aug 25, 2023

/backport to release/8.0

@github-actions
Copy link
Contributor

Started backporting to release/8.0: https://github.com/dotnet/runtime/actions/runs/5979164788

@EgorBo EgorBo deleted the fix-deadlock-nongc branch August 25, 2023 18:28
@ghost ghost locked as resolved and limited conversation to collaborators Sep 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ProfilingAPI: ICorProfilerInfo14::GetNonGCHeapBounds deadlock on .NET 8.0-preview7
4 participants