Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SampleProfiler incorrectly marks all samples as external code on Windows #45179

Closed
josalem opened this issue Nov 25, 2020 · 2 comments · Fixed by #59974
Closed

SampleProfiler incorrectly marks all samples as external code on Windows #45179

josalem opened this issue Nov 25, 2020 · 2 comments · Fixed by #59974
Assignees
Labels
area-Diagnostics-coreclr enhancement Product code improvement that does NOT require public API changes/additions
Milestone

Comments

@josalem
Copy link
Contributor

josalem commented Nov 25, 2020

The SampleProfiler sends ThreadSample events that reference stacks. The payload of these events is supposed to indicate what type of code was executing in the leaf frame at the time of suspension.

// Set the payload. If the GC mode on suspension > 0, then the thread was in cooperative mode.
// Even though there are some cases where this is not managed code, we assume it is managed code here.
// If the GC mode on suspension == 0 then the thread was in preemptive mode, which we qualify as external here.
// Write the sample.
EventPipe::WriteSampleProfileEvent(
s_pSamplingThread,
s_pThreadTimeEvent,
pTargetThread,
stackContents,
pTargetThread->GetGCModeOnSuspension() ? s_ManagedPayload.Rawdata : s_ExternalPayload.Rawdata,
c_payloadSize);
}

The SampleProfiler uses a cached value of m_fPreemptiveGCDisabled to determine whether to mark a sample as "managed" (GC was in COOP mode) or "external" (GC was in preemptive mode).

#ifdef FEATURE_PERFTRACING
private:
// SampleProfiler thread state. This is set on suspension and cleared before restart.
// True if the thread was in cooperative mode. False if it was in preemptive when the suspension started.
Volatile<ULONG> m_gcModeOnSuspension;
// The activity ID for the current thread.
// An activity ID of zero means the thread is not executing in the context of an activity.
GUID m_activityId;
public:
bool GetGCModeOnSuspension()
{
LIMITED_METHOD_CONTRACT;
return m_gcModeOnSuspension != 0U;
}
void SaveGCModeOnSuspension()
{
LIMITED_METHOD_CONTRACT;
m_gcModeOnSuspension = m_fPreemptiveGCDisabled;
}
void ClearGCModeOnSuspension()
{
m_gcModeOnSuspension = 0;
}

Currently, SaveGCModeOnSuspension is only called from HandleGCSuspensionForInterruptedThread.

// This function is called when a GC is pending. It tries to ensure that the current
// thread is taken to a GC-safe place as quickly as possible. It does this by doing
// one of the following:
//
// - If the thread is in native code or preemptive GC is not disabled, there's
// nothing to do, so we return.
//
// - If the thread is in interruptible managed code, we will push a frame that
// has information about the context that was interrupted and then switch to
// preemptive GC mode so that the pending GC can proceed, and then switch back.
//
// - If the thread is in uninterruptible managed code, we will patch the return
// address to take the thread to the appropriate stub (based on the return
// type of the method) which will then handle preparing the thread for GC.
//
void HandleGCSuspensionForInterruptedThread(CONTEXT *interruptedContext)
{
Thread *pThread = GetThread();
if (pThread->PreemptiveGCDisabled() != TRUE)
return;
#ifdef FEATURE_PERFTRACING
// Mark that the thread is currently in managed code.
pThread->SaveGCModeOnSuspension();
#endif // FEATURE_PERFTRACING

Unfortunately, this code is only used for thread suspension on non-Windows platforms. This means that all sample events on Windows are marked as "external" code since the underlying value is never changed from the uninitialized value of 0 (meaning preemptive mode). On Linux/Mac systems, however, stacks are correctly labeled.

This value is used in tools like PerfView to do analysis on things like CPU blocking time.

We need to determine a location to call SaveGCModeOnSuspension that works for both platforms or add a call to the equivalent location on Windows.

One spot I'm vetting is here:

while (true)
{
Thread* thread = NULL;
while ((thread = ThreadStore::GetThreadList(thread)) != NULL)
{
if (thread == pCurThread)
continue;

in ThreadSuspend::SuspendRuntime where it enumerates all the threads it is about to suspend.

I'm not sure how long this has been an issue, but I'm guessing it has been since this logic went in (~4 years ago circa .NET Core 2.2).

Is there another way we could transmit this information? We could save the 4 bytes per sample event if we do away with this payload. That can add up fast when you have ~1000 samples per thread per second. That's roughly 4 kb/s per thread for a value that will always be 1 or 2. Could we intern this information directly into the stack events? That would require a format change, though.

CC @sywhang @noahfalk @brianrob @tommcdon

@josalem josalem added this to the 6.0.0 milestone Nov 25, 2020
@ghost
Copy link

ghost commented Nov 25, 2020

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Issue Details

The SampleProfiler sends ThreadSample events that reference stacks. The payload of these events is supposed to indicate what type of code was executing in the leaf frame at the time of suspension.

// Set the payload. If the GC mode on suspension > 0, then the thread was in cooperative mode.
// Even though there are some cases where this is not managed code, we assume it is managed code here.
// If the GC mode on suspension == 0 then the thread was in preemptive mode, which we qualify as external here.
// Write the sample.
EventPipe::WriteSampleProfileEvent(
s_pSamplingThread,
s_pThreadTimeEvent,
pTargetThread,
stackContents,
pTargetThread->GetGCModeOnSuspension() ? s_ManagedPayload.Rawdata : s_ExternalPayload.Rawdata,
c_payloadSize);
}

The SampleProfiler uses a cached value of m_fPreemptiveGCDisabled to determine whether to mark a sample as "managed" (GC was in COOP mode) or "external" (GC was in preemptive mode).

#ifdef FEATURE_PERFTRACING
private:
// SampleProfiler thread state. This is set on suspension and cleared before restart.
// True if the thread was in cooperative mode. False if it was in preemptive when the suspension started.
Volatile<ULONG> m_gcModeOnSuspension;
// The activity ID for the current thread.
// An activity ID of zero means the thread is not executing in the context of an activity.
GUID m_activityId;
public:
bool GetGCModeOnSuspension()
{
LIMITED_METHOD_CONTRACT;
return m_gcModeOnSuspension != 0U;
}
void SaveGCModeOnSuspension()
{
LIMITED_METHOD_CONTRACT;
m_gcModeOnSuspension = m_fPreemptiveGCDisabled;
}
void ClearGCModeOnSuspension()
{
m_gcModeOnSuspension = 0;
}

Currently, SaveGCModeOnSuspension is only called from HandleGCSuspensionForInterruptedThread.

// This function is called when a GC is pending. It tries to ensure that the current
// thread is taken to a GC-safe place as quickly as possible. It does this by doing
// one of the following:
//
// - If the thread is in native code or preemptive GC is not disabled, there's
// nothing to do, so we return.
//
// - If the thread is in interruptible managed code, we will push a frame that
// has information about the context that was interrupted and then switch to
// preemptive GC mode so that the pending GC can proceed, and then switch back.
//
// - If the thread is in uninterruptible managed code, we will patch the return
// address to take the thread to the appropriate stub (based on the return
// type of the method) which will then handle preparing the thread for GC.
//
void HandleGCSuspensionForInterruptedThread(CONTEXT *interruptedContext)
{
Thread *pThread = GetThread();
if (pThread->PreemptiveGCDisabled() != TRUE)
return;
#ifdef FEATURE_PERFTRACING
// Mark that the thread is currently in managed code.
pThread->SaveGCModeOnSuspension();
#endif // FEATURE_PERFTRACING

Unfortunately, this code is only used for thread suspension on non-Windows platforms. This means that all sample events on Windows are marked as "external" code since the underlying value is never changed from the uninitialized value of 0 (meaning preemptive mode). On Linux/Mac systems, however, stacks are correctly labeled.

This value is used in tools like PerfView to do analysis on things like CPU blocking time.

We need to determine a location to call SaveGCModeOnSuspension that works for both platforms or add a call to the equivalent location on Windows.

One spot I'm vetting is here:

while (true)
{
Thread* thread = NULL;
while ((thread = ThreadStore::GetThreadList(thread)) != NULL)
{
if (thread == pCurThread)
continue;

in ThreadSuspend::SuspendRuntime where it enumerates all the threads it is about to suspend.

I'm not sure how long this has been an issue, but I'm guessing it has been since this logic went in (~4 years ago circa .NET Core 2.2).

Is there another way we could transmit this information? We could save the 4 bytes per sample event if we do away with this payload. That can add up fast when you have ~1000 samples per thread per second. That's roughly 4kb/s for a value that will always be 1 or 2. Could we intern this information directly into the stack events? That would require a format change, though.

CC @sywhang @noahfalk @brianrob @tommcdon

Author: josalem
Assignees: -
Labels:

area-Diagnostics-coreclr

Milestone: 6.0.0

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Nov 25, 2020
@josalem josalem removed the untriaged New issue has not been triaged by the area owner label Nov 25, 2020
@brianrob
Copy link
Member

Good catch @josalem. I suspect you are right that this has been this way since 2.2, as the initial EventPipe implementation in 2.0 was limited to UNIX which uses a different suspension mechanism.

@tommcdon tommcdon added the enhancement Product code improvement that does NOT require public API changes/additions label Mar 3, 2021
@tommcdon tommcdon modified the milestones: 6.0.0, 7.0.0 Jun 16, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Oct 5, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 7, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Nov 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Diagnostics-coreclr enhancement Product code improvement that does NOT require public API changes/additions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants