
trace.speedscope.json is empty (just 230B) #1572

Closed
sunny868 opened this issue Sep 17, 2020 · 15 comments · Fixed by microsoft/perfview#1329 or #1878

@sunny868

#1532

Description

$ dotnet-trace collect -p 6061 --format speedscope
No profile or providers specified, defaulting to trace profile 'cpu-sampling'

Provider Name Keywords Level Enabled By
Microsoft-DotNETCore-SampleProfiler 0x0000F00000000000 Informational(4) --profile
Microsoft-Windows-DotNETRuntime 0x00000014C14FCCBD Informational(4) --profile

Process : /home/gysun/dotnet-sdk-3.1.401-linux-x64/dotnet
Output File : /home/gysun/gysun-perf/trace.nettrace

[00:00:00:13] Recording trace 479.778 (KB)
Press <Enter> or <Ctrl+C> to exit...

Trace completed.
Writing: /home/gysun/gysun-perf/trace.speedscope.json
Conversion complete

$ ls -lh trace.nettrace
-rw-rw-r-- 1 gysun gysun 469K Sep 17 15:50 trace.nettrace

$ ls -lh trace.speedscope.json
-rw-rw-r-- 1 gysun gysun 230 Sep 17 15:50 trace.speedscope.json

Configuration

$ dotnet-trace --version
3.1.141901+3f460984c62efa118b6bd87fa4c5f07b11074b34

$ uname -a
Linux fool-OptiPlex-7060 4.15.0-115-generic #116-Ubuntu SMP Wed Aug 26 14:04:49 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

@josalem
Contributor

josalem commented Sep 17, 2020

@sunny868 is it possible to upload the nettrace file that you're having this issue with and possibly the contents of the 230B file? It will help to be able to look at the conversion process and see if it repros locally.

BTW, is there a connection between #1532 and this issue? You mentioned it at the top.

josalem added the nettrace conversion label (related to converting nettrace to other file formats, e.g., chromium eventtrace or speedscope) Sep 17, 2020
@sunny868
Author

nettrace.zip

@josalem nettrace.zip includes the files trace.nettrace and trace.speedscope.json

@sunny868
Author

@josalem Will a new version be released to fix this problem?

@josalem
Contributor

josalem commented Sep 21, 2020

[screenshot]

I popped open the nettrace file and it looks like there are over 8000 events from the sample profiler (as well as rundown and a few other events). The trace is definitely not empty, but PerfView can't seem to parse the stacks from the events. This is quite odd. Let me collect a trace from a basic 3.1.8 app and see if I can repro this. Is there anything noteworthy about the application?

For reference I was using PerfView 2.0.57.

Hmm, one thing I'm noticing is that the sample events have nearly empty payloads. They are 4-byte payloads with the value "2" in them for all the ones I spot-checked.
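
For anyone who wants to poke at one of these traces themselves, here is a minimal sketch of how the sample payloads can be inspected programmatically with the Microsoft.Diagnostics.Tracing.TraceEvent library (the trace.nettrace path is just a placeholder for wherever the trace lives):

    using System;
    using Microsoft.Diagnostics.Tracing;

    class DumpSampleEvents
    {
        static void Main()
        {
            // Open the nettrace file directly with EventPipeEventSource.
            using (var source = new EventPipeEventSource("trace.nettrace"))
            {
                int sampleCount = 0;

                // AllEvents fires for every event in the stream, including the sample profiler's.
                source.AllEvents += (TraceEvent evt) =>
                {
                    if (evt.ProviderName == "Microsoft-DotNETCore-SampleProfiler")
                    {
                        sampleCount++;
                        // Raw payload bytes; in the trace attached above these are 4 bytes with the value 2.
                        byte[] payload = evt.EventData();
                        Console.WriteLine($"{evt.TimeStampRelativeMSec:F3} ms thread={evt.ThreadID} payload=[{string.Join(",", payload)}]");
                    }
                };

                source.Process();
                Console.WriteLine($"Sample profiler events seen: {sampleCount}");
            }
        }
    }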

CC @brianrob @noahfalk @sywhang

@sunny868
Author

SeekUnroll.zip
@josalem This is the C# source code; it comes from https://github.com/dotnet/performance.git

@josalem
Contributor

josalem commented Sep 22, 2020

Okay, I poked around a bit more. There are stacks in the nettrace file; I had my include/exclude paths messed up and couldn't see them before. There is an odd behavior where the event name itself is being used as a frame name, but if you ignore those frames, there is a stack visible in PerfView. Presumably due to inlining, you can only see up to SeekUnroll.FindByte, though.

[screenshot]

Now, the interesting part: when I convert the nettrace file to speedscope using dotnet-trace, I get a speedscope file with no stacks, but when I convert it using the PerfView -> File -> Save View As dialog I get a working file.

CC @adamsitnik

@thargy

thargy commented Oct 29, 2020

@josalem,

I have encountered the same problem, even after updating to version 5.0.152202+4d281c71a14e6226ab0bf0c98687db4a5c4217e3 from the https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet5/nuget/v3/index.json package source (I was initially on v3.1.141901, which had identical behaviour).

When capturing a trace on Windows 10, using:

dotnet-trace collect --format Speedscope -n NDI.TestApp --providers System.Runtime:0:1:EventCounterIntervalSec=1,DevDecoder.NDI:0:1:EventCounterIntervalSec=1

where DevDecoder.NDI is a set of custom events, I get a valid trace.nettrace file, but the trace.speedscope.json file is effectively empty.

I get the same problem with --format Chromium (also produces an 'empty' trace.chromium.json). I've included all 3 files in the attached zip (though the Chromium format was captured with an earlier run, it demonstrates the same 'empty' issue).
Traces.zip

@josalem
Contributor

josalem commented Nov 20, 2020

I think I've narrowed this down to this method in TraceEvent: https://github.com/microsoft/perfview/blob/e7a025a9ac9bec1348beeb118b8eb986fcd42703/src/TraceEvent/Computers/SampleProfilerThreadTimeComputer.cs#L62

There are stacks in the traces provided above. I can dump them by poking at the trace directly. The StackSource that is supposed to be populated by GenerateThreadTimeStacks is empty after running this method. When I just look for the sample events, I can see plenty of them. I think there may be something going wrong in the computer's attempt at computing thread time.

I'll try debugging through the above method and see what I can find.
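
For reference, this is roughly the conversion path dotnet-trace follows; the sketch below is simplified (file names are placeholders and it assumes the Microsoft.Diagnostics.Tracing.TraceEvent package), but it makes it easy to see how many samples the computer actually produced:

    using System;
    using Microsoft.Diagnostics.Symbols;
    using Microsoft.Diagnostics.Tracing;
    using Microsoft.Diagnostics.Tracing.Etlx;
    using Microsoft.Diagnostics.Tracing.Stacks;

    class CheckThreadTimeStacks
    {
        static void Main()
        {
            // Convert the nettrace file into TraceEvent's ETLX format first.
            string etlxPath = TraceLog.CreateFromEventPipeDataFile("trace.nettrace");

            using (var symbolReader = new SymbolReader(System.IO.TextWriter.Null) { SymbolPath = SymbolPath.MicrosoftSymbolServerPath })
            using (var eventLog = new TraceLog(etlxPath))
            {
                var stackSource = new MutableTraceEventStackSource(eventLog)
                {
                    OnlyManagedCodeStacks = true
                };

                // GenerateThreadTimeStacks is the method under suspicion; for the problematic
                // traces the stack source comes back empty even though the trace contains
                // thousands of sample events.
                var computer = new SampleProfilerThreadTimeComputer(eventLog, symbolReader);
                computer.GenerateThreadTimeStacks(stackSource);

                Console.WriteLine($"Samples in stack source: {stackSource.SampleIndexLimit}");
            }
        }
    }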

CC @adamsitnik

@josalem
Contributor

josalem commented Nov 21, 2020

I'm pretty sure I've identified the issue. I'm not 100% sure if this is the "correct" solution, but I know it fixes the issue. The logic that I believe is at fault is here:

https://github.com/microsoft/perfview/blob/e7a025a9ac9bec1348beeb118b8eb986fcd42703/src/TraceEvent/Computers/SampleProfilerThreadTimeComputer.cs#L475-L555

This method in particular is the one to blame:

            public void LogThreadStack(double timeRelativeMSec, StackSourceCallStackIndex stackIndex, TraceThread thread, SampleProfilerThreadTimeComputer computer, bool onCPU)
            {
                if (onCPU)
                {
                    if (ThreadRunning) // continue running 
                    {
                        AddCPUSample(timeRelativeMSec, thread, computer);
                    }
                    else if (ThreadBlocked) // unblocked
                    {
                        AddBlockTimeSample(timeRelativeMSec, thread, computer);
                        LastBlockStackRelativeMSec = -timeRelativeMSec;
                    }

                    LastCPUStackRelativeMSec = timeRelativeMSec;
                    LastCPUCallStack = stackIndex;
                }
                else
                {
                    if (ThreadBlocked) // continue blocking
                    {
                        AddBlockTimeSample(timeRelativeMSec, thread, computer);
                    }
                    else if (ThreadRunning) // blocked
                    {
                        AddCPUSample(timeRelativeMSec, thread, computer);
                    }

                    LastBlockStackRelativeMSec = timeRelativeMSec;
                    LastBlockCallStack = stackIndex;
                }
            }

If all the samples in a trace are onCPU, the thread is never put into the blocked state, which means LastBlockStackRelativeMSec stays at its initial value of 0 and never goes negative, so ThreadRunning never returns true:

public bool ThreadRunning { get { return LastBlockStackRelativeMSec < 0 && !ThreadDead; } }

If ThreadRunning is never true, no samples are put into the stack source because LogThreadStack never calls AddCPUSample or AddBlockTimeSample. This results in no stacks in the output.

I fixed this by simply changing the first check to be if (ThreadRunning || ThreadUninitialized) so that the thread state starts correctly if all the samples are managed code.
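
In code, the change is roughly the following (a sketch of the amended check rather than the exact PerfView diff; ThreadUninitialized is the existing state property, presumably true until the thread logs its first blocked sample):

    public void LogThreadStack(double timeRelativeMSec, StackSourceCallStackIndex stackIndex, TraceThread thread, SampleProfilerThreadTimeComputer computer, bool onCPU)
    {
        if (onCPU)
        {
            // Treat an uninitialized thread like a running one, so a trace whose samples
            // are all onCPU (all managed stacks) still gets its CPU samples logged.
            if (ThreadRunning || ThreadUninitialized) // continue running
            {
                AddCPUSample(timeRelativeMSec, thread, computer);
            }
            else if (ThreadBlocked) // unblocked
            {
                AddBlockTimeSample(timeRelativeMSec, thread, computer);
                LastBlockStackRelativeMSec = -timeRelativeMSec;
            }

            LastCPUStackRelativeMSec = timeRelativeMSec;
            LastCPUCallStack = stackIndex;
        }
        // ... the !onCPU branch is unchanged ...
    }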

I'll create a PR on microsoft/perfview.

@josalem
Contributor

josalem commented Nov 24, 2020

Did a little more digging on this. I think there is something more subtle going on here. All of the sample events in the problematic trace have a payload of 2, which translates to a managed stack. When I've tried to recreate this locally, I end up with a nearly identical trace, except all the sample events have a payload of 1, meaning external.

Looking into the code that sets that value leads me here:

            // Set the payload.  If the GC mode on suspension > 0, then the thread was in cooperative mode.
            // Even though there are some cases where this is not managed code, we assume it is managed code here.
            // If the GC mode on suspension == 0 then the thread was in preemptive mode, which we qualify as external here.


            // Write the sample.
            EventPipe::WriteSampleProfileEvent(
                s_pSamplingThread,
                s_pThreadTimeEvent,
                pTargetThread,
                stackContents,
                pTargetThread->GetGCModeOnSuspension() ? s_ManagedPayload.Rawdata : s_ExternalPayload.Rawdata,
                c_payloadSize);
        }

https://github.com/dotnet/runtime/blob/3f8966546a09ec104ca958eab20a22463907fed1/src/coreclr/src/vm/sampleprofiler.cpp#L238-L250

The value of GetGCModeOnSuspension is what determines whether the sample is logged as "external" or "managed." The logic for that is stored here:

#ifdef FEATURE_PERFTRACING
private:


    // SampleProfiler thread state.  This is set on suspension and cleared before restart.
    // True if the thread was in cooperative mode.  False if it was in preemptive when the suspension started.
    Volatile<ULONG> m_gcModeOnSuspension;


    // The activity ID for the current thread.
    // An activity ID of zero means the thread is not executing in the context of an activity.
    GUID m_activityId;


public:
    bool GetGCModeOnSuspension()
    {
        LIMITED_METHOD_CONTRACT;
        return m_gcModeOnSuspension != 0U;
    }


    void SaveGCModeOnSuspension()
    {
        LIMITED_METHOD_CONTRACT;
        m_gcModeOnSuspension = m_fPreemptiveGCDisabled;
    }


    void ClearGCModeOnSuspension()
    {
        m_gcModeOnSuspension = 0;
    }

https://github.com/dotnet/runtime/blob/73a79932b9bfcf2078f21f173661f2d4afed9e90/src/coreclr/src/vm/threads.h#L4627-L4654

Somehow, it is always "external" in my attempts to reproduce the issue, while the problematic trace is always "managed." If the entire trace is "external," the computer handles the trace, but if all the events are "managed" it doesn't. My PR into PerfView effectively makes the state machine cover the all-managed case, but I think there may be something more to this issue.

I'll be running this under a debugger to observe this behavior directly.

@josalem
Contributor

josalem commented Nov 25, 2020

Figured out why I couldn't reproduce the original issue: I was running on Windows. There is an omission in the runtime that causes all stacks on Windows to be marked as external code. When I collect the same trace for the same program on my Mac, the samples are marked correctly, which reproduces the original issue.

My PR in TraceEvent should fix the conversion code in this case, but the underlying runtime issue will be tracked here: dotnet/runtime#45179.

@josalem
Contributor

josalem commented Nov 25, 2020

Reopening until we have taken a TraceEvent update containing the fix

josalem reopened this Nov 25, 2020
@tommcdon
Member

@josalem have we taken a traceevent update yet?

tommcdon added this to the 6.0 milestone Dec 18, 2020
@josalem
Contributor

josalem commented Dec 18, 2020

@josalem have we taken a traceevent update yet?

No, TraceEvent hasn't done a release since that fix went in as far as I'm aware.

@noahfalk
Member

noahfalk commented Jan 4, 2021

Have we asked @brianrob to do one? It sounds like we could use it : )

josalem pushed a commit that referenced this issue Jan 6, 2021
josalem pushed a commit that referenced this issue Jan 21, 2021
ghost locked as resolved and limited conversation to collaborators Jun 27, 2023