Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly Document UNMANAGED_CODE_TIME vs CPU_TIME #1166

Closed
josalem opened this issue May 26, 2020 · 17 comments · Fixed by microsoft/perfview#1613
Closed

Correctly Document UNMANAGED_CODE_TIME vs CPU_TIME #1166

josalem opened this issue May 26, 2020 · 17 comments · Fixed by microsoft/perfview#1613
Assignees
Labels
documentation Documentation related issue enhancement New feature or request
Milestone

Comments

@josalem
Copy link
Contributor

josalem commented May 26, 2020

Is there any documentation about what is UNMANAGED_CODE_TIME? When I look into the Thread Time stack, UNMANAGED_CODE_TIME always takes up 100% of the time. It seems very obscure as to what exactly it is collecting.

Originally posted by @iliamosko in #976 (comment)

@josalem
Copy link
Contributor Author

josalem commented May 26, 2020

@iliamosko, could you post a sample trace with this behavior and describe your environment, e.g., what version of .NET the target app is running, what version of dotnet-trace was used, what platform you were running on, what you were using to view the trace, and whether you were running inside a container?

@iliamosko
Copy link

My application is running on .NET Core 3.1 which is trying to simulate high CPU load. The dotnet-trace version I'm using is 3.1.120604. For the trace collected I used the default dotnet-trace collect command with a 1 minute duration:

dotnet-trace collect -p <PID> --duration 00:00:01:00

I've used that command on both a virtual Linux machine(WSL) and on a Windows 10 machine. I'm getting similar results for both traces, but I'm going to focus on the trace collected on Windows. I'm using PerfView that is running on version 2.0.52 to view the collected trace file.

By Name stack:
image

Call-Tree stack:
image

Callers stack:
image

So my questions is,

What exactly is UNMANAGED_CODE_TIME? What is it collecting?

When comparing that trace to one that uses this dotnet-trace collect command:

dotnet-trace collect -p <PID> --providers '*' --duration 00:00:01:00

UNMANAGED_CODE_TIME gets split into CPU_TIME which is understandable but it still doesn't answer what UNMANAGED_CODE_TIME is collecting.

image

@josalem
Copy link
Contributor Author

josalem commented May 26, 2020

Can you expand on what your test app is doing? UNMANAGED_CODE_TIME means what the name implies: it is time where unmanaged code was on the stack, e.g., calls into native code like the runtime. The example stack you have there shows UNMANAGED_CODE_TIME on top of System.DateTime.get_UtcNow() which is probably because the implementation of the getter for UtcNow() calls into native code to retrieve the datetime value for the system. The in-box CPU sampler in the runtime can only walk managed code, which means that native code will show up in traces as UNMANAGED_CODE_TIME. If you want native frames as well as managed frames, you would need to use a different tool to do your CPU profiling (see: perf on linux or ETW on Windows).

@iliamosko
Copy link

So just taking a look at the DoSomeMath method, here is what I am doing:

Main:
image

DoSomeMath method:
image

All it does is get the current time and run the DoSomeMath method on a different thread for a certain amount of time. Other methods in this app follow the same approach.

Would it be possible to determine which actions in UNMANAGED_CODE_TIME are being done in cpu time? or is it the CPU-TIME stack, which was collected in the second trace, that I am looking for?

@baal2000
Copy link

@josalem
To give a bit of the context, we are moving towards OS-agnostic performance tracing in windows/linux by fully replacing PervView with dotnet -trace tool. Considering that a good measure of .NET calls end up in unmanaged space, production incident troubleshooting (with PerfView) often finds CPU being burnt in a system rather than managed call. If, by design, dotnet -trace may not be able to tell whether any given (whether system - bound or managed) .NET call ends up eating CPU, the usability of such a tool is significantly limited and we need to take a look elsewhere and hope it is going to get better in time. Linux is the main concern because of no PerfView there. This is what we are trying to understand.

@baal2000
Copy link

baal2000 commented May 27, 2020

Other methods in this app follow the same approach

This is a test app to understand trace limitations.

@josalem
Copy link
Contributor Author

josalem commented May 27, 2020

A stack sample that has UNMANAGED_CODE_TIME at the top, still has the chain of managed events that resulted in that native code call, e.g., in your sample images above from the PerfView UI you collected samples of the following stack:

UNMANAGED_CODE_TIME                               (18.4%)
System.DateTime.get_UtcNow                        (23.4%)
TestApp.Program.DoSomeMath                        (33.4%)
System.Threading.ThreadHelper.ThreadStart_context (33.4%)
...

where those percentages are inclusive samples. This tells you what managed calls lead to spending CPU time in native code, as well as the % of stack samples that inclusively had that frame in that position. When you put the PerfView UI into Caller-Callee view and bucket on UNMANGED_CODE_TIME, it's going to treat all UNMANAGED_CODE_TIME frames as the same thing and you're going to get a very distorted view of the trace.

If, by design, dotnet -trace may not be able to tell whether any given (whether system - bound or managed) .NET call ends up eating CPU

I'm not quite sure I understand your statement here. The trace shows you which managed calls had the most inclusive samples and by extension spent the most time on the stack. In your images above, you can see that ~23% of stacks sampled on thread 14448 had the frame System.DateTime.get_UtcNow in that position on the stack. 18.4% of samples on that thread were stacks with calls from System.DateTime.get_UtcNow into native code. The % isn't a measure of time, but you can extrapolate that if there is a high % of samples with that stack, that a good amount of time was spent with that method on the stack. That is all relative to the number of samples on a given thread, though. That should indicate to you which calls took the most time. Even if a managed frame calls into native code, the managed frame is still on the stack and you can derive understanding of what "took the most time" from that.

I think what might be causing confusion here is the parsing of the data rather than the data itself. It sounds like you want to know the answer the question: "What managed calls took the most time?" If that is the question you want answered, looking at the unfiltered caller-callee view won't be the best way to answer it. I would recommend looking at a flamegraph (try dotnet trace convert --format Speedscope <trace-file> and open at https://www.speedscope.app for doing this without PerfView). Additionally, I would filter out the UNMANAGED_CODE_TIME frames when using PerfView views other than CallTree and Caller-Callee as it will bucket them all as the same frame despite them not being the same. If you filter them out, you will be able to get a clearer picture of just managed frames using the standard PerfView views.

If you want to answer the question, "What percent of samples were collected while executing kernel code?" or "How much time was spent inside DoStuff() from MyNativeCode.so?" then you will need to use a different tracing tool. Specifically, you will need a CPU sampling tool that is capable of collecting native frames, e.g., perf on Linux.

Let me know if that helps/answered your question 😄

CC @shirhatti @noahfalk

@baal2000
Copy link

baal2000 commented May 27, 2020

@josalem
Thank you John. That makes perfect sense for

UNMANAGED_CODE_TIME                               (18.4%)
System.DateTime.get_UtcNow                        (23.4%)
TestApp.Program.DoSomeMath                        (33.4%)
System.Threading.ThreadHelper.ThreadStart_context (33.4%)
...

I'm not quite sure I understand your statement here.

I think the sticking point is what "the most time on the stack" means in

which managed calls had the most inclusive samples and by extension spent the most time on the stack

Is the thread time CPU or BLOCKED?

For example, what about this screenshot:

UNMANAGED_CODE_TIME                              (66.2%)
System.Console.il!ConsolePal.ReadKey(bool)       (19.3%)
System.DateTime.get_UtcNow                       (17.0%)
...

image

image

ReadKey call is not CPU- bound for what we know, yet it is lumped together with get_UtcNow under the same UNMANAGED_CODE_TIME bucket.

Hard to tell without specialized knowledge of how the framework implements this or that API.
Much easier if we had something like this:

CPU_TIME                                             (79.2%)
    UNMANAGED_CODE_TIME                              (46.2%)
        System.DateTime.get_UtcNow                   (17.0%)
        ...
     Math.Acos()                                     (33.0%)
BLOCKED_TIME                                         (19.3%)
    UNMANAGED_CODE_TIME                              (19.3%)
      System.Console.il!ConsolePal.ReadKey(bool)     (19.3%)    

Thank you for looking into this!

@josalem
Copy link
Contributor Author

josalem commented May 27, 2020

I took a bit of a deeper look into what causes PerfView to output CPU_TIME vs UNMANAGED_CODE_TIME and it looks like there might be an issue in which stacks are getting labelled under the wrong category. I'll investigate and create an issue in the Microsoft/PerfView repo if I can pinpoint what's happening. My understanding is that UNMANAGED_CODE_TIME is supposed to be "external" code frames, while CPU_TIME is general "working time" for a stack.

As for differentiating between "blocked" time and "working" time, that is unfortunately something that I don't think dotnet-trace will be capable of. The traces collected by dotnet-trace were designed to contain the maximum amount of information possible in a platform agnostic way. This means they won't contain information pertaining to the OS scheduler and by extension things like "when a thread has been put to sleep waiting on something to complete". You will be able to see how long managed methods were on the stack and derive information from that, but if you want OS Scheduler details, then you'll need to use ETW on Windows or something like perf on Linux (see perfcollect for something already tuned for .NET).

@baal2000
Copy link

looks like there might be an issue in which stacks are getting labelled under the wrong category.

We are going to try https://www.speedscope.app/ then.

Thanks.

@tommcdon tommcdon added the documentation Documentation related issue label Jun 1, 2020
@tommcdon tommcdon added this to the 5.0 milestone Jun 1, 2020
@josalem josalem changed the title Is there any documentation about what is _UNMANAGED_CODE_TIME_? Correctly Document UNMANAGED_CODE_TIME vs CPU_TIME Nov 24, 2020
@josalem josalem modified the milestones: 5.0, 6.0 Nov 24, 2020
@josalem josalem self-assigned this Nov 24, 2020
@josalem
Copy link
Contributor Author

josalem commented Nov 24, 2020

The code responsible for adding the UNMANAGED_CODE_TIME and CPU_TIME frames for EventPipe traces is found here:

            public void LogThreadStack(double timeRelativeMSec, StackSourceCallStackIndex stackIndex, TraceThread thread, SampleProfilerThreadTimeComputer computer, bool onCPU)
            {
                if (onCPU)
                {
                    if (ThreadRunning) // continue running 
                    {
                        AddCPUSample(timeRelativeMSec, thread, computer);
                    }
                    else if (ThreadBlocked) // unblocked
                    {
                        AddBlockTimeSample(timeRelativeMSec, thread, computer);
                        LastBlockStackRelativeMSec = -timeRelativeMSec;
                    }


                    LastCPUStackRelativeMSec = timeRelativeMSec;
                    LastCPUCallStack = stackIndex;
                }
                else
                {
                    if (ThreadBlocked) // continue blocking
                    {
                        AddBlockTimeSample(timeRelativeMSec, thread, computer);
                    }
                    else if (ThreadRunning) // blocked
                    {
                        AddCPUSample(timeRelativeMSec, thread, computer);
                    }


                    LastBlockStackRelativeMSec = timeRelativeMSec;
                    LastBlockCallStack = stackIndex;
                }
            }

https://github.com/microsoft/perfview/blob/e75e3856d53c123c7f7ba202579d44ba4d91a7b8/src/TraceEvent/Computers/SampleProfilerThreadTimeComputer.cs#L477-L508

AddCPUSample adds the CPU_TIME frame, while AddBlockTimeSameple adds UNMANAGED_CODE_TIME. Unfortunately, I believe these there may be an issue where managed stacks are being marked as "external" code, so stacks that should have CPU_TIME as the leaf frame instead have UNMANAGED_CODE_TIME. Regardless of that issue, we should document the intended and correct meanings of these frames. This issue will be used to track that documentation. (#515)

@baal2000
Copy link

@josalem : I believe these there may be an issue where managed stacks are being marked as "external" code

Then this would be rather be a bug, not enchancement issue?

@josalem
Copy link
Contributor Author

josalem commented Jun 10, 2021

This issue is only tracking the documentation of what those values mean. The underlying issue is being tracked here: dotnet/runtime#45179. I'm not sure why they weren't linked.

@baal2000
Copy link

Thanks for the update @josalem, now I know this does not apply to our situation (on Linux).

@noahfalk
Copy link
Member

noahfalk commented Jun 2, 2022

@josalem I see this got closed but was there a doc change that completed it? I wasn't able to find it.

@josalem
Copy link
Contributor Author

josalem commented Jun 2, 2022

I added docs on the PerfView side here: microsoft/perfview#1613

I didn't add any docs in the dotnet docs though. If we do, we should just link to the PerfView ones.

@noahfalk
Copy link
Member

noahfalk commented Jun 2, 2022

Ah thats right, I remember talking about the PerfView docs now. Thanks for reminding my forgetful ole self ;p

@ghost ghost locked as resolved and limited conversation to collaborators Jun 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
documentation Documentation related issue enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants