[Feature Request] dotnet trace should trigger based on namespace and frequency #832

SidShetye · 2020-02-14T18:59:32Z

In production, one typically has a focused suspect area to investigate. Tracing in a busy, noisy production environment would be far more productive if we could express "trace from Product.Module.SuspectMethod and do that once every 600 seconds."

Example:

dotnet-trace collect --trigger=MyProduct.MyModule.MyMethod --interval=600 <rest of params>

Additionally, one could have:

--interval-jitter=100 to avoid sample aliasing effects.
--interval=0 (or omitting the parameter) to trace once and be done.

josalem · 2020-02-14T19:38:27Z

Just so I'm clear on what you're looking for: you want to be able to collect a trace that contains CPU sample events only when a specific method is the top frame and then configure the frequency of collecting that specific event? What question do you want answered with this trace, e.g., "What events were fired during this method call?", "How long was this method on the stack?", etc.

Have you tried filtering the trace to that method in PerfView, or converting the trace to the SpeedScope format and looking at the flame graphs there? I would take a look at the Stack Viewer section of the PerfView user guide to get started on how to filter through CPU samples. There's also a good tutorial on doing CPU sample investigations in the User Guide. Note that you will only have one process in the trace, so you can ignore chatter in the guide about filtering to a specific process.

SidShetye · 2020-02-14T20:00:57Z

The trace should answer "How long is this method taking in production, and what are the measured latencies along the way?" That question arises from occasional latency spikes around a suspect method, spikes that happen only in production.

Currently, it's a timing issue: we have to ssh in, issue the dotnet-trace command on the server, and hope that the suspect method is called during that run. If the suspect method is only occasionally used on an otherwise busy server, this manual trial-and-error approach wastes precious engineering cycles (and gets tedious quickly). Not to mention that we're slowing down production with wasteful tracing.

josalem · 2020-02-25T19:20:12Z

The types of questions you are asking are better suited for instrumented diagnostics rather than sampled (although you can get the same info with the latter). If you are looking for information on a specific method and don't want to do general CPU sampling, you are probably better off instrumenting that method with events using the Activity APIs so that you can better understand how long it is on the stack and what happens during its execution. This would give you information like "how long did execution of this method take".
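As a rough illustration of the Activity suggestion, the sketch below wraps a call in a System.Diagnostics.Activity and reads its Duration once the activity has stopped. The ActivityTiming class, the Measure helper, and the "SuspectMethod" operation name are hypothetical names chosen for this example. It only shows how an Activity measures elapsed time; getting that data into a trace would additionally require a listener or an event, which is not shown here.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of any library: times a delegate with an Activity.
public static class ActivityTiming
{
    // Runs 'body' inside a started Activity and returns the measured duration.
    public static TimeSpan Measure(string operationName, Action body)
    {
        var activity = new Activity(operationName);
        activity.Start();
        try
        {
            body();
        }
        finally
        {
            // Stop() records the end time, which populates activity.Duration.
            activity.Stop();
        }
        return activity.Duration;
    }
}
```

A call such as ActivityTiming.Measure("SuspectMethod", () => DoWork()) would then return how long DoWork (a placeholder for the suspect method) took.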

I would recommend writing your own EventSource class and defining a collection of events that make sense for your library, then tracing that provider for targeted diagnostics of your code. You will get stacks with those events. If you're trying to find out things like the latency of function calls to external libraries from your code, you should check whether their code has a provider with that info, or simply wrap calls to the external code with events and stopwatches.
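A minimal sketch of that EventSource approach might look like the following. The provider name "MyProduct-MyModule", the MyModuleEventSource and SuspectModule classes, the event names, and the placeholder method body are all assumptions made for illustration; only EventSource, WriteEvent, and Stopwatch are real APIs here.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Tracing;

// Hypothetical provider; the name is what a trace session subscribes to.
[EventSource(Name = "MyProduct-MyModule")]
public sealed class MyModuleEventSource : EventSource
{
    public static readonly MyModuleEventSource Log = new MyModuleEventSource();

    private MyModuleEventSource() { }

    // Fired when the suspect method is entered.
    [Event(1, Level = EventLevel.Informational)]
    public void SuspectMethodStart() => WriteEvent(1);

    // Fired when the suspect method exits, carrying the stopwatch reading.
    [Event(2, Level = EventLevel.Informational)]
    public void SuspectMethodStop(long elapsedMilliseconds) => WriteEvent(2, elapsedMilliseconds);
}

public static class SuspectModule
{
    // Placeholder for the real method under investigation.
    public static int SuspectMethod(int input)
    {
        MyModuleEventSource.Log.SuspectMethodStart();
        var stopwatch = Stopwatch.StartNew();
        try
        {
            return input * 2; // stand-in for the real work
        }
        finally
        {
            stopwatch.Stop();
            MyModuleEventSource.Log.SuspectMethodStop(stopwatch.ElapsedMilliseconds);
        }
    }
}
```

With a provider like this in place, a session limited to it could be started with something along the lines of dotnet-trace collect --process-id <PID> --providers MyProduct-MyModule, so only these events are collected rather than general CPU samples.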

Alternatively, you could use a service like Application Insights or a paid APM to do codeless instrumentation.

CC @sywhang @noahfalk

noahfalk · 2020-02-26T04:29:43Z

I do think it is a nice goal to be able to produce this kind of latency measurement for an arbitrary method defined at debug time (as opposed to build time, when you could insert EventSource, Stopwatch, or Activity instrumentation directly into the code). I don't expect it is going to be a capability that the runtime can add in the near future, though, so the suggestions that @josalem made are likely to be the best available options for now.

SidShetye · 2020-02-26T06:25:27Z

While those answers could be obtained with Application Insights, the approach we seek is vastly different. We have zero desire to instrument any code a priori, hence the ask to be able to filter, capture, and trigger at debug time.

Also, in our experience, Application Insights leaves a lot to be desired in terms of quality (example: microsoft/ApplicationInsights-dotnet#1678), so we actually don’t want to rely on it for tracing, hence the ask to be able to filter at debug time.

josalem added the dotnet-trace, enhancement (New feature or request), and question (Further information is requested) labels Feb 14, 2020

tommcdon added this to the 5.0 milestone Mar 11, 2020

tommcdon modified the milestones: 5.0, 6.0 Dec 18, 2020

tommcdon modified the milestones: 6.0.0, 7.0.0 Jun 21, 2021

tommcdon modified the milestones: 7.0.0, 8.0.0 Sep 12, 2022
