[Feature Request] dotnet trace should trigger based on namespace and frequency #832

Open
SidShetye opened this issue Feb 14, 2020 · 5 comments
Labels
dotnet-trace enhancement New feature or request question Further information is requested

@SidShetye

In production, one typically has a focused suspect area to investigate. Tracing in a busy/noisy production environment would be far more productive if we could express "trace from Product.Module.SuspectMethod and do that once every 600 seconds."

Example:

dotnet-trace collect --trigger=MyProduct.MyModule.MyMethod --interval=600 <rest of params>

Additionally, one could have:

--interval-jitter=100 to avoid sample aliasing effects.
--interval=0, or the parameter omitted: trace once and stop.
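
To illustrate why jitter helps: if the suspect method runs on a schedule that lines up with a fixed 600-second interval, every capture lands on the same phase of that schedule. A minimal sketch of how a jittered delay could be computed follows. The flags above are a proposal, not existing dotnet-trace options, and the names `TriggerSchedule` and `NextDelay`, the unit (seconds), and the uniform distribution are all assumptions made for illustration.

```csharp
using System;

// Hypothetical helper: nothing here is part of dotnet-trace.
public static class TriggerSchedule
{
    // Returns the wait before the next capture: the base interval plus a
    // uniform random offset in [-jitter, +jitter], clamped so it is never negative.
    public static TimeSpan NextDelay(TimeSpan interval, TimeSpan jitter, Random rng)
    {
        // rng.NextDouble() is in [0, 1); map it to [-1, 1) and scale by the jitter.
        double offsetSeconds = (rng.NextDouble() * 2.0 - 1.0) * jitter.TotalSeconds;
        double totalSeconds = Math.Max(0.0, interval.TotalSeconds + offsetSeconds);
        return TimeSpan.FromSeconds(totalSeconds);
    }
}
```

With an interval of 600 and a jitter of 100, each delay under this sketch falls between 500 and 700 seconds, so consecutive captures drift relative to any periodic workload.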

@josalem
Contributor

josalem commented Feb 14, 2020

Just so I'm clear on what you're looking for: you want to be able to collect a trace that contains CPU sample events only when a specific method is the top frame and then configure the frequency of collecting that specific event? What question do you want answered with this trace, e.g., "What events were fired during this method call?", "How long was this method on the stack?", etc.

Have you tried filtering the trace to that method in PerfView or converting the trace to the SpeedScope format and looking at the flame graphs there? I would take a look at the Stack Viewer section of the PerfView user guide to get started on how to filter through CPU samples. There's also a good tutorial on doing CPU sample investigations in the User Guide. Note that you will only have one process in the trace, so you can ignore chatter in the guide about filtering to a specific process.

@josalem josalem added dotnet-trace enhancement New feature or request question Further information is requested labels Feb 14, 2020
@SidShetye
Author

The trace should answer "How long is this method taking in production, and what are the measured latencies along the way?" That question arises from occasional latency spikes around a suspect method that happen only in production.

Currently, it's a timing issue: we have to ssh in, issue the dotnet-trace command on the server, and hope that the suspect method is called during that run. If the suspect method is only occasionally used on an otherwise busy server, this manual trial-and-error approach wastes precious engineering cycles (and gets tedious quickly). Not to mention we're slowing down production with wasteful tracing.

@josalem
Contributor

josalem commented Feb 25, 2020

The types of questions you are asking are better suited for instrumented diagnostics rather than sampled (although you can get the same info with the latter). If you are looking for information on a specific method and don't want to do general CPU sampling, you are probably better off instrumenting that method with events using the Activity APIs so that you can better understand how long it is on the stack and what happens during its execution. This would give you information like "how long did execution of this method take".

I would recommend writing your own EventSource class and defining a collection of events that make sense for your library, then tracing that provider for targeted diagnostics of your code. You will get stacks with those events. If you're trying to find out things like latency of function calls to external libraries from your code, you should check if their code has a provider with that info, or simply wrap calls to the external code with events + stopwatches.
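
As a rough sketch of that suggestion, a custom EventSource plus a stopwatch wrapper might look like the following. The provider name `MyProduct-MyModule`, the event names, and the `Measure` helper are hypothetical and chosen only for illustration; the events you define should reflect what matters in your own code.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Tracing;

// Hypothetical provider; the name is what a trace session would subscribe to.
[EventSource(Name = "MyProduct-MyModule")]
public sealed class MyModuleEventSource : EventSource
{
    public static readonly MyModuleEventSource Log = new MyModuleEventSource();

    private MyModuleEventSource() { }

    // Fired just before the suspect code runs.
    [Event(1, Level = EventLevel.Informational)]
    public void SuspectMethodStarted() => WriteEvent(1);

    // Fired after the suspect code finishes, carrying the measured latency.
    [Event(2, Level = EventLevel.Informational)]
    public void SuspectMethodCompleted(long elapsedMilliseconds) => WriteEvent(2, elapsedMilliseconds);
}

public static class SuspectModule
{
    // Wraps any call (including one into an external library) with events and a stopwatch.
    public static T Measure<T>(Func<T> body)
    {
        MyModuleEventSource.Log.SuspectMethodStarted();
        Stopwatch stopwatch = Stopwatch.StartNew();
        try
        {
            return body();
        }
        finally
        {
            // Runs even if body() throws, so every call gets a completion event.
            stopwatch.Stop();
            MyModuleEventSource.Log.SuspectMethodCompleted(stopwatch.ElapsedMilliseconds);
        }
    }
}
```

A call site would then read something like `var result = SuspectModule.Measure(() => DoSuspectWork());`, where `DoSuspectWork` stands in for your own method. Assuming the provider name above, only these events could be collected with `dotnet-trace collect --process-id <PID> --providers MyProduct-MyModule`, which keeps the trace small on a busy server.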

Alternatively, you could use a service like Application Insights or a paid APM to do codeless instrumentation.

CC @sywhang @noahfalk

@noahfalk
Member

I do think it is a nice goal to be able to produce this kind of latency measurement for an arbitrary method defined at debug time (as opposed to build time, when you could insert EventSource, Stopwatch or Activity instrumentation directly into the code). I don't expect it is going to be a capability that the runtime can add in the near future though, so the suggestions that @josalem made are likely to be the best available options for now.

@SidShetye
Author

> The types of questions you are asking are better suited for instrumented diagnostics rather than sampled (although you can get the same info with the latter). If you are looking for information on a specific method and don't want to do general CPU sampling, you are probably better off instrumenting that method with events using the Activity APIs so that you can better understand how long it is on the stack and what happens during its execution. This would give you information like "how long did execution of this method take".
>
> I would recommend writing your own EventSource class and defining a collection of events that make sense for your library, then tracing that provider for targeted diagnostics of your code. You will get stacks with those events. If you're trying to find out things like latency of function calls to external libraries from your code, you should check if their code has a provider with that info, or simply wrap calls to the external code with events + stopwatches.
>
> Alternatively, you could use a service like Application Insights or a paid APM to do codeless instrumentation.
>
> CC @sywhang @noahfalk

While those answers could be obtained with Application Insights, the approach we seek is vastly different. We have zero desire to instrument any code a priori, hence the ask to be able to filter/capture/trigger at debug time.

Also, in our experience, Application Insights leaves a lot to be desired in terms of quality (example: microsoft/ApplicationInsights-dotnet#1678), so we actually don't want to rely on it for tracing, hence the ask to be able to filter at debug time.

@tommcdon tommcdon added this to the 5.0 milestone Mar 11, 2020
@tommcdon tommcdon modified the milestones: 5.0, 6.0 Dec 18, 2020
@tommcdon tommcdon modified the milestones: 6.0.0, 7.0.0 Jun 21, 2021
@tommcdon tommcdon modified the milestones: 7.0.0, 8.0.0 Sep 12, 2022