Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation on EventCounters #605

Merged
merged 11 commits into from
Nov 27, 2019
Binary file added documentation/design-docs/EventCounters.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
326 changes: 326 additions & 0 deletions documentation/design-docs/eventcounters.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,326 @@
# EventCounters in .NET Core 3.0

## Introduction
EventCounters are .NET Core APIs used for lightweight, cross-platform, and near real-time performance metric collection. EventCounters that were added as a cross-platform alternative to the "performance counters" on the .NET Framework on Windows. This documentation serves as a guide on what they are, how to implement them, and how to consume them.

The .NET Core runtime (CoreCLR) and few .NET libraries publish basic diagnostics information using EventCounters starting in .NET Core 3.0.

Apart from the EventCounters that are already provided by the .NET runtime or the rest of the framework (i.e. ASP.NET, gRPC, etc.), you may choose to implement your own EventCounters to keep track of various metrics for your service.

EventCounters live as a part of an [EventSource](https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.tracing.eventsource?view=netcore-3.0). Like any other events on an `EventSource`, they can be consumed both in-proc and out-of-proc via [EventListener](https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.tracing.eventlistener?view=netcore-3.0) and EventPipe/ETW.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably have to say that EventCounters report an aggregated value periodically by broadcasting an event before talking about in-proc/out-of-proc consumption.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may also be helpful to ground the discussion by showing an example EventCounter and watching the counter using the dotnet-counters tool.


![EventCounter](EventCounters.jpg)
sywhang marked this conversation as resolved.
Show resolved Hide resolved

## Runtime Counters
The .NET runtime (CoreCLR) publishes the following list of counters:
* CPU usage
* Working Set Size
* GC Heap Size
* Gen 0 GC Rate
* Gen 1 GC Rate
* Gen 2 GC Rate
* % Time in GC
* Gen 0 Heap Size
* Gen 1 Heap Size
* Gen 2 Heap Size
* LOH Heap Size
* Allocation Rate
* Assembly Count
* Exception Rate
* ThreadPool Thread Count
* Monitor Lock Contention Rate
* ThreadPool Queue Length
* ThreadPool Completed Items Rate
* Active Timer Count

ASP.NET Core also publishes the following counters:
* Requests per second
* Total Requests Count
* Current Requests Count
* Failed Requests Count

## EventCounters API Overview
At a high level, there are two types of counters in terms of their *purpose* - counters for ever-increasing values (i.e. Total # of exceptions, Total # of GCs, Total # of requests, etc.) and "snapshot" values (heap usage, CPU usage, working set size, etc.). Within each of these categories of counters, there are also two types of counters for implementation details - polling counters and non-polling counters. That gives us a total of 4 different counters, and each of these can be implemented by `EventCounter`, `PollingCounter`, `IncrementingEventCounter`, and `IncrementingPollingCounter`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the fun part : ) I have struggled to find a good description that makes it sound like I knew what I was doing ;p 'snapshot' feels ok for polling counter because it is provided as a single value, but misleading for EventCounter because it is an average over a set of values. For 'ever-increasing values' it would also be legitimate to represent those with a PollingCounter depending on whether the developer wanted to see it presented as a rate or a running total. This is the best I've come up with so far is below. Probably the high order bit is that even though we initially designed them as Average/Sum x Push/Pull, the pull variants deviate enough in usage that it seems clearer to describe them in hindsight as four unique options. Prior to this part of the text in the intro we probably want to cover the concept that each counter reports values at regular user-selectable intervals, and that the value reported at the end of the interval represents in some way the activity which happened during that interval.

The runtime supports four different types of counters for different situations:

  1. EventCounter records a set of values. The WriteMetric() API adds a new value to the set. At the end of each time interval, summary statistics for the set are computed such as the min, max, and mean. dotnet-counters will always display the mean value. EventCounter is useful to describe a discrete set of operations such as the average size in bytes of recent IO operations or the average monetary value of a set of financial transactions.
  2. IncrementingEventCounter records a running total. The Increment() API increases this total. At the end of each time period the total value is reported and then it is reset to zero. dotnet-counters will display this as a rate, the recorded total / time. This counter is useful to measure how frequently an action is occurring such as the number of requests processed each second.
  3. PollingCounter is a customizable counter that doesn't have any state and uses a callback to determine the value that is reported. At the end of each time interval the user provided callback function is invoked and whatever value it returns is reported as the current value of the counter. This counter can be used to query a metric from an external source, for example getting the current free bytes on a disk. It can also be used to report custom statistics that can be computed on demand by an application such as 95th percentile of recent request latencies or the current hit/miss ratio of a cache.
  4. IncrementingPollingCounter is a customizable counter that has no state and uses a callback to determine the increment that is reported. At the end of each time interval the callback is invoked and then the difference between the current invocation and the last invocation is the reported value. dotnet-counters always displays this as a rate, the reported value / time. This is useful to measure the rate at which some action is occurring when it isn't feasible to call an API on each occurrence, but it is possible to query the total number of times it has occurred. For example you could report the number of bytes written to a file / sec even if there is no notification each time a byte is written.


The runtime supports four different types of counters for different situations:

1. `EventCounter` records a set of values. The `WriteMetric()` method adds a new value to the set. At the end of each time interval, summary statistics for the set are computed such as the min, max, and mean. dotnet-counters will always display the mean value. EventCounter is useful to describe a discrete set of operations such as the average size in bytes of recent IO operations or the average monetary value of a set of financial transactions.

2. `IncrementingEventCounter` records a running total. The `Increment()` method increases this total. At the end of each time period the total value is reported and then it is reset to zero. dotnet-counters will display this as a rate, the recorded total / time. This counter is useful to measure how frequently an action is occurring such as the number of requests processed each second.

3. `PollingCounter` is a customizable counter that doesn't have any state and uses a callback to determine the value that is reported. At the end of each time interval the user provided callback function is invoked and whatever value it returns is reported as the current value of the counter. This counter can be used to query a metric from an external source, for example getting the current free bytes on a disk. It can also be used to report custom statistics that can be computed on demand by an application such as 95th percentile of recent request latencies or the current hit/miss ratio of a cache.

4. `IncrementingPollingCounter` is a customizable counter that has no state and uses a callback to determine the increment that is reported. At the end of each time interval the callback is invoked and then the difference between the current invocation and the last invocation is the reported value. `dotnet-counters` always displays this as a rate, the reported value / time. This is useful to measure the rate at which some action is occurring when it isn't feasible to call an API on each occurrence, but it is possible to query the total number of times it has occurred. For example you could report the number of bytes written to a file / sec even if there is no notification each time a byte is written.



## Writing EventCounters

Let's begin by looking at an implementation of sample `EventSource` that contains an `EventCounter`.
```cs

using System;
using System.Diagnostics.Tracing;

[EventSource(Name = "Samples-EventCounterDemos-Minimal")]
public sealed class MinimalEventCounterSource : EventSource
{
// define the singleton instance of the event source
public static MinimalEventCounterSource Log = new MinimalEventCounterSource();
public EventCounter RequestTimeCounter;

private MinimalEventCounterSource() : base(EventSourceSettings.EtwSelfDescribingEventFormat)
{
this.RequestTimeCounter = new EventCounter("request-time", this)
{
DisplayName = "Request Processing Time",
DisplayUnits = "MSec"
};
}

public static void Main()
{
var rand = new Random();
while(true)
{
MinimalEventCounterSource.Log.RequestTimeCounter.WriteMetric(rand.NextDouble());
}
}
}
```

Create a new dotnet console app using the code above and run it. Then use `dotnet-counters` to see what process-id this is:

```cmd
E:\>dotnet-counters ps
1398652 dotnet C:\Program Files\dotnet\dotnet.exe
1399072 dotnet C:\Program Files\dotnet\dotnet.exe
1399112 dotnet C:\Program Files\dotnet\dotnet.exe
1401880 dotnet C:\Program Files\dotnet\dotnet.exe
1400180 sample-counters E:\sample-counters\bin\Debug\netcoreapp3.1\sample-counters.exe
```

You can start monitoring your counter with the following command:
```cmd
E:\>dotnet-counters monitor --process-id 1400180 --providers Samples-EventCounterDemos-Minimal
```

Then you will see the following screen in your console:
```
Press p to pause, r to resume, q to quit.
Status: Running

[Samples-EventCounterDemos-Minimal]
Request Processing Time (MSec) 0.5
```

Let's take a look at a couple of sample EventCounter implementation in the .NET Core runtime (CoreCLR). Here is the runtime implementation for the counter that tracks the working set size of the application.

```cs
PollingCounter workingSetCounter = new PollingCounter(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of sync with text above (working set != rate of exceptions)

"working-set",
this,
() => (double)(Environment.WorkingSet / 1_000_000)
)
{
DisplayName = "Working Set",
DisplayUnits = "MB"
};
```

This counter reports the current working set of the app. It is a `PollingCounter`, since it captures a metric at a moment in time. The callback for polling the values is `() => (double)(Environment.WorkingSet / 1_000_000)` which is simply just a call to `Environment.WorkingSet` API. The `DisplayName` and `DisplayUnits` is an optional property that can be set to help the consumer side of the counter to display the value more easily/accurately. For example `dotnet-counters` uses these properties to display the more "pretty" version of the counter names.
sywhang marked this conversation as resolved.
Show resolved Hide resolved

And that's it! For `PollingCounter` (or `IncrementingPollingCounter`), there is nothing else that needs to be done since they poll the values themselves at the interval requested by the consumer.

Here is another example of runtime counter implemented using `IncrementingPollingCounter`.

```cs
IncrementingPollingCounter monitorContentionCounter = new IncrementingPollingCounter(
"monitor-lock-contention-count",
this,
() => Monitor.LockContentionCount
)
{
DisplayName = "Monitor Lock Contention Count",
DisplayRateTimeScale = new TimeSpan(0, 0, 1)
};
```

This counter uses the [Monitor.LockContentionCount](https://docs.microsoft.com/en-us/dotnet/api/system.threading.monitor.lockcontentioncount?view=netcore-3.0) API to report the increment of the total lock contention count. The `DisplayRateTimeScale` property is an optional `TimeSpan` which can be set to provide a hint of what time interval this counter is best displayed at. For example, the lock contention count is best displayed as *count per second*, so its `DisplayRateTimeScale` is set to 1 second. This can be adjusted for different types of rate counters.

There are more runtime counter implementation to use as a reference in the [CoreCLR](https://github.com/dotnet/coreclr/blob/master/src/System.Private.CoreLib/src/System/Diagnostics/Eventing/RuntimeEventSource.cs) repo.
sywhang marked this conversation as resolved.
Show resolved Hide resolved

## Concurrency
It is important to note that if the delegates passed to the `PollingCounter`/`IncrementingPollingCounter` instances are called by multiple threads at once, the EventCounters API does not guarantee thread safety. It is the author's responsibility to guarantee the thread-safety of the delegates being passed to the counter APIs.

For example, suppose we have the following `EventSource` which keeps track of requests.

```cs
public class RequestEventSource : EventSource
{
// singleton instance of the eventsource.
public static RequestEventSource Log = new RequestEventSource();

public IncrementingEventCounter requestRateCounter;
private int _requestCnt;

private RequestEventSource() : base(EventSourceSettings.EtwSelfDescribingEventFormat)
{
requestCnt = 0;
this.requestRateCounter = new IncrementingPollingCounter("request-rate", this, () => _requestCnt)
{
DisplayName = "Request Rate",
DisplayRateTimeScale = TimeSpan.FromSeconds(1);
}
}

// Method being called from request handlers to log that a request happened
public void Request()
{
requestCnt += 1;
}
}
```

`RequestEventSource.Request()` can be called from a request handler, and `requestRateCounter` simply polls this value at the interval specified by the consumer of this counter. However, this method can be called by multiple threads at once, so `requestCnt` can be susceptible to race.

Therefore, this method should be modified to the following to prevent the race.
```cs
public void Request()
{
Interlocked.Increment(ref _requestCnt);
}
```

## Consuming EventCounters

There are two main ways of consuming EventCounters: in-proc and out-of-proc.

### Consuming in-proc

You can consume the counter values via the `EventListener` API. `EventListener` is an in-proc way of consuming any Events written by all instances of EventSources in your application. For more details on how to use the EventListener API, refer to the [EventListener documentation](https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.tracing.eventlistener).

First, the EventSource that produces the counter value needs to be enabled. To do this, you can override the `OnEventSourceCreated` method to get a notification when an EventSource is created, and if this is the correct EventSource with your EventCounters, then you can call Enable on it. Here is an example of such override:

```cs
protected override void OnEventSourceCreated(EventSource source)
{
if (source.Name.Equals("System.Runtime"))
{
Dictionary<string, string> refreshInterval = new Dictionary<string, string>()
{
{ "EventCounterIntervalSec", "1" }
};
EnableEvents(source, 1, 1, refreshInterval);
}
}
```

#### Sample Code

This is a sample `EventListener` class that simply prints out all the counter names and values from a the .NET runtime's EventSource for publishing its internal counters (`System.Runtime`) at some interval.

```cs
public class SimpleEventListener : EventListener
{
private readonly EventLevel _level = EventLevel.Verbose;

public int EventCount { get; private set; } = 0;

private int _intervalSec;

public SimpleEventListener(int intervalSec)
{
_intervalSec = intervalSec;
}


protected override void OnEventSourceCreated(EventSource source)
{
if (source.Name.Equals("System.Runtime"))
{
Dictionary<string, string> refreshInterval = new Dictionary<string, string>()
{
{ "EventCounterIntervalSec", _intervalSec.ToString() }
};
EnableEvents(source, _level, (EventKeywords)(-1), refreshInterval);
}
}

private (string Name, string Value) GetRelevantMetric(IDictionary<string, object> eventPayload)
{
string counterName = "";
sywhang marked this conversation as resolved.
Show resolved Hide resolved
string counterValue = "";

foreach ( KeyValuePair<string, object> payload in eventPayload )
{
string key = payload.Key;
string val = payload.Value.ToString();

if (key.Equals("Name"))
{
counterName = val;
}
else if (key.Equals("Mean") || key.Equals("Increment"))
{
counterValue = val;
}
}
return (counterName, counterValue);
}

protected override void OnEventWritten(EventWrittenEventArgs eventData)
{
if (eventData.EventName.Equals("EventCounters"))
{
for (int i = 0; i < eventData.Payload.Count; i++)
{
IDictionary<string, object> eventPayload = eventData.Payload[i] as IDictionary<string, object>;

if (eventPayload != null)
{
var counterKV = GetRelevantMetric(eventPayload);
Console.WriteLine($"{counterKV.Name} : {counterKV.Value}");
}
}
}
}
}
```

As shown above, you *must* make sure the `"EventCounterIntervalSec"` argument is set in the filterPayload argument when calling `EnableEvents`. Otherwise the counters will not be able to flush out values since it doesn't know at which interval it should be getting flushed out.

### Consuming out-of-proc

Consuming EventCounters out-of-proc is also possible. For those that are familiar with ETW (Event Tracing for Windows), you can use ETW to capture counter data as events and view them on your ETW trace viewer (PerfView, WPA, etc.). You may also use `dotnet-counters` to consume it cross-platform via EventPipe. You can also use TraceEvent to consume these.

#### dotnet-counters

dotnet-counters is a cross-platform dotnet CLI tool that can be used to monitor the counter values. To find out how to use `dotnet-counters` to monitor your counters, refer to the [dotnet-counters documentation](https://github.com/dotnet/diagnostics/blob/master/documentation/dotnet-counters-instructions.md).


#### ETW/PerfView

Since EventCounter payloads are reported as `EventSource` events, you can use PerfView to collect/view these counter-data.

Here is a command that can be passed to PerfView to collect an ETW trace with the counters.
```
PerfView.exe /onlyProviders=*Samples-EventCounterDemos-Minimal:EventCounterIntervalSec=1 collect
```


#### dotnet-trace

Similar to how PerfView can be used to consume the counter data through ETW, dotnet-trace can be used to consume the counter data through EventPipe.

Here is an example of using dotnet-trace to get the same counter data.
```
dotnet-trace collect --process-id <pid> --providers Samples-EventCounterDemos-Minimal:0:0:EventCounterIntervalSec=1
```

The official dotnet-trace documentation contains a [section](https://github.com/dotnet/diagnostics/blob/master/documentation/dotnet-trace-instructions.md#using-dotnet-trace-to-collect-counter-values-over-time) on how to do this in more detail.


#### TraceEvent

TraceEvent is a managed library that makes it easy to consume ETW and EventPipe events. For more information, refer to the [TraceEvent Library Programmers Guide](https://github.com/Microsoft/perfview/blob/master/documentation/TraceEvent/TraceEventProgrammersGuide.md).