From e54ac5601a7bbecb9e04b44426b82a666fd6c450 Mon Sep 17 00:00:00 2001
From: Aaron Abbott <aaronabbott@google.com>
Date: Thu, 15 Oct 2020 15:07:07 -0400
Subject: [PATCH] System metrics semantic conventions (#937)

* System metrics semantic conventions

Conventions from [OTEP
119](https://github.com/open-telemetry/oteps/pull/119)

* change process count to UpDownSumObserver

* fix system.cpu.utilization, use better example

* first several comments

* add description columns, update units to UCUM

* markdown-toc

* clarify OS process level metrics

* clarify load average example

* move general conventions + OTEP 108 into README.md

* renamed swap -> paging

* add additional fs labels

* fix links

* fix link

* Update specification/metrics/semantic_conventions/README.md

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>

* Update specification/metrics/semantic_conventions/README.md

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>

* fix tigran comments

* add disk io_time and operation_time

* add descriptions/footnotes for dropped packets and net errors

* lint, more info for net dropped packets/errors

* "dropped_packets" -> "dropped"

* Apply suggestions from James' code review

Co-authored-by: James Bebbington <jbebbington@google.com>

* comments from James' code review

* clarify windows perf counter

* Update specification/metrics/semantic_conventions/README.md

Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>

* reflow text

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Co-authored-by: James Bebbington <jbebbington@google.com>
Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
---
 CHANGELOG.md                                  |   2 +
 .../metrics/semantic_conventions/README.md    | 119 +++++++++++-
 .../semantic_conventions/process-metrics.md   |  22 +++
 .../runtime-environment-metrics.md            |  44 +++++
 .../semantic_conventions/system-metrics.md    | 177 ++++++++++++++++++
 5 files changed, 360 insertions(+), 4 deletions(-)
 create mode 100644 specification/metrics/semantic_conventions/process-metrics.md
 create mode 100644 specification/metrics/semantic_conventions/runtime-environment-metrics.md
 create mode 100644 specification/metrics/semantic_conventions/system-metrics.md
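
Note on the derived metric: the README.md changes in this patch describe
`system.cpu.utilization` as the difference in `system.cpu.time` measurements
divided by the elapsed time. A minimal sketch of that arithmetic follows. The
`CpuTimeSample` type, the `cpu_utilization` helper, and the sample values are
hypothetical and used only for illustration; they are not part of any
OpenTelemetry API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuTimeSample:
    """One observation of cumulative system.cpu.time for one CPU (hypothetical type)."""

    timestamp_s: float      # time the sample was taken, in seconds
    time_by_state_s: dict   # cumulative seconds per state, e.g. {"idle": 80.0}


def cpu_utilization(prev: CpuTimeSample, curr: CpuTimeSample) -> dict:
    """Return, per state, the fraction of elapsed time spent in that state."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return {
        # Difference of the cumulative counters divided by the elapsed time.
        state: (curr.time_by_state_s[state] - prev.time_by_state_s.get(state, 0.0)) / elapsed
        for state in curr.time_by_state_s
    }


# Two made-up samples taken 10 seconds apart on a single CPU.
prev = CpuTimeSample(100.0, {"idle": 80.0, "user": 15.0, "system": 5.0})
curr = CpuTimeSample(110.0, {"idle": 86.0, "user": 18.0, "system": 6.0})
utilization = cpu_utilization(prev, curr)
```

In this made-up case the per-state fractions sum to 1 because the state deltas
add up to the elapsed time on one CPU; real measurements need not line up this
exactly.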

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5f475e4f68a..fea887253e4 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -53,6 +53,8 @@ New:
   ([#1063](https://github.com/open-telemetry/opentelemetry-specification/pull/1063))  
 - Add `Shutdown` function to `*Provider` SDK
   ([#1074](https://github.com/open-telemetry/opentelemetry-specification/pull/1074))
+- Add semantic conventions for system metrics
+  ([#937](https://github.com/open-telemetry/opentelemetry-specification/pull/937))
 
 Updates:
 
diff --git a/specification/metrics/semantic_conventions/README.md b/specification/metrics/semantic_conventions/README.md
index 6c63439e4e3..935709d16af 100644
--- a/specification/metrics/semantic_conventions/README.md
+++ b/specification/metrics/semantic_conventions/README.md
@@ -1,7 +1,118 @@
 # Metrics Semantic Conventions
 
-TODO: Add semantic conventions for metric names and labels.
+The following semantic conventions surrounding metrics are defined:
 
-Apart from semantic conventions for metrics and [traces](../../trace/semantic_conventions/README.md),
-OpenTelemetry also defines the concept of overarching [Resources](../../resource/sdk.md) with their own
-[Resource Semantic Conventions](../../resource/semantic_conventions/README.md).
+* [HTTP Metrics](http-metrics.md): Semantic conventions and instruments for HTTP metrics.
+* [System Metrics](system-metrics.md): Semantic conventions and instruments for standard system metrics.
+* [Process Metrics](process-metrics.md): Semantic conventions and instruments for standard process metrics.
+* [Runtime Environment Metrics](runtime-environment-metrics.md): Semantic conventions and instruments for runtime environment metrics.
+
+Apart from semantic conventions for metrics and
+[traces](../../trace/semantic_conventions/README.md), OpenTelemetry also
+defines the concept of overarching [Resources](../../resource/sdk.md) with
+their own [Resource Semantic
+Conventions](../../resource/semantic_conventions/README.md).
+
+## General Guidelines
+
+Metric names and labels exist within a single universe and a single
+hierarchy. Metric names and labels MUST be considered within the universe of
+all existing metric names. When defining new metric names and labels,
+consider the prior art of existing standard metrics and metrics from
+frameworks/libraries.
+
+Associated metrics SHOULD be nested together in a hierarchy based on their
+usage. Define a top-level hierarchy for common metric categories: for OS
+metrics, like CPU and network; for app runtimes, like GC internals. Libraries
+and frameworks should nest their metrics into a hierarchy as well. This aids
+in discovery and ad hoc comparison. This allows a user to find similar metrics
+given a certain metric.
+
+The hierarchical structure of metrics defines the namespacing. Supporting
+OpenTelemetry artifacts define the metric structures and hierarchies for some
+categories of metrics, and these can assist decisions when creating future
+metrics.
+
+Common labels SHOULD be consistently named. This aids in discoverability and
+disambiguates similar labels to metric names.
+
+["As a rule of thumb, **aggregations** over all the dimensions of a given
+metric **SHOULD** be
+meaningful,"](https://prometheus.io/docs/practices/naming/#metric-names) as
+Prometheus recommends.
+
+Semantic ambiguity SHOULD be avoided. Use prefixed metric names in cases
+where similar metrics have significantly different implementations across the
+breadth of all existing metrics. For example, every garbage collected runtime
+has slightly different strategies and measures. Using a single set of metric
+names for GC, not divided by the runtime, could create dissimilar comparisons
+and confusion for end users. (For example, prefer `runtime.java.gc.*` over
+`runtime.gc.*`.) Measures of many operating system metrics are similarly
+ambiguous.
+
+Conventional metrics or metrics that have their units included in
+OpenTelemetry metadata (e.g. `metric.WithUnit` in Go) SHOULD NOT include the
+units in the metric name. Units may be included when they provide additional
+meaning to the metric name. Metrics MUST, above all, be understandable and
+usable.
+
+## General Metric Semantic Conventions
+
+The following semantic conventions aim to keep naming consistent. They
+provide guidelines for most of the cases in this specification and should be
+followed for other instruments not explicitly defined in this document.
+
+### Instrument Naming
+
+- **limit** - an instrument that measures the constant, known total amount of
+something should be called `entity.limit`. For example, `system.memory.limit`
+for the total amount of memory on a system.
+
+- **usage** - an instrument that measures an amount used out of a known total
+(**limit**) amount should be called `entity.usage`. For example,
+`system.memory.usage` with label `state = used | cached | free | ...` for the
+amount of memory in each state. Where appropriate, the sum of **usage**
+over all label values SHOULD be equal to the **limit**.
+
+  A measure of the amount of an unlimited resource consumed is differentiated
+  from **usage**.
+
+- **utilization** - an instrument that measures the *fraction* of **usage**
+out of its **limit** should be called `entity.utilization`. For example,
+`system.memory.utilization` for the fraction of memory in use. Utilization
+values are in the range `[0, 1]`.
+
+- **time** - an instrument that measures passage of time should be called
+`entity.time`. For example, `system.cpu.time` with label `state = idle | user
+| system | ...`. **time** measurements are not necessarily wall time and can
+be less than or greater than the real wall time between measurements.
+
+  **time** instruments are a special case of **usage** metrics, where the
+  **limit** can usually be calculated as the sum of **time** over all label
+  values. **utilization** for time instruments can be derived automatically
+  using metric event timestamps. For example, `system.cpu.utilization` is
+  defined as the difference in `system.cpu.time` measurements divided by the
+  elapsed time.
+
+- **io** - an instrument that measures bidirectional data flow should be
+called `entity.io` and have labels for direction. For example,
+`system.network.io`.
+
+- Other instruments that do not fit the above descriptions may be named more
+freely. For example, `system.paging.faults` and `system.network.packets`.
+Units do not need to be specified in the names since they are included during
+instrument creation, but can be added if there is ambiguity.
+
+### Units
+
+Units should follow the [UCUM](http://unitsofmeasure.org/ucum.html) (need
+more clarification in
+[#705](https://github.com/open-telemetry/opentelemetry-specification/issues/705)).
+
+- Instruments for **utilization** metrics (that measure the fraction out of a
+total) are dimensionless and SHOULD use the default unit `1` (the unity).
+- Instruments that measure an integer count of something SHOULD use the
+default unit `1` (the unity) and
+[annotations](https://ucum.org/ucum.html#para-curly) with curly braces to
+give additional meaning. For example `{packets}`, `{errors}`, `{faults}`,
+etc.
diff --git a/specification/metrics/semantic_conventions/process-metrics.md b/specification/metrics/semantic_conventions/process-metrics.md
new file mode 100644
index 00000000000..3d2d4e28e75
--- /dev/null
+++ b/specification/metrics/semantic_conventions/process-metrics.md
@@ -0,0 +1,22 @@
+# Semantic Conventions for OS Process Metrics
+
+This document describes instruments and labels for common OS process level
+metrics in OpenTelemetry. Also consider the [general metric semantic
+conventions](README.md#general-metric-semantic-conventions) when creating
+instruments not explicitly defined in this document. OS process metrics are
+not related to the runtime environment of the program, and should take
+measurements from the operating system. For runtime environment metrics see
+[semantic conventions for runtime environment
+metrics](runtime-environment-metrics.md).
+
+<!-- Re-generate TOC with `markdown-toc --no-first-h1 -i` -->
+
+<!-- toc -->
+
+- [Metric Instruments](#metric-instruments)
+
+<!-- tocstop -->
+
+## Metric Instruments
+
+TODO
diff --git a/specification/metrics/semantic_conventions/runtime-environment-metrics.md b/specification/metrics/semantic_conventions/runtime-environment-metrics.md
new file mode 100644
index 00000000000..a1abb095162
--- /dev/null
+++ b/specification/metrics/semantic_conventions/runtime-environment-metrics.md
@@ -0,0 +1,44 @@
+# Semantic Conventions for Runtime Environment Metrics
+
+This document includes semantic conventions for runtime environment level
+metrics in OpenTelemetry. Also consider the [general
+metric](README.md#general-metric-semantic-conventions), [system
+metrics](system-metrics.md) and [OS Process metrics](process-metrics.md)
+semantic conventions when instrumenting runtime environments.
+
+<!-- Re-generate TOC with `markdown-toc --no-first-h1 -i` -->
+
+<!-- toc -->
+
+- [Metric Instruments](#metric-instruments)
+  * [Runtime Environment Specific Metrics - `runtime.{environment}.`](#runtime-environment-specific-metrics---runtimeenvironment)
+
+<!-- tocstop -->
+
+## Metric Instruments
+
+Runtime environments vary widely in their terminology, implementation, and
+relative values for a given metric. For example, Go and Python are both
+garbage collected languages, but comparing heap usage between the Go and
+CPython runtimes directly is not meaningful. For this reason, this document
+does not propose any standard top-level runtime metric instruments. See [OTEP
+108](https://github.com/open-telemetry/oteps/pull/108/files) for additional
+discussion.
+
+### Runtime Environment Specific Metrics - `runtime.{environment}.`
+
+Metrics specific to a certain runtime environment should be prefixed with
+`runtime.{environment}.` and follow the semantic conventions outlined in
+[general metric semantic
+conventions](README.md#general-metric-semantic-conventions). Authors of
+runtime instrumentations are responsible for the choice of `{environment}` to
+avoid ambiguity when interpreting a metric's name or values.
+
+For example, some programming languages have multiple runtime environments
+that vary significantly in their implementation, like [Python which has many
+implementations](https://wiki.python.org/moin/PythonImplementations). For
+such languages, consider using specific `{environment}` prefixes to avoid
+ambiguity, like `runtime.cpython.` and `runtime.pypy.`.
+
+There are other dimensions even within a given runtime environment to
+consider, for example pthreads vs green thread implementations.
diff --git a/specification/metrics/semantic_conventions/system-metrics.md b/specification/metrics/semantic_conventions/system-metrics.md
new file mode 100644
index 00000000000..7468ed34aa0
--- /dev/null
+++ b/specification/metrics/semantic_conventions/system-metrics.md
@@ -0,0 +1,177 @@
+# Semantic Conventions for System Metrics
+
+This document describes instruments and labels for common system level
+metrics in OpenTelemetry. Consider the [general metric semantic
+conventions](README.md#general-metric-semantic-conventions) when creating
+instruments not explicitly defined in the specification.
+
+<!-- Re-generate TOC with `markdown-toc --no-first-h1 -i` -->
+
+<!-- toc -->
+
+- [Metric Instruments](#metric-instruments)
+  * [`system.cpu.` - Processor metrics](#systemcpu---processor-metrics)
+  * [`system.memory.` - Memory metrics](#systemmemory---memory-metrics)
+  * [`system.paging.` - Paging/swap metrics](#systempaging---pagingswap-metrics)
+  * [`system.disk.` - Disk controller metrics](#systemdisk---disk-controller-metrics)
+  * [`system.filesystem.` - Filesystem metrics](#systemfilesystem---filesystem-metrics)
+  * [`system.network.` - Network metrics](#systemnetwork---network-metrics)
+  * [`system.process.` - Aggregate system process metrics](#systemprocess---aggregate-system-process-metrics)
+  * [`system.{os}.` - OS Specific System Metrics](#systemos---os-specific-system-metrics)
+
+<!-- tocstop -->
+
+## Metric Instruments
+
+### `system.cpu.` - Processor metrics
+
+**Description:** System level processor metrics.
+
+| Name                   | Description | Units | Instrument Type | Value Type | Label Key(s) | Label Values                        |
+| ---------------------- | ----------- | ----- | --------------- | ---------- | ------------ | ----------------------------------- |
+| system.cpu.time        |             | s     | SumObserver     | Double     | state        | idle, user, system, interrupt, etc. |
+|                        |             |       |                 |            | cpu          | CPU number [0..n-1]                 |
+| system.cpu.utilization |             | 1     | ValueObserver   | Double     | state        | idle, user, system, interrupt, etc. |
+|                        |             |       |                 |            | cpu          | CPU number [0..n-1]                 |
+
+### `system.memory.` - Memory metrics
+
+**Description:** System level memory metrics. This does not include [paging/swap
+memory](#systempaging---pagingswap-metrics).
+
+| Name                      | Description | Units | Instrument Type   | Value Type | Label Key | Label Values             |
+| ------------------------- | ----------- | ----- | ----------------- | ---------- | --------- | ------------------------ |
+| system.memory.usage       |             | By    | UpDownSumObserver | Int64      | state     | used, free, cached, etc. |
+| system.memory.utilization |             | 1     | ValueObserver     | Double     | state     | used, free, cached, etc. |
+
+### `system.paging.` - Paging/swap metrics
+
+**Description:** System level paging/swap memory metrics.
+| Name                      | Description                         | Units        | Instrument Type   | Value Type | Label Key | Label Values |
+| ------------------------- | ----------------------------------- | ------------ | ----------------- | ---------- | --------- | ------------ |
+| system.paging.usage       | Unix swap or windows pagefile usage | By           | UpDownSumObserver | Int64      | state     | used, free   |
+| system.paging.utilization |                                     | 1            | ValueObserver     | Double     | state     | used, free   |
+| system.paging.faults      |                                     | {faults}     | SumObserver       | Int64      | type      | major, minor |
+| system.paging.operations  |                                     | {operations} | SumObserver       | Int64      | type      | major, minor |
+|                           |                                     |              |                   |            | direction | in, out      |
+
+### `system.disk.` - Disk controller metrics
+
+**Description:** System level disk performance metrics.
+| Name                                                      | Description                                     | Units        | Instrument Type | Value Type | Label Key | Label Values |
+| --------------------------------------------------------- | ----------------------------------------------- | ------------ | --------------- | ---------- | --------- | ------------ |
+| system.disk.io<!--notlink-->                              |                                                 | By           | SumObserver     | Int64      | device    | (identifier) |
+|                                                           |                                                 |              |                 |            | direction | read, write  |
+| system.disk.operations                                    |                                                 | {operations} | SumObserver     | Int64      | device    | (identifier) |
+|                                                           |                                                 |              |                 |            | direction | read, write  |
+| system.disk.io_time<sup>[1](#io_time)</sup>               | Time disk spent activated                       | s            | SumObserver     | Double     | device    | (identifier) |
+| system.disk.operation_time<sup>[2](#operation_time)</sup> | Sum of the time each operation took to complete | s            | SumObserver     | Double     | device    | (identifier) |
+|                                                           |                                                 |              |                 |            | direction | read, write  |
+| system.disk.merged                                        |                                                 | {operations} | SumObserver     | Int64      | device    | (identifier) |
+|                                                           |                                                 |              |                 |            | direction | read, write  |
+
+<sup><a name="io_time">1</a></sup> The real elapsed time ("wall clock")
+used in the I/O path (time from operations running in parallel is not
+counted). Measured as:
+
+- Linux: Field 13 from
+[procfs-diskstats](https://www.kernel.org/doc/Documentation/ABI/testing/procfs-diskstats)
+- Windows: The complement of ["Disk\% Idle
+Time"](https://docs.microsoft.com/en-us/archive/blogs/askcore/windows-performance-monitor-disk-counters-explained#windows-performance-monitor-disk-counters-explained:~:text=%25%20Idle%20Time,Idle\)%20to%200%20(meaning%20always%20busy).)
+performance counter: `uptime * (100 - "Disk\% Idle Time") / 100`
+
+<sup><a name="operation_time">2</a></sup> Because it is the sum of time each
+request took, parallel-issued requests each contribute to make the count
+grow. Measured as:
+
+- Linux: Fields 7 & 11 from
+[procfs-diskstats](https://www.kernel.org/doc/Documentation/ABI/testing/procfs-diskstats)
+- Windows: "Avg. Disk sec/Read" perf counter multiplied by "Disk Reads/sec"
+perf counter (similar for Writes)
+
+### `system.filesystem.` - Filesystem metrics
+
+**Description:** System level filesystem metrics.
+| Name                          | Description | Units | Instrument Type   | Value Type | Label Key  | Label Values         |
+| ----------------------------- | ----------- | ----- | ----------------- | ---------- | ---------- | -------------------- |
+| system.filesystem.usage       |             | By    | UpDownSumObserver | Int64      | device     | (identifier)         |
+|                               |             |       |                   |            | state      | used, free, reserved |
+|                               |             |       |                   |            | type       | ext4, tmpfs, etc.    |
+|                               |             |       |                   |            | mode       | rw, ro, etc.         |
+|                               |             |       |                   |            | mountpoint | (path)               |
+| system.filesystem.utilization |             | 1     | ValueObserver     | Double     | device     | (identifier)         |
+|                               |             |       |                   |            | state      | used, free, reserved |
+|                               |             |       |                   |            | type       | ext4, tmpfs, etc.    |
+|                               |             |       |                   |            | mode       | rw, ro, etc.         |
+|                               |             |       |                   |            | mountpoint | (path)               |
+
+### `system.network.` - Network metrics
+
+**Description:** System level network metrics.
+| Name                                           | Description                                                                   | Units         | Instrument Type   | Value Type | Label Key | Label Values                                                                                   |
+| ---------------------------------------------- | ----------------------------------------------------------------------------- | ------------- | ----------------- | ---------- | --------- | ---------------------------------------------------------------------------------------------- |
+| system.network.dropped<sup>[1](#dropped)</sup> | Count of packets that are dropped or discarded even though there was no error | {packets}     | SumObserver       | Int64      | device    | (identifier)                                                                                   |
+|                                                |                                                                               |               |                   |            | direction | transmit, receive                                                                              |
+| system.network.packets                         |                                                                               | {packets}     | SumObserver       | Int64      | device    | (identifier)                                                                                   |
+|                                                |                                                                               |               |                   |            | direction | transmit, receive                                                                              |
+| system.network.errors<sup>[2](#errors)</sup>   | Count of network errors detected                                              | {errors}      | SumObserver       | Int64      | device    | (identifier)                                                                                   |
+|                                                |                                                                               |               |                   |            | direction | transmit, receive                                                                              |
+| system<!--notlink-->.network.io                |                                                                               | By            | SumObserver       | Int64      | device    | (identifier)                                                                                   |
+|                                                |                                                                               |               |                   |            | direction | transmit, receive                                                                              |
+| system.network.connections                     |                                                                               | {connections} | UpDownSumObserver | Int64      | device    | (identifier)                                                                                   |
+|                                                |                                                                               |               |                   |            | protocol  | tcp, udp, [etc.](https://en.wikipedia.org/wiki/Transport_layer#Protocols)                      |
+|                                                |                                                                               |               |                   |            | state     | [e.g. for tcp](https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Protocol_operation) |
+
+<sup><a name="dropped">1</a></sup> Measured as:
+
+- Linux: the `drop` column in `/proc/net/dev`
+([source](https://web.archive.org/web/20180321091318/http://www.onlamp.com/pub/a/linux/2000/11/16/LinuxAdmin.html)).
+- Windows:
+[`InDiscards`/`OutDiscards`](https://docs.microsoft.com/en-us/windows/win32/api/netioapi/ns-netioapi-mib_if_row2)
+from
+[`GetIfEntry2`](https://docs.microsoft.com/en-us/windows/win32/api/netioapi/nf-netioapi-getifentry2).
+
+<sup><a name="errors">2</a></sup> Measured as:
+
+- Linux: the `errs` column in `/proc/net/dev`
+([source](https://web.archive.org/web/20180321091318/http://www.onlamp.com/pub/a/linux/2000/11/16/LinuxAdmin.html)).
+- Windows:
+[`InErrors`/`OutErrors`](https://docs.microsoft.com/en-us/windows/win32/api/netioapi/ns-netioapi-mib_if_row2)
+from
+[`GetIfEntry2`](https://docs.microsoft.com/en-us/windows/win32/api/netioapi/nf-netioapi-getifentry2).
+
+### `system.process.` - Aggregate system process metrics
+
+**Description:** System level aggregate process metrics. For metrics at the
+individual process level, see [process metrics](process-metrics.md).
+| Name                 | Description                             | Units       | Instrument Type   | Value Type | Label Key | Label Values                                                                                   |
+| -------------------- | --------------------------------------- | ----------- | ----------------- | ---------- | --------- | ---------------------------------------------------------------------------------------------- |
+| system.process.count | Total number of processes in each state | {processes} | UpDownSumObserver | Int64      | status    | running, sleeping, [etc.](https://man7.org/linux/man-pages/man1/ps.1.html#PROCESS_STATE_CODES) |
+
+### `system.{os}.` - OS Specific System Metrics
+
+Instrument names for system level metrics that have different and conflicting
+meanings across multiple OSes should be prefixed with `system.{os}.` and
+follow the hierarchies listed above for different entities like CPU, memory,
+and network.
+
+For example, [UNIX load
+average](https://en.wikipedia.org/wiki/Load_(computing)) over a given
+interval is not well standardized and its value across different UNIX-like
+OSes may vary despite being under similar load:
+
+> Without getting into the vagaries of every Unix-like operating system in
+existence, the load average more or less represents the average number of
+processes that are in the running (using the CPU) or runnable (waiting for
+the CPU) states. One notable exception exists: Linux includes processes in
+uninterruptible sleep states, typically waiting for some I/O activity to
+complete. This can markedly increase the load average on Linux systems.
+
+([source of
+quote](https://github.com/torvalds/linux/blob/e4cbce4d131753eca271d9d67f58c6377f27ad21/kernel/sched/loadavg.c#L11-L18),
+[linux source
+code](https://github.com/torvalds/linux/blob/e4cbce4d131753eca271d9d67f58c6377f27ad21/kernel/sched/loadavg.c#L11-L18))
+
+An instrument for load average over 1 minute on Linux could be named
+`system.linux.cpu.load_1m`, reusing the `cpu` name proposed above and having
+an `{os}` prefix to split this metric across OSes.