diff --git a/.chloggen/add_new_container_metrics.yaml b/.chloggen/add_new_container_metrics.yaml new file mode 100644 index 0000000000..4e6c71bb49 --- /dev/null +++ b/.chloggen/add_new_container_metrics.yaml @@ -0,0 +1,21 @@ +# Use this changelog template to create an entry for release notes. +# +# If your change doesn't affect end users you should instead start +# your pull request title with [chore] or use the "Skip Changelog" label. + +# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix' +change_type: "enhancement" + +# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db) +component: "container" + +# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`). +note: "Add new container metrics for `cpu`, `memory`, `disk` and `network`" + +# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists. +issues: [282, 72] + +# (Optional) One or more lines of additional information to render under the primary note. +# These lines will be padded with 2 spaces and then inserted directly into the document. +# Use pipe (|) for multiline entries. +subtext: diff --git a/docs/attributes-registry/container.md b/docs/attributes-registry/container.md index d9f66e67f9..4e6ec20847 100644 --- a/docs/attributes-registry/container.md +++ b/docs/attributes-registry/container.md @@ -11,6 +11,7 @@ | `container.command` | string | The command used to run the container (i.e. the command name). [1] | `otelcontribcol` | | `container.command_args` | string[] | All the command arguments (including the command/executable itself) run by the container. [2] | `[otelcontribcol, --config, config.yaml]` | | `container.command_line` | string | The full command run by the container as a single string representing the full command. 
[2] | `otelcontribcol --config config.yaml` | +| `container.cpu.state` | string | The CPU state for this data point. | `user`; `kernel` | | `container.id` | string | Container ID. Usually a UUID, as for example used to [identify Docker containers](https://docs.docker.com/engine/reference/run/#container-identification). The UUID might be abbreviated. | `a3bf90e006b2` | | `container.image.id` | string | Runtime specific image identifier. Usually a hash algorithm followed by a UUID. [2] | `sha256:19c92d0a00d1b66d897bceaa7319bee0dd38a10a851c60bcec9474aa3f01e50f` | | `container.image.name` | string | Name of the image the container was built on. | `gcr.io/opentelemetry/operator` | @@ -27,4 +28,12 @@ K8s defines a link to the container registry repository with digest `"imageID": The ID is assigned by the container runtime and can vary in different environments. Consider using `oci.manifest.digest` if it is important to identify the same image in different environments/runtimes. **[3]:** [Docker](https://docs.docker.com/engine/api/v1.43/#tag/Image/operation/ImageInspect) and [CRI](https://github.com/kubernetes/cri-api/blob/c75ef5b473bbe2d0a4fc92f82235efd665ea8e9f/pkg/apis/runtime/v1/api.proto#L1237-L1238) report those under the `RepoDigests` field. + +`container.cpu.state` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. + +| Value | Description | +|---|---| +| `user` | When tasks of the cgroup are in user mode (Linux). When all container processes are in user mode (Windows). | +| `system` | When CPU is used by the system (host OS) | +| `kernel` | When tasks of the cgroup are in kernel mode (Linux). When all container processes are in kernel mode (Windows). 
| diff --git a/docs/system/container-metrics.md b/docs/system/container-metrics.md new file mode 100644 index 0000000000..d5df4933b2 --- /dev/null +++ b/docs/system/container-metrics.md @@ -0,0 +1,105 @@ + + +# Semantic Conventions for Container Metrics + +**Status**: [Experimental][DocumentStatus] + +## Container Metrics + +### Metric: `container.cpu.time` + +This metric is [opt-in][MetricOptIn]. + + +| Name | Instrument Type | Unit (UCUM) | Description | +| -------- | --------------- | ----------- | -------------- | +| `container.cpu.time` | Counter | `s` | Total CPU time consumed [1] | + +**[1]:** Total CPU time consumed by the specific container on all available CPU cores + + + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| [`container.cpu.state`](../attributes-registry/container.md) | string | The CPU state for this data point. A container SHOULD be characterized _either_ by data points with no `state` labels, _or only_ data points with `state` labels. | `user`; `kernel` | Opt-In | + +`container.cpu.state` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. + +| Value | Description | +|---|---| +| `user` | When tasks of the cgroup are in user mode (Linux). When all container processes are in user mode (Windows). | +| `system` | When CPU is used by the system (host OS) | +| `kernel` | When tasks of the cgroup are in kernel mode (Linux). When all container processes are in kernel mode (Windows). | + + +### Metric: `container.memory.usage` + +This metric is [opt-in][MetricOptIn]. + + +| Name | Instrument Type | Unit (UCUM) | Description | +| -------- | --------------- | ----------- | -------------- | +| `container.memory.usage` | Counter | `By` | Memory usage of the container. [1] | + +**[1]:** Memory usage of the container. + + + + + +### Metric: `container.disk.io` + +This metric is [opt-in][MetricOptIn]. 
+ + +| Name | Instrument Type | Unit (UCUM) | Description | +| -------- | --------------- | ----------- | -------------- | +| `container.disk.io` | Counter | `By` | Disk bytes for the container. [1] | + +**[1]:** The total number of bytes read/written successfully (aggregated from all disks). + + + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| [`disk.io.direction`](../attributes-registry/disk.md) | string | The disk IO operation direction. | `read` | Recommended | +| `system.device` | string | The device identifier | `(identifier)` | Recommended | + +`disk.io.direction` MUST be one of the following: + +| Value | Description | +|---|---| +| `read` | read | +| `write` | write | + + +### Metric: `container.network.io` + +This metric is [opt-in][MetricOptIn]. + + +| Name | Instrument Type | Unit (UCUM) | Description | +| -------- | --------------- | ----------- | -------------- | +| `container.network.io` | Counter | `By` | Network bytes for the container. [1] | + +**[1]:** The number of bytes sent/received on all network interfaces by the container. + + + +| Attribute | Type | Description | Examples | Requirement Level | +|---|---|---|---|---| +| [`network.io.direction`](../attributes-registry/network.md) | string | The network IO operation direction. 
| `transmit` | Recommended | +| `system.device` | string | The device identifier | `(identifier)` | Recommended | + +`network.io.direction` MUST be one of the following: + +| Value | Description | +|---|---| +| `transmit` | transmit | +| `receive` | receive | + + +[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md +[MetricOptIn]: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.26.0/specification/metrics/metric-requirement-level.md#opt-in diff --git a/model/metrics/container.yaml b/model/metrics/container.yaml new file mode 100644 index 0000000000..3904f168be --- /dev/null +++ b/model/metrics/container.yaml @@ -0,0 +1,53 @@ +groups: + # container.cpu.* metrics and attribute group + - id: metric.container.cpu.time + type: metric + metric_name: container.cpu.time + brief: "Total CPU time consumed" + note: > + Total CPU time consumed by the specific container on all available CPU cores + instrument: counter + unit: "s" + attributes: + - ref: container.cpu.state + brief: "The CPU state for this data point. A container SHOULD be characterized _either_ by data points with no `state` labels, _or only_ data points with `state` labels." + requirement_level: opt_in + + # container.memory.* metrics and attribute group + - id: metric.container.memory.usage + type: metric + metric_name: container.memory.usage + brief: "Memory usage of the container." + note: > + Memory usage of the container. + instrument: counter + unit: "By" + + # container.disk.io.* metrics and attribute group + - id: metric.container.disk.io + type: metric + metric_name: container.disk.io + brief: "Disk bytes for the container." + note: > + The total number of bytes read/written + successfully (aggregated from all disks). 
+ instrument: counter + unit: "By" + attributes: + - ref: disk.io.direction + - ref: system.device + + # container.network.io.* metrics and attribute group + - id: metric.container.network.io + type: metric + metric_name: container.network.io + brief: "Network bytes for the container." + note: > + The number of bytes sent/received + on all network interfaces + by the container. + instrument: counter + unit: "By" + attributes: + - ref: network.io.direction + - ref: system.device diff --git a/model/registry/container.yaml b/model/registry/container.yaml index 343c63b927..2878766b66 100644 --- a/model/registry/container.yaml +++ b/model/registry/container.yaml @@ -95,3 +95,18 @@ groups: brief: > Container labels, `<key>` being the label name, the value being the label value. examples: [ 'container.label.app=nginx' ] + - id: cpu.state + brief: "The CPU state for this data point." + type: + allow_custom_values: true + members: + - id: user + value: 'user' + brief: "When tasks of the cgroup are in user mode (Linux). When all container processes are in user mode (Windows)." + - id: system + value: 'system' + brief: "When CPU is used by the system (host OS)" + - id: kernel + value: 'kernel' + brief: "When tasks of the cgroup are in kernel mode (Linux). When all container processes are in kernel mode (Windows)." + examples: ["user", "kernel"]
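
Reviewer note: the `container.cpu.time` brief says a container SHOULD be described either by data points with no `state` labels or only by data points with `state` labels. The sketch below is one possible way to check that rule. It is not part of this change, and it assumes that "`state` labels" means the `container.cpu.state` attribute and that each data point is represented as a plain attribute mapping. The helper names `states_are_consistent` and `custom_states` are hypothetical.

```python
from typing import Mapping, Sequence

# Attribute key and well-known values as listed in the registry change above.
CPU_STATE_KEY = "container.cpu.state"
WELL_KNOWN_CPU_STATES = frozenset({"user", "system", "kernel"})


def states_are_consistent(points: Sequence[Mapping[str, str]]) -> bool:
    """Return True if every data point carries the state attribute, or none does.

    Each element of `points` is the attribute mapping of one data point
    (an assumed representation, chosen only for this illustration).
    """
    has_state = {CPU_STATE_KEY in attrs for attrs in points}
    # An empty set (no points) or a single-element set means no mixing.
    return len(has_state) <= 1


def custom_states(points: Sequence[Mapping[str, str]]) -> list[str]:
    """Return the sorted state values that are not in the well-known list.

    Custom values are allowed by the convention; this only surfaces them.
    """
    seen = {attrs[CPU_STATE_KEY] for attrs in points if CPU_STATE_KEY in attrs}
    return sorted(seen - WELL_KNOWN_CPU_STATES)
```

For example, a series with one `user` point and one point without the attribute would fail `states_are_consistent`, while a series that adds a value such as `steal` would pass and be reported by `custom_states`.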