From 8a20db1d2fbaf60832a6760e63e9632b5c5d8e72 Mon Sep 17 00:00:00 2001
From: Christoph Kleineweber
Date: Mon, 22 Apr 2024 09:34:53 +0200
Subject: [PATCH 1/9] docs: Add ADR for module status metrics

---
 docs/contributor/arch/011-kymastatsmetrics.md | 112 ++++++++++++++++++
 1 file changed, 112 insertions(+)
 create mode 100644 docs/contributor/arch/011-kymastatsmetrics.md

diff --git a/docs/contributor/arch/011-kymastatsmetrics.md b/docs/contributor/arch/011-kymastatsmetrics.md
new file mode 100644
index 000000000..6c03ac7ef
--- /dev/null
+++ b/docs/contributor/arch/011-kymastatsmetrics.md
@@ -0,0 +1,112 @@
+# 11. Kyma Module Status Metrics
+
+Date: 2024-04-19
+
+## Status
+
+Proposed
+
+## Context
+
+The [Advanced pipeline status based on data flow](https://github.com/kyma-project/telemetry-manager/issues/425) epic describes efforts to make problems of the Telemetry module more transparent to users by utilizing CRD status conditions.
+To ease day-two operations, the status of the Telemetry and other Kyma modules should by available as metrics to enable users to integrate them into their monitoring system. For instance, by setting up alerts for a module status that differs from the expected value.
+
+The [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) exporter provides a way to [export metrics for custom resources](https://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/extend/customresourcestate-metrics.md).
+This record describes a way to integrate similar functionality to the Telemetry module's OpenTelemetry Collectors. The solution should avoid the maintenance overhead of an additional third-party-image and allow a dynamic configuration, based on the active Kyma modules.
+
+An OpenTelemetry Collector receiver with similar scope as kube-state-metrics is the [Kubernetes Cluster Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sclusterreceiver).
However, this receiver does not support monitoring custom resources.
+Another available source to monitor changes of arbitrary Kubernetes objects is the [Kubernetes Objects Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sobjectsreceiver). This receiver produces only logs and no metrics.
+
+## Decision
+
+### Extended MetricPipeline API
+
+Module status metrics can be activated as a new input in the MetricPipeline CRD. Users can select the Kyma modules of interest by giving a module list. An empty list will include metrics of all active modules.
+
+```yaml
+apiVersion: telemetry.kyma-project.io/v1alpha1
+kind: MetricPipeline
+metadata:
+  name: sample
+spec:
+  input:
+    kyma:
+      enabled: true
+      modules:
+        - telemetry
+    runtime:
+      enabled: true
+    prometheus:
+      enabled: true
+    istio:
+      enabled: true
+  output:
+    otlp:
+      endpoint:
+        value: http://example.com:4317
+```
+
+Enabling the Kyma input will enable a custom metrics receiver, called `kymastats`, in the telemetry-metric-gateway that produces metrics for the module state and status conditions.
+The receiver configuration will follow the shown example.
+
+```yaml
+receivers:
+  kymastats:
+    k8s_cluster:
+      auth_type: serviceAccount
+      collection_interval: 30s
+      api_groups:
+        - operator.kyma-project.io
+```
+
+The receiver needs the following properties:
+
+- **auth_type**: The way to authenticate with the Kubernetes API. Possible values are `serviceAccount` (default) or `kubeConfig`.
+- **collection_interval**: The interval that is used to emit metrics.
+- **api_groups**: List of API groups to scrape. Most of the Kyma modules should use the `operator.kyma-project.io` API group. The list can be extended to support monitoring custom modules. Every CRD in the listed groups is assumed to represent a module.
+
+### Custom Metrics Receiver for OpenTelemetry Collector
+
+We assume the status subresource of a module CRD to contain a `conditions` list that uses the [meta/v1/Condition](https://pkg.go.dev/k8s.io/apimachinery@v0.30.0/pkg/apis/meta/v1#Condition) type and an overarching state attribute. We assume positive polarity for all conditions.
+
+An example for this structure is the status subresource of the Telemetry module:
+
+```yaml
+status:
+  conditions:
+  - lastTransitionTime: "2024-04-18T13:43:03Z"
+    message: All log components are running
+    observedGeneration: 2
+    reason: LogComponentsRunning
+    status: "True"
+    type: LogComponentsHealthy
+  - lastTransitionTime: "2024-04-18T13:41:55Z"
+    message: All metric components are running
+    observedGeneration: 2
+    reason: MetricComponentsRunning
+    status: "True"
+    type: MetricComponentsHealthy
+  - lastTransitionTime: "2024-04-15T12:36:47Z"
+    message: All trace components are running
+    observedGeneration: 2
+    reason: TraceComponentsRunning
+    status: "True"
+    type: TraceComponentsHealthy
+  endpoints:
+    metrics:
+      grpc: http://telemetry-otlp-metrics.kyma-system:4317
+      http: http://telemetry-otlp-metrics.kyma-system:4318
+    traces:
+      grpc: http://telemetry-otlp-traces.kyma-system:4317
+      http: http://telemetry-otlp-traces.kyma-system:4318
+  state: Ready
+```
+
+Additional attributes, like the `endpoints` of the Telemetry status, are ignored.
+
+Status and conditions should result in the following metrics:
+
+| Metric Name | Attributes | Description |
+|-----------------------------------------------|----------------|-----------------------------------------------------------------------------------------------------------------------|
+| kyma_module_\_status_state | state | Reflects .status.state field of the module CRD. Value is 1 if the state is `Ready`, else 0. |
+| kyma_module_\_status_condition\_\ | reason, status | Exports condition status of all conditions under .status.conditions.
Value is 1 if the condition status is 1, else 0. |

From d163ef2358b2c5174889f498303ea3756075f8ff Mon Sep 17 00:00:00 2001
From: Christoph Kleineweber
Date: Fri, 26 Apr 2024 16:47:11 +0200
Subject: [PATCH 2/9] Clarify custom receiver implementation

---
 docs/contributor/arch/011-kymastatsmetrics.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/docs/contributor/arch/011-kymastatsmetrics.md b/docs/contributor/arch/011-kymastatsmetrics.md
index 6c03ac7ef..f64e5f389 100644
--- a/docs/contributor/arch/011-kymastatsmetrics.md
+++ b/docs/contributor/arch/011-kymastatsmetrics.md
@@ -17,8 +17,12 @@ This record describes a way to integrate similar functionality to the Telemetry
 An OpenTelemetry Collector receiver with similar scope as kube-state-metrics is the [Kubernetes Cluster Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sclusterreceiver). However, this receiver does not support monitoring custom resources.
 Another available source to monitor changes of arbitrary Kubernetes objects is the [Kubernetes Objects Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sobjectsreceiver). This receiver produces only logs and no metrics.

+The OpenTelemetry Collector provides interfaces for enhancements by implementing custom receiver plugins. The OpenTelemetry project provides [documentation](https://opentelemetry.io/docs/collector/building/receiver/) on implementing a custom receiver and adding it to a custom distribution of the Collector.
+
 ## Decision

+Due to the restrictions of available telemetry resources for Kubernetes resources, building a custom receiver is the most suitable option.
+
 ### Extended MetricPipeline API

 Module status metrics can be activated as a new input in the MetricPipeline CRD. Users can select the Kyma modules of interest by giving a module list. An empty list will include metrics of all active modules.
@@ -46,14 +50,14 @@ spec:
         value: http://example.com:4317
 ```

-Enabling the Kyma input will enable a custom metrics receiver, called `kymastats`, in the telemetry-metric-gateway that produces metrics for the module state and status conditions.
+Enabling the Kyma input will enable a custom metrics receiver, called `kymastats`, in the telemetry-metric-gateway that produces metrics for the module state and status conditions.
 The receiver configuration will follow the shown example.

 ```yaml
 receivers:
   kymastats:
     k8s_cluster:
-      auth_type: serviceAccount
+      auth_type: serviceAccount
       collection_interval: 30s
       api_groups:
         - operator.kyma-project.io

From 3930abb71fa60630d117e4f9d1df3fd420327ebf Mon Sep 17 00:00:00 2001
From: Christoph Kleineweber
Date: Mon, 29 Apr 2024 13:58:44 +0200
Subject: [PATCH 3/9] Apply feedback

---
 docs/contributor/arch/011-kymastatsmetrics.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/docs/contributor/arch/011-kymastatsmetrics.md b/docs/contributor/arch/011-kymastatsmetrics.md
index f64e5f389..cdd7d6c9a 100644
--- a/docs/contributor/arch/011-kymastatsmetrics.md
+++ b/docs/contributor/arch/011-kymastatsmetrics.md
@@ -110,7 +110,9 @@ Additional attributes, like the `endpoints` of the Telemetry status, are ignored

 Status and conditions should result in the following metrics:

-| Metric Name | Attributes | Description |
-|-----------------------------------------------|----------------|-----------------------------------------------------------------------------------------------------------------------|
-| kyma_module_\_status_state | state | Reflects .status.state field of the module CRD. Value is 1 if the state is `Ready`, else 0. |
-| kyma_module_\_status_condition\_\ | reason, status | Exports condition status of all conditions under .status.conditions. Value is 1 if the condition status is 1, else 0.
|
+| Metric Name | Attributes | Description |
+|------------------------------|----------------------------|-----------------------------------------------------------------------------------------------------------------------|
+| kyma.module.status.state | state, name | Reflects .status.state field of the module CRD. Value is 1 if the state is `Ready`, else 0. |
+| kyma.module.status.condition | reason, status, name, type | Exports condition status of all conditions under .status.conditions. Value is 1 if the condition status is 1, else 0. |
+
+Collecting the module specific metrics should continue working in the case of a Node or Pod failure (high availability) without emitting metrics multiple times. To ensure this behavior, Kubernetes API server [leases](https://kubernetes.io/docs/concepts/architecture/leases/) can be used while only the lease holder should emit metrics. We will investigate for a generic solution in the OpenTelemetry Collector.

From d3f8e9177a5eb85311eff922ac51fbb4d4615b6b Mon Sep 17 00:00:00 2001
From: Christoph Kleineweber
Date: Tue, 30 Apr 2024 14:14:21 +0200
Subject: [PATCH 4/9] Apply suggestions from code review

Co-authored-by: Nina Hingerl <76950046+NHingerl@users.noreply.github.com>
---
 docs/contributor/arch/011-kymastatsmetrics.md | 20 ++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/docs/contributor/arch/011-kymastatsmetrics.md b/docs/contributor/arch/011-kymastatsmetrics.md
index cdd7d6c9a..94bceffa7 100644
--- a/docs/contributor/arch/011-kymastatsmetrics.md
+++ b/docs/contributor/arch/011-kymastatsmetrics.md
@@ -8,16 +8,18 @@ Proposed

 ## Context

-The [Advanced pipeline status based on data flow](https://github.com/kyma-project/telemetry-manager/issues/425) epic describes efforts to make problems of the Telemetry module more transparent to users by utilizing CRD status conditions.
-To ease day-two operations, the status of the Telemetry and other Kyma modules should by available as metrics to enable users to integrate them into their monitoring system. For instance, by setting up alerts for a module status that differs from the expected value.
+The epic [Advanced pipeline status based on data flow](https://github.com/kyma-project/telemetry-manager/issues/425) describes efforts to make problems of the Telemetry module more transparent to users by utilizing CRD status conditions.
+To ease day-two operations, the status of the Telemetry and other Kyma modules should be available as metrics. Then, users can integrate these metrics into their monitoring system; for example, by setting up alerts for a module status that differs from the expected value.

 The [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) exporter provides a way to [export metrics for custom resources](https://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/extend/customresourcestate-metrics.md).
 This record describes a way to integrate similar functionality to the Telemetry module's OpenTelemetry Collectors. The solution should avoid the maintenance overhead of an additional third-party-image and allow a dynamic configuration, based on the active Kyma modules.

-An OpenTelemetry Collector receiver with similar scope as kube-state-metrics is the [Kubernetes Cluster Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sclusterreceiver). However, this receiver does not support monitoring custom resources.
-Another available source to monitor changes of arbitrary Kubernetes objects is the [Kubernetes Objects Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sobjectsreceiver). This receiver produces only logs and no metrics.
+We investigated the following existing solutions:

-The OpenTelemetry Collector provides interfaces for enhancements by implementing custom receiver plugins. The OpenTelemetry project provides [documentation](https://opentelemetry.io/docs/collector/building/receiver/) on implementing a custom receiver and adding it to a custom distribution of the Collector.
+- The [Kubernetes Cluster Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sclusterreceiver) is an OpenTelemetry Collector receiver with similar scope as kube-state-metrics . However, this receiver does not support monitoring custom resources.
+- Another available source to monitor changes of arbitrary Kubernetes objects is the [Kubernetes Objects Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sobjectsreceiver). This receiver produces only logs and no metrics.
+
+- The OpenTelemetry Collector provides interfaces for enhancements by implementing custom receiver plugins. The OpenTelemetry project provides [documentation](https://opentelemetry.io/docs/collector/building/receiver/) on implementing a custom receiver and adding it to a custom distribution of the Collector.

 ## Decision

@@ -25,7 +27,7 @@ Due to the restrictions of available telemetry resources for Kubernetes resource

 ### Extended MetricPipeline API

-Module status metrics can be activated as a new input in the MetricPipeline CRD. Users can select the Kyma modules of interest by giving a module list. An empty list will include metrics of all active modules.
+To activate module status metrics as a new input, the MetricPipeline CRD needs a _module list_. If the module list is empty, metrics of all active modules are collected.
As in the following example, users can select the Kyma modules of interest:

 ```yaml
 apiVersion: telemetry.kyma-project.io/v1alpha1
 kind: MetricPipeline
@@ -50,7 +52,7 @@ spec:
         value: http://example.com:4317
 ```

-Enabling the Kyma input will enable a custom metrics receiver, called `kymastats`, in the telemetry-metric-gateway that produces metrics for the module state and status conditions.
+Enabling the Kyma input will enable a custom metrics receiver, called `kymastats`, in the Telemetry metric gateway that produces metrics for the module state and status conditions.
 The receiver configuration will follow the shown example.

 ```yaml
@@ -71,9 +73,9 @@ The receiver needs the following properties:

 ### Custom Metrics Receiver for OpenTelemetry Collector

-We assume the status subresource of a module CRD to contain a `conditions` list that uses the [meta/v1/Condition](https://pkg.go.dev/k8s.io/apimachinery@v0.30.0/pkg/apis/meta/v1#Condition) type and an overarching state attribute. We assume positive polarity for all conditions.
+We assume that the status subresource of a module CRD contains a `conditions` list that uses the type [meta/v1/Condition](https://pkg.go.dev/k8s.io/apimachinery@v0.30.0/pkg/apis/meta/v1#Condition), and an overarching state attribute. We assume positive polarity for all conditions.
-An example for this structure is the status subresource of the Telemetry module:
+As an example for this structure, see the **status** subresource of the Telemetry module:

 ```yaml
 status:

From 319654a50750734e1011424a077e1699fe873d95 Mon Sep 17 00:00:00 2001
From: Christoph Kleineweber
Date: Tue, 30 Apr 2024 14:16:05 +0200
Subject: [PATCH 5/9] Minor improvements

---
 docs/contributor/arch/011-kymastatsmetrics.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/contributor/arch/011-kymastatsmetrics.md b/docs/contributor/arch/011-kymastatsmetrics.md
index 94bceffa7..b34a9a37f 100644
--- a/docs/contributor/arch/011-kymastatsmetrics.md
+++ b/docs/contributor/arch/011-kymastatsmetrics.md
@@ -12,7 +12,7 @@ The epic [Advanced pipeline status based on data flow](https://github.com/kyma-p
 To ease day-two operations, the status of the Telemetry and other Kyma modules should be available as metrics. Then, users can integrate these metrics into their monitoring system; for example, by setting up alerts for a module status that differs from the expected value.

 The [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) exporter provides a way to [export metrics for custom resources](https://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/extend/customresourcestate-metrics.md).
-This record describes a way to integrate similar functionality to the Telemetry module's OpenTelemetry Collectors. The solution should avoid the maintenance overhead of an additional third-party-image and allow a dynamic configuration, based on the active Kyma modules.
+This document describes a way to integrate similar functionality to the Telemetry module's OpenTelemetry Collectors. The solution should avoid the maintenance overhead of an additional third-party-image and allow a dynamic configuration, based on the active Kyma modules.

 We investigated the following existing solutions:

@@ -53,7 +53,8 @@ spec:
 ```

 Enabling the Kyma input will enable a custom metrics receiver, called `kymastats`, in the Telemetry metric gateway that produces metrics for the module state and status conditions.
-The receiver configuration will follow the shown example.
+
+The receiver configuration will follow the shown example:

 ```yaml
 receivers:

From 14cc7823e1f49e765d558e4ba654af346137613e Mon Sep 17 00:00:00 2001
From: Christoph Kleineweber
Date: Tue, 30 Apr 2024 14:23:38 +0200
Subject: [PATCH 6/9] Add clarification for Kyma CRD

---
 docs/contributor/arch/011-kymastatsmetrics.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/contributor/arch/011-kymastatsmetrics.md b/docs/contributor/arch/011-kymastatsmetrics.md
index b34a9a37f..6786c4823 100644
--- a/docs/contributor/arch/011-kymastatsmetrics.md
+++ b/docs/contributor/arch/011-kymastatsmetrics.md
@@ -119,3 +119,5 @@ Status and conditions should result in the following metrics:
 | kyma.module.status.condition | reason, status, name, type | Exports condition status of all conditions under .status.conditions. Value is 1 if the condition status is 1, else 0. |

 Collecting the module specific metrics should continue working in the case of a Node or Pod failure (high availability) without emitting metrics multiple times. To ensure this behavior, Kubernetes API server [leases](https://kubernetes.io/docs/concepts/architecture/leases/) can be used while only the lease holder should emit metrics. We will investigate for a generic solution in the OpenTelemetry Collector.
+
+The status of the Kyma CR should not be exported as a metric by the described approach. The receiver should also work with individually installed modules that are not managed by the lifecycle manager. The synchronization of the Kyma CR to module CRs is considered to be out of scope for this end-user facing metrics.
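
Review note on the lease paragraph in this patch's context: the "only the lease holder emits" rule can be illustrated with a toy, in-memory model. This is a minimal sketch under stated assumptions, not the receiver's implementation. It does not talk to the Kubernetes API, and `Lease`, `try_acquire`, and `collect` are hypothetical names; the field names only loosely mirror `holderIdentity`, `renewTime`, and `leaseDurationSeconds` of a Kubernetes Lease.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Lease:
    """Toy stand-in for a Kubernetes Lease object (in-memory only)."""
    holder_identity: Optional[str] = None
    renew_time: float = 0.0
    lease_duration_seconds: float = 15.0


def try_acquire(lease: Lease, identity: str, now: float) -> bool:
    """Acquire or renew the lease; return True if `identity` holds it afterwards."""
    expired = now - lease.renew_time > lease.lease_duration_seconds
    if lease.holder_identity in (None, identity) or expired:
        lease.holder_identity = identity
        lease.renew_time = now
        return True
    return False


def collect(lease: Lease, identity: str, now: float) -> List[Tuple[str, int]]:
    """Emit metrics only while this replica holds the lease."""
    if not try_acquire(lease, identity, now):
        return []  # standby replica stays silent, so nothing is emitted twice
    return [("kyma.module.status.state", 1)]  # placeholder data point


lease = Lease()
print(collect(lease, "gateway-0", now=0.0))   # first replica takes the lease
print(collect(lease, "gateway-1", now=5.0))   # second replica is on standby
print(collect(lease, "gateway-1", now=30.0))  # lease expired, standby takes over
```

In a real cluster, acquiring and renewing would have to go through the API server so that two replicas cannot both win; the sketch skips that and only shows the emit-or-stay-silent decision.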

From f454e29cdcbd54124aa5e4824501579fb116a7fd Mon Sep 17 00:00:00 2001
From: Christoph Kleineweber
Date: Tue, 30 Apr 2024 15:34:09 +0200
Subject: [PATCH 7/9] Fixed style

---
 docs/contributor/arch/011-kymastatsmetrics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/contributor/arch/011-kymastatsmetrics.md b/docs/contributor/arch/011-kymastatsmetrics.md
index 6786c4823..349954a1a 100644
--- a/docs/contributor/arch/011-kymastatsmetrics.md
+++ b/docs/contributor/arch/011-kymastatsmetrics.md
@@ -76,7 +76,7 @@ The receiver needs the following properties:

 We assume that the status subresource of a module CRD contains a `conditions` list that uses the type [meta/v1/Condition](https://pkg.go.dev/k8s.io/apimachinery@v0.30.0/pkg/apis/meta/v1#Condition), and an overarching state attribute. We assume positive polarity for all conditions.

-As an example for this structure, see the **status** subresource of the Telemetry module:
+As an example for this structure, see the `status` subresource of the Telemetry module:

 ```yaml
 status:

From 2e141a91d7dd92a64184b64ce4db4c4eee879b69 Mon Sep 17 00:00:00 2001
From: Christoph Kleineweber
Date: Tue, 30 Apr 2024 16:55:43 +0200
Subject: [PATCH 8/9] Apply suggestions from code review

Co-authored-by: Nina Hingerl <76950046+NHingerl@users.noreply.github.com>
---
 docs/contributor/arch/011-kymastatsmetrics.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/contributor/arch/011-kymastatsmetrics.md b/docs/contributor/arch/011-kymastatsmetrics.md
index 349954a1a..26b2acfb5 100644
--- a/docs/contributor/arch/011-kymastatsmetrics.md
+++ b/docs/contributor/arch/011-kymastatsmetrics.md
@@ -27,7 +27,7 @@ Due to the restrictions of available telemetry resources for Kubernetes resource

 ### Extended MetricPipeline API

-To activate module status metrics as a new input, the MetricPipeline CRD needs a _module list_. If the module list is empty, metrics of all active modules are collected.
As in the following example, users can select the Kyma modules of interest:
+To activate module status metrics as a new input, the MetricPipeline CRD needs a _module list_. If the module list is empty, metrics of all active modules are collected. Users can select the Kyma modules of interest. In the following example, only the `telemetry` module is selected:

 ```yaml
 apiVersion: telemetry.kyma-project.io/v1alpha1
@@ -54,7 +54,7 @@ spec:

 Enabling the Kyma input will enable a custom metrics receiver, called `kymastats`, in the Telemetry metric gateway that produces metrics for the module state and status conditions.

-The receiver configuration will follow the shown example:
+See the following example for the receiver configuration:

 ```yaml
 receivers:

From 5375c94f6e71d5c8abeb7cc2303fd8347509595a Mon Sep 17 00:00:00 2001
From: Christoph Kleineweber
Date: Tue, 30 Apr 2024 16:56:59 +0200
Subject: [PATCH 9/9] Apply feedback

---
 docs/contributor/arch/011-kymastatsmetrics.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/contributor/arch/011-kymastatsmetrics.md b/docs/contributor/arch/011-kymastatsmetrics.md
index 26b2acfb5..2c778391a 100644
--- a/docs/contributor/arch/011-kymastatsmetrics.md
+++ b/docs/contributor/arch/011-kymastatsmetrics.md
@@ -54,6 +54,12 @@ spec:

 Enabling the Kyma input will enable a custom metrics receiver, called `kymastats`, in the Telemetry metric gateway that produces metrics for the module state and status conditions.

+The receiver needs the following properties:
+
+- **auth_type**: The way to authenticate with the Kubernetes API. Possible values are `serviceAccount` (default) or `kubeConfig`.
+- **collection_interval**: The interval that is used to emit metrics.
+- **api_groups**: List of API groups to scrape. Most of the Kyma modules should use the `operator.kyma-project.io` API group. The list can be extended to support monitoring custom modules.
Every CRD in the listed groups is assumed to represent a module.
+
 See the following example for the receiver configuration:

 ```yaml
@@ -66,12 +72,6 @@ receivers:
         - operator.kyma-project.io
 ```

-The receiver needs the following properties:
-
-- **auth_type**: The way to authenticate with the Kubernetes API. Possible values are `serviceAccount` (default) or `kubeConfig`.
-- **collection_interval**: The interval that is used to emit metrics.
-- **api_groups**: List of API groups to scrape. Most of the Kyma modules should use the `operator.kyma-project.io` API group. The list can be extended to support monitoring custom modules. Every CRD in the listed groups is assumed to represent a module.
-
 ### Custom Metrics Receiver for OpenTelemetry Collector

 We assume that the status subresource of a module CRD contains a `conditions` list that uses the type [meta/v1/Condition](https://pkg.go.dev/k8s.io/apimachinery@v0.30.0/pkg/apis/meta/v1#Condition), and an overarching state attribute. We assume positive polarity for all conditions.
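
Review note: under the assumptions stated in the ADR (a `conditions` list, an overarching `state`, positive polarity), the mapping from a module status to the two metrics could look roughly like the following sketch. It is an illustration only, not the receiver's code. `status_to_metrics` and the tuple-based `DataPoint` are hypothetical, and the sketch assumes that a condition counts as healthy when its status is the string `"True"`, which is how `meta/v1` conditions express it.

```python
from typing import Any, Dict, List, Tuple

# (metric name, attributes, value): a simplified stand-in for a metric data point.
DataPoint = Tuple[str, Dict[str, str], int]


def status_to_metrics(module_name: str, status: Dict[str, Any]) -> List[DataPoint]:
    """Map a module's status subresource to state and condition data points."""
    points: List[DataPoint] = []

    state = status.get("state")
    if state is not None:
        points.append((
            "kyma.module.status.state",
            {"name": module_name, "state": state},
            1 if state == "Ready" else 0,
        ))

    for cond in status.get("conditions", []):
        cond_status = cond.get("status", "Unknown")
        points.append((
            "kyma.module.status.condition",
            {
                "name": module_name,
                "type": cond.get("type", ""),
                "reason": cond.get("reason", ""),
                "status": cond_status,
            },
            # Positive polarity (assumption): only "True" counts as healthy.
            1 if cond_status == "True" else 0,
        ))
    return points


telemetry_status = {
    "state": "Ready",
    "conditions": [
        {"type": "LogComponentsHealthy", "status": "True",
         "reason": "LogComponentsRunning"},
        # Hypothetical failing condition; the reason value is made up.
        {"type": "MetricComponentsHealthy", "status": "False",
         "reason": "ExampleFailureReason"},
    ],
    # Extra attributes such as `endpoints` are ignored, as the ADR describes.
    "endpoints": {"metrics": {"grpc": "http://telemetry-otlp-metrics.kyma-system:4317"}},
}

for point in status_to_metrics("telemetry", telemetry_status):
    print(point)
```

Carrying `type` and `reason` as attributes instead of encoding them in the metric name matches the table introduced in patch 3, where the metric names became `kyma.module.status.state` and `kyma.module.status.condition`.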