diff --git a/.spelling b/.spelling
index df9fc644c217..f9c44d655bcf 100644
--- a/.spelling
+++ b/.spelling
@@ -205,6 +205,7 @@ sandboxed
 shortcodes
 stateful
 stderr
+temporality
 triaged
 un-reconciled
 v1
diff --git a/docs/metrics-3.6.md b/docs/metrics-3.6.md
deleted file mode 100644
index 65850364ec4e..000000000000
--- a/docs/metrics-3.6.md
+++ /dev/null
@@ -1,58 +0,0 @@
-# Metrics upgrade notes
-
-Metrics have changed in 3.6.
-
-You can now retrieve metrics using the OpenTelemetry Protocol using the [OpenTelemetry collector](https://opentelemetry.io/docs/collector/), and this is the recommended mechanism.
-
-These notes explain the differences in using the Prometheus `/metrics` endpoint to scrape metrics for a minimal effort upgrade. It is not recommended you follow this guide blindly, the new metrics have been introduced because they add value, and so they should be worth collecting and using.
-
-## New metrics
-
-The following are new metrics:
-
-* `build_info`
-* `controller_build_info`
-* `cronworkflows_triggered_total`
-* `k8s_request_duration`
-* `pods_total_count`
-* `pod_pending_count`
-* `queue_duration`
-* `queue_longest_running`
-* `queue_retries`
-* `queue_unfinished_work`
-* `total_count`
-* `workflowtemplate_runtime`
-* `workflowtemplate_triggered_total`
-
-and can be disabled with
-
-```yaml
-metricsConfig: |
-  modifiers:
-    build_info:
-      disable: true
-...
-```
-
-## Renamed metrics
-
-If you are using these metrics in your recording rules, dashboards, or alerts, you will need to update their names after the upgrade:
-
-| Old name                           | New name                           |
-|------------------------------------|------------------------------------|
-| `argo_workflows_count`             | `argo_workflows_gauge`             |
-| `argo_workflows_pods_count`        | `argo_workflows_pods_gauge`        |
-| `argo_workflows_queue_depth_count` | `argo_workflows_queue_depth_gauge` |
-| `log_messages`                     | `argo_workflows_log_messages`      |
-
-## Custom metrics
-
-Custom metric names and labels must be valid Prometheus and OpenTelemetry names now. This prevents the use of `:`, which was usable in earlier versions of workflows
-
-Custom metrics, as defined by a workflow, could be defined as one type (say counter) in one workflow, and then as a histogram of the same name in a different workflow. This would work in 3.5 if the first usage of the metric had reached TTL and been deleted. This will no-longer work in 3.6, and custom metrics may not be redefined. It doesn't really make sense to change a metric in this way, and the OpenTelemetry SDK prevents you from doing so.
-
-## TLS
-
-The Prometheus `/metrics` endpoint now has TLS enabled by default.
-
-To disable this set `metricsConfig.secure` to `false`.
diff --git a/docs/metrics.md b/docs/metrics.md
index 427865e68e55..d71e4a55ac4a 100644
--- a/docs/metrics.md
+++ b/docs/metrics.md
@@ -3,7 +3,7 @@
 > v2.7 and after
 
 !!! Metrics changes in 3.6
-    Please read [this](metrics-3.6.md) short guide on what you must consider when upgrading to 3.6.
+    Please read [this short guide](upgrading.md#metrics_changes) on what you must consider when upgrading to 3.6.
 
 ## Introduction
 
diff --git a/docs/upgrading.md b/docs/upgrading.md
index 6ac60775eefb..af02a9dddb45 100644
--- a/docs/upgrading.md
+++ b/docs/upgrading.md
@@ -15,6 +15,64 @@ Previously it was `--basehref` (no dash in between) and `ARGO_BASEHREF` (no unde
 `ALLOWED_LINK_PROTOCOL` and `BASE_HREF` have been removed as redundant.
 Use `ARGO_ALLOWED_LINK_PROTOCOL` and `ARGO_BASE_HREF` instead.
 
+### Metrics changes
+
+You can now retrieve metrics over the OpenTelemetry Protocol using the [OpenTelemetry collector](https://opentelemetry.io/docs/collector/), and this is the recommended mechanism.
+
+These notes explain the differences in using the Prometheus `/metrics` endpoint to scrape metrics for a minimal-effort upgrade. Do not follow this guide blindly: the new metrics were introduced because they add value, so they should be worth collecting and using.
+
+#### New metrics
+
+The following are new metrics:
+
+* `build_info`
+* `controller_build_info`
+* `cronworkflows_triggered_total`
+* `k8s_request_duration`
+* `leader`
+* `pods_total_count`
+* `pod_pending_count`
+* `queue_duration`
+* `queue_longest_running`
+* `queue_retries`
+* `queue_unfinished_work`
+* `total_count`
+* `workflowtemplate_runtime`
+* `workflowtemplate_triggered_total`
+
+They can be disabled with:
+
+```yaml
+metricsConfig: |
+  modifiers:
+    build_info:
+      disable: true
+...
+```
+
+#### Renamed metrics
+
+If you are using these metrics in your recording rules, dashboards, or alerts, you will need to update their names after the upgrade:
+
+| Old name                           | New name                           |
+|------------------------------------|------------------------------------|
+| `argo_workflows_count`             | `argo_workflows_gauge`             |
+| `argo_workflows_pods_count`        | `argo_workflows_pods_gauge`        |
+| `argo_workflows_queue_depth_count` | `argo_workflows_queue_depth_gauge` |
+| `log_messages`                     | `argo_workflows_log_messages`      |
+
+#### Custom metrics
+
+Custom metric names and labels must now be valid Prometheus and OpenTelemetry names. This prevents the use of `:`, which was usable in earlier versions of Workflows.
+
+Custom metrics, as defined by a workflow, could be defined as one type (say counter) in one workflow, and then as a histogram of the same name in a different workflow. This would work in 3.5 if the first usage of the metric had reached TTL and been deleted. This will no longer work in 3.6, and custom metrics may not be redefined. It doesn't really make sense to change a metric in this way, and the OpenTelemetry SDK prevents you from doing so.
+
+#### TLS
+
+The Prometheus `/metrics` endpoint now has TLS enabled by default.
+
+To disable this, set `metricsConfig.secure` to `false`.
+
 ## Upgrading to v3.5
 
 There are no known breaking changes in this release. Please file an issue if you encounter any unexpected problems after upgrading.
diff --git a/mkdocs.yml b/mkdocs.yml
index f43533c6a47d..3403c489f3d3 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -248,7 +248,6 @@ nav:
       - offloading-large-workflows.md
       - workflow-archive.md
       - metrics.md
-      - metrics-3.6.md
       - workflow-executors.md
       - workflow-restrictions.md
       - sidecar-injection.md