From 0470adb08ab4ab12b58bfe4ab9d9715611d95034 Mon Sep 17 00:00:00 2001
From: beorn7
Date: Mon, 26 Jul 2021 19:08:00 +0200
Subject: [PATCH] Make CortexIngesterReachingSeriesLimit warning less sensitive

As it turns out, during normal shuffle-sharding operation, the 70% mark
is often exceeded, but not by much. This change therefore keeps the 70%
threshold and instead increases the `for` duration from 5m to 3h. The
expected reaction time for warning alerts is usually on the order of
hours, so we might as well wait longer to see if the problem is
transient.

Signed-off-by: beorn7
---
 CHANGELOG.md                         | 1 +
 cortex-mixin/alerts/alerts.libsonnet | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 86c6aed2..8f355387 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -29,6 +29,7 @@
 * [ENHANCEMENT] cortex-mixin: Added `alert_excluded_routes` config to exclude specific routes from alerts. #338
 * [ENHANCEMENT] Added `CortexMemcachedRequestErrors` alert. #346
 * [ENHANCEMENT] Ruler dashboard: added "Per route p99 latency" panel in the "Configuration API" row. #353
+* [ENHANCEMENT] Increased the `for` duration of the `CortexIngesterReachingSeriesLimit` warning alert from 5m to 3h. #362
 * [BUGFIX] Fixed `CortexIngesterHasNotShippedBlocks` alert false positive in case an ingester instance had ingested samples in the past, then no traffic was received for a long period and then it started receiving samples again. #308
 * [BUGFIX] Alertmanager: fixed `--alertmanager.cluster.peers` CLI flag passed to alertmanager when HA is enabled. #329
 * [BUGFIX] Fixed `CortexInconsistentRuntimeConfig` metric. #335
diff --git a/cortex-mixin/alerts/alerts.libsonnet b/cortex-mixin/alerts/alerts.libsonnet
index 9eefe7f8..203623ec 100644
--- a/cortex-mixin/alerts/alerts.libsonnet
+++ b/cortex-mixin/alerts/alerts.libsonnet
@@ -257,7 +257,7 @@
                 (cortex_ingester_instance_limits{limit="max_series"} > 0)
             ) > 0.7
           |||,
-          'for': '5m',
+          'for': '3h',
           labels: {
             severity: 'warning',
           },