Merge pull request #408 from treid314/distributor-inflight-push-alerts
add rule for critical distributor inflight push request alert
Tyler Reid authored Oct 22, 2021
2 parents b8901a9 + 1d0d032 commit 92f3b64
Showing 3 changed files with 44 additions and 0 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -70,6 +70,7 @@
* [ENHANCEMENT] Use configured `ruler` jobname for ruler dashboard panels. #409
* [ENHANCEMENT] Add ability to override `datasource` for generated dashboards. #407
* [ENHANCEMENT] Use alertmanager jobname for alertmanager dashboard panels #411
* [ENHANCEMENT] Added `CortexDistributorReachingInflightPushRequestLimit` alert. #408
* [BUGFIX] Fixed `CortexIngesterHasNotShippedBlocks` alert false positive in case an ingester instance had ingested samples in the past, then no traffic was received for a long period and then it started receiving samples again. #308
* [BUGFIX] Alertmanager: fixed `--alertmanager.cluster.peers` CLI flag passed to alertmanager when HA is enabled. #329
* [BUGFIX] Fixed `CortexInconsistentRuntimeConfig` metric. #335
19 changes: 19 additions & 0 deletions cortex-mixin/alerts/alerts.libsonnet
@@ -350,6 +350,25 @@
          |||,
        },
      },
      {
        alert: 'CortexDistributorReachingInflightPushRequestLimit',
        expr: |||
          (
            (cortex_distributor_inflight_push_requests / ignoring(limit) cortex_distributor_instance_limits{limit="max_inflight_push_requests"})
            and ignoring (limit)
            (cortex_distributor_instance_limits{limit="max_inflight_push_requests"} > 0)
          ) > 0.8
        |||,
        'for': '5m',
        labels: {
          severity: 'critical',
        },
        annotations: {
          message: |||
            Distributor {{ $labels.job }}/{{ $labels.instance }} has reached {{ $value | humanizePercentage }} of its inflight push request limit.
          |||,
        },
      },
    ],
  },
  {
24 changes: 24 additions & 0 deletions cortex-mixin/docs/playbooks.md
@@ -108,6 +108,30 @@ How to **fix**:
1. Ensure shuffle-sharding is enabled in the Cortex cluster
1. Assuming shuffle-sharding is enabled, scaling up ingesters will lower the number of tenants per ingester. However, the effect of this change will be visible only after `-blocks-storage.tsdb.close-idle-tsdb-timeout` period so you may have to temporarily increase the limit
### CortexDistributorReachingInflightPushRequestLimit
This alert fires when the per-distributor instance limit on inflight push requests (`max_inflight_push_requests`) is enabled and the actual number of inflight push requests, reported by `cortex_distributor_inflight_push_requests`, has stayed above 80% of the set limit for 5 minutes. Once the limit is reached, new push requests to the distributor will fail (5xx), while existing inflight push requests will continue to succeed.
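As a rough illustration of the condition the alert expression checks for a single distributor, the following Python sketch mirrors its logic: the limit must be enabled (greater than 0) and the ratio of inflight push requests to the limit must exceed 0.8. The function name and parameters are hypothetical and exist only for this example; the actual rule is the PromQL expression in `alerts.libsonnet`, which also requires the condition to hold for 5 minutes.

```python
def reaching_inflight_limit(inflight: float, limit: float, threshold: float = 0.8) -> bool:
    """Return True when inflight requests exceed `threshold` of an enabled limit.

    Hypothetical helper for illustration only; it is not part of Cortex.
    """
    # A limit of 0 (or less) means the limit is disabled, so the alert never fires.
    if limit <= 0:
        return False
    # Mirrors `(inflight / limit) > 0.8` from the alert expression.
    return inflight / limit > threshold


if __name__ == "__main__":
    print(reaching_inflight_limit(1700, 2000))  # 85% of the limit
    print(reaching_inflight_limit(1500, 2000))  # 75% of the limit
    print(reaching_inflight_limit(100, 0))      # limit disabled
```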
In case of **emergency**:
- If the actual number of inflight push requests is very close to or already at the set limit, then you can increase the limit via CLI flag or config to gain some time
- Increasing the limit will increase the number of inflight push requests which will increase distributors' memory utilization. Please monitor the distributors' memory utilization via the `Cortex / Writes Resources` dashboard
How the limit is **configured**:
- The limit can be configured either by the CLI flag (`-distributor.instance-limits.max-inflight-push-requests`) or in the config:
```
distributor:
  instance_limits:
    max_inflight_push_requests: <int>
```
- These changes are applied with a distributor restart.
- The configured limit can be queried via `cortex_distributor_instance_limits{limit="max_inflight_push_requests"}`
How to **fix**:
1. **Temporarily increase the limit**<br />
Do this if the actual number of inflight push requests is very close to the limit or has already hit it.
2. **Scale up distributors**<br />
Scaling up distributors will lower the number of inflight push requests per distributor.
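
To get a rough sense of how far to scale up, the arithmetic can be sketched in Python. This is only an estimate under a loud assumption: that inflight push requests spread evenly across distributors, which may not hold in practice. The function name, its parameters, and the sample numbers are hypothetical; the 0.8 target matches the alert threshold.

```python
import math


def distributors_needed(total_inflight: float, limit_per_instance: int,
                        target_ratio: float = 0.8) -> int:
    """Estimate how many distributors keep each one at or below `target_ratio`.

    Assumes inflight push requests are spread evenly across distributors
    (an assumption made for this sketch, not a Cortex guarantee).
    """
    if limit_per_instance <= 0:
        raise ValueError("the per-instance limit must be enabled (> 0)")
    # Each distributor may carry at most limit * target_ratio inflight requests.
    capacity_per_instance = limit_per_instance * target_ratio
    return math.ceil(total_inflight / capacity_per_instance)


if __name__ == "__main__":
    # Hypothetical numbers: 9000 inflight requests in total, limit of 2000 each.
    print(distributors_needed(9000, 2000))
```

The result is a lower bound on the replica count; leaving extra headroom is reasonable because load is rarely perfectly balanced.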
### CortexRequestLatency
This alert fires when a specific Cortex route is experiencing high latency.