update alerts for resource limits
this change updates the resource limit alerts to use the new metric
introduced in kubernetes/autoscaler#5059.
elmiko committed Sep 14, 2022
1 parent fcffbcd commit 14a6c32
Showing 2 changed files with 8 additions and 8 deletions.
4 changes: 2 additions & 2 deletions docs/user/alerts.md
@@ -69,7 +69,7 @@ cluster autoscaler (default 320000 cores).
### Query
```
# for: 15m
-cluster_autoscaler_cluster_cpu_current_cores >= cluster_autoscaler_cpu_limits_cores{direction="maximum"}
+increase(cluster_autoscaler_skipped_scale_events_count{direction="up",reason="CpuResourceLimit"}[15m]) > 0
```

### Possible Causes
@@ -95,7 +95,7 @@ for the cluster autoscaler (default 6400000 gigabytes).
### Query
```
# for: 15m
-cluster_autoscaler_cluster_memory_current_bytes >= cluster_autoscaler_memory_limits_bytes{direction="maximum"}
+increase(cluster_autoscaler_skipped_scale_events_count{direction="up",reason="MemoryResourceLimit"}[15m]) > 0
```

### Possible Causes
12 changes: 6 additions & 6 deletions pkg/controller/clusterautoscaler/monitoring.go
@@ -199,31 +199,31 @@ true then the cluster autoscaler will enter an unsafe to scale state until the c
},
{
Alert: "ClusterAutoscalerUnableToScaleCPULimitReached",
-Expr: intstr.FromString("cluster_autoscaler_cluster_cpu_current_cores >= cluster_autoscaler_cpu_limits_cores{direction=\"maximum\"}"),
+Expr: intstr.FromString("increase(cluster_autoscaler_skipped_scale_events_count{direction=\"up\",reason=\"CpuResourceLimit\"}[15m]) > 0"),

For: "15m",
Labels: map[string]string{
"severity": "info",
},
Annotations: map[string]string{
-"summary": "Cluster Autoscaler has reached its CPU core limit and is unable to scale out",
+"summary": "Cluster Autoscaler has reached its maximum CPU core limit and is unable to scale out",
"description": `The number of total cores in the cluster has exceeded the maximum number set on the
cluster autoscaler. This is calculated by summing the cpu capacity for all nodes in the cluster and comparing that number against the maximum cores value set for the
-cluster autoscaler (default 320000 cores).`,
+cluster autoscaler (default 320000 cores). Limits can be adjusted by modifying the ClusterAutoscaler resource.`,
},
},
{
Alert: "ClusterAutoscalerUnableToScaleMemoryLimitReached",
-Expr: intstr.FromString("cluster_autoscaler_cluster_memory_current_bytes >= cluster_autoscaler_memory_limits_bytes{direction=\"maximum\"}"),
+Expr: intstr.FromString("increase(cluster_autoscaler_skipped_scale_events_count{direction=\"up\",reason=\"MemoryResourceLimit\"}[15m]) > 0"),
For: "15m",
Labels: map[string]string{
"severity": "info",
},
Annotations: map[string]string{
-"summary": "Cluster Autoscaler has reached its Memory bytes limit and is unable to scale out",
+"summary": "Cluster Autoscaler has reached its maximum Memory bytes limit and is unable to scale out",
"description": `The number of total bytes of RAM in the cluster has exceeded the maximum number set on
the cluster autoscaler. This is calculated by summing the memory capacity for all nodes in the cluster and comparing that number against the maximum memory bytes value set
-for the cluster autoscaler (default 6400000 gigabytes).`,
+for the cluster autoscaler (default 6400000 gigabytes). Limits can be adjusted by modifying the ClusterAutoscaler resource.`,
},
},
},
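
Both new alert expressions in this commit share one shape: an `increase()` over `cluster_autoscaler_skipped_scale_events_count`, filtered by `direction="up"` and a `reason` label. A minimal sketch of that pattern follows. The helper `skippedScaleUpExpr` is hypothetical and not part of the operator, and the `15m` window is an assumption that mirrors the rule's `For: "15m"` duration.

```go
package main

import "fmt"

// skippedScaleUpExpr is a hypothetical helper (not in the operator source).
// It builds a PromQL expression that is true when the autoscaler skipped at
// least one scale-up for the given reason during the given window.
// The metric and label names are copied from the diff above.
func skippedScaleUpExpr(reason, window string) string {
	return fmt.Sprintf(
		`increase(cluster_autoscaler_skipped_scale_events_count{direction="up",reason=%q}[%s]) > 0`,
		reason, window,
	)
}

func main() {
	// The window value is an assumption; Prometheus range selectors need a unit.
	fmt.Println(skippedScaleUpExpr("CpuResourceLimit", "15m"))
	fmt.Println(skippedScaleUpExpr("MemoryResourceLimit", "15m"))
}
```

Building both expressions from one helper keeps the CPU and memory rules from drifting apart in label names or window length.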