Bug 1997396: update alerts for resource limits #250

Merged
4 changes: 2 additions & 2 deletions docs/user/alerts.md
@@ -69,7 +69,7 @@ cluster autoscaler (default 320000 cores).
### Query
```
# for: 15m
-cluster_autoscaler_cluster_cpu_current_cores >= cluster_autoscaler_cpu_limits_cores{direction="maximum"}
+increase(cluster_autoscaler_skipped_scale_events_count{direction="up",reason="CpuResourceLimit"}[15m]) > 0
```

### Possible Causes
@@ -95,7 +95,7 @@ for the cluster autoscaler (default 6400000 gigabytes).
### Query
```
# for: 15m
-cluster_autoscaler_cluster_memory_current_bytes >= cluster_autoscaler_memory_limits_bytes{direction="maximum"}
+increase(cluster_autoscaler_skipped_scale_events_count{direction="up",reason="MemoryResourceLimit"}[15m]) > 0
```

### Possible Causes
12 changes: 6 additions & 6 deletions pkg/controller/clusterautoscaler/monitoring.go
@@ -199,31 +199,31 @@ true then the cluster autoscaler will enter an unsafe to scale state until the c
    },
    {
        Alert: "ClusterAutoscalerUnableToScaleCPULimitReached",
-       Expr: intstr.FromString("cluster_autoscaler_cluster_cpu_current_cores >= cluster_autoscaler_cpu_limits_cores{direction=\"maximum\"}"),
+       Expr: intstr.FromString("increase(cluster_autoscaler_skipped_scale_events_count{direction=\"up\",reason=\"CpuResourceLimit\"}[15m]) > 0"),
        For: "15m",
        Labels: map[string]string{
            "severity": "info",
        },
        Annotations: map[string]string{
-           "summary": "Cluster Autoscaler has reached its CPU core limit and is unable to scale out",
+           "summary": "Cluster Autoscaler has reached its maximum CPU core limit and is unable to scale out",
            "description": `The number of total cores in the cluster has exceeded the maximum number set on the
cluster autoscaler. This is calculated by summing the cpu capacity for all nodes in the cluster and comparing that number against the maximum cores value set for the
-cluster autoscaler (default 320000 cores).`,
+cluster autoscaler (default 320000 cores). Limits can be adjusted by modifying the ClusterAutoscaler resource.`,
        },
    },
    {
        Alert: "ClusterAutoscalerUnableToScaleMemoryLimitReached",
-       Expr: intstr.FromString("cluster_autoscaler_cluster_memory_current_bytes >= cluster_autoscaler_memory_limits_bytes{direction=\"maximum\"}"),
+       Expr: intstr.FromString("increase(cluster_autoscaler_skipped_scale_events_count{direction=\"up\",reason=\"MemoryResourceLimit\"}[15m]) > 0"),
        For: "15m",
        Labels: map[string]string{
            "severity": "info",
        },
        Annotations: map[string]string{
-           "summary": "Cluster Autoscaler has reached its Memory bytes limit and is unable to scale out",
+           "summary": "Cluster Autoscaler has reached its maximum Memory bytes limit and is unable to scale out",
            "description": `The number of total bytes of RAM in the cluster has exceeded the maximum number set on
the cluster autoscaler. This is calculated by summing the memory capacity for all nodes in the cluster and comparing that number against the maximum memory bytes value set
-for the cluster autoscaler (default 6400000 gigabytes).`,
+for the cluster autoscaler (default 6400000 gigabytes). Limits can be adjusted by modifying the ClusterAutoscaler resource.`,
        },
    },
    },
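
The two new alert expressions differ only in the value of the `reason` label, so they could be generated from one template. The sketch below illustrates that idea. `skippedScaleUpExpr` is a hypothetical helper that is not part of this change, and the `15m` window is an assumption (Prometheus range selectors normally take a duration with a unit).

```go
package main

import "fmt"

// skippedScaleUpExpr is a hypothetical helper (not in the operator) that
// builds the PromQL expression for a "skipped scale-up" alert.
// reason is the value of the metric's "reason" label; window is a
// Prometheus duration such as "15m" (assumed here).
func skippedScaleUpExpr(reason, window string) string {
	return fmt.Sprintf(
		"increase(cluster_autoscaler_skipped_scale_events_count{direction=%q,reason=%q}[%s]) > 0",
		"up", reason, window,
	)
}

func main() {
	// Print the expression for each resource-limit reason used in this change.
	for _, reason := range []string{"CpuResourceLimit", "MemoryResourceLimit"} {
		fmt.Println(skippedScaleUpExpr(reason, "15m"))
	}
}
```

Building both expressions from one function would keep the label names and window in a single place, at the cost of making the rule definitions slightly less greppable than literal strings.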