manifests: Remove etcd gRPC calls failed alerts
These alerts fire constantly because of issues with how etcd increments
its metrics. See etcd-io/etcd#10289
brancz committed Apr 30, 2019
1 parent 44b4406 commit 400f412
Showing 3 changed files with 14 additions and 26 deletions.
24 changes: 0 additions & 24 deletions assets/prometheus-k8s/rules.yaml
@@ -1163,30 +1163,6 @@ spec:
       for: 15m
       labels:
         severity: warning
-    - alert: etcdHighNumberOfFailedGRPCRequests
-      annotations:
-        message: 'etcd cluster "{{ $labels.job }}": {{ $value }}% of requests for
-          {{ $labels.grpc_method }} failed on etcd instance {{ $labels.instance }}.'
-      expr: |
-        100 * sum(rate(grpc_server_handled_total{job=~".*etcd.*", grpc_code!="OK"}[5m])) BY (job, instance, grpc_service, grpc_method)
-        /
-        sum(rate(grpc_server_handled_total{job=~".*etcd.*"}[5m])) BY (job, instance, grpc_service, grpc_method)
-        > 1
-      for: 10m
-      labels:
-        severity: warning
-    - alert: etcdHighNumberOfFailedGRPCRequests
-      annotations:
-        message: 'etcd cluster "{{ $labels.job }}": {{ $value }}% of requests for
-          {{ $labels.grpc_method }} failed on etcd instance {{ $labels.instance }}.'
-      expr: |
-        100 * sum(rate(grpc_server_handled_total{job=~".*etcd.*", grpc_code!="OK"}[5m])) BY (job, instance, grpc_service, grpc_method)
-        /
-        sum(rate(grpc_server_handled_total{job=~".*etcd.*"}[5m])) BY (job, instance, grpc_service, grpc_method)
-        > 5
-      for: 5m
-      labels:
-        severity: critical
     - alert: etcdGRPCRequestsSlow
       annotations:
         message: 'etcd cluster "{{ $labels.job }}": gRPC requests to {{ $labels.grpc_method
12 changes: 12 additions & 0 deletions jsonnet/main.jsonnet
@@ -28,6 +28,18 @@ local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') +
       },
     },
   },
+} + {
+  prometheusAlerts+:: {
+    groups:
+      std.map(
+        function(ruleGroup)
+          if ruleGroup.name == 'etcd' then
+            ruleGroup { rules: std.filter(function(rule) !('alert' in rule && rule.alert == 'etcdHighNumberOfFailedGRPCRequests'), ruleGroup.rules) }
+          else
+            ruleGroup,
+        super.groups,
+      ),
+  },
 } +
 (import 'telemeter-client/client.libsonnet') +
 {
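
The jsonnet change above maps over the rule groups and, only in the group named 'etcd', filters out every rule whose alert field is etcdHighNumberOfFailedGRPCRequests. Rules with no alert field (recording rules, for instance) and all other groups pass through unchanged. The same logic can be sketched in Python for readers less familiar with jsonnet. This is an illustration only, not part of the commit: the function name drop_alert and the sample groups are hypothetical stand-ins, and the real rule groups come from kube-prometheus.

```python
# Illustrative sketch of the jsonnet filter; not part of the commit.
REMOVED_ALERT = "etcdHighNumberOfFailedGRPCRequests"


def drop_alert(groups, group_name="etcd", alert_name=REMOVED_ALERT):
    """Return a new list of rule groups with alert_name removed from group_name.

    Mirrors std.map over super.groups with std.filter on the matching group.
    """
    result = []
    for group in groups:
        if group.get("name") == group_name:
            # Keep a rule unless it has an 'alert' key equal to alert_name;
            # rules without an 'alert' key are always kept.
            kept = [r for r in group["rules"] if r.get("alert") != alert_name]
            result.append({**group, "rules": kept})
        else:
            result.append(group)
    return result


# Hypothetical sample data; expressions are placeholders.
groups = [
    {
        "name": "etcd",
        "rules": [
            {"alert": "etcdHighNumberOfFailedGRPCRequests", "expr": "placeholder"},
            {"alert": "etcdGRPCRequestsSlow", "expr": "placeholder"},
            {"record": "example:recording:rule", "expr": "placeholder"},
        ],
    },
    {"name": "other", "rules": [{"alert": "SomeOtherAlert", "expr": "placeholder"}]},
]

filtered = drop_alert(groups)
for group in filtered:
    print(group["name"], [r.get("alert", r.get("record")) for r in group["rules"]])
```

Because both the warning and the critical rule share the same alert name, a single name-based filter removes both of them, which matches the 24 lines deleted from the generated rules.yaml.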