Alerts
Grafana and Prometheus are mainly used to display time series as graphs. In some cases a graph is not enough, for example when a node is down. A text alert stating that the node went down, with a timestamp of when it happened, is clearer.
We are going to explore the node-down example to explain the alert mechanism.
We use Prometheus, Grafana, and the Alertmanager to report alerts.
In the context of alarm reporting each plays a different role:
The Prometheus server stores the data and creates the alerts. The examples below use the Prometheus 1.8 alerting rule syntax.
Prometheus alert rules are stored in a file; you can find it in prometheus/prometheus.rules
The basic structure of an alert is:
ALERT <alert name>
  IF <expression>
  [ FOR <duration> ]
  [ LABELS <label set> ]
  [ ANNOTATIONS <label set> ]
For the node-down example:
ALERT InstanceDown
  IF up == 0
  FOR 30s
  LABELS { severity = "1" }
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} down",
    description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 seconds.",
  }
This means that if a node does not respond for 30 seconds, Prometheus triggers an InstanceDown alert.
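The effect of the FOR clause can be sketched in a few lines of Python. This is only an illustration of the idea, not Prometheus code: the `AlertState` class, the `replay` helper, and the sample timestamps are hypothetical, and the sketch assumes the alert fires once the condition has held for at least the FOR duration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AlertState:
    """Hypothetical tracker for one alert instance; not a Prometheus type."""
    for_seconds: float
    active_since: Optional[float] = None  # time the condition first held

    def evaluate(self, now: float, condition: bool) -> str:
        """Return 'inactive', 'pending', or 'firing' for this evaluation."""
        if not condition:
            # The condition cleared, so the FOR timer resets.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        # Assumption: the alert fires once the condition has held
        # for at least the FOR duration.
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


def replay(samples: List[Tuple[float, int]], for_seconds: float) -> List[str]:
    """Replay (timestamp, up) samples through the `up == 0` condition."""
    state = AlertState(for_seconds=for_seconds)
    return [state.evaluate(ts, up == 0) for ts, up in samples]


# Made-up samples: the node goes down at t=15, recovers at t=60,
# then goes down again at t=75.
samples = [(0, 1), (15, 0), (30, 0), (45, 0), (60, 1), (75, 0)]
states = replay(samples, for_seconds=30)
print(states)
```

In this sketch a short outage that recovers before 30 seconds have passed never reaches the firing state, which is the reason for adding a FOR clause to the rule.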
Because of the way alerts are implemented in Prometheus, the Prometheus server keeps generating alerts for as long as the condition holds. These repeated alerts are hard to read in the Grafana dashboard, so we add the Alertmanager.
The Alertmanager comes from prometheus.io. We use it to limit the number of alerts we get in Grafana, but it can do much more, so make sure to learn about its other capabilities.
The Alertmanager can send notifications via various channels, such as email or Slack.
It can also apply an extra layer of logic to group and silence alerts.
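To give a feel for what grouping and silencing mean, here is a minimal Python sketch. It does not use or reproduce the Alertmanager implementation: the function names, the exact-match silence format, and the instance and job label values are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alert = Dict[str, str]    # an alert reduced to its label set
Silence = Dict[str, str]  # hypothetical silence: label name -> required value


def is_silenced(alert: Alert, silences: List[Silence]) -> bool:
    """True when every matcher of at least one non-empty silence equals the alert's label."""
    return any(
        bool(silence)
        and all(alert.get(name) == value for name, value in silence.items())
        for silence in silences
    )


def group_alerts(
    alerts: List[Alert], group_by: List[str], silences: List[Silence]
) -> Dict[Tuple[str, ...], List[Alert]]:
    """Drop silenced alerts, then bucket the rest by the values of the group_by labels."""
    groups: Dict[Tuple[str, ...], List[Alert]] = defaultdict(list)
    for alert in alerts:
        if is_silenced(alert, silences):
            continue
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


# Made-up alerts; the instance and job values are placeholders.
alerts = [
    {"alertname": "InstanceDown", "instance": "node1:9180", "job": "scylla", "severity": "1"},
    {"alertname": "InstanceDown", "instance": "node2:9180", "job": "scylla", "severity": "1"},
    {"alertname": "InstanceDown", "instance": "node3:9100", "job": "node_exporter", "severity": "1"},
]
silences = [{"instance": "node2:9180"}]  # e.g. a node under planned maintenance

groups = group_alerts(alerts, group_by=["alertname", "job"], silences=silences)
for key, members in groups.items():
    print(key, [a["instance"] for a in members])
```

Here the alert for node2 is dropped by the silence, and the remaining alerts end up in one group per job, so a reader sees one entry per group instead of one entry per repeated alert.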
For our use case, it serves as a data source for Grafana. A default configuration for the Alertmanager is included, but you should check the Alertmanager configuration guide to learn how to use it for alert reporting.
Grafana 3.0 and up support the Prometheus Alertmanager data source plugin. If you are using the Scylla container stack, the start-all command loads the plugin for you.
To see the alerts, add a table panel to your dashboard.