Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Prometheus Exporter for Pyrometer (#480)
* add ingress to pyrometer chart

  Since the beginning of tezos-k8s we have always deployed ingresses outside of our charts. Is it the right approach? Maybe not: the default helm chart created with `helm init` has a pre-configured ingress template file. This ingress template file is sufficient for our use cases: we can pass ingress annotations and TLS configuration with values.yaml.

  Therefore, we add an option to define the ingress with helm directly. This allows to remove an ingress definition from the teztnets and oxheadinfra code bases, simplifying it. We should consider doing that for Tezos RPC as well.

* fix no full name for pyrometer
* add missing service description in values.yaml
* add helpers file
* add exporter
* fix filename
* parse when json header is missing (temporary)
* exporter part
* better default pyrometer values
* add prometheus rule
* fix wrong helm
* fix enable service monitor
* fix service monitor name
* add service label for servicemonitor
* fix endpoint name "metrics"
* fix prom syntax
* do not enable service mon by def
* fix typo in arerting rule
* rules only apply to current namespace
* missing quotes
* add README based on PR descr
* per review: make ingress example more realistic
* do not templatize ingress hosts
* make example host the empty string
* remove force=true after pyrometer 0.5.2
* remove service tuning in values
1 parent ebcdd8b, commit 2921424
Showing 7 changed files with 137 additions and 2 deletions.
@@ -1,3 +1,27 @@
## Pyrometer chart

A chart to deploy the [pyrometer](https://gitlab.com/tezos-kiln/pyrometer) Tezos monitoring tool.

Pass a complete pyrometer configuration under the `config` key in values.yaml; it is handed to pyrometer unchanged.

### Prometheus exporter

Pyrometer is a self-contained tool that manages its own alerts and alerting channels.

Quoting the pyrometer [architecture doc](https://gitlab.com/tezos-kiln/pyrometer/-/blob/main/doc/monitoring.md):

> Primary installation target for initial monitoring implementation is a
> personal computer. Consequently, implementation should prioritize
> simplicity when it comes to number of individual, isolated components,
> processes, their runtime dependencies,
> administration/configuration.

The Prometheus exporter for Pyrometer consumes pyrometer events through a webhook and tracks only one kind of event: baker health status. It counts the bakers that are currently unhealthy and exposes that count as a Prometheus metric.

The chart also includes a ServiceMonitor and a PrometheusRule.

This gives you:

* the concept of an active alert that can be fed into an incident management system such as PagerDuty.
* the ability to monitor a baker baking for several addresses, where it is not desirable to alert on a single unhealthy address, but only when all the configured bakers are unhealthy. The threshold is configurable in the chart.
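
The second point can be illustrated with a small sketch. It assumes a hypothetical baker baking for three addresses and mirrors the `>=` comparison that the chart's alert rule applies to the aggregated count. The function name `should_alert` and the placeholder addresses are illustrative only and are not part of the chart.

```python
def should_alert(unhealthy_bakers: set, threshold: int) -> bool:
    """Mirror the `>=` comparison used by the alert rule (illustrative only)."""
    return len(unhealthy_bakers) >= threshold


# Hypothetical setup: one baker process baking for three addresses.
configured = {"tz1-example-a", "tz1-example-b", "tz1-example-c"}

# Alert only when every configured address is unhealthy.
threshold = len(configured)

one_down = {"tz1-example-a"}
all_down = set(configured)

print(should_alert(one_down, threshold))  # a single unhealthy address stays quiet
print(should_alert(all_down, threshold))  # all addresses unhealthy triggers the alert
```

Setting the threshold to 1 instead would alert as soon as any single address becomes unhealthy.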
@@ -0,0 +1,39 @@
#!/usr/bin/env python
from flask import Flask, request
import logging

# Silence werkzeug's per-request access log; only errors are reported.
log = logging.getLogger('werkzeug')
log.setLevel(logging.ERROR)

application = Flask(__name__)

# Bakers that pyrometer currently reports as unhealthy.
unhealthy_bakers = set()

@application.route('/pyrometer_webhook', methods=['POST'])
def pyrometer_webhook():
    '''
    Receive all events from pyrometer
    '''
    for msg in request.get_json():
        if msg["kind"] == "baker_unhealthy":
            print(f"Baker {msg['baker']} is unhealthy")
            unhealthy_bakers.add(msg["baker"])
        elif msg["kind"] == "baker_recovered":
            print(f"Baker {msg['baker']} recovered")
            # discard() instead of remove(): a recovery event for a baker we
            # never recorded (e.g. after an exporter restart) must not raise.
            unhealthy_bakers.discard(msg["baker"])

    return "Webhook received"

@application.route('/metrics', methods=['GET'])
def prometheus_metrics():
    '''
    Prometheus endpoint
    '''
    return f'''# total number of monitored bakers that are currently unhealthy
pyrometer_unhealthy_bakers_total {len(unhealthy_bakers)}
'''

if __name__ == "__main__":
    application.run(host="0.0.0.0", port=31732, debug=False)
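
To follow the exporter's bookkeeping without running Flask, the sketch below replays a hypothetical webhook payload through the same set-based logic and renders the metric line. The payload shape (a JSON list of objects with `kind` and `baker` keys) is taken from the handler above. The helper names `apply_events` and `render_metrics` and the baker addresses are assumptions made for illustration, and real pyrometer events may carry additional fields.

```python
import json


def apply_events(unhealthy: set, events: list) -> set:
    """Update the set of unhealthy bakers from a batch of pyrometer events."""
    for msg in events:
        kind = msg.get("kind")
        if kind == "baker_unhealthy":
            unhealthy.add(msg["baker"])
        elif kind == "baker_recovered":
            # discard() ignores bakers that were never recorded as unhealthy
            unhealthy.discard(msg["baker"])
    return unhealthy


def render_metrics(unhealthy: set) -> str:
    """Render the count as a single line of Prometheus text exposition."""
    return f"pyrometer_unhealthy_bakers_total {len(unhealthy)}\n"


# Hypothetical payload, shaped like the list the webhook handler iterates over.
payload = json.loads("""[
  {"kind": "baker_unhealthy", "baker": "tz1-example-a"},
  {"kind": "baker_unhealthy", "baker": "tz1-example-b"},
  {"kind": "baker_recovered", "baker": "tz1-example-a"},
  {"kind": "baker_recovered", "baker": "tz1-never-seen"}
]""")

state = apply_events(set(), payload)
print(render_metrics(state), end="")  # prints: pyrometer_unhealthy_bakers_total 1
```

Note that the state lives in memory only, so a restart of the exporter resets the count to zero until pyrometer reports the next event.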
@@ -0,0 +1,20 @@
{{- if .Values.prometheusRule.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    {{- toYaml .Values.prometheusRule.labels | nindent 4 }}
  name: baker-external-monitor-alerter
spec:
  groups:
  - name: pyrometer.rules
    rules:
    - alert: BakersUnhealthy
      annotations:
        description: '{{ .Values.prometheusRule.numberOfUnhealthyBakersAlertThreshold }} or more unhealthy bakers'
        summary: "{{ .Values.prometheusRule.numberOfUnhealthyBakersAlertThreshold }} or more unhealthy Tezos bakers according to Pyrometer external monitoring"
      expr: pyrometer_unhealthy_bakers_total{namespace="{{ .Release.Namespace }}"} >= {{ .Values.prometheusRule.numberOfUnhealthyBakersAlertThreshold }}
      for: 1m
      labels:
        severity: critical
{{- end }}
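
As a rough mental model of how the `for: 1m` clause in this rule behaves, the sketch below replays a series of sampled metric values and reports when the alert would be firing. It is a simplified simulation, not Prometheus's evaluation engine: the 15-second evaluation step, the threshold of 2, and the function name `firing_times` are assumptions chosen for illustration.

```python
def firing_times(samples, threshold, for_seconds, step_seconds):
    """Return the timestamps (in seconds) at which the alert is firing.

    samples: metric values, one per evaluation, `step_seconds` apart.
    The alert fires once the condition has held continuously for at
    least `for_seconds`; before that it is only pending.
    """
    firing = []
    pending_since = None
    for i, value in enumerate(samples):
        now = i * step_seconds
        if value >= threshold:
            if pending_since is None:
                pending_since = now  # condition just became true
            if now - pending_since >= for_seconds:
                firing.append(now)
        else:
            pending_since = None  # any false evaluation resets the timer
    return firing


# Unhealthy for three evaluations, a brief recovery, then unhealthy again.
samples = [2, 2, 2, 0, 2, 2, 2, 2, 2, 2]
print(firing_times(samples, threshold=2, for_seconds=60, step_seconds=15))
# prints: [120, 135]
```

The brief recovery at the fourth sample resets the pending timer, which is why short flaps shorter than one minute never reach the firing state in this model.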
@@ -0,0 +1,17 @@
{{- if .Values.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: pyrometer
  name: pyrometer-service-monitor
  namespace: {{ .Release.Namespace }}
spec:
  endpoints:
  - interval: 15s
    port: metrics
    path: /metrics
  selector:
    matchLabels:
      app: pyrometer
{{- end }}
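
The `selector.matchLabels` stanza means the ServiceMonitor picks up a Service only if its labels include every listed key/value pair, which is why the Service needs the `app: pyrometer` label. A minimal sketch of that matching rule follows. The `matches` helper and the sample label sets are hypothetical, and real target selection also depends on namespace scoping and on how the Prometheus Operator is configured.

```python
def matches(match_labels: dict, service_labels: dict) -> bool:
    """True if every selector pair is present in the service's labels."""
    return all(service_labels.get(k) == v for k, v in match_labels.items())


selector = {"app": "pyrometer"}

# Hypothetical Service label sets.
pyrometer_svc = {"app": "pyrometer", "release": "example"}
other_svc = {"app": "tezos-node"}
unlabeled_svc = {}

for name, labels in [("pyrometer", pyrometer_svc),
                     ("other", other_svc),
                     ("unlabeled", unlabeled_svc)]:
    print(name, matches(selector, labels))
```

Extra labels on the Service do not prevent a match; only a missing or different value for a selected key does.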