Prometheus Exporter for Pyrometer (#480)
* add ingress to pyrometer chart

Since the beginning of tezos-k8s we have always deployed ingresses
outside of our charts.

Is it the right approach? Maybe not: the default helm chart created with
`helm create` has a pre-configured ingress template file.

This ingress template file is sufficient for our use cases: we can pass
ingress annotations and TLS configuration with values.yaml.

Therefore, we add an option to define the ingress with helm directly.
This allows us to remove an ingress definition from the teztnets and
oxheadinfra code bases, simplifying them.

We should consider doing that for Tezos RPC as well.

* fix no full name for pyrometer

* add missing service description in values.yaml

* add helpers file

* add exporter

* fix filename

* parse when json header is missing (temporary)

* exporter part

* better default pyrometer values

* add prometheus rule

* fix wrong helm

* fix enable service monitor

* fix service monitor name

* add service label for servicemonitor

* fix endpoint name "metrics"

* fix prom syntax

* do not enable service mon by def

* fix typo in alerting rule

* rules only apply to current namespace

* missing quotes

* add README based on PR descr

* per review: make ingress example more realistic

* do not templatize ingress hosts

* make example host the empty string

* remove force=true after pyrometer 0.5.2

* remove service tuning in values
nicolasochem authored Aug 30, 2022
1 parent ebcdd8b commit 2921424
Showing 7 changed files with 137 additions and 2 deletions.
24 changes: 24 additions & 0 deletions charts/pyrometer/README.md
@@ -1,3 +1,27 @@
## Pyrometer chart

A chart to deploy the [pyrometer](https://gitlab.com/tezos-kiln/pyrometer) Tezos monitoring tool.

Pass a complete pyrometer configuration under the `config` key in `values.yaml`; it is applied to pyrometer as-is.
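The removed `values.yaml` comment in this commit notes that the `config` mapping is converted to TOML, pyrometer's configuration format. The conversion itself lives in chart templates not shown here; purely as an illustration of what the default config turns into, here is a minimal emitter sketch. `to_toml` is hypothetical (not part of the chart) and only handles the shapes the default config uses: top-level tables holding strings, booleans, numbers and lists.

```python
import json

def _toml_value(value):
    """Render a scalar or list as a TOML value (small subset only)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        # JSON string escaping is valid inside TOML basic strings
        return json.dumps(value)
    if isinstance(value, list):
        return "[" + ", ".join(_toml_value(v) for v in value) + "]"
    raise TypeError(f"unsupported value: {value!r}")

def to_toml(config):
    """Hypothetical helper: top-level dicts become [tables], other keys become key = value."""
    lines = []
    # In TOML, top-level scalars must appear before the first [table] header
    for key, value in config.items():
        if not isinstance(value, dict):
            lines.append(f"{key} = {_toml_value(value)}")
    for key, value in config.items():
        if isinstance(value, dict):
            if lines:
                lines.append("")
            lines.append(f"[{key}]")
            for sub_key, sub_value in value.items():
                lines.append(f"{sub_key} = {_toml_value(sub_value)}")
    return "\n".join(lines) + "\n"

# Default `config` from values.yaml in this commit
default_config = {
    "node_monitor": {"nodes": ["http://tezos-node-rpc:8732"]},
    "webhook": {"enabled": True, "url": "http://127.0.0.1:31732/pyrometer_webhook"},
}
print(to_toml(default_config))
```

Nested tables deeper than one level raise `TypeError` here; the real chart may handle them differently.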

### Prometheus exporter

Pyrometer is a self-sustaining tool that manages its own alerts and alerting channels.

Quoting the pyrometer [architecture doc](https://gitlab.com/tezos-kiln/pyrometer/-/blob/main/doc/monitoring.md):

> Primary installation target for initial monitoring implementation is a
personal computer. Consequently, implementation should prioritize
simplicity when it comes to number of individual, isolated components,
processes, their runtime dependencies,
administration/configuration.


The Prometheus exporter for Pyrometer receives pyrometer events through a webhook and tracks only one kind of event: baker health status. It counts the bakers that are currently unhealthy and exposes that count as a Prometheus metric, `pyrometer_unhealthy_bakers_total`.
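The aggregation can be sketched as two pure functions, which makes it easy to reason about or unit-test outside Flask. The event shape (a JSON list of objects with `kind` and `baker` fields, where `kind` is `baker_unhealthy` or `baker_recovered`) is inferred from the exporter script in this commit; other kinds are ignored. `apply_events`, `render_metrics` and the baker addresses are hypothetical, and unlike the exporter this sketch also emits the standard `# HELP`/`# TYPE` lines.

```python
def apply_events(unhealthy, events):
    """Update the set of unhealthy baker addresses from one webhook batch."""
    for event in events:
        kind = event.get("kind")
        if kind == "baker_unhealthy":
            unhealthy.add(event["baker"])
        elif kind == "baker_recovered":
            # discard() tolerates a recovery for a baker never seen as unhealthy
            unhealthy.discard(event["baker"])
        # any other event kind is ignored
    return unhealthy

def render_metrics(unhealthy):
    """Prometheus text exposition of the current unhealthy-baker count."""
    return (
        "# HELP pyrometer_unhealthy_bakers_total Number of monitored bakers currently unhealthy\n"
        "# TYPE pyrometer_unhealthy_bakers_total gauge\n"
        f"pyrometer_unhealthy_bakers_total {len(unhealthy)}\n"
    )

state = set()
apply_events(state, [
    {"kind": "baker_unhealthy", "baker": "tz1-example-a"},  # hypothetical addresses
    {"kind": "baker_unhealthy", "baker": "tz1-example-b"},
])
apply_events(state, [{"kind": "baker_recovered", "baker": "tz1-example-a"}])
print(render_metrics(state))  # last line: pyrometer_unhealthy_bakers_total 1
```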

The chart also includes a ServiceMonitor and a PrometheusRule. Both are disabled by default and require the Prometheus Operator in your cluster.

This gives you:

* the concept of an active alert that can be fed into an incident management system such as PagerDuty.
* the ability to monitor a baker that bakes for several addresses, where you do not want an alert for an individual unhealthy address, but only when all the configured bakers are unhealthy. The threshold is configurable in the chart with `prometheusRule.numberOfUnhealthyBakersAlertThreshold`.
39 changes: 39 additions & 0 deletions charts/pyrometer/scripts/pyrometer_exporter.py
@@ -0,0 +1,39 @@
#!/usr/bin/env python
import logging

from flask import Flask, request

# Silence werkzeug's per-request access log; only errors are printed
log = logging.getLogger('werkzeug')
log.setLevel(logging.ERROR)

application = Flask(__name__)

# Addresses of bakers currently reported unhealthy by pyrometer.
# Kept in memory only, so it starts empty again whenever the container restarts.
unhealthy_bakers = set()

@application.route('/pyrometer_webhook', methods=['POST'])
def pyrometer_webhook():
    '''
    Receive all events from pyrometer
    '''
    for msg in request.get_json():
        if msg["kind"] == "baker_unhealthy":
            print(f"Baker {msg['baker']} is unhealthy")
            unhealthy_bakers.add(msg["baker"])
        elif msg["kind"] == "baker_recovered":
            print(f"Baker {msg['baker']} recovered")
            # discard() instead of remove(): after a restart, a recovery can arrive
            # for a baker this process never saw go unhealthy
            unhealthy_bakers.discard(msg["baker"])

    return "Webhook received"

@application.route('/metrics', methods=['GET'])
def prometheus_metrics():
    '''
    Prometheus endpoint
    '''
    body = f'''# total number of monitored bakers that are currently unhealthy
pyrometer_unhealthy_bakers_total {len(unhealthy_bakers)}
'''
    # Serve the Prometheus text exposition content type instead of Flask's text/html default
    return body, 200, {"Content-Type": "text/plain; version=0.0.4; charset=utf-8"}

if __name__ == "__main__":
    application.run(host="0.0.0.0", port=31732, debug=False)
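To exercise the exporter by hand (for example through `kubectl port-forward` to port 31732), a stdlib-only sketch like the one below can post a fake event and read the metric back. It assumes the exporter is reachable at the base URL you pass and that the webhook accepts a JSON list of `{kind, baker}` objects, as the script above expects; `post_events`, `read_gauge`, `smoke_test` and the baker address are hypothetical. Run it against a test release, since the fake baker briefly counts as unhealthy.

```python
import json
import urllib.request

def post_events(base_url, events):
    """POST a batch of pyrometer-style events to the exporter's webhook."""
    req = urllib.request.Request(
        f"{base_url}/pyrometer_webhook",
        data=json.dumps(events).encode("utf-8"),
        # The exporter calls request.get_json(), which needs a JSON content type
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode("utf-8")

def read_gauge(exposition, name):
    """Return the value of an unlabelled metric from Prometheus text format, or None."""
    for line in exposition.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if parts[0] == name:
            return float(parts[1])
    return None

def smoke_test(base_url="http://127.0.0.1:31732"):
    """Mark a fake baker unhealthy, read the gauge, then clean up."""
    fake = "tz1-example"  # hypothetical address
    post_events(base_url, [{"kind": "baker_unhealthy", "baker": fake}])
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=5) as resp:
        value = read_gauge(resp.read().decode("utf-8"), "pyrometer_unhealthy_bakers_total")
    # Recover the fake baker so it does not keep the count raised
    post_events(base_url, [{"kind": "baker_recovered", "baker": fake}])
    return value
```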
12 changes: 12 additions & 0 deletions charts/pyrometer/templates/deployment.yaml
@@ -30,6 +30,18 @@ spec:
        volumeMounts:
        - name: config-volume
          mountPath: /config/
      - name: prom-exporter
        image: {{ .Values.tezos_k8s_images.utils }}
        ports:
        - name: metrics
          containerPort: 31732
          protocol: TCP
        command:
        - /usr/local/bin/python
        args:
        - "-c"
        - |
{{ tpl ($.Files.Get (print "scripts/pyrometer_exporter.py")) $ | indent 12 }}
      volumes:
      - name: config-volume
        configMap:
20 changes: 20 additions & 0 deletions charts/pyrometer/templates/prometheusrule.yaml
@@ -0,0 +1,20 @@
{{- if .Values.prometheusRule.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    {{- toYaml .Values.prometheusRule.labels | nindent 4 }}
  name: baker-external-monitor-alerter
spec:
  groups:
  - name: pyrometer.rules
    rules:
    - alert: BakersUnhealthy
      annotations:
        description: '{{ .Values.prometheusRule.numberOfUnhealthyBakersAlertThreshold }} or more unhealthy bakers'
        summary: "{{ .Values.prometheusRule.numberOfUnhealthyBakersAlertThreshold }} or more unhealthy Tezos bakers according to Pyrometer external monitoring"
      expr: pyrometer_unhealthy_bakers_total{namespace="{{ .Release.Namespace }}"} >= {{ .Values.prometheusRule.numberOfUnhealthyBakersAlertThreshold }}
      for: 1m
      labels:
        severity: critical
{{- end }}
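The rule fires when the unhealthy count reaches the threshold, but `for: 1m` makes Prometheus hold the alert as pending until the condition has been true continuously for one minute; if it clears in between, the timer resets. A simplified simulation of that behaviour, assuming rule evaluations at the given timestamps (the real evaluation interval is configured in Prometheus, not in this chart); `first_firing_time` is hypothetical.

```python
def first_firing_time(samples, threshold, for_seconds=60):
    """Return the evaluation timestamp at which the alert starts firing, or None.

    samples: ordered list of (timestamp_seconds, unhealthy_count) rule evaluations.
    """
    active_since = None  # when the condition last became true (alert pending)
    for ts, count in samples:
        if count >= threshold:
            if active_since is None:
                active_since = ts
            if ts - active_since >= for_seconds:
                return ts
        else:
            # Condition cleared: the pending alert is dropped and the timer resets
            active_since = None
    return None

# Three baker addresses, alert only when all three are unhealthy (threshold 3)
evals = [(t, 2) for t in range(0, 120, 15)]
print(first_firing_time(evals, threshold=3))  # → None

evals = [(0, 2), (15, 2)] + [(t, 3) for t in range(30, 120, 15)]
print(first_firing_time(evals, threshold=3))  # → 90
```

In the second run the condition becomes true at t=30, so the alert is pending until t=90, one minute later.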
6 changes: 6 additions & 0 deletions charts/pyrometer/templates/service.yaml
@@ -3,12 +3,18 @@ kind: Service
metadata:
  name: pyrometer
  namespace: {{ .Release.Namespace }}
  labels:
    app: pyrometer
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
    - port: 31732
      targetPort: metrics
      protocol: TCP
      name: metrics
  selector:
    app: pyrometer
17 changes: 17 additions & 0 deletions charts/pyrometer/templates/servicemonitor.yaml
@@ -0,0 +1,17 @@
{{- if .Values.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: pyrometer
  name: pyrometer-service-monitor
  namespace: {{ .Release.Namespace }}
spec:
  endpoints:
  - interval: 15s
    port: metrics
    path: /metrics
  selector:
    matchLabels:
      app: pyrometer
{{- end }}
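These objects are wired together by labels and port names rather than numbers: the ServiceMonitor selects Services labelled `app: pyrometer`, scrapes the Service port named `metrics`, and that port's `targetPort: metrics` resolves to the container port of the same name, 31732. A sketch of that lookup over plain dicts mirroring the manifests above (trimmed to the fields it reads); `resolve_scrape_port` is hypothetical and ignores namespace selection.

```python
def resolve_scrape_port(service_monitor, services, pod_containers):
    """Follow ServiceMonitor -> Service -> container port; return the port number or None."""
    wanted = service_monitor["spec"]["selector"]["matchLabels"]
    endpoint_port = service_monitor["spec"]["endpoints"][0]["port"]
    for svc in services:
        labels = svc["metadata"].get("labels", {})
        # matchLabels: every key/value pair must be present on the Service
        if not all(labels.get(k) == v for k, v in wanted.items()):
            continue
        for port in svc["spec"]["ports"]:
            if port["name"] != endpoint_port:
                continue
            target = port["targetPort"]
            if isinstance(target, int):
                return target
            # A string targetPort refers to a named container port
            for container in pod_containers:
                for cport in container.get("ports", []):
                    if cport.get("name") == target:
                        return cport["containerPort"]
    return None

service_monitor = {"spec": {"selector": {"matchLabels": {"app": "pyrometer"}},
                            "endpoints": [{"port": "metrics", "path": "/metrics"}]}}
service = {"metadata": {"labels": {"app": "pyrometer"}},
           "spec": {"ports": [{"name": "http", "port": 80, "targetPort": "http"},
                              {"name": "metrics", "port": 31732, "targetPort": "metrics"}]}}
containers = [{"name": "prom-exporter",
               "ports": [{"name": "metrics", "containerPort": 31732}]}]

print(resolve_scrape_port(service_monitor, [service], containers))  # → 31732
# Without the `app: pyrometer` label on the Service, nothing is selected
print(resolve_scrape_port(service_monitor, [{"metadata": {}, "spec": service["spec"]}], containers))  # → None
```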
21 changes: 19 additions & 2 deletions charts/pyrometer/values.yaml
@@ -1,7 +1,15 @@
-# Pass below the pyrometer config, in yaml format (will be converted to toml)
-config: {}
 images:
   pyrometer: registry.gitlab.com/tezos-kiln/pyrometer:latest
+tezos_k8s_images:
+  utils: tezos-k8s-utils:dev
+# Pass below the pyrometer config, in yaml format
+config:
+  node_monitor:
+    nodes:
+    - http://tezos-node-rpc:8732
+  webhook:
+    enabled: true
+    url: http://127.0.0.1:31732/pyrometer_webhook
 ingress:
   enabled: false
   className: ""
@@ -13,3 +21,12 @@ ingress:
   # - secretName: chart-example-tls
   #   hosts:
   #     - chart-example.local
+
+# Prometheus Operator is required in your cluster in order to enable
+# serviceMonitor and prometheusRule below.
+serviceMonitor:
+  enabled: false
+prometheusRule:
+  enabled: false
+  numberOfUnhealthyBakersAlertThreshold: 1
+  labels: {}
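Both monitoring objects are disabled by default. One way to enable them is to override values at install time; the sketch below flattens a nested override mapping into Helm `--set key=value` arguments. `to_set_args` is hypothetical and handles scalar leaves only; the release name and chart path in the printed command are placeholders.

```python
import shlex

def to_set_args(overrides, prefix=""):
    """Flatten nested dicts into Helm --set arguments (scalar leaves only)."""
    args = []
    for key, value in overrides.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            args.extend(to_set_args(value, path))
        else:
            # Render booleans as YAML-style true/false rather than Python's True/False
            text = str(value).lower() if isinstance(value, bool) else str(value)
            args.extend(["--set", f"{path}={text}"])
    return args

overrides = {
    "serviceMonitor": {"enabled": True},
    "prometheusRule": {"enabled": True, "numberOfUnhealthyBakersAlertThreshold": 2},
}
cmd = ["helm", "upgrade", "--install", "pyrometer", "./charts/pyrometer"] + to_set_args(overrides)
print(shlex.join(cmd))
```

If your Prometheus only loads rules carrying specific labels (its `ruleSelector`), those labels belong under `prometheusRule.labels`, which is easier to set from a values file than with `--set`.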
