Prometheus Exporter for Pyrometer #480

Merged
merged 29 commits into from
Aug 30, 2022
Changes from 27 commits
788590a
add ingress to pyrometer chart
nicolasochem Aug 15, 2022
eb12f3a
fix no full name for pyrometer
nicolasochem Aug 15, 2022
1c122e3
add missing service description in values.yaml
nicolasochem Aug 16, 2022
294f3ff
add helpers file
nicolasochem Aug 16, 2022
87b9d06
add exporter
nicolasochem Aug 19, 2022
85dfefd
fix filename
nicolasochem Aug 19, 2022
78ca113
parse when json header is missing (temporary)
nicolasochem Aug 19, 2022
7f18029
exporter part
nicolasochem Aug 20, 2022
20d5a27
better default pyrometer values
nicolasochem Aug 20, 2022
5dcfab9
add prometheus rule
nicolasochem Aug 22, 2022
5eceeaa
fix wrong helm
nicolasochem Aug 22, 2022
c729005
fix enable service monitor
nicolasochem Aug 22, 2022
dcd4fb5
fix service monitor name
nicolasochem Aug 22, 2022
d0c8031
add service label for servicemonitor
nicolasochem Aug 22, 2022
7446978
fix endpoint name "metrics"
nicolasochem Aug 22, 2022
929feed
fix prom syntax
nicolasochem Aug 23, 2022
ecf945c
do not enable service mon by def
nicolasochem Aug 23, 2022
923860d
fix typo in arerting rule
nicolasochem Aug 23, 2022
a30abed
rules only apply to current namespace
nicolasochem Aug 23, 2022
1e55f11
missing quotes
nicolasochem Aug 23, 2022
8751cc2
add README based on PR descr
nicolasochem Aug 23, 2022
2c0ae16
per review: make ingress example more realistic
nicolasochem Aug 23, 2022
3a1c1be
do not templatize ingress hosts
nicolasochem Aug 24, 2022
20631b3
make example host the empty string
nicolasochem Aug 24, 2022
c49f8e7
Merge branch 'pyrometer_ingress' into pyrometer_prometheus
nicolasochem Aug 25, 2022
d8099bc
remove force=true after pyrometer 0.5.2
nicolasochem Aug 25, 2022
24cbc17
Merge branch 'master' into pyrometer_prometheus
nicolasochem Aug 25, 2022
6f3d6cc
Merge branch 'master' into pyrometer_prometheus
nicolasochem Aug 26, 2022
4fd9219
remove service tuning in values
nicolasochem Aug 26, 2022
24 changes: 24 additions & 0 deletions charts/pyrometer/README.md
@@ -1,3 +1,27 @@
## Pyrometer chart

A chart to deploy the [pyrometer](https://gitlab.com/tezos-kiln/pyrometer) Tezos monitoring tool.

Pass a complete pyrometer configuration under the `config` key in `values.yaml`; it is applied to pyrometer as-is.

### Prometheus exporter

Pyrometer is a self-sustaining tool that manages its own alerts and alerting channels.

Quoting pyrometer [architecture doc](https://gitlab.com/tezos-kiln/pyrometer/-/blob/main/doc/monitoring.md):

> Primary installation target for initial monitoring implementation is a
personal computer. Consequently, implementation should prioritize
simplicity when it comes to number of individual, isolated components,
processes, their runtime dependencies,
administration/configuration.


The Prometheus exporter for Pyrometer consumes pyrometer events using webhooks and monitors only one of them: baker health status. It then aggregates the number of unhealthy bakers and exposes this as a prometheus metric.

The ServiceMonitor and PrometheusRule are also included in the chart.

This gives you:

* the concept of an active alert that can be fed into an incident management system such as PagerDuty.
* the ability to monitor a baker that bakes for several addresses, where alerting on a single unhealthy address is not desirable and an alert should fire only when all the configured bakers are unhealthy. The threshold is configurable in the chart.
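
The aggregation and threshold described above can be sketched in a few lines of Python. This is an illustration only, not the chart's exporter: `apply_events` and `should_alert` are hypothetical helper names, the baker addresses are placeholders, and the event shape (`kind`, `baker`) is assumed from the webhook handler added in this PR.

```python
from typing import Iterable, Set


def apply_events(unhealthy: Set[str], events: Iterable[dict]) -> Set[str]:
    """Fold a batch of pyrometer-style events into the set of unhealthy bakers."""
    for event in events:
        kind = event.get("kind")
        if kind == "baker_unhealthy":
            unhealthy.add(event["baker"])
        elif kind == "baker_recovered":
            # discard() tolerates a recovery for a baker never seen as unhealthy
            unhealthy.discard(event["baker"])
    return unhealthy


def should_alert(unhealthy: Set[str], threshold: int) -> bool:
    """Same comparison as the alerting rule: unhealthy count >= threshold."""
    return len(unhealthy) >= threshold


# Two addresses go unhealthy, then one recovers (placeholder addresses).
state = apply_events(set(), [
    {"kind": "baker_unhealthy", "baker": "tz1-example-a"},
    {"kind": "baker_unhealthy", "baker": "tz1-example-b"},
    {"kind": "baker_recovered", "baker": "tz1-example-a"},
])
print(len(state), should_alert(state, threshold=2))
```

For a baker with two addresses, setting `numberOfUnhealthyBakersAlertThreshold` to 2 would alert only when both addresses are unhealthy at the same time.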
39 changes: 39 additions & 0 deletions charts/pyrometer/scripts/pyrometer_exporter.py
@@ -0,0 +1,39 @@
#!/usr/bin/env python
from flask import Flask, request, jsonify
import requests
import datetime

import logging
log = logging.getLogger('werkzeug')
log.setLevel(logging.ERROR)

application = Flask(__name__)

unhealthy_bakers = set()

@application.route('/pyrometer_webhook', methods=['POST'])
def pyrometer_webhook():
    '''
    Receive all events from pyrometer
    '''
    for msg in request.get_json():
        if msg["kind"] == "baker_unhealthy":
            print(f"Baker {msg['baker']} is unhealthy")
            unhealthy_bakers.add(msg["baker"])
        if msg["kind"] == "baker_recovered":
            print(f"Baker {msg['baker']} recovered")
            # discard() does not raise if the baker was never recorded as
            # unhealthy, e.g. when the exporter restarted in between
            unhealthy_bakers.discard(msg["baker"])

    return "Webhook received"

@application.route('/metrics', methods=['GET'])
def prometheus_metrics():
    '''
    Prometheus endpoint
    '''
    return f'''# total number of monitored bakers that are currently unhealthy
pyrometer_unhealthy_bakers_total {len(unhealthy_bakers)}
'''

if __name__ == "__main__":
    application.run(host = "0.0.0.0", port = 31732, debug = False)
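
To check the `/metrics` output by hand, the response body can be parsed with standard-library Python. A minimal sketch, assuming the body has the shape returned by `prometheus_metrics()` above; `parse_metric` is a hypothetical helper and not part of the chart:

```python
from typing import Optional


def parse_metric(body: str, name: str) -> Optional[float]:
    """Return the value of an unlabelled sample called `name`, or None."""
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


# Sample body in the shape produced by the exporter's /metrics route.
sample = (
    "# total number of monitored bakers that are currently unhealthy\n"
    "pyrometer_unhealthy_bakers_total 2\n"
)
print(parse_metric(sample, "pyrometer_unhealthy_bakers_total"))
```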
12 changes: 12 additions & 0 deletions charts/pyrometer/templates/deployment.yaml
@@ -30,6 +30,18 @@ spec:
          volumeMounts:
          - name: config-volume
            mountPath: /config/
        - name: prom-exporter
          image: {{ .Values.tezos_k8s_images.utils }}
          ports:
          - name: metrics
            containerPort: 31732
            protocol: TCP
          command:
          - /usr/local/bin/python
          args:
          - "-c"
          - |
{{ tpl ($.Files.Get (print "scripts/pyrometer_exporter.py")) $ | indent 12 }}
      volumes:
      - name: config-volume
        configMap:
20 changes: 20 additions & 0 deletions charts/pyrometer/templates/prometheusrule.yaml
@@ -0,0 +1,20 @@
{{- if .Values.prometheusRule.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
{{- toYaml .Values.prometheusRule.labels | nindent 4 }}
  name: baker-external-monitor-alerter
spec:
  groups:
  - name: pyrometer.rules
    rules:
    - alert: BakersUnhealthy
      annotations:
        description: '{{ .Values.prometheusRule.numberOfUnhealthyBakersAlertThreshold }} or more unhealthy bakers'
        summary: "{{ .Values.prometheusRule.numberOfUnhealthyBakersAlertThreshold }} or more unhealthy Tezos bakers according to Pyrometer external monitoring"
      expr: pyrometer_unhealthy_bakers_total{namespace="{{ .Release.Namespace }}"} >= {{ .Values.prometheusRule.numberOfUnhealthyBakersAlertThreshold }}
      for: 1m
      labels:
        severity: critical
{{- end }}
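
The rule's `for: 1m` clause means the condition has to hold continuously before the alert fires. A rough Python sketch of that behaviour, assuming one sample per 15s (the scrape interval set in the ServiceMonitor); `fires` is a hypothetical helper, and this simplifies Prometheus's real rule evaluation, which runs on its own evaluation interval:

```python
from typing import Sequence


def fires(samples: Sequence[int], threshold: int,
          interval_s: int = 15, for_s: int = 60) -> bool:
    """True if samples stay >= threshold for at least `for_s` seconds in a row."""
    held = 0         # seconds the condition has held since it became true
    pending = False  # whether the previous sample already met the condition
    for value in samples:
        if value >= threshold:
            if pending:
                held += interval_s
            else:
                pending = True
                held = 0
            if held >= for_s:
                return True
        else:
            # Any sample below the threshold resets the pending timer.
            pending = False
            held = 0
    return False


print(fires([1, 1, 1, 1, 1], threshold=1))  # condition holds for a full minute
print(fires([1, 1, 0, 1, 1], threshold=1))  # a recovery resets the timer
```

Under these assumptions, a baker that recovers within the minute never pages anyone, which is the point of pairing the threshold with `for`.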
6 changes: 6 additions & 0 deletions charts/pyrometer/templates/service.yaml
@@ -3,12 +3,18 @@ kind: Service
metadata:
  name: pyrometer
  namespace: {{ .Release.Namespace }}
  labels:
    app: pyrometer
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
    - port: 31732
      targetPort: metrics
      protocol: TCP
      name: metrics
  selector:
    app: pyrometer
17 changes: 17 additions & 0 deletions charts/pyrometer/templates/servicemonitor.yaml
@@ -0,0 +1,17 @@
{{- if .Values.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: pyrometer
  name: pyrometer-service-monitor
  namespace: {{ .Release.Namespace }}
spec:
  endpoints:
  - interval: 15s
    port: metrics
    path: /metrics
  selector:
    matchLabels:
      app: pyrometer
{{- end }}
24 changes: 22 additions & 2 deletions charts/pyrometer/values.yaml
@@ -1,7 +1,18 @@
# Pass below the pyrometer config, in yaml format (will be converted to toml)
config: {}
images:
  pyrometer: registry.gitlab.com/tezos-kiln/pyrometer:latest
tezos_k8s_images:
  utils: tezos-k8s-utils:dev
# Pass below the pyrometer config, in yaml format
config:
  node_monitor:
    nodes:
      - http://tezos-node-rpc:8732
  webhook:
    enabled: true
    url: http://127.0.0.1:31732/pyrometer_webhook
service:
  type: ClusterIP
  port: 80
ingress:
  enabled: false
  className: ""
@@ -13,3 +24,12 @@ ingress:
  #  - secretName: chart-example-tls
  #    hosts:
  #      - chart-example.local

# Prometheus Operator is required in your cluster in order to enable
# serviceMonitor and prometheusRule below.
serviceMonitor:
  enabled: false
prometheusRule:
  enabled: false
  numberOfUnhealthyBakersAlertThreshold: 1
  labels: {}