SPIKE: Export and analysis of benchmarking metrics #1399
So after researching Grafana more, I've revised my initial plan. This article details an architecture I think we should imitate: https://dzone.com/articles/go-microservices-part-15-monitoring-with-prometheu.

Rather than the metrics running locally or being viewable during the Jenkins run, I think we should use a remote Grafana instance to monitor "snapshot builds" of Secretless, which use some reusable configuration. Essentially, when Secretless succeeds in a Jenkins pipeline, Jenkins deploys a snapshot instance of that commit to a remote cluster. In the same cluster, we run an instance of Prometheus and Grafana. We build a simple discovery service that polls the namespace where Secretless instances run and builds a list of endpoints for Prometheus to query. The discovery service outputs a simple JSON document that Prometheus references, and Prometheus then scrapes the noted endpoints. Grafana can then be used to analyze the data from Prometheus. The article shows the full setup they use, which we'll adjust to fit our needs.
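To make the discovery piece concrete, here is a minimal sketch of that service, assuming it runs in-cluster and writes a Prometheus file_sd target file; the namespace, label selector, metrics port, and output path below are placeholders, not settled names:

```go
// Hypothetical discovery service: polls a namespace for Secretless snapshot
// pods and writes a Prometheus file_sd JSON document listing their metrics
// endpoints. Namespace, label selector, port, and output path are placeholders.
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// fileSDEntry matches the target-group format Prometheus expects from file_sd_config.
type fileSDEntry struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels"`
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	for {
		pods, err := client.CoreV1().Pods("secretless-snapshots").List(
			context.TODO(), metav1.ListOptions{LabelSelector: "app=secretless"})
		if err != nil {
			log.Println("list pods:", err)
			time.Sleep(30 * time.Second)
			continue
		}

		entry := fileSDEntry{Labels: map[string]string{"job": "secretless"}}
		for _, p := range pods.Items {
			if p.Status.PodIP != "" {
				// 9090 is a placeholder for whatever port Secretless exposes metrics on.
				entry.Targets = append(entry.Targets, fmt.Sprintf("%s:9090", p.Status.PodIP))
			}
		}

		out, _ := json.MarshalIndent([]fileSDEntry{entry}, "", "  ")
		// Prometheus watches this file via a file_sd_config entry in its scrape config.
		if err := os.WriteFile("/etc/prometheus/targets/secretless.json", out, 0644); err != nil {
			log.Println("write targets:", err)
		}
		time.Sleep(30 * time.Second)
	}
}
```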
However, instead of Docker Swarm, we'd run this in a remote Kubernetes cluster. There are lots of excellent guides on using Helm to deploy Prometheus and Grafana, like this one: https://www.fosstechnix.com/install-prometheus-and-grafana-on-kubernetes-using-helm/

I can't speak to the implementation details yet. I should also call out that, since Secretless can proxy to multiple endpoints, exposing metrics on a single port is something we'll need to think through.

An important point to remember is cleanup. Since Prometheus and Grafana are meant to be long-lived monitoring solutions, we'll need to set up some kind of simple cleanup service in the cluster to remove stale Secretless instances. If we were monitoring locally, this would be trivial.
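As a rough sketch of that cleanup idea, assuming snapshot instances run as Deployments in a dedicated namespace and that anything older than a TTL counts as stale (both placeholder choices):

```go
// Hypothetical cleanup job for stale Secretless snapshot deployments.
// Assumes snapshots live in a dedicated namespace and that age alone is
// enough to call an instance stale; namespace, selector, and TTL are placeholders.
package main

import (
	"context"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	const namespace = "secretless-snapshots" // placeholder
	const maxAge = 24 * time.Hour            // placeholder TTL

	deployments, err := client.AppsV1().Deployments(namespace).List(
		context.TODO(), metav1.ListOptions{LabelSelector: "app=secretless"})
	if err != nil {
		log.Fatal(err)
	}

	for _, d := range deployments.Items {
		if time.Since(d.CreationTimestamp.Time) > maxAge {
			log.Printf("removing stale snapshot %s", d.Name)
			if err := client.AppsV1().Deployments(namespace).Delete(
				context.TODO(), d.Name, metav1.DeleteOptions{}); err != nil {
				log.Printf("delete %s: %v", d.Name, err)
			}
		}
	}
}
```

Run on a schedule (for example as a Kubernetes CronJob), this would keep the snapshot namespace from accumulating builds that Prometheus no longer needs to scrape.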
@BradleyBoutcher I think the pipeline you have in mind (
Glad you're already thinking about how this would work in CI. For the moment the goal is to have the pipeline defined and working against a single Secretless (latest release) instance. With the pipeline defined, it'll then be possible to define and run benchmarking "experiments" on the current Secretless snapshot. We want to keep the POC lightweight, so we should plan to deploy the components locally with Docker/Docker Compose. A subset of the steps you describe above should get us what we want:
For (1) it doesn't have to be a mock HTTP server; you could have some method that populates the fake metrics data, like how temperature is set to random values in a loop in the OpenTelemetry example: https://github.com/open-telemetry/opentelemetry-go/blob/main/example/prom-collector/main.go#L121. However, an HTTP server seems like a natural way to have fine-grained, dynamic control over the generated metrics.
You're right. I think as part of making the POC complete we'd want to explore Prometheus labels within the context of OpenTelemetry. In part, that's why I suggest using an HTTP server for (1): for the POC we can label the metrics for a given route with the corresponding HTTP method and path.
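To illustrate that labeling, here is a minimal sketch of such a mock server. It uses prometheus/client_golang directly rather than the OpenTelemetry SDK purely to keep the example short, and the metric name, labels, and port are all hypothetical:

```go
// Hypothetical mock HTTP server that records request metrics labeled by
// HTTP method and path, giving Prometheus/Grafana dynamic data to scrape.
// Uses prometheus/client_golang instead of the OpenTelemetry SDK for brevity.
package main

import (
	"log"
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var requestDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name: "mock_request_duration_seconds",
		Help: "Simulated request latency, labeled by method and path.",
	},
	[]string{"method", "path"},
)

func handler(w http.ResponseWriter, r *http.Request) {
	// Simulate some work so the recorded latencies vary between requests.
	delay := time.Duration(rand.Intn(100)) * time.Millisecond
	time.Sleep(delay)
	requestDuration.WithLabelValues(r.Method, r.URL.Path).Observe(delay.Seconds())
	w.Write([]byte("ok\n"))
}

func main() {
	prometheus.MustRegister(requestDuration)
	// Prometheus scrapes /metrics; any other route generates labeled samples.
	http.Handle("/metrics", promhttp.Handler())
	http.HandleFunc("/", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Hitting different routes (e.g. `curl localhost:8080/db` vs `curl localhost:8080/http`) then produces distinct method/path label sets on the scraped metrics.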
Outcome

The outcome from this spike is available in the POC on the telemetry branch; see secretless-broker/telemetry/docker-compose.yml (lines 1 to 23 at 1de7c61).
The branch demonstrates a locally runnable pipeline (via docker-compose) of
The pipeline works as follows
Remaining questions
Overview
The goal here is to have a good answer to the question, "Given some well-defined metrics that we know how to measure, how do we record and export them and to where?"
OpenTelemetry supports many exporters, e.g. CloudWatch or Prometheus.
The idea is to show an end-to-end pipeline that
The spikes are to de-risk the general approach
Definition of done