This proof of concept analyzes the options and requirements for monitoring distributed applications and their interfaces. It implements a layered container stack built on the open-source observability stack of Grafana Labs and tools from other vendors.
Warning
This is a proof of concept and far from a production-ready setup! Please consult the documentation of each component for a secure and reliable deployment.
- Docker
- Docker Compose v2
- Grafana Loki's Docker plugin, if needed
- this repository
- Loki - a log aggregation system
- Prometheus - a systems monitoring and alerting toolkit
- Grafana - a visualization and observability platform
- MinIO - S3-compatible object storage
- cAdvisor - resource and performance characteristics of running containers
- Mimir - long-term storage for Prometheus
  - successfully integrated: compose-mimir.yml
- OnCall - easy-to-use on-call management tool
  - successfully integrated: compose-oncall.yml
  - use case pending
The applications are separated into three layers:
- infrastructure
  - Storage
  - Log aggregation
- metrics
  - Metric/log collection
- observability
  - Visualization
  - TimeSeries Database (for custom metrics)
Every layer has its own compose files, but all of them share the same bridge network, monitoring-network. To pass configuration files, some components have their own directory containing a config file, e.g.:
.
├── loki
│ └── loki-config.yml
├── prometheus
│ └── prometheus-config.yml
├── compose-infra.yml
├── compose-observe.yml
.
To aggregate the logs of our monitoring stack, we use Grafana's Loki. The initial configuration was inspired by.
For a single-node deployment, the referenced configuration should be suitable. When aggregating logs from multiple applications, nodes, and systems, however, a deployment like loki/getting-started or loki/production should be considered.
This setup therefore uses Grafana's production template, which contains separate read and write instances, an nginx gateway, and a MinIO storage instance.
Using the recommended client to send logs avoids configuration differences and generalizes the use cases. It also allows a comparison with a custom implementation.
See this gist and the associated article for information on scraping docker logs with Promtail.
There are three methods to pass the logs of containerized applications to Loki:
- changing the default logging driver
- via the compose file
Note
To avoid unexpected behavior or losing logs, we do not modify the default logging driver. Instead, we integrate the settings in our compose files.
Prometheus scrapes metrics from predefined targets and stores them in its time-series database. Some applications, like Grafana, MinIO, or cAdvisor, implement their own metrics endpoint; for others, a custom endpoint may be developed (see client libraries for additional information).
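To illustrate what such a custom endpoint has to serve, here is a minimal sketch that uses only the Python standard library. It is not part of this repository: the metric name `demo_requests_total`, the port, and the helper names are assumptions chosen for illustration, and a real application would normally rely on an official Prometheus client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real application would update these itself.
COUNTERS = {"demo_requests_total": 0}


def render_metrics(counters):
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current counters on /metrics and 404 on everything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        COUNTERS["demo_requests_total"] += 1  # count each scrape
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the endpoint; port 8000 is an arbitrary choice for this sketch."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` blocks and exposes `/metrics` on the chosen port, which could then be added as a scrape target in the Prometheus configuration.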
To monitor a Docker host and its running containers, the following steps are necessary:
- expose a metrics endpoint on the docker host to be scraped by Prometheus
- add cAdvisor alongside our setup, which provides scrapable metrics per container
As described in the Docker docs, we modify the current .../.docker/daemon.json to expose a metrics endpoint:
{
"builder": { "gc": { "defaultKeepStorage": "20GB", "enabled": true } },
"experimental": false,
"features": { "buildkit": true }
}
becomes:
{
"builder": { "gc": { "defaultKeepStorage": "20GB", "enabled": true } },
"features": { "buildkit": true },
"metrics-addr" : "127.0.0.1:9323",
"experimental" : true
}
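The same change can be scripted instead of edited by hand. The following is a small sketch, not part of this repository; the function names are hypothetical and the path to daemon.json has to be supplied by the caller, since it differs between systems.

```python
import json


def enable_docker_metrics(config, addr="127.0.0.1:9323"):
    """Return a copy of a daemon.json dict with the metrics endpoint enabled."""
    updated = dict(config)  # shallow copy; untouched keys are kept as-is
    updated["metrics-addr"] = addr
    updated["experimental"] = True
    return updated


def update_daemon_file(path):
    """Read daemon.json at `path`, enable metrics, and write it back."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(enable_docker_metrics(config), fh, indent=2)
        fh.write("\n")
```

Docker has to be restarted afterwards for the new setting to take effect.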
See the official Prometheus documentation to monitor containers.
For a setup behind a company proxy, Grafana requires additional certificates. Updating the certificates allows downloading plugins and dashboards via the GUI.
Requirements:
- a certificate (.crt/.pem) or a certificate bundle (.pem)
- build the custom image by providing the certificate as a build argument:
Warning
Uploading a custom image to a public registry exposes your private certificates!
docker build . -t <image-name>:<tag> --build-arg CERTIFICATE_FILE=<path-to-certificate>
e.g.
docker build . -t grafana-custom --build-arg CERTIFICATE_FILE=certificate-bundle.pem
To open a shell as root (user ID 0) inside a running service container:
docker compose exec -it -u 0 <service-name> bash
e.g.
docker compose exec -it -u 0 grafana bash
Note
Make sure you have all Prerequisites!
Pull and initialize the application stack:
.\up.ps1
Visit the user interfaces:
- check the created buckets in MinIO
- monitor your running containers with cAdvisor
- verify that all scraping targets are up and running in the Prometheus UI
- visit Grafana and log in with the default credentials: username admin, password admin
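The check of the scraping targets can also be scripted against the Prometheus HTTP API instead of the UI. The sketch below is an illustration only; the base URL `http://localhost:9090` is an assumption about how Prometheus is published in this setup, and the function names are hypothetical.

```python
import json
import urllib.request


def unhealthy_targets(payload):
    """Return (job, scrape URL, health) for every active target that is not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (
            t.get("labels", {}).get("job", "?"),
            t.get("scrapeUrl", "?"),
            t.get("health", "unknown"),
        )
        for t in targets
        if t.get("health") != "up"
    ]


def fetch_targets(base_url="http://localhost:9090"):
    """Query /api/v1/targets; the base URL is an assumption for this sketch."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)
```

With the stack running, `unhealthy_targets(fetch_targets())` returns an empty list when every target is healthy.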
Shut down the stack:
.\shutdown.ps1
Restart the setup:
.\startup.ps1
Bring all containers down, make sure they are stopped, and then remove them together with the created bridge network:
.\down.ps1
Set up the data sources in Grafana. Pay attention to the authentication for Loki, which needs a custom header with key "X-Scope-OrgID" and value 1; see: https://github.com/grafana/loki/blob/main/production/docker/config/datasources.yaml
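The X-Scope-OrgID header mentioned for the Loki data source applies to direct API calls as well, which can help when checking that the tenant setting works. The following sketch pushes a single log line through Loki's HTTP push API. The base URL `http://localhost:3100`, the label set, and the function names are assumptions for illustration, not values taken from this repository.

```python
import json
import time
import urllib.request


def build_push_request(line, labels, base_url="http://localhost:3100", org_id="1"):
    """Build (url, headers, body) for a Loki push carrying the tenant header."""
    payload = {
        "streams": [
            # Loki expects the timestamp as a string of nanoseconds since the epoch.
            {"stream": labels, "values": [[str(time.time_ns()), line]]}
        ]
    }
    headers = {"Content-Type": "application/json", "X-Scope-OrgID": org_id}
    url = f"{base_url}/loki/api/v1/push"
    return url, headers, json.dumps(payload).encode("utf-8")


def push(line, labels, **kwargs):
    """Send one log line; network and HTTP errors are raised to the caller."""
    url, headers, body = build_push_request(line, labels, **kwargs)
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

For example, `push("hello from the PoC", {"job": "demo"})` sends one line for tenant 1, which should then be visible in Grafana through a Loki data source configured with the same header value.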