This is a demo of high level metrics that Crossplane tracks to help you more deeply understand the health and activities of your control plane.
This functionality was originally added to the crossplane-runtime
in
crossplane/crossplane-runtime#683 and then included in
the Upjet framework in
v1.3.0. Crossplane
providers based on Upjet starting with that version are capable of producing
these metrics, as long as they provide some simple metrics initialization
logic.
- A Kubernetes cluster with Crossplane v1.16.0+ installed and metrics enabled, e.g.:
helm repo add crossplane-stable https://charts.crossplane.io/stable
helm repo update
helm install crossplane --namespace crossplane-system --create-namespace --set metrics.enabled=true crossplane-stable/crossplane
To get observability and monitoring flowing in the control plane, let's start by installing the Prometheus stack:
helm install -n monitoring --create-namespace prometheus prometheus-community/kube-prometheus-stack
Tell Prometheus where to be gathering Crossplane metrics from (i.e., the
provider-aws-ec2
pod) by creating a PodMonitor
resource:
kubectl apply -f podmonitor.yaml
Set up a port forward to the Grafana dashboard, so it can be reached from your local machine:
kubectl port-forward -n monitoring svc/prometheus-grafana 8080:80
Now we can visit the Grafana dashboard at http://localhost:8080/ and log in with the default credentials:
user: admin
pass: prom-operator
Import our custom Crossplane metrics dashboard with the following steps:
Dashboards
in the left navigation paneNew
button on the top rightImport
option- Paste the contents of the grafana-dashboard.json file
- Name the dashboard
Crossplane Metrics
First install the AWS provider and the composition functions:
cd aws
kubectl apply -f functions.yaml
kubectl apply -f providers.yaml
Wait for the AWS providers and composition functions to become installed and healthy:
kubectl get pkg
Create credentials for the AWS provider to create resources in your AWS account:
AWS_PROFILE=default && echo -e "[default]\naws_access_key_id = $(aws configure get aws_access_key_id --profile $AWS_PROFILE)\naws_secret_access_key = $(aws configure get aws_secret_access_key --profile $AWS_PROFILE)" > aws-creds.txt
kubectl create secret generic aws-creds -n crossplane-system --from-file=credentials=./aws-creds.txt
kubectl apply -f aws-default-provider.yaml
Create the Crossplane CompositeResourceDefinition
(XRD
) and Composition
that define which resources and how they are created in AWS:
kubectl apply -f definition.yaml
kubectl apply -f composition.yaml
Now we can create our network resources in AWS! Feel free to customize the
values in claim.yaml
to your liking, e.g. to create more
or less network resources by changing the value of count: 5
:
kubectl apply -f claim.yaml
Using the Crossplane CLI, we can examine the state of our composed resources that our functions pipeline requested:
crossplane beta trace network.demo-kcl.crossplane.io/network-kcl
We should see count
number of VPC
and InternetGateway
resources and they
should be trending towards status Ready: true
.
With provider-aws
doing work to create resources in AWS, we should be able to
see some metrics about these resources now. Go to the Crossplane Metrics
dashboard in Grafana at http://localhost:8080/ and explore the metrics that are
being collected.
Let's make these metrics a little more interesting by creating some unhealthy
resources. We can do this by creating another claim that specifies poison: true
which will cause the underlying Composition
to create malformed
resources:
kubectl apply -f claim-poison.yaml
Check out the state of these unhealthy resources with the Crossplane CLI, we
should see them stuck in the Ready: false
status:
crossplane beta trace network.demo-kcl.crossplane.io/network-kcl-poison
The dashboard on http://localhost:8080 should update soon and show us metrics for all the resources, both healthy and unhealthy.
We could consider setting up alerts based on these metrics to notify us when resources are unhealthy or when they are taking too long to become ready. We now have all sorts of data to help us understand the health and activities of our control plane, such as:
- How many resources is this control plane managing? How many of them are ready
and synced?
crossplane_managed_resource_exists
crossplane_managed_resource_ready
crossplane_managed_resource_synced
- How long is it taking for each type of resource to be reconciled and to become
ready for the first time?
crossplane_managed_resource_first_time_to_reconcile_seconds
crossplane_managed_resource_first_time_to_readiness_seconds
- How long is it taking for various managed resource types to be deleted?
crossplane_managed_resource_deletion_seconds
- How long is it taking to discover that a resource is out of sync and needs to
be updated?
crossplane_managed_resource_drift_seconds
Clean up all the managed resources in this demo by deleting both claims:
kubectl delete -f claim.yaml
kubectl delete -f claim-poison.yaml
kubectl get managed
If metrics don't appear to be getting collected, you can connect directly to the provider pod to see if metrics are being generated correctly at the source:
kubectl -n crossplane-system port-forward "$(kubectl -n crossplane-system get pod -o name -l pkg.crossplane.io/provider=provider-aws-ec2)" 8081:8080
curl -s localhost:8081/metrics | grep crossplane_managed_resource_exists