-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ziti-controller: Add prometheus serviceMonitor #209
ziti-controller: Add prometheus serviceMonitor #209
Conversation
Signed-off-by: Mario Trangoni <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great contribution, thanks! How best to inform the user? The ServiceMonitor resource will not obtain metrics unless input value fabric.events.enabled=true
. The particular metrics available on the scrape target are defined by fabric.events.subscriptions
. Will you mention this in the input values documentation or README or both to help the user to configure a functioning ServiceMonitor?
Good point, will add some docs there |
Signed-off-by: Mario Trangoni <[email protected]>
@qrkourier PTAL, feel free to adjust the wording accordingly. |
4a58b0c
into
openziti:mjtrangoni-add-controller-servicemonitor
@mjtrangoni I'll move the doc to |
@mjtrangoni What's an excellent way to test the SM resource you added? We could install the Prometheus-operated chart in the CI job and confirm that it is scraping some metrics from the target. In that job, the latest release is installed and upgraded to the current branch in Minikube. Also, I was experimenting with Grafana dashboards for the Prometheus data source with this dashboard. |
@qrkourier cool! Yes, kube-prometheus-stack would be the best way to install a Prometheus environment for scraping some metrics. After that a job running promql queries would do the job or also a bash script with some curl calls. Can we have a release so that I can start testing it on my side? |
@qrkourier thanks for the release! I've tried without success, serviceMonitor:
tlsConfig:
ca:
configMap:
name: ziti-controller-ctrl-plane-cas
key: ctrl-plane-cas.crt Where should I find the TLS configuration otherwise? Of course, with |
My past self left a note about it in this SM manifest, but it was not solved: https://github.com/openziti-terraform-modules/terraform-lke-ziti/blob/ba0797cb68a96fc57d8c1669acc1824a8c7d8d40/plan-45-observability/main.tf#L246 I must have encountered the same issue you did. Perhaps the problem is SM must use a configmap, but only bundle is available, or not available in the monitoring namespace. I think there's a way to propagate cacerts with feature flag for clusterbundle and certs alphav1 API group, but I'm unsure how to do it off hand. |
@qrkourier it seems that the bundle config is correct, but the SAN is failing because prometheus is calling an IP. debug:/# curl --cacert bundle.crt https://172.26.2.54:9090/metrics
curl: (60) SSL: no alternative certificate subject name matches target host name '172.26.2.54'
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above. and you can check all the available SANs here, curl -w '\n%{certs}\n' https://172.26.2.54:9090/metrics will check further if there is a solution for this. |
It is best if Prometheus can connect to the ClusterIP service by domain name instead of IP address. I assume the server cert is Ziti controller's prometheus scrape target binding on its web listener. That cert could be configured to present an IP SAN in addition to the DNS SANs by enhancing the controller chart's configmap template, but the problem is knowing the cluster IP in advance. This is why we have DNS...so we don't need to know the IP. |
Hi @qrkourier,
related to #86, I was missing the Prometheus serviceMonitor here.
FTR, some clarifications,
Let me know if I should change something here.
Thank you!
Mario