Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ziti-controller: Add prometheus serviceMonitor #209

Conversation

mjtrangoni
Copy link
Contributor

Hi @qrkourier,

related to #86, I was missing the Prometheus serviceMonitor here.

FTR, some clarifications,

  1. the ServiceMonitor is default enabled but will be used only if the default disabled prometheus service is enabled as well. So there should be no CRD issue anywhere.
  2. I had to add an extra default label for the prometheus service, otherwise the serviceMonitor would match all OpenZiti services. Dont know if you agree with the naming.

Let me know if I should change something here.

Thank you!

Mario

Copy link
Member

@qrkourier qrkourier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great contribution, thanks! How best to inform the user? The ServiceMonitor resource will not obtain metrics unless input value fabric.events.enabled=true. The particular metrics available on the scrape target are defined by fabric.events.subscriptions. Will you mention this in the input values documentation or README or both to help the user to configure a functioning ServiceMonitor?

@mjtrangoni
Copy link
Contributor Author

Good point, will add some docs there

@mjtrangoni
Copy link
Contributor Author

@qrkourier PTAL, feel free to adjust the wording accordingly.

@qrkourier qrkourier changed the base branch from main to mjtrangoni-add-controller-servicemonitor June 4, 2024 15:38
@qrkourier qrkourier merged commit 4a58b0c into openziti:mjtrangoni-add-controller-servicemonitor Jun 4, 2024
3 checks passed
@qrkourier
Copy link
Member

@mjtrangoni I'll move the doc to charts/ziti-controller/README.md.gotmpl so that it will render as README.md. The comments in values.yaml like # -- will render in README.md as "Values Reference."

@mjtrangoni mjtrangoni deleted the add-controller-servicemonitor branch June 4, 2024 17:06
@qrkourier
Copy link
Member

@mjtrangoni What's an excellent way to test the SM resource you added? We could install the Prometheus-operated chart in the CI job and confirm that it is scraping some metrics from the target. In that job, the latest release is installed and upgraded to the current branch in Minikube. Also, I was experimenting with Grafana dashboards for the Prometheus data source with this dashboard.

@mjtrangoni
Copy link
Contributor Author

@qrkourier cool! Yes, kube-prometheus-stack would be the best way to install a Prometheus environment for scraping some metrics. After that a job running promql queries would do the job or also a bash script with some curl calls.

Can we have a release so that I can start testing it on my side?

@qrkourier
Copy link
Member

https://github.com/openziti/helm-charts/releases/tag/ziti-controller-1.0.3

@mjtrangoni
Copy link
Contributor Author

@qrkourier thanks for the release!
I have a small question about the TLS configuration, which we should provide by default and is currently missing on values.yaml file.

I've tried without success,

serviceMonitor:
  tlsConfig:
    ca:
      configMap:
        name: ziti-controller-ctrl-plane-cas
        key: ctrl-plane-cas.crt

Where should I find the TLS configuration otherwise? Of course, with insecureSkipVerify: true it is working correctly.

@qrkourier
Copy link
Member

My past self left a note about it in this SM manifest, but it was not solved: https://github.com/openziti-terraform-modules/terraform-lke-ziti/blob/ba0797cb68a96fc57d8c1669acc1824a8c7d8d40/plan-45-observability/main.tf#L246

I must have encountered the same issue you did. Perhaps the problem is SM must use a configmap, but only bundle is available, or not available in the monitoring namespace.

I think there's a way to propagate cacerts with feature flag for clusterbundle and certs alphav1 API group, but I'm unsure how to do it off hand.

@mjtrangoni
Copy link
Contributor Author

@qrkourier it seems that the bundle config is correct, but the SAN is failing because prometheus is calling an IP.

debug:/# curl --cacert bundle.crt  https://172.26.2.54:9090/metrics
curl: (60) SSL: no alternative certificate subject name matches target host name '172.26.2.54'
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

and you can check all the available SANs here,

curl -w '\n%{certs}\n'  https://172.26.2.54:9090/metrics

will check further if there is a solution for this.

@qrkourier
Copy link
Member

It is best if Prometheus can connect to the ClusterIP service by domain name instead of IP address. I assume the server cert is Ziti controller's prometheus scrape target binding on its web listener. That cert could be configured to present an IP SAN in addition to the DNS SANs by enhancing the controller chart's configmap template, but the problem is knowing the cluster IP in advance. This is why we have DNS...so we don't need to know the IP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants