Service monitor #501
Conversation
Tie the ServiceMonitor to the service with a label; allow custom labels for the ServiceMonitor.
Looks great! Declutter FTW.
- containerPort: 9932
  name: metrics
Should this be wrapped in {{- if .Values.serviceMonitor.enabled }}?
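Something like this is what I mean; a hypothetical excerpt of the node container spec, not the chart's actual template:
ports:
  {{- if .Values.serviceMonitor.enabled }}
  # only declare the metrics containerPort when the ServiceMonitor is enabled
  - containerPort: 9932
    name: metrics
  {{- end }}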
- port: 9732
  name: p2p
I'm curious, why wasn't this needed before?
The metrics endpoint needs to be here so the ServiceMonitor can pick up the metrics. While debugging this, I noticed there was no endpoint associated with the p2p port either, so I added it. It never mattered, because:
- internally we target pods directly
- when we want external p2p access we deploy a different service anyway, of type LoadBalancer (see the sketch after this list)
I guess this service was pretty useless?
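To make that concrete, the separate external p2p service is roughly this shape; the name and labels below are placeholders, not the chart's actual manifest:
apiVersion: v1
kind: Service
metadata:
  name: tezos-node-p2p-lb      # placeholder name
spec:
  type: LoadBalancer           # exposes p2p outside the cluster
  selector:
    app: tezos-node            # placeholder pod label
  ports:
    - name: p2p
      port: 9732
      targetPort: 9732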
The service is a headless service. K8s then allows you to target each pod individually instead of creating a clusterIP that would load balance across the pods.
I wonder: ports are now defined but the RPC port isn't, so would that break? I'm not sure I get why we need to set ports here at all, since we can target the pod directly and use the port number anyway.
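For reference, a headless service is just a service with clusterIP: None; a minimal sketch (names and labels here are assumed, not copied from the chart):
apiVersion: v1
kind: Service
metadata:
  name: archive-node           # assumed name, matching the indexer example further down
spec:
  clusterIP: None              # headless: DNS resolves to the individual pod IPs
  selector:
    app: archive-node          # assumed pod label
  ports:
    - name: p2p
      port: 9732
    - name: metrics
      port: 9932
With this, archive-node-0.archive-node resolves straight to that pod, and any port the container listens on is reachable whether or not it is listed under ports (assuming no NetworkPolicy in the way).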
I just tried mkchain with 2 bakers and 1 node, and the nodes are peering with each other and everything is working. So it didn't break. I'm not sure why p2p is here though; maybe it's safe to remove.
RPC is served from a different service.
It makes sense to have a p2p service targeting several pods, because it's fine to load balance the p2p connections. But I did forget that the purpose of this (headless) service was to be able to target pods individually.
Knowing that, maybe it would have been better to leave it alone and set the metrics port on the RPC service.
I'm pretty sure I did this for a reason, though: probably the ServiceMonitor would not work if the port was not named on the service.
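For reference, the ServiceMonitor I have in mind is roughly this shape: it selects the service by label and scrapes by port name, which is why the port has to be named on the service. Labels and names below are placeholders:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tezos-node             # placeholder
  labels:
    release: prometheus        # placeholder; whatever label your Prometheus instance selects on
spec:
  selector:
    matchLabels:
      app: tezos-node          # placeholder; must match the labels on the (headless) service
  endpoints:
    - port: metrics            # refers to the named port on the service
      path: /metrics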
An example where we target a specific pod's RPC endpoint is indexers:
tezos-k8s/charts/tezos/values.yaml (line 449 in cb8eed6):
# rpc_url: http://archive-node-0.archive-node:8732
So we need to know that we can still target any port we want. If we can, I don't see why we need to specify any ports here at all, whether 9732 or the metrics port, as long as the sts pod containers are exposing them.
It appears to me the tezos-service-monitor is load balancing across all nodes. Does this allow us to monitor each node individually instead of conflating each node's metric data together?
So every 15s it sends a request to /metrics, and each pod will eventually be hit. What if there are many, many pods? They won't have their metrics updated for a while.
> So we need to know that we can still target any port we want. If we can, I don't see why we need to specify any ports here at all, whether 9732 or the metrics port, as long as the sts pod containers are exposing them.
Once we deploy this release, we can probably test the RPC of a specific node just by sshing into another one. That is probably sufficient. And then we can also remove the ports from the headless svc later, just so it isn't confusing why we are exposing some ports on the svc and not others. Not a big deal.
I don't think that's what is going on. I am aware of "pod monitor" but I've never used it; ServiceMonitor is what I have been using.
See these heavily thumbed-up comments: prometheus-operator/prometheus-operator#3119 (comment)
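In other words, as I understand it each pod endpoint behind the service becomes its own scrape target, so every pod is scraped every interval rather than round-robined. Roughly:
spec:
  endpoints:
    - port: metrics
      interval: 15s            # every discovered pod endpoint is scraped each 15s
      path: /metrics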
@harryttd answering all of your remaining comments at once: I have enabled the metrics port by default on tezos-k8s, by choice. I have also ensured that the octez default config in tezos-k8s exposes its prometheus port. It's better this way than making it dependent on [...]. So then, we would have [...]. A future [...]
Add a ServiceMonitor to target the new metrics endpoint of v14.
Also remove the old "metrics" container, since it is deprecated and replaced by the native endpoint.
We already have a service monitor in the pyrometer chart; I'm adding a similar one here for convenience. This saves the user from writing their own monitor, and saves a step in the upcoming guide.
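For chart users, the intent is that the knobs look something like this in values.yaml; the key names here are a sketch of the idea, not necessarily the final schema:
serviceMonitor:
  enabled: true
  # extra labels so the ServiceMonitor is picked up by your Prometheus instance
  labels:
    release: prometheus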