
Service monitor #501

Merged · 6 commits merged into master from serviceMonitor on Nov 2, 2022

Conversation

@nicolasochem (Contributor) commented Oct 28, 2022

Add a service monitor to target the new metrics endpoint of v14.

Also remove the old "metrics" container since it is deprecated and replaced by the native endpoint.

Also add a ServiceMonitor for convenience. We already have a service monitor in the pyrometer chart; I'm adding a similar one here. This saves the user from writing their own monitor and removes a step from the upcoming guide.
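
For orientation, the object being added is of the ServiceMonitor kind from the prometheus-operator. A minimal sketch is below; the names, labels, and namespace are placeholders rather than the chart's actual rendered output, and the essential part is that the endpoint refers to a named "metrics" port on the service.

    # Sketch of a ServiceMonitor (placeholder names/labels, not the chart's output).
    # The prometheus-operator selects Services by label and scrapes the Service
    # port named "metrics" at /metrics.
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: tezos-node          # placeholder
      namespace: tezos          # placeholder
    spec:
      selector:
        matchLabels:
          app: tezos-node       # placeholder; must match the Service's labels
      endpoints:
        - port: metrics         # the Service port *name*
          path: /metrics
          interval: 15s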

@nicolasochem marked this pull request as ready for review on October 31, 2022 18:24
@orcutt989 (Contributor) left a comment:

Looks great! Declutter FTW.

Comment on lines +161 to +162
- containerPort: 9932
  name: metrics

Contributor

{{- if .Values.serviceMonitor.enabled }} ?
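
For clarity, the suggestion amounts to guarding the port declaration behind the values flag, roughly as sketched here (the surrounding template context is assumed, not the chart's actual file):

    # Only declare the metrics container port when the ServiceMonitor is enabled.
    {{- if .Values.serviceMonitor.enabled }}
    - containerPort: 9932
      name: metrics
    {{- end }}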

Comment on lines +25 to +26
- port: 9732
  name: p2p

Contributor

I'm curious, why wasn't this needed before?

Contributor Author

The metrics endpoint needs to be here so the ServiceMonitor can pick up the metrics. While debugging this, I noticed there was no endpoint associated with the p2p port either, so I added it. It never mattered, because:

  • internally we target pods directly
  • when we want external p2p access we deploy a different service anyway, of type LoadBalancer

I guess this service was pretty useless?

Contributor

The service is a headless service. K8s then allows you to target each pod individually instead of creating a clusterIP that would load balance across the pods.

I wonder: now that some ports are defined but the RPC port isn't, would that break? I'm not sure I get why we need to set ports here at all, since we can target the pod directly and use the port number anyway.
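
For reference, "headless" just means a Service with clusterIP: None; a minimal sketch of the shape being discussed, with placeholder names:

    # Sketch of a headless Service (placeholder names). With clusterIP: None no
    # virtual IP is allocated and DNS returns the individual pod IPs, so each
    # pod stays addressable as <pod-name>.<service-name>.
    apiVersion: v1
    kind: Service
    metadata:
      name: archive-node
    spec:
      clusterIP: None
      selector:
        app: archive-node       # placeholder pod selector
      ports:
        - port: 9732
          name: p2p
        - port: 9932
          name: metrics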

Contributor Author

I just tried mkchain with 2 bakers and 1 node, and nodes are peering with each other and everything is working. So, it didn't break. I'm not sure why p2p is here though. Maybe safe to remove.

RPC is served from a different service.

It makes sense to have a p2p service targeting several pods, because it's fine to load balance the p2p connections. But, I did forget that the purpose of this (headless) service was to be able to target pods individually.

Knowing that, maybe it would have been better to leave it alone, and set the metrics port on the rpc service.

I'm pretty sure I did this for a reason. Probably the service monitor would not work if the port was not named on the service.
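
That matches how the prometheus-operator resolves scrape targets: a ServiceMonitor endpoint normally points at a Service port by name via its port field (there is also a targetPort variant, but the named-port form is the usual one), so an unnamed port gives it nothing to refer to. A sketch, with surrounding fields omitted:

    # ServiceMonitor endpoint referring to the Service port *name*:
    endpoints:
      - port: metrics       # <- must match `name: metrics` under the Service's ports
        path: /metrics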

Contributor

An example where we target a specific pod's RPC endpoint is indexers:

# rpc_url: http://archive-node-0.archive-node:8732

So we need to know that we can still target any port we want. If we can, I don't know why we need to specify any ports now (9732 and the metrics port), as long as the STS pod containers are exposing the port.

It appears to me the tezos-service-monitor is load balancing across all nodes. Does this allow us to monitor each node individually instead of conflating each node's metric data together?

Contributor Author

[Screenshot from 2022-10-28 17-38-58]

It's labeling per pod.

Contributor

So every 15s it sends a request to /metrics and each pod will eventually be hit. What if there are many, many pods? They won't have their metrics updated for a while.

Contributor

> So we need to know that we can still target any port we want. If we can, I don't know why we need to specify any ports now (9732 and the metrics port), as long as the STS pod containers are exposing the port.

Once we deploy this release, we can probably test the RPC of a specific node just by sshing into another one. That is probably sufficient. And then we can also remove the ports from the headless svc later, just so it isn't confusing why we are exposing some ports on the svc and not others. Not a big deal.

Contributor Author

I don't think that's what is going on. I am aware of PodMonitor, but I've never used it; ServiceMonitor is what I have been using.

See these heavily thumbed-up comments: prometheus-operator/prometheus-operator#3119 (comment)
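
For contrast, the PodMonitor alternative mentioned here selects pods directly by label and scrapes a named container port, skipping the Service entirely. A rough sketch with placeholder names:

    # Rough PodMonitor sketch (placeholder names/labels); it scrapes the
    # container port named "metrics" on the matching pods directly.
    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      name: tezos-node          # placeholder
    spec:
      selector:
        matchLabels:
          app: tezos-node       # placeholder pod label
      podMetricsEndpoints:
        - port: metrics         # named container port
          path: /metrics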

Resolved review threads: charts/tezos/templates/static.yaml, charts/tezos/values.yaml, charts/tezos/templates/_containers.tpl
@nicolasochem (Contributor Author) commented Nov 1, 2022

@harryttd answering all of your remaining comments at once:

I have enabled the metrics port by default on tezos-k8s, by choice.

I have also ensured that the default Octez config in tezos-k8s exposes its Prometheus port.

It's better this way than making it dependent on ServiceMonitor being enabled in the values: you might want to scrape these ports with something other than the prometheus operator (for example, a manually provisioned Prometheus); then you would need no service monitor, but you would still need the services to be aware of these ports.

So then we would have metricsExposed: true separate from serviceMonitor: enabled: true, but I feel that it's simpler to have it enabled all the time. It's not a security concern (it stays within the cluster, and you can't kill the node with such metrics requests, in theory).

A future NetworkPolicy in our chart will tighten security appropriately, I think.
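
(To make that concrete: such a policy could, for instance, restrict which pods may reach the nodes' metrics port. The sketch below is purely speculative and not part of this PR; all labels are placeholders.)

    # Speculative sketch only (not in this PR): allow ingress to the metrics
    # port solely from pods labeled as the Prometheus scraper.
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-metrics-scrape          # placeholder
    spec:
      podSelector:
        matchLabels:
          app: tezos-node                 # placeholder node pod label
      policyTypes: ["Ingress"]
      ingress:
        - from:
            - podSelector:
                matchLabels:
                  app.kubernetes.io/name: prometheus   # placeholder
          ports:
            - protocol: TCP
              port: 9932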

@nicolasochem merged commit 6bde524 into master on Nov 2, 2022
@harryttd deleted the serviceMonitor branch on March 8, 2023