While rolling out Kubernetes scaling for Presto, we hit a few rough edges, and we couldn't find anything in the docs that would make the suggestions below moot.
These are the notes I made while we went through the experience.
Metrics should be exposed as REST endpoints
We'd rather avoid running the JMX exporter for Prometheus and keeping JMX ports open. With REST endpoints, it is straightforward to funnel everything through an ingress controller and enforce authentication, logging, and auditing.
Coordinator should offer a single endpoint that pulls in all node metrics
We'd rather send one request to the coordinator and have it pull in the individual worker metrics than build our own tooling that asks k8s for all existing workers and then queries each one.
It appears someone else in the community asked for a similar feature, here.
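To make the request concrete, here is a minimal sketch of the fan-out we would like the coordinator to do for us, and which we would otherwise have to build ourselves. Everything in it is an assumption for illustration: the function name `collect_node_metrics`, the flat name-to-value metrics shape, and the `fetch` callable, which stands in for whatever HTTP call would hit a per-node metrics endpoint. None of this is an existing Presto API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional

# Hypothetical shape: a flat mapping of metric name to numeric value.
Metrics = Dict[str, float]


def collect_node_metrics(
    node_urls: Iterable[str],
    fetch: Callable[[str], Metrics],
    max_workers: int = 8,
) -> Dict[str, Optional[Metrics]]:
    """Fetch metrics from every node in parallel, keyed by node URL.

    A node whose fetch raises is reported as None rather than failing
    the whole collection, so one unhealthy worker does not hide the rest.
    """
    urls = list(node_urls)
    if not urls:
        return {}

    def safe_fetch(url: str) -> Optional[Metrics]:
        try:
            return fetch(url)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip lines results up with urls.
        results = list(pool.map(safe_fetch, urls))
    return dict(zip(urls, results))
```

In our case the list of URLs would have to come from asking k8s for the current worker pods, which is exactly the extra moving part a coordinator-side endpoint would remove.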
Cluster-level metrics vs. node-level metrics
Sometimes we want to monitor the health of each individual node, and sometimes we want metrics that span the whole cluster, such as total memory in use across all queries or the total number of queued queries. With JMX it is very confusing, when pointing at a worker node, to tell whether it is reporting stats for what that node is executing, stats for the whole cluster, or values it has fetched from the coordinator.
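One way to remove that ambiguity would be for every metrics response to state its scope explicitly. The sketch below is only an illustration of the idea: the `scope` field, the report shapes, and the metric name `memory_in_use_bytes` are all hypothetical and not taken from Presto.

```python
from typing import Any, Dict, Iterable

# Hypothetical shape: a flat mapping of metric name to numeric value.
NodeMetrics = Dict[str, float]


def node_report(node_id: str, metrics: NodeMetrics) -> Dict[str, Any]:
    """Wrap one node's own metrics and label them as node-scoped."""
    return {"scope": "node", "node": node_id, "metrics": dict(metrics)}


def cluster_report(
    per_node: Dict[str, NodeMetrics],
    summable: Iterable[str],
) -> Dict[str, Any]:
    """Sum the named metrics across all nodes and label the result as cluster-scoped.

    Only metrics listed in `summable` are aggregated; a node that does not
    report one of them contributes 0 to that total.
    """
    totals = {name: 0.0 for name in summable}
    for metrics in per_node.values():
        for name in totals:
            totals[name] += metrics.get(name, 0.0)
    return {"scope": "cluster", "node_count": len(per_node), "metrics": totals}
```

With a label like this, a dashboard or alert rule can tell at a glance whether a number describes one worker or the whole cluster, regardless of which node answered the request.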