Separate scraping of cluster and node metrics #156
This was referenced Aug 2, 2023
Draft PR: #164
This is a potentially breaking change and needs to go into a new major release.
Merged initial PR #180
Released 3.0.0b1 right now. People interested in testing this can pull the beta using an explicit version tag. E.g.:
Or:
This is available in any release >= 3.0.0.
Problem: Scraping the `nodes` API path is inefficient
The prometheus pve exporter traditionally scraped `cluster` metrics (i.e., metrics available from the cluster API path). Those can be scraped efficiently from any node in the cluster.
However, some users also wish to scrape `node` metrics (i.e., metrics available from the nodes/{node} API path). Regrettably, looping through `nodes` turned out to be inherently inefficient, because the API server connects to the target node for each resource requested through the API.
For example, given a data center with 100 VMs and containers, scraping the `pve_onboot_status` flag results in 100 HTTP requests initiated from the API host during one scrape.
Also note that if any cluster node goes down during that scrape, the metrics will be incomplete.
Due to this problem, no new PRs which attempt to loop through the `nodes` path are accepted.
Approach
Introduce two new query parameters in order to specify which metrics are scraped.
The `cluster` query parameter governs whether `cluster` metrics are reported in the scrape. It defaults to `1` (cluster metrics are scraped by default). The query string `cluster=0` can be used to disable reporting of cluster metrics.
The `node` query parameter specifies whether `node` metrics are reported from the given node. It defaults to `1` (node metrics are scraped by default). The query string `node=0` can be used to disable reporting of node metrics.
This approach permits users to configure separate scrapes for node metrics (one for each node). That way, collection of node metrics is parallelized by prometheus automatically, and the metrics of other nodes are unaffected if one node is down.
Impact on existing deployments
The change will have the following impact on existing deployments:
- The `cluster` and `node` URL parameters are both enabled by default.
- With `--no-collector.config`: No effect. Even though both the `cluster` and `node` URL parameters are enabled by default, scrapes still report exactly the same set of metrics as before, since the `config` collector is disabled.
- Without `--no-collector.config` (or with `--collector.config`): The effect depends on the scraping strategy. Config metrics are only reported for those nodes being scraped. If all nodes in a cluster are scraped, then the upgrade to `3.0.0` has no effect on the set of metrics reported.
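To check whether an existing deployment with the config collector enabled would lose metrics after the upgrade, the rule that config metrics are only reported for scraped nodes can be modelled as a set difference. This is a minimal sketch; the helper name and the node names are hypothetical and not part of the exporter.

```python
def nodes_missing_config_metrics(cluster_nodes: set[str], scraped_nodes: set[str]) -> set[str]:
    """Return the nodes whose config metrics would not be reported.

    Hypothetical helper: models the rule that config metrics are only
    reported for nodes that are themselves scraped.
    """
    return cluster_nodes - scraped_nodes


all_nodes = {"pve1", "pve2", "pve3"}  # hypothetical node names
print(sorted(nodes_missing_config_metrics(all_nodes, {"pve1"})))
print(sorted(nodes_missing_config_metrics(all_nodes, all_nodes)))
```

An empty result corresponds to the case where all nodes are scraped and the upgrade does not change the set of reported metrics.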