Separate scraping of cluster and node metrics #156

Closed
znerol opened this issue Aug 2, 2023 · 5 comments

znerol commented Aug 2, 2023

Problem: Scraping the nodes API path is inefficient

The Prometheus PVE exporter traditionally scraped only cluster metrics (i.e., metrics available from the cluster API path). Those can be scraped efficiently from any node in the cluster.

However, some users also wish to scrape node metrics (i.e., metrics available from the nodes/{node} API path). Unfortunately, looping through nodes turned out to be inherently inefficient, because the API server connects to the target node for each resource requested through the API.

For example, given a data center with 100 VMs and containers, scraping the pve_onboot_status flag results in 100 HTTP requests initiated from the API host during a single scrape.

Also note that if any cluster node goes down during that scrape, the metrics will be incomplete.

Because of this problem, new PRs that attempt to loop through the nodes path are not accepted.

Approach

Introduce two new query parameters in order to specify which metrics are scraped.

  • cluster
  • node

The cluster query parameter governs whether cluster metrics are reported in the scrape. It defaults to 1 (cluster metrics are scraped by default). The query string cluster=0 can be used in order to disable reporting of cluster metrics.

The node query parameter specifies whether node metrics are reported from the given node. It defaults to 1 (node metrics are scraped by default). The query string node=0 can be used in order to disable reporting of node metrics.
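
The defaults described above can be sketched as follows. This is only an illustration, not the exporter's actual code: the helper name parse_scrape_flags and the example URLs are hypothetical, and treating every value other than "0" as enabled is an assumption.

```python
from urllib.parse import parse_qs, urlsplit


def parse_scrape_flags(url):
    """Return (cluster_enabled, node_enabled) for a scrape URL.

    Hypothetical helper. Both flags default to "1" (enabled) when the
    parameter is missing. Assumption: only the literal value "0"
    disables a group of metrics.
    """
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each key to a list of values; take the first one.
    cluster = query.get("cluster", ["1"])[0]
    node = query.get("node", ["1"])[0]
    return cluster != "0", node != "0"
```

With this sketch, a URL without either parameter enables both groups, while a query string of cluster=0 leaves only node metrics enabled.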

This approach would permit users to configure separate scrapes for node metrics (one for each node). That way, Prometheus would parallelize the collection of node metrics automatically. In addition, one node being down would not affect the metrics of the other nodes.
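
A rough sketch of the resulting scrape layout: one scrape for cluster metrics only, plus one scrape per node for node metrics only. Only the cluster and node parameters come from this proposal. The /pve path, the target parameter, the exporter address and the node names are assumptions made for illustration, and build_scrape_urls is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_scrape_urls(exporter, nodes):
    """Return one cluster-only URL followed by one node-only URL per node.

    Hypothetical helper. The "/pve" path and the "target" parameter are
    assumptions about the exporter's HTTP interface.
    """
    base = f"{exporter}/pve"
    urls = []
    # Cluster metrics can be scraped from any node, so ask the first one only.
    cluster_query = urlencode({"target": nodes[0], "cluster": 1, "node": 0})
    urls.append(f"{base}?{cluster_query}")
    # Node metrics: one independent scrape per node.
    for name in nodes:
        node_query = urlencode({"target": name, "cluster": 0, "node": 1})
        urls.append(f"{base}?{node_query}")
    return urls


if __name__ == "__main__":
    for url in build_scrape_urls("http://exporter.example:9221", ["pve1", "pve2"]):
        print(url)
```

If each URL is configured as its own scrape target, the scrapes run independently, so a node that is down only fails its own scrape.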

Impact on existing deployments

The change will have the following impact on existing deployments:

  • Single node systems: No effect. Scrapes report exactly the same set of metrics as before, since both the cluster and node URL parameters are enabled by default.
  • Clusters operating with --no-collector.config: No effect. Although both the cluster and node URL parameters are enabled by default, scrapes still report exactly the same set of metrics as before because the config collector is disabled.
  • Clusters operating without --no-collector.config (or with --collector.config): The effect depends on the scraping strategy. Config metrics are reported only for the nodes being scraped. If all nodes in the cluster are scraped, the upgrade to 3.0.0 has no effect on the set of metrics reported.
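
For the last case, a quick way to check a deployment is to compare the cluster's nodes against the nodes that are actually scraped. The helper below is a hypothetical sketch with made-up node names. It only restates the rule above: config metrics are reported for scraped nodes only.

```python
def nodes_without_config_metrics(cluster_nodes, scraped_nodes):
    """Return the cluster nodes whose config metrics would not be reported.

    Hypothetical helper: a node is covered only if it is itself scraped.
    """
    scraped = set(scraped_nodes)
    return [name for name in cluster_nodes if name not in scraped]
```

An empty result means every node is scraped, in which case the upgrade should not change the set of metrics reported.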

znerol commented Aug 3, 2023

Draft PR: #164

znerol added this to the 3.0.0 milestone Oct 1, 2023

znerol commented Oct 1, 2023

This is a potentially breaking change, so it needs to go into a new major release.


znerol commented Oct 16, 2023

Merged initial PR #180


znerol commented Oct 16, 2023

Released 3.0.0b1 just now. People interested in testing this can pull the beta using an explicit version tag, e.g.:

% podman run -it --rm prompve/prometheus-pve-exporter:3.0.0b1 --help

Or:

% python3 -m venv pve_exporter
% ./pve_exporter/bin/pip install "prometheus-pve-exporter==3.0.0b1"

znerol unpinned this issue Nov 5, 2023
znerol closed this as completed Nov 13, 2023

znerol commented Nov 13, 2023

This is available in any release >= 3.0.0.
