Separate scraping of cluster and node metrics #156

znerol · 2023-08-02T05:54:36Z

Problem: Scraping the `nodes` API path is inefficient

The prometheus pve exporter traditionally scraped cluster metrics (i.e., metrics available from the cluster API path). Those can be scraped efficiently from any node in the cluster.

However, some users also wish to scrape node metrics (i.e., metrics available from the nodes/{node} API path). Unfortunately, looping through nodes turned out to be inherently inefficient, because the API server connects to the target node for each resource requested through the API.

For example, in a data center with 100 VMs and containers, scraping the pve_onboot_status flag results in 100 HTTP requests initiated from the API host during a single scrape.
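The arithmetic of this example can be sketched in a few lines of Python. This is only an illustrative model, not code from the exporter: the node names and the guest counts per node are invented, and the only rule taken from the description above is "one request per guest".

```python
# Hypothetical inventory: number of VMs/containers per node.
# Node names and counts are invented for illustration only.
guests_per_node = {"pve1": 40, "pve2": 35, "pve3": 25}


def requests_single_scrape(guests):
    """One scrape loops over all nodes: one request per guest,
    all of them initiated from the single API host being scraped."""
    return sum(guests.values())


def requests_per_node_scrape(guests):
    """One scrape per node: each scrape only covers that node's own guests."""
    return dict(guests)


print(requests_single_scrape(guests_per_node))
print(requests_per_node_scrape(guests_per_node))
```

In this model the single looping scrape issues all 100 requests from one host, while separate per-node scrapes each handle at most 40.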

Also note that if any cluster node goes down during that scrape, the metrics will be incomplete.

Because of this problem, new PRs that attempt to loop through the nodes path are not accepted.

Approach

Introduce two new query parameters in order to specify which metrics are scraped.

cluster
node

The cluster query parameter governs whether cluster metrics are reported in the scrape. It defaults to 1 (cluster metrics are scraped by default). The query string cluster=0 can be used in order to disable reporting of cluster metrics.

The node query parameter specifies whether node metrics are reported from the given node. It defaults to 1 (node metrics are scraped by default). The query string node=0 can be used in order to disable reporting of node metrics.
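The default handling described above could look roughly like the following Python sketch. It is not the exporter's actual implementation: the helper name `parse_scrape_flags` is hypothetical, and treating every value other than `0` as "enabled" is an assumption made for the sake of the example.

```python
from urllib.parse import parse_qs


def parse_scrape_flags(query_string):
    """Return (cluster, node) booleans for a scrape request.

    Both flags default to enabled when the parameter is absent.
    Assumption: only the literal value "0" disables a flag.
    """
    params = parse_qs(query_string)

    def flag(name):
        # parse_qs maps each name to a list of values; use the first one,
        # falling back to "1" when the parameter is missing.
        return params.get(name, ["1"])[0] != "0"

    return flag("cluster"), flag("node")


print(parse_scrape_flags(""))                # both enabled by default
print(parse_scrape_flags("cluster=0"))       # node metrics only
print(parse_scrape_flags("cluster=1&node=0"))  # cluster metrics only
```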

This approach would permit users to configure separate scrapes for node metrics (one for each node). That way, collection of node metrics would be parallelized by Prometheus automatically. In addition, a node that is down would not affect the metrics of other nodes.
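As a rough illustration of such a setup, the sketch below builds one scrape URL for cluster metrics and one per node for node metrics. Only the `cluster` and `node` parameters come from this proposal. The exporter address, the `/pve` path, the `target` parameter, and the host names are assumptions for the example and should be checked against the exporter's documentation.

```python
from urllib.parse import urlencode

# Assumed exporter address and path; adjust to the actual deployment.
EXPORTER = "http://localhost:9221/pve"


def scrape_url(target, cluster, node):
    """Build a scrape URL with the cluster/node switches set explicitly."""
    query = urlencode({"target": target, "cluster": int(cluster), "node": int(node)})
    return f"{EXPORTER}?{query}"


# Hypothetical node host names.
nodes = ["pve1.example.org", "pve2.example.org"]

# Cluster metrics only need to be scraped once, from any node.
cluster_url = scrape_url(nodes[0], cluster=True, node=False)

# Node metrics are scraped separately, once per node.
node_urls = [scrape_url(n, cluster=False, node=True) for n in nodes]

print(cluster_url)
for url in node_urls:
    print(url)
```

In a Prometheus configuration, each of these URLs would typically correspond to a scrape job or target whose query parameters are set through the job's `params` setting.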

Impact on existing deployments

The change will have the following impact on existing deployments:

Single node systems: No effect. Scrapes will report exactly the same set of metrics as before, since both the cluster and node URL parameters are enabled by default.
Clusters operating with --no-collector.config: No effect. Even though both the cluster and node URL parameters are enabled by default, scrapes will still report exactly the same set of metrics as before, since the config collector is disabled.
Clusters operating without --no-collector.config (or with --collector.config): Effect depends on the scraping strategy. Config metrics are only reported for those nodes being scraped. If all nodes are scraped in a cluster, then the upgrade to 3.0.0 has no effect on the set of metrics reported.

znerol · 2023-08-03T06:18:29Z

Draft PR: #164

znerol · 2023-10-01T11:46:30Z

This is a potentially breaking change, so it needs to go into a new major release.

znerol · 2023-10-16T12:43:17Z

Merged initial PR #180

znerol · 2023-10-16T13:35:46Z

Released 3.0.0b1 right now. People interested in testing this can pull the beta using an explicit version tag. E.g.:

% podman run -it --rm prompve/prometheus-pve-exporter:3.0.0b1 --help

Or:

% python3 -m venv pve_exporter
% ./pve_exporter/bin/pip install "prometheus-pve-exporter==3.0.0b1"

znerol · 2023-11-13T15:22:44Z

This is available in any release >= 3.0.0.

znerol pinned this issue Aug 2, 2023

This was referenced Aug 2, 2023

Collect hard disk information #155

Closed

Collect io limits for ide, sata, scsi, virtio #113

Closed

Collect disk commitment info #133

Closed

znerol added this to the 3.0.0 milestone Oct 1, 2023

znerol mentioned this issue Oct 1, 2023

Scrape /nodes endpoint from current node only #164

Closed

znerol mentioned this issue Oct 16, 2023

Scrape /nodes endpoint from current node only #180

Merged

znerol unpinned this issue Nov 5, 2023

znerol closed this as completed Nov 13, 2023

znerol mentioned this issue Nov 13, 2023

Collect from current node only information about node's VMs/CTs/etc #54

Closed

znachtman mentioned this issue Jan 10, 2024

Node=1&Cluster=0 Returns no data #224

Closed
