Does not report any metrics when one node in the cluster is down #127

olasd · 2022-12-19T09:31:46Z

When one node in the cluster is down, fetching metrics from it fails with HTTP status code 595 (connection timeout).

This makes the whole /pve view raise a 500 error.

Dec 19 09:29:17 mucem pve_exporter[627383]: 192.168.100.29 - - [19/Dec/2022 09:29:17] "GET /pve?target=127.0.0.1 HTTP/1.1" 500 -
Dec 19 09:30:19 mucem pve_exporter[627383]: Exception thrown while rendering view
Dec 19 09:30:19 mucem pve_exporter[627383]: Traceback (most recent call last):
Dec 19 09:30:19 mucem pve_exporter[627383]:   File "/usr/lib/python3/dist-packages/pve_exporter/http.py", line 96, in view
Dec 19 09:30:19 mucem pve_exporter[627383]:     return view_registry[endpoint](**params)
Dec 19 09:30:19 mucem pve_exporter[627383]:   File "/usr/lib/python3/dist-packages/pve_exporter/http.py", line 38, in on_pve
Dec 19 09:30:19 mucem pve_exporter[627383]:     output = collect_pve(self._config[module], target, self._collectors)
Dec 19 09:30:19 mucem pve_exporter[627383]:   File "/usr/lib/python3/dist-packages/pve_exporter/collector.py", line 332, in collect_pve
Dec 19 09:30:19 mucem pve_exporter[627383]:     return generate_latest(registry)
Dec 19 09:30:19 mucem pve_exporter[627383]:   File "/usr/lib/python3/dist-packages/prometheus_client/exposition.py", line 107, in generate_latest
Dec 19 09:30:19 mucem pve_exporter[627383]:     for metric in registry.collect():
Dec 19 09:30:19 mucem pve_exporter[627383]:   File "/usr/lib/python3/dist-packages/prometheus_client/registry.py", line 83, in collect
Dec 19 09:30:19 mucem pve_exporter[627383]:     for metric in collector.collect():
Dec 19 09:30:19 mucem pve_exporter[627383]:   File "/usr/lib/python3/dist-packages/pve_exporter/collector.py", line 289, in collect
Dec 19 09:30:19 mucem pve_exporter[627383]:     for vmdata in self._pve.nodes(node['node']).qemu.get():
Dec 19 09:30:19 mucem pve_exporter[627383]:   File "/usr/lib/python3/dist-packages/proxmoxer/core.py", line 84, in get
Dec 19 09:30:19 mucem pve_exporter[627383]:     return self(args)._request("GET", params=params)
Dec 19 09:30:19 mucem pve_exporter[627383]:   File "/usr/lib/python3/dist-packages/proxmoxer/core.py", line 78, in _request
Dec 19 09:30:19 mucem pve_exporter[627383]:     raise ResourceException("{0} {1}: {2}".format(resp.status_code, httplib.responses[resp.status_code],
Dec 19 09:30:19 mucem pve_exporter[627383]: KeyError: 595
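
The final `KeyError: 595` appears to come from the `httplib.responses[resp.status_code]` lookup in the last frame rather than from the timeout itself. A minimal sketch of that failure mode, assuming `httplib` there is an alias for the standard library's `http.client` (the helper `describe_status` is hypothetical and not part of proxmoxer):

```python
import http.client


def describe_status(status_code: int) -> str:
    """Format a status code with its reason phrase.

    Hypothetical helper: uses dict.get so that codes missing from the
    standard table (such as 595) do not raise KeyError.
    """
    reason = http.client.responses.get(status_code, "Unknown Status")
    return f"{status_code} {reason}"


if __name__ == "__main__":
    # 595 is not a registered HTTP status code, so the stdlib table lacks it.
    print(595 in http.client.responses)
    print(describe_status(404))
    print(describe_status(595))
```

With a plain `[...]` lookup the same call raises `KeyError: 595`, which hides the more useful message that the node timed out.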

The exporter should probably report this condition and keep reporting metrics for the remaining nodes instead of crashing.
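
To illustrate the behaviour I have in mind, here is a rough sketch, not the exporter's actual code. The names `collect_per_node` and `fake_fetch` are made up for this example, and catching a broad `Exception` is an assumption; a real fix would likely catch something narrower.

```python
import logging
from typing import Callable, Dict, Iterable, List, Tuple

log = logging.getLogger("pve_sketch")


def collect_per_node(
    nodes: Iterable[str],
    fetch_vms: Callable[[str], Iterable[dict]],
) -> Tuple[Dict[str, List[dict]], Dict[str, int]]:
    """Query every node, recording failures instead of aborting.

    Returns (vm data per reachable node, reachability flag per node).
    """
    vms: Dict[str, List[dict]] = {}
    reachable: Dict[str, int] = {}
    for node in nodes:
        try:
            vms[node] = list(fetch_vms(node))
            reachable[node] = 1
        except Exception as exc:  # assumption: broad catch, for the sketch only
            log.warning("skipping node %s: %r", node, exc)
            reachable[node] = 0
    return vms, reachable


def fake_fetch(node: str) -> List[dict]:
    """Stand-in for the per-node API call; node 'pve2' simulates the outage."""
    if node == "pve2":
        raise KeyError(595)
    return [{"vmid": 100, "node": node}]


if __name__ == "__main__":
    data, up = collect_per_node(["pve1", "pve2", "pve3"], fake_fetch)
    print(sorted(data))
    print(up)
```

The reachability flags could then be exposed as their own metric, so a scrape still succeeds and the down node stays visible.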

znerol · 2022-12-19T09:43:11Z

This is a known issue (#30). Please disable the config collector using the --no-collector.config flag in order to work around this problem.

olasd · 2022-12-19T09:45:24Z

Ah, thanks for the quick answer, and sorry for failing to find this existing issue. I can confirm using --no-collector.config indeed works around the issue.

znerol · 2022-12-19T09:48:19Z

No worries. I probably should push a new release at some point with the config collector disabled by default.

znerol closed this as completed Dec 19, 2022
