When one node in the cluster is down, getting metrics from it fails with an HTTP status code 595 (connection timeout).
This makes the whole /pve view raise a 500 error.
Dec 19 09:29:17 mucem pve_exporter[627383]: 192.168.100.29 - - [19/Dec/2022 09:29:17] "GET /pve?target=127.0.0.1 HTTP/1.1" 500 -
Dec 19 09:30:19 mucem pve_exporter[627383]: Exception thrown while rendering view
Dec 19 09:30:19 mucem pve_exporter[627383]: Traceback (most recent call last):
Dec 19 09:30:19 mucem pve_exporter[627383]: File "/usr/lib/python3/dist-packages/pve_exporter/http.py", line 96, in view
Dec 19 09:30:19 mucem pve_exporter[627383]: return view_registry[endpoint](**params)
Dec 19 09:30:19 mucem pve_exporter[627383]: File "/usr/lib/python3/dist-packages/pve_exporter/http.py", line 38, in on_pve
Dec 19 09:30:19 mucem pve_exporter[627383]: output = collect_pve(self._config[module], target, self._collectors)
Dec 19 09:30:19 mucem pve_exporter[627383]: File "/usr/lib/python3/dist-packages/pve_exporter/collector.py", line 332, in collect_pve
Dec 19 09:30:19 mucem pve_exporter[627383]: return generate_latest(registry)
Dec 19 09:30:19 mucem pve_exporter[627383]: File "/usr/lib/python3/dist-packages/prometheus_client/exposition.py", line 107, in generate_latest
Dec 19 09:30:19 mucem pve_exporter[627383]: for metric in registry.collect():
Dec 19 09:30:19 mucem pve_exporter[627383]: File "/usr/lib/python3/dist-packages/prometheus_client/registry.py", line 83, in collect
Dec 19 09:30:19 mucem pve_exporter[627383]: for metric in collector.collect():
Dec 19 09:30:19 mucem pve_exporter[627383]: File "/usr/lib/python3/dist-packages/pve_exporter/collector.py", line 289, in collect
Dec 19 09:30:19 mucem pve_exporter[627383]: for vmdata in self._pve.nodes(node['node']).qemu.get():
Dec 19 09:30:19 mucem pve_exporter[627383]: File "/usr/lib/python3/dist-packages/proxmoxer/core.py", line 84, in get
Dec 19 09:30:19 mucem pve_exporter[627383]: return self(args)._request("GET", params=params)
Dec 19 09:30:19 mucem pve_exporter[627383]: File "/usr/lib/python3/dist-packages/proxmoxer/core.py", line 78, in _request
Dec 19 09:30:19 mucem pve_exporter[627383]: raise ResourceException("{0} {1}: {2}".format(resp.status_code, httplib.responses[resp.status_code],
Dec 19 09:30:19 mucem pve_exporter[627383]: KeyError: 595
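The last frame suggests the `KeyError` is raised while the error message itself is being built: 595 is not a standard HTTP status code, so a lookup table of standard reason phrases has no entry for it. Below is a minimal sketch of a tolerant lookup, assuming `httplib` in the traceback refers to the standard library's `http.client`. The helper names `reason_phrase` and `format_error` are hypothetical and not part of proxmoxer.

```python
from http.client import responses


def reason_phrase(status_code: int) -> str:
    """Return the standard reason phrase, or a placeholder for unknown codes."""
    # dict.get avoids the KeyError that responses[status_code] raises for 595.
    return responses.get(status_code, "Unknown Status")


def format_error(status_code: int, detail: str) -> str:
    """Build an error message in the same '{code} {phrase}: {detail}' shape."""
    return "{0} {1}: {2}".format(status_code, reason_phrase(status_code), detail)
```

With a lookup like this, the 595 response would surface as an ordinary API error rather than a `KeyError`.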
The exporter should probably report this condition and keep reporting metrics for the other nodes instead of crashing.
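One way this could look, as a rough sketch rather than the exporter's actual code: query each node separately, catch the failure, and record it so the rest of the scrape still succeeds. The function `collect_per_node` and its `fetch` callback are hypothetical names; in the exporter, `fetch` would stand in for a call such as `self._pve.nodes(node['node']).qemu.get()` from the traceback.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def collect_per_node(
    nodes: Iterable[str],
    fetch: Callable[[str], List[dict]],
) -> Tuple[Dict[str, List[dict]], Dict[str, str]]:
    """Fetch data for every node, skipping nodes whose request fails.

    Returns (results, failures): results maps node name to its data,
    failures maps node name to a description of the error.
    """
    results: Dict[str, List[dict]] = {}
    failures: Dict[str, str] = {}
    for node in nodes:
        try:
            results[node] = fetch(node)
        except Exception as exc:  # deliberately broad: one bad node must not abort the scrape
            failures[node] = repr(exc)
    return results, failures
```

The `failures` mapping could then be exposed as its own metric, so that a down node is visible in monitoring instead of turning the whole `/pve` response into a 500.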
Ah, thanks for the quick answer, and sorry for failing to find this existing issue. I can confirm using --no-collector.config indeed works around the issue.