
Does not report any metrics when one node in the cluster is down #127

Closed
olasd opened this issue Dec 19, 2022 · 3 comments

@olasd

olasd commented Dec 19, 2022

When one node in the cluster is down, fetching metrics from it fails with HTTP status code 595 (connection timeout).

This causes the whole /pve endpoint to fail with a 500 error:

```
Dec 19 09:29:17 mucem pve_exporter[627383]: 192.168.100.29 - - [19/Dec/2022 09:29:17] "GET /pve?target=127.0.0.1 HTTP/1.1" 500 -
Dec 19 09:30:19 mucem pve_exporter[627383]: Exception thrown while rendering view
Dec 19 09:30:19 mucem pve_exporter[627383]: Traceback (most recent call last):
Dec 19 09:30:19 mucem pve_exporter[627383]:   File "/usr/lib/python3/dist-packages/pve_exporter/http.py", line 96, in view
Dec 19 09:30:19 mucem pve_exporter[627383]:     return view_registry[endpoint](**params)
Dec 19 09:30:19 mucem pve_exporter[627383]:   File "/usr/lib/python3/dist-packages/pve_exporter/http.py", line 38, in on_pve
Dec 19 09:30:19 mucem pve_exporter[627383]:     output = collect_pve(self._config[module], target, self._collectors)
Dec 19 09:30:19 mucem pve_exporter[627383]:   File "/usr/lib/python3/dist-packages/pve_exporter/collector.py", line 332, in collect_pve
Dec 19 09:30:19 mucem pve_exporter[627383]:     return generate_latest(registry)
Dec 19 09:30:19 mucem pve_exporter[627383]:   File "/usr/lib/python3/dist-packages/prometheus_client/exposition.py", line 107, in generate_latest
Dec 19 09:30:19 mucem pve_exporter[627383]:     for metric in registry.collect():
Dec 19 09:30:19 mucem pve_exporter[627383]:   File "/usr/lib/python3/dist-packages/prometheus_client/registry.py", line 83, in collect
Dec 19 09:30:19 mucem pve_exporter[627383]:     for metric in collector.collect():
Dec 19 09:30:19 mucem pve_exporter[627383]:   File "/usr/lib/python3/dist-packages/pve_exporter/collector.py", line 289, in collect
Dec 19 09:30:19 mucem pve_exporter[627383]:     for vmdata in self._pve.nodes(node['node']).qemu.get():
Dec 19 09:30:19 mucem pve_exporter[627383]:   File "/usr/lib/python3/dist-packages/proxmoxer/core.py", line 84, in get
Dec 19 09:30:19 mucem pve_exporter[627383]:     return self(args)._request("GET", params=params)
Dec 19 09:30:19 mucem pve_exporter[627383]:   File "/usr/lib/python3/dist-packages/proxmoxer/core.py", line 78, in _request
Dec 19 09:30:19 mucem pve_exporter[627383]:     raise ResourceException("{0} {1}: {2}".format(resp.status_code, httplib.responses[resp.status_code],
Dec 19 09:30:19 mucem pve_exporter[627383]: KeyError: 595
```
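The final `KeyError: 595` appears to come from the line that formats the error message, not from the HTTP request itself: 595 is not a standard HTTP status, so looking it up in a table of standard reason phrases fails. Assuming `httplib` in the trace refers to Python 3's `http.client` (an assumption; the trace only shows the name `httplib`), the failing pattern and a hypothetical lenient alternative can be sketched as:

```python
import http.client

# Python's table of standard HTTP reason phrases, keyed by status code.
REASONS = http.client.responses


def reason_strict(status: int) -> str:
    # Mirrors the pattern in the trace: a plain dict lookup,
    # which raises KeyError for non-standard codes such as 595.
    return REASONS[status]


def reason_lenient(status: int) -> str:
    # Hypothetical safer variant: fall back to a generic phrase so the
    # caller can still raise a meaningful error about the real problem.
    return REASONS.get(status, "Unknown status")


if __name__ == "__main__":
    print(reason_lenient(200))  # OK
    print(reason_lenient(595))  # Unknown status
    try:
        reason_strict(595)
    except KeyError as exc:
        print("KeyError:", exc)  # KeyError: 595
```

With the lenient lookup, the original timeout would surface as a regular exception describing status 595 instead of being masked by a `KeyError`.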

The exporter should probably report this condition and keep going, reporting metrics for the remaining nodes, instead of crashing.
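One way to get that behaviour is to catch failures per node, record them, and continue with the remaining nodes. The sketch below is a hypothetical, dependency-free illustration, not pve_exporter's actual code: `fetch_vms` stands in for the `self._pve.nodes(node['node']).qemu.get()` call from the trace, and the returned `node_up` map is an assumed way of reporting which nodes answered.

```python
from typing import Callable, Dict, List, Tuple


def collect_per_node(
    nodes: List[str],
    fetch_vms: Callable[[str], List[dict]],
) -> Tuple[Dict[str, List[dict]], Dict[str, int]]:
    """Query each node independently so one failure cannot abort the scrape.

    Returns the VM data gathered from reachable nodes and a per-node
    0/1 "up" flag (hypothetical; an exporter could expose it as a gauge).
    """
    vms_by_node: Dict[str, List[dict]] = {}
    node_up: Dict[str, int] = {}
    for node in nodes:
        try:
            vms_by_node[node] = fetch_vms(node)
            node_up[node] = 1
        except Exception:  # e.g. a timeout, or the KeyError from the trace
            # Record the failure and move on to the next node.
            node_up[node] = 0
    return vms_by_node, node_up


if __name__ == "__main__":
    def fake_fetch(node: str) -> List[dict]:
        # Simulates one unreachable node in a three-node cluster.
        if node == "pve2":
            raise TimeoutError("595 connection timed out")
        return [{"vmid": 100, "node": node}]

    data, up = collect_per_node(["pve1", "pve2", "pve3"], fake_fetch)
    print(up)  # {'pve1': 1, 'pve2': 0, 'pve3': 1}
```

Whether a failing node is also logged or exposed as a separate metric is a design choice; the key point is that the exception stays scoped to a single node.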

@znerol
Member

znerol commented Dec 19, 2022

This is a known issue (#30). Please disable the config collector using the --no-collector.config flag in order to work around this problem.
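For readers unfamiliar with the `--no-` prefix: a common way for a Python CLI to offer paired `--collector.X` / `--no-collector.X` switches is `argparse.BooleanOptionalAction` (Python 3.9+). The sketch below assumes that pattern. It is not taken from pve_exporter's source, and the `collector_config` destination name and help text are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pve_exporter")
    # BooleanOptionalAction registers both --collector.config and
    # --no-collector.config; the latter stores False.
    parser.add_argument(
        "--collector.config",
        dest="collector_config",  # explicit, attribute-friendly name (option contains a dot)
        action=argparse.BooleanOptionalAction,
        default=True,
        help="collect per-VM config data (hypothetical help text)",
    )
    return parser


if __name__ == "__main__":
    parser = build_parser()
    print(parser.parse_args([]).collector_config)  # True
    print(parser.parse_args(["--no-collector.config"]).collector_config)  # False
```

In a setup like this, changing `default=True` to `default=False` would be enough to make the collector opt-in rather than opt-out.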

znerol closed this as completed Dec 19, 2022
@olasd
Author

olasd commented Dec 19, 2022

Ah, thanks for the quick answer, and sorry for failing to find this existing issue. I can confirm using --no-collector.config indeed works around the issue.

@znerol
Member

znerol commented Dec 19, 2022

No worries. I probably should push a new release at some point with the config collector disabled by default.
