[feature request] probe (module scoped) failure reason metric #1077
Comments
Relates to this: #1062
Signed-off-by: Marco Cadetg
Hi, I'm a bit confused as to why this is marked That being said, I'd be very happy to see a feature like this in blackbox exporter and am thankful for the work you put into it!
I guess because that is the default if you simply hit the "close" button. :) Maybe the actual meaning was "I won't have time to work on this anymore" or "I never got feedback from the maintainers, so I gave up."
Yes, sorry, this is what I did.
It was the second, and therefore I thought that it might just not be interesting to or needed by anyone. I guess if that is not the case, it's still possible to reopen it.
Thanks for reopening! I think this feature would be great to have and would significantly improve the service my team can provide. One minor change I would propose is to call the resulting metric just
@roidelapluie @mem @electron0zero Would you be interested, in principle, in accepting contributions for such a feature? (Feeling a bit bad to ping all the maintainers, but that was suggested on the prometheus-developers mailing list, so I hope it's OK.)
I think this would be a useful metric. There was a request on Slack about having a This implementation may also be useful
Minor nit, I would probably call it
I did a draft implementation of this in #1334 and would appreciate some feedback on whether you think the general approach taken there is viable.
probe (module scoped) failure reason metric
Current Situation
Currently, the blackbox_exporter provides some general metrics, such as probe_success and probe_duration_seconds, that apply universally to all modules. Additionally, specific modules like the http module / prober offer their own metrics, such as probe_http_status_code, which help monitor the availability and performance of HTTP endpoints. However, when a probe fails, it can be challenging to pinpoint the exact cause of the failure without manually inspecting blackbox_exporter logs or attempting to reproduce the error, if possible.
Proposal
To address this issue, we propose the addition of a new metric called probe_($MODULE)_failure_reason to the blackbox_exporter. This metric would provide more detailed information about the reasons behind probe failures. It would include a label named "reason" with descriptive and enumerable values such as "dns-resolution-error", "http-timeout", or "ssl-certificate-validation-failed", among others. Currently, these failures can only be inferred from the logged errors.
Benefits
The introduction of the probe_($MODULE)_failure_reason metric would significantly enhance troubleshooting capabilities. In most cases, users would be able to identify the root cause of a probe failure without the need for manual log inspection or additional testing. Moreover, this new metric would facilitate the setup of alerts and notifications tailored to specific failure scenarios.
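
To make the idea of an enumerable "reason" label more concrete, here is a minimal sketch of how a probe error could be mapped to a fixed set of reasons and rendered as a metric sample. It is written in Python purely for illustration and is not how blackbox_exporter (a Go project) is implemented. The names FailureReason, classify_http_failure, and render_metric are hypothetical, the "unknown" fallback value is an assumption that does not appear in the proposal, and exposing the sample with a value of 1 is likewise just one possible choice.

```python
import socket
import ssl
from enum import Enum


class FailureReason(Enum):
    """Hypothetical closed set of values for the "reason" label."""
    DNS_RESOLUTION_ERROR = "dns-resolution-error"
    HTTP_TIMEOUT = "http-timeout"
    SSL_CERTIFICATE_VALIDATION_FAILED = "ssl-certificate-validation-failed"
    UNKNOWN = "unknown"  # assumed fallback, not part of the proposal


def classify_http_failure(err: BaseException) -> FailureReason:
    """Map an error raised during an HTTP probe to an enumerable reason."""
    # Check the most specific error types first.
    if isinstance(err, ssl.SSLCertVerificationError):
        return FailureReason.SSL_CERTIFICATE_VALIDATION_FAILED
    if isinstance(err, socket.gaierror):
        return FailureReason.DNS_RESOLUTION_ERROR
    if isinstance(err, (TimeoutError, socket.timeout)):
        return FailureReason.HTTP_TIMEOUT
    # Anything unrecognized falls into one bucket, so the label stays
    # low-cardinality instead of carrying free-form error text.
    return FailureReason.UNKNOWN


def render_metric(module: str, reason: FailureReason) -> str:
    """Render one sample in the Prometheus text exposition style."""
    return f'probe_{module}_failure_reason{{reason="{reason.value}"}} 1'


if __name__ == "__main__":
    err = socket.gaierror(-2, "Name or service not known")
    print(render_metric("http", classify_http_failure(err)))
```

Keeping the reason values in a closed set is what would make them usable in alert rules: an alert could match on a specific reason value instead of parsing log messages.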
Contribution
We believe that incorporating the probe_($MODULE)_failure_reason metric would be a valuable enhancement for the blackbox_exporter, improving its usability and effectiveness. We would be happy to contribute to the development of this feature and provide feedback on its implementation.
Thank you for considering our proposal. If this is something that would be OK to go forward with, we'd love to contribute the functionality to blackbox_exporter.