be able to persist failed probe history longer #350

Closed
QingsongYao opened this issue Aug 23, 2018 · 8 comments · Fixed by #517

Comments

@QingsongYao

Sometimes we only care about the failed probe logs. However, because there are many successful probe logs, the failed ones get removed from the history once the history.limit value is reached.

@brian-brazil
Contributor

The history is only intended for low volume recent debugging. If you want something more medium to long term, you should enable debug logging.

@ktosiek

ktosiek commented Oct 3, 2018

I'm getting ~800kb/minute from debug logging on a pretty small instance (36 targets, 5s scrape interval).

I think a log level that only shows basic info for failed probes would be desirable. I want to know which step failed (name resolution, connection, response, content) and maybe which IP was resolved, but I don't need much more. Would you consider a PR that changes results logging in such a way?

@brian-brazil
Contributor

Info-level logging is only for blackbox-exporter level issues, which a probe failure is not, while debug logging is for everything, so I'm not seeing the scope for that.

@QingsongYao
Author

I still feel that failed probe records need to persist longer. This is not about logging, but about the recent probes HTML page, which is super helpful for debugging what went wrong.

@brian-brazil
Contributor

You can always change history.limit

@QingsongYao
Author

When probing a large number of URLs, the history is quickly overwritten.
Checking logs is not an option, since the logs might not be available to end users and it is hard to search them for a given job. This is a real scenario: users want to know why a probe failed, and the history is the only option they have.

@jutley
Contributor

jutley commented Sep 4, 2019

I'd like to see this feature be added. I'm running into the exact issue that @QingsongYao is describing. One of our probes has occasional failures that I would like to debug. The last instance of this was 1.5 hours ago. We currently have about 200 probes a minute, so to be able to keep this history, we'd need to keep the last 18,000 scrapes, of which 17,995 are uninteresting.

Similar to how Kubernetes CronJobs have a different limit for successes and failures, I think the blackbox-exporter should support a similar feature.
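
To make the idea concrete, here is a minimal sketch of a history that caps successes and failures separately, in the spirit of the CronJob `successfulJobsHistoryLimit` and `failedJobsHistoryLimit` fields. Everything in it (`Result`, `History`, `NewHistory`, `Add`, the limits of 100 and 10) is hypothetical and made up for illustration. It is not the blackbox-exporter's code or the change that eventually closed this issue. The loop in `main` replays the numbers above: 200 probes a minute for 90 minutes, with 5 failures.

```go
package main

import "fmt"

// Result is a hypothetical record of a single probe run.
type Result struct {
	ID      int
	Target  string
	Success bool
}

// History keeps recent results with separate caps for successes and
// failures, so a flood of successes cannot evict a rare failure.
type History struct {
	maxSuccesses int
	maxFailures  int
	successes    []Result
	failures     []Result
}

// NewHistory returns an empty history. Both limits are assumed to be >= 0.
func NewHistory(maxSuccesses, maxFailures int) *History {
	return &History{maxSuccesses: maxSuccesses, maxFailures: maxFailures}
}

// appendCapped appends r and drops the oldest entries beyond limit.
func appendCapped(rs []Result, r Result, limit int) []Result {
	rs = append(rs, r)
	if len(rs) > limit {
		rs = rs[len(rs)-limit:]
	}
	return rs
}

// Add files the result under the list that matches its outcome.
func (h *History) Add(r Result) {
	if r.Success {
		h.successes = appendCapped(h.successes, r, h.maxSuccesses)
		return
	}
	h.failures = appendCapped(h.failures, r, h.maxFailures)
}

// Successes returns a copy of the retained successful results, oldest first.
func (h *History) Successes() []Result {
	return append([]Result(nil), h.successes...)
}

// Failures returns a copy of the retained failed results, oldest first.
func (h *History) Failures() []Result {
	return append([]Result(nil), h.failures...)
}

func main() {
	h := NewHistory(100, 10)

	// 200 probes a minute for 90 minutes is 18,000 results.
	// Every 3,600th probe fails here, which gives 5 failures.
	total := 200 * 90
	for i := 0; i < total; i++ {
		h.Add(Result{ID: i, Target: "example", Success: i%3600 != 0})
	}

	// All 5 failures survive even though only 100 successes are kept.
	fmt.Println(len(h.Successes()), len(h.Failures())) // prints: 100 5
}
```

With a single shared limit, keeping those 5 failures would mean keeping all 18,000 results. With separate limits, the memory cost stays bounded by the sum of the two caps.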

@brian-brazil
Contributor

Feel free to send a PR
