Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add separate history for expired failed probe results #517

Merged
merged 6 commits into from
Sep 10, 2019

Conversation

jutley
Copy link
Contributor

@jutley jutley commented Sep 4, 2019

This PR adds a new section to the history to hold onto failed probes after they expire from the main history. This allows us to check what debug logs for failed probes in environments with a high rate of successful probes.

I implemented this to be strictly additive for the sake of backwards compatibility. The Web UI has a separate section for these preserved failures.

Resolves #350

Signed-off-by: Jake Utley <[email protected]>
Signed-off-by: Jake Utley <[email protected]>
@jutley jutley force-pushed the add-preserved-history branch from 2cfe522 to eae7c47 Compare September 4, 2019 20:54
Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backwards compatibility isn't a concern here, as this is all UI.

Per my comments in #383, I think one combined table would be best.

main.go Outdated
timeoutOffset = kingpin.Flag("timeout-offset", "Offset to subtract from timeout in seconds.").Default("0.5").Float64()
configCheck = kingpin.Flag("config.check", "If true validate the config file and then exit.").Default().Bool()
historyLimit = kingpin.Flag("history.limit", "The maximum amount of items to keep in the history.").Default("100").Uint()
historyPreservedFailedLimit = kingpin.Flag("history.preserved-failed-limit", "The maximum amount of failed items to preserve after expiration.").Default("5").Uint()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need an extra setting here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the extra flag it's less obvious to me how this should be implemented. Should the history.limit flag be the cardinality of standard history and the cardinality of preserved failed history? Should that flag be the cardinality of the combined history? Should the preserved failed history be a constant size separate from the history.limit flag?

I'm open to anything, though I'll admit that without the extra flag, the semantics feel confusing to me: "I set the history.limit flag to 10, why do I have 20 items?"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have it be the limit of each.

@jutley jutley force-pushed the add-preserved-history branch from 6f24b6a to 2dc250f Compare September 5, 2019 21:38
@jutley jutley force-pushed the add-preserved-history branch from 1d25538 to c164d6b Compare September 5, 2019 21:43
mu sync.Mutex
nextId int64
results []*result
preservedFailedResults []*result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failedResults is enough

@@ -59,14 +68,19 @@ func (rh *resultHistory) List() []*result {
rh.mu.Lock()
defer rh.mu.Unlock()

return rh.results[:]
return append(rh.preservedFailedResults[:], rh.results...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to de-dupe here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not true, actually. Maybe there is confusion about how I implemented this. I am adding results to the preservedFailedResults slice only after they expire from the main results slice. As a result, no result will be in both at the same time.

If we put failed results into this slice immediately, then I think it would make more sense to have one list for successes and one list for failures, then merge the two on List.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you're doing now. A brief comment would help.

@jutley jutley force-pushed the add-preserved-history branch from 9e5093d to 6459e28 Compare September 9, 2019 23:32
Signed-off-by: Jake Utley <[email protected]>
@brian-brazil brian-brazil merged commit d0c9b46 into prometheus:master Sep 10, 2019
@brian-brazil
Copy link
Contributor

Thanks!

@tibuski
Copy link

tibuski commented Jan 22, 2021

Hello,

Sorry, I feel a bit stupid but I can't find where I can access this separate history ...

@jutley
Copy link
Contributor Author

jutley commented Jan 23, 2021

@lbrichet It is below the main history section in the blackbox-exporter web ui.

@tibuski
Copy link

tibuski commented Jan 23, 2021

Doh ... thank you !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

be able to persist failed probe history longer
3 participants