explicitly add +inf bucket in withExemplarsMetric #1094

Merged: 4 commits, Aug 2, 2022

@arun-shopify arun-shopify commented Jul 21, 2022

Problem:

Currently, if there are exemplar values outside the maximum bucket bound of a histogram metric, client_golang panics: https://github.com/prometheus/client_golang/blob/main/prometheus/metric.go#L186-L189

When this client is used in the OpenTelemetry Collector, the panic crashes the collector, so we are unable to use client_golang in the Prometheus exporter of the OpenTelemetry Collector. There is a discussion about this issue in our PR in that repository.

Proposed solution:

This PR proposes a solution: explicitly add a +Inf bucket when there are exemplar values outside the max bucket bound, and pick one of the exemplars if more than one falls in that range.

@bwplotka @kakkoyun

@arun-shopify arun-shopify force-pushed the add_inf_bucket_exemplar branch from c3fe743 to 2d3b601 Compare July 21, 2022 13:41
@kakkoyun kakkoyun requested a review from bwplotka July 21, 2022 14:18
@arun-shopify
Contributor Author

@bwplotka @kakkoyun bumping this PR for a review :)

Member

@bwplotka bwplotka left a comment

Thanks for this! Some comments, but generally I love this idea - thanks for troubleshooting this 💪🏽

prometheus/metric_test.go (outdated, resolved)
prometheus/metric_test.go (outdated, resolved)
prometheus/metric_test.go (resolved)
prometheus/metric.go (outdated, resolved)
Comment on lines 194 to 196
break
// end looping after creating +inf bucket and adding one exemplar.
// there could be other exemplars that are in the "inf" range but those will be ignored.
Member

If we want those comments, let's make them full sentences, but I am also not sure this is safe. Nowhere in the interface/signature do we mention that exemplars will be sorted by anything. I think there is no harm in continuing the loop, unless we want to optimize this some day. WDYT?

Suggested change
break
// end looping after creating +inf bucket and adding one exemplar.
// there could be other exemplars that are in the "inf" range but those will be ignored.

Contributor Author

Yes, you are right that there is no mention (or expectation) of sorting exemplars. If we leave the loop running, we pick the last exemplar; if we leave it as is and terminate here, we pick the first exemplar in that range. Since the order of the exemplar values is arbitrary, either pick is arbitrary. I would prefer terminating the loop, as it avoids running through the remaining exemplars that we are not going to use anyway. But please let me know if you think otherwise or see any safety concerns, and I can make the change.

Also, I made the comment into a sentence.

Member

This was not addressed completely. What do you think about the lack of a sorted-order invariant?

> Nowhere in the interface/signature do we mention that exemplars will be sorted by anything. I think there is no harm in continuing the loop, unless we want to optimize this some day. WDYT?

Contributor Author

@arun-shopify arun-shopify Jul 28, 2022

For all the buckets, we compare the exemplar values to the bucket bounds and match each exemplar to the right bucket (this is the existing logic).

Specifically, this check gets the index of the right bucket for an exemplar:

return pb.Histogram.Bucket[i].GetUpperBound() >= e.GetValue()

And assigning the exemplar to the right bucket here:

pb.Histogram.Bucket[i].Exemplar = e
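
The bucket lookup above can be tried in isolation. This is a small sketch with plain float64 bounds; `bucketIndex` is a hypothetical helper, not part of client_golang. `sort.Search` returns the smallest index for which the predicate is true, or the slice length if there is none, and that second case is the one this PR handles.

```go
package main

import (
	"fmt"
	"sort"
)

// bucketIndex returns the index of the first upper bound that is >= value,
// or len(bounds) if value is larger than every bound. bounds must be sorted
// in ascending order.
func bucketIndex(bounds []float64, value float64) int {
	return sort.Search(len(bounds), func(i int) bool {
		return bounds[i] >= value
	})
}

func main() {
	bounds := []float64{1, 5, 10}
	for _, v := range []float64{0.5, 5, 7, 11} {
		i := bucketIndex(bounds, v)
		if i < len(bounds) {
			fmt.Printf("value %v -> bucket with upper bound %v\n", v, bounds[i])
		} else {
			fmt.Printf("value %v -> beyond all finite bounds\n", v)
		}
	}
}
```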

At a high level, the only change in this PR is that, instead of panicking in the else branch, we add the +Inf bucket, attach one exemplar that lies outside all previous bucket ranges, and break the loop.

If there are multiple exemplars for the +Inf bucket, we could pick the one that is most representative of the group, such as the median. That would be a future improvement and would require further discussion. Currently we just pick the first one in the array that falls in the +Inf bucket range.
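
One way the median idea could look, as a sketch only. Nothing like this exists in the PR; `pickMedianOverflow` and the plain float64 representation of exemplar values are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// pickMedianOverflow returns the median of the values that exceed maxBound.
// For an even count it returns the lower of the two middle values, so the
// result is always one of the observed values. ok is false if no value
// exceeds maxBound.
func pickMedianOverflow(values []float64, maxBound float64) (median float64, ok bool) {
	var overflow []float64
	for _, v := range values {
		if v > maxBound {
			overflow = append(overflow, v)
		}
	}
	if len(overflow) == 0 {
		return 0, false
	}
	sort.Float64s(overflow)
	return overflow[(len(overflow)-1)/2], true
}

func main() {
	// Input order is arbitrary; only values above 10 are candidates.
	v, ok := pickMedianOverflow([]float64{50, 3, 12, 200, 7}, 10)
	fmt.Println(v, ok)
}
```

Because the candidates are collected and sorted inside the helper, the result does not depend on the order in which exemplars are passed in.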

I hope that addresses your concern, if I understood your question correctly.

Contributor Author

@bwplotka waiting for your response :) Would be nice if we could include this in the upcoming release.

Member

Missed this discussion, sorry.

So I like the idea of using the median, or otherwise tuning which exemplar we take from those belonging to +Inf. My only problem is that those inputs may not be sorted, that's it. I hope PR #1100 makes sense to you.
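
To make the sorting concern concrete, here is a sketch that compares stopping at the first out-of-range exemplar with continuing the loop, on unsorted input. The `bucket` type and the `attach` function with its `stopAfterInf` flag are hypothetical stand-ins, not client_golang code.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type bucket struct {
	UpperBound float64
	Exemplar   *float64 // exemplar value, nil if none attached
}

// attach assigns exemplar values to buckets sorted by UpperBound. A value
// beyond every bound gets an appended +Inf bucket. If stopAfterInf is true,
// the loop ends right after that bucket is created.
func attach(bounds []float64, exemplars []float64, stopAfterInf bool) []bucket {
	buckets := make([]bucket, len(bounds))
	for i, b := range bounds {
		buckets[i].UpperBound = b
	}
	for i := range exemplars {
		e := &exemplars[i]
		j := sort.Search(len(buckets), func(k int) bool {
			return buckets[k].UpperBound >= *e
		})
		if j < len(buckets) {
			buckets[j].Exemplar = e
			continue
		}
		buckets = append(buckets, bucket{UpperBound: math.Inf(1), Exemplar: e})
		if stopAfterInf {
			break
		}
	}
	return buckets
}

func main() {
	bounds := []float64{1, 10}
	unsorted := []float64{42, 0.5} // the out-of-range value comes first

	for _, stop := range []bool{true, false} {
		out := attach(bounds, unsorted, stop)
		fmt.Printf("stopAfterInf=%v: first bucket has exemplar: %v\n", stop, out[0].Exemplar != nil)
	}
}
```

With the early break, the 0.5 exemplar is never attached because it comes after the out-of-range value; continuing the loop attaches it. In this sketch, once the +Inf bucket exists, later out-of-range values match it in the search and overwrite its exemplar instead of adding another bucket.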

Exemplar: e,
}
pb.Histogram.Bucket = append(pb.Histogram.Bucket, b)
break

Member

Thanks for addressing this, but I think one thing is still not addressed: #1094 (comment)

Member

ping @arun-shopify - otherwise this PR is rdy to merge (:

Member

will fix that for you if you don't mind - I am preparing next release

Member

merging and will fix in separate PR.

Contributor Author

@arun-shopify arun-shopify Aug 2, 2022

Sure, I also responded in #1094 (comment).
