-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Counter type metrics from statsdreceiver being dropped by prometheusexporter #7156
Changes from all commits
5b13254
54466fe
9bd73b5
b6f3693
21a84c8
ef488c6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -165,8 +165,8 @@ func (a *lastValueAccumulator) accumulateGauge(metric pdata.Metric, il pdata.Ins | |||||||||||||||||||||
func (a *lastValueAccumulator) accumulateSum(metric pdata.Metric, il pdata.InstrumentationLibrary, now time.Time) (n int) { | ||||||||||||||||||||||
doubleSum := metric.Sum() | ||||||||||||||||||||||
|
||||||||||||||||||||||
// Drop metrics with non-cumulative aggregations | ||||||||||||||||||||||
if doubleSum.AggregationTemporality() != pdata.MetricAggregationTemporalityCumulative { | ||||||||||||||||||||||
// Drop metrics with non-cumulative aggregations and is not monotonic increasing | ||||||||||||||||||||||
if doubleSum.AggregationTemporality() != pdata.MetricAggregationTemporalityCumulative && !metric.Sum().IsMonotonic() { | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this needs to be handled elsewhere. Sums with delta temporality need to be converted to gauges rather than counters. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi, thanks for taking a look at this PR. Isn't Gauge type handled by
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since I saw that method and I was looking at the base data model and map: So in Prometheus, counters are always monotonic and cumulative. But then I realized it's not a 100% mapped like that, some Sums could be converted back to Prometheus Gauge with convertSum()
So maybe we could go with this change below:
Suggested change
With:
Prometheus exporter is at the end of the aggregation (it would set the metric to MetricAggregationTemporalityCumulative in the same function anyway) , I don't think it would care about the time which the metric got reported? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That sounds better, since the monotonic part will be used to check later on with convertSum(): opentelemetry-collector-contrib/exporter/prometheusexporter/collector.go Lines 120 to 126 in e52cfd0
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. More inputs in #5714 (comment) from @jmacd , I'm reading through his suggestions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To summarize what I understand from this thread: When Prometheus receives a Delta temporality counter, there are two cases: (a) monotonic, (b) non-monotonic. For (a), the treatment can be applied inside Prometheus both easily and correctly. Since this module already keeps state on every metric, performing the delta-to-cumulative is simple. Just keep tallying the sum and it will be cumulative when Prometheus sees it. For (b), I will speculate that no one cares and "correct" is difficult to achieve. These are non-monotonic deltas, which statsd already has a different way to treat. If the user sends gauges prefixed by Continuing case (b), if the statsd user didn't want delta-to-gauge, then they're presumably sending you positive-and-negative increments. We have no information about reset time, so there are two correct ways to interpret this: (a) keep them as deltas, (b) convert them to rate gauges over some time period. Either way, Prometheus users will still have trouble working with this data, which is why I speculate that no one cares. The preceding paragraph lends support to the following claim: Statsdreceiver should by default assume counters are monotonic, because there's a mechanism in statsd for treating non-monotonic counters as gauges already. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think I understand what you're proposing with respect to monotonic, delta temporality counters and agree that we could likely handle the delta-to-cumulative conversion here since this exporter is already maintaining state. Since https://github.com/open-telemetry/opentelemetry-specification/pull/2266/files#diff-0efae13f08f98e62a81767d5daeff37ebb7ef8c50537c7b9013e72506e9b055aR1085 would specify a different treatment for non-cumulative, monotonic OTLP sum data points being converted to prometheus exposition, do we need to change that or create an exception for the collector, or is there some scoping of that (proposed specification that I'm missing that makes it irrelevant to the collector? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good question. I will comment in open-telemetry/opentelemetry-specification#2266 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Found the document https://opentelemetry.io/docs/reference/specification/metrics/datamodel/#sums-delta-to-cumulative that exactly aligned with what @jmacd said :D I believe it's 100% the right implementation. Let me see what I could do.
Thanks again for all the great inputs here folks 🪨 |
||||||||||||||||||||||
return | ||||||||||||||||||||||
} | ||||||||||||||||||||||
|
||||||||||||||||||||||
|
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why
statsdreceiver
? Looks like this change (#7156) is made on prometheus exporter only. Please update the lineThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch, sorry for this one, I'm waiting to finalize the solution from the above discussion with @jmacd @Aneurysm9 so will update this later as well.