Normalize histograms, summaries, and exponential histograms #360

Merged: dashpole merged 6 commits into GoogleCloudPlatform:main from the normalize_histograms branch on Apr 26, 2022

@dashpole (Contributor) commented Apr 18, 2022

Background

Normalization is done for cumulative streams of metrics. For example, if we have a counter that counts the number of seconds since 1970, we wouldn't want to report that number as having happened since we started reading it, since that would produce an enormous rate (>1 billion over 10 seconds). Instead, the first time we see a metric, we record its current value. Then, each time we see the metric again, we "subtract" the original value from the new one. That is trivial for a counter, and we already did that in #323. It is harder for other data types (how do you subtract a histogram from another histogram?).
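To make the counter case concrete, here is a minimal sketch of the idea. The `point` and `normalizer` types and the series key are hypothetical, not the exporter's actual types, and dropping the first observation (rather than exporting it) is an assumption of this sketch.

```go
package main

// point is a hypothetical cumulative counter sample. It is not a type from
// the exporter or from pdata.
type point struct {
	startUnix int64   // start of the interval the value covers
	timeUnix  int64   // when the value was observed
	value     float64 // cumulative value
}

// normalizer remembers the first point seen for each series.
type normalizer struct {
	first map[string]point
}

func newNormalizer() *normalizer {
	return &normalizer{first: make(map[string]point)}
}

// normalize returns p relative to the first observation of the series.
// ok is false for the first observation itself, which only establishes the
// baseline (assumption: the baseline point is not exported).
func (n *normalizer) normalize(series string, p point) (out point, ok bool) {
	base, seen := n.first[series]
	if !seen {
		n.first[series] = p
		return point{}, false
	}
	return point{
		startUnix: base.timeUnix, // the interval starts when we first saw the series
		timeUnix:  p.timeUnix,
		value:     p.value - base.value,
	}, true
}
```

With the "seconds since 1970" counter, a first reading of 1650000000 followed ten seconds later by 1650000010 is reported as 10 over a 10 second interval, instead of 1650000010.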

Changes

This PR extends the data point cache so that it also tracks the first point seen for the other metric types. It then reuses the logic from the sum implementation for setting the interval start time. Finally, it implements "subtraction" for each of the remaining types: summaries, histograms, and exponential histograms.

For histograms, subtract the previous sum and previous count, and subtract each of the previous point's bucket counts from the corresponding bucket of the current point.
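A rough sketch of the histogram case, using a hypothetical `histogramPoint` struct rather than the pdata types. Refusing to subtract when the bucket layout differs or a count goes backwards is an assumption added for safety in this sketch.

```go
package main

// histogramPoint is a hypothetical stand-in for a cumulative histogram
// data point.
type histogramPoint struct {
	sum          float64
	count        uint64
	bucketCounts []uint64
}

// subtractHistogram returns cur minus prev, bucket by bucket. ok is false
// when the points cannot be subtracted: the number of buckets differs, or a
// count went backwards (for example after a reset).
func subtractHistogram(cur, prev histogramPoint) (histogramPoint, bool) {
	if len(cur.bucketCounts) != len(prev.bucketCounts) || cur.count < prev.count {
		return histogramPoint{}, false
	}
	out := histogramPoint{
		sum:          cur.sum - prev.sum,
		count:        cur.count - prev.count,
		bucketCounts: make([]uint64, len(cur.bucketCounts)),
	}
	for i, c := range cur.bucketCounts {
		if c < prev.bucketCounts[i] {
			return histogramPoint{}, false
		}
		out.bucketCounts[i] = c - prev.bucketCounts[i]
	}
	return out, true
}
```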

For summaries, leave the quantiles as they are (treat them like gauges passing through), and just subtract previous sum and previous count.
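As an illustration of the summary case, the sketch below uses hypothetical `summaryPoint` and `quantileValue` types. The guard against a count that went backwards is an assumption of the sketch.

```go
package main

// quantileValue and summaryPoint are hypothetical stand-ins for a summary
// data point.
type quantileValue struct {
	quantile float64
	value    float64
}

type summaryPoint struct {
	sum       float64
	count     uint64
	quantiles []quantileValue
}

// subtractSummary subtracts prev's sum and count from cur and passes cur's
// quantiles through unchanged, like gauges. ok is false if the count went
// backwards.
func subtractSummary(cur, prev summaryPoint) (summaryPoint, bool) {
	if cur.count < prev.count {
		return summaryPoint{}, false
	}
	out := summaryPoint{
		sum:       cur.sum - prev.sum,
		count:     cur.count - prev.count,
		quantiles: make([]quantileValue, len(cur.quantiles)),
	}
	copy(out.quantiles, cur.quantiles) // quantiles are not normalized
	return out, true
}
```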

For exponential histograms, subtract the previous sum and previous count. Also subtract each of the previous point's bucket counts from the corresponding bucket of the current point, taking the bucket offsets into account.
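The offset handling can be sketched roughly as follows. `expBuckets` is a hypothetical type, and the sketch assumes both points use the same scale; it refuses to subtract when the previous point has counts in buckets the current point does not cover.

```go
package main

// expBuckets is a hypothetical stand-in for one side (positive or negative)
// of an exponential histogram. counts[i] holds the count for the bucket with
// absolute index offset+i.
type expBuckets struct {
	offset int32
	counts []uint64
}

// subtractBuckets subtracts prev from cur, lining buckets up by absolute
// index. It assumes cur and prev share the same scale. ok is false when prev
// has a non-zero bucket that is outside cur's range or larger than cur's.
func subtractBuckets(cur, prev expBuckets) (expBuckets, bool) {
	out := expBuckets{offset: cur.offset, counts: make([]uint64, len(cur.counts))}
	copy(out.counts, cur.counts)
	for i, c := range prev.counts {
		if c == 0 {
			continue
		}
		// Position in cur of the bucket at prev's position i.
		j := int(prev.offset) + i - int(cur.offset)
		if j < 0 || j >= len(out.counts) || out.counts[j] < c {
			return expBuckets{}, false
		}
		out.counts[j] -= c
	}
	return out, true
}
```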

@dashpole dashpole force-pushed the normalize_histograms branch from cacfe14 to 115fe6b Compare April 18, 2022 21:17
@@ -495,7 +631,7 @@ func TestExponentialHistogramPointToTimeSeries(t *testing.T) {
// Add a second point with no value
hist.DataPoints().AppendEmpty().SetFlags(pmetric.MetricDataPointFlags(pmetric.MetricDataPointFlagNoRecordedValue))

- tsl := mapper.exponentialHistogramToTimeSeries(mr, labels{}, metric, hist, point)
+ tsl := mapper.metricToTimeSeries(mr, labels{}, metric)

Note to reviewers: this test was broken, in that it wasn't checking the second point with the "NoRecordedValue" flag at all. This change fixes it.

@dashpole dashpole requested review from damemi and aabmass April 18, 2022 21:21
@dashpole dashpole requested a review from quentinmit April 19, 2022 19:15
@quentinmit (Member) commented:

Commenting on the logic from your description, without reading the code:

  • Histograms and summaries should only be normalized as described if they represent a cumulative "aggregation temporality". (Side note: I don't see a gauge aggregation temporality - that would also of course not require any normalization.) Subtracting the original sum/count/bucket counts seems correct, but make sure you are subtracting the correct buckets - if the bucket boundaries change you need to either reset or check if they're a superset of the previous point.
  • For summaries I don't see any way to normalize quantiles (what does a quantile-over-cumulative even mean in this context?)
  • Exponential histograms seem the same as regular histograms, except you can more reliably line up corresponding buckets
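The two preconditions from the first bullet (cumulative temporality, matching bucket boundaries) could be checked along these lines. This is only a sketch with hypothetical names; it treats any change in boundaries as a reason not to subtract and does not attempt the superset case.

```go
package main

// temporality is a hypothetical stand-in for the aggregation temporality of
// a metric.
type temporality int

const (
	delta temporality = iota
	cumulative
)

// sameBounds reports whether two sets of explicit bucket boundaries are
// identical.
func sameBounds(a, b []float64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// shouldSubtract reports whether the previous point can be subtracted from
// the current one: the stream must be cumulative and the bucket boundaries
// must be unchanged. Otherwise the caller would reset its baseline.
func shouldSubtract(t temporality, curBounds, prevBounds []float64) bool {
	return t == cumulative && sameBounds(curBounds, prevBounds)
}
```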

@dashpole dashpole force-pushed the normalize_histograms branch from 0314cfa to 3a42843 Compare April 19, 2022 20:00
@dashpole dashpole mentioned this pull request Apr 20, 2022
@dashpole dashpole force-pushed the normalize_histograms branch from 3e82766 to 27e25f8 Compare April 25, 2022 13:48
@dashpole dashpole merged commit af0d907 into GoogleCloudPlatform:main Apr 26, 2022
@dashpole dashpole deleted the normalize_histograms branch April 26, 2022 22:40
damemi added a commit that referenced this pull request Apr 29, 2022
* Do not use IsZero on timestamps obtained from pdata (#336)

* do not use IsZero on timestamps obtained from pdata

* send summary counts as doubles instead of integers (#338)

* drop data points with no value set (#339)

* handle NaN sums in histograms and exponential histograms (#340)

* Fix broken links, typos in docs (#345)

* Update fixtures based on upstream changes to the prometheus receiver (#346)

* update fixtures based on upstream changes to the prometheus receiver

* feat(propagator): create a new one-way propagator (#343)

* feat(propagator): create a new one-way propagator

CloudTraceOneWayPropagator moves trace info from X-cloud-trace-context
into 'traceparent'.

* Apply suggestions from code review

Co-authored-by: David Ashpole <[email protected]>

* Update propagator to extract both, emit only traceparent

Users will no longer need to create a composite propagator, and there's
a clear precedence order between trace headers.

Co-authored-by: David Ashpole <[email protected]>

* Demonstrate that integration tests aren't catching failures (#349)

* demonstrate that integration tests aren't catching failures

* Update dependencies (#347)

update dependencies

* address remaining nits from #347 (#351)

* prepare for release (#353)

* add integration test with metrics from the prometheus receiver (#348)

* add make commands for release (#354)

* Send timeseries synchronously (#358)

* send timeseries synchronously

* update docs, and fix existing broken test

* Drop histograms without a sum (#359)

* update opentelemetry collector to v0.49.0

* drop histograms without a sum

* Add sum of squared deviation estimate (#341)

* add sum of squared deviation estimate

* Add missing otel attributes to Spans exported to Cloud Trace (#365)

* Add instrumentation library to span attributes.

* Add test to ensure resource + instrumentation-scope labels show up in spans.

* Expand testing.

* Move resource mapping into shared module, and add monitored resource labels to trace spans.

* Do the go mod tidy shuffle.

* Fix docker build to include new shared module.

* Bump back to go 1.17

* Migrate to SDK version of semantic convention constants instead of Collector's.

* Clean up documents.

* Normalize histograms, summaries, and exponential histograms (#360)

* normalize remaining cumulative data points

* use the timeout passed to the metrics exporter (#369)

* make gotidy

* Resolve conflicts from missed file

Co-authored-by: David Ashpole <[email protected]>
Co-authored-by: Marc Dougherty <[email protected]>
Co-authored-by: Josh Suereth <[email protected]>