Add ExponentialHistogramIndexerBenchmark #4989

jack-berg · 2022-11-25T14:31:41Z

I was analyzing the HistogramBenchmark and struggling to make sense of the results. Somewhat randomly, certain exponential histogram trials are 5-10x slower than not only explicit bucket histogram trials, but also other exponential histogram trials. For example, here's a group of exponential results:

HistogramBenchmark.aggregate_10Threads                               EXPONENTIAL_SMALL_CIRCULAR_BUFFER   FIXED_BUCKET_BOUNDARIES  avgt   10   29181.673 ±   667.898   ns/op
HistogramBenchmark.aggregate_10Threads                               EXPONENTIAL_SMALL_CIRCULAR_BUFFER  UNIFORM_RANDOM_WITHIN_2K  avgt   10   31704.614 ±   556.680   ns/op
HistogramBenchmark.aggregate_10Threads                               EXPONENTIAL_SMALL_CIRCULAR_BUFFER          GAUSSIAN_LATENCY  avgt   10  195109.475 ±  2487.755   ns/op
HistogramBenchmark.aggregate_10Threads                                     EXPONENTIAL_CIRCULAR_BUFFER   FIXED_BUCKET_BOUNDARIES  avgt   10   29654.776 ±   289.403   ns/op
HistogramBenchmark.aggregate_10Threads                                     EXPONENTIAL_CIRCULAR_BUFFER  UNIFORM_RANDOM_WITHIN_2K  avgt   10  196956.611 ±  2838.807   ns/op
HistogramBenchmark.aggregate_10Threads                                     EXPONENTIAL_CIRCULAR_BUFFER          GAUSSIAN_LATENCY  avgt   10  193478.876 ±  2165.116   ns/op

The GAUSSIAN_LATENCY takes ~6x longer than UNIFORM_RANDOM_WITHIN_2K and FIXED_BUCKET_BOUNDARIES for the EXPONENTIAL_SMALL_CIRCULAR_BUFFER aggregation, but both GAUSSIAN_LATENCY and UNIFORM_RANDOM_WITHIN_2K are ~6x slower for the EXPONENTIAL_CIRCULAR_BUFFER aggregation. What's going on here? Why does the distribution sometimes have such a big impact and other times not?

Well I got to the bottom of it: The code path is different for positive scales than for zero and negative scales. Positive scales compute the bucket using Math.ceil(Math.log(value) * scaleFactor) - 1, while zero and negative scales use bit manipulation. It turns out that Math.log is really slow compared to bit manipulation. Going back to the benchmark, subtle differences in the distribution of values cause the scale to stabilize at different values, sometimes positive, sometimes zero or negative. The scale isn't consistent across runs of the benchmark, but certain distributions (i.e. gaussian) seem to be more prone to resulting in positive scales than others.

What do we do about this?

We need to update the benchmark to be more consistent. I've done this by making the starting scale for the exponential histograms zero. scale can be adjusted lower if needed to accommodate the range of measurements but not higher. Starting it at zero means that the performance hit from Math.log bucketing isn't a factor in these benchmarks.
Make starting scale configurable. Performance sensitive applications may want to avoid the Math.log codepath altogether. Positive scales correspond to small buckets. If you know your app doesn't need small buckets, you should be able to set your starting scale to zero, avoiding Math.log and also reducing the number of rescales.
Add benchmark for bucket indexing. I've added a new ExponentialHistogramIndexerBenchmark benchmark in this PR which specifically illustrates the performance difference between positive, negative, and zero scales. If we think its possible to make performance improvements to bucket indexing for positive scales, this will provide a more focussed place to analyze the results. The baseline results show very clearly the ~9x performance reduction when the scale is positive.

Benchmark                                                              (scale)  Mode  Cnt       Score       Error   Units
ExponentialHistogramIndexerBenchmark.computeIndex                            1  avgt    5  140595.875 ± 21489.356   ns/op
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.alloc.rate             1  avgt    5      ≈ 10⁻⁴              MB/sec
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.alloc.rate.norm        1  avgt    5       0.057 ±     0.008    B/op
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.count                  1  avgt    5         ≈ 0              counts
ExponentialHistogramIndexerBenchmark.computeIndex                            0  avgt    5   15333.799 ±   225.434   ns/op
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.alloc.rate             0  avgt    5      ≈ 10⁻⁴              MB/sec
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.alloc.rate.norm        0  avgt    5       0.006 ±     0.001    B/op
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.count                  0  avgt    5         ≈ 0              counts
ExponentialHistogramIndexerBenchmark.computeIndex                           -1  avgt    5   15263.419 ±   676.713   ns/op
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.alloc.rate            -1  avgt    5      ≈ 10⁻⁴              MB/sec
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.alloc.rate.norm       -1  avgt    5       0.006 ±     0.001    B/op
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.count                 -1  avgt    5         ≈ 0              counts

cc @jsuereth

jack-berg · 2022-11-26T14:39:24Z

Forgot to mention but here's the results for HistogramBenchmark after making the starting scale 0. Notice that the results no longer vary wildly for different value distributions, and most of the performance difference between explicit and exponential histograms has gone away.

codecov · 2022-11-26T14:52:08Z

Codecov Report

Base: 91.26% // Head: 91.26% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (9d30887) compared to base (568bdb4).
Patch coverage: 87.50% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff            @@
##               main    #4989   +/-   ##
=========================================
  Coverage     91.26%   91.26%           
  Complexity     4886     4886           
=========================================
  Files           552      552           
  Lines         14431    14433    +2     
  Branches       1373     1373           
=========================================
+ Hits          13170    13172    +2     
  Misses          874      874           
  Partials        387      387

Impacted Files	Coverage Δ
.../autoconfigure/LogRecordExporterConfiguration.java	`98.50% <ø> (ø)`
.../sdk/autoconfigure/MeterProviderConfiguration.java	`77.27% <ø> (ø)`
...y/sdk/autoconfigure/SpanExporterConfiguration.java	`100.00% <ø> (ø)`
...internal/aggregator/ExponentialBucketStrategy.java	`87.50% <80.00%> (-12.50%)`	⬇️
...onfigure/spi/internal/DefaultConfigProperties.java	`94.11% <100.00%> (ø)`
...nfigure/AutoConfiguredOpenTelemetrySdkBuilder.java	`97.31% <100.00%> (ø)`
.../opentelemetry/exporter/prometheus/Serializer.java	`86.01% <0.00%> (-0.43%)`	⬇️
...metry/sdk/metrics/export/PeriodicMetricReader.java	`90.00% <0.00%> (+2.85%)`	⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

…y-java into expo-histogram-fix-benchmark

Add ExponentialHistogramIndexerBenchmark

187d43b

jack-berg requested a review from a team November 25, 2022 14:31

jkwatson approved these changes Nov 25, 2022

View reviewed changes

jack-berg mentioned this pull request Nov 28, 2022

Define configurable max scale for exponential histogram aggregation open-telemetry/opentelemetry-specification#2987

Closed

Merge branch 'main' of https://github.com/open-telemetry/opentelemetr…

9d30887

…y-java into expo-histogram-fix-benchmark

jack-berg merged commit e732328 into open-telemetry:main Dec 1, 2022

jack-berg mentioned this pull request Dec 22, 2022

Optimisation and benchmarking of backing data structure for exponential histogram #3848

Closed

dmarkwat pushed a commit to dmarkwat/opentelemetry-java that referenced this pull request Dec 30, 2022

Add ExponentialHistogramIndexerBenchmark (open-telemetry#4989)

4430a32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ExponentialHistogramIndexerBenchmark #4989

Add ExponentialHistogramIndexerBenchmark #4989

jack-berg commented Nov 25, 2022

jack-berg commented Nov 26, 2022

codecov bot commented Nov 26, 2022 •

edited

Loading

Add ExponentialHistogramIndexerBenchmark #4989

Add ExponentialHistogramIndexerBenchmark #4989

Conversation

jack-berg commented Nov 25, 2022

jack-berg commented Nov 26, 2022

codecov bot commented Nov 26, 2022 • edited Loading

Codecov Report

codecov bot commented Nov 26, 2022 •

edited

Loading