Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Draft weighted and timing histograms
The following investigation occurred during development. Add TimingHistogram impl that shares lock with WeightedHistogram Benchmarking and profiling shows that two layers of locking is noticeably more expensive than one. After adding this new alternative, I now get the following benchmark results. ``` (base) mspreitz@mjs12 kubernetes % go test -benchmem -run=^$ -bench ^BenchmarkTimingHistogram$ k8s.io/component-base/metrics/prometheusextension goos: darwin goarch: amd64 pkg: k8s.io/component-base/metrics/prometheusextension cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz BenchmarkTimingHistogram-16 22232037 52.79 ns/op 0 B/op 0 allocs/op PASS ok k8s.io/component-base/metrics/prometheusextension 1.404s (base) mspreitz@mjs12 kubernetes % go test -benchmem -run=^$ -bench ^BenchmarkTimingHistogram$ k8s.io/component-base/metrics/prometheusextension goos: darwin goarch: amd64 pkg: k8s.io/component-base/metrics/prometheusextension cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz BenchmarkTimingHistogram-16 22190997 54.50 ns/op 0 B/op 0 allocs/op PASS ok k8s.io/component-base/metrics/prometheusextension 1.435s ``` and ``` (base) mspreitz@mjs12 kubernetes % go test -benchmem -run=^$ -bench ^BenchmarkTimingHistogramDirect$ k8s.io/component-base/metrics/prometheusextension goos: darwin goarch: amd64 pkg: k8s.io/component-base/metrics/prometheusextension cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz BenchmarkTimingHistogramDirect-16 28863244 40.99 ns/op 0 B/op 0 allocs/op PASS ok k8s.io/component-base/metrics/prometheusextension 1.890s (base) mspreitz@mjs12 kubernetes % (base) mspreitz@mjs12 kubernetes % (base) mspreitz@mjs12 kubernetes % go test -benchmem -run=^$ -bench ^BenchmarkTimingHistogramDirect$ k8s.io/component-base/metrics/prometheusextension goos: darwin goarch: amd64 pkg: k8s.io/component-base/metrics/prometheusextension cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz BenchmarkTimingHistogramDirect-16 27994173 40.37 ns/op 0 B/op 0 allocs/op PASS ok k8s.io/component-base/metrics/prometheusextension 1.384s ``` So the new implementation is roughly 20% faster than the original. Add overlooked exception, rename timingHistogram to timingHistogramLayered Use the direct (one mutex) style of TimingHistogram impl This is about a 20% gain in CPU speed on my development machine, in benchmarks without lock contention. Following are two consecutive trials. (base) mspreitz@mjs12 prometheusextension % go test -benchmem -run=^$ -bench Histogram . goos: darwin goarch: amd64 pkg: k8s.io/component-base/metrics/prometheusextension cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz BenchmarkTimingHistogramLayered-16 21650905 51.91 ns/op 0 B/op 0 allocs/op BenchmarkTimingHistogramDirect-16 29876860 39.33 ns/op 0 B/op 0 allocs/op BenchmarkWeightedHistogram-16 49227044 24.13 ns/op 0 B/op 0 allocs/op BenchmarkHistogram-16 41063907 28.82 ns/op 0 B/op 0 allocs/op PASS ok k8s.io/component-base/metrics/prometheusextension 5.432s (base) mspreitz@mjs12 prometheusextension % go test -benchmem -run=^$ -bench Histogram . goos: darwin goarch: amd64 pkg: k8s.io/component-base/metrics/prometheusextension cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz BenchmarkTimingHistogramLayered-16 22483816 51.72 ns/op 0 B/op 0 allocs/op BenchmarkTimingHistogramDirect-16 29697291 39.39 ns/op 0 B/op 0 allocs/op BenchmarkWeightedHistogram-16 48919845 24.03 ns/op 0 B/op 0 allocs/op BenchmarkHistogram-16 41153044 29.26 ns/op 0 B/op 0 allocs/op PASS ok k8s.io/component-base/metrics/prometheusextension 5.044s Remove layered implementation of TimingHistogram
- Loading branch information