
Lock-free atomic observations in Histograms! #457

Merged: 2 commits into master from beorn7/histogram on Sep 12, 2018
Conversation

@beorn7 (Member) commented Sep 8, 2018

@stuartnelson3 you might enjoy reviewing this. (Also, @grobie is on vacation...)

Fixes #275

This is rather tricky and required some studying of the Go memory
model. I have added copious code comments to explain what's going on.

Benchmarks haven't changed significantly, despite the additional
atomic operations now required during Observe. Write performance is
noticeably worse, but Write is also much more involved now and takes a
mutex. (Note, however, that Write is supposed to be a relatively rare
operation and thus not in the hot path compared to Observe.) Allocs
haven't changed at all.

OLD:

BenchmarkHistogramWithLabelValues-4     10000000               151 ns/op               0 B/op          0 allocs/op
BenchmarkHistogramNoLabels-4            50000000                36.0 ns/op             0 B/op          0 allocs/op
BenchmarkHistogramObserve1-4            50000000                28.1 ns/op             0 B/op          0 allocs/op
BenchmarkHistogramObserve2-4            10000000               160 ns/op               0 B/op          0 allocs/op
BenchmarkHistogramObserve4-4             5000000               378 ns/op               0 B/op          0 allocs/op
BenchmarkHistogramObserve8-4             2000000               768 ns/op               0 B/op          0 allocs/op
BenchmarkHistogramWrite1-4               1000000              1589 ns/op             896 B/op         37 allocs/op
BenchmarkHistogramWrite2-4                500000              2973 ns/op            1792 B/op         74 allocs/op
BenchmarkHistogramWrite4-4                300000              6979 ns/op            3584 B/op        148 allocs/op
BenchmarkHistogramWrite8-4                100000             10701 ns/op            7168 B/op        296 allocs/op

NEW:

BenchmarkHistogramWithLabelValues-4     10000000               191 ns/op               0 B/op          0 allocs/op
BenchmarkHistogramNoLabels-4            30000000                50.1 ns/op             0 B/op          0 allocs/op
BenchmarkHistogramObserve1-4            30000000                40.0 ns/op             0 B/op          0 allocs/op
BenchmarkHistogramObserve2-4            20000000                91.5 ns/op             0 B/op          0 allocs/op
BenchmarkHistogramObserve4-4             5000000               317 ns/op               0 B/op          0 allocs/op
BenchmarkHistogramObserve8-4             2000000               636 ns/op               0 B/op          0 allocs/op
BenchmarkHistogramWrite1-4               1000000              2072 ns/op             896 B/op         37 allocs/op
BenchmarkHistogramWrite2-4                300000              3729 ns/op            1792 B/op         74 allocs/op
BenchmarkHistogramWrite4-4                200000              7847 ns/op            3584 B/op        148 allocs/op
BenchmarkHistogramWrite8-4                100000             16975 ns/op            7168 B/op        296 allocs/op

Signed-off-by: beorn7 <[email protected]>
@stuartnelson3 self-requested a review on September 11, 2018
@stuartnelson3 (Contributor) left a comment

Looks good! A few small comments on comments.

// This is a complicated one. For lock-free yet atomic observations, we
// need to save the total count of observations again, combined with the
// index of the currently hot counts struct, so that we can perform one
// atomic operations on both values. The least significant bit defines
@stuartnelson3 (Contributor):

operation

@beorn7 (Member, Author):

Reworded the sentence.
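
For readers following along, the packing described in the comment above can be sketched in a few lines of Go. The names here are hypothetical stand-ins, not the actual client_golang fields:

```go
package main

import "fmt"

// unpack splits the packed word: the upper 63 bits hold the total
// observation count, the least significant bit holds the index of the
// currently-hot counts struct, so a single atomic operation can update
// or read both values together.
func unpack(n uint64) (count, hotIdx uint64) {
	return n >> 1, n & 1
}

func main() {
	// 5 observations recorded while counts struct #1 is hot:
	fmt.Println(unpack(5<<1 | 1)) // 5 1
}
```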

// This is a bit arcane, which is why the following spells out this if
// clause in English:
//
// If the currently hot counts struct is #0, we atomically increment
@stuartnelson3 (Contributor):

current

@stuartnelson3 (Contributor):

Actually, is this the "currently hot" counts struct, and not the currently "hot counts" struct, I guess?

@beorn7 (Member, Author):

I consistently changed "currently hot" to "currently-hot" to make the "operator precedence" more obvious.

// count of observations. This happens under the assumption that the
// 63-bit count will never overflow. Rationale: An observation takes
// about 30ns. Let's assume it could happen in 10ns. Overflowing the
// counter will then take at least (2^63)*10ns, which is about 3000
@stuartnelson3 (Contributor):

sounds like we'll be fine :)

@beorn7 (Member, Author):

ACK
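
The quoted rationale checks out. A quick back-of-the-envelope in Go, purely illustrative and not part of the PR:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// 2^63 observations at an assumed 10ns each:
	totalSeconds := float64(uint64(1)<<63) * (10 * time.Nanosecond).Seconds()
	fmt.Printf("%.0f years\n", totalSeconds/(365.25*24*3600)) // prints: 2923 years
}
```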


// We increment h.countAndHotIdx by 2 so that the counter in the upper
// 63 bits gets incremented by 1. At the same time, we get the new value
// back, which we can use to find the currently hot counts.
@stuartnelson3 (Contributor):

clever!

@beorn7 (Member, Author):

Yeah, perhaps too clever. As they say: Don't be clever when it comes to concurrency. But I believe there is no other way if we want to keep the hot Observe path lock-free.
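
To make the scheme concrete, here is a minimal, runnable sketch of the hot/cold handshake as described in this thread. All names (histogram, counts, snapshot) are hypothetical stand-ins, not the actual client_golang code; bucket and sum updates are elided, and the Write-side mutex is only noted in a comment:

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// counts stands in for one of the two bucket/sum structs; only the
// completion counter matters for the handshake.
type counts struct {
	count uint64 // observations fully recorded in this struct
}

type histogram struct {
	// Upper 63 bits: total observation count. Least significant bit:
	// index of the currently-hot counts struct. One atomic operation
	// changes both values together.
	countAndHotIdx uint64
	counts         [2]counts
}

// observe is the lock-free hot path.
func (h *histogram) observe() {
	// Adding 2 increments the count in the upper 63 bits and returns
	// the hot index in the LSB, all in one atomic operation.
	n := atomic.AddUint64(&h.countAndHotIdx, 2)
	hot := &h.counts[n&1]
	// ... buckets and sum would be updated here, also atomically ...
	atomic.AddUint64(&hot.count, 1) // must be last: signals completion
}

// snapshot is the Write-side handshake. In real code it would run
// under a mutex so that only one writer flips the hot index at a time.
func (h *histogram) snapshot() uint64 {
	// Flip the LSB without disturbing the upper 63 bits: adding 1
	// flips 0->1, adding ^uint64(0) (i.e. -1) flips 1->0.
	var n uint64
	if atomic.LoadUint64(&h.countAndHotIdx)&1 == 0 {
		n = atomic.AddUint64(&h.countAndHotIdx, 1)
	} else {
		n = atomic.AddUint64(&h.countAndHotIdx, ^uint64(0))
	}
	count := n >> 1           // observations started before the flip
	cold := &h.counts[(n&1)^1] // the previously hot struct
	hot := &h.counts[n&1]
	// Wait until every observation that picked the now-cold struct
	// has finished writing to it.
	for atomic.LoadUint64(&cold.count) != count {
		runtime.Gosched()
	}
	// Cold is consistent now; fold it into hot and reset it so hot
	// carries the cumulative totals again.
	atomic.AddUint64(&hot.count, atomic.LoadUint64(&cold.count))
	atomic.StoreUint64(&cold.count, 0)
	return count
}

func main() {
	h := &histogram{}
	for i := 0; i < 5; i++ {
		h.observe()
	}
	fmt.Println(h.snapshot()) // 5
}
```

The flip and the count read happen in one atomic operation, which is the "atomically change two separate values" trick mentioned later in this thread; the spin wait is acceptable because Write is rare and each observation completes in tens of nanoseconds.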

@beorn7 merged commit b5bfa0e into master on Sep 12, 2018, and deleted the beorn7/histogram branch on September 12, 2018.
@mxinden (Member) commented Mar 1, 2019

Just went through this while analysing memory allocations of histograms (kubernetes/kubernetes#74806). @beorn7 fancy, I haven't thought about the hot-cold for writes-reads before.

@beorn7 (Member, Author) commented Mar 1, 2019

Thanks. I gave a talk about it at the Berlin Gopher meetup (not recorded, but here are slides). I was genuinely worried that somebody in the audience would tell me that there is a much easier way… But what I learned from some people in the audience is that "atomically change two separate values" is something many concurrency researchers would like to have but cannot get from CPU developers. Now I'm thinking I should give a refined version of that talk again at some Go conference.

Also, the code got some improvements in #536 by @pascaldekloe .
