-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lock-free atomic observations in Histograms! #457
Conversation
Signed-off-by: beorn7 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! A few small comments on comments.
prometheus/histogram.go
Outdated
// This is a complicated one. For lock-free yet atomic observations, we | ||
// need to save the total count of observations again, combined with the | ||
// index of the currently hot counts struct, so that we can perform one | ||
// atomic operations on both values. The least significant bit defines |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
operation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworded the sentence.
prometheus/histogram.go
Outdated
// This is a bit arcane, which is why the following spells out this if | ||
// clause in English: | ||
// | ||
// If the currently hot counts struct is #0, we atomically increment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, is this the "currently hot" counts struct
, and not the currently "hot counts" struct
, I guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I consistently changed "currently hot" to "currently-hot" to make the "operator precedence" more obvious.
prometheus/histogram.go
Outdated
// count of observations. This happens under the assumption that the | ||
// 63bit count will never overflow. Rationale: An observations takes | ||
// about 30ns. Let's assume it could happen in 10ns. Overflowing the | ||
// counter will then take at least (2^63)*10ns, which is about 3000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds like we'll be fine :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
prometheus/histogram.go
Outdated
|
||
// We increment h.countAndHotIdx by 2 so that the counter in the upper | ||
// 63 bits gets incremented by 1. At the same time, we get the new value | ||
// back, which we can use to find the currently hot counts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clever!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, perhaps too clever. As they say: Don't be clever when it comes to concurrency. But I believe there is no other way if we want to keep the hot Observe path lock-free.
Fixes #275 This is rather tricky and required some studying of the Go memory model. I have added copious code comments to explain what's going on. Benchmarks haven't changed significantly, despite the additional atomic operations now required during Observe. Write performance is noticable, but it is also much more involved now and has a mutex. (But note that Write is supposed to be a relatively rare operation and thus not in the hot path compared to Observe.) Allocs haven't changed at all. OLD: BenchmarkHistogramWithLabelValues-4 10000000 151 ns/op 0 B/op 0 allocs/op BenchmarkHistogramNoLabels-4 50000000 36.0 ns/op 0 B/op 0 allocs/op BenchmarkHistogramObserve1-4 50000000 28.1 ns/op 0 B/op 0 allocs/op BenchmarkHistogramObserve2-4 10000000 160 ns/op 0 B/op 0 allocs/op BenchmarkHistogramObserve4-4 5000000 378 ns/op 0 B/op 0 allocs/op BenchmarkHistogramObserve8-4 2000000 768 ns/op 0 B/op 0 allocs/op BenchmarkHistogramWrite1-4 1000000 1589 ns/op 896 B/op 37 allocs/op BenchmarkHistogramWrite2-4 500000 2973 ns/op 1792 B/op 74 allocs/op BenchmarkHistogramWrite4-4 300000 6979 ns/op 3584 B/op 148 allocs/op BenchmarkHistogramWrite8-4 100000 10701 ns/op 7168 B/op 296 allocs/op NEW: BenchmarkHistogramWithLabelValues-4 10000000 191 ns/op 0 B/op 0 allocs/op BenchmarkHistogramNoLabels-4 30000000 50.1 ns/op 0 B/op 0 allocs/op BenchmarkHistogramObserve1-4 30000000 40.0 ns/op 0 B/op 0 allocs/op BenchmarkHistogramObserve2-4 20000000 91.5 ns/op 0 B/op 0 allocs/op BenchmarkHistogramObserve4-4 5000000 317 ns/op 0 B/op 0 allocs/op BenchmarkHistogramObserve8-4 2000000 636 ns/op 0 B/op 0 allocs/op BenchmarkHistogramWrite1-4 1000000 2072 ns/op 896 B/op 37 allocs/op BenchmarkHistogramWrite2-4 300000 3729 ns/op 1792 B/op 74 allocs/op BenchmarkHistogramWrite4-4 200000 7847 ns/op 3584 B/op 148 allocs/op BenchmarkHistogramWrite8-4 100000 16975 ns/op 7168 B/op 296 allocs/op Signed-off-by: beorn7 <[email protected]>
f445a20
to
1b2bd1d
Compare
Just went through this while analysing memory allocations of histograms (kubernetes/kubernetes#74806). @beorn7 fancy, I haven't thought about the hot-cold for writes-reads before. |
Thanks. I gave a talk about it at the Berlin Gopher meetup (not recorded, but here are slides). I was genuinely worried that somebody in the audience would tell me that there is a much easier way… But what I learned from some people in the audience is that "atomically change two separate values" is something many concurrency researchers would like to have but what they cannot get from CPU developers. Now I'm thinking I should give a refined version of that talk again at some Go conference. Also, the code got some improvements in #536 by @pascaldekloe . |
@stuartnelson3 you might enjoy reviewing this. (Also, @grobie is on vacations...)
Fixes #275
This is rather tricky and required some studying of the Go memory
model. I have added copious code comments to explain what's going on.
Benchmarks haven't changed significantly, despite the additional
atomic operations now required during Observe. Write performance is
noticable, but it is also much more involved now and has a mutex. (But
note that Write is supposed to be a relatively rare operation and thus
not in the hot path compared to Observe.) Allocs haven't changed at
all.
OLD:
NEW: