
Lock-free atomic observations in Histograms! #457

Merged: 2 commits into master from beorn7/histogram on Sep 12, 2018
Conversation

@beorn7 (Member) commented Sep 8, 2018

@stuartnelson3 you might enjoy reviewing this. (Also, @grobie is on vacation...)

Fixes #275

This is rather tricky and required some studying of the Go memory
model. I have added copious code comments to explain what's going on.

Benchmarks haven't changed significantly, despite the additional
atomic operations now required during Observe. Write performance is
noticeably worse, but Write is also much more involved now and takes a
mutex. (Note, however, that Write is supposed to be a relatively rare
operation and thus not in the hot path compared to Observe.) Allocs
haven't changed at all.

OLD:

BenchmarkHistogramWithLabelValues-4     10000000               151 ns/op               0 B/op          0 allocs/op
BenchmarkHistogramNoLabels-4            50000000                36.0 ns/op             0 B/op          0 allocs/op
BenchmarkHistogramObserve1-4            50000000                28.1 ns/op             0 B/op          0 allocs/op
BenchmarkHistogramObserve2-4            10000000               160 ns/op               0 B/op          0 allocs/op
BenchmarkHistogramObserve4-4             5000000               378 ns/op               0 B/op          0 allocs/op
BenchmarkHistogramObserve8-4             2000000               768 ns/op               0 B/op          0 allocs/op
BenchmarkHistogramWrite1-4               1000000              1589 ns/op             896 B/op         37 allocs/op
BenchmarkHistogramWrite2-4                500000              2973 ns/op            1792 B/op         74 allocs/op
BenchmarkHistogramWrite4-4                300000              6979 ns/op            3584 B/op        148 allocs/op
BenchmarkHistogramWrite8-4                100000             10701 ns/op            7168 B/op        296 allocs/op

NEW:

BenchmarkHistogramWithLabelValues-4     10000000               191 ns/op               0 B/op          0 allocs/op
BenchmarkHistogramNoLabels-4            30000000                50.1 ns/op             0 B/op          0 allocs/op
BenchmarkHistogramObserve1-4            30000000                40.0 ns/op             0 B/op          0 allocs/op
BenchmarkHistogramObserve2-4            20000000                91.5 ns/op             0 B/op          0 allocs/op
BenchmarkHistogramObserve4-4             5000000               317 ns/op               0 B/op          0 allocs/op
BenchmarkHistogramObserve8-4             2000000               636 ns/op               0 B/op          0 allocs/op
BenchmarkHistogramWrite1-4               1000000              2072 ns/op             896 B/op         37 allocs/op
BenchmarkHistogramWrite2-4                300000              3729 ns/op            1792 B/op         74 allocs/op
BenchmarkHistogramWrite4-4                200000              7847 ns/op            3584 B/op        148 allocs/op
BenchmarkHistogramWrite8-4                100000             16975 ns/op            7168 B/op        296 allocs/op

Signed-off-by: beorn7 <[email protected]>
@stuartnelson3 self-requested a review on September 11, 2018
@stuartnelson3 (Contributor) left a comment

Looks good! A few small comments on comments.

// This is a complicated one. For lock-free yet atomic observations, we
// need to save the total count of observations again, combined with the
// index of the currently hot counts struct, so that we can perform one
// atomic operations on both values. The least significant bit defines
@stuartnelson3 (Contributor):

operation

@beorn7 (Member, Author):

Reworded the sentence.
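
For readers following along, the packing described in the comment above can be sketched in a few lines of Go. The names here are hypothetical stand-ins, not the actual client_golang fields:

```go
package main

import "fmt"

// unpack splits the packed word: the upper 63 bits hold the total
// observation count, the least significant bit holds the index of the
// currently-hot counts struct, so a single atomic operation can update
// or read both values together.
func unpack(n uint64) (count, hotIdx uint64) {
	return n >> 1, n & 1
}

func main() {
	// 5 observations recorded while counts struct #1 is hot:
	fmt.Println(unpack(5<<1 | 1)) // 5 1
}
```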

// This is a bit arcane, which is why the following spells out this if
// clause in English:
//
// If the currently hot counts struct is #0, we atomically increment
@stuartnelson3 (Contributor):

current

@stuartnelson3 (Contributor):

Actually, is this the "currently hot" counts struct, and not the currently "hot counts" struct, I guess?

@beorn7 (Member, Author):

I consistently changed "currently hot" to "currently-hot" to make the "operator precedence" more obvious.

// count of observations. This happens under the assumption that the
// 63-bit count will never overflow. Rationale: An observation takes
// about 30ns. Let's assume it could happen in 10ns. Overflowing the
// counter will then take at least (2^63)*10ns, which is about 3000
@stuartnelson3 (Contributor):

sounds like we'll be fine :)

@beorn7 (Member, Author):

ACK
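
The quoted rationale checks out. A quick back-of-the-envelope in Go, purely illustrative and not part of the PR:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// 2^63 observations at an assumed 10ns each:
	totalSeconds := float64(uint64(1)<<63) * (10 * time.Nanosecond).Seconds()
	fmt.Printf("%.0f years\n", totalSeconds/(365.25*24*3600)) // prints: 2923 years
}
```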


// We increment h.countAndHotIdx by 2 so that the counter in the upper
// 63 bits gets incremented by 1. At the same time, we get the new value
// back, which we can use to find the currently hot counts.
@stuartnelson3 (Contributor):

clever!

@beorn7 (Member, Author):

Yeah, perhaps too clever. As they say: Don't be clever when it comes to concurrency. But I believe there is no other way if we want to keep the hot Observe path lock-free.
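
To make the scheme concrete, here is a minimal, runnable sketch of the hot/cold handshake as described in this thread. All names (histogram, counts, snapshot) are hypothetical stand-ins, not the actual client_golang code; bucket and sum updates are elided, and the Write-side mutex is only noted in a comment:

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// counts stands in for one of the two bucket/sum structs; only the
// completion counter matters for the handshake.
type counts struct {
	count uint64 // observations fully recorded in this struct
}

type histogram struct {
	// Upper 63 bits: total observation count. Least significant bit:
	// index of the currently-hot counts struct. One atomic operation
	// changes both values together.
	countAndHotIdx uint64
	counts         [2]counts
}

// observe is the lock-free hot path.
func (h *histogram) observe() {
	// Adding 2 increments the count in the upper 63 bits and returns
	// the hot index in the LSB, all in one atomic operation.
	n := atomic.AddUint64(&h.countAndHotIdx, 2)
	hot := &h.counts[n&1]
	// ... buckets and sum would be updated here, also atomically ...
	atomic.AddUint64(&hot.count, 1) // must be last: signals completion
}

// snapshot is the Write-side handshake. In real code it would run
// under a mutex so that only one writer flips the hot index at a time.
func (h *histogram) snapshot() uint64 {
	// Flip the LSB without disturbing the upper 63 bits: adding 1
	// flips 0->1, adding ^uint64(0) (i.e. -1) flips 1->0.
	var n uint64
	if atomic.LoadUint64(&h.countAndHotIdx)&1 == 0 {
		n = atomic.AddUint64(&h.countAndHotIdx, 1)
	} else {
		n = atomic.AddUint64(&h.countAndHotIdx, ^uint64(0))
	}
	count := n >> 1           // observations started before the flip
	cold := &h.counts[(n&1)^1] // the previously hot struct
	hot := &h.counts[n&1]
	// Wait until every observation that picked the now-cold struct
	// has finished writing to it.
	for atomic.LoadUint64(&cold.count) != count {
		runtime.Gosched()
	}
	// Cold is consistent now; fold it into hot and reset it so hot
	// carries the cumulative totals again.
	atomic.AddUint64(&hot.count, atomic.LoadUint64(&cold.count))
	atomic.StoreUint64(&cold.count, 0)
	return count
}

func main() {
	h := &histogram{}
	for i := 0; i < 5; i++ {
		h.observe()
	}
	fmt.Println(h.snapshot()) // 5
}
```

The flip and the count read happen in one atomic operation, which is the "atomically change two separate values" trick mentioned later in this thread; the spin wait is acceptable because Write is rare and each observation completes in tens of nanoseconds.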

@beorn7 merged commit b5bfa0e into master on Sep 12, 2018, and deleted the beorn7/histogram branch on September 12, 2018.
@mxinden (Member) commented Mar 1, 2019

Just went through this while analysing memory allocations of histograms (kubernetes/kubernetes#74806). @beorn7 fancy, I haven't thought about the hot-cold for writes-reads before.

@beorn7 (Member, Author) commented Mar 1, 2019

Thanks. I gave a talk about it at the Berlin Gopher meetup (not recorded, but here are slides). I was genuinely worried that somebody in the audience would tell me that there is a much easier way… But what I learned from some people in the audience is that "atomically change two separate values" is something many concurrency researchers would like to have but cannot get from CPU developers. Now I'm thinking I should give a refined version of that talk again at some Go conference.

Also, the code got some improvements in #536 by @pascaldekloe .
