Make histogram aggregator checkpoint consistent #438
The algorithm in Prometheus will block the collection pass using But wait, I'd like to abstract the mechanism for computing this concurrency technique into a self-contained package. The idea is that we can use one of these devices for the Histogram aggregator, and we can use one for the MinMaxSumCount aggregator. I think this means creating a struct with just the
c.current.sum.AddNumberAtomic(kind, number)
current := c.current()
current.count.AddUint64Atomic(1)
current.sum.AddNumberAtomic(kind, number)

for i, boundary := range c.boundaries {
Sort of off-topic, but I'd like to not see a linear search here. I think this would work:
idx := sort.Search(len(c.boundaries), func(i int) bool {
	// sort.Search needs a predicate that is false, then true, over ascending boundaries.
	return number.CompareNumber(kind, c.boundaries[i]) < 0
})
current.buckets.Counts[idx].AddUint64Atomic(1)
... but there's a memory allocation there, so it's not perfect. Linear search is probably faster than even a hand-coded, allocation-free binary search for a small number of buckets; it would take some benchmarking to know if this matters.
Note that in the OpenMetrics exposition format, the Histogram type supports several different bucket configurations. There are "linear" buckets, which use evenly spaced boundaries, and "logarithmic" buckets, which use log-linear spacing between boundaries. If either of these special cases is used, we can typically do better than linear or binary search, using a simple division (linear case) or a single logarithm plus a division (log-linear case). These two special cases are probably more important than the arbitrary-boundary case, which is when you begin looking at binary search.
So, I'm not sure what's best. A hand-coded-allocation-free binary search is still not the best outcome if the buckets are somehow linear. If the buckets are going to be linear or log-linear, it calls for new constructors to pass in a different sort of bucket-izer interface. If there's a "bucket configuration" interface that supports (a) listing boundaries and (b) performing the necessary lookup too, then this complexity can be isolated outside of the histogram aggregator itself.
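For reference, a hand-coded, allocation-free binary search could look roughly like the sketch below. It is only an illustration: it uses plain float64 values instead of the SDK's number type, the name bucketIndex is hypothetical, and it assumes each boundary is an exclusive upper bound, with one extra overflow bucket at the end.

```go
package main

import "fmt"

// bucketIndex returns the index of the first boundary strictly greater than
// value, or len(boundaries) when value is at or above every boundary.
// boundaries must be sorted in ascending order. No closure is involved, so
// nothing here can escape to the heap.
func bucketIndex(boundaries []float64, value float64) int {
	lo, hi := 0, len(boundaries)
	for lo < hi {
		mid := int(uint(lo+hi) >> 1) // midpoint without signed overflow
		if value < boundaries[mid] {
			hi = mid // the answer is mid or something to its left
		} else {
			lo = mid + 1 // boundaries[mid] <= value, so look to the right
		}
	}
	return lo
}

func main() {
	boundaries := []float64{10, 20, 30}
	for _, v := range []float64{5, 10, 25, 30, 100} {
		fmt.Println(v, "->", bucketIndex(boundaries, v))
	}
}
```

Whether this beats the linear scan for a handful of boundaries is exactly the benchmarking question raised above.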
So a LinearBucket(min, max, number) returns you a bucket configuration for number evenly spaced buckets between min and max, for example. The search function uses division.
When using absolute measures, in particular, a LogarithmicBucket(minValue, growthFactor) returns you a bucket configuration where the first bucket is (-Inf, minValue) and each boundary after that increases exponentially by a factor of growthFactor. The search function calculates a math.Log() to find the bucket, etc.
Lastly, there's an ArbitraryBucket() configuration that's like what we have now. It uses binary search or linear search, depending on how many buckets there are.
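A minimal sketch of how these three configurations might sit behind one interface follows. None of this is the SDK's API: the BucketConfig interface, the float64 values, the extra number argument on LogarithmicBucket (needed to list a finite set of boundaries), and the underflow/overflow bucket layout are all assumptions made for illustration. Arguments are not validated, and the arbitrary case always uses binary search rather than switching to a linear scan for small bucket counts.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// BucketConfig is a hypothetical interface. Bucket 0 is the underflow bucket,
// bucket i covers [bounds[i-1], bounds[i]), and bucket len(bounds) is overflow.
type BucketConfig interface {
	Boundaries() []float64
	Index(value float64) int
}

type buckets struct {
	bounds []float64
	// guess estimates the index arithmetically; nil selects binary search.
	guess func(value float64) float64
}

func (b buckets) Boundaries() []float64 { return b.bounds }

func (b buckets) Index(v float64) int {
	n := len(b.bounds)
	if n == 0 || v < b.bounds[0] {
		return 0
	}
	if b.guess == nil {
		return sort.Search(n, func(i int) bool { return v < b.bounds[i] })
	}
	idx := n // default covers estimates past the end, +Inf and NaN
	if g := b.guess(v); g < float64(n) {
		idx = int(g)
	}
	// Floating-point rounding can leave the estimate off by one, so nudge
	// it until it agrees with the listed boundaries.
	for idx < n && v >= b.bounds[idx] {
		idx++
	}
	for idx > 1 && v < b.bounds[idx-1] {
		idx--
	}
	return idx
}

// LinearBucket: number evenly spaced buckets between min and max; lookup by division.
func LinearBucket(min, max float64, number int) BucketConfig {
	width := (max - min) / float64(number)
	bounds := make([]float64, number+1)
	for i := range bounds {
		bounds[i] = min + float64(i)*width
	}
	return buckets{bounds: bounds, guess: func(v float64) float64 {
		return 1 + (v-min)/width
	}}
}

// LogarithmicBucket: first bucket is (-Inf, minValue), then each boundary
// grows by growthFactor; lookup by a single logarithm and a division.
func LogarithmicBucket(minValue, growthFactor float64, number int) BucketConfig {
	bounds := make([]float64, number)
	for i, b := 0, minValue; i < number; i, b = i+1, b*growthFactor {
		bounds[i] = b
	}
	logGrowth := math.Log(growthFactor)
	return buckets{bounds: bounds, guess: func(v float64) float64 {
		return 1 + math.Log(v/minValue)/logGrowth
	}}
}

// ArbitraryBucket: explicit boundaries; lookup by binary search.
func ArbitraryBucket(bounds ...float64) BucketConfig {
	sorted := append([]float64(nil), bounds...)
	sort.Float64s(sorted)
	return buckets{bounds: sorted}
}

func main() {
	configs := []BucketConfig{
		LinearBucket(0, 100, 4),
		LogarithmicBucket(1, 10, 4),
		ArbitraryBucket(5, 1, 3),
	}
	for _, c := range configs {
		fmt.Println(c.Boundaries(), "42 ->", c.Index(42))
	}
}
```

With something of this shape, the aggregator would only call Index and Boundaries, so the choice of lookup strategy stays outside the histogram aggregator itself.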
All of these are totally valid considerations that we should address in the future. I think we should create some issues to track them :)
+1 on addressing this, and on doing so in a future issue. I like the suggestion of addressing common bucket implementations.
This looks exciting. 🥇
Let us know when you'd like us to review this. 😁
Sorry for taking so long with this; it has been a stormy few weeks. I think it is ready for review now. Naming this was very hard and I'm not totally satisfied, so I'm open to suggestions.
l := StateLocker{}
l.SwapActiveState(func() {})
}
}
So good!
Seconded! This is great 😄
Happy that you both liked it! :)
"go.opentelemetry.io/otel/sdk/metric/aggregator/histogram" | ||
)

func TestStressInt64Histogram(t *testing.T) {
(Sweet)
This looks awesome, thanks for taking the time to dig into this!
Almost all of my comments are minor nits except for the question about merging Histogram buckets. Other than that one, this looks good to go.
Awesome work. 🚀
* change the histogram aggregator to have a consistent but blocking Checkpoint()
* docs
* wrapping docs
* remove currentIdx from the 8bit alignment check
* stress test
* add export and move lockfreewrite algorithm to an external struct.
* move state locker to another package.
* add todos
* minimal tests
* renaming and docs
* change to context.Background()
* add link to algorithm and grammars
Co-authored-by: Joshua MacDonald <[email protected]>
This algorithm is based on the implementation of the Prometheus histogram linked in #437. The algorithm is documented in the code.
I've created a separate struct so we can properly reuse it for other aggregators that have similar problems.
I've also added some tests that ensure that whenever a Checkpoint() happens, the state of the histogram is valid, i.e., the overall count is the same as the sum of the counts of all buckets. These tests did fail with the initial implementation at high concurrency, but they pass now.
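To make the idea concrete, here is a rough sketch of the hot/cold technique as the Prometheus client uses it, together with the count-equals-sum-of-buckets check described above. It is not the code in this PR: the histogram and state types, the fixed three buckets, and the cumulative (non-resetting) snapshot are all simplifying assumptions. Writers never take a lock; only the checkpoint blocks while it waits for in-flight writers to finish.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// state is one of the two copies of the histogram data.
type state struct {
	buckets [3]uint64
	count   uint64 // incremented last by every writer
}

type histogram struct {
	// countAndHot packs two values: the top bit selects the hot state and
	// the lower 63 bits count the updates that have started.
	countAndHot uint64
	states      [2]state
	mu          sync.Mutex // serializes checkpoints, not writers
}

// update is lock-free: it reserves a slot in the hot state, records the
// value, then bumps that state's count to signal completion.
func (h *histogram) update(bucket int) {
	n := atomic.AddUint64(&h.countAndHot, 1)
	hot := &h.states[n>>63]
	atomic.AddUint64(&hot.buckets[bucket], 1)
	atomic.AddUint64(&hot.count, 1)
}

// checkpoint returns a consistent cumulative snapshot.
func (h *histogram) checkpoint() (buckets [3]uint64, count uint64) {
	h.mu.Lock()
	defer h.mu.Unlock()

	// Flip the top bit so new updates go to the other state.
	n := atomic.AddUint64(&h.countAndHot, 1<<63)
	started := n & (1<<63 - 1)
	hot := &h.states[n>>63]
	cold := &h.states[(^n)>>63]

	// Block until every update that started before the flip has finished.
	for atomic.LoadUint64(&cold.count) != started {
		runtime.Gosched()
	}

	// Read the cold state, fold it into the hot state, and reset it.
	for i := range cold.buckets {
		c := atomic.LoadUint64(&cold.buckets[i])
		buckets[i] = c
		atomic.AddUint64(&hot.buckets[i], c)
		atomic.StoreUint64(&cold.buckets[i], 0)
	}
	atomic.AddUint64(&hot.count, started)
	atomic.StoreUint64(&cold.count, 0)
	return buckets, started
}

func main() {
	h := &histogram{}
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			for i := 0; i < 10000; i++ {
				h.update((g + i) % 3)
			}
		}(g)
	}
	// Checkpoint while writers are running and verify the invariant.
	for i := 0; i < 100; i++ {
		b, c := h.checkpoint()
		if b[0]+b[1]+b[2] != c {
			panic("inconsistent checkpoint")
		}
	}
	wg.Wait()
	_, c := h.checkpoint()
	fmt.Println("final count:", c)
}
```

The key ordering is that each writer increments count last, so once the cold count matches the number of started updates, every bucket increment that belongs to the snapshot is already visible.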