Add adaptable circular buffer implementation for ExponentialCounter. #4087
Conversation
Nice, the addition of the factories makes switching between and testing different implementations cleaner. The tests give me confidence too, since all implementations are parameterised and tested side by side.
Just a few comments and questions. Also, the linter is complaining about javadoc and style, by the looks of it.
  /** Constructs fresh new buckets. */
  DoubleExponentialHistogramBuckets newBuckets();

  /** Constructs "empty" count buckets using settings from previous recorded bucket. */
I am trying to think whether it's a safe assumption to use the scale of the previous accumulations after resetting. With cumulative aggregation temporality it's certainly safe; merge() will downscale it anyway, so it would save some CPU time.
However, for delta temporality, could there be a use case where someone might want the highest precision possible for each export?
On a related note, I noticed diff / merge / zero are implemented as immutable operations, but could they be mutable? Intuitively, always mutating a long array seems like the allocations would be fine (preallocated just once on initialization) and it would avoid the dynamic dispatch of the byte-short-int-long stuff. The idea does seem to at least require the scale to be preserved on reset, though.
So I've exposed it as a thing we can "toy with" and generate a strategy that does NOT do this, to compare against other strategies.
However, there are two common use cases we need to optimise for:
- High-throughput traffic in a distribution
- Low-throughput traffic in a distribution
Given our instrumentation, we're likely to see both of these. For #2 we want the smallest memory consumption; for #1 we want high performance. The idea behind saving scale (and eventually integer array size) is to preserve as much performance as possible for #1 by shifting distributions from use case #2 to #1 as we "detect" high throughput.
The short answer on mutable accumulation is "no". They're shared between exporters now, so mutating would (initially) lead to bugs. I'm hoping to do some more optimisation work toward that end, specifically making sure none of it lands on the hot path. IIRC, all merge/diff operations are already off the hot path and only performed during collection/export.
If we're able to eventually make them mutable, I think we'd be able to make the histogram aggregations no-alloc, which should allow removing AdaptingIntegerArray as the memory overhead would be fixed, or at least trim it down to being bimorphic, which Java deals with pretty well. Will be a good area to explore, as that class really just feels like a "root of all evil".
I'm not really sure how this particular optimisation is any more evil than hand-rolling Protobuf serialization, from both a maintenance and "scare" sense. Can you elaborate on your primary concerns, especially given the benchmark showing performance? I'm happy to add testing around this, but I definitely don't see it as anywhere near as scary as hand-rolled serialization.
Related, I'm really not sure what you mean about mutable + optimising. Could you lay out that scenario/idea? I'd be happy to try it out.
I'm afraid, though, that in practice we need to optimise for the two scenarios I listed above. "Eventually" using a consistent amount of memory may not be good enough if the overhead per timeseries is still too high. However, I'm not sure I understand what kind of mutation you're proposing, and I'm not sure I've accurately described why the adapting array plays to the current use case we have for histograms so well.
The protobuf optimizations are pretty deterministic - we can reason out the difference between the old and new code to know where performance will come from. The adapting arrays are based on heuristics - in particular, we would expect direct array access to always perform quite a bit better than an access through a megamorphic switch statement, so there is a clear tradeoff involved. This makes it much scarier, and microbenchmarks will never be able to give the full picture when relying on heuristics and assumptions. Gradually going up in complexity tends to at least be a bit less scary, for example MapCounter -> long[] circular buffer -> then AdaptingIntegerArray circular buffer.
Related, I'm really not sure what you mean about mutable + optimizing. Could you lay out that scenario/idea? I'd be happy to try it out.
As far as I could tell, if all the operations on the bucket counts were mutating an array, including zeroing at the end of collection, there wouldn't be a need to reallocate the bucket count storage during operations / collection. If so, it would seem to incentivize having only one type, long[], or at most two types, since Java can still optimize bimorphic code paths well. If it's not possible to remove the allocations, though, then it is what it is and raises the importance of the adapting integers.
Gradually going up in complexity tends to at least be a bit less scary, for example MapCounter -> long[] circular buffer -> then AdaptingIntegerArray circular buffer.
I shared a branch (a few weeks back, when I first sent the perf benchmarks) where I had all three of these.
The long[] counter is the clear winner for raw CPU performance. There is a minor performance degradation going to the Adapting array (as you call out, due to the switch). However, as shown in this PR, the Adapting use case saves a good bit of RAM in addition to remaining faster than MapCounter. Given the differential, and the sheer number of timeseries I see coming out of instrumentation histograms today, I think this RAM optimisation (or something like it) is worth it. Perhaps there's a different mechanism we can use to change the array implementation that would be more ideal in leading to (eventual) monomorphic optimisation?
As far as I could tell, if all the operations on the bucket counts were mutating an array, including zeroing at the end of collection, there wouldn't be a need to reallocate the bucket count storage during operations / collection. If so, it would seem to incentivize only having one type, long[], or at most two types since Java could still optimize bimorphic code paths well. If it's not possible to remove the allocations, though, then it is what it is and raises the importance of the adapting integers.
We could actually do this today. These counters are mutable; we only need them immutable on the collection route. Instead of "new"-ing up the counters, we can just create copies to expose externally and then call .zero. Happy to do that in this PR or another.
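To make the "copy on collect, then clear in place" idea concrete, here's a minimal, hedged sketch; the ExponentialCounter interface and its copy()/clear() methods here are simplified stand-ins, not the SDK's actual API:

```java
// Sketch only: illustrates reusing the live counter across collections.
interface ExponentialCounter {
  ExponentialCounter copy();

  void clear();
}

final class BucketsCollectionSketch {
  private final ExponentialCounter liveCounts;

  BucketsCollectionSketch(ExponentialCounter liveCounts) {
    this.liveCounts = liveCounts;
  }

  /** Collection path: exporters get an immutable snapshot; the hot path keeps the same backing array. */
  ExponentialCounter collectAndReset() {
    ExponentialCounter snapshot = liveCounts.copy(); // the only allocation on this path
    liveCounts.clear();                              // zero in place instead of "new"-ing a counter
    return snapshot;
  }
}
```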
I think this RAM optimisation (or something like it) is worth it.
300 buckets * 8 bytes is 2.4K per histogram; 2000 histograms in 5MB seems like a lot of histograms for what, in Java, is not a lot of RAM.
5MB of allocations per cycle, on the other hand, becomes more cause for concern, which is where the point of mutability comes in. If reallocations can be avoided, then I really feel the case for an adapting array becomes weaker.
Anyway, I already approved since I don't care about this code that much. It's just that I've generally seen situations that jump at the most complex solution fail to scale back to a simpler one even when that would be more appropriate; going in steps tends to work better to not get too deep into a complexity hole.
/**
 * These are extra test cases for buckets. Much of this class is already tested via more complex
 * test cases at {@link DoubleExponentialHistogramAggregatorTest}.
 */
public class DoubleExponentialHistogramBucketsTest {

  @Test
  void testRecordSimple() {
  static Stream<ExponentialBucketStrategy> bucketStrategies() {
TIL about @MethodSource, nice.
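For readers unfamiliar with it, this is roughly the JUnit 5 pattern in play (a simplified, self-contained illustration, not the PR's actual test class):

```java
import static org.assertj.core.api.Assertions.assertThat;

import java.util.stream.Stream;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.MethodSource;

class BucketStrategySketchTest {
  // Each element becomes one invocation of every @MethodSource-annotated test below.
  static Stream<String> bucketStrategies() {
    return Stream.of("mapCounter", "circularBufferCounter");
  }

  @ParameterizedTest
  @MethodSource("bucketStrategies")
  void recordSimple(String strategy) {
    // All counter implementations run through the same assertions side by side.
    assertThat(strategy).isNotEmpty();
  }
}
```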
        ExponentialCounterFactory.mapCounter(), ExponentialCounterFactory.circularBufferCounter());
  }

  @ParameterizedTest
There are one or two more tests we need once we have sufficient implementations of BucketMapper - since the current implementation can be inaccurate in certain cases. I can get to that in the next couple of weeks.
That'd be great. I was looking at the Go code (which I based this on), and there are some big, complex optimisations to halve the cost of the log lookup. That may be the next thing in the critical path here.
This is the ticket: #3842
@@ -22,14 +23,23 @@
    implements Aggregator<ExponentialHistogramAccumulation> {

  private final Supplier<ExemplarReservoir> reservoirSupplier;
  private final ExponentialBucketStrategy bucketStrategy;

  DoubleExponentialHistogramAggregator(Supplier<ExemplarReservoir> reservoirSupplier) {
What do you foresee the instrument / user API looking like for this? Something like new ExponentialHistogramInstrument(maxbuckets, startingScale)?
There's a discussion on the specification right now, but I'll give you my thoughts:
- If we make a user specify anything by default, we've failed (fundamentally) on the goal of these histograms.
- If users want to specify something to optimise for their use case, it should match their expectations of the data. Specifically, we can allow:
  - Users may give us a dynamic (base) range, e.g. "I want to record between (0.0, 4000.0)".
  - Users may give us the maximum number of buckets (or RAM) they want to use to store histograms.
  - The SDK would calculate the initial scale + bucket maximum based on these values (rough sketch below).
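As a rough illustration of that last bullet (my own sketch, not spec or SDK code): assuming bucket boundaries grow by base = 2^(2^-scale), covering (low, high] at a given scale needs roughly log2(high / low) * 2^scale buckets, so a starting scale could be derived like this (using 1.0 as the smallest interesting value, since a zero lower bound has no finite log):

```java
public final class StartingScaleSketch {
  /**
   * Picks the largest scale whose bucket count for (low, high] fits in maxBuckets.
   * The [-10, 20] bounds roughly mirror the exponential histogram data model's scale limits.
   */
  static int startingScale(double low, double high, int maxBuckets) {
    double log2Range = Math.log(high / low) / Math.log(2.0); // buckets needed at scale 0
    int scale = 20;
    while (scale > -10 && Math.ceil(log2Range * Math.pow(2.0, scale)) > maxBuckets) {
      scale--; // halve precision until the requested range fits the bucket budget
    }
    return scale;
  }

  public static void main(String[] args) {
    // e.g. "record between (1.0, 4000.0] with at most 320 buckets" -> scale 4 (~192 buckets)
    System.out.println(startingScale(1.0, 4000.0, 320));
  }
}
```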
      result = byteBacking[idx] + count;
      if (result > Byte.MAX_VALUE) {
        // Resize + add
        resizeToShort();
Thinking of an edge case where result is, for this case, higher than Short.MAX_VALUE, in which case resizeToShort() will cause overflow.
This could be optimised, but effectively it doesn't write the new value. It resizes to short, THEN attempts to write again, where it'd detect the overflow. We can fix this so it resizes to the appropriate size straight away; happy to fix that in this PR if you'd like.
Ah, yeah, I missed the recursive call, so it's correct. If you want to optimise it then go ahead, but I'm not too fussed, seeing as it's a super edge case and not that much more CPU.
In reality, we're mostly bumping counts by 1, so I'm not sure any changes will make it more optimal outside of preserving integer-size once a specific histogram "bumps".
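For reference, a hedged, self-contained sketch of the resize-then-retry behaviour described in this thread; this is a simplified stand-in for AdaptingIntegerArray, showing only the byte => short promotion:

```java
final class AdaptingIntegerArraySketch {
  private byte[] byteBacking;
  private short[] shortBacking;

  AdaptingIntegerArraySketch(int size) {
    this.byteBacking = new byte[size];
  }

  void increment(int idx, long count) {
    if (byteBacking != null) {
      long result = byteBacking[idx] + count;
      if (result > Byte.MAX_VALUE) {
        // Resize, then retry: if the value also overflows short, the retried call
        // detects that and would keep promoting (short => int => long).
        resizeToShort();
        increment(idx, count);
        return;
      }
      byteBacking[idx] = (byte) result;
      return;
    }
    long result = shortBacking[idx] + count;
    if (result > Short.MAX_VALUE) {
      throw new IllegalStateException("int/long promotion omitted from this sketch");
    }
    shortBacking[idx] = (short) result;
  }

  /** Build the wider array locally, then publish it and drop the old backing. */
  private void resizeToShort() {
    short[] wider = new short[byteBacking.length];
    for (int i = 0; i < byteBacking.length; i++) {
      wider[i] = byteBacking[i];
    }
    this.shortBacking = wider;
    this.byteBacking = null;
  }
}
```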
Is there a commit missing from the PR corresponding to the "Add performance benchmarks." in the description?
 *
 * <p>Instances start by attempting to store one-byte per-cell in the integer array. As values grow,
 * this will automatically instantiate the next-size integer array (byte => short => int => long)
 * and copy over values into the larger array. This class expects most usage to remain within the
Is this a valid expectation? If the collection interval is 1s, then that's still only 128 recordings per bucket. 128 qps is quite easy on a single app, and even if not all of it lands in the same bucket, it seems reasonable to expect 128 to make it in. 32K qps is much less common though, so if sticking with type-switching, I would probably remove the BYTE case and start with SHORT. Though I'm suspicious of the general real-world performance benefit of type-switching, even if synthetic benchmarks can find cases where it helps.
Generally, a lot of latency-based monitoring matches a normal distribution (or a sum of a few normals). Additionally, if you look at our actual client implementation, we tend to have two types of endpoints: those infrequently used (e.g. periodic maintenance-type RPCs, metadata things, infrequent user operations, etc.) and those with heavy traffic (the core use case).
We need to optimise across the board, and we're creating a LOT of distributions, as we track them by HTTP endpoint right now in instrumentation. So even if a server has 128 qps, we need endpoints to also see that, and we need to optimise for both high- and low-traffic scenarios with our defaults.
I do think it's worthwhile to put in place "detection" for these high-qps latency distributions where we can:
- Preserve the previous scale (assuming we'll be seeing similar traffic in the next collection cycle).
- Preserve the previously sized integer (I was going to work on that as a future optimisation).
For the purposes of this PR though, we're already seeing better performance than MapCounter, and I think we have room to go further.
Not sure if this changes the conversation, but the default collection interval is 60s.
Not really.
Let's assume a normal distribution (a simple HTTP server usually looks like this; in practice, you have sum-of-normal distributions when you look at latency across the various paths a request can take, which alter the normal distribution).
We have 320 buckets to represent counts within this normal distribution. We know about 68% of counts will occur within one standard deviation of the mean. It's hard to say how good a job the exponential algorithm can do at "tuning" its bucket thresholds + scale against the standard deviation, but let's assume we're halfway decent, meaning 160 buckets are within this boundary.
32K qps means we'll have 1,920,000 measurements in a minute (32K qps * 60s). If we divide by 160 buckets, that averages about 12,000 per bucket (naively). Back-of-the-envelope (we could use the normal distribution function to fix this), I'd argue we would probably see peaks of around 20k per bucket. This is still below the 32k we get from short alone, and that's for something with relatively high QPS.
Reversing this, if we average 100 counts per bucket to fit within the byte boundary, I'd say we can measure upwards of 400/500 qps with the byte array.
Remember that, by semantic convention, we're measuring URL paths with individual timeseries, not just a single web endpoint. This means, e.g. in a REST application, we need to allocate a histogram for every single possible endpoint (including things like GET /entity/{id}). It's very likely the <500 qps optimisation in memory usage pays off, and entirely likely we never need to upgrade to long for real-world usage.
This is all back-of-the-envelope math. I'm sure someone else could do a better calculation for a more realistic example.
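Spelling out the arithmetic above as a quick sanity check (not a benchmark, and the bucket split is the same rough assumption as in the comment):

```java
public final class EnvelopeMath {
  public static void main(String[] args) {
    long qps = 32_000;
    long perCycle = qps * 60;                      // 1,920,000 measurements per 60s collection
    long activeBuckets = 160;                      // ~half of 320 buckets within one std-dev
    long avgPerBucket = perCycle / activeBuckets;  // 12,000 per bucket vs Short.MAX_VALUE = 32,767
    System.out.println(avgPerBucket + " avg counts/bucket; short max = " + Short.MAX_VALUE);
  }
}
```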
Makes sense. If we follow through with the optimization to preserve the previously sized integer, I'd expect it to adapt nicely to a variety of use cases.
I added that optimisation in this PR.
Yes. We only need to add configuration to the enum values for the benchmark though, as the code/logic for generating data and running the benchmark already exists. See HistogramAggregationParam.java
Codecov Report
@@             Coverage Diff              @@
##               main    #4087      +/-   ##
============================================
+ Coverage     90.21%   90.28%   +0.06%
- Complexity     4515     4605      +90
============================================
  Files           533      537       +4
  Lines         13831    14091     +260
  Branches       1321     1347      +26
============================================
+ Hits          12478    12722     +244
- Misses          923      926       +3
- Partials        430      443      +13
Continue to review full report at Codecov.
  private AdaptingIntegerArray(int size, ArrayCellSize cellSize) {
    this.cellSize = cellSize;
    switch (cellSize) {
Can you add tests for the missed lines in this class, or if they don't have a short-term need, maybe remove the unused methods? There are some methods in other classes too.
Will do. I think I added these in preparation for preserving cell-size between collection intervals; I'll either add tests now or move that to a separate branch/PR fully.
  /** Convert from byte => short backing array. */
  @SuppressWarnings("NullAway")
  private void resizeToShort() {
    this.shortBacking = new short[this.byteBacking.length];
Let's build up in a local variable and assign to the field
Is there a reason, or is this a style thing?
Mostly style (setup, then apply). It also makes the function more future-proof: if locking or volatility changes later, it doesn't matter, since we see this function acting atomically, which makes it easier to reason about.
Couple of small suggestions but looks good. 👍
import io.opentelemetry.sdk.metrics.internal.state.ExponentialCounterFactory;

/** The configuration for how to create exponential histogram buckets. */
interface ExponentialBucketStrategy {
Why have an interface here with a single implementation?
Refactoring left-overs. I first added this and gutted all the usage, then wound up only making one implementation. I can fix this up; however, I'd also be happy to just remove MapCounter unless someone objects. Its memory usage/allocations were pretty high, and I think a circular buffer is likely more performant for most histogram use cases.
I'd also be happy to just remove MapCounter unless someone objects.
No objections from me.
package io.opentelemetry.sdk.metrics.internal.state;

public interface ExponentialCounterFactory {
Let's make this ExponentialCounterFactory<T extends ExponentialCounter>, and have:
- T newCounter(int maxSize)
- T copy(T other)
That will simplify the constructor public AdaptingCircularBufferCounter(ExponentialCounter toCopy), which currently tries to accommodate other implementations of ExponentialCounter. Is there a situation where multiple implementations would be used in tandem?
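In other words, something shaped roughly like the following sketch (illustrative only; ExponentialCounter here is a stand-in for the SDK interface):

```java
interface ExponentialCounter {}

interface ExponentialCounterFactory<T extends ExponentialCounter> {
  /** Creates a fresh counter for up to maxSize buckets. */
  T newCounter(int maxSize);

  /** Copies within the same implementation, avoiding cross-implementation copy constructors. */
  T copy(T other);
}
```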
The DataDog histogram swaps between dense/sparse counter implementations. However, they're using doubles + partial counts for shuffling/resizing dynamically, which isn't an option for us.
I left the ability for us to mix/match dense (circular buffer) with sparse (map counter) to see if we can get any gains, but I wasn't able to find a scenario where this was performant, especially given how we expose dense buckets in OTLP, and our most likely use case (latency monitoring) tends to follow normal distributions.
Also, I started wiring this through, and it actually needs the type to be embedded pretty far up into the hierarchy, to the point where you have ExponentialHistogramAggregation<T>, where T = the counter type. It starts to feel pretty clunky and weird. Let me know if:
- You'd like me to continue with that change.
- I can just delete MapCounter and remove a lot of the factory/strategy options.
- We leave it as it is.
We can remove MapCounter in this PR if it makes things easier
Don't feel strongly about this. Is it useful to keep MapCounter around for future performance comparisons, or has it officially run its course?
Can also just have public AdaptingCircularBufferCounter(ExponentialCounter toCopy) throw if !(toCopy instanceof AdaptingCircularBufferCounter).
Sorry for going AFK on this. You'll have to bear with my availability this month :(.
My opinion is that MapCounter is sufficiently poorly performing that we can oust it for now. There might be a world where a sparse vector could perform well here, but not with the currently expected defaults of exponential histograms (specifically 320 buckets). Today, though, it puts serious strain on the garbage collector, as we're duplicating a large map of atomic integers frequently.
Given @anuraaga's general concerns, I've still been spending time thinking through alternative implementations and mechanisms here. Specifically, I want to prioritize a few optimisations in metrics:
- Figuring out a way to avoid as many allocations in the collection path as possible. As @anuraaga called out, allocating 5MB of RAM every minute (or less) is unacceptable.
- Trying to leverage "dumb Java code" that hotspots well as much as possible. While I still think the layered integer is necessary for RAM savings at our current bucket defaults, I would still love a simpler implementation if it didn't come with 2-4x overhead in RAM.
I'll try to shore up this PR's remaining comments. Right now, it looks like I should be able to attend the SIG this week, so perhaps we can discuss future work in person.
Actually, I had some ideas where keeping MapCounter around (for the collection path) may be a decent optimisation. Specifically, for low-frequency histogram timeseries, if we detect that the number of buckets with values is < 5, for example (or the count is 0), we can optimise the collection path by NOT allocating a new huge integer array and just returning the sparse (or empty) counter implementation. However, I'll want to benchmark that to make sure it's actually worth it.
I'd like to shore up the other comments in this PR (for toCopy, it's actually important that it works on all ExponentialCounter implementations), and then I'll work on a new PR that measures performance during collection, which we can use to identify and tune allocations in that path.
…nd expose hooks to test its use in Exponential Histogram aggregator.
Any reason not to go ahead and merge this?
- Fix a bug in equality where it was forcing ExponentialCounter to have the same offset, even if it had stored 0 counts in all buckets. This interacted negatively with merge/diff tests, where creating a fresh exponential bucket would have a different indexStart than diff-ing another.
- Modify the default exponential bucket counter to be an adapting circular buffer.
- Remove some not-well-thought-out methods (like zeroOf, zeroFrom) in favor of a "clear" method on ExponentialCounter.
- Modify ExponentialBucketStrategy to be an actual implementation.
force-pushed from b38a8f0 to d31a643
@jkwatson Give me another few hours (maybe 24) to do some cleanup. Given the discussion, I'm removing some unnecessary interfaces and improving some testing that was a bit too tied to the MapCounter implementation.
Ok, this should be ready. Further cleanups are needed, but I need to flesh out a collection-time benchmark first so we can measure performance on that path. Right now it's pretty much a huge memory hog.
Left a few places to double-check; if those are fine, this should be good to merge.
   * </ul>
   */
  private boolean sameBucketCounts(DoubleExponentialHistogramBuckets other) {
    int min = Math.min(this.counts.getIndexStart(), other.counts.getIndexStart());
Can you confirm this logic? Shouldn't it be something like a = this.start, b = other.start; a <= this.end && b <= other.end; a++ && b++?
It looks like we could refer to an index that is below start for one of the two buckets right now.
Yes, actually that's the point.
What can happen right now (with diff) is that we have a bucket, e.g. at index 2, which "diffs" to 0. In the unit tests, we instantiate the histogram buckets by saying "you should see count 5 at index 3", which means we have one bucket with index start at 3 and another with index start at 2. They're semantically equivalent counts.
The previous MapCounter implementation actually did GC, so any count that reached zero was removed from the map. Unfortunately, that's not really feasible in the circular buffer implementation.
This seems to be what you are saying:
bucket counts:
0,0,5,1,2,0,0 == 5,1,2
Makes sense to me.
Ah I didn't notice the change from throwing to returning 0 for those indices. Can you confirm there is a unit test for this case?
added
   */
  private boolean sameBucketCounts(DoubleExponentialHistogramBuckets other) {
    int min = Math.min(this.counts.getIndexStart(), other.counts.getIndexStart());
    int max = Math.max(this.counts.getIndexEnd(), other.counts.getIndexStart());
Should this be referencing other.getIndexEnd()?
ouch, good catch. I got lucky in the tests based on how it orders equals, will fix.
fixed
Can we add the unlucky unit test for this?
I'd rather find a way to make the unit tests NOT care about equality here, but yeah, I'll see if I can get a better unit test (but see my comment to James on alternatives here).
Updated the unit test. I think we may be able to do some refactoring of ExponentialHistogramBuckets at some point to clean up this code, and hopefully ensure we're not relying on hashCode/equals in production.
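For clarity, a hedged sketch of what the corrected comparison looks like (including the getIndexEnd() fix discussed above), assuming the counters' get(index) returns 0 outside the populated window, as noted earlier:

```java
interface Counter {
  int getIndexStart();

  int getIndexEnd();

  long get(int index); // returns 0 for indices outside [getIndexStart(), getIndexEnd()]
}

final class BucketEqualitySketch {
  static boolean sameBucketCounts(Counter a, Counter b) {
    // Walk the union of both windows so 0,0,5,1,2,0,0 compares equal to 5,1,2
    // even when the two counters have different index offsets.
    int min = Math.min(a.getIndexStart(), b.getIndexStart());
    int max = Math.max(a.getIndexEnd(), b.getIndexEnd());
    for (int idx = min; idx <= max; idx++) {
      if (a.get(idx) != b.get(idx)) {
        return false;
      }
    }
    return true;
  }
}
```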
    // If toCopy is an AdaptingCircularBuffer, just do a copy of the underlying array
    // and baseIndex.
    if (toCopy instanceof AdaptingCircularBufferCounter) {
      this.backing = new AdaptingIntegerArray(((AdaptingCircularBufferCounter) toCopy).backing);
toCopy indicates we're supposed to copy, but this doesn't appear to copy. Is it safe?
It does copy; it's using the convention of passing the thing to copy into the constructor, so new AdaptingIntegerArray(otherIntegerArray) performs a copy. Is there a different convention I should be using for that?
I think the issue is that the else statement does the copy in this code while the if doesn't. Both should have consistent semantics to prevent confusion like mine; is it possible to copy the backing here instead of relying on the constructor?
They actually both copy, just via different mechanisms.
Edited: Just noticed your comment. Do you mean you want me to add a .copy method to AdaptingIntegerArray instead of the constructor variant, to make it more obvious that it's copying? I guess my original question is asking whether new Collection(otherCollection) isn't already the Java convention for copying things. I'm happy either way; it just feels unidiomatic.
I think having a .copy method would be cleaner, or otherwise even a comment would be fine too. Currently this code block does not make it obvious that it is copying, due to the significant difference in pattern between the two if / else branches.
Is the comment on line 31 not enough here?
I switched to using a direct copy method, but I'm a bit confused here, as I had already added those comments a while ago to help keep this section clear.
This reads much better, thanks. The issue with the comment was that it said "do a copy" without the logic for the copy being there. If there had been a comment like "Constructor will copy the passed-in array" a bit below it, that could have been fine, but the copy method seems good.
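A minimal sketch of the explicit copy() shape this thread converged on (a simplified stand-in, not the PR's actual AdaptingIntegerArray):

```java
final class IntegerArraySketch {
  private final long[] backing;

  IntegerArraySketch(int size) {
    this.backing = new long[size];
  }

  private IntegerArraySketch(long[] toCopy) {
    this.backing = java.util.Arrays.copyOf(toCopy, toCopy.length);
  }

  /** Reads as a copy at the call site, e.g. other.backing.copy(). */
  IntegerArraySketch copy() {
    return new IntegerArraySketch(this.backing);
  }

  void increment(int idx) {
    backing[idx]++;
  }
}
```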
    return getBucketCounts().equals(other.getBucketCounts())
        && this.getOffset() == other.getOffset()
        && this.scale == other.scale;
    return this.scale == other.scale && sameBucketCounts(other);
Why have you omitted the offset from hashCode() and equals()? I would think the offset needs to be the same for two bucket sets to be strictly equal.
edit: oh I see, you are comparing across an extended window of both buckets. I suppose two bucket sets can have different offsets but represent the same measurements. The buckets with the lower offset must have all zeroes up until the first non-zero bucket in the other bucket set in order for this to be the case.
Exactly the case. Additionally, the other bucket set needs to have ONLY zeros after the first bucket set ends.
We could fix this in the diff method with a "normalize"-type method that ensures the first index is populated. I'm fine going that route if you think it's cleaner.
Definitely worth a comment in here, either way.
Yeah agreed, could use a comment.
Great point. I had documented the other method, but in-context, it wasn't very good. Added here.
…al-benchmark-and-comparisons
…al-benchmark-and-comparisons
A few results to call out:
- MapCounter is pretty bad in memory usage, consuming a large amount and requiring GC during perf benchmarks.
- AdaptingCircularBufferCounter is less memory efficient than ExplicitBucketHistogram, but way more memory efficient than MapCounter.
- AdaptingCircularBufferCounter is more performant (wall-clock time) than MapCounter (~95 ms [1 ms stdev] vs. 116 ms [3 ms stdev]).

Given ExponentialHistogram is not released publicly, the idea here is to continue experimenting with the implementation and tuning parameters until we find the best available for our instrumentation package. I'd expect us to prune down the implementation a bit prior to an official ExponentialHistogram release.