Paginated store #30

Merged
merged 5 commits into from
Nov 20, 2020

Conversation

@richardstartin richardstartin commented Nov 17, 2020

What does this PR do?

This PR provides a store which avoids storing ranges of zeros by storing counts in a paginated array.
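A minimal sketch of the idea, not the code in this PR: counts live in fixed-size pages that are allocated only when a bucket in that page is first touched, so a long run of empty buckets costs one null reference per page rather than a run of zeros. The class name `PagedCounts`, its methods, and the page size of 128 are all assumptions for illustration; the page size is inferred from the 1040-byte `[D` entries in the footprints below (16-byte array header plus 128 × 8 bytes on a typical 64-bit JVM).

```java
import java.util.Arrays;

/** Hypothetical illustration of a paginated count array; not the PaginatedStore from this PR. */
final class PagedCounts {
    // Assumed page size, inferred from the footprint output rather than from the source.
    static final int PAGE_SIZE = 128;

    private double[][] pages = new double[0][];
    private int minPageIndex; // page number held at pages[0]; only meaningful once pages is non-empty

    /** Adds count to the bucket at index, allocating that bucket's page on first use. */
    void add(int index, double count) {
        int pageIndex = Math.floorDiv(index, PAGE_SIZE);
        int offset = Math.floorMod(index, PAGE_SIZE);
        page(pageIndex)[offset] += count;
    }

    /** Returns the count at index, or 0 if its page was never allocated. */
    double get(int index) {
        int slot = Math.floorDiv(index, PAGE_SIZE) - minPageIndex;
        if (slot < 0 || slot >= pages.length || pages[slot] == null) {
            return 0.0;
        }
        return pages[slot][Math.floorMod(index, PAGE_SIZE)];
    }

    /** Number of pages that actually hold a double[]. */
    int allocatedPages() {
        int n = 0;
        for (double[] page : pages) {
            if (page != null) {
                n++;
            }
        }
        return n;
    }

    // Grows the page directory (cheap: one reference per page) and allocates the page lazily.
    private double[] page(int pageIndex) {
        if (pages.length == 0) {
            pages = new double[1][];
            minPageIndex = pageIndex;
        } else if (pageIndex < minPageIndex) {
            int shift = minPageIndex - pageIndex;
            double[][] grown = new double[pages.length + shift][];
            System.arraycopy(pages, 0, grown, shift, pages.length);
            pages = grown;
            minPageIndex = pageIndex;
        } else if (pageIndex - minPageIndex >= pages.length) {
            pages = Arrays.copyOf(pages, pageIndex - minPageIndex + 1);
        }
        int slot = pageIndex - minPageIndex;
        if (pages[slot] == null) {
            pages[slot] = new double[PAGE_SIZE];
        }
        return pages[slot];
    }
}
```

With this layout, adding to buckets 5 and 100000 allocates two pages plus a directory of references, where a single dense array would have to span every bucket in between.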

Motivation

To reduce the worst-case footprint when the data is bimodal, which leads to large ranges of zeros, without affecting (or needing to reason about) relative error.

Additional Notes

I don't expect to merge this as is: the positive-only sketch is being removed and I need somewhere to put the factory method, so expect a rebase against #28 during review.

This can have a sizeable impact on footprint. The first three footprints below use PaginatedStore, the last three UnboundedSizeDenseStore:

POISSON[0.01]/POISSON[0.99]
com.datadoghq.sketch.ddsketch.DDSketch@423158f0d footprint:
     COUNT       AVG       SUM   DESCRIPTION
       677      1040    704080   [D
         1     26224     26224   [[D
         1        48        48   com.datadoghq.sketch.ddsketch.DDSketch
         1        32        32   com.datadoghq.sketch.ddsketch.mapping.BitwiseLinearlyInterpolatedMapping
         1        24        24   com.datadoghq.sketch.ddsketch.store.PaginatedStore
       681              730408   (total)


POISSON[0.01]/POISSON[0.99]
com.datadoghq.sketch.ddsketch.DDSketch@5c1d7c70d footprint:
     COUNT       AVG       SUM   DESCRIPTION
       301      1040    313040   [D
         1      3920      3920   [[D
         1        48        48   com.datadoghq.sketch.ddsketch.DDSketch
         1        32        32   com.datadoghq.sketch.ddsketch.mapping.BitwiseLinearlyInterpolatedMapping
         1        24        24   com.datadoghq.sketch.ddsketch.store.PaginatedStore
       305              317064   (total)


POISSON[0.01]/POISSON[0.99]
com.datadoghq.sketch.ddsketch.DDSketch@4b5ee172d footprint:
     COUNT       AVG       SUM   DESCRIPTION
        36      1040     37440   [D
         1       304       304   [[D
         1        48        48   com.datadoghq.sketch.ddsketch.DDSketch
         1        32        32   com.datadoghq.sketch.ddsketch.mapping.BitwiseLinearlyInterpolatedMapping
         1        24        24   com.datadoghq.sketch.ddsketch.store.PaginatedStore
        40               37848   (total)



POISSON[0.01]/POISSON[0.99]
com.datadoghq.sketch.ddsketch.DDSketch@2271c060d footprint:
     COUNT       AVG       SUM   DESCRIPTION
         1   5255184   5255184   [D
         1        48        48   com.datadoghq.sketch.ddsketch.DDSketch
         1        32        32   com.datadoghq.sketch.ddsketch.mapping.BitwiseLinearlyInterpolatedMapping
         1        40        40   com.datadoghq.sketch.ddsketch.store.UnboundedSizeDenseStore
         4             5255304   (total)


POISSON[0.01]/POISSON[0.99]
com.datadoghq.sketch.ddsketch.DDSketch@3fbcff15d footprint:
     COUNT       AVG       SUM   DESCRIPTION
         1    647696    647696   [D
         1        48        48   com.datadoghq.sketch.ddsketch.DDSketch
         1        32        32   com.datadoghq.sketch.ddsketch.mapping.BitwiseLinearlyInterpolatedMapping
         1        40        40   com.datadoghq.sketch.ddsketch.store.UnboundedSizeDenseStore
         4              647816   (total)


POISSON[0.01]/POISSON[0.99]
com.datadoghq.sketch.ddsketch.DDSketch@7694bfabd footprint:
     COUNT       AVG       SUM   DESCRIPTION
         1     41488     41488   [D
         1        48        48   com.datadoghq.sketch.ddsketch.DDSketch
         1        32        32   com.datadoghq.sketch.ddsketch.mapping.BitwiseLinearlyInterpolatedMapping
         1        40        40   com.datadoghq.sketch.ddsketch.store.UnboundedSizeDenseStore
         4               41608   (total)

It would also be simple to implement a much faster merge routine in PaginatedStore, one that HotSpot's C2 compiler could vectorize. I may contribute that at a later date, but I don't immediately need faster aggregation.
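To illustrate the kind of loop meant here, a sketch under assumptions rather than the routine itself: if both stores keep their pages aligned to the same page boundaries, merging reduces to element-wise addition over pairs of `double[]` pages. A plain counted loop of that shape is the sort C2 can typically auto-vectorize, though whether it does depends on the JVM version and flags. `PageMerge`, `addInto`, and `mergeAligned` are hypothetical names, and the aligned-directory precondition is an assumption.

```java
/** Hypothetical page-wise merge; not part of this PR. */
final class PageMerge {

    /** Adds every count in source into target; both pages must have the same length. */
    static void addInto(double[] target, double[] source) {
        if (target.length != source.length) {
            throw new IllegalArgumentException("pages must have the same length");
        }
        // Simple counted loop with no branches in the body.
        for (int i = 0; i < target.length; i++) {
            target[i] += source[i];
        }
    }

    /**
     * Merges source into target, assuming slot p refers to the same page number in both
     * directories and that target has at least as many slots as source.
     */
    static void mergeAligned(double[][] target, double[][] source) {
        if (target.length < source.length) {
            throw new IllegalArgumentException("target directory is too small");
        }
        for (int p = 0; p < source.length; p++) {
            if (source[p] == null) {
                continue; // nothing recorded in this page
            }
            if (target[p] == null) {
                target[p] = source[p].clone(); // copy so the two stores do not share a page
            } else {
                addInto(target[p], source[p]);
            }
        }
    }
}
```

Pages that are absent in the source are skipped entirely, so the cost of a merge scales with the number of allocated pages rather than with the full index range.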

@richardstartin richardstartin force-pushed the paginated-store branch 3 times, most recently from 6de3b58 to cf1e5bb (November 17, 2020 12:19)

@bantonsson bantonsson left a comment

Looks good. Only some minor questions.

@richardstartin richardstartin force-pushed the paginated-store branch 2 times, most recently from 9fde60b to 08e6d7a (November 18, 2020 10:52)
* @param relativeAccuracy the relative accuracy guaranteed by the sketch
* @return a fast instance of {@code DDSketch}
*/
public static DDSketch fastPaginated(double relativeAccuracy) {
Contributor

I merged #28, which moves preset sketches to DDSketches and tries to use more explicit method names. Should we go for bitwiseLinearlyInterpolatedPaginated maybe?

Contributor Author

Sounds good. I just rebased. Perhaps we can leave it as is, because the constructors allow composition for now and, as discussed, this store doesn't come without costs for very small sketches.

@CharlesMasson CharlesMasson merged commit e67ec2e into DataDog:master Nov 20, 2020