Support for using tdigests to compute approximate percentiles. #8983
…e their usage when calling the groupby object.
… in python bindings.
Actually, I see the null percentile thing hasn't been resolved yet.
Updated. Null percentile == null output.
This is heady stuff. Much to learn.
Again, thank you for being flexible on the null handling.
@gpucibot merge
This PR builds on #8983 and adds Java bindings.

Authors:
- Andy Grove (https://github.com/andygrove)
- https://github.com/nvdbaranec

Approvers:
- Robert (Bobby) Evans (https://github.com/revans2)

URL: #9094
Addresses comments from the initial PR (#8983). Specifically, implements a tdigest_column_view for more cleanly accessing the various sub-columns of a tdigest column. Includes several bounds-checking fixes for empty groups. Also addresses an issue where entirely empty digests could lead to incorrect min/max values, which isn't technically _wrong_ but makes constructing test cases tricky.

Authors:
- https://github.com/nvdbaranec

Approvers:
- AJ Schmidt (https://github.com/ajschmidt8)
- Robert (Bobby) Evans (https://github.com/revans2)
- Jake Hemstad (https://github.com/jrhemstad)
- MithunR (https://github.com/mythrocks)
- Mike Wilson (https://github.com/hyperbolic2346)

URL: #9403
Addresses #7170
Adds 3 pieces of new functionality:

- `TDIGEST` aggregation which creates a tdigest column (https://arxiv.org/pdf/1902.04023.pdf) from a stream of input scalars.
- `MERGE_TDIGEST` aggregation which merges multiple tdigest columns into a new one.
- `percentile_approx` function which performs percentile queries on tdigest data.

Also exposes several ::detail functions (`sort`, `merge`, `slice`) in detail headers.

Ready for review. I do need to add more tests though.
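For readers new to t-digests, the three pieces listed above can be illustrated with a small pure-Python sketch. This is not the libcudf implementation or its API: the names `make_tdigest`, `merge_tdigests`, and `compress` are hypothetical, the clustering uses a uniform weight cap instead of the scale function from the t-digest paper, and the query clamps to the first and last centroid means rather than tracking min/max. The `None` handling only mirrors the "null percentile == null output" behaviour discussed in this thread.

```python
from typing import List, Optional, Sequence, Tuple

# A digest is a list of (mean, weight) centroids sorted by mean.
Centroid = Tuple[float, float]


def compress(centroids: Sequence[Centroid], max_centroids: int) -> List[Centroid]:
    """Greedily merge neighbouring centroids, aiming for about max_centroids.

    Assumption: a uniform per-centroid weight cap stands in for the
    t-digest scale function, so the bound is approximate, not strict.
    """
    ordered = sorted(centroids)
    total = sum(w for _, w in ordered)
    if total == 0:
        return []
    cap = total / max_centroids
    out: List[Centroid] = []
    cur_mean, cur_w = ordered[0]
    for mean, w in ordered[1:]:
        if cur_w + w <= cap:
            # Fold the neighbour into the current centroid (weighted mean).
            cur_mean = (cur_mean * cur_w + mean * w) / (cur_w + w)
            cur_w += w
        else:
            out.append((cur_mean, cur_w))
            cur_mean, cur_w = mean, w
    out.append((cur_mean, cur_w))
    return out


def make_tdigest(values: Sequence[float], max_centroids: int) -> List[Centroid]:
    """Stand-in for a TDIGEST aggregation: scalars -> one digest."""
    return compress([(float(v), 1.0) for v in values], max_centroids)


def merge_tdigests(digests: Sequence[Sequence[Centroid]], max_centroids: int) -> List[Centroid]:
    """Stand-in for a MERGE_TDIGEST aggregation: many digests -> one digest."""
    combined = [c for d in digests for c in d]
    return compress(combined, max_centroids)


def percentile_approx(digest: Sequence[Centroid], p: Optional[float]) -> Optional[float]:
    """Approximate the p-th quantile (p in [0, 1]) by linear interpolation."""
    if p is None or not digest:
        return None  # null percentile (or empty digest) gives null output
    total = sum(w for _, w in digest)
    target = p * total
    cum = 0.0
    prev_mean = prev_pos = None
    for mean, w in digest:
        pos = cum + w / 2.0  # treat the centroid as sitting mid-way through its weight
        if target <= pos:
            if prev_mean is None:
                return mean  # below the first centroid: clamp
            frac = (target - prev_pos) / (pos - prev_pos)
            return prev_mean + frac * (mean - prev_mean)
        prev_mean, prev_pos = mean, pos
        cum += w
    return digest[-1][0]  # above the last centroid: clamp


# Usage: digest two partitions separately, merge, then query the median.
a = make_tdigest([1, 2, 3, 4, 5], max_centroids=10)
b = make_tdigest([6, 7, 8, 9, 10], max_centroids=10)
merged = merge_tdigests([a, b], max_centroids=10)
median = percentile_approx(merged, 0.5)
```

The point of the build/merge split is that each partition of the data can be digested independently and the small digests combined later, which is what makes the structure useful for distributed groupby aggregations.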