Improve ApproxPercentileAccumulator merge api and fix bug #10056
@@ -284,7 +284,9 @@ impl ApproxPercentileAccumulator {
     }

     pub(crate) fn merge_digests(&mut self, digests: &[TDigest]) {
-        self.digest = TDigest::merge_digests(digests);
+        let mut input_digests = digests.to_vec();
When an Accumulator calls merge(), it should not lose its internal state. Overwriting that state is not a good API design.
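To illustrate the concern, here is a minimal sketch that uses a hypothetical `ToyDigest` (only a count and a max) in place of DataFusion's real `TDigest`. The names `ToyDigest`, `ToyAccumulator`, `merge_overwriting`, and `merge_keeping_state` are assumptions made for this example, and the last line of `merge_keeping_state` is a guess at how the diff continues, since the hunk above is cut off.

```rust
// Hypothetical stand-in for TDigest: it tracks only a count and a max so
// that lost state is easy to see. This is not DataFusion's real type.
#[derive(Clone, Debug, PartialEq)]
struct ToyDigest {
    count: u64,
    max: f64,
}

impl ToyDigest {
    fn from_values(values: &[f64]) -> Self {
        let max = values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
        ToyDigest { count: values.len() as u64, max }
    }

    // Builds a new digest from the inputs only, mirroring the shape of the
    // `TDigest::merge_digests(digests)` call in the diff.
    fn merge_digests(digests: &[ToyDigest]) -> ToyDigest {
        digests.iter().fold(
            ToyDigest { count: 0, max: f64::NEG_INFINITY },
            |acc, d| ToyDigest {
                count: acc.count + d.count,
                max: acc.max.max(d.max),
            },
        )
    }
}

struct ToyAccumulator {
    digest: ToyDigest,
}

impl ToyAccumulator {
    // Shape of the old code: the accumulator's own digest is replaced, so
    // everything it had seen before the merge is dropped.
    fn merge_overwriting(&mut self, digests: &[ToyDigest]) {
        self.digest = ToyDigest::merge_digests(digests);
    }

    // Shape of the change in the diff: the accumulator's own digest is
    // added to the inputs before merging (final line is an assumption).
    fn merge_keeping_state(&mut self, digests: &[ToyDigest]) {
        let mut input_digests = digests.to_vec();
        input_digests.push(self.digest.clone());
        self.digest = ToyDigest::merge_digests(&input_digests);
    }
}
```

With an accumulator that has already seen three values, merging one single-value digest through `merge_overwriting` leaves a count of 1, while `merge_keeping_state` leaves a count of 4.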
The change makes sense to me, but it is hard to review without a test that demonstrates the incorrect behavior. Would it be possible to add a unit test as part of this PR?
100% agree we should add a test for this fix (otherwise we may break the behavior again in a subsequent refactoring, for example)
Marking as draft as I think this PR is waiting on a test
Sorry for the delay, I added a test on digest merge.
add test for accumulator merge_digests
Looks good to me -- thanks @Ted-Jiang
I ran the test without the changes in this PR and verified it failed:
---- aggregate::approx_percentile_cont::tests::test_combine_approx_percentile_accumulator stdout ----
thread 'aggregate::approx_percentile_cont::tests::test_combine_approx_percentile_accumulator' panicked at datafusion/physical-expr/src/aggregate/approx_percentile_cont.rs:471:9:
assertion `left == right` failed
left: 50000.0
right: 100000.0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
aggregate::approx_percentile_cont::tests::test_combine_approx_percentile_accumulator
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 1574 filtered out; finished in 0.01s
I think this code introduces some unnecessary cloning, but that can be avoided using something like Ted-Jiang#118
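One way a 50000.0 versus 100000.0 mismatch like the one above could arise is sketched below. The data split (1..=50000 and 50001..=100000) and the names `MaxDigest` and `merged_max` are assumptions for illustration. The actual test body is not shown in this thread.

```rust
// Hypothetical stand-in for a digest: it remembers only the largest value.
#[derive(Clone, Debug)]
struct MaxDigest {
    max: f64,
}

impl MaxDigest {
    fn from_range(range: std::ops::RangeInclusive<u32>) -> Self {
        let max = range.map(f64::from).fold(f64::NEG_INFINITY, f64::max);
        MaxDigest { max }
    }

    fn merge_digests(digests: &[MaxDigest]) -> MaxDigest {
        let max = digests
            .iter()
            .map(|d| d.max)
            .fold(f64::NEG_INFINITY, f64::max);
        MaxDigest { max }
    }
}

// Returns the max after merging `incoming` into an accumulator holding `own`.
// With `keep_own_state == false` the accumulator's own digest is ignored,
// which is the behaviour the regression test is meant to catch.
fn merged_max(own: &MaxDigest, incoming: &[MaxDigest], keep_own_state: bool) -> f64 {
    let mut inputs = incoming.to_vec();
    if keep_own_state {
        inputs.push(own.clone());
    }
    MaxDigest::merge_digests(&inputs).max
}
```

If the accumulator holds 50001..=100000 and receives a digest built from 1..=50000, ignoring its own state yields 50000.0, and keeping it yields 100000.0.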
@@ -284,7 +284,9 @@ impl ApproxPercentileAccumulator {
     }

     pub(crate) fn merge_digests(&mut self, digests: &[TDigest]) {
-        self.digest = TDigest::merge_digests(digests);
+        let mut input_digests = digests.to_vec();
+        input_digests.push(self.digest.clone());
I think it is possible to avoid these clones. Here is a proposal that targets this PR to do so: Ted-Jiang#118
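The contents of Ted-Jiang#118 are not reproduced in this thread. The sketch below shows one general way to avoid the `to_vec()` and `clone()` calls, assuming the merge function can accept an iterator of borrowed digests. `ToyDigest` and the iterator-based signature are hypothetical, not DataFusion's confirmed API.

```rust
use std::iter;

// Hypothetical stand-in for TDigest that tracks only a count and a sum.
#[derive(Debug, PartialEq)]
struct ToyDigest {
    count: u64,
    sum: f64,
}

impl ToyDigest {
    // Assumed signature: any iterator over borrowed digests is accepted, so
    // callers do not need to build an owned Vec first.
    fn merge_digests<'a>(digests: impl IntoIterator<Item = &'a ToyDigest>) -> ToyDigest {
        digests
            .into_iter()
            .fold(ToyDigest { count: 0, sum: 0.0 }, |acc, d| ToyDigest {
                count: acc.count + d.count,
                sum: acc.sum + d.sum,
            })
    }
}

struct ToyAccumulator {
    digest: ToyDigest,
}

impl ToyAccumulator {
    fn merge_digests(&mut self, digests: &[ToyDigest]) {
        // Chain a borrow of the accumulator's own digest with the inputs.
        // Nothing is cloned; the borrow ends before `self.digest` is assigned.
        let digests = iter::once(&self.digest).chain(digests.iter());
        self.digest = ToyDigest::merge_digests(digests);
    }
}
```

The trade-off is a slightly more generic signature on the merge function in exchange for skipping one `Vec` allocation and one digest clone per merge.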
Thanks @Ted-Jiang
Reduce cloneing in ApproxPercentileAccumulator
* improve ApproxPercentileAccumulator merge api and fix bug
* add test for accumulator merge_digests
* fix test
* Reduce cloneing in ApproxPercentileAccumulator

---------

Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
Closes #10055.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?