Improve ApproxPercentileAccumulator merge api and fix bug #10056

Merged
merged 5 commits into from
Apr 16, 2024

Ted-Jiang
Member

Which issue does this PR close?

Closes #10055.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Physical Expressions label Apr 12, 2024
@@ -284,7 +284,9 @@ impl ApproxPercentileAccumulator {
     }

     pub(crate) fn merge_digests(&mut self, digests: &[TDigest]) {
-        self.digest = TDigest::merge_digests(digests);
+        let mut input_digests = digests.to_vec();
Member Author

@Ted-Jiang Ted-Jiang Apr 12, 2024

When an Accumulator calls merge(), it should not lose its inner state; overwriting that state is not a good API design.
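
To illustrate the point above, here is a minimal sketch of the two merge behaviours. It does not use DataFusion's real types: `ToyDigest`, `ToyAccumulator`, `merge_overwriting` and `merge_keeping_state` are hypothetical names invented for this example, and the toy digest simply stores every value instead of compressing them as a real t-digest would.

```rust
/// Hypothetical stand-in for a digest: it keeps every value it has seen.
/// (A real t-digest compresses values into centroids; this toy does not.)
#[derive(Clone, Debug, Default, PartialEq)]
struct ToyDigest {
    values: Vec<f64>,
}

impl ToyDigest {
    /// Combine any number of digests into one new, sorted digest.
    fn merge_digests(digests: &[ToyDigest]) -> ToyDigest {
        let mut values: Vec<f64> = digests
            .iter()
            .flat_map(|d| d.values.iter().copied())
            .collect();
        values.sort_by(|a, b| a.total_cmp(b));
        ToyDigest { values }
    }

    /// Number of values summarised by this digest.
    fn count(&self) -> usize {
        self.values.len()
    }
}

/// Hypothetical accumulator that owns a running digest.
#[derive(Default)]
struct ToyAccumulator {
    digest: ToyDigest,
}

impl ToyAccumulator {
    /// Fold a batch of raw values into the running digest.
    fn update(&mut self, values: &[f64]) {
        let batch = ToyDigest { values: values.to_vec() };
        self.digest = ToyDigest::merge_digests(&[self.digest.clone(), batch]);
    }

    /// Problematic shape: the accumulator's own digest is replaced,
    /// so everything it had accumulated before the merge is lost.
    fn merge_overwriting(&mut self, digests: &[ToyDigest]) {
        self.digest = ToyDigest::merge_digests(digests);
    }

    /// Safer shape: the accumulator's own digest takes part in the merge.
    fn merge_keeping_state(&mut self, digests: &[ToyDigest]) {
        let mut input = digests.to_vec();
        input.push(self.digest.clone());
        self.digest = ToyDigest::merge_digests(&input);
    }
}
```

With an accumulator that has already seen three values, merging in a two-value digest leaves two values under `merge_overwriting` and five under `merge_keeping_state`.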

Member

The change makes sense to me, but it is hard to review without a test that demonstrates the incorrect behavior. Would it be possible to add a unit test as part of this PR?

Contributor

100% agree we should add a test for this fix (otherwise we may break the behavior again in a subsequent refactoring, for example)

@Ted-Jiang Ted-Jiang requested a review from alamb April 12, 2024 08:22
@Ted-Jiang
Member Author

@alamb @jychen7 PTAL

@alamb alamb marked this pull request as draft April 13, 2024 13:36
@alamb
Contributor

alamb commented Apr 13, 2024

Marking as draft as I think this PR is waiting on a test

@Ted-Jiang Ted-Jiang marked this pull request as ready for review April 15, 2024 03:29
@Ted-Jiang
Member Author

Ted-Jiang commented Apr 15, 2024

Sorry for the delay. I added a test for the digest merge in this commit: add test for accumulator merge_digests

Contributor

@alamb alamb left a comment

Looks good to me -- thanks @Ted-Jiang

I ran the test without the changes in this PR and verified it failed:


---- aggregate::approx_percentile_cont::tests::test_combine_approx_percentile_accumulator stdout ----
thread 'aggregate::approx_percentile_cont::tests::test_combine_approx_percentile_accumulator' panicked at datafusion/physical-expr/src/aggregate/approx_percentile_cont.rs:471:9:
assertion `left == right` failed
  left: 50000.0
 right: 100000.0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    aggregate::approx_percentile_cont::tests::test_combine_approx_percentile_accumulator

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 1574 filtered out; finished in 0.01s

I think this code introduces some unnecessary cloning, but that can be avoided using something like Ted-Jiang#118.

@@ -284,7 +284,9 @@ impl ApproxPercentileAccumulator {
     }

     pub(crate) fn merge_digests(&mut self, digests: &[TDigest]) {
-        self.digest = TDigest::merge_digests(digests);
+        let mut input_digests = digests.to_vec();
+        input_digests.push(self.digest.clone());
Contributor

I think it is possible to avoid these clones. Here is a proposal targeting this PR that does so: Ted-Jiang#118
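
One general way to avoid such clones is to have the merge function accept an iterator of borrowed digests, then chain the accumulator's own digest onto the incoming slice. The sketch below shows that technique on toy types; it is not the content of Ted-Jiang#118, and `ToyDigest` and `ToyAccumulator` are hypothetical names rather than DataFusion's API.

```rust
use std::iter;

/// Hypothetical stand-in for a digest. It deliberately does not derive
/// `Clone`, so the compiler confirms that merging never copies a digest.
#[derive(Debug, Default, PartialEq)]
struct ToyDigest {
    values: Vec<f64>,
}

impl ToyDigest {
    /// Merge borrowed digests into a new one. Accepting any iterator of
    /// references lets callers combine several sources without building
    /// an intermediate Vec of owned digests.
    fn merge_digests<'a>(digests: impl IntoIterator<Item = &'a ToyDigest>) -> ToyDigest {
        let mut values: Vec<f64> = digests
            .into_iter()
            .flat_map(|d| d.values.iter().copied())
            .collect();
        values.sort_by(|a, b| a.total_cmp(b));
        ToyDigest { values }
    }
}

/// Hypothetical accumulator that owns a running digest.
#[derive(Default)]
struct ToyAccumulator {
    digest: ToyDigest,
}

impl ToyAccumulator {
    /// Merge the incoming digests together with the accumulator's own
    /// digest, borrowing all of them instead of cloning.
    fn merge_digests(&mut self, digests: &[ToyDigest]) {
        let merged = ToyDigest::merge_digests(
            digests.iter().chain(iter::once(&self.digest)),
        );
        self.digest = merged;
    }
}
```

The borrow of `self.digest` ends once the merged digest has been built, so assigning the result back to `self.digest` is accepted by the borrow checker.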

Member

@andygrove andygrove left a comment

Thanks @Ted-Jiang

Reduce cloneing in ApproxPercentileAccumulator
@Ted-Jiang Ted-Jiang merged commit 74b966e into apache:main Apr 16, 2024
24 checks passed
Omega359 pushed a commit to Omega359/arrow-datafusion that referenced this pull request Apr 16, 2024
* improve ApproxPercentileAccumulator merge api and fix bug

* add test for accumulator merge_digests

* fix test

* Reduce cloneing in ApproxPercentileAccumulator

---------

Co-authored-by: Andrew Lamb <[email protected]>
Successfully merging this pull request may close these issues.

Fix approx_percentile_cont_with_weight update_batch bug
3 participants