GH-43382: [C++][Parquet] min-max Statistics doesn't work well when one of min-max is truncated #43383

Merged: 3 commits merged into apache:main from fix-min-max-stats on Aug 5, 2024

Conversation

mapleFU
Member

@mapleFU mapleFU commented Jul 23, 2024

Rationale for this change

See #43382

What changes are included in this PR?

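Change the statistics' has-min-max condition from "has min || has max" to "has min && has max", so that min-max is reported only when both bounds are present.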
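To illustrate the difference, here is a minimal, self-contained sketch. It is not the actual Parquet C++ code: `EncodedStatsSketch`, `HasMinMaxEither`, and `HasMinMaxBoth` are hypothetical names invented for this example, and an empty `std::optional` is only an assumed way to model a bound that was dropped.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for the encoded statistics of one column chunk.
// An empty optional models a bound that is missing (for example, dropped
// because it was truncated).
struct EncodedStatsSketch {
  std::optional<std::string> min;
  std::optional<std::string> max;
};

// Old condition: reports "has min-max" when either bound is present.
inline bool HasMinMaxEither(const EncodedStatsSketch& s) {
  return s.min.has_value() || s.max.has_value();
}

// New condition: reports "has min-max" only when both bounds are present.
inline bool HasMinMaxBoth(const EncodedStatsSketch& s) {
  return s.min.has_value() && s.max.has_value();
}
```

With only one bound present, the old condition returns true and a caller may go on to read a bound that does not exist; the new condition returns false, which is the conservative answer.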

Are these changes tested?

  • TODO

Are there any user-facing changes?

This might affect interfaces that use HasMinMax.

This PR includes breaking changes to public APIs.

@mapleFU mapleFU requested a review from wgtmac as a code owner July 23, 2024 05:22
⚠️ GitHub issue #43382 has been automatically assigned in GitHub to PR creator.

@mapleFU
Member Author

mapleFU commented Jul 23, 2024

This is just a first attempt; I can also split this into separate has-min and has-max checks.

@mapleFU
Member Author

mapleFU commented Jul 23, 2024

@pitrou @wgtmac would you mind taking a look at whether this fix is OK, or whether there is a better API? I'll draft the tests once the API is stable.

@wgtmac
Member

wgtmac commented Jul 23, 2024

I'm fine with being conservative, so this change looks good. However, encoded_min and encoded_max are still set. I'd propose deprecating virtual bool HasMinMax() const = 0; and adding virtual bool HasMin() const = 0; and virtual bool HasMax() const = 0;
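A rough sketch of what that proposal could look like, under stated assumptions: `StatisticsSketch` and `HasBothBounds` are hypothetical names, the standard `[[deprecated]]` attribute stands in for whatever deprecation mechanism the project actually uses, and giving `HasMinMax()` a default body is a choice made for this example only.

```cpp
// Hypothetical interface, not the real parquet::Statistics class.
class StatisticsSketch {
 public:
  virtual ~StatisticsSketch() = default;

  // Proposed additions: each bound is queried independently.
  virtual bool HasMin() const = 0;
  virtual bool HasMax() const = 0;

  // Kept for existing callers, marked deprecated, and defined
  // conservatively in terms of the two new accessors.
  [[deprecated("use HasMin() and HasMax() instead")]]
  virtual bool HasMinMax() const { return HasMin() && HasMax(); }
};

// Callers that need both bounds can combine the accessors explicitly.
inline bool HasBothBounds(const StatisticsSketch& stats) {
  return stats.HasMin() && stats.HasMax();
}
```

Separate accessors would let a caller use whichever bound is valid instead of discarding both when only one is missing.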

@mapleFU
Member Author

mapleFU commented Aug 3, 2024

Updated: I'll split the implementation into two separate patches:

  • The first adds a test and removes the || in HasMinMax, changing the semantics to require both has-min and has-max.
  • The second adds HasMin and HasMax.

Anyway, this is a bug rather than a feature, so we should fix it first.

@mapleFU
Member Author

mapleFU commented Aug 5, 2024

@kou @emkornfield @wgtmac would you mind taking a look? This patch only changes || to &&.

Member

@kou kou left a comment

+1

Could you uncomment <!-- **This PR includes breaking changes to public APIs.** --> in the PR description?

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting review Awaiting review labels Aug 5, 2024
@wgtmac wgtmac changed the title GH-43382: [C++][Parquet] min-max Statistics doesn't works well when one of min-max beging truncated GH-43382: [C++][Parquet] min-max Statistics doesn't work well when one of min-max is truncated Aug 5, 2024
Member

@wgtmac wgtmac left a comment

+1

@mapleFU mapleFU merged commit 66cb749 into apache:main Aug 5, 2024
35 of 39 checks passed
@mapleFU mapleFU removed the awaiting merge Awaiting merge label Aug 5, 2024
@mapleFU mapleFU deleted the fix-min-max-stats branch August 5, 2024 08:29
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 66cb749.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

3 participants