
GH-44008: [C++][Parquet] Add support for arrow::ArrayStatistics: boolean #44009

Merged: kou merged 1 commit into apache:main from cpp-parquet-statistics-bool on Sep 19, 2024


kou (Member) commented Sep 9, 2024

Rationale for this change

Statistics are useful for fast processing, for example to skip data that cannot match a filter without reading the values.

Target types:

  • Boolean

What changes are included in this PR?

Map the statistics stored in ColumnChunkMetaData to arrow::ArrayStatistics.
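To illustrate the idea, here is a minimal sketch of such a mapping for a boolean column. Both structs are hypothetical stand-ins invented for this example; they are not the real parquet::ColumnChunkMetaData, parquet::Statistics, or arrow::ArrayStatistics types, whose exact members are not shown in this PR. The sketch shows only that statistics present in the chunk metadata are copied over and absent ones are left unset.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the boolean statistics recorded in a Parquet
// column chunk's metadata (NOT the real parquet:: API).
struct BooleanChunkStatsSketch {
  bool has_min_max = false;
  bool min = false;
  bool max = false;
  bool has_null_count = false;
  int64_t null_count = 0;
};

// Hypothetical, simplified mirror of Arrow-side array statistics
// (NOT the real arrow::ArrayStatistics type).
struct ArrayStatisticsSketch {
  std::optional<int64_t> null_count;
  std::optional<bool> min;
  std::optional<bool> max;
};

// Copy each statistic only if the chunk metadata actually provides it;
// anything missing stays std::nullopt so consumers know it is unknown.
ArrayStatisticsSketch ToArrayStatistics(const BooleanChunkStatsSketch& chunk) {
  ArrayStatisticsSketch stats;
  if (chunk.has_null_count) {
    stats.null_count = chunk.null_count;
  }
  if (chunk.has_min_max) {
    stats.min = chunk.min;
    stats.max = chunk.max;
  }
  return stats;
}
```

Using std::optional keeps "not recorded" distinct from a recorded false or 0, which matters because a writer may omit min/max or null_count entirely.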

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes.

kou requested a review from wgtmac as a code owner September 9, 2024 01:54

github-actions bot commented Sep 9, 2024

⚠️ GitHub issue #44008 has been automatically assigned in GitHub to PR creator.

kou force-pushed the cpp-parquet-statistics-bool branch from 7dddc55 to 9e26ec0 September 9, 2024 01:55
Comment on lines +346 to +347
array_statistics->min = static_cast<bool>(min);
array_statistics->max = static_cast<bool>(max);
Member


Are min/max statistics useful for booleans? Are they useful only for pruning, or also for aggregation when all values are the same?

Member Author


Yes. If both min and max are true, we can assume that the column contains only true values (plus nulls if null_count is not 0).
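The reasoning in this reply can be sketched as a small helper that summarizes what a boolean column's statistics reveal. The function name and its return strings are hypothetical and exist only for illustration; this is not an Arrow or Parquet API. It assumes min, max and null_count are each optional, because a writer may omit them.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper (not part of Arrow or Parquet): describe what the
// statistics alone tell us about a boolean column's contents.
std::string DescribeBooleanColumn(std::optional<bool> min,
                                  std::optional<bool> max,
                                  std::optional<int64_t> null_count) {
  std::string values;
  if (!min.has_value() || !max.has_value()) {
    values = "unknown values";     // no min/max recorded
  } else if (*min && *max) {
    values = "only true";          // min == max == true
  } else if (!*min && !*max) {
    values = "only false";         // min == max == false
  } else {
    values = "true and/or false";  // min == false, max == true
  }

  std::string nulls;
  if (!null_count.has_value()) {
    nulls = "nulls possible";      // null count not recorded
  } else if (*null_count == 0) {
    nulls = "no nulls";
  } else {
    nulls = "some nulls";
  }
  return values + ", " + nulls;
}
```

For example, min = true, max = true and null_count = 0 yields "only true, no nulls", so a filter such as `col = false` could skip the chunk, and aggregates such as `all(col)` could be answered without reading the values.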

github-actions bot added the "awaiting changes" label and removed the "awaiting committer review" label Sep 9, 2024
kou (Member Author) commented Sep 19, 2024

I'll merge this within the week if nobody objects.

kou merged commit 192d232 into apache:main Sep 19, 2024
36 of 38 checks passed
kou deleted the cpp-parquet-statistics-bool branch September 19, 2024 21:39
kou removed the "awaiting changes" label Sep 19, 2024

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 192d232.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 29 possible false positives for unstable benchmarks that are known to sometimes produce them.


3 participants