Return None when Parquet page indexes are not present in file #6639

Merged
4 commits merged into apache:main from missing_page_index on Nov 24, 2024

Conversation

etseidl
Contributor

@etseidl etseidl commented Oct 29, 2024

Which issue does this PR close?

Part of #6447. Also see #6582.

Rationale for this change

The behavior of ParquetMetaDataReader when requesting page indexes differs between synchronous and asynchronous implementations. For historical reasons, the synchronous methods currently return empty vectors for the ColumnIndex and OffsetIndex when page indexes are requested but not present in the file. The asynchronous methods instead return None in that case.

What changes are included in this PR?

This PR changes the behavior of ParquetMetaDataReader to always return None when page indexes are requested but not present. It also changes the behavior and signatures of the legacy functions read_columns_indexes and read_offset_indexes. These will now return optional vectors set to None rather than empty vectors when page indexes are not present.
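The difference between the two conventions can be sketched as follows. This is a minimal illustration only: `PageIndexEntry`, `read_indexes_legacy`, `read_indexes`, and `describe` are hypothetical stand-ins and are not the parquet crate's actual types or signatures. The point is that an `Option` lets a caller distinguish "no page index in the file" from "a page index with zero entries", which an empty vector cannot.

```rust
// Hypothetical stand-in for a page index entry; NOT the parquet crate's real type.
#[derive(Debug, Clone, PartialEq)]
struct PageIndexEntry {
    first_row: usize,
}

/// Models the old synchronous behavior (assumed shape): a missing page
/// index is reported as an empty vector.
fn read_indexes_legacy(stored: Option<&[PageIndexEntry]>) -> Vec<PageIndexEntry> {
    stored.map(|s| s.to_vec()).unwrap_or_default()
}

/// Models the behavior after this PR (assumed shape): a missing page
/// index is reported as `None`.
fn read_indexes(stored: Option<&[PageIndexEntry]>) -> Option<Vec<PageIndexEntry>> {
    stored.map(|s| s.to_vec())
}

/// Example caller: absence of the index is handled explicitly instead of
/// being inferred from an empty vector.
fn describe(indexes: &Option<Vec<PageIndexEntry>>) -> String {
    match indexes {
        None => "no page index in file".to_string(),
        Some(entries) => format!("{} index entries", entries.len()),
    }
}
```

Callers that previously checked `is_empty()` on the returned vector would, under this model, switch to matching on `None`.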

Are there any user-facing changes?

Yes, as noted above.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Oct 29, 2024
@tustvold tustvold added api-change Changes to the arrow API next-major-release the PR has API changes and it waiting on the next major version labels Oct 29, 2024
@tustvold tustvold merged commit 73a0c26 into apache:main Nov 24, 2024
16 checks passed
@etseidl etseidl deleted the missing_page_index branch November 25, 2024 23:54
2 participants