
Fix dictionary encoded page offsets in parquet writer #19032

Merged (1 commit) on Sep 14, 2023


@raunaqmorarka (Member) commented Sep 13, 2023

Description

When a column chunk is dictionary encoded, `dictionary_page_offset` should be populated with the offset of the dictionary page, and `data_page_offset` should be
the offset of the first data page.
Setting these values correctly allows the Parquet reader to avoid reading excess data in
`PredicateUtils#readDictionaryPage` when checking whether a predicate can prune a row group based on dictionary values.
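The offset semantics described above can be illustrated with a small Java sketch. `DictionaryOffsets`, `ColumnChunkOffsets`, `forWrittenChunk`, `chunkStart` and `dictionaryReadLength` are hypothetical names, not Trino's or parquet-format's API; the record components only mirror the Thrift `ColumnMetaData` fields `dictionary_page_offset`, `data_page_offset` and `total_compressed_size`. The whole-chunk fallback is an assumed conservative reader behavior, not a description of how `PredicateUtils#readDictionaryPage` is implemented.

```java
import java.util.OptionalLong;

public final class DictionaryOffsets {

    // Hypothetical model of the offsets stored in a column chunk's metadata.
    // All offsets are absolute byte positions from the start of the file.
    record ColumnChunkOffsets(OptionalLong dictionaryPageOffset, long dataPageOffset, long totalCompressedSize) {

        // Writer side (assumed layout): the dictionary page is written at the start
        // of the chunk, so the first data page begins right after it.
        static ColumnChunkOffsets forWrittenChunk(long chunkStart, long dictionaryPageBytes, long dataPagesBytes) {
            if (dictionaryPageBytes == 0) {
                // No dictionary: the chunk starts with its first data page.
                return new ColumnChunkOffsets(OptionalLong.empty(), chunkStart, dataPagesBytes);
            }
            return new ColumnChunkOffsets(
                    OptionalLong.of(chunkStart),
                    chunkStart + dictionaryPageBytes,
                    dictionaryPageBytes + dataPagesBytes);
        }

        // Start of the chunk: the dictionary page if present, otherwise the first data page.
        long chunkStart() {
            return dictionaryPageOffset.orElse(dataPageOffset);
        }

        // Reader side: bytes to fetch in order to read the dictionary page.
        // With both offsets set, the dictionary page is exactly [dictionaryPageOffset, dataPageOffset).
        // Otherwise this sketch assumes a conservative fallback of reading the whole chunk.
        long dictionaryReadLength() {
            if (dictionaryPageOffset.isPresent() && dictionaryPageOffset.getAsLong() < dataPageOffset) {
                return dataPageOffset - dictionaryPageOffset.getAsLong();
            }
            return totalCompressedSize;
        }
    }

    public static void main(String[] args) {
        // Correct metadata: a 1,000-byte dictionary page followed by 1,000,000 bytes of data pages.
        ColumnChunkOffsets correct = ColumnChunkOffsets.forWrittenChunk(4, 1_000, 1_000_000);
        // Incorrect metadata for the same chunk: no dictionary offset, data offset at the chunk start.
        ColumnChunkOffsets incorrect = new ColumnChunkOffsets(OptionalLong.empty(), 4, 1_001_000);

        System.out.println(correct.dictionaryReadLength());   // 1000
        System.out.println(incorrect.dictionaryReadLength()); // 1001000
    }
}
```

With both offsets populated, the dictionary read in this sketch is bounded by the dictionary page itself; the fallback branch shows how missing or misplaced offsets could force a reader to fetch far more of the chunk than the dictionary needs.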

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Hive, Iceberg, Delta
* Reduce the amount of data read from Parquet files for queries with filters. ({issue}`19032`)

@raunaqmorarka merged commit 8392509 into trinodb:master on Sep 14, 2023.
@raunaqmorarka deleted the pqr-fix-offset branch on Sep 14, 2023 at 18:31.
The github-actions bot added this to the 427 milestone on Sep 14, 2023.