-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix total_byte_size in Parquet row group metadata #14802
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch! I wonder how this impacted the chunked reader.
Co-authored-by: Vukasin Milovanovic <[email protected]>
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One very minor suggestion, but feel free to skip. Nice find!
/ok to test |
/ok to test |
/ok to test |
/merge |
Description
The
total_byte_size
field in the row group metadata should be "[t]otal byte size of all the uncompressed column data in this row group". cuDF currently populates this field with the compressed size. This PR fixes that and adds a test.Checklist