Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix total_byte_size in Parquet row group metadata #14802

Merged
merged 8 commits into from
Jan 23, 2024

Conversation

etseidl
Copy link
Contributor

@etseidl etseidl commented Jan 19, 2024

Description

The total_byte_size field in the row group metadata should be "[t]otal byte size of all the uncompressed column data in this row group". cuDF currently populates this field with the compressed size. This PR fixes that and adds a test.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@etseidl etseidl requested a review from a team as a code owner January 19, 2024 18:36
@etseidl etseidl requested review from vyasr and shrshi January 19, 2024 18:36
Copy link

copy-pr-bot bot commented Jan 19, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jan 19, 2024
Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch! I wonder how this impacted the chunked reader.

@vuule vuule requested a review from nvdbaranec January 19, 2024 19:10
Co-authored-by: Vukasin Milovanovic <[email protected]>
@vuule vuule added bug Something isn't working cuIO cuIO issue non-breaking Non-breaking change labels Jan 19, 2024
@vuule
Copy link
Contributor

vuule commented Jan 19, 2024

/ok to test

Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One very minor suggestion, but feel free to skip. Nice find!

cpp/tests/io/parquet_writer_test.cpp Outdated Show resolved Hide resolved
cpp/tests/io/parquet_writer_test.cpp Outdated Show resolved Hide resolved
@vyasr
Copy link
Contributor

vyasr commented Jan 19, 2024

/ok to test

@vuule
Copy link
Contributor

vuule commented Jan 19, 2024

/ok to test
checking if I can add to the command to add some variety to my comments

@vuule
Copy link
Contributor

vuule commented Jan 22, 2024

/ok to test

@vuule
Copy link
Contributor

vuule commented Jan 23, 2024

/merge

@rapids-bot rapids-bot bot merged commit ef3ce4b into rapidsai:branch-24.02 Jan 23, 2024
67 of 68 checks passed
@etseidl etseidl deleted the fix_row_group_size branch January 23, 2024 01:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants