Bind `read_parquet_metadata` API to libcudf instead of pyarrow and extract `RowGroup` information #15398
Thanks for working on this @mhaseeb123! Looks like there are some missing columns that are resulting in pytest failures: https://github.com/rapidsai/cudf/actions/runs/8461251922/job/23181884309?pr=15398#step:8:1758
Thanks for reviewing the code @galipremsagar. I was already on it and just pushed a fix. The tests should pass now. 🙂
few small questions/suggestions
/merge
Description
`cudf.io.read_parquet_metadata` is now bound to the corresponding libcudf API instead of relying on pyarrow. The libcudf API now also returns high-level `RowGroup` metadata to solve #11214. Added additional tests and doc updates as well.

More metadata, such as `min` and `max` values for each column in each row group, can also be extracted and returned if needed. Thoughts?

Recommend: Closing #15320 without merging in favor of this PR.
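
As a rough illustration of why row-group-level metadata can be useful to callers, here is a minimal sketch that groups row groups into read batches under a byte budget. It does not call cudf at all. The per-row-group dict shape (`num_rows`, `total_byte_size`) is an assumption made for this example, and `plan_row_group_batches` is a hypothetical helper, not something added by this PR.

```python
from typing import Dict, List


def plan_row_group_batches(
    row_groups: List[Dict[str, int]], byte_budget: int
) -> List[List[int]]:
    """Greedily pack consecutive row-group indices into batches.

    Each entry of `row_groups` is assumed (for illustration only) to be a
    dict with a "total_byte_size" key. A row group larger than the budget
    still gets a batch of its own.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for index, row_group in enumerate(row_groups):
        size = row_group["total_byte_size"]
        # Start a new batch when adding this row group would exceed the budget.
        if current and current_bytes + size > byte_budget:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


# Hypothetical metadata for a file with four row groups.
example_row_groups = [
    {"num_rows": 1000, "total_byte_size": 400},
    {"num_rows": 1000, "total_byte_size": 500},
    {"num_rows": 500, "total_byte_size": 300},
    {"num_rows": 200, "total_byte_size": 900},
]
batches = plan_row_group_batches(example_row_groups, byte_budget=1000)
print(batches)
```

Each inner list holds the row-group indices that could be read together in one pass. The packing is deliberately simple (consecutive and greedy); a real consumer might weigh `num_rows` or column statistics as well.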
Checklist