Add `RowGroupMetaData` information to `cudf.io.read_parquet_metadata` #15320
Conversation
@mhaseeb123 Could you please re-target this to |
Done! |
/okay to test |
Looks good to me, just a couple of minor syntax comments.
Co-authored-by: GALI PREM SAGAR <[email protected]>
Thanks for the suggestions. All applied to the PR. |
/okay to test |
FWIW (I'm not authorized to review Python code)
@mhaseeb123 you will have to fix the following style check failures for CI to run all the jobs: https://github.com/rapidsai/cudf/actions/runs/8349021043/job/22852262213?pr=15320#step:6:177 |
Hello @mhaseeb123, please excuse the late feedback on this. I'm also hoping to expand our C++ parquet metadata tools as part of solving #11214. Please check out read_parquet_metadata and some of the metadata processing in EDIT: this looks good! We can keep C++ changes in a follow-on PR |
/okay to test |
/merge |
/ok to test |
#15398 supersedes this PR and provides desired functionality directly from libcudf instead of relying on pyarrow. |
Superseded by #15398 |
…tract `RowGroup` information (#15398)

`cudf.io.read_parquet_metadata` is now bound to the corresponding libcudf API instead of relying on pyarrow. The libcudf API now also returns high-level `RowGroup` metadata to solve #11214. Added additional tests and doc updates as well.

More metadata, such as the `min` and `max` values for each column in each row group, can also be extracted and returned if needed. Thoughts?

Recommend: Closing #15320 without merging in favor of this PR.

Authors:
- Muhammad Haseeb (https://github.com/mhaseeb123)
- GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Vukasin Milovanovic (https://github.com/vuule)
- Bradley Dice (https://github.com/bdice)

URL: #15398
Description
The `cudf.io.read_parquet_metadata` function now also returns a list of `RowGroupMetaData` objects for all row groups in the dataset, resolving #11214. Added additional tests and doc updates as well.
Checklist