
Remove read_orc options num_rows and skip_rows. #11519

Closed
1 of 2 tasks
vuule opened this issue Aug 12, 2022 · 4 comments
Labels
cuIO cuIO issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code.

Comments

@vuule
Contributor

vuule commented Aug 12, 2022

Options to select a row range independently of stripes and row groups are inefficient, poorly supported, and not supported by other readers. Users can instead select stripes to partially read large files.
The Parquet reader removed this feature over two releases, and the ORC reader can follow the same path:

  • deprecate in 22.10
  • remove in a later release
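To illustrate the suggested alternative, here is a minimal Python sketch. It assumes the per-stripe row counts are already known (for example, from the file's metadata). The helper `stripes_for_row_range` is hypothetical and is not part of cuDF. It maps a `skip_rows`/`num_rows` request onto the smallest contiguous run of stripes that covers it, and it returns how many leading rows of the first selected stripe must then be dropped.

```python
from bisect import bisect_right
from itertools import accumulate


def stripes_for_row_range(stripe_row_counts, skip_rows, num_rows):
    """Hypothetical helper (not a cuDF API).

    Returns (stripe_indices, offset): the stripes covering rows
    [skip_rows, skip_rows + num_rows) and the number of rows to drop
    from the start of the first selected stripe.
    """
    if skip_rows < 0 or num_rows < 0:
        raise ValueError("skip_rows and num_rows must be non-negative")
    # offsets[i] is the global index of the first row in stripe i;
    # offsets[-1] is the total row count.
    offsets = [0, *accumulate(stripe_row_counts)]
    end = min(skip_rows + num_rows, offsets[-1])
    if skip_rows >= end:
        return [], 0
    # bisect_right skips over empty stripes that share a start offset.
    first = bisect_right(offsets, skip_rows) - 1
    last = bisect_right(offsets, end - 1) - 1
    return list(range(first, last + 1)), skip_rows - offsets[first]


# Three stripes of 10000, 10000 and 5000 rows; request rows 15000..22999.
stripes, offset = stripes_for_row_range([10000, 10000, 5000], 15000, 8000)
print(stripes, offset)
```

The caller would then read only those stripes and slice off the extra rows, for example `df.iloc[offset:offset + num_rows]` on a pandas-like frame. In cuDF, the stripe list could presumably be passed through the `stripes` argument of `read_orc` (one list per source). Stripes outside the range are never decoded, which is why whole stripes are the cheaper unit of selection.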
@vuule vuule added cuIO cuIO issue improvement Improvement / enhancement to an existing function labels Aug 12, 2022
@vuule vuule self-assigned this Aug 12, 2022
@galipremsagar galipremsagar self-assigned this Aug 12, 2022
@galipremsagar
Contributor

I opened a PR for the 22.10 Python deprecation: #11522

rapids-bot bot pushed a commit that referenced this issue Aug 15, 2022
Resolves the first step of #11519 by deprecating `skiprows` and `num_rows` in orc reader.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #11522
@github-actions

github-actions bot commented Oct 7, 2022

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@vuule
Contributor Author

vuule commented Mar 30, 2023

@GregoryKimball Should we reverse course here? AFAICT, this kind of functionality is needed (at least internally) to implement more granular filtering.

@GregoryKimball GregoryKimball added libcudf Affects libcudf (C++/CUDA) code. and removed inactive-30d labels Apr 2, 2023
@GregoryKimball
Contributor

Thank you @vuule for your comment. Yes, I believe we should reverse course.

We should avoid modifying the C++ implementation. In the end, we added this back to the Parquet reader in #11867.
