Shuffling read into a sub function in parquet read #12809

Merged

@hyperbolic2346 hyperbolic2346 commented Feb 21, 2023

Description

This change is the first step toward the pipelined parquet reader: it moves chunk creation and the file reads into a separate function. Behavior is unchanged for now, but the split will allow smaller groups to be read at a time for pipelining.
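
The shape of such a split can be illustrated with a minimal, standalone sketch. It is not the libcudf implementation: `row_group_info`, `chunk`, `create_and_read_chunks`, and `wait_for_reads` are hypothetical stand-ins, and `std::async` substitutes for a real file read. The only detail taken from this PR is the return value described in the reviewed doc comment: a pair of a boolean (compressed chunks were found) and a vector of futures for read completion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the reader's internal types (not libcudf's definitions).
struct row_group_info {
  std::size_t num_rows;
  bool compressed;
};

struct chunk {
  std::size_t num_rows;
  bool loaded;
};

// Creates one chunk per selected row group, then starts one asynchronous "read" per chunk.
// Returns whether any compressed chunk was found, plus futures signalling read completion.
std::pair<bool, std::vector<std::future<void>>> create_and_read_chunks(
  std::vector<row_group_info> const& row_groups_info,
  std::size_t num_rows,
  std::vector<chunk>& chunks)
{
  assert(chunks.empty());  // this sketch expects to own the whole output vector

  bool has_compressed   = false;
  std::size_t remaining = num_rows;

  // Build every chunk first so the vector does not reallocate while reads are in flight.
  for (auto const& rg : row_groups_info) {
    if (remaining == 0) { break; }
    std::size_t const take = std::min(rg.num_rows, remaining);
    chunks.push_back(chunk{take, false});
    has_compressed = has_compressed || rg.compressed;
    remaining -= take;
  }

  std::vector<std::future<void>> read_tasks;
  read_tasks.reserve(chunks.size());
  for (auto& c : chunks) {
    // Placeholder for a real file read: marks the chunk as loaded on another thread.
    read_tasks.push_back(std::async(std::launch::async, [&c] { c.loaded = true; }));
  }
  return std::make_pair(has_compressed, std::move(read_tasks));
}

// Caller side: block until every read has finished before touching chunk data.
void wait_for_reads(std::vector<std::future<void>>& read_tasks)
{
  for (auto& task : read_tasks) { task.get(); }
}
```

Because the caller receives the futures rather than finished data, it can pass a smaller set of row groups per call and overlap later reads with work on earlier chunks, which is the property a pipelined reader would rely on.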

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

closes #12811

@hyperbolic2346 added the 3 - Ready for Review, libcudf, cuIO, improvement, and non-breaking labels Feb 21, 2023
@hyperbolic2346 requested a review from a team as a code owner February 21, 2023 05:16
@hyperbolic2346 self-assigned this Feb 21, 2023

ttnghia commented Feb 21, 2023

Should this depend on #12808?

* @param row_groups_info vector of information about row groups to read
* @param num_rows Maximum number of rows to read
* @return pair of boolean indicating if compressed chunks were found and a vector of futures for
* read completion

Suggested change
* read completion
* read completion

Contributor Author

I would like to push it out as well, but pre-commit applied that formatting and changes it back.

@hyperbolic2346
Contributor Author

/merge

@rapids-bot (bot) merged commit 430d91e into rapidsai:branch-23.04 Feb 23, 2023
@hyperbolic2346 deleted the mwilson/pipeline_refactor branch February 23, 2023 18:56
Development

Successfully merging this pull request may close these issues.

Refactor parquet read code to put chunk creation and reading into a function.