Fixing parquet coalescing of reads #12808

hyperbolic2346
Contributor

While digging through the new chunk writer for parquet pipelined reads, I noticed that the code had changed and read coalescing was no longer happening. After some checking, I found that tests were issuing multiple smaller reads: one read per rowgroup instead of a single read covering multiple rowgroups. With this change, there is a single read per file.

I believe this is an unintentional change and a departure from the previous behavior.
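
To illustrate what coalescing means here, below is a minimal standalone sketch, not the libcudf implementation. The `byte_range` struct and `coalesce_reads` function are hypothetical names made up for this example. It assumes each rowgroup maps to one byte range in a source file, and that ranges which touch or overlap can be served by a single read.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical description of one pending read; not a libcudf type.
struct byte_range {
  std::size_t source_index;  // which input file the bytes come from
  std::size_t offset;        // starting byte offset within that file
  std::size_t size;          // number of bytes to read
};

// Merge ranges from the same source that touch or overlap, so that each
// merged range can be served by one I/O request instead of several.
std::vector<byte_range> coalesce_reads(std::vector<byte_range> ranges)
{
  // Order by source first, then by offset, so mergeable ranges are neighbors.
  std::sort(ranges.begin(), ranges.end(), [](byte_range const& a, byte_range const& b) {
    if (a.source_index != b.source_index) { return a.source_index < b.source_index; }
    return a.offset < b.offset;
  });

  std::vector<byte_range> merged;
  for (auto const& r : ranges) {
    bool const can_merge = !merged.empty() &&
                           merged.back().source_index == r.source_index &&
                           r.offset <= merged.back().offset + merged.back().size;
    if (can_merge) {
      // Extend the previous range to cover this one.
      auto& last     = merged.back();
      auto const end = std::max(last.offset + last.size, r.offset + r.size);
      last.size      = end - last.offset;
    } else {
      merged.push_back(r);
    }
  }
  return merged;
}
```

With per-rowgroup reads, a file with N contiguous rowgroups costs N requests; after merging it costs one. Non-contiguous ranges stay separate in this sketch, whereas the change described above reports a single read per file.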

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@hyperbolic2346 hyperbolic2346 added 3 - Ready for Review Ready for review by team code quality libcudf Affects libcudf (C++/CUDA) code. cuIO cuIO issue Performance Performance related issue labels Feb 21, 2023
@hyperbolic2346 hyperbolic2346 requested a review from a team as a code owner February 21, 2023 03:04
@hyperbolic2346 hyperbolic2346 self-assigned this Feb 21, 2023
@hyperbolic2346 hyperbolic2346 added non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Feb 21, 2023
@hyperbolic2346
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 904b8c7 into rapidsai:branch-23.04 Feb 22, 2023
@hyperbolic2346 hyperbolic2346 deleted the mwilson/parquet_reader_single_read branch February 22, 2023 02:33
Labels

  • 3 - Ready for Review: Ready for review by team
  • cuIO: cuIO issue
  • improvement: Improvement / enhancement to an existing function
  • libcudf: Affects libcudf (C++/CUDA) code.
  • non-breaking: Non-breaking change
  • Performance: Performance related issue