Fixing parquet coalescing of reads #12808

hyperbolic2346 · 2023-02-21T03:04:32Z

While digging through the new chunk writer for parquet pipelined reads, I noticed that the code had changed and read coalescing was no longer happening. After some checking, I found that tests were issuing multiple smaller reads: one read per rowgroup instead of a single read covering multiple rowgroups. With this change, there is a single read per file.

I believe this is an unintentional change and a departure from the previous behavior.
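
For readers unfamiliar with the term, here is a minimal sketch of what read coalescing means in general. It is not the libcudf implementation: `byte_range`, `coalesce_reads`, and `max_gap` are hypothetical names made up for illustration. The idea is that if all pending byte ranges are collected first, adjacent ranges (for example, column chunks from consecutive rowgroups) can be merged and served by fewer, larger reads. That only works if the merge step sees every range at once, which is why issuing the read once per rowgroup defeats it.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical description of one pending read: a byte range within a file.
struct byte_range {
  std::size_t offset;  // starting byte within the file
  std::size_t size;    // number of bytes to read
};

// Merge ranges that touch or overlap so each merged range needs only one read.
// `max_gap` is an assumed tuning knob: ranges separated by a hole of at most
// `max_gap` bytes are also merged, trading a few wasted bytes for fewer reads.
std::vector<byte_range> coalesce_reads(std::vector<byte_range> ranges,
                                       std::size_t max_gap = 0)
{
  // Sort by starting offset so mergeable ranges become neighbors.
  std::sort(ranges.begin(), ranges.end(),
            [](byte_range const& a, byte_range const& b) { return a.offset < b.offset; });

  std::vector<byte_range> merged;
  for (auto const& r : ranges) {
    if (!merged.empty()) {
      auto& last          = merged.back();
      auto const last_end = last.offset + last.size;
      if (r.offset <= last_end + max_gap) {
        // Extend the previous range to cover this one.
        auto const new_end = std::max(last_end, r.offset + r.size);
        last.size          = new_end - last.offset;
        continue;
      }
    }
    merged.push_back(r);
  }
  return merged;
}
```

Calling `coalesce_reads` once with the ranges from every rowgroup gives the merge a chance to combine them; calling it separately per rowgroup can only ever merge ranges within that rowgroup.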

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

cpp/src/io/parquet/reader_impl_preprocess.cu

hyperbolic2346 · 2023-02-22T02:32:52Z

/merge

Moving read out a layer so it is only called once so coalescing into larger reads is possible

bf55bd4

hyperbolic2346 added the 3 - Ready for Review, code quality, libcudf, cuIO, and Performance labels Feb 21, 2023

hyperbolic2346 requested a review from a team as a code owner February 21, 2023 03:04

hyperbolic2346 self-assigned this Feb 21, 2023

hyperbolic2346 requested review from vyasr, mythrocks, ttnghia and nvdbaranec February 21, 2023 03:04

hyperbolic2346 added the non-breaking and improvement labels Feb 21, 2023

ttnghia reviewed Feb 21, 2023

cpp/src/io/parquet/reader_impl_preprocess.cu (resolved)

ttnghia approved these changes Feb 21, 2023

ttnghia mentioned this pull request Feb 21, 2023

Shuffling read into a sub function in parquet read #12809

Merged

3 tasks

Merge branch 'branch-23.04' into mwilson/parquet_reader_single_read

b408354

rapids-bot bot merged commit 904b8c7 into rapidsai:branch-23.04 Feb 22, 2023

hyperbolic2346 deleted the mwilson/parquet_reader_single_read branch February 22, 2023 02:33
