Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cap the absolute row index per pass in parquet chunked reader. #15735

Merged

Conversation

nvdbaranec
Copy link
Contributor

Fixes #15690

There was an issue when computing page row counts/indices at the pass level in the chunked reader. Because we estimate list row counts for pages we have not yet decompressed, this can sometimes lead to estimates row counts that are larger than the actual (known) number of rows for a pass. This caused an out-of-bounds read down the line. We were already handling this at the subpass level, just not at the pass level.

Also includes some fixes in debug logging code that is #ifdef'd out.

…er. This handles the case where list row count estimation can sometimes

generate row indices past the actual end of the pass, leading to some out of bounds indexing down the road.

Also cleans up some debug code that had gotten stale.
@nvdbaranec nvdbaranec added bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. cuIO cuIO issue non-breaking Non-breaking change labels May 13, 2024
@nvdbaranec nvdbaranec requested a review from a team as a code owner May 13, 2024 18:09
@nvdbaranec nvdbaranec requested review from vyasr and davidwendt May 13, 2024 18:09
@nvdbaranec nvdbaranec added the 5 - DO NOT MERGE Hold off on merging; see PR for details label May 13, 2024
@nvdbaranec nvdbaranec removed the 5 - DO NOT MERGE Hold off on merging; see PR for details label May 13, 2024
@nvdbaranec
Copy link
Contributor Author

Spark integration tests passed.

@vuule vuule added the 5 - Ready to Merge Testing and reviews complete, ready to merge label May 15, 2024
@davidwendt
Copy link
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 4e87069 into rapidsai:branch-24.06 May 16, 2024
72 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Memcheck error in ParquetChunkedReaderInputLimitTest.Mixed
4 participants