Use nvcomp's snappy decompression in ORC reader #9235
CMake changes (excluding changes needed in nvcomp's CMake)
Replace cuIO's snappy compressor with nvcomp
…or rather than a hardcoded value
When writing statistics, not enough space is allocated in the chunk's compressed buffer. As a result, the compressed output is written into another chunk's memory.
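A rough host-side sketch of the sizing idea, not the code in this PR: each chunk gets a compressed region sized from a worst-case bound, so one chunk's output cannot spill into the next. The helper names `max_snappy_compressed_size` and `compressed_chunk_offsets` are hypothetical; the bound `32 + n + n / 6` is the one used by the reference Snappy library's `MaxCompressedLength`, and whether the writer should use exactly this bound is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Worst-case Snappy output size for n input bytes (same formula as the
// reference Snappy library's MaxCompressedLength). Hypothetical helper name.
constexpr std::size_t max_snappy_compressed_size(std::size_t n) { return 32 + n + n / 6; }

// Hypothetical helper: returns offsets such that chunk i owns the byte range
// [offsets[i], offsets[i + 1]) of one shared compressed buffer.
// offsets.back() is the total allocation size.
std::vector<std::size_t> compressed_chunk_offsets(std::vector<std::size_t> const& uncompressed_sizes)
{
  std::vector<std::size_t> offsets(uncompressed_sizes.size() + 1, 0);
  for (std::size_t i = 0; i < uncompressed_sizes.size(); ++i) {
    // Reserve the worst case per chunk rather than a hardcoded amount.
    offsets[i + 1] = offsets[i] + max_snappy_compressed_size(uncompressed_sizes[i]);
  }
  return offsets;
}
```

With regions laid out this way, a chunk that compresses poorly still stays inside its own range, at the cost of over-allocating for chunks that compress well.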
Looks good in general. Just a single question regarding error handling.
num_blocks,
[=, actual_uncomp_sizes = actual_uncompressed_data_sizes.data()] __device__(auto i) {
  comp_stat[i].bytes_written = actual_uncomp_sizes[i];
  comp_stat[i].status = 0;
I find it odd that we don't return the actual status and let the caller handle it. I assume things are in a terrible state at this point and there isn't anything useful the caller can do with this information. Is that correct?
This is intended as an improvement. Previously, if the status was nonzero for any compressed block, the reader would give up in the kernel without throwing any error.
cudf/cpp/src/io/orc/stripe_init.cu
Lines 160 to 163 in 99e4f80
if (shuffle((lane_id == 0) ? dec_out[num_compressed_blocks].status : 0) != 0) {
  // Decompression failed, not much point in doing anything else
  break;
}
We discussed that it's better to fail loudly in case we encounter a corrupt compressed chunk.
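For illustration only, a minimal host-side sketch of what failing loudly could look like, assuming the per-block results have already been copied to the host. `block_status` and `throw_on_failed_block` are hypothetical names; the field names mirror `bytes_written` and `status` from the snippet above, with 0 taken to mean success. This is not the code in the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-block decompression result.
struct block_status {
  std::uint64_t bytes_written;
  std::uint32_t status;  // assumed convention: 0 means success
};

// Throws on the first block that reports a nonzero status, instead of
// silently stopping and returning partial data.
void throw_on_failed_block(std::vector<block_status> const& stats)
{
  for (std::size_t i = 0; i < stats.size(); ++i) {
    if (stats[i].status != 0) {
      throw std::runtime_error("decompression failed for block " + std::to_string(i));
    }
  }
}
```

The caller sees an exception for a corrupt compressed chunk rather than a partially decoded result.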
Codecov Report
@@               Coverage Diff               @@
##             branch-21.10    #9235   +/-   ##
===============================================
  Coverage              ?   10.49%
===============================================
  Files                 ?      115
  Lines                 ?    19828
  Branches              ?        0
===============================================
  Hits                  ?     2080
  Misses                ?    17748
  Partials              ?        0
rerun tests
@@ -549,6 +551,68 @@ class aggregate_orc_metadata {
  }
};

void snappy_decompress(device_span<gpu_inflate_input_s> comp_in,
Feels like we could merge this with snappy_decompress in the Parquet reader, but I'm fine with leaving this for another PR.
I realized that, but once cuIO's internal decompressors are removed completely there will be a lot more to refactor around this, so I deferred it until then.
Removed CMake code owners as the PR no longer has any CMake changes
@gpucibot merge
Issue #9205 depends on #9235

Authors:
  - Devavret Makkar (https://github.com/devavret)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Elias Stehle (https://github.com/elstehle)
  - https://github.com/nvdbaranec
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #9242