Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use nvcomp's snappy decompression in ORC reader #9235

Merged
merged 24 commits into from
Sep 18, 2021

Conversation

devavret
Copy link
Contributor

@devavret devavret commented Sep 15, 2021

Issue #9205

devavret added 24 commits May 7, 2021 17:26
Cmake changes (excluding changes needed in nvcomp's cmake)
Replace cuIO's snappy compressor with nvcomp
When writing statistics, there's not enough space allocated in chunk's compressed buffer.
This results in the compressed buffer being written into another chunk's memory.
@devavret devavret requested review from a team as code owners September 15, 2021 22:24
@github-actions github-actions bot added CMake CMake build issue libcudf Affects libcudf (C++/CUDA) code. labels Sep 15, 2021
@github-actions github-actions bot removed the CMake CMake build issue label Sep 15, 2021
@devavret devavret added 4 - Needs cuIO Reviewer cuIO cuIO issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change 3 - Ready for Review Ready for review by team labels Sep 15, 2021
Copy link
Contributor

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general. Just a single question regarding error handling.

num_blocks,
[=, actual_uncomp_sizes = actual_uncompressed_data_sizes.data()] __device__(auto i) {
comp_stat[i].bytes_written = actual_uncomp_sizes[i];
comp_stat[i].status = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it odd that we don't return the actual status and let the caller handle it. I assume things are in a terrible state at this point and there isn't anything useful the caller can do with this information. Is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intended as an improvement. Previously, if status was 0 for any compression block, the reader would give up in the kernel but wouldn't throw any error.

if (shuffle((lane_id == 0) ? dec_out[num_compressed_blocks].status : 0) != 0) {
// Decompression failed, not much point in doing anything else
break;
}

We discussed that it's better to fail loudly in case we encounter a corrupt compressed chunk.

@codecov
Copy link

codecov bot commented Sep 16, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@0cf2c5f). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.10    #9235   +/-   ##
===============================================
  Coverage                ?   10.49%           
===============================================
  Files                   ?      115           
  Lines                   ?    19828           
  Branches                ?        0           
===============================================
  Hits                    ?     2080           
  Misses                  ?    17748           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0cf2c5f...354e229. Read the comment docs.

@devavret
Copy link
Contributor Author

rerun tests

@devavret devavret requested review from vuule and rgsl888prabhu and removed request for mythrocks September 16, 2021 17:13
@@ -549,6 +551,68 @@ class aggregate_orc_metadata {
}
};

void snappy_decompress(device_span<gpu_inflate_input_s> comp_in,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels like we could merge this with snappy_decompress in Parquet reader, but I'm fine with leaving this for another PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that but once cuIO's internal decompressors are removed completely, there's a lot more to refactor around this that I deferred it for then.

@robertmaynard robertmaynard removed the request for review from a team September 17, 2021 14:35
@robertmaynard
Copy link
Contributor

Removed CMake code owners as the PR now doesn't have any CMake changes

@vuule
Copy link
Contributor

vuule commented Sep 18, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit f08d6f1 into rapidsai:branch-21.10 Sep 18, 2021
rapids-bot bot pushed a commit that referenced this pull request Sep 22, 2021
@vyasr vyasr added 4 - Needs Review Waiting for reviewer to review or respond and removed 4 - Needs cuIO Reviewer labels Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team 4 - Needs Review Waiting for reviewer to review or respond cuIO cuIO issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants