Use nvcomp's snappy compressor in ORC writer #9242
Conversation
CMake changes (excluding changes needed in nvcomp's CMake). Replace cuIO's Snappy compressor with nvcomp.
…or rather than a hardcoded value
When writing statistics, there's not enough space allocated in the chunk's compressed buffer. This results in the compressed buffer being written into another chunk's memory.
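For context, a rough, self-contained sketch of the batched nvcomp Snappy flow the writer switches to. The parameter list follows recent nvcomp 2.x headers (nvcompBatchedSnappyCompressGetTempSize, nvcompBatchedSnappyCompressAsync, nvcompBatchedSnappyDefaultOpts); exact signatures differ between nvcomp releases, so treat this as an illustration of the API shape rather than the code in this PR. Error checking is omitted for brevity.

#include <nvcomp/snappy.h>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_buffer.hpp>

// Sketch: every pointer/size array below lives in device memory, and a single
// asynchronous call compresses all chunks of the batch.
void compress_chunks(void const* const* d_uncomp_ptrs,  // device array: input chunk pointers
                     size_t const* d_uncomp_sizes,      // device array: input chunk sizes
                     size_t batch_size,
                     size_t max_chunk_bytes,
                     void* const* d_comp_ptrs,          // device array: output chunk pointers
                     size_t* d_comp_sizes,              // device array: compressed sizes (output)
                     rmm::cuda_stream_view stream)
{
  // Scratch space the compressor needs for this batch.
  size_t temp_bytes = 0;
  nvcompBatchedSnappyCompressGetTempSize(
    batch_size, max_chunk_bytes, nvcompBatchedSnappyDefaultOpts, &temp_bytes);
  rmm::device_buffer scratch(temp_bytes, stream);

  nvcompBatchedSnappyCompressAsync(d_uncomp_ptrs,
                                   d_uncomp_sizes,
                                   max_chunk_bytes,
                                   batch_size,
                                   scratch.data(),
                                   scratch.size(),
                                   d_comp_ptrs,
                                   d_comp_sizes,
                                   nvcompBatchedSnappyDefaultOpts,
                                   stream.value());
}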
Codecov Report
@@             Coverage Diff              @@
##           branch-21.10    #9242      +/-   ##
================================================
- Coverage        10.85%    10.84%    -0.01%
================================================
  Files              115       116        +1
  Lines            19158     19171       +13
================================================
  Hits              2080      2080
- Misses           17078     17091       +13
Continue to review full report at Codecov.
quick review, looks good so far
device_span<gpu_inflate_input_s> comp_in,
device_span<gpu_inflate_status_s> comp_out,
Isn't it mildly worse to pass a span here if you don't ever use the size? In the sense of having to hypothetically push another argument on the stack.
As I see it, the trade-off is that with span it's clear that the parameter is an array, not a pointer to a single object. Now, whether this is a good trade-off is up for debate :D
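To illustrate the trade-off being discussed, a minimal sketch with a trimmed-down stand-in for cudf::device_span and a hypothetical kernel (neither the kernel name nor the stub types are from this PR):

#include <cstddef>

// Trimmed-down stand-in for cudf::device_span: a pointer plus an element count.
template <typename T>
struct device_span {
  T* data_;
  std::size_t size_;
  __device__ T& operator[](std::size_t i) const { return data_[i]; }
  __device__ std::size_t size() const { return size_; }
};

struct status_stub { std::size_t bytes_written; };

// Raw-pointer form: a single 8-byte kernel argument, but the signature alone
// does not say whether it points at one object or at an array.
__global__ void init_statuses(status_stub* comp_out, std::size_t n)
{
  auto i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) comp_out[i].bytes_written = 0;
}

// Span form: pointer and size travel together (16 bytes of kernel arguments),
// and the type makes it explicit that the parameter is an array.
__global__ void init_statuses(device_span<status_stub> comp_out)
{
  auto i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < comp_out.size()) comp_out[i].bytes_written = 0;
}

Kernel arguments are small and passed through dedicated parameter space, so the extra size member is unlikely to be measurable in practice; the case for the span is mainly that the signature documents the array-ness, which is the trade-off described above.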
LGTM. Just one question.
thrust::transform(rmm::exec_policy(stream),
                  compressed_bytes_written.begin(),
                  compressed_bytes_written.end(),
                  comp_out.begin(),
                  [] __device__(size_t size) {
                    // Wrap each compressed size reported by nvcomp into a
                    // gpu_inflate_status_s so downstream code sees the usual layout.
                    gpu_inflate_status_s status{};
                    status.bytes_written = size;
                    return status;
                  });
Could this be replaced with a thrust::transform_output_iterator in the call to nvcompBatchedSnappyCompressAsync, basically to save allocating and materializing compressed_bytes_written? [1]
[1] https://thrust.github.io/doc/classthrust_1_1transform__output__iterator.html
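In isolation, the proposed iterator would look roughly like the minimal sketch below (status_stub and size_to_status are illustrative stand-ins, not names from this PR): the conversion is fused into the write, so no intermediate buffer of sizes is materialized.

#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_output_iterator.h>

// Stand-in for gpu_inflate_status_s, reduced to the field the snippet above fills in.
struct status_stub {
  size_t bytes_written;
};

// Converts a compressed size into a status record at the moment it is written.
struct size_to_status {
  __host__ __device__ status_stub operator()(size_t size) const { return status_stub{size}; }
};

int main()
{
  thrust::device_vector<status_stub> statuses(4);

  // Every size_t assigned through `out` is wrapped into a status_stub on the fly.
  auto out = thrust::make_transform_output_iterator(statuses.begin(), size_to_status{});
  thrust::copy_n(thrust::counting_iterator<size_t>(0), statuses.size(), out);

  return 0;
}

As the reply below notes, this only helps when the producer accepts an output iterator, which the nvcomp C API does not.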
nvcompBatchedSnappyCompressAsync is a C API and doesn't take iterators. 😞
Oh, I see. Too bad! Thanks for clarifying 👍
Looks good to me, just a couple copyrights to update.
@gpucibot merge
This reverts commit 08cbbcd.
Issue #9205
depends on #9235