Use nvcomp's snappy compressor in parquet writer #8229

Merged

@devavret devavret commented May 12, 2021

Adds nvcomp as a dependency and uses its batched Snappy compression functions in the Parquet writer.
Adds an environment variable, LIBCUDF_USE_NVCOMP, to switch between cuIO's internal Snappy compressor and nvcomp's compressor.
nvcomp is disabled by default.
Run export LIBCUDF_USE_NVCOMP=1 to switch to nvcomp's compressor.
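
The opt-in switch described above could be read roughly as follows. This is a minimal sketch, not the code from this PR: the helper nvcomp_enabled, the enum snappy_backend, and the function select_snappy_backend are hypothetical names, and treating only the exact value "1" as enabled is an assumption based on the export line above.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: interprets the raw value of LIBCUDF_USE_NVCOMP.
// Assumption: only the exact string "1" enables nvcomp; an unset variable
// (nullptr) or any other value keeps the default internal compressor.
inline bool nvcomp_enabled(const char* value)
{
  return value != nullptr && std::string(value) == "1";
}

// Hypothetical enum naming the two code paths described in the PR.
enum class snappy_backend { CUIO_INTERNAL, NVCOMP };

// Reads the environment and picks a backend; nvcomp is opt-in.
inline snappy_backend select_snappy_backend()
{
  return nvcomp_enabled(std::getenv("LIBCUDF_USE_NVCOMP")) ? snappy_backend::NVCOMP
                                                           : snappy_backend::CUIO_INTERNAL;
}
```

Keeping the string check separate from the std::getenv call makes the default-off behavior easy to test without touching the process environment.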

@devavret devavret requested review from robertmaynard and vuule May 12, 2021 21:01
@devavret devavret requested review from a team as code owners May 12, 2021 21:01
@github-actions github-actions bot added the CMake (CMake build issue) and libcudf (Affects libcudf (C++/CUDA) code) labels May 12, 2021
@devavret devavret added the improvement (Improvement / enhancement to an existing function) label and removed the CMake and libcudf labels May 12, 2021
@devavret devavret marked this pull request as draft May 12, 2021 21:01
@vuule vuule left a comment

Couple of C++ suggestions. Otherwise 🔥
Did not review the CMake stuff, not proficient enough for it to be useful.

cpp/src/io/parquet/parquet_gpu.hpp (outdated, resolved)
cpp/src/io/parquet/writer_impl.cu (outdated, resolved)
cpp/src/io/parquet/writer_impl.cu (outdated, resolved)
cpp/src/io/parquet/writer_impl.cu (outdated, resolved)
@github-actions github-actions bot added the CMake (CMake build issue) and libcudf (Affects libcudf (C++/CUDA) code) labels May 13, 2021
cpp/cmake/thirdparty/CUDF_GetnvCOMP.cmake (outdated, resolved)
cpp/cmake/cudf-config.cmake.in (outdated, resolved)
rapids-bot bot pushed a commit that referenced this pull request May 26, 2021
Updates the Java bindings to nvcomp to statically link libnvcomp.  This will help avoid libnvcomp v1.x and v2.x conflicts when libcudf starts pulling in libnvcomp 2.x as part of #8229.

Switching to a statically-linked libnvcomp requires a small patch to the nvcomp source, as it is hard-coded to only produce a shared library. The patch changes the target to a static library compiled with position-independent code so it can be linked into a shared object like libcudfjni.so.

Authors:
  - Jason Lowe (https://github.com/jlowe)

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)

URL: #8334
cpp/cmake/cudf-build-config.cmake.in (outdated, resolved)
cpp/CMakeLists.txt (outdated, resolved)
cpp/CMakeLists.txt (outdated, resolved)
cpp/CMakeLists.txt (outdated, resolved)
cpp/CMakeLists.txt (outdated, resolved)
cpp/cmake/thirdparty/CUDF_GetnvCOMP.cmake (outdated, resolved)
When writing statistics, there's not enough space allocated in the chunk's compressed buffer.
As a result, the compressed output is written into another chunk's memory.
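
One way to picture the overflow described above is to lay out per-chunk regions in a shared buffer using worst-case sizes. The sketch below is illustrative only and is not the fix applied in this PR: chunk_sizes, stats_bytes, and chunk_offsets are hypothetical names, and the 32 + n + n/6 bound is the one used by the reference Snappy library, which may differ from the bound nvcomp reports for its own compressor.

```cpp
#include <cstddef>
#include <vector>

// Worst-case Snappy output size for n input bytes, following the bound
// used by the reference Snappy library (32 + n + n/6).
inline std::size_t max_snappy_compressed_size(std::size_t n) { return 32 + n + n / 6; }

// Hypothetical description of one chunk's space needs.
struct chunk_sizes {
  std::size_t uncompressed_bytes;  // page data to be compressed
  std::size_t stats_bytes;         // assumed extra bytes taken by statistics
};

// Returns the starting offset of each chunk's region in a shared buffer,
// followed by one final entry holding the total size. Reserving the
// statistics bytes per chunk keeps one chunk from spilling into the next.
inline std::vector<std::size_t> chunk_offsets(const std::vector<chunk_sizes>& chunks)
{
  std::vector<std::size_t> offsets;
  offsets.reserve(chunks.size() + 1);
  std::size_t cursor = 0;
  for (const auto& c : chunks) {
    offsets.push_back(cursor);
    cursor += max_snappy_compressed_size(c.uncompressed_bytes) + c.stats_bytes;
  }
  offsets.push_back(cursor);
  return offsets;
}
```

If stats_bytes were left out of the sum, a chunk whose headers grow when statistics are enabled could write past its region into the next chunk's region, which is the failure mode the commit message describes.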
devavret commented Sep 3, 2021

rerun tests

@devavret devavret added the non-breaking (Non-breaking change) label Sep 3, 2021
@devavret devavret marked this pull request as ready for review September 7, 2021 22:08
codecov bot commented Sep 8, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@e9caed3).
The diff coverage is n/a.

❗ Current head 569879c differs from pull request most recent head 6721fb8. Consider uploading reports for the commit 6721fb8 to get more accurate results.

@@               Coverage Diff               @@
##             branch-21.10    #8229   +/-   ##
===============================================
  Coverage                ?   10.81%           
===============================================
  Files                   ?      115           
  Lines                   ?    18775           
  Branches                ?        0           
===============================================
  Hits                    ?     2030           
  Misses                  ?    16745           
  Partials                ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e9caed3...6721fb8.

cpp/CMakeLists.txt (outdated, resolved)
cpp/CMakeLists.txt (resolved)
@vuule vuule left a comment

Few more minor suggestions. My main gripe is the previously posted comment on error handling.

cpp/src/io/parquet/writer_impl.cu (resolved)
cpp/src/io/parquet/writer_impl.cu (outdated, resolved)
devavret commented

rerun tests

@robertmaynard robertmaynard left a comment

nvcomp CMake addition LGTM

devavret commented

@gpucibot merge

@rapids-bot rapids-bot bot merged commit e27675a into rapidsai:branch-21.10 Sep 14, 2021
Labels: CMake (CMake build issue), cuIO (cuIO issue), improvement (Improvement / enhancement to an existing function), libcudf (Affects libcudf (C++/CUDA) code), non-breaking (Non-breaking change)
5 participants