Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial cuIO GPU decompression refactor #10699

Merged
merged 35 commits into from
Apr 28, 2022

Conversation

vuule
Copy link
Contributor

@vuule vuule commented Apr 21, 2022

Required to expand future nvcomp integration.

  • Moving nvcomp integration in ORC and Parquet readers to common code. Enables nvcomp use for multiple compression type without code duplication.
  • gpu_inflate_input_s refactor to facilitate unified host/device decompressor interface. Enables further changes to unify CPU and GPU decompression API, which in turn enables ZSTD use in ORC.

@vuule vuule added cuIO cuIO issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 21, 2022
@vuule vuule self-assigned this Apr 21, 2022
@github-actions github-actions bot added CMake CMake build issue Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels Apr 21, 2022
@github-actions github-actions bot removed the Python Affects Python cuDF API. label Apr 21, 2022
@codecov
Copy link

codecov bot commented Apr 21, 2022

Codecov Report

Merging #10699 (f0f389e) into branch-22.06 (dc1435b) will increase coverage by 0.06%.
The diff coverage is 92.42%.

@@               Coverage Diff                @@
##           branch-22.06   #10699      +/-   ##
================================================
+ Coverage         86.37%   86.43%   +0.06%     
================================================
  Files               142      143       +1     
  Lines             22306    22444     +138     
================================================
+ Hits              19266    19399     +133     
- Misses             3040     3045       +5     
Impacted Files Coverage Δ
python/cudf/cudf/_lib/__init__.py 100.00% <ø> (ø)
python/dask_cudf/dask_cudf/io/parquet.py 92.39% <81.08%> (-1.40%) ⬇️
python/cudf/cudf/core/_internals/expressions.py 92.85% <92.85%> (ø)
python/cudf/cudf/core/dataframe.py 93.74% <93.33%> (-0.01%) ⬇️
python/cudf/cudf/core/column/lists.py 92.91% <100.00%> (+1.25%) ⬆️
python/cudf/cudf/core/groupby/groupby.py 91.79% <100.00%> (+0.34%) ⬆️
python/cudf/cudf/core/indexed_frame.py 91.70% <100.00%> (ø)
...ython/dask_cudf/dask_cudf/io/tests/test_parquet.py 100.00% <100.00%> (ø)
python/cudf/cudf/core/column/string.py 89.21% <0.00%> (+0.12%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9ac2477...f0f389e. Read the comment docs.

@vuule vuule marked this pull request as ready for review April 21, 2022 20:12
@vuule vuule requested a review from a team as a code owner April 21, 2022 20:12
@vuule vuule requested review from bdice and elstehle April 25, 2022 23:20
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The improvements from reviews made this PR look a lot better! I have some comments attached and will approve after some of the comments are addressed.

An apology in advance -- I attempted to find/fix some places where docstrings seemed to be marking inputs as @param[out]. While this may in fact be correct in some of the functions, I'm not sure that all the docstrings are correct -- and I'm also not sure if I correctly fixed all of the errors. Similarly, there are a few device_span<device_span<T const>> that could probably be device_span<device_span<T const> const> (with an extra const on the outer span). I couldn't quite track all the functions' input/output patterns from what I saw in the diffs, and I decided it was not worth digging deeper to verify each one at the cost of delaying the review. I would suggest a very quick pass to double-check your expectations for constness of inputs/outputs/statuses.

cpp/src/io/avro/reader_impl.cu Show resolved Hide resolved
cpp/src/io/avro/reader_impl.cu Show resolved Hide resolved
cpp/src/io/comp/debrotli.cu Outdated Show resolved Hide resolved
cpp/src/io/comp/debrotli.cu Outdated Show resolved Hide resolved
cpp/src/io/comp/debrotli.cu Outdated Show resolved Hide resolved
cpp/src/io/orc/orc_gpu.h Show resolved Hide resolved
cpp/src/io/orc/orc_gpu.h Outdated Show resolved Hide resolved
cpp/src/io/orc/orc_gpu.h Show resolved Hide resolved
cpp/src/io/orc/orc_gpu.h Show resolved Hide resolved
cpp/src/io/parquet/reader_impl.cu Outdated Show resolved Hide resolved
Copy link
Contributor

@elstehle elstehle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost through. Just a few minor comments and some nitpicks.

cpp/src/io/parquet/parquet_gpu.hpp Show resolved Hide resolved
cpp/src/io/utilities/hostdevice_vector.hpp Show resolved Hide resolved
cpp/tests/io/comp/decomp_test.cpp Outdated Show resolved Hide resolved
cpp/src/io/parquet/page_enc.cu Outdated Show resolved Hide resolved
cpp/src/io/comp/nvcomp_adapter.cu Outdated Show resolved Hide resolved
Copy link
Contributor

@elstehle elstehle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your effort, @vuule! LGTM, once remaining comments are addressed.

@vuule vuule requested a review from bdice April 27, 2022 02:06
@vuule vuule added the 0 - Waiting on Author Waiting for author to respond to review label Apr 27, 2022
@vuule vuule added 3 - Ready for Review Ready for review by team and removed 0 - Waiting on Author Waiting for author to respond to review labels Apr 27, 2022
@vuule vuule requested a review from bdice April 27, 2022 20:03
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline a bit with @vuule. The changes in 2bd4f4e are nice. 👍

@vuule vuule force-pushed the fea-nvcomp-zstd branch from 098bb77 to f0f389e Compare April 28, 2022 19:17
@vuule
Copy link
Contributor Author

vuule commented Apr 28, 2022

@gpucibot merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue cuIO cuIO issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants