Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor host decompression in ORC reader #10764

Merged
merged 24 commits into from
May 16, 2022

Conversation

vuule
Copy link
Contributor

@vuule vuule commented May 2, 2022

Another prequel to ORC Zstandard support.
Irons out the various decompression interfaces in cuIO:

  • Removes redundant compression type enum.
  • Replaces HostDecompressor classes with free functions.
  • API improvements - span use, replace error codes/invalid return values with CUDF_EXPECTS.
  • Use uint8_t consistently as the raw data type.

@vuule vuule added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels May 2, 2022
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label May 2, 2022
@codecov
Copy link

codecov bot commented May 2, 2022

Codecov Report

Merging #10764 (14952a1) into branch-22.06 (e0d94f3) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@               Coverage Diff                @@
##           branch-22.06   #10764      +/-   ##
================================================
+ Coverage         86.28%   86.32%   +0.03%     
================================================
  Files               144      144              
  Lines             22654    22656       +2     
================================================
+ Hits              19548    19557       +9     
+ Misses             3106     3099       -7     
Impacted Files Coverage Δ
python/cudf/cudf/core/resample.py 88.97% <ø> (ø)
python/dask_cudf/dask_cudf/groupby.py 97.42% <100.00%> (+0.01%) ⬆️
python/dask_cudf/dask_cudf/tests/test_groupby.py 100.00% <100.00%> (ø)
python/cudf/cudf/core/dataframe.py 93.78% <0.00%> (+0.04%) ⬆️
python/cudf/cudf/core/column/string.py 88.78% <0.00%> (+0.12%) ⬆️
python/cudf/cudf/core/groupby/groupby.py 91.79% <0.00%> (+0.22%) ⬆️
python/dask_cudf/dask_cudf/core.py 73.62% <0.00%> (+0.26%) ⬆️
python/cudf/cudf/core/tools/datetimes.py 84.49% <0.00%> (+0.30%) ⬆️
python/cudf/cudf/core/column/lists.py 91.70% <0.00%> (+0.97%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ad1e51...14952a1. Read the comment docs.

@vuule vuule marked this pull request as ready for review May 5, 2022 21:17
@vuule vuule requested a review from a team as a code owner May 5, 2022 21:17
@vuule vuule requested review from trxcllnt and nvdbaranec May 5, 2022 21:17
@vuule vuule self-assigned this May 5, 2022
@vuule vuule added the cuIO cuIO issue label May 5, 2022
Copy link
Contributor

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this use of host_span and constexpr for the header size. This is looking good.

cpp/src/io/csv/reader_impl.cu Show resolved Hide resolved
cpp/src/io/comp/uncomp.cpp Outdated Show resolved Hide resolved
cpp/src/io/comp/uncomp.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments below. Nice refactoring overall! I am just going to leave this as a comment rather than blocking by requesting changes. Let me know if you would like me to review again / need a second approval.

cpp/src/io/comp/uncomp.cpp Outdated Show resolved Hide resolved
cpp/src/io/comp/uncomp.cpp Outdated Show resolved Hide resolved
cpp/src/io/comp/uncomp.cpp Outdated Show resolved Hide resolved
cpp/src/io/comp/uncomp.cpp Show resolved Hide resolved
cpp/src/io/json/reader_impl.cu Outdated Show resolved Hide resolved
cpp/src/io/json/reader_impl.cu Show resolved Hide resolved
cpp/src/io/orc/orc.cpp Outdated Show resolved Hide resolved
cpp/src/io/orc/orc.cpp Outdated Show resolved Hide resolved
@bdice bdice changed the title Refactor host decompression in ORC readar Refactor host decompression in ORC reader May 13, 2022
@vuule vuule requested review from hyperbolic2346 and bdice May 13, 2022 06:28
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! Thanks for addressing the suggestions and giving context for the pieces I didn't fully understand. Net negative PRs are always happy. 🌻

@vuule
Copy link
Contributor Author

vuule commented May 16, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 712e77f into rapidsai:branch-22.06 May 16, 2022
@vuule vuule deleted the orc-host-decomp branch August 10, 2023 03:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuIO cuIO issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants