Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Zstandard decompression in ORC reader #10873

Merged
merged 26 commits into from
May 23, 2022

Conversation

vuule
Copy link
Contributor

@vuule vuule commented May 17, 2022

Adds decompressor API for host-to-host decompression for ORC footer decompression.

Test changes:

  • nvcomp 2.3 still not used in CI, so almost no test changes;
  • Adds a Python test that assumes the Zstandard is not supported, to also pass if reading is successful.

@github-actions github-actions bot added Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels May 17, 2022
@vuule vuule added feature request New feature or request non-breaking Non-breaking change and removed libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. labels May 17, 2022
@github-actions github-actions bot added Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels May 17, 2022
@codecov
Copy link

codecov bot commented May 17, 2022

Codecov Report

Merging #10873 (9ecc7a1) into branch-22.06 (54789ee) will increase coverage by 0.02%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-22.06   #10873      +/-   ##
================================================
+ Coverage         86.30%   86.32%   +0.02%     
================================================
  Files               144      144              
  Lines             22665    22668       +3     
================================================
+ Hits              19560    19569       +9     
+ Misses             3105     3099       -6     
Impacted Files Coverage Δ
python/cudf/cudf/utils/ioutils.py 79.47% <0.00%> (-0.13%) ⬇️
python/cudf/cudf/io/json.py 97.56% <0.00%> (ø)
python/cudf/cudf/core/dataframe.py 93.78% <0.00%> (+0.04%) ⬆️
python/cudf/cudf/core/column/string.py 88.78% <0.00%> (+0.12%) ⬆️
python/cudf/cudf/core/groupby/groupby.py 91.79% <0.00%> (+0.22%) ⬆️
python/cudf/cudf/core/column/numerical.py 96.17% <0.00%> (+0.29%) ⬆️
python/cudf/cudf/core/tools/datetimes.py 84.49% <0.00%> (+0.30%) ⬆️
python/cudf/cudf/core/column/lists.py 91.70% <0.00%> (+0.97%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c0895c1...9ecc7a1. Read the comment docs.

@github-actions github-actions bot removed the Python Affects Python cuDF API. label May 18, 2022
@github-actions github-actions bot added the Python Affects Python cuDF API. label May 18, 2022
@vuule vuule self-assigned this May 18, 2022
@vuule vuule added the cuIO cuIO issue label May 18, 2022
cpp/src/io/comp/uncomp.cpp Outdated Show resolved Hide resolved
cpp/src/io/comp/uncomp.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me after other comments are addressed.

@galipremsagar
Copy link
Contributor

rerun tests

1 similar comment
@vuule
Copy link
Contributor Author

vuule commented May 23, 2022

rerun tests

Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just one minor nit.


// Copy temporary output to `dst`
CUDF_CUDA_TRY(
cudaMemcpyAsync(dst.data(), d_dst.data(), dst.size(), cudaMemcpyDeviceToHost, stream.value()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cudaMemcpyAsync(dst.data(), d_dst.data(), dst.size(), cudaMemcpyDeviceToHost, stream.value()));
cudaMemcpyAsync(dst.data(), d_dst.data(), max_uncomp_page_size, cudaMemcpyDeviceToHost, stream.value()));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just realized from this suggestion that we don't need to copy the whole buffer, just the portion that contains the actual output.

@vuule
Copy link
Contributor Author

vuule commented May 23, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 35f9d5e into rapidsai:branch-22.06 May 23, 2022
@vuule vuule deleted the nvcomp-zstd-orc branch May 23, 2022 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuIO cuIO issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants