Fix alignment of compressed blocks in ORC writer #12077

Merged: 17 commits into rapidsai:branch-22.12 from bug-nvcomp-deflate-align, Nov 11, 2022

Conversation

Contributor

@vuule vuule commented Nov 4, 2022

Description

Closes #11812
Fixed the alignment of compressed blocks in the ORC writer; the bug affected ZLIB compression.
Re-enabled nvCOMP DEFLATE compression in ORC (nvCOMP 2.5+ only).

Refactored the nvCOMP feature status checks (enabled/disabled in cuDF) to include the reason when a feature is not enabled.
Refactored call sites to return the detailed error message when an operation fails because of the nvCOMP integration config.
Refactored nvCOMP adapter macros to allow mocking of the parameters that determine whether an nvCOMP feature is enabled (env var, GPU compute capability, nvCOMP version).
Added tests that verify the logic of the refactored feature status checks (made possible by the mocking above).
Fixed a Parquet test that was calling the ORC reader/writer 😬
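
For readers who want a feel for the "status check that also reports a reason" pattern described above, here is a minimal sketch. It is not the cuDF code: `feature_params`, `deflate_disabled_reason`, the field names, and the exact rules are hypothetical. Only the "2.5 or newer" version requirement comes from this PR's description.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for the inputs that decide whether a feature is usable.
// Passing them in explicitly (instead of reading env vars or library version
// macros inside the check) is what makes the logic easy to mock in tests.
struct feature_params {
  int lib_major;        // compression library major version
  int lib_minor;        // compression library minor version
  bool stable_enabled;  // assumed: integration of stable features is switched on
};

// Returns std::nullopt when DEFLATE can be used, otherwise a human-readable
// reason that a call site can forward in its error message.
std::optional<std::string> deflate_disabled_reason(feature_params const& p)
{
  bool const version_ok = p.lib_major > 2 || (p.lib_major == 2 && p.lib_minor >= 5);
  if (!version_ok) { return "DEFLATE requires library version 2.5 or newer"; }
  if (!p.stable_enabled) { return "DEFLATE integration is disabled by configuration"; }
  return std::nullopt;
}
```

A test can then construct `feature_params{2, 4, true}` and check that a reason comes back, without needing a second library version installed.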

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vuule added the labels bug (Something isn't working), cuIO (cuIO issue), and non-breaking (Non-breaking change) on Nov 4, 2022
@vuule self-assigned this on Nov 4, 2022
@github-actions bot added the labels Python (Affects Python cuDF API) and libcudf (Affects libcudf (C++/CUDA) code) on Nov 4, 2022

codecov bot commented Nov 4, 2022

Codecov Report

Base: 87.47% // Head: 88.08% // Increases project coverage by +0.60% 🎉

Coverage data is based on head (593a594) compared to base (f817d96).
Patch has no changes to coverable lines.

❗ Current head 593a594 differs from pull request most recent head a6ad48b. Consider uploading reports for the commit a6ad48b to get more accurate results

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.12   #12077      +/-   ##
================================================
+ Coverage         87.47%   88.08%   +0.60%     
================================================
  Files               133      135       +2     
  Lines             21826    22100     +274     
================================================
+ Hits              19093    19466     +373     
+ Misses             2733     2634      -99     
Impacted Files                                  Coverage Δ
python/cudf/cudf/core/column/interval.py        85.45% <0.00%> (-9.10%) ⬇️
python/cudf/cudf/io/text.py                     91.66% <0.00%> (-8.34%) ⬇️
python/cudf/cudf/core/_base_index.py            81.28% <0.00%> (-4.27%) ⬇️
python/cudf/cudf/io/json.py                     92.06% <0.00%> (-2.68%) ⬇️
python/cudf/cudf/utils/utils.py                 89.91% <0.00%> (-0.69%) ⬇️
python/cudf/cudf/core/column/timedelta.py       90.17% <0.00%> (-0.58%) ⬇️
python/cudf/cudf/core/column/datetime.py        89.21% <0.00%> (-0.51%) ⬇️
python/cudf/cudf/core/column/column.py          87.96% <0.00%> (-0.46%) ⬇️
python/dask_cudf/dask_cudf/core.py              73.72% <0.00%> (-0.41%) ⬇️
python/cudf/cudf/io/parquet.py                  90.45% <0.00%> (-0.39%) ⬇️
... and 41 more


Comment on lines +2191 to +2193
    util::round_up_unsafe<size_t>(max_compressed_block_size, compressed_block_align);
  auto const padded_block_header_size =
-   util::round_up_unsafe<size_t>(block_header_size, uncomp_block_align);
+   util::round_up_unsafe<size_t>(block_header_size, compressed_block_align);
Contributor Author

Actual fix for the alignment. Everything else is to safely re-enable DEFLATE compression.
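
To illustrate why the header padding has to use the same alignment as the compressed payload, here is a small sketch. The layout (a padded header followed by a padded payload in each slot), the helper names, and the sizes 3/100/16/4 are assumptions made for the example and are not taken from the cuDF writer.

```cpp
#include <cstddef>

// Hypothetical helper: round `value` up to the next multiple of `align` (align > 0).
constexpr std::size_t round_up(std::size_t value, std::size_t align)
{
  return ((value + align - 1) / align) * align;
}

// Assumed layout: every block slot is [padded header][padded payload].
struct block_layout {
  std::size_t padded_header_size;
  std::size_t padded_block_size;
};

constexpr block_layout make_layout(std::size_t header_size,
                                   std::size_t max_block_size,
                                   std::size_t header_align,
                                   std::size_t block_align)
{
  return {round_up(header_size, header_align), round_up(max_block_size, block_align)};
}

// Byte offset of the payload of block `idx` from the start of the buffer.
constexpr std::size_t payload_offset(block_layout l, std::size_t idx)
{
  return idx * (l.padded_header_size + l.padded_block_size) + l.padded_header_size;
}

// Header padded with the payload alignment (16): payloads start on 16-byte boundaries.
static_assert(payload_offset(make_layout(3, 100, 16, 16), 1) % 16 == 0, "aligned");
// Header padded with a smaller, unrelated alignment (4): payloads end up misaligned.
static_assert(payload_offset(make_layout(3, 100, 4, 16), 1) % 16 != 0, "misaligned");
```

In this toy layout the payload offset inherits whatever alignment the header padding has, so padding the header to a smaller value silently breaks the payload alignment.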

@vuule vuule marked this pull request as ready for review November 8, 2022 00:00
@vuule vuule requested review from a team as code owners November 8, 2022 00:00
Contributor

@jbrennan333 jbrennan333 left a comment

Looks good! I just had a few minor comments.

cpp/src/io/comp/nvcomp_adapter.cpp (outdated, resolved)
cpp/tests/io/comp/decomp_test.cpp (outdated, resolved)
cpp/tests/io/comp/decomp_test.cpp (resolved)
@vuule vuule requested a review from jbrennan333 November 9, 2022 01:14
Contributor

@jbrennan333 jbrennan333 left a comment

+1 looks good to me

Contributor

@bdice bdice left a comment

I'm glad we have tests now. Do we actually cover a good portion of the version/hardware cases in our testing (manual or CI)?

return NVCOMP_HAS_ZSTD_COMP and detail::nvcomp_integration::is_stable_enabled();
default: return false;
case compression_type::DEFLATE: {
if (not NVCOMP_HAS_DEFLATE(
Contributor

I suppose this expands to a valid runtime expression as well as being useful at compile time in #if macros? Not super familiar with using macros like this.

Contributor

It is simply a compile-time text substitution, so this one expands to if (not (MAJOR > 2 or (MAJOR == 2 and MINOR >= 5)))

Contributor Author

> Not super familiar with using macros like this.

Yeah, this is tech from the dark ages, thankfully not commonly used anymore. Unfortunately, I could not avoid it here :(

I think you have the right idea: this is pretty much a type-unsafe constexpr function that can also be used in #if conditions, by means of text substitution, as Mike said :D
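
A minimal sketch of the idea discussed in this thread: one function-like macro that works both in an #if condition and as an ordinary runtime expression. The names `LIB_VER_MAJOR`, `LIB_VER_MINOR`, `HAS_DEFLATE`, and `deflate_supported` are made up for the example and are not the nvCOMP adapter macros.

```cpp
// Hypothetical version macros, standing in for the ones a library header would define.
#define LIB_VER_MAJOR 2
#define LIB_VER_MINOR 5

// Plain text substitution: the arguments can be preprocessor constants or runtime ints.
#define HAS_DEFLATE(MAJOR, MINOR) ((MAJOR) > 2 || ((MAJOR) == 2 && (MINOR) >= 5))

// Compile-time use: the preprocessor evaluates the expanded integer expression.
#if HAS_DEFLATE(LIB_VER_MAJOR, LIB_VER_MINOR)
constexpr bool deflate_compiled_in = true;
#else
constexpr bool deflate_compiled_in = false;
#endif

// Runtime use: the same macro expands to a normal boolean expression over variables,
// which is what allows tests to pass in mocked version numbers.
bool deflate_supported(int major, int minor) { return HAS_DEFLATE(major, minor); }
```

The parentheses around each macro parameter are there because the substitution is purely textual; without them an argument such as `a + b` could change the meaning of the expression.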


feature_status_parameters();
feature_status_parameters(
int major, int minor, int patch, bool all_enabled, bool stable_enabled, int cc)
Contributor

Compute capabilities are not really integral, there are lots of values like 8.6 or 6.1. Does this get some kind of normalized integral value like 86? How does this interact with params.compute_capability == 6?

Contributor Author

Looks like my naming is bad here.
This is just the major capability number.
cudaDeviceGetAttribute(&compute_capability, cudaDevAttrComputeCapabilityMajor, device) writes an int, so this should be fine.
I'll fix the name to avoid confusion.

Contributor Author

Renamed to reflect that this is the major component of compute capability.

Contributor Author

vuule commented Nov 10, 2022

> Do we actually cover a good portion of the version/hardware cases in our testing (manual or CI)?

I don't know about HW, but we definitely only cover a single nvCOMP version.

@vuule vuule requested a review from bdice November 10, 2022 19:50
Contributor Author

vuule commented Nov 10, 2022

rerun tests

Contributor Author

vuule commented Nov 11, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit d335aa3 into rapidsai:branch-22.12 Nov 11, 2022
@vuule vuule deleted the bug-nvcomp-deflate-align branch November 11, 2022 06:40
Labels
bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.
Development

Successfully merging this pull request may close these issues.

[BUG] ORC ZLIB tests fail with nvCOMP 2.4
4 participants