Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Zstandard decompression only when all nvcomp integrations are enabled #10944

Merged
merged 3 commits into from
May 25, 2022

Conversation

vuule
Copy link
Contributor

@vuule vuule commented May 24, 2022

Closes #9055
Limiting due to known nvcomp issues.

@vuule vuule added feature request New feature or request cuIO cuIO issue breaking Breaking change labels May 24, 2022
@vuule vuule self-assigned this May 24, 2022
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label May 24, 2022
@codecov
Copy link

codecov bot commented May 24, 2022

Codecov Report

Merging #10944 (abb966f) into branch-22.06 (54789ee) will increase coverage by 0.02%.
The diff coverage is 93.75%.

❗ Current head abb966f differs from pull request most recent head af35471. Consider uploading reports for the commit af35471 to get more accurate results

@@               Coverage Diff                @@
##           branch-22.06   #10944      +/-   ##
================================================
+ Coverage         86.30%   86.32%   +0.02%     
================================================
  Files               144      144              
  Lines             22665    22668       +3     
================================================
+ Hits              19560    19569       +9     
+ Misses             3105     3099       -6     
Impacted Files Coverage Δ
python/cudf/cudf/core/indexed_frame.py 91.70% <ø> (ø)
python/cudf/cudf/utils/ioutils.py 79.47% <87.50%> (-0.13%) ⬇️
python/cudf/cudf/io/avro.py 78.57% <100.00%> (ø)
python/cudf/cudf/io/csv.py 91.80% <100.00%> (ø)
python/cudf/cudf/io/json.py 97.56% <100.00%> (ø)
python/cudf/cudf/io/orc.py 92.77% <100.00%> (ø)
python/cudf/cudf/io/parquet.py 90.83% <100.00%> (ø)
python/cudf/cudf/io/text.py 100.00% <100.00%> (ø)
python/cudf/cudf/core/dataframe.py 93.78% <0.00%> (+0.04%) ⬆️
python/cudf/cudf/core/column/string.py 88.78% <0.00%> (+0.12%) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 35f9d5e...af35471. Read the comment docs.

@beckernick
Copy link
Member

Despite caveats, this (and the prior PR) should close #9055 right?

@vuule
Copy link
Contributor Author

vuule commented May 24, 2022

Despite caveats, this (and the prior PR) should close #9055 right?

I think it should. Thanks for the reminder.

@jbrennan333
Copy link
Contributor

I tested this with our spark-rapids integration test (after building with nvcomp-2.3).
With LIBCUDF_NVCOMP_POLICY=STABLE, the zstd compression tests fail with:

ai.rapids.cudf.CudfException: cuDF failure at: /rapids/spark-rapids-jni/thirdparty/cudf/cpp/src/io/comp/nvcomp_adapter.cpp:34: Unsupported compression type

When I set it to LIBCUDF_NVCOMP_POLICY=ALWAYS, the zstd compression tests pass.

../../src/main/python/parquet_test.py::test_parquet_compress_read_round_trip[parquet-reader_confs5-none] PASSED [ 93%]
../../src/main/python/parquet_test.py::test_parquet_compress_read_round_trip[parquet-reader_confs5-uncompressed] PASSED [ 95%]
../../src/main/python/parquet_test.py::test_parquet_compress_read_round_trip[parquet-reader_confs5-snappy] PASSED [ 96%]
../../src/main/python/parquet_test.py::test_parquet_compress_read_round_trip[parquet-reader_confs5-gzip] PASSED [ 98%]
../../src/main/python/parquet_test.py::test_parquet_compress_read_round_trip[parquet-reader_confs5-zstd] temp bytes 27973
temp bytes 27931
temp bytes 27955
temp bytes 27940
PASSED [100%]

1 similar comment
@jbrennan333
Copy link
Contributor

I tested this with our spark-rapids integration test (after building with nvcomp-2.3).
With LIBCUDF_NVCOMP_POLICY=STABLE, the zstd compression tests fail with:

ai.rapids.cudf.CudfException: cuDF failure at: /rapids/spark-rapids-jni/thirdparty/cudf/cpp/src/io/comp/nvcomp_adapter.cpp:34: Unsupported compression type

When I set it to LIBCUDF_NVCOMP_POLICY=ALWAYS, the zstd compression tests pass.

../../src/main/python/parquet_test.py::test_parquet_compress_read_round_trip[parquet-reader_confs5-none] PASSED [ 93%]
../../src/main/python/parquet_test.py::test_parquet_compress_read_round_trip[parquet-reader_confs5-uncompressed] PASSED [ 95%]
../../src/main/python/parquet_test.py::test_parquet_compress_read_round_trip[parquet-reader_confs5-snappy] PASSED [ 96%]
../../src/main/python/parquet_test.py::test_parquet_compress_read_round_trip[parquet-reader_confs5-gzip] PASSED [ 98%]
../../src/main/python/parquet_test.py::test_parquet_compress_read_round_trip[parquet-reader_confs5-zstd] temp bytes 27973
temp bytes 27931
temp bytes 27955
temp bytes 27940
PASSED [100%]

@vuule vuule marked this pull request as ready for review May 24, 2022 21:06
@vuule vuule requested a review from a team as a code owner May 24, 2022 21:06
@vuule vuule requested review from trxcllnt and ttnghia May 24, 2022 21:06
Copy link
Contributor

@jbrennan333 jbrennan333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 lgtm. I tested with this change locally as well.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CUDF_FAIL macro provides helpful context, like line numbers where the error is thrown, which are lost when the macro is replaced by fail_unsupported. I would continue using CUDF_FAIL in all cases even though the message is repeated.

throw cudf::logic_error("cuDF failure at: " __FILE__ ":" CUDF_STRINGIFY(__LINE__) ": " reason)

Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure that we need the extra level of indirection of fail_unsupported instead of just calling CUDF_FAIL directly, but this looks fine to me.

@vuule vuule requested a review from a team as a code owner May 24, 2022 22:53
@vuule vuule requested a review from galipremsagar May 24, 2022 22:53
@github-actions github-actions bot added the Python Affects Python cuDF API. label May 24, 2022
@vuule vuule requested a review from bdice May 24, 2022 22:54
@vuule
Copy link
Contributor Author

vuule commented May 24, 2022

Reworked the errors completely. Should be more informative now.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed error handling offline. Future work can avoid testing for specific error messages, which was part of the motivation for wrapping the CUDF_FAIL macro in a common function.

@vuule
Copy link
Contributor Author

vuule commented May 25, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 31e1739 into rapidsai:branch-22.06 May 25, 2022
@vuule vuule deleted the fea-zstd-experimantal branch May 25, 2022 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Breaking change cuIO cuIO issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] ZStandard codec support for Parquet reader
8 participants