[FEA] ZStandard support for Parquet writer #9056
Closes #9058, #9056

Expands the nvCOMP adapter to include ZSTD compression.

- Adds a centralized nvCOMP policy, `is_compression_enabled`.
- Adds a centralized nvCOMP alignment utility, `compress_input_alignment_bits`.
- Adds a centralized nvCOMP utility to get the maximum supported compression chunk size, `batched_compress_max_allowed_chunk_size`.
- Encoded ORC row groups are aligned based on compression requirements.
- Encoded Parquet pages are aligned based on compression requirements.
- Parquet fragment size now scales with the page size to better fit the default page size with ZSTD compression.
- Small refactoring around `decompress_status` for improved type safety and naming.
- Replaced `snappy_compress` in the Parquet writer with the nvCOMP adapter call.
- Vectors of `compression_result`s are initialized before compression to avoid random chunk skipping due to uninitialized memory.

Authors:
- Vukasin Milovanovic (https://github.com/vuule)

Approvers:
- Jason Lowe (https://github.com/jlowe)
- Jim Brennan (https://github.com/jbrennan333)
- Mike Wilson (https://github.com/hyperbolic2346)
- Tobias Ribizel (https://github.com/upsj)
- Matthew Roeschke (https://github.com/mroeschke)

URL: #11551
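The "aligned based on compression requirements" items above refer to padding encoded chunks so their offsets satisfy the codec's alignment constraint. As an illustration only (the helper name `align_up` is hypothetical; the actual libcudf utility is `compress_input_alignment_bits`), alignment to a power-of-two boundary can be sketched like this:

```python
def align_up(size: int, alignment_bits: int) -> int:
    """Round `size` up to the nearest multiple of 2**alignment_bits."""
    alignment = 1 << alignment_bits
    # Standard power-of-two round-up: add (alignment - 1), then mask off low bits.
    return (size + alignment - 1) & ~(alignment - 1)

# Example: suppose a codec requires 8-byte (2**3) input alignment.
print(align_up(1000, 3))  # -> 1000 (already a multiple of 8)
print(align_up(1001, 3))  # -> 1008
```

The specific alignment value each codec requires is what the centralized utility reports, so the ORC and Parquet writers no longer hard-code it per format.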
Is your feature request related to a problem? Please describe.
Some users wish to write Parquet data using the ZStandard compression codec rather than the Snappy codec. RAPIDS cannot accelerate writing these files because the Parquet writer does not support this codec.
Describe the solution you'd like
The libcudf Parquet writer APIs should support specifying the ZStandard codec as one of the possible compression codecs to use when encoding the Parquet data for writing.