orc + zstd compression support [FEA] #3240

Closed
sauravdev opened this issue Aug 17, 2021 · 2 comments
Labels
cudf_dependency (An issue or PR with this label depends on a new feature in cudf), feature request (New feature or request)

Comments

@sauravdev

Is your feature request related to a problem? Please describe.
zstd compression is often used because its compression ratio is roughly 2x that of snappy, so it saves a lot of cost when data is at petabyte scale.

Describe the solution you'd like
Support zstd compression for ORC.
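
For illustration only, here is a minimal sketch of what the request looks like from the user's side, using the standard Spark `DataFrameWriter` API and its `compression` option. The helper name `write_orc_zstd` is hypothetical, and whether the write actually runs on the GPU is an assumption that depends on the RAPIDS Accelerator and cudf versions in use.

```python
def write_orc_zstd(df, path):
    """Write a Spark DataFrame as ORC with zstd compression.

    `df` is expected to be a pyspark.sql.DataFrame (or anything exposing the
    same `write` interface). This helper is a hypothetical convenience wrapper,
    not part of Spark or the RAPIDS Accelerator.
    """
    # "compression" is the per-write codec option for the ORC data source;
    # "zstd" selects Zstandard instead of the default codec.
    writer = df.write.mode("overwrite").option("compression", "zstd")
    writer.orc(path)
    return path


# Typical use, assuming a running SparkSession named `spark`:
#   df = spark.range(1000)
#   write_orc_zstd(df, "/tmp/example_orc_zstd")
```

The same codec can also be chosen session-wide through the `spark.sql.orc.compression.codec` setting rather than per write.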

Describe alternatives you've considered
NA

Additional context
https://nvidia.github.io/spark-rapids/docs/compatibility.html

@sauravdev added the "? - Needs Triage" (Need team to review and classify) and "feature request" (New feature or request) labels Aug 17, 2021
@Salonijain27 removed the "? - Needs Triage" (Need team to review and classify) label Aug 17, 2021
@jlowe
Contributor

jlowe commented Aug 17, 2021

Thanks for the feature request! In order for the RAPIDS Accelerator to support Zstandard for ORC it must be supported in cudf, as cudf provides the GPU backend for ORC reads and writes. I filed cudf issues rapidsai/cudf#9057 and rapidsai/cudf#9058 to track Zstandard support for ORC reads and writes, respectively.

Note that implementing a performant GPU version of the Zstandard codec will take some time. Once the support is available in cudf, it should be very straightforward to support it in the RAPIDS Accelerator for Apache Spark.

@jlowe added the "cudf_dependency" (An issue or PR with this label depends on a new feature in cudf) label Aug 17, 2021
@jlowe
Contributor

jlowe commented Feb 28, 2024

This has been implemented.

@jlowe closed this as completed Feb 28, 2024

3 participants