[FEA] Support LZ4 compression in Parquet readers and writers #14495

Closed
rjzamora opened this issue Nov 27, 2023 · 0 comments · Fixed by #14906
Labels
0 - Backlog, cuIO, feature request, libcudf

Comments

@rjzamora
Member

Is your feature request related to a problem? Please describe.
LZ4 is a general-purpose compression algorithm that is known to run efficiently on GPUs, and it is expected to decompress faster than Snappy on both CPU and GPU systems. For this reason, LZ4-compressed Parquet files have become more prevalent in recent years. We should support LZ4 compression in cudf's Parquet readers and writers.

Describe alternatives you've considered
Use pandas to read LZ4-compressed files on the CPU, then convert the result to cudf after I/O.

@rjzamora added the feature request and Needs Triage labels on Nov 27, 2023
@rjzamora added this to libcudf on Nov 27, 2023
@GregoryKimball added the 0 - Backlog, libcudf, and cuIO labels and removed the Needs Triage label on Dec 14, 2023
rapids-bot bot pushed a commit that referenced this issue Feb 14, 2024
Closes #14495

Adds support for reading and writing ORC and Parquet files with LZ4 compression.
Also adds the new value to the Python API.

Included basic C++ and Python tests so that the option is exercised in CI.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Shruti Shivakumar (https://github.com/shrshi)
  - MithunR (https://github.com/mythrocks)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #14906
@GregoryKimball removed this from libcudf on Mar 1, 2024