[BUG] to_orc with snappy compression fails to compress the files #12170

Closed
mlahir1 opened this issue Nov 16, 2022 · 2 comments
Labels
2 - In Progress Currently a work in progress bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code.

mlahir1 commented Nov 16, 2022

to_orc with compression='snappy' does not compress the output file.

Steps to reproduce:

  1. Load a 124 MB ORC file into a cudf DataFrame.
  2. Write it back out twice: once with snappy compression and once with no compression.

Both output files are larger than the input and nearly the same size as each other (223 MB uncompressed, 224 MB with snappy).

import cudf

df = cudf.read_orc("test.orc", use_index=False)
df.to_orc("test_snappy.orc", compression='snappy')
df.to_orc("test_nc.orc", compression=None)

$ ls | grep orc | grep test
-rw-r--r--. 1 root root 124M Nov 16 15:30 test.orc
-rw-r--r--. 1 root root 223M Nov 16 18:10 test_nc.orc
-rw-r--r--. 1 root root 224M Nov 16 18:10 test_snappy.

Expected behavior:

The same experiment run with RAPIDS 22.06 does produce a smaller snappy file (168 MB), although that file is still larger than the 124 MB input.

-rw-r--r--. 1 root root 124M Nov 16 15:30 test.orc
-rw-r--r--. 1 root root 223M Nov 16 18:17 test_nc.orc
-rw-r--r--. 1 root root 168M Nov 16 18:17 test_snappy.orc
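
To make the comparison between the two listings concrete, here is a minimal sketch that turns the reported sizes into a compressed-to-uncompressed ratio and flags the case where compression apparently did not run. The helper names and the 0.95 threshold are assumptions chosen for illustration, not part of cudf, and the sizes are the rounded MB values from the `ls` output above.

```python
import os

MB = 1024 * 1024


def size_ratio(compressed_bytes: int, uncompressed_bytes: int) -> float:
    """Return the compressed size as a fraction of the uncompressed size."""
    if uncompressed_bytes <= 0:
        raise ValueError("uncompressed size must be positive")
    return compressed_bytes / uncompressed_bytes


def compression_took_effect(compressed_bytes: int,
                            uncompressed_bytes: int,
                            threshold: float = 0.95) -> bool:
    """True if the compressed file is meaningfully smaller.

    The 0.95 threshold is an arbitrary assumption for this sketch.
    """
    return size_ratio(compressed_bytes, uncompressed_bytes) < threshold


def check_files(compressed_path: str, uncompressed_path: str,
                threshold: float = 0.95) -> bool:
    """Same check, reading the sizes from files on disk."""
    return compression_took_effect(os.path.getsize(compressed_path),
                                   os.path.getsize(uncompressed_path),
                                   threshold)


# Sizes as reported in this issue, rounded to whole MB.
print(compression_took_effect(224 * MB, 223 * MB))  # affected build: False
print(compression_took_effect(168 * MB, 223 * MB))  # RAPIDS 22.06: True
```

On a machine with the files from the repro, something like `check_files("test_snappy.orc", "test_nc.orc")` would run the same check against the real sizes.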
@mlahir1 mlahir1 added Needs Triage Need team to review and classify bug Something isn't working labels Nov 16, 2022
GregoryKimball (Contributor) commented

Thank you @mlahir1 for raising this issue. It looks like we have a logic error blocking the compressor from running. We expect that #12194 will resolve this issue.

@GregoryKimball GregoryKimball added 2 - In Progress Currently a work in progress cuIO cuIO issue and removed Needs Triage Need team to review and classify labels Nov 19, 2022
@GregoryKimball GregoryKimball added the libcudf Affects libcudf (C++/CUDA) code. label Apr 2, 2023
GregoryKimball (Contributor) commented

Closed by #12194
