[FEA] Add support for writing dataframes containing decimal columns to orc writer #8159

Closed
ChrisJar opened this issue May 4, 2021 · 1 comment · Fixed by #8198
Labels: cuIO, feature request, libcudf

ChrisJar commented May 4, 2021

Is your feature request related to a problem? Please describe.
I would like to take a DataFrame containing decimal-typed columns and write it to an ORC file. Currently, when I try this:

import cudf
from cudf.core.dtypes import Decimal64Dtype

s = cudf.Series(["2.1", "4.9", "3.4", "0.2"]).astype(Decimal64Dtype(7, 2))
df = cudf.DataFrame({"val": s})
df.to_orc("test.orc")

I get:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-37-607d097fdab5> in <module>
----> 1 df.to_orc("test.orc")

~/anaconda3/envs/cudf_dev/lib/python3.8/site-packages/cudf/core/dataframe.py in to_orc(self, fname, compression, *args, **kwargs)
   7414         from cudf.io import orc as orc
   7415 
-> 7416         orc.to_orc(self, fname, compression, *args, **kwargs)
   7417 
   7418     def stack(self, level=-1, dropna=True):

~/anaconda3/envs/cudf_dev/lib/python3.8/site-packages/cudf/io/orc.py in to_orc(df, fname, compression, enable_statistics, **kwargs)
    325             liborc.write_orc(df, file_obj, compression, enable_statistics)
    326     else:
--> 327         liborc.write_orc(df, path_or_buf, compression, enable_statistics)
    328 
    329 

cudf/_lib/orc.pyx in cudf._lib.orc.write_orc()

cudf/_lib/orc.pyx in cudf._lib.orc.write_orc()

RuntimeError: cuDF failure at: /home/nfs/cjarrett/cudf/cpp/src/io/orc/writer_impl.cu:472: Unsupported ORC type kind
ChrisJar added the Needs Triage and feature request labels on May 4, 2021
galipremsagar added the cuIO and libcudf labels on May 4, 2021
kkraus14 added the Python label and removed the Needs Triage label on May 10, 2021
kkraus14 removed the Python label on May 10, 2021
rapids-bot bot pushed a commit that referenced this issue May 18, 2021
Closes #8159, #7126

The current implementation uses an array to hold the exact size of each encoded element before the encode step. This simplifies the encoding (each element is encoded independently) and lets us allocate streams of exact size instead of worst-case size. The process differs from that of other types because decimal data streams do not use RLE encoding.
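
The two-pass idea described above can be illustrated with a small CPU-side sketch in Python. This is not the libcudf CUDA code: the function names (`zigzag`, `varint_size`, `encode_varint_into`, `encode_decimal_stream`) are hypothetical, and the use of a prefix sum over the size array to obtain write offsets is an assumption about how the sizes are consumed. The sketch relies on the ORC format storing decimal values as zigzag-encoded base-128 varints of the unscaled integers, which is why each element has a variable but exactly computable size.

```python
from itertools import accumulate
from typing import Sequence


def zigzag(n: int) -> int:
    """Map a signed integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (n << 1) if n >= 0 else ((-n) << 1) - 1


def varint_size(u: int) -> int:
    """Number of bytes a base-128 varint needs (7 payload bits per byte)."""
    return max(1, (u.bit_length() + 6) // 7)


def encode_varint_into(buf: bytearray, offset: int, u: int) -> None:
    """Write u as a base-128 varint, low 7 bits first, starting at buf[offset]."""
    while u >= 0x80:
        buf[offset] = (u & 0x7F) | 0x80  # set the continuation bit
        u >>= 7
        offset += 1
    buf[offset] = u


def encode_decimal_stream(unscaled: Sequence[int]) -> bytes:
    """Encode unscaled decimal values into one exactly sized byte stream."""
    zz = [zigzag(v) for v in unscaled]
    # Pass 1: exact encoded size of every element.
    sizes = [varint_size(u) for u in zz]
    # Running sum of sizes: entry i is where element i starts (assumed step).
    offsets = [0, *accumulate(sizes)]
    # Allocate the exact total size rather than a worst-case buffer.
    buf = bytearray(offsets[-1])
    # Pass 2: every element writes to its own slot, independent of the others.
    for u, off in zip(zz, offsets):
        encode_varint_into(buf, off, u)
    return bytes(buf)
```

For the values in the report (2.1, 4.9, 3.4, 0.2 at scale 2, so unscaled 210, 490, 340, 20), `encode_decimal_stream([210, 490, 340, 20]).hex()` gives `a403d407a80528`, which is 7 bytes. Because each element's offset is known before encoding starts, the second loop has no dependency between iterations, which is what would make it suitable for one-thread-per-element execution on a GPU.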

Will add benchmarks once the data generator can produce decimal data.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Michael Wang (https://github.com/isVoid)
  - Devavret Makkar (https://github.com/devavret)
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)

URL: #8198
@randerzander randerzander reopened this May 19, 2021

randerzander commented May 19, 2021

Please disregard; I had pulled an older conda package.

5 participants