[FEA] Add support for writing dataframes containing decimal columns to orc writer #8159

ChrisJar · 2021-05-04T21:07:46Z

Is your feature request related to a problem? Please describe.
I would like to take a dataframe containing decimal-typed columns and write it as an ORC file. Currently, when I try this:

import cudf
from cudf.core.dtypes import Decimal64Dtype

s = cudf.Series(["2.1", "4.9", "3.4", "0.2"]).astype(Decimal64Dtype(7, 2))
df = cudf.DataFrame({"val": s})
df.to_orc("test.orc")

I get:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-37-607d097fdab5> in <module>
----> 1 df.to_orc("test.orc")

~/anaconda3/envs/cudf_dev/lib/python3.8/site-packages/cudf/core/dataframe.py in to_orc(self, fname, compression, *args, **kwargs)
   7414         from cudf.io import orc as orc
   7415 
-> 7416         orc.to_orc(self, fname, compression, *args, **kwargs)
   7417 
   7418     def stack(self, level=-1, dropna=True):

~/anaconda3/envs/cudf_dev/lib/python3.8/site-packages/cudf/io/orc.py in to_orc(df, fname, compression, enable_statistics, **kwargs)
    325             liborc.write_orc(df, file_obj, compression, enable_statistics)
    326     else:
--> 327         liborc.write_orc(df, path_or_buf, compression, enable_statistics)
    328 
    329 

cudf/_lib/orc.pyx in cudf._lib.orc.write_orc()

cudf/_lib/orc.pyx in cudf._lib.orc.write_orc()

RuntimeError: cuDF failure at: /home/nfs/cjarrett/cudf/cpp/src/io/orc/writer_impl.cu:472: Unsupported ORC type kind


Closes #8159, #7126

Current implementation uses an array to hold the exact size of each encoded element before the encode step. This allows us to simplify the encoding (each element encode is independent) and to allocate streams of exact size instead of the worst-case. The process is different from other types because decimal data streams do not use RLE encoding.

Will add benchmarks once data generator can produce decimal data.

Authors:
- Vukasin Milovanovic (https://github.com/vuule)

Approvers:
- Michael Wang (https://github.com/isVoid)
- Devavret Makkar (https://github.com/devavret)
- Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)

URL: #8198
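To illustrate the size-then-encode idea described in the PR text, here is a minimal CPU-side sketch in Python. It is not the cuDF implementation (which runs in CUDA); it assumes, per my reading of the ORC format, that the decimal data stream stores each unscaled value as a zigzag-encoded base-128 varint. The function names (`zigzag`, `varint_size`, `encode_decimal_stream`) are hypothetical and chosen only for this example.

```python
from decimal import Decimal
from itertools import accumulate


def zigzag(v: int) -> int:
    """Map a signed integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (v << 1) if v >= 0 else ((-v << 1) - 1)


def varint_size(u: int) -> int:
    """Bytes needed for a base-128 varint; zero still occupies one byte."""
    return max(1, (u.bit_length() + 6) // 7)


def write_varint(buf: bytearray, pos: int, u: int) -> None:
    """Write u as a base-128 varint starting at buf[pos]."""
    while u >= 0x80:
        buf[pos] = (u & 0x7F) | 0x80  # low 7 bits, continuation bit set
        pos += 1
        u >>= 7
    buf[pos] = u  # final byte, continuation bit clear


def encode_decimal_stream(values, scale):
    """Return (per-element sizes, encoded bytes) for a list of Decimal values."""
    # Unscaled integer representation, e.g. 2.1 with scale 2 -> 210.
    # Digits beyond `scale` would be truncated by int(); the sketch assumes none.
    encoded = [zigzag(int(v.scaleb(scale))) for v in values]

    # Pass 1: exact size of every element, computed before any encoding.
    sizes = [varint_size(u) for u in encoded]

    # An exclusive prefix sum gives each element its own output offset,
    # so the stream can be allocated at its exact size up front.
    offsets = [0, *accumulate(sizes)][:-1]
    buf = bytearray(sum(sizes))

    # Pass 2: each element writes to its own slot, independent of the others.
    for u, pos in zip(encoded, offsets):
        write_varint(buf, pos, u)
    return sizes, bytes(buf)


values = [Decimal("2.1"), Decimal("4.9"), Decimal("3.4"), Decimal("0.2")]
sizes, stream = encode_decimal_stream(values, scale=2)
print(sizes, stream.hex())
```

Because every element knows its offset before encoding starts, the second pass has no ordering dependency between elements, which is what would make the same scheme a natural fit for one-thread-per-element execution on a GPU.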

randerzander · 2021-05-19T17:35:38Z

Please disregard, had pulled an older conda package.

ChrisJar added the Needs Triage and feature request labels May 4, 2021

galipremsagar added the cuIO and libcudf labels May 4, 2021

kkraus14 added the Python label and removed the Needs Triage label May 10, 2021

kkraus14 assigned vuule May 10, 2021

kkraus14 removed the Python label May 10, 2021

vuule mentioned this issue May 12, 2021

Add support for decimal types in ORC writer #8198

Merged

rapids-bot bot closed this as completed in #8198 May 18, 2021

randerzander reopened this May 19, 2021

randerzander closed this as completed May 19, 2021
