[FEA] Implement basic reductions for decimal columns #7503

ChrisJar · 2021-03-03T15:43:08Z

Is your feature request related to a problem? Please describe.
I wish to be able to perform basic reductions like .sum(), .mean(), or .std() on DataFrames and Series with decimal columns.

Describe the solution you'd like
I would like to mimic the reductions available for float columns. For example:

import cudf

df = cudf.DataFrame({'id': [0, 1, 1], 'val': [1.00, 1.01, 1.02]})
df.sum()

returns

id     2.00
val    3.03
dtype: float64

but if the dataframe contains a decimal column

import decimal

df['val'] = cudf.Series([decimal.Decimal(x) for x in [1.00, 1.01, 1.02]], dtype=cudf.Decimal64Dtype(7, 3))
df.sum()

it returns

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-7e5fdb616c56> in <module>
----> 1 df.sum()

/home/u00u7rh1e72hXfsipJ357/miniconda3/envs/rapids-gpu-bdb/lib/python3.7/site-packages/cudf/core/dataframe.py in sum(self, axis, skipna, dtype, level, numeric_only, min_count, **kwargs)
   6063             numeric_only=numeric_only,
   6064             min_count=min_count,
-> 6065             **kwargs,
   6066         )
   6067 

/home/u00u7rh1e72hXfsipJ357/miniconda3/envs/rapids-gpu-bdb/lib/python3.7/site-packages/cudf/core/dataframe.py in _apply_support_method(self, method, axis, *args, **kwargs)
   6788             result = [
   6789                 getattr(self[col], method)(*args, **kwargs)
-> 6790                 for col in self._data.names
   6791             ]
   6792 

/home/u00u7rh1e72hXfsipJ357/miniconda3/envs/rapids-gpu-bdb/lib/python3.7/site-packages/cudf/core/dataframe.py in <listcomp>(.0)
   6788             result = [
   6789                 getattr(self[col], method)(*args, **kwargs)
-> 6790                 for col in self._data.names
   6791             ]
   6792 

/home/u00u7rh1e72hXfsipJ357/miniconda3/envs/rapids-gpu-bdb/lib/python3.7/site-packages/cudf/core/series.py in sum(self, axis, skipna, dtype, level, numeric_only, min_count, **kwargs)
   2985 
   2986         return self._column.sum(
-> 2987             skipna=skipna, dtype=dtype, min_count=min_count
   2988         )
   2989 

/home/u00u7rh1e72hXfsipJ357/miniconda3/envs/rapids-gpu-bdb/lib/python3.7/site-packages/cudf/core/column/column.py in sum(self, skipna, dtype, min_count)
   1257         self, skipna: bool = None, dtype: Dtype = None, min_count: int = 0
   1258     ):
-> 1259         raise TypeError(f"cannot perform sum with type {self.dtype}")
   1260 
   1261     def product(

TypeError: cannot perform sum with type decimal

The same error is returned for other basic reductions like .mean() and .std() as well.
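For reference, the values these reductions would be expected to produce can be sketched on the CPU with Python's standard decimal and statistics modules. This is only an illustration of the requested semantics, not cuDF code; the variable names are made up for the example, and the decimals are built from strings because decimal.Decimal(1.01) would carry the binary floating-point error of the float literal.

```python
import decimal
import statistics

# Build exact decimals from strings rather than floats.
vals = [decimal.Decimal(s) for s in ["1.00", "1.01", "1.02"]]

# A float literal is not exactly representable, so the two constructions differ.
float_based = decimal.Decimal(1.01)
string_based = decimal.Decimal("1.01")

total = sum(vals)               # exact decimal sum
mean = statistics.mean(vals)    # exact decimal mean
std = statistics.stdev(vals)    # sample standard deviation (ddof=1)

print(total, mean, std)
```

The sample standard deviation is used here because that is the pandas default; whether a decimal .std() in cuDF should return a decimal or a float is left open by this sketch.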


karthikeyann · 2021-03-03T21:47:00Z

I can't reproduce this error. Instead, I get the following error on astype:
TypeError: Cannot interpret 'Decimal64Dtype(precision=7, scale=3)' as a data type

Can you give details on your environment?

The following code reproduces the error on branch-0.19:

import decimal
import cudf
s = cudf.Series([decimal.Decimal(x) for x in [0.5, 1.5, 2.5]], dtype=cudf.Decimal64Dtype(7, 3))
s.sum()

karthikeyann · 2021-03-03T21:49:27Z

@codereport added decimal type reduction in libcudf. Python interface might be needed to utilize this.

ChrisJar · 2021-03-03T22:32:45Z

> I can't reproduce this error. I get this error on astype.
> TypeError: Cannot interpret 'Decimal64Dtype(precision=7, scale=3)' as a data type
>
> can you give details on your environment?
>
> Following code recreated the error in branch-0.19.
> import decimal
> import cudf
> s = cudf.Series([decimal.Decimal(x) for x in [0.5, 1.5,2.5]], dtype=cudf.Decimal64Dtype(7,3))
> s.sum()

My apologies, that astype call relies on a PR that hasn't been merged yet. I'll edit the original post to use your code.

codereport · 2021-03-03T22:55:13Z

> @codereport added decimal type reduction in libcudf. Python interface might be needed to utilize this.

Correct. Sum reductions are implemented in libcudf, but they don't have Python bindings yet. At the moment we only support a few binary operations.
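As a rough illustration of what a sum reduction over a fixed-point decimal column involves, the sketch below models each value as an integer count of 10**-scale units, sums the integers, and reapplies the scale at the end. This is a simplified pure-Python model under that assumption, not libcudf's implementation; to_unscaled and from_unscaled are hypothetical helper names, and SCALE mirrors the scale of 3 in Decimal64Dtype(7, 3).

```python
from decimal import Decimal

SCALE = 3  # digits after the decimal point, as in Decimal64Dtype(7, 3)


def to_unscaled(value: str, scale: int = SCALE) -> int:
    """Encode a decimal string as an integer number of 10**-scale units."""
    scaled = Decimal(value).scaleb(scale)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} has more than {scale} fractional digits")
    return int(scaled)


def from_unscaled(rep: int, scale: int = SCALE) -> Decimal:
    """Decode an integer representation back into a Decimal."""
    return Decimal(rep).scaleb(-scale)


# The values from the reproducer, stored as scaled integers.
reps = [to_unscaled(v) for v in ["0.5", "1.5", "2.5"]]

# The reduction itself is plain integer addition; the scale is unchanged.
total = from_unscaled(sum(reps))
print(total)
```

Because the sum is carried out on integers, it is exact as long as the result fits in the underlying integer type; the precision of the result type is a separate design question that this sketch does not address.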

github-actions · 2021-04-04T15:04:37Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

ChrisJar · 2021-04-05T17:54:48Z

Closed with #7776

ChrisJar added the Needs Triage and feature request labels Mar 3, 2021

kkraus14 added the Python label and removed the Needs Triage label Mar 5, 2021

github-actions bot added the inactive-30d label Apr 4, 2021

ChrisJar closed this as completed Apr 5, 2021
