[FEA] Implement basic reductions for decimal columns #7503

Closed
ChrisJar opened this issue Mar 3, 2021 · 6 comments
Labels: feature request (New feature or request), Python (Affects Python cuDF API.)

Comments

@ChrisJar
Contributor

ChrisJar commented Mar 3, 2021

Is your feature request related to a problem? Please describe.
I would like to perform basic reductions such as .sum(), .mean(), and .std() on DataFrames and Series that contain decimal columns.

Describe the solution you'd like
I would like decimal columns to support the same reductions that float columns already support. For example:

import cudf

df = cudf.DataFrame({'id': [0, 1, 1], 'val': [1.00, 1.01, 1.02]})
df.sum()

returns

id     2.00
val    3.03
dtype: float64

but if the DataFrame contains a decimal column:

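import decimal

# Build the Decimals from strings so 1.01 and 1.02 are exact (float literals are not)
df['val'] = cudf.Series([decimal.Decimal(x) for x in ['1.00', '1.01', '1.02']], dtype=cudf.Decimal64Dtype(7, 3))
df.sum()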

it raises:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-7e5fdb616c56> in <module>
----> 1 df.sum()

/home/u00u7rh1e72hXfsipJ357/miniconda3/envs/rapids-gpu-bdb/lib/python3.7/site-packages/cudf/core/dataframe.py in sum(self, axis, skipna, dtype, level, numeric_only, min_count, **kwargs)
   6063             numeric_only=numeric_only,
   6064             min_count=min_count,
-> 6065             **kwargs,
   6066         )
   6067 

/home/u00u7rh1e72hXfsipJ357/miniconda3/envs/rapids-gpu-bdb/lib/python3.7/site-packages/cudf/core/dataframe.py in _apply_support_method(self, method, axis, *args, **kwargs)
   6788             result = [
   6789                 getattr(self[col], method)(*args, **kwargs)
-> 6790                 for col in self._data.names
   6791             ]
   6792 

/home/u00u7rh1e72hXfsipJ357/miniconda3/envs/rapids-gpu-bdb/lib/python3.7/site-packages/cudf/core/dataframe.py in <listcomp>(.0)
   6788             result = [
   6789                 getattr(self[col], method)(*args, **kwargs)
-> 6790                 for col in self._data.names
   6791             ]
   6792 

/home/u00u7rh1e72hXfsipJ357/miniconda3/envs/rapids-gpu-bdb/lib/python3.7/site-packages/cudf/core/series.py in sum(self, axis, skipna, dtype, level, numeric_only, min_count, **kwargs)
   2985 
   2986         return self._column.sum(
-> 2987             skipna=skipna, dtype=dtype, min_count=min_count
   2988         )
   2989 

/home/u00u7rh1e72hXfsipJ357/miniconda3/envs/rapids-gpu-bdb/lib/python3.7/site-packages/cudf/core/column/column.py in sum(self, skipna, dtype, min_count)
   1257         self, skipna: bool = None, dtype: Dtype = None, min_count: int = 0
   1258     ):
-> 1259         raise TypeError(f"cannot perform sum with type {self.dtype}")
   1260 
   1261     def product(

TypeError: cannot perform sum with type decimal

The same error is raised for other basic reductions such as .mean() and .std().
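As a host-side reference for what these reductions should produce on the example column, the sketch below computes the sum, mean, and sample standard deviation of the values with Python's standard `decimal` module. The helper names (`decimal_sum`, `decimal_mean`, `decimal_std`) are hypothetical, and quantizing the sum to three fractional digits to mirror `Decimal64Dtype(7, 3)` is an assumption; this is not cuDF's API or its eventual result dtype.

```python
from decimal import Decimal

# Example values from the report, built from strings so they are exact.
values = [Decimal("1.00"), Decimal("1.01"), Decimal("1.02")]

def decimal_sum(xs):
    # Exact sum, quantized to 3 fractional digits to mirror scale=3.
    return sum(xs, Decimal(0)).quantize(Decimal("0.001"))

def decimal_mean(xs):
    # Exact division under the default 28-digit decimal context.
    return sum(xs, Decimal(0)) / len(xs)

def decimal_std(xs, ddof=1):
    # Sample standard deviation (ddof=1, the pandas default for .std()).
    m = decimal_mean(xs)
    var = sum(((x - m) ** 2 for x in xs), Decimal(0)) / (len(xs) - ddof)
    return var.sqrt()

print(decimal_sum(values))   # 3.030
print(decimal_mean(values))  # 1.01
print(decimal_std(values))   # 0.01
```

Doing the arithmetic in `Decimal` keeps every step exact here, which is the main reason to want native decimal reductions rather than casting to float64 first.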

ChrisJar added the Needs Triage and feature request labels Mar 3, 2021
@karthikeyann
Contributor

I can't reproduce this error; instead, I get this error on astype:
TypeError: Cannot interpret 'Decimal64Dtype(precision=7, scale=3)' as a data type

Can you give details on your environment?

The following code reproduces the error on branch-0.19:

import decimal
import cudf
s = cudf.Series([decimal.Decimal(x) for x in [0.5, 1.5, 2.5]], dtype=cudf.Decimal64Dtype(7, 3))
s.sum()
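This repro builds `Decimal` values from float literals that happen to be exactly representable in binary (0.5, 1.5, 2.5), so the conversion is exact. A value such as 1.01 is not exactly representable, so `decimal.Decimal(1.01)` carries the float's binary error, which is why constructing from strings is the safer habit. The check below uses only the standard library (no cuDF calls); whether a particular cuDF or pyarrow conversion rounds or rejects such inexact inputs is not shown here.

```python
import decimal

# Exactly representable in binary floating point: conversion is exact.
exact = [decimal.Decimal(x) for x in [0.5, 1.5, 2.5]]
assert exact == [decimal.Decimal("0.5"), decimal.Decimal("1.5"), decimal.Decimal("2.5")]
print(sum(exact))  # 4.5, the expected result of s.sum() above

# Not exactly representable: the Decimal keeps the float's binary error.
from_float = decimal.Decimal(1.01)
from_str = decimal.Decimal("1.01")
print(from_float == from_str)  # False

# Quantizing to three fractional digits (round-half-even by default) gives 1.010.
print(from_float.quantize(decimal.Decimal("0.001")))  # 1.010
```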

@karthikeyann
Contributor

@codereport added decimal-type reductions to libcudf; Python bindings may be needed to use them from cuDF.

@ChrisJar
Contributor Author

ChrisJar commented Mar 3, 2021

> I can't reproduce this error. I get this error on astype.

My apologies: the astype call in my original example relies on a PR that hasn't been merged yet. I'll edit the original post to use your code.

@codereport
Contributor

> @codereport added decimal type reduction in libcudf. Python interface might be needed to utilize this.

Correct. Sum reductions are implemented in libcudf but don't have Python bindings yet. At the moment, only a few binary operations are supported for decimal columns.
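To make the fixed-point idea behind a decimal sum concrete: if every value in a column shares one scale, each value can be held as an integer (for example, 1.01 at scale 3 is 1010), and the column sum is just an integer sum rescaled at the end. The Python sketch below models only that idea with plain integers; it is not libcudf's kernel or the Python bindings that were added later, and `to_unscaled`, `from_unscaled`, and `fixed_point_sum` are hypothetical names.

```python
from decimal import Decimal

SCALE = 3  # digits after the decimal point, as in Decimal64Dtype(7, 3)

def to_unscaled(d: Decimal, scale: int = SCALE) -> int:
    # Shift the decimal point right by `scale` digits: 1.01 -> 1010.
    scaled = d.scaleb(scale)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{d} has more than {scale} fractional digits")
    return int(scaled)

def from_unscaled(n: int, scale: int = SCALE) -> Decimal:
    # Shift the decimal point back left: 3030 -> 3.030.
    return Decimal(n).scaleb(-scale)

def fixed_point_sum(xs, scale: int = SCALE) -> Decimal:
    # Same scale for every element, so an integer sum is exact.
    return from_unscaled(sum(to_unscaled(x, scale) for x in xs), scale)

values = [Decimal("1.00"), Decimal("1.01"), Decimal("1.02")]
print(fixed_point_sum(values))  # 3.030
```

One design question this sketch ignores is the result's precision: a sum can need more digits than the input's 7, so a real implementation has to choose a wider result type or handle overflow.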

kkraus14 added the Python label and removed the Needs Triage label Mar 5, 2021
@github-actions

github-actions bot commented Apr 4, 2021

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@ChrisJar
Copy link
Contributor Author

ChrisJar commented Apr 5, 2021

Closed with #7776

ChrisJar closed this as completed Apr 5, 2021