-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Update numeric_only
behavior in reduction APIs
#12847
[REVIEW] Update numeric_only
behavior in reduction APIs
#12847
Conversation
python/cudf/cudf/core/dataframe.py
Outdated
try: | ||
result = [ | ||
getattr(source._data[col], op)(**kwargs) | ||
for col in source._data.names | ||
] | ||
except AttributeError: | ||
raise TypeError( | ||
f"Not all column dtypes support op {op}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this try-except is still required because of decimal dtypes, but I could be wrong and it may be worth trying it out to see if it is now removable. I'm pretty sure it will fail for numeric dtypes on the statistical moments though (skew/kurt/etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this would still be needed for decimal types when numeric_only=True
:
In [1]: import cudf
In [2]: from decimal import Decimal
In [3]: s = cudf.Series([Decimal(1), Decimal(10), Decimal(2)])
In [4]: df = cudf.DataFrame({'a':s})
In [7]: df.skew(numeric_only=True)
> /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/dataframe.py(5856)_reduce()
-> result = [
(Pdb) c
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/dataframe.py:5856, in DataFrame._reduce(self, op, axis, numeric_only, **kwargs)
5855 import pdb;pdb.set_trace()
-> 5856 result = [
5857 getattr(source._data[col], op)(**kwargs)
5858 for col in source._data.names
5859 ]
5860 except AttributeError:
File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/dataframe.py:5857, in <listcomp>(.0)
5855 import pdb;pdb.set_trace()
5856 result = [
-> 5857 getattr(source._data[col], op)(**kwargs)
5858 for col in source._data.names
5859 ]
5860 except AttributeError:
File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/column/numerical_base.py:77, in NumericalBaseColumn.skew(self, skipna)
75 return cudf.utils.dtypes._get_nan_for_dtype(self.dtype)
---> 77 self = self.nans_to_nulls().dropna() # type: ignore
79 if len(self) < 3:
AttributeError: 'Decimal128Column' object has no attribute 'nans_to_nulls'
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/dataframe.py:5878, in DataFrame._reduce(self, op, axis, numeric_only, **kwargs)
5877 try:
-> 5878 result = [
5879 getattr(source._data[col], op)(**kwargs)
5880 for col in source._data.names
5881 ]
5882 except AttributeError:
File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/dataframe.py:5879, in <listcomp>(.0)
5877 try:
5878 result = [
-> 5879 getattr(source._data[col], op)(**kwargs)
5880 for col in source._data.names
5881 ]
5882 except AttributeError:
File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/column/numerical_base.py:77, in NumericalBaseColumn.skew(self, skipna)
75 return cudf.utils.dtypes._get_nan_for_dtype(self.dtype)
---> 77 self = self.nans_to_nulls().dropna() # type: ignore
79 if len(self) < 3:
AttributeError: 'Decimal128Column' object has no attribute 'nans_to_nulls'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
Cell In[7], line 1
----> 1 df.skew(numeric_only=True)
File /nvme/0/pgali/envs/cudfdev/lib/python3.10/contextlib.py:79, in ContextDecorator.__call__.<locals>.inner(*args, **kwds)
76 @wraps(func)
77 def inner(*args, **kwds):
78 with self._recreate_cm():
---> 79 return func(*args, **kwds)
File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/frame.py:2346, in Frame.skew(self, axis, skipna, numeric_only, **kwargs)
2343 if axis not in (0, "index", None):
2344 raise NotImplementedError("Only axis=0 is currently supported.")
-> 2346 return self._reduce(
2347 "skew",
2348 axis=axis,
2349 skipna=skipna,
2350 numeric_only=numeric_only,
2351 **kwargs,
2352 )
File /nvme/0/pgali/envs/cudfdev/lib/python3.10/contextlib.py:79, in ContextDecorator.__call__.<locals>.inner(*args, **kwds)
76 @wraps(func)
77 def inner(*args, **kwds):
78 with self._recreate_cm():
---> 79 return func(*args, **kwds)
File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/dataframe.py:5883, in DataFrame._reduce(self, op, axis, numeric_only, **kwargs)
5878 result = [
5879 getattr(source._data[col], op)(**kwargs)
5880 for col in source._data.names
5881 ]
5882 except AttributeError:
-> 5883 raise TypeError(
5884 f"Not all column dtypes support op {op}"
5885 )
5886 elif any(
5887 not is_numeric_dtype(self._data[name])
5888 for name in self._data.names
5889 ):
5890 raise TypeError(
5891 "Non numeric columns passed with "
5892 "`numeric_only=False`, pass `numeric_only=True` "
5893 f"to perform DataFrame.{op}"
5894 )
TypeError: Not all column dtypes support op skew
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a NotImplementedError instead of a TypeError? These operations are well-defined for decimal columns, we just haven't implemented them, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, agree. Changed to NotImplementedError
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor non-blocking question. Thanks!
python/cudf/cudf/core/dataframe.py
Outdated
try: | ||
result = [ | ||
getattr(source._data[col], op)(**kwargs) | ||
for col in source._data.names | ||
] | ||
except AttributeError: | ||
raise TypeError( | ||
f"Not all column dtypes support op {op}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a NotImplementedError instead of a TypeError? These operations are well-defined for decimal columns, we just haven't implemented them, right?
#12847 introduced support for `numeric_only`, this PR cleans up a `kurt` related pytest that was relying on the old behavior. This PR fixes 18 pytests : ``` = 413 failed, 88257 passed, 2044 skipped, 932 xfailed, 165 xpassed in 463.03s (0:07:43) = ``` On `pandas_2.0_feature_branch`: ``` = 431 failed, 88221 passed, 2044 skipped, 932 xfailed, 165 xpassed in 456.25s (0:07:36) = ```
Description
numeric_only=None
and defaults tonumeric_only=False
.level
parameter from reduction APIs to match pandas-2.0axis
defaults to match pandas-2.0 APIs.Checklist