[BUG] categoricals with .ordered is None in cudf and dask_cudf #11487
Comments
Doesn't Pandas support https://pandas.pydata.org/docs/reference/api/pandas.CategoricalDtype.html#pandas.CategoricalDtype
Yes, as a "please defer to other categorical ordered property during pandas-dev/pandas#26336 But, pandas
Let me highlight this from my examples above:
In [1]: import cudf
In [2]: import dask_cudf
In [5]: s = cudf.Series(4*['foo'], dtype='category')
In [6]: print(s.dtype.ordered)
False
In [7]: s2 = dask_cudf.from_cudf(s, npartitions=2)
In [8]: print(s2.dtype.ordered)
False
In [11]: print(s2.compute().dtype.ordered)  # because finalize uses concat
None
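Why the difference between the last two results matters can be shown without a GPU. The following is a minimal plain-Python sketch, assuming that some downstream check compares the expected and computed `ordered` values with strict equality. The helper names `strict_match` and `truthiness_match` are hypothetical and are not dask or cudf functions.

```python
def strict_match(expected, actual):
    """Compare two `ordered` values exactly; None and False differ."""
    return expected == actual


def truthiness_match(expected, actual):
    """Compare two `ordered` values by truthiness; None and False agree."""
    return bool(expected) == bool(actual)


expected_ordered = False  # value printed before compute() in the session
computed_ordered = None   # value printed after compute() in the session

print(strict_match(expected_ordered, computed_ordered))      # False
print(truthiness_match(expected_ordered, computed_ordered))  # True
```

Under strict equality the two dtypes look different even though neither is ordered, which is consistent with the mismatch described in this issue.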
…11604)
Fixes: #11487
This PR switches default value of `ordered` parameter in `CategoricalDtype` to `False`. This fixes some issues around concat and building categorical columns.
Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)
Approvers:
- Bradley Dice (https://github.com/bdice)
- Ashwin Srinath (https://github.com/shwina)
URL: #11604
Describe the bug
I expect a categorical dtype to always have `dtype.ordered` be `True` or `False`. It is sometimes `None` after construction, or is converted to `None` after an operation such as concat.
This can cause `dask_cudf` to fail on workloads that use categorical dtypes, with an error reporting a mismatch (the mismatch arises because one `ordered` value is `None`, but there should be no mismatch).
Steps/Code to reproduce bug
Expected behavior
I expect a categorical dtype to always have `dtype.ordered` be `True` or `False`.
This behavior should probably be fixed here (default on line 130):
cudf/python/cudf/cudf/core/dtypes.py
Lines 130 to 132 in e1a4e03
and here (default on line 1475):
cudf/python/cudf/cudf/core/column/column.py
Lines 1468 to 1476 in e1a4e03
or fix it directly in `DataFrame._concat` here:
cudf/python/cudf/cudf/core/dataframe.py
Lines 1692 to 1694 in e1a4e03
via:
cudf/python/cudf/cudf/core/dataframe.py
Lines 7300 to 7311 in e1a4e03
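The two kinds of fix proposed above (changing the default, or normalizing inside concat) can be sketched in plain Python. This is only an illustration under stated assumptions: `SketchCategoricalDtype`, `normalize_ordered`, and `concat_dtypes` are hypothetical names invented here, and none of this is the code in `dtypes.py`, `column.py`, or `dataframe.py`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class SketchCategoricalDtype:
    """Hypothetical stand-in for a categorical dtype; not cudf's class."""

    categories: Tuple[str, ...]
    ordered: Optional[bool] = False  # default is False rather than None


def normalize_ordered(dtype: SketchCategoricalDtype) -> SketchCategoricalDtype:
    """Return a dtype whose `ordered` is a real bool, mapping None to False."""
    if dtype.ordered is None:
        return SketchCategoricalDtype(dtype.categories, False)
    return dtype


def concat_dtypes(dtypes: Sequence[SketchCategoricalDtype]) -> SketchCategoricalDtype:
    """Combine dtypes for a concat: normalize `ordered`, then union categories."""
    normalized = [normalize_ordered(d) for d in dtypes]
    if not normalized:
        raise ValueError("need at least one dtype to concatenate")
    if len({d.ordered for d in normalized}) != 1:
        raise TypeError("cannot concat categoricals with different `ordered`")
    # Keep first-seen category order while dropping duplicates.
    categories = tuple(dict.fromkeys(c for d in normalized for c in d.categories))
    return SketchCategoricalDtype(categories, normalized[0].ordered)


a = SketchCategoricalDtype(("foo",))                      # ordered defaults to False
b = SketchCategoricalDtype(("foo", "bar"), ordered=None)  # explicit None
print(concat_dtypes([a, b]).ordered)  # False
```

Changing the default covers construction, while normalizing at concat time also covers inputs that were explicitly built with `ordered=None`, so the two approaches are complementary rather than exclusive.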
Additional context
This is necessary to use categorical dtypes in cugraph's PropertyGraph: rapidsai/cugraph#2510