BUG: is_categorical_dtype returns False for Sparse[category, nan] #35793

Closed
2 of 3 tasks
sbrugman opened this issue Aug 18, 2020 · 4 comments · Fixed by #52527
Labels
Bug ExtensionArray Extending pandas with custom dtypes or arrays. Needs Discussion Requires discussion from core team before further action Sparse Sparse Data Type

Comments

@sbrugman
Contributor

sbrugman commented Aug 18, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
from pandas.api.types import is_categorical_dtype

s = pd.Series(
    ["a", "b", "c", "a", "b", "c"],
    dtype=pd.SparseDtype(pd.CategoricalDtype(['a', 'b', 'c']))
)

print(s.dtype)
# out: Sparse[category, nan]
print(is_categorical_dtype(s))
# out: False

Problem description

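The is_categorical_dtype function returns False when the categorical dtype is wrapped in a SparseDtype. This is inconsistent with the other dtype checks, which look at the sparse subtype (example for unsigned integer below, followed by the dense categorical case for comparison).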

import pandas as pd
from pandas.api.types import is_categorical_dtype, is_unsigned_integer_dtype

# Sparse unsigned integer: the check looks at the sparse subtype.
s = pd.Series(
    [0, 1, 2, 3],
    name="pd_unsigned_integer",
    dtype=pd.SparseDtype(pd.UInt8Dtype(), pd.NA)
)
print(s.dtype)
# out: Sparse[UInt8, <NA>]
print(is_unsigned_integer_dtype(s))
# out: True

# Dense categorical, for comparison.
s = pd.Series(
    ["a", "b", "c", "a", "b", "c"],
    dtype=pd.CategoricalDtype(['a', 'b', 'c'])
)

print(s.dtype)
# out: category
print(is_categorical_dtype(s))
# out: True

Expected Output

# s is the Sparse[category, nan] series from the code sample above
print(is_categorical_dtype(s))
# out: True

Output of pd.show_versions()

INSTALLED VERSIONS

commit : d9fff27
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 19.3.1
setuptools : 45.0.0.post20200113
Cython : None
pytest : 5.3.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.47.0

@sbrugman sbrugman added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 18, 2020
@dsaxton dsaxton added ExtensionArray Extending pandas with custom dtypes or arrays. Sparse Sparse Data Type and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 19, 2020
@dsaxton dsaxton added this to the 1.2 milestone Aug 19, 2020
@jorisvandenbossche
Member

@sbrugman sidenote: I am assuming you are using this method yourself in one of your packages? If so, please import it from pandas.api.types (which is public) and not from pandas.core.dtypes.common (private API subject to change)

@jorisvandenbossche
Member

@sbrugman Can you describe the use case where you encountered this as a problem?
I know you mention the inconsistency with other types, but I am still wondering whether this is actually a good idea to change, and what the motivation would be (because for the sake of consistency, there is also the option to change the other types).

@simonjayhawkins simonjayhawkins added Needs Discussion Requires discussion from core team before further action and removed Bug labels Aug 21, 2020
@dsaxton dsaxton removed this from the 1.2 milestone Aug 22, 2020
@sbrugman
Contributor Author

@jorisvandenbossche At this moment all other checks I have tested (is_complex_dtype, is_bool_dtype, is_unsigned_integer_dtype) return their value based on the subtype of the SparseArray, whereas is_categorical_dtype does not. Users expect consistent behaviour.

The case you made, that a dtype check (e.g. is_***_dtype) should be informative about the API of that series, makes sense to me. The user loses no functionality, as the following expression could be used to get the current result: is_sparse(series) and is_***_dtype(series.dtype.subtype).
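
A minimal sketch of that expression, written as a helper. The name sparse_subtype_matches is hypothetical and not part of pandas, and isinstance(dtype, pd.SparseDtype) stands in for is_sparse here, which is an assumption about how one might prefer to spell the sparse test. The example uses a plain NumPy uint8 subtype so that it does not depend on how a given pandas version handles extension dtypes inside SparseDtype.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_unsigned_integer_dtype


def sparse_subtype_matches(dtype, check):
    """Hypothetical helper: True only if `dtype` is sparse and `check` accepts its subtype."""
    return isinstance(dtype, pd.SparseDtype) and bool(check(dtype.subtype))


# A sparse dtype with an unsigned integer subtype, and its dense counterpart.
sparse_uint = pd.SparseDtype("uint8", 0)
dense_uint = pd.Series([0, 1, 2], dtype="uint8").dtype

# Sparse and unsigned integer underneath.
print(sparse_subtype_matches(sparse_uint, is_unsigned_integer_dtype))
# Not sparse, so the combined check fails even though the dtype is unsigned.
print(sparse_subtype_matches(dense_uint, is_unsigned_integer_dtype))
# Sparse, but the subtype is not boolean.
print(sparse_subtype_matches(sparse_uint, is_bool_dtype))
```

Callers would pass series.dtype, so a check written this way makes the sparse wrapper explicit instead of relying on each is_***_dtype function to look through it.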

@jreback jreback added this to the 1.2 milestone Sep 19, 2020
@jreback jreback modified the milestones: 1.2, Contributions Welcome Nov 20, 2020
@mroeschke mroeschke added the Bug label Aug 10, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@jbrockmendel
Member

In the majority of cases where we use is_categorical_dtype we really mean isinstance(dtype, CategoricalDtype). We should use that instead.
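
A short sketch of what that explicit check looks like from user code, assuming the dtype is taken from a Series; the variable names are illustrative only. Unlike a helper that looks through wrappers, isinstance only matches when the dtype object itself is a CategoricalDtype.

```python
import pandas as pd

# A dense categorical series and a plain (non-categorical) series of the same values.
categorical = pd.Series(
    ["a", "b", "c", "a"],
    dtype=pd.CategoricalDtype(["a", "b", "c"]),
)
plain = pd.Series(["a", "b", "c", "a"])

# The explicit check inspects the dtype object directly.
categorical_is_cat = isinstance(categorical.dtype, pd.CategoricalDtype)
plain_is_cat = isinstance(plain.dtype, pd.CategoricalDtype)

print(categorical_is_cat)
# out: True
print(plain_is_cat)
# out: False
```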
