BUG: is_categorical_dtype returns False for Sparse[category, nan] #35793

sbrugman · 2020-08-18T22:39:30Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd
from pandas.api.types import is_categorical_dtype

s = pd.Series(
    ["a", "b", "c", "a", "b", "c"],
    dtype=pd.SparseDtype(pd.CategoricalDtype(['a', 'b', 'c']))
)

print(s.dtype)
# out: Sparse[category, nan]
print(is_categorical_dtype(s))
# out: False

Problem description

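The is_categorical_dtype function returns False when the categorical data is stored in a SparseDtype. This is inconsistent with the other dtype checks, which report on the sparse subtype (see the unsigned integer example below). The dense categorical case is included for comparison.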

from pandas.api.types import is_unsigned_integer_dtype

s = pd.Series(
    [0, 1, 2, 3],
    name="pd_unsigned_integer",
    dtype=pd.SparseDtype(pd.UInt8Dtype(), pd.NA)
)
print(s.dtype)
# out: Sparse[UInt8, <NA>]
print(is_unsigned_integer_dtype(s))
# out: True

s = pd.Series(
    ["a", "b", "c", "a", "b", "c"],
    dtype=pd.CategoricalDtype(['a', 'b', 'c'])
)

print(s.dtype)
# out: category
print(is_categorical_dtype(s))
# out: True
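One plausible reading of the inconsistency is that the two kinds of check look at different things. A check that compares the dtype's *name* cannot see through a sparse wrapper, because the wrapper has its own name ("Sparse[category, nan]"). A check that inspects the dtype's *scalar type* does see through it, if the wrapper forwards its subtype's scalar type. The sketch below is a toy model, not pandas' actual code: every class and function name is hypothetical, and whether pandas 1.1 implements its checks exactly this way is an assumption.

```python
class CategoricalScalar:
    """Hypothetical stand-in for the scalar type a categorical dtype reports."""


class UInt8Scalar:
    """Hypothetical stand-in for numpy.uint8."""


class ToyCategoricalDtype:
    name = "category"
    scalar_type = CategoricalScalar


class ToyUInt8Dtype:
    name = "UInt8"
    scalar_type = UInt8Scalar


class ToySparseDtype:
    """Wraps another dtype the way a sparse dtype wraps its subtype."""

    def __init__(self, subtype):
        self.subtype = subtype

    @property
    def name(self):
        # The wrapper gets its own name, e.g. "Sparse[category, nan]" ...
        return f"Sparse[{self.subtype.name}, nan]"

    @property
    def scalar_type(self):
        # ... but forwards the scalar type of its subtype.
        return self.subtype.scalar_type


def is_category_by_name(dtype):
    # Name comparison: a sparse wrapper never matches.
    return dtype.name == "category"


def is_unsigned_by_scalar_type(dtype):
    # Scalar-type inspection: sees through the sparse wrapper.
    return issubclass(dtype.scalar_type, UInt8Scalar)


sparse_cat = ToySparseDtype(ToyCategoricalDtype())
sparse_uint = ToySparseDtype(ToyUInt8Dtype())

print(is_category_by_name(ToyCategoricalDtype()))  # True
print(is_category_by_name(sparse_cat))             # False
print(is_unsigned_by_scalar_type(sparse_uint))     # True
```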

Expected Output

For the sparse categorical Series from the first code sample:

print(is_categorical_dtype(s))
# out: True
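The expected output corresponds to a check that looks through the sparse wrapper to its subtype. The following is a minimal sketch that uses only the public classes pd.SparseDtype and pd.CategoricalDtype. The helper name is_categorical_including_sparse is hypothetical, and this is not how pandas resolved the issue. The demo deliberately avoids building a sparse categorical Series, because whether that dtype can be constructed at all may depend on the pandas version.

```python
import pandas as pd


def is_categorical_including_sparse(obj) -> bool:
    """Hypothetical helper: True for a categorical dtype, or for a sparse
    dtype whose subtype is categorical."""
    dtype = getattr(obj, "dtype", obj)  # accept a Series/array or a dtype
    if isinstance(dtype, pd.SparseDtype):
        dtype = dtype.subtype  # inspect the values' dtype, not the wrapper
    return isinstance(dtype, pd.CategoricalDtype)


dense = pd.Series(["a", "b", "c"], dtype="category")
sparse_float = pd.Series([0.0, 1.0, 0.0], dtype=pd.SparseDtype(float, 0.0))

print(is_categorical_including_sparse(dense))         # True
print(is_categorical_including_sparse(sparse_float))  # False
```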

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : d9fff27
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 19.3.1
setuptools : 45.0.0.post20200113
Cython : None
pytest : 5.3.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.47.0


jorisvandenbossche · 2020-08-19T06:45:39Z

@sbrugman Side note: I am assuming you are using this function yourself in one of your packages? If so, please import it from pandas.api.types (which is public) and not from pandas.core.dtypes.common (private API, subject to change).

jorisvandenbossche · 2020-08-19T06:50:40Z

@sbrugman Can you describe the use case in which you encountered this problem?
I know you mention the inconsistency with other types, but I am still wondering whether changing this is actually a good idea, and what the motivation would be (for the sake of consistency, another option would be to change the other types instead).

sbrugman · 2020-08-24T19:23:40Z

@jorisvandenbossche At the moment, all the other checks I have tested (is_complex_dtype, is_bool_dtype, is_unsigned_integer_dtype) return their result based on the subtype of the SparseArray's dtype, whereas is_categorical_dtype does not. Users expect consistent behaviour.

The point you made, that a dtype check (e.g. is_***_dtype) should reflect the API available on that Series, makes sense to me. If the other checks were changed instead, users would lose no functionality: the current result could still be obtained with is_sparse(series) and is_***_dtype(series.dtype.subtype).
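The combination proposed in this comment can be written with public API. A minimal sketch: it substitutes isinstance(dtype, pd.SparseDtype) for is_sparse (both test whether a dtype is sparse), and the helper name is_sparse_of is hypothetical. The demo uses a numeric subtype (np.uint8) with fill value 0.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_unsigned_integer_dtype


def is_sparse_of(obj, subtype_check) -> bool:
    """Hypothetical helper: True only if obj has a SparseDtype whose subtype
    satisfies subtype_check (e.g. is_unsigned_integer_dtype)."""
    dtype = getattr(obj, "dtype", obj)
    # The isinstance test plays the role of is_sparse(series).
    return isinstance(dtype, pd.SparseDtype) and bool(subtype_check(dtype.subtype))


sparse_uint = pd.Series([0, 1, 2, 3], dtype=pd.SparseDtype(np.uint8, 0))
dense_uint = pd.Series([0, 1, 2, 3], dtype=np.uint8)

print(is_sparse_of(sparse_uint, is_unsigned_integer_dtype))  # True
print(is_sparse_of(dense_uint, is_unsigned_integer_dtype))   # False
```

Passing the subtype check as a function keeps one helper usable for any of the is_***_dtype predicates.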

jbrockmendel · 2022-11-29T00:51:36Z

In the majority of cases where we use is_categorical_dtype, we really mean isinstance(dtype, CategoricalDtype). We should use that instead.
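A minimal sketch of the suggested replacement, wrapped in a hypothetical helper named is_categorical. With a plain isinstance check, a sparse dtype is never reported as categorical, whatever its subtype. This is consistent with the view that a dtype check should reflect the API of the Series. Unlike is_categorical_dtype, the isinstance check also does not accept the string "category".

```python
import pandas as pd


def is_categorical(obj) -> bool:
    """Strict check: True only when the dtype itself is a CategoricalDtype."""
    dtype = getattr(obj, "dtype", obj)  # accept a Series/array or a dtype
    return isinstance(dtype, pd.CategoricalDtype)


cat = pd.Series(["a", "b", "a"], dtype="category")
sparse = pd.Series([0.0, 1.0], dtype=pd.SparseDtype(float, 0.0))
plain = pd.Series(["a", "b", "a"])

print(is_categorical(cat))     # True
print(is_categorical(sparse))  # False: a SparseDtype is never a CategoricalDtype
print(is_categorical(plain))   # False: object dtype
```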

sbrugman added Bug and Needs Triage labels Aug 18, 2020

dsaxton added ExtensionArray and Sparse labels and removed Needs Triage label Aug 19, 2020

dsaxton mentioned this issue Aug 19, 2020

BUG: Fix is_categorical_dtype for Sparse[category] #35797 (Closed)

dsaxton added this to the 1.2 milestone Aug 19, 2020

simonjayhawkins added Needs Discussion and removed Bug labels Aug 21, 2020

dsaxton removed this from the 1.2 milestone Aug 22, 2020

jreback added this to the 1.2 milestone Sep 19, 2020

jreback modified the milestones: 1.2, Contributions Welcome Nov 20, 2020

mroeschke added the Bug label Aug 10, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

jbrockmendel mentioned this issue Apr 7, 2023

DEPR: is_categorical_dtype #52527 (Merged)

mroeschke closed this as completed in #52527 Apr 11, 2023

rhshadrach mentioned this issue Oct 4, 2023

Please consider using DeprecationWarning rather than FutureWarning for API changes #54970 (Open)
