BUG: DataFrame.where with category dtype #16979
Comments
Can you make a separate issue about the astype (and remove it from the top here)? |
DataFrame.where and DataFrame.astype in DataFrames with 'category'
This could be taken up after #16821. |
This looks fixed in master (category dtype is maintained). Could use a test.
|
The |
Thanks for the guidance, @gfyoung! While trying to add a test for the
After running the code above, on some operating systems (
I cannot reproduce this behaviour locally.
INSTALLED VERSIONS
------------------
commit : 35e91f9
python : 3.6.5.final.0
python-bits : 64
OS : Darwin
OS-release : 17.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 0.26.0.dev0+734.g0de99558b.dirty
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 39.0.1
Cython : 0.29.14
pytest : 5.2.2
hypothesis : 4.42.6
sphinx : 2.2.1
blosc : 1.8.1
feather : None
xlsxwriter : 1.2.2
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
s3fs : 0.3.5
scipy : 1.3.1
sqlalchemy : 1.3.10
tables : 3.6.1
xarray : 0.14.0
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.2
|
@ganevgv : I would try opening a PR with this test, but add print statements to confirm whether the dtype is actually changing. It might actually be a platform thing where the dtype is already |
@gfyoung : After further investigation, I identified where the problem is coming from. When initialising df with int data w/o nans, the default dtype for all columns on some platforms (
However, if you initialise df with int data w/ nans the dtype for columns w/o nans (same platforms) is
On the other platforms ( I believe this behaviour is unrelated to this issue as it's not testing the category preservation when using |
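To see which dtypes a given platform actually produces, a small check along these lines can be run. This is only a sketch: the column names and values are made up for illustration and are not taken from the test discussed above. The integer width may differ between platforms, so the script prints the dtypes rather than assuming them.

```python
import numpy as np
import pandas as pd

# Hypothetical data: integer columns without any missing values.
no_nans = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Same data, but column "b" now contains a NaN.
with_nans = pd.DataFrame({"a": [1, 2, 3], "b": [4, np.nan, 6]})

# Print the dtypes so they can be compared across platforms.
print("without NaNs:")
print(no_nans.dtypes)
print("with NaNs:")
print(with_nans.dtypes)
```

Running this on each platform and comparing the printed dtypes shows whether a difference comes from construction rather than from the operation under test.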
Code Sample (it is copy-pastable)
Problem description
df.where should work with all dtypes; the documentation doesn't say it works only for some dtypes. Also, NaNs are already correctly handled as missing data in pd.Series of type 'category', so one should be able to assign NaNs to them. Same with converting the dtype.
While writing this report I found that doing it column-by-column works correctly, so I'll use that as a workaround.
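The column-by-column workaround could look roughly like the sketch below. The data, the mask, and the variable names are hypothetical and are not the original code sample from this report; the sketch only assumes that Series.where works on a categorical column, as described above.

```python
import pandas as pd

# Hypothetical example data: two categorical columns.
df = pd.DataFrame(
    {
        "a": pd.Categorical(["x", "y", "z"]),
        "b": pd.Categorical(["p", "q", "p"]),
    }
)

# Boolean mask: True keeps the value, False replaces it with NaN.
mask = pd.DataFrame(
    {
        "a": [True, False, True],
        "b": [False, True, True],
    }
)

# Workaround: apply Series.where to each column separately,
# then reassemble the columns into a DataFrame.
result = pd.DataFrame({col: df[col].where(mask[col]) for col in df.columns})

print(result.dtypes)
print(result)
```

Printing df.where(mask).dtypes next to result.dtypes is a quick way to compare the whole-frame call against the workaround on a given pandas version.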
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-81-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 36.0.1
Cython: None
numpy: 1.13.1
scipy: 0.19.0
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
Ubuntu lsb_release -a:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial