Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: to_hdf fails with boolean index in 1.5 #48667

Closed
3 tasks done
anthonyaag opened this issue Sep 20, 2022 · 2 comments · Fixed by #48696
Closed
3 tasks done

BUG: to_hdf fails with boolean index in 1.5 #48667

anthonyaag opened this issue Sep 20, 2022 · 2 comments · Fixed by #48696
Labels
Bug IO HDF5 read_hdf, HDFStore Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@anthonyaag
Copy link

anthonyaag commented Sep 20, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
pd.DataFrame([[False, 1]], columns=["a", "b"]).set_index(['a']).to_hdf("/tmp/a.hdf", "a")

Issue Description

When saving to_hdf a dataframe which has a boolean index I get an assertion error from inside pytables

   4943     return IndexCol(
   4944         name, values=converted, kind=kind, typ=atom, index_name=index_name
   4945     )
   4946 else:
-> 4947     assert isinstance(converted, np.ndarray) and converted.dtype == object
   4948     assert kind == "object", kind
   4949     atom = _tables().ObjectAtom()

AssertionError:

However, when saving a dataframe with no boolean in the index this works as expected.

Additionally saving dataframes with booleans in the index worked in 1.4.4

Expected Behavior

import pandas as pd
pd.DataFrame([[False, 1]], columns=["a", "b"]).set_index(['a']).to_hdf("/tmp/a.hdf", "a")

works identically to

import pandas as pd
pd.DataFrame([['False', 1]], columns=["a", "b"]).set_index(['a']).to_hdf("/tmp/a.hdf", "a")

Installed Versions

INSTALLED VERSIONS

commit : 87cfe4e
python : 3.10.7.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.117.hrtdev
Version : #3 SMP Tue May 11 13:44:03 EDT 2021
machine : x86_64
processor :
byteorder : little
LC_ALL :
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.5.0
numpy : 1.23.2
pytz : 2022.1
dateutil : 2.8.2
setuptools : 62.1.0
pip : 22.2.2
Cython : 0.29.30
pytest : 7.1.2
hypothesis : 6.52.3
sphinx : 5.0.2
blosc : 1.10.6
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.1.2
IPython : 8.4.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.5
brotli : 1.0.9
fastparquet : 0.8.1
fsspec : 2022.5.0
gcsfs : None
matplotlib : 3.5.2
numba : 0.56.2
numexpr : 2.8.3
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 7.0.0
pyreadstat : None
pyxlsb : 1.0.9
s3fs : None
scipy : 1.9.1
snappy : None
sqlalchemy : 1.4.39
tables : 3.7.0
tabulate : 0.8.10
xarray : 2022.6.0
xlrd : 2.0.1
xlwt : None
zstandard : 0.18.0
tzdata : 2022.1

@anthonyaag anthonyaag added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 20, 2022
@anthonyaag anthonyaag changed the title BUG: to_hdf fails with boolean index BUG: to_hdf fails with boolean index in 1.5 Sep 20, 2022
@datapythonista datapythonista added this to the 1.5.1 milestone Sep 20, 2022
@datapythonista datapythonista added Regression Functionality that used to work in a prior pandas version IO HDF5 read_hdf, HDFStore and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 20, 2022
@anthonyaag
Copy link
Author

Seems to be a change from the dtype of the index

1.5 behavior
In [12]: pd.DataFrame([[True, 1]], columns=["a", "b"]).set_index("a").index
Out[12]: Index([True], dtype='bool', name='a')

1.4 behavior
In [11]: pd.DataFrame([[True, 1]], columns=["a", "b"]).set_index("a").index
Out[11]: Index([True], dtype='object', name='a')

@phofl
Copy link
Member

phofl commented Sep 21, 2022

ee6b0a09fff7789879c3322edffe9f84d10acee3 is the first bad commit
commit ee6b0a09fff7789879c3322edffe9f84d10acee3
Author: jbrockmendel <[email protected]>
Date:   Sat Feb 5 11:01:21 2022 -0800

    ENH: Index[bool] (#45061)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO HDF5 read_hdf, HDFStore Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants