I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5]], columns=pd.MultiIndex.from_tuples([(True, 'B'), (False, 'C')]))
df.to_parquet('test.parquet', engine='pyarrow')
pd.read_parquet('test.parquet', engine='pyarrow')  # fails

# now save out with multi-index on index instead of columns:
df.T.to_parquet('test.parquet', engine='pyarrow')
pd.read_parquet('test.parquet', engine='pyarrow')  # succeeds

# now save out with int instead of bool index:
df = pd.DataFrame([[1, 2], [4, 5]], columns=pd.MultiIndex.from_tuples([(1, 'B'), (0, 'C')]))
df.to_parquet('test.parquet', engine='pyarrow')
pd.read_parquet('test.parquet', engine='pyarrow')  # succeeds
Issue Description
Parquet IO with multi-index indices or columns is supported. However, if the multi-index contains a level with bools and if that multi-index is on the columns, then while the parquet can be written with the pyarrow engine, it cannot be read back in using pyarrow.
Further note that the fastparquet engine can neither read nor write such dataframes. There is a panoply of different errors on read/write with a multi-index with fastparquet, depending on whether the multi-index is on the index or the columns, and whether the index has level names or not. I (or someone) should probably open separate bugs on that...
NB. the issue repros in a clean environment with only python, pip, pandas (dev), and pyarrow/fastparquet directly installed.
Expected Behavior
Parquet IO should support bool multi-index levels on columns.
The likely cause is that the index type is not correctly marked when the file is read back with pyarrow, so the index that should be of bool type cannot be converted to bool during the subsequent type conversion in pandas.
A further complication is how object values are converted to bool: any non-zero, non-empty value becomes True, which is why every index label shown in the error message is True.
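The truthiness rule described above can be illustrated in plain Python. This is only a sketch: it assumes the object-to-bool conversion applies Python's `bool()` to each element (which is how NumPy's `astype(bool)` treats object arrays), and the label values are hypothetical stand-ins for a bool level that came back as strings.

```python
# Hypothetical labels, as they might look if a bool level were read back as strings.
labels_read_back = ["True", "False"]

# Element-wise truthiness: any non-empty string is truthy, including "False".
as_bool = [bool(label) for label in labels_read_back]
print(as_bool)  # [True, True]

# Only empty or zero-like values map to False.
falsy_examples = [bool(value) for value in ["", 0, None]]
print(falsy_examples)  # [False, False, False]
```

Under that assumption, both labels collapse to True, which would match the all-True index in the error message.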