BUG: df.set_index() doesn't maintain ExtensionArray dtype #38338
Labels: Duplicate Report, ExtensionArray, Index
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
The dtype of an ExtensionArray is lost after set_index() / reset_index().

Note: I'm using cyberpandas as a "minimal" example of an ExtensionArray implementation. I think it's minimal enough to convey and reproduce the issue, and the same thing happens with my own ExtensionArray implementation, which in turn is inspired by cyberpandas' IPArray/ExtensionArray.

I found this in Index.__new__, which always sets the Index dtype of an ExtensionArray to object. What's the reason for this?

Problem description
Type information is stored only in the dtype (which also holds additional metadata). My current workaround is to capture the type information before set_index() (which I use to assign column values by index) and to clean up/fix the dtype afterwards. This is potentially error prone, because every place where set_index() is used needs to implement the same cleanup.

Expected Output
isinstance(dfb.address.dtype, cyberpandas.ip_array.IPType) is True
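The workaround described under "Problem description" (capture the dtype, call set_index(), then fix the dtype after reset_index()) might look roughly like the sketch below. It is an illustration under assumptions, not the reporter's actual code: the helper `reset_index_restoring_dtypes` is hypothetical, and pandas' built-in nullable `"Int64"` ExtensionArray stands in for the cyberpandas IPArray so the example runs without cyberpandas. The final line mirrors the expected-output check, with `pd.Int64Dtype` in place of `IPType`.

```python
import pandas as pd


def reset_index_restoring_dtypes(indexed: pd.DataFrame, saved_dtypes: dict) -> pd.DataFrame:
    """Hypothetical helper: reset the index, then cast the former index
    columns back to the dtypes recorded before set_index()."""
    out = indexed.reset_index()
    for column, dtype in saved_dtypes.items():
        # If pandas already preserved the dtype, this cast changes nothing.
        out[column] = out[column].astype(dtype)
    return out


# Stand-in for the cyberpandas IPArray column (assumption): pandas' built-in
# nullable integer ExtensionArray, so the sketch runs without cyberpandas.
df = pd.DataFrame({
    "address": pd.array([1, 2, 3], dtype="Int64"),
    "value": [10.0, 20.0, 30.0],
})

# 1. Capture the dtype before set_index(), which may not keep it.
saved = {"address": df["address"].dtype}

# 2. Assign column values by index label.
indexed = df.set_index("address")
indexed.loc[2, "value"] = 99.0

# 3. Reset the index and restore the captured dtype.
dfb = reset_index_restoring_dtypes(indexed, saved)

# Same style of check as the expected output, with Int64Dtype standing in for IPType.
print(isinstance(dfb.address.dtype, pd.Int64Dtype))
```

Because the restoring cast is unconditional, the same helper is harmless on pandas versions that already keep the ExtensionArray dtype through the round trip; the downside, as noted above, is that every call site of set_index() has to remember to use it.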
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 67a3d42
python : 3.9.0.final.0
python-bits : 64
OS : Linux
OS-release : 5.9.11-200.fc33.x86_64
Version : #1 SMP Tue Nov 24 18:18:01 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_AU.UTF-8
LOCALE : en_AU.UTF-8
pandas : 1.1.4
numpy : 1.19.4
pytz : 2020.4
dateutil : 2.8.1
pip : 20.2.4
setuptools : 50.3.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None