agg() function on groupby dataframe changes dtype of datetime64[ns] column to float64 if all items in a single group are NaT #12821
Labels
Bug
Datetime
Groupby
Missing-data
This report compares two variations of a dataframe that contains a date column with dtype datetime64[ns].
In the first variation, a single date is missing (NaT). After groupby and agg(), every column in the aggregated dataframe keeps the dtype it had in the original dataframe, as expected and desired.
In the second variation, several dates are missing, arranged so that all the dates in one group are NaT. After the same groupby and agg() steps, the dtype of the date column changes to float64. This is undesired behaviour in my situation, and I believe it is a bug.
Code Sample, a copy-pastable example if possible
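The original code sample is not preserved in this copy of the report. The following is a minimal sketch of the two variations described above, not the reporter's code: the column names, the values, and the aggregation spec (`"sum"` and `"first"`) are all assumptions chosen for illustration. On pandas 0.17.0 the report says the second variation yields float64 for the date column; other versions may behave differently.

```python
import pandas as pd


def make_frame(dates):
    """Build a small frame with a group key, a numeric value and a datetime column."""
    return pd.DataFrame({
        "group": ["a", "a", "b", "b"],
        "value": [1.0, 2.0, 3.0, 4.0],
        "date": pd.to_datetime(dates),  # None entries become NaT
    })


# Variation 1: a single missing date; each group still has at least one valid date.
df_one_nat = make_frame(["2016-01-01", None, "2016-01-03", "2016-01-04"])

# Variation 2: every date in group "b" is NaT.
df_group_nat = make_frame(["2016-01-01", "2016-01-02", None, None])

# Hypothetical aggregation spec (the aggregations used in the report are unknown).
spec = {"value": "sum", "date": "first"}

agg_one_nat = df_one_nat.groupby("group").agg(spec)
agg_group_nat = df_group_nat.groupby("group").agg(spec)

# Compare the dtypes before and after aggregation for both variations.
print(df_one_nat.dtypes)
print(agg_one_nat.dtypes)
print(df_group_nat.dtypes)
print(agg_group_nat.dtypes)
```

Comparing the last two printouts shows whether the date column kept its datetime dtype in the installed pandas version.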
Expected Output
The expected output would be for the dtypes in the dataframe after aggregation to be the same as those in the original dataframe.
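One way to state this expectation as a check is sketched below. The helper name `changed_dtypes` is hypothetical, and the `after` frame simulates the reported symptom by hand rather than reproducing it through groupby.

```python
import pandas as pd


def changed_dtypes(before, after):
    """Return {column: (dtype_before, dtype_after)} for shared columns whose dtype differs."""
    shared = [c for c in after.columns if c in before.columns]
    return {
        c: (before[c].dtype, after[c].dtype)
        for c in shared
        if before[c].dtype != after[c].dtype
    }


before = pd.DataFrame({
    "value": [1.0, 2.0],
    "date": pd.to_datetime(["2016-01-01", None]),
})

# Simulated symptom (an assumption for illustration): the date column has become float64.
after = before.assign(date=[float("nan"), float("nan")])

# An empty dict would mean every shared column kept its dtype.
print(changed_dtypes(before, after))
```

Applied to the original and aggregated dataframes, the expected result would be an empty dict.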
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
pandas: 0.17.0
nose: 1.3.7
pip: 1.5.6
setuptools: 3.6
Cython: None
numpy: 1.10.1
scipy: None
statsmodels: None
IPython: 4.0.0
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: 2.3.0
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None