'datetime64[ns]' columns cause fillna function fail #3047

Closed
simomo opened this issue Mar 14, 2013 · 10 comments

simomo commented Mar 14, 2013

  • pd.version: 0.11.0.dev-156e03c
  • If a DataFrame has a column that has been recognized as 'datetime64[ns]' dtype, calling the DataFrame's fillna method raises an exception.
In [149]: df_t1 = DataFrame({'int': [1, 2, None], 'date_obj': [datetime.datetime(2012, 1, 1), datetime.datetime(2012, 1, 1), None], 'date_date':[datetime.datetime(2012, 1, 1), datetime.datetime(2012, 1, 1), datetime.datetime(2012, 1, 1)]})

In [150]: df_t1.dtypes
Out[150]:
date_date    datetime64[ns]
date_obj             object
int                 float64
dtype: object

In [151]: df_t1.fillna(np.nan)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-151-782ffd0df53e> in <module>()
----> 1 df_t1.fillna(np.nan)

/usr/local/lib/python2.6/dist-packages/pandas-0.11.0.dev_156e03c-py2.6-linux-x86_64.egg/pandas/core/frame.pyc in fillna(self, value, method, axis, inplace, limit)
   3385                 return result
   3386             else:
-> 3387                 new_data = self._data.fillna(value, inplace=inplace)
   3388 
   3389         if inplace:

/usr/local/lib/python2.6/dist-packages/pandas-0.11.0.dev_156e03c-py2.6-linux-x86_64.egg/pandas/core/internals.pyc in fillna(self, *args, **kwargs)
    944 
    945     def fillna(self, *args, **kwargs):
--> 946         return self.apply('fillna', *args, **kwargs)
    947 
    948     def astype(self, *args, **kwargs):

/usr/local/lib/python2.6/dist-packages/pandas-0.11.0.dev_156e03c-py2.6-linux-x86_64.egg/pandas/core/internals.pyc in apply(self, f, *args, **kwargs)
    915                 applied = f(blk, *args, **kwargs)
    916             else:
--> 917                 applied = getattr(blk,f)(*args, **kwargs)
    918 
    919             if isinstance(applied,list):

/usr/local/lib/python2.6/dist-packages/pandas-0.11.0.dev_156e03c-py2.6-linux-x86_64.egg/pandas/core/internals.pyc in fillna(self, value, inplace)
    212         new_values = self.values if inplace else self.values.copy()
    213         mask = com.isnull(new_values)
--> 214         np.putmask(new_values, mask, value)
    215 
    216         if inplace:

ValueError: Must be a datetime.date or datetime.datetime object

In [153]: del df_t1['date_date']

In [155]: df_t1.fillna(np.nan)
Out[155]:
date_obj    int
0    2012-01-01 00:00:00     1
1    2012-01-01 00:00:00     2
2    NaN    NaN
  • I also tried to convert the 'datetime64[ns]' column to object dtype, but it failed (the dtype is unchanged):
In [162]: df_t1.date_date = df_t1.date_date.astype(df_t1.date_obj.dtype)
In [164]: df_t1.dtypes
Out[164]:
date_date    datetime64[ns]
date_obj             object
int                 float64
dtype: object

jreback (Contributor) commented Mar 14, 2013

You do realize that df.fillna(np.nan) is a no-op?
(bug notwithstanding)
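
For context, a minimal sketch (assuming a recent pandas and Python 3, not the 0.11-dev build above) of why filling NaN with NaN leaves a numeric frame unchanged:

import numpy as np
import pandas as pd

df = pd.DataFrame({'int': [1.0, 2.0, np.nan]})
filled = df.fillna(np.nan)   # the "fill value" is itself missing...
print(filled.equals(df))     # ...so nothing changes: prints True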

simomo (Author) commented Mar 14, 2013

  • In some cases I get None objects in my columns, so it's necessary to convert those None values to np.nan:
In [169]: df_t1
Out[169]:
        date_date                    date_obj   int
0   2012-01-01 00:00:00  2012-01-01 00:00:00     1
1   2012-01-01 00:00:00  2012-01-01 00:00:00     2
2   2012-01-01 00:00:00  None                   NaN

jreback (Contributor) commented Mar 14, 2013

Construct your series like this

In [27]: x = pd.Series([datetime.datetime(2012, 1, 1), 
                 datetime.datetime(2012, 1, 1), None],dtype='M8[ns]')

In [28]: x
Out[28]: 
0   2012-01-01 00:00:00
1   2012-01-01 00:00:00
2                   NaT
dtype: datetime64[ns]

The issue when you are constructing a Series without specifying a dtype is that it is ambiguous what you want; we cannot automatically convert the missing sentinels (NaN/None) properly.

If you ONLY use NaT and datetimes/Timestamps, your construction will work:

In [4]: x = pd.Series([datetime.datetime(2012, 1, 1), datetime.datetime(2012, 1, 2), pd.NaT])

In [5]: x
Out[5]: 
0   2012-01-01 00:00:00
1   2012-01-02 00:00:00
2                   NaT
dtype: datetime64[ns]

We allow setting np.nan after a Series of dtype datetime64[ns] has been constructed;
see http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#datetimes-conversion
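
As a minimal sketch of that point (using current pandas syntax rather than the 0.11-dev session above): once the Series already has dtype datetime64[ns], assigning np.nan stores NaT.

import datetime
import numpy as np
import pandas as pd

x = pd.Series([datetime.datetime(2012, 1, 1),
               datetime.datetime(2012, 1, 2)], dtype='datetime64[ns]')
x.iloc[1] = np.nan   # accepted after construction; stored as NaT
print(x)             # dtype stays datetime64[ns]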

The following examples 'work', but leave you in a funny state, and as you have seen, fillna doesn't know what to do:

In [25]: x = pd.Series([datetime.datetime(2012, 1, 1), datetime.datetime(2012, 1, 1), np.nan])

In [26]: x
Out[26]: 
0    2012-01-01 00:00:00
1    2012-01-01 00:00:00
2                    NaN
dtype: object

In [29]: x = pd.Series([datetime.datetime(2012, 1, 1), datetime.datetime(2012, 1, 1), None])

In [30]: x
Out[30]: 
0    2012-01-01 00:00:00
1    2012-01-01 00:00:00
2                   None
dtype: object

michaelaye (Contributor) commented:

If this solves your problem, simomo, you can close this issue.

simomo (Author) commented Mar 15, 2013

  • Actually, I ran into this problem when working with a DataFrame that is loaded from a database table.
  • The code for loading the data from the table:
def grab_data(table_name, size_of_page=20000):
    '''
    Grab data from a db table

    size_of_page: the second argument of SQL's LIMIT clause
    '''
    cur.execute('select count(*) from %s' % table_name)
    records_number = cur.fetchone()[0]
    loop_number = records_number / size_of_page + 1
    print '****\nStart Grab %s\n****\nrecords_number: %s\nloop_number: %s' % (table_name, records_number, loop_number)

    start_position = 0
    df = DataFrame()  # WARNING: this dataframe object will contain all records of a table, so BE CAREFUL of the MEMORY USAGE!

    for i in range(0, loop_number):
        sql_export = 'select * from %s limit %s, %s' % (table_name, start_position, size_of_page)
        df = df.append(psql.read_frame(sql_export, conn), verify_integrity=False, ignore_index=True)

        start_position += size_of_page
        print 'start_position: %s' % start_position

    return df
  • As you can see, the DataFrame construction is handled by the pd.io.sql module.

jreback (Contributor) commented Mar 15, 2013

So after you read from SQL, just force the date columns, something like:

df['date'] = Series(df['date'].values,dtype='datetime64[ns]')

wesm (Member) commented Mar 15, 2013

I'd recommend using pandas.to_datetime(df['date'])
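
A minimal sketch of that suggestion (hypothetical 'date' column, current pandas syntax): pd.to_datetime turns an object column holding datetimes and None, as a SQL reader might return it, into datetime64[ns], with None becoming NaT.

import datetime
import pandas as pd

raw = pd.Series([datetime.datetime(2012, 1, 1),
                 datetime.datetime(2012, 1, 2), None], dtype=object)
df = pd.DataFrame({'date': raw})
df['date'] = pd.to_datetime(df['date'])
print(df['date'].dtype)   # datetime64[ns]; the None is now NaT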

simomo (Author) commented Mar 15, 2013

  • Yes, for now I'm using the code below to solve this problem. But that forces me to couple my code to the database table structure, which is not so Pythonic.
df['date'] = Series(df['date'].values,dtype='datetime64[ns]')
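
One hedged way around that coupling (not from the thread; the helper name is made up) is to attempt the conversion on every object-dtype column and keep the original column whenever it does not parse as datetimes:

import pandas as pd

def coerce_datetime_columns(df):
    """Try to convert each object-dtype column to datetime64[ns]; skip columns that fail."""
    for col in df.select_dtypes(include='object').columns:
        try:
            df[col] = pd.to_datetime(df[col])
        except (ValueError, TypeError):
            pass  # not datetime-like; leave the column untouched
    return df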

jreback (Contributor) commented Mar 15, 2013

Practicality beats purity. That said, I have pointed the SQL reader authors to this, so that it can be added. Dealing with types is non-trivial and does take some time. Thanks for the report. If this solves your issue, please close this.

simomo closed this as completed Mar 15, 2013

simomo (Author) commented Mar 20, 2013

I've pasted the final solution here:
http://stackoverflow.com/a/15502509/807695
