'datetime64[ns]' columns cause fillna function fail #3047

Closed
simomo opened this issue Mar 14, 2013 · 10 comments

simomo commented Mar 14, 2013

  • pd.version: 0.11.0.dev-156e03c
  • If a DataFrame has a column that has been recognized as 'datetime64[ns]' dtype, calling the DataFrame's fillna method raises an exception.
In [149]: df_t1 = DataFrame({'int': [1, 2, None], 'date_obj': [datetime.datetime(2012, 1, 1), datetime.datetime(2012, 1, 1), None], 'date_date':[datetime.datetime(2012, 1, 1), datetime.datetime(2012, 1, 1), datetime.datetime(2012, 1, 1)]})

In [150]: df_t1.dtypes
Out[150]:
date_date    datetime64[ns]
date_obj             object
int                 float64
dtype: object

In [151]: df_t1.fillna(np.nan)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-151-782ffd0df53e> in <module>()
----> 1 df_t1.fillna(np.nan)

/usr/local/lib/python2.6/dist-packages/pandas-0.11.0.dev_156e03c-py2.6-linux-x86_64.egg/pandas/core/frame.pyc in fillna(self, value, method, axis, inplace, limit)
   3385                 return result
   3386             else:
-> 3387                 new_data = self._data.fillna(value, inplace=inplace)
   3388 
   3389         if inplace:

/usr/local/lib/python2.6/dist-packages/pandas-0.11.0.dev_156e03c-py2.6-linux-x86_64.egg/pandas/core/internals.pyc in fillna(self, *args, **kwargs)
    944 
    945     def fillna(self, *args, **kwargs):
--> 946         return self.apply('fillna', *args, **kwargs)
    947 
    948     def astype(self, *args, **kwargs):

/usr/local/lib/python2.6/dist-packages/pandas-0.11.0.dev_156e03c-py2.6-linux-x86_64.egg/pandas/core/internals.pyc in apply(self, f, *args, **kwargs)
    915                 applied = f(blk, *args, **kwargs)
    916             else:
--> 917                 applied = getattr(blk,f)(*args, **kwargs)
    918 
    919             if isinstance(applied,list):

/usr/local/lib/python2.6/dist-packages/pandas-0.11.0.dev_156e03c-py2.6-linux-x86_64.egg/pandas/core/internals.pyc in fillna(self, value, inplace)
    212         new_values = self.values if inplace else self.values.copy()
    213         mask = com.isnull(new_values)
--> 214         np.putmask(new_values, mask, value)
    215 
    216         if inplace:

ValueError: Must be a datetime.date or datetime.datetime object

In [153]: del df_t1['date_date']

In [155]: df_t1.fillna(np.nan)
Out[155]:
date_obj    int
0    2012-01-01 00:00:00     1
1    2012-01-01 00:00:00     2
2    NaN    NaN
  • I also tried to convert the 'datetime64[ns]' column to object dtype, but it failed (the dtype is unchanged):
In [162]: df_t1.date_date = df_t1.date_date.astype(df_t1.date_obj.dtype)
In [164]: df_t1.dtypes
Out[164]:
date_date    datetime64[ns]
date_obj             object
int                 float64
dtype: object

jreback (Contributor) commented Mar 14, 2013

You do realize that df.fillna(np.nan) is a no-op?
(bug notwithstanding)
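
For context, a minimal sketch (assuming a recent pandas and Python 3, not the 0.11-dev build above) of why filling NaN with NaN leaves a numeric frame unchanged:

import numpy as np
import pandas as pd

df = pd.DataFrame({'int': [1.0, 2.0, np.nan]})
filled = df.fillna(np.nan)   # the "fill value" is itself missing...
print(filled.equals(df))     # ...so nothing changes: prints True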

simomo (Author) commented Mar 14, 2013

  • In some cases I get None objects in my columns, so it's necessary to convert those None values to np.nan:
In [169]: df_t1
Out[169]:
        date_date                    date_obj   int
0   2012-01-01 00:00:00  2012-01-01 00:00:00     1
1   2012-01-01 00:00:00  2012-01-01 00:00:00     2
2   2012-01-01 00:00:00  None                   NaN

jreback (Contributor) commented Mar 14, 2013

Construct your series like this

In [27]: x = pd.Series([datetime.datetime(2012, 1, 1), 
                 datetime.datetime(2012, 1, 1), None],dtype='M8[ns]')

In [28]: x
Out[28]: 
0   2012-01-01 00:00:00
1   2012-01-01 00:00:00
2                   NaT
dtype: datetime64[ns]

The issue when you are constructing a Series without specifying a dtype is that it is ambiguous what you want; we cannot automatically convert the missing sentinels (NaN/None) properly.

If you ONLY use NaT and datetimes/Timestamps, your construction will work:

In [4]: x = pd.Series([datetime.datetime(2012, 1, 1), datetime.datetime(2012, 1, 2), pd.NaT])

In [5]: x
Out[5]: 
0   2012-01-01 00:00:00
1   2012-01-02 00:00:00
2                   NaT
dtype: datetime64[ns]

We allow setting np.nan after a Series of dtype datetime64[ns] has been constructed;
see http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#datetimes-conversion
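
As a minimal sketch of that point (using current pandas syntax rather than the 0.11-dev session above): once the Series already has dtype datetime64[ns], assigning np.nan stores NaT.

import datetime
import numpy as np
import pandas as pd

x = pd.Series([datetime.datetime(2012, 1, 1),
               datetime.datetime(2012, 1, 2)], dtype='datetime64[ns]')
x.iloc[1] = np.nan   # accepted after construction; stored as NaT
print(x)             # dtype stays datetime64[ns]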

The following examples 'work', but leave you in a funny state, and as you have seen, fillna doesn't know what to do:

In [25]: x = pd.Series([datetime.datetime(2012, 1, 1), datetime.datetime(2012, 1, 1), np.nan])

In [26]: x
Out[26]: 
0    2012-01-01 00:00:00
1    2012-01-01 00:00:00
2                    NaN
dtype: object

In [29]: x = pd.Series([datetime.datetime(2012, 1, 1), datetime.datetime(2012, 1, 1), None])

In [30]: x
Out[30]: 
0    2012-01-01 00:00:00
1    2012-01-01 00:00:00
2                   None
dtype: object

michaelaye (Contributor) commented:

If this solves your problem, simomo, you can close this issue.

simomo (Author) commented Mar 15, 2013

  • Actually, I ran into this problem when working with a DataFrame that is loaded from a database table.
  • The code for loading the data from the table:
def grab_data(table_name, size_of_page=20000):
    '''
    Grab data from a db table

    size_of_page: the second argument of SQL's LIMIT clause
    '''
    cur.execute('select count(*) from %s' % table_name)
    records_number = cur.fetchone()[0]
    loop_number = records_number / size_of_page + 1
    print '****\nStart Grab %s\n****\nrecords_number: %s\nloop_number: %s' % (table_name, records_number, loop_number)

    start_position = 0
    df = DataFrame()  # WARNING: this dataframe object will contain all records of a table, so BE CAREFUL of the MEMORY USAGE!

    for i in range(0, loop_number):
        sql_export = 'select * from %s limit %s, %s' % (table_name, start_position, size_of_page)
        df = df.append(psql.read_frame(sql_export, conn), verify_integrity=False, ignore_index=True)

        start_position += size_of_page
        print 'start_position: %s' % start_position

    return df
  • As you can see, the DataFrame construction is handled by the pd.io.sql module.

jreback (Contributor) commented Mar 15, 2013

So after you read from SQL, just force the date columns, something like:

df['date'] = Series(df['date'].values,dtype='datetime64[ns]')

wesm (Member) commented Mar 15, 2013

I'd recommend using pandas.to_datetime(df['date'])
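
A minimal sketch of that suggestion (hypothetical 'date' column, current pandas syntax): pd.to_datetime turns an object column holding datetimes and None, as a SQL reader might return it, into datetime64[ns], with None becoming NaT.

import datetime
import pandas as pd

raw = pd.Series([datetime.datetime(2012, 1, 1),
                 datetime.datetime(2012, 1, 2), None], dtype=object)
df = pd.DataFrame({'date': raw})
df['date'] = pd.to_datetime(df['date'])
print(df['date'].dtype)   # datetime64[ns]; the None is now NaT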

simomo (Author) commented Mar 15, 2013

  • Yes, for now I'm using the code below to solve this problem. But that forces me to couple my code to the database table structure, which is not so Pythonic.
df['date'] = Series(df['date'].values,dtype='datetime64[ns]')
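
One hedged way around that coupling (not from the thread; the helper name is made up) is to attempt the conversion on every object-dtype column and keep the original column whenever it does not parse as datetimes:

import pandas as pd

def coerce_datetime_columns(df):
    """Try to convert each object-dtype column to datetime64[ns]; skip columns that fail."""
    for col in df.select_dtypes(include='object').columns:
        try:
            df[col] = pd.to_datetime(df[col])
        except (ValueError, TypeError):
            pass  # not datetime-like; leave the column untouched
    return df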

jreback (Contributor) commented Mar 15, 2013

Practicality beats purity. That said, I have pointed the SQL reader authors to this, so that it can be added. Dealing with types is non-trivial and does take some time. Thanks for the report. If this solves your issue, please close this.

simomo closed this as completed Mar 15, 2013

simomo (Author) commented Mar 20, 2013

I've pasted the final solution here:
http://stackoverflow.com/a/15502509/807695
