
ENH: sql support for NaN/NaT conversions #2754

Closed
jreback opened this issue Jan 25, 2013 · 13 comments · Fixed by #8208
Labels: Bug, Dtype Conversions, IO Data, IO SQL

Comments

@jreback (Contributor) commented Jan 25, 2013

UPDATE from @jorisvandenbossche:

Overview of current status writing nan values (see also tests added in #7100):

  • For MySQL using pymysql/MySQLdb nothing works: you get the error Unknown column 'nan' in 'field list' (see also e.g. http://stackoverflow.com/questions/23353732/python-pandas-write-to-sql-with-nan-values)
  • Numeric columns:
    • working for sqlite and postgresql
    • only all-NaN columns come back as None in sqlite
  • Object columns (e.g. strings):
    • for postgresql: NaN is converted to the string u'NaN', which is not really what we want
    • for sqlite it is returned as None
  • NaT:
    • postgresql: gives an error on inserting "0001-255-255T00:00:00"
    • sqlite3: writing works, but reading it back with a query returns '-001--1--1 -1:-1:-1.-00001'
  • MSSQL: not working, with the message "The supplied value is not a valid instance of data type float"; see Write Null values to mssql table #8088 for more details

not sure exactly what SQL expects (presumably None) rather than np.nan (or NaT)

https://groups.google.com/forum/?fromgroups#!topic/pydata/lxhnFtuzvWQ

also: return pandas datetime64[ns] instead of datetime/date types (#3532)
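A common workaround until to_sql handles this natively (a sketch in plain pandas, not part of this issue's eventual fix): cast to object dtype and swap NaN/NaT for a real Python None, which DB-API drivers translate to SQL NULL on insert.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": pd.to_datetime(["2013-01-01", None, "2013-01-03"]),
})

# Cast to object so a cell can hold a real Python None,
# then replace anything pandas considers missing.
cleaned = df.astype(object)
for col in cleaned.columns:
    cleaned.loc[cleaned[col].isna(), col] = None
```

The cleaned frame's rows can then be handed to any DB-API driver, which binds None as NULL.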

@danielballan (Contributor)

Yes, I believe it expects Nones. I will post some code this weekend.

@danielballan (Contributor)

I may be in over my head here, actually, because performance is crucial. Would it be helpful to consider #2717?

@jreback (Contributor, Author) commented Mar 7, 2013

this is actually tricky, you might need to do some cython to get this fast...

e.g. in pytables I had to pass a numpy rec-array because that is how it likes it

how are you passing the data: as an ndarray of dtype=object? a rec-array? a list?

@jreback (Contributor, Author) commented Mar 7, 2013

try looking at this:

pandas/io/pytables.py, AppendableTable.write_data (write a test table and break there)

it uses this to write basically a numpy rec-array

lib.pyx/create_hdf_rows_2d

pretty sure you could do something similar
(essentially you are creating a tuple of the data, pulling it out of the frame, and then replacing missing data with whatever you need)

this is a bit more complicated, because I use a mask to avoid writing completely empty rows (which is much more common in say writing a panel)
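That tuple-of-the-data idea, sketched for the SQL case (hypothetical, not the eventual implementation): pull each row out of the frame as a tuple and substitute None for anything missing before handing it to the driver.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan], "y": ["a", None]})

# One plain tuple per row, with anything pandas considers missing
# replaced by None (which DB-API drivers turn into SQL NULL).
rows = [
    tuple(None if pd.isna(v) else v for v in row)
    for row in df.itertuples(index=False, name=None)
]
```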

@jreback (Contributor, Author) commented Mar 15, 2013

see #3047 for a strategy for how you should deal with conversion to datetime64[ns] to make sure that you are returning NaT for missing values

(this might actually be reasonably efficient too)

@jreback (Contributor, Author) commented Mar 15, 2013

you can also do this if you have the array directly

wesm commented 10 minutes ago
I'd recommend using pandas.to_datetime(df['date'])
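For context, in modern pandas to_datetime with errors="coerce" returns NaT for missing or unparseable values, which matches the strategy discussed in #3047 (a sketch, not the code in this issue):

```python
import pandas as pd

s = pd.Series(["2013-01-01", None, "not-a-date"])

# Missing and unparseable entries both come back as NaT
# instead of raising.
dt = pd.to_datetime(s, errors="coerce")
```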

@jorisvandenbossche (Member)

Actually I think we should try to fix this before 0.14.
Basic NaN handling is really essential.

@jorisvandenbossche jorisvandenbossche changed the title ENH: sql to provided NaN/NaT conversions ENH: sql support for NaN/NaT conversions May 13, 2014

@jreback (Contributor, Author) commented May 13, 2014

in HDF I do a translation of NaN to a string that can be user specified (as I can't store the NaN directly)

I think that is a reasonable solution in case it can't be natively stored (or maybe always?)

http://pandas-docs.github.io/pandas-docs-travis/io.html#string-columns (see nan_rep)
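The nan_rep idea, translated to plain pandas (a sketch; the sentinel string is arbitrary and user-specified, as with HDF's nan_rep):

```python
import numpy as np
import pandas as pd

nan_rep = "nan"  # user-specified sentinel, standing in for NaN in the store

s = pd.Series(["a", np.nan, "c"])
stored = s.fillna(nan_rep)                   # what would be written out
restored = stored.replace(nan_rep, np.nan)   # round-trip back to NaN on read
```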

@jorisvandenbossche (Member)

But in theory it should be possible to store it natively in SQL, I think, as there is the NULL value.
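Indeed, DB-API drivers bind a Python None as SQL NULL, as a quick sqlite3 round-trip shows (a minimal sketch using only the standard library):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a REAL)")

# Python None is bound as SQL NULL by the DB-API driver.
conn.execute("INSERT INTO t (a) VALUES (?)", (None,))

# The NULL round-trips back as Python None.
value = conn.execute("SELECT a FROM t").fetchone()[0]
```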

@jreback (Contributor, Author) commented May 13, 2014

There are 2 approaches you can use here:

This is what I do in sqlalchemy (I am using a custom date-time type; not Timestamp, but the same idea):
it's easy enough to convert Timestamps to datetimes directly and convert NaT to None

import sqlalchemy as sql

class DateTime(sql.types.TypeDecorator):
    """Convert to/from the DateTime type."""

    impl = sql.types.DateTime

    def process_bind_param(self, value, dialect):
        # on the way in: convert to a naive datetime for the db (None passes through)
        f = getattr(value, 'as_naive_datetime_for_db', None)
        if f is not None:
            return f()
        return value

    def process_result_value(self, value, dialect):
        # on the way out: NULL -> None, else wrap in the project's date type
        if value is None:
            return None
        return dates.Date(value)

alternatively, you can vectorize the conversion

In [23]: df = DataFrame(dict(A = Timestamp('20130101')),index=range(3))

In [26]: df.iloc[1] = np.nan

In [27]: df
Out[27]: 
           A
0 2013-01-01
1        NaT
2 2013-01-01

[3 rows x 1 columns]

In [35]: s = df['A'].astype(object).values

In [36]: s
Out[36]: 
array([datetime.datetime(2013, 1, 1, 0, 0), nan,
       datetime.datetime(2013, 1, 1, 0, 0)], dtype=object)

In [38]: s[isnull(s)] = None

In [39]: s
Out[39]: 
array([datetime.datetime(2013, 1, 1, 0, 0), None,
       datetime.datetime(2013, 1, 1, 0, 0)], dtype=object)

@jreback (Contributor, Author) commented May 15, 2014

@jorisvandenbossche ok...move to 0.14.1?

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.14.1, 0.14.0 May 15, 2014
@jorisvandenbossche (Member)

yep

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.15.0, 0.14.1 Jul 1, 2014