
BUG: error in handling a sqlalchemy type with arguments (instantiated type, not class) #9083

Closed
pilotstew opened this issue Dec 15, 2014 · 13 comments · Fixed by #9138
Labels
Bug IO SQL to_sql, read_sql, read_sql_query
Milestone

Comments

@pilotstew

I'm trying to use DataFrame.to_sql to write a timezone-aware datetime column. Here is an example of my code:

times = ['201412120154', '201412110254']

df = pd.DataFrame()
df['time'] = pd.to_datetime(times, utc=True)

df.time.to_sql('test', engine, dtype={'time': sqlalchemy.TIMESTAMP(timezone=True)})

The error I receive is:

TypeError: issubclass() arg 1 must be a class

The following code works, but obviously results in a PostgreSQL column that is not timezone aware:

times = ['201412120154', '201412110254']

df = pd.DataFrame()
df['time'] = pd.to_datetime(times, utc=True)

df.time.to_sql('test', engine, dtype={'time': sqlalchemy.TIMESTAMP})

I'm using Python 2.7, pandas 0.15.2, PostgreSQL 9.3, and SQLAlchemy 0.9.7. The same issue occurs with sqlalchemy.DATETIME(timezone=True) vs sqlalchemy.DATETIME.

Full Traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-161-ec79a553e6d0> in <module>()
----> 1 df.time.to_sql('test', w.engine, dtype={'time': sqlalchemy.TIMESTAMP(timezone=True)})

/home/stew/.pyenv/versions/p276/lib/python2.7/site-packages/pandas/core/generic.pyc in to_sql(self, name, con, flavor, schema, if_exists, index, index_label, chunksize, dtype)
    964             self, name, con, flavor=flavor, schema=schema, if_exists=if_exists,   
    965             index=index, index_label=index_label, chunksize=chunksize,
--> 966             dtype=dtype)
    967 
    968     def to_pickle(self, path):

/home/stew/.pyenv/versions/p276/lib/python2.7/site-packages/pandas/io/sql.pyc in to_sql(frame, name, con, flavor, schema, if_exists, index, index_label, chunksize, dtype)
    536     pandas_sql.to_sql(frame, name, if_exists=if_exists, index=index,
    537                       index_label=index_label, schema=schema,
--> 538                       chunksize=chunksize, dtype=dtype)
    539 
    540 

/home/stew/.pyenv/versions/p276/lib/python2.7/site-packages/pandas/io/sql.pyc in to_sql(self, frame, name, if_exists, index, index_label, schema, chunksize, dtype)
   1162             import sqlalchemy.sql.type_api as type_api
   1163             for col, my_type in dtype.items():
-> 1164                 if not issubclass(my_type, type_api.TypeEngine):
   1165                     raise ValueError('The type of %s is not a SQLAlchemy '
   1166                                      'type ' % col)

TypeError: issubclass() arg 1 must be a class
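For context (not from the thread): issubclass() raises this TypeError whenever its first argument is an instance rather than a class, which is exactly what happens when an instantiated type like TIMESTAMP(timezone=True) is passed through the old check. A minimal stand-alone sketch, using hypothetical Base/Sub classes in place of SQLAlchemy's TypeEngine hierarchy:

```python
class Base:
    pass

class Sub(Base):
    pass

# issubclass works for classes...
assert issubclass(Sub, Base)

# ...but raises TypeError for instances
raised = False
try:
    issubclass(Sub(), Base)  # instance, not class -> TypeError
except TypeError:
    raised = True
assert raised

def is_valid_type(t):
    # a check that tolerates both a class and an instance of it
    return isinstance(t, Base) or (isinstance(t, type) and issubclass(t, Base))

assert is_valid_type(Sub)
assert is_valid_type(Sub())
```

The monkey-patch later in the thread uses SQLAlchemy's to_instance helper for the same purpose: it normalizes a class to an instance so a single isinstance check suffices.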
@jorisvandenbossche
Member

@pilotstew Thanks for the report!

There appear to be a few problems:

  • first, the fact that the dtype argument does not work is an oversight. This feature was recently introduced (ENH: dtype costumization on to_sql (GH8778) #8926) and was only tested with sqlalchemy type classes (such as sqlalchemy.TIMESTAMP), not with instantiated types carrying arguments (such as sqlalchemy.TIMESTAMP(timezone=True)). So this should be fixed.
  • second, I think this should work out of the box. There is even code for this (https://github.com/pydata/pandas/blob/v0.15.2/pandas/io/sql.py#L915), but it is not triggered: when a column has a timezone, pandas currently treats it as an object column of Timestamp objects (written as TEXT), because the datetime64 dtype does not support timezones.

@jorisvandenbossche jorisvandenbossche added the IO SQL to_sql, read_sql, read_sql_query label Dec 15, 2014
@jorisvandenbossche jorisvandenbossche added this to the 0.16.0 milestone Dec 15, 2014
@jorisvandenbossche
Member

@pilotstew You can solve it for now using this monkey-patch:

import warnings

from pandas.io.sql import SQLTable

def to_sql(self, frame, name, if_exists='fail', index=True,
           index_label=None, schema=None, chunksize=None, dtype=None):
    """
    patched version of https://github.com/pydata/pandas/blob/v0.15.2/pandas/io/sql.py#L1129
    """
    if dtype is not None:
        from sqlalchemy.types import to_instance, TypeEngine
        for col, my_type in dtype.items():
            if not isinstance(to_instance(my_type), TypeEngine):
                raise ValueError('The type of %s is not a SQLAlchemy '
                                 'type ' % col)

    table = SQLTable(name, self, frame=frame, index=index,
                     if_exists=if_exists, index_label=index_label,
                     schema=schema, dtype=dtype)
    table.create()
    table.insert(chunksize)
    # check for potentially case sensitivity issues (GH7815)
    if name not in self.engine.table_names(schema=schema or self.meta.schema):
        warnings.warn("The provided table name '{0}' is not found exactly "
                      "as such in the database after writing the table, "
                      "possibly due to case sensitivity issues. Consider "
                      "using lower case table names.".format(name), UserWarning)

pd.io.sql.SQLDatabase.to_sql = to_sql

Can you try that? It should let you do df.time.to_sql('test', engine, dtype={'time': sqlalchemy.TIMESTAMP(timezone=True)})

@pilotstew
Author

Worked perfectly!

@jorisvandenbossche
Member

@pilotstew Good to hear!
Did you also get the correct time zone in the database? (or are you in the same timezone as your data?) see #9086

@pilotstew
Author

Actually, no, I didn't. In pandas all timestamps were UTC, but they were converted to the local timezone in postgres. Since they are all uniform and converted correctly it's not a huge issue; I can convert back to UTC on read_sql.

@jorisvandenbossche
Member

OK, yes, that is what I expected and described in #9086. So this is another enhancement needed for 'real' timezone support!

@jorisvandenbossche
Member

I opened a couple of other issues that I encountered while looking into this:

Let's keep this issue for the actual dtype problem of accepting only classes, not instances.

@jorisvandenbossche jorisvandenbossche changed the title Error in handling a sqlalchemy type with timezone=True BUG: error in handling a sqlalchemy type with arguments (instantiated type, not class) Dec 15, 2014
@al626

al626 commented Jun 14, 2016

Hi, the same code as above is causing errors again, albeit different ones:

import sqlalchemy
import pandas as pd

eng = sqlalchemy.create_engine("postgresql+psycopg2://******:********@localhost:5432/postgres")
times = ['201412120154', '201412110254']
df = pd.DataFrame()
df['time'] = pd.to_datetime(times, utc=True)
df.time.to_sql('test', eng, dtype={'time': sqlalchemy.TIMESTAMP}, if_exists='append')

With the traceback:

Traceback (most recent call last):
  File "./test.py", line 10, in <module>
    df.time.to_sql('test', eng, dtype={'time': sqlalchemy.TIMESTAMP}, if_exists='append')
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1165, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 571, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 1250, in to_sql
    table.insert(chunksize)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 748, in insert
    keys, data_list = self.insert_data()
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 729, in insert_data
    d = b.values.astype('M8[us]').astype(object)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/index.py", line 842, in astype
    raise ValueError('Cannot cast DatetimeIndex to dtype %s' % dtype)
ValueError: Cannot cast DatetimeIndex to dtype datetime64[us]

I'm using pandas version 0.18.1 and sqlalchemy version 1.0.13.

It looks like pandas is trying to cast the datetimes to microsecond resolution ('M8[us]'), but in pandas/tseries/index.py a DatetimeIndex can only be cast to object, string, integer, or nanosecond-resolution dtypes.

Cheers
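One way to sidestep this cast error at the time was to strip the timezone before writing, keeping the values in UTC wall time so the column is a plain datetime64[ns]. A sketch (not from the thread; the column names are taken from the example above, and the explicit format argument is added so newer pandas parses the strings too):

```python
import pandas as pd

times = ['201412120154', '201412110254']
df = pd.DataFrame()
# parse as UTC-aware timestamps
df['time'] = pd.to_datetime(times, utc=True, format='%Y%m%d%H%M')

# drop the tz info; values stay in UTC wall time, dtype becomes plain datetime64[ns]
df['naive'] = df['time'].dt.tz_localize(None)
```

A tz-naive column like df['naive'] can then be written with to_sql without hitting the 'M8[us]' cast, at the cost of losing the timezone annotation in the database.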

@al626

al626 commented Jun 14, 2016

Also not working in the latest development build of pandas (0.18.1+121.g62b4327).

Traceback is similar:

Traceback (most recent call last):
  File "./test.py", line 10, in <module>
    df.time.to_sql('test', eng, dtype={'time': sqlalchemy.TIMESTAMP}, if_exists='append')
  File "/home/arun/pandas-env/local/lib/python2.7/site-packages/pandas/core/generic.py", line 1166, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/home/arun/pandas-env/local/lib/python2.7/site-packages/pandas/io/sql.py", line 571, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/home/arun/pandas-env/local/lib/python2.7/site-packages/pandas/io/sql.py", line 1250, in to_sql
    table.insert(chunksize)
  File "/home/arun/pandas-env/local/lib/python2.7/site-packages/pandas/io/sql.py", line 748, in insert
    keys, data_list = self.insert_data()
  File "/home/arun/pandas-env/local/lib/python2.7/site-packages/pandas/io/sql.py", line 729, in insert_data
    d = b.values.astype('M8[us]').astype(object)
  File "/home/arun/pandas-env/local/lib/python2.7/site-packages/pandas/tseries/index.py", line 847, in astype
    raise ValueError('Cannot cast DatetimeIndex to dtype %s' % dtype)
ValueError: Cannot cast DatetimeIndex to dtype datetime64[us]

@jorisvandenbossche
Member

@al626 This is indeed not yet working (but it is due to the timezone-aware datetime column, not to the dtype argument that this issue was originally about).
I thought there was already an issue about it, but can't directly find it.

@jorisvandenbossche
Member

There is the general issue #9086 about time zones not being supported, but that issue is not really up to date (the situation has changed now that timezone-aware datetime64 columns are possible).
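A quick illustration of that change (behavior as of recent pandas versions, not from the thread): a timezone-aware column now gets a dedicated extension dtype instead of falling back to object.

```python
import pandas as pd

# in recent pandas, a tz-aware column has a first-class dtype
s = pd.Series(pd.to_datetime(['201412120154'], utc=True, format='%Y%m%d%H%M'))
print(s.dtype)  # datetime64[ns, UTC] rather than object
```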

@al626

al626 commented Jun 14, 2016

Thanks, I'll keep track of that

@suntracks

Is there any update on writing timestamp data with timezone info?
