
TEST: add basic postgresql tests #6316

Merged

Conversation

jorisvandenbossche
Member

@mangecoeur I just copied and adapted the MySQL tests and provided some SQL_STRINGS that follow the PostgreSQL dialect. Is this what you had in mind for how other SQL dialects would be added to the tests?

@jreback I now added it to requirements-2.7.txt. Should I add it to all test environments? Or if only one, is this the correct one?

@mangecoeur Is it possible that pymysql is missing from the requirements files needed to run the MySQL tests (they are all skipped on Travis)?

At the moment, there are two tests failing with postgresql (see https://travis-ci.org/jorisvandenbossche/pandas/jobs/18576746).

@jreback
Contributor

jreback commented Feb 10, 2014

you should probably add it to the 3.3 build too

@jorisvandenbossche
Member Author

The first failure is due to checking the type of the boolean column retrieved from the database. The test checks whether it is integer, while in PostgreSQL the returned type is just bool (https://github.com/pydata/pandas/blob/master/pandas/io/tests/test_sql.py#L501). I suppose it is checked as integer for SQLite, which has no bool type.

So, should I override this test in the PostgreSQL test class? But this will also be a problem for MySQL.

@mangecoeur
Contributor

Yes, it checks integer because SQLite has no bool, but it should check for a boolean type for the others. If it's wrong for the MySQL case, you should fix it for both MySQL and Postgres.

Be careful, because boolean columns with Nulls should load as object rather than boolean: pandas can't store None in a boolean column otherwise (if you coerce to boolean, None/NA gets converted to False). I think that's also tested, but it's good to make sure it works in Postgres too.

MySQL needs a pymysql requirement for Travis; I think I just never got around to adding it. I made it pymysql-specific because it's pure Python (unlike other drivers) and so easier to install; also, I had trouble pip-installing the official mysql-connector driver.


@jorisvandenbossche
Member Author

The second failure is due to reading in the iris table:

(Pdb) sql.read_table("iris", self.conn).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 150 entries, 0 to 149
Data columns (total 5 columns):
SepalLength    150 non-null object
SepalWidth     150 non-null object
PetalLength    150 non-null object
PetalWidth     150 non-null object
Name           150 non-null object

which are all object dtype (while the first four columns should be float).
However, if I read it with read_sql, it seems correct:

(Pdb) sql.read_sql("SELECT * from iris", self.conn).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 150 entries, 0 to 149
Data columns (total 5 columns):
SepalLength    150 non-null float64
SepalWidth     150 non-null float64
PetalLength    150 non-null float64
PetalWidth     150 non-null float64
Name           150 non-null object
dtypes: float64(4), object(1)

@mangecoeur
Contributor

@jorisvandenbossche weird. The read_sql version is probably correct due to pandas correctly guessing the datatype. For some reason the explicit type conversion is failing. I don't have time to work on this now, but the place to debug is the _harmonize_columns function; we need to check what the reflected column types are vs what types we're trying to convert to.
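
One quick way to see what SQLAlchemy reflects for the iris table (a minimal sketch; the connection details are those of the test database used elsewhere in this thread):

from sqlalchemy import create_engine, inspect

engine = create_engine("postgresql+psycopg2://postgres@localhost/pandas_nosetest")
insp = inspect(engine)
for col in insp.get_columns("iris"):
    # prints e.g. "SepalLength NUMERIC" -- the reflected type pandas has to convert
    print(col["name"], col["type"])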

@jorisvandenbossche
Member Author

@mangecoeur Thanks for the pointer. I will have a look at it later. If I change the type from NUMERIC to DOUBLE PRECISION in the create table command for the PostgreSQL iris table, then the test runs.

@mangecoeur
Contributor

OK, then I know the issue. We check for isinstance(col, Float), but SQLAlchemy returns a column type of Numeric for NUMERIC columns, so the check fails. Add something like or isinstance(…, Numeric) with the appropriate import from sqlalchemy and you are good to go.
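
For reference, a minimal sketch of that check (illustrative names, not the exact pandas code); note that Float is a subclass of Numeric in SQLAlchemy, so testing against Numeric also covers NUMERIC/DECIMAL columns:

from sqlalchemy.types import Float, Numeric

def _reflected_type_is_float(col_type):
    # Float subclasses Numeric, so this single check covers both
    # FLOAT/DOUBLE PRECISION and NUMERIC/DECIMAL reflected column types.
    return isinstance(col_type, (Float, Numeric))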


@jorisvandenbossche
Member Author

I added pymysql for the MySQL test, and there are also two MySQL tests failing.

The PostgreSQL test that is failing is due to the boolean column with None, which is converted to object type and not float (the None is also not converted to NaN).

@mangecoeur
Contributor

@jorisvandenbossche The behaviour you describe is correct: Boolean with None should become Object, see http://pandas.pydata.org/pandas-docs/dev/missing_data.html#missing-data-casting-rules-and-indexing

However, I can see that the tests are confusing: when testing with SQLite we look for Float, not Object. This is because we can't actually know that a column in SQLite is supposed to be treated as boolean unless we already knew that beforehand. Without extra info we have to treat it as numeric, so an integer column with Nulls gets cast to Float, and it's up to the user to know that their data represent boolean values. However, for the Postgres/MySQL case we should check for Object, since we know that the column is supposed to be Boolean. Feel free to update the tests accordingly.
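
A quick illustration of that casting rule in plain pandas (independent of the SQL code):

import pandas as pd

s = pd.Series([True, False, None])
# object dtype: the None is kept as a missing value instead of being coerced to False
print(s.dtype)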

@jreback
Contributor

jreback commented Feb 10, 2014

you guys can do some heuristic stuff that is not that expensive - something like

check the first few values; if they are 0/1 then it could be bool

then in a try/except you can astype to bool and compare for equality to the original

see core/common/_possibly_downcast
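
Something along these lines (a rough sketch of the suggested heuristic, not the actual pandas helper):

import numpy as np

def maybe_downcast_to_bool(values):
    # Cheap check on the first few values: only consider a downcast if they
    # all look like 0/1 (NaN fails this check, so such columns are left alone).
    head = values[:10]
    if not np.all(np.isin(head, [0, 1])):
        return values
    try:
        as_bool = values.astype(bool)
        # Accept the downcast only if round-tripping reproduces the original data.
        if np.array_equal(as_bool.astype(values.dtype), values):
            return as_bool
    except (TypeError, ValueError):
        pass
    return values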

@mangecoeur
Contributor

@jreback I think that's dangerous, since you could potentially turn random integer columns into bool if they happen to contain only 0s and 1s (and maybe Nulls). Data coming out of an SQLite DB just looks like a column of numbers with an arbitrary name; there is not much to hint at what it's supposed to represent.
People have some very strange data sets, so I think it's better to let them decide whether to convert their columns to boolean or not.

We could consider adding an optional coerce-boolean argument that would work like parse_dates and force columns to be loaded as boolean. That would also be handy for other DBs if you have some ugly data where booleans have been stored as the wrong data type (I've seen them stored as ints, floats, and chars; never assume whoever created the dataset knew what they were doing!).
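
In the meantime a user can do that coercion by hand after reading; a minimal sketch, assuming booleans that were stored as 0/1 floats with NULLs:

import numpy as np
import pandas as pd

df = pd.DataFrame({"flag": [1.0, 0.0, np.nan]})   # booleans stored as floats in the DB
# NaN is not in the mapping, so it stays missing and the column ends up as object dtype
df["flag"] = df["flag"].map({1.0: True, 0.0: False})
print(df["flag"])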

@jreback
Contributor

jreback commented Feb 10, 2014

I think adding a dtype= arg would be fine then; same interface as read_csv
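
For reference, the read_csv interface being referred to, which supports per-column dtype overrides:

import io
import pandas as pd

csv = io.StringIO("BoolCol,IntCol\n1,5\n0,6\n")
# force BoolCol to be read as float64 instead of the inferred int64
df = pd.read_csv(csv, dtype={"BoolCol": "float64"})
print(df.dtypes)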

@jorisvandenbossche
Member Author

  • OK, I updated the tests regarding the check for object vs float for the bool column with None (it was indeed the described behaviour). However, shouldn't the None be converted to NaN, as is done for the integer column?
  • a dtype arg seems a good idea
  • there are still two MySQL tests failing, but at the moment I can't test what is going on there.

@jorisvandenbossche
Member Author

@mangecoeur Would you have time to look at the MySQL failures?

It's failing in the test_default_type_convertion test. I get the following for the dataframe:

(Pdb) df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 8 columns):
TextCol            2 non-null object
DateCol            2 non-null datetime64[ns]
IntDateCol         2 non-null int64
FloatCol           2 non-null object
IntCol             2 non-null int64
BoolCol            2 non-null int64
IntColWithNull     1 non-null float64
BoolColWithNull    1 non-null float64
dtypes: datetime64[ns](1), float64(2), int64(3), object(2)

So the things that are wrong: the float column is read as object, bool as int, and BoolColWithNull as float instead of object.

test_read_table is also failing, but this is also due to a float column being read as an object column.

TEST: add postgresql to travis
One base class for tests with the sqlalchemy backend, so the test classes
for mysql and postgresql don't have to override tests that are
different for sqlite.
@jorisvandenbossche
Member Author

OK, I removed the activation of the MySQL tests (just this commit: jorisvandenbossche@036a3d4) from this PR and left only the PostgreSQL tests and a little refactoring, so this can be merged.

MySQL can be fixed in another PR.

@jorisvandenbossche
Member Author

Travis is happy now! (without the MySQL tests)
@mangecoeur Are you OK with my changes to the test classes (my second commit)?

@jreback
Contributor

jreback commented Feb 13, 2014

@jorisvandenbossche think we ought to add pymysql, psycopg2, etc. to print_versions? (even though they are only test deps, in a failure report you want to know)

@jorisvandenbossche
Member Author

@jreback added them: 7b0317e

@jreback
Contributor

jreback commented Feb 13, 2014

@jorisvandenbossche looks good..

@mangecoeur
Contributor

Updates to the tests look good. I'm doing some hardcore dogfooding with this and Postgres right now, so I'm finding a number of issues. I just added support for datetimes with timezone info in my private branch (since they are supported in Postgres). I need to think about what the sane thing to do is when saving to a DB without timezone support.

@jorisvandenbossche
Member Author

OK, then I am going to merge this one.

@mangecoeur Can you point to the branch with your datetime support? Then I can try out whether it solves the problems I reported (or you could already open a PR; no problem if it is still a work in progress).

jorisvandenbossche added a commit that referenced this pull request Feb 13, 2014
@jorisvandenbossche merged commit 5f17e1a into pandas-dev:master Feb 13, 2014
@jreback
Contributor

jreback commented Feb 13, 2014

would you guys like me to add the postgres test stuff to my windows builds? don't have to... but could

I just set up a postgres server, right? with any particular database setup?

@jorisvandenbossche deleted the sql-postgresql-tests branch February 14, 2014 07:54
@jorisvandenbossche
Member Author

@jreback If you can, that would certainly be welcome, I think.
You just have to add (or maybe it is already there by default) a localhost server with username postgres and no password, and then create a database called pandas_nosetest. And that should be all.
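
A minimal smoke test for that setup (connection details as described above; the exact SQLAlchemy call style may differ between versions):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://postgres@localhost/pandas_nosetest")
with engine.connect() as conn:
    # prints 1 if the pandas_nosetest database is reachable with these credentials
    print(conn.execute(text("SELECT 1")).scalar())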

@jreback
Contributor

jreback commented Feb 14, 2014

ok, running on 27-32, 27-64, 33-32, 33-64
this is just the print from 27-32

(mix of sqlalchemy 0.8.1 and 0.9.2, all the same psycopg2)
MySQL should still skip, yes?

INSTALLED VERSIONS
------------------
commit: cab2a93ec5c8ad0790cf1cb4fc3df747b393ee0f
python: 2.7.5.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.12.0
Cython: 0.19.2
numpy: 1.7.1
scipy: None
statsmodels: None
IPython: 1.1.0
sphinx: None
patsy: 0.2.1
scikits.timeseries: 0.91.3
dateutil: 2.2
pytz: 2013.8
bottleneck: 0.7.0
tables: 3.0.0
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: 0.5.0
lxml: None
bs4: 4.3.2
html5lib: 1.0b3
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.8.3
pymysql: None
psycopg2: 2.5.2 (dt dec pq3 ext)

C:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win32-2.7>type out
test_create_and_drop_table (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_execute_sql (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_invalid_flavor (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_read_sql (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_roundtrip (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_to_sql (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_to_sql_append (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_to_sql_fail (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_to_sql_replace (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_tquery (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_create_table (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_date_parsing (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_default_date_load (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_default_type_convertion (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_drop_table (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_execute_sql (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_read_sql (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_read_table (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_read_table_absent (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_read_table_columns (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_roundtrip (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_to_sql (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_to_sql_append (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_to_sql_fail (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_to_sql_replace (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_create_table (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_date_parsing (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_default_date_load (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_default_type_convertion (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_drop_table (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_execute_sql (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_read_sql (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_read_table (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_read_table_absent (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_read_table_columns (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_roundtrip (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_to_sql (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_to_sql_append (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_to_sql_fail (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_to_sql_replace (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_create_table (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_date_parsing (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_default_date_load (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_default_type_convertion (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_drop_table (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_execute_sql (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_read_sql (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_read_table (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_read_table_absent (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_read_table_columns (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_roundtrip (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_to_sql (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_to_sql_append (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_to_sql_fail (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_to_sql_replace (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
Test case where same column appears in parse_date and index_col ... ok
Test date parsing in read_sql ... ok
test_execute_sql (pandas.io.tests.test_sql.TestSQLApi) ... ok
Test legacy name read_frame ... ok
Test legacy write frame name. ... ok
test_read_sql_iris (pandas.io.tests.test_sql.TestSQLApi) ... ok
test_roundtrip (pandas.io.tests.test_sql.TestSQLApi) ... ok
test_to_sql (pandas.io.tests.test_sql.TestSQLApi) ... ok
test_to_sql_append (pandas.io.tests.test_sql.TestSQLApi) ... ok
test_to_sql_fail (pandas.io.tests.test_sql.TestSQLApi) ... ok
test_to_sql_replace (pandas.io.tests.test_sql.TestSQLApi) ... ok
test_tquery (pandas.io.tests.test_sql.TestSQLApi) ... ok
test_create_and_drop_table (pandas.io.tests.test_sql.TestSQLite) ... ok
test_execute_sql (pandas.io.tests.test_sql.TestSQLite) ... ok
test_invalid_flavor (pandas.io.tests.test_sql.TestSQLite) ... ok
test_read_sql (pandas.io.tests.test_sql.TestSQLite) ... ok
test_roundtrip (pandas.io.tests.test_sql.TestSQLite) ... ok
test_to_sql (pandas.io.tests.test_sql.TestSQLite) ... ok
test_to_sql_append (pandas.io.tests.test_sql.TestSQLite) ... ok
test_to_sql_fail (pandas.io.tests.test_sql.TestSQLite) ... ok
test_to_sql_replace (pandas.io.tests.test_sql.TestSQLite) ... ok
test_tquery (pandas.io.tests.test_sql.TestSQLite) ... ok

----------------------------------------------------------------------
Ran 77 tests in 15.166s

OK (SKIP=25)

@jorisvandenbossche
Member Author

Nice! The tests for MySQL are still failing, so we should first fix that before enabling them.
