
TEST: add basic postgresql tests #6316

Merged

Conversation

jorisvandenbossche
Member

@mangecoeur I just copied and adapted the MySQL tests and provided some SQL_STRINGS that follow the PostgreSQL dialect. Is this what you had in mind for how other SQL dialects would be added to the tests?

@jreback I now added it to requirements-2.7.txt. Should I add it to all test environments? Or if only one, is this the correct one?

@mangecoeur Is it possible that pymysql is missing from the requirements files needed to run the MySQL tests (they are all skipped on Travis)?

At the moment, there are two tests failing with postgresql (see https://travis-ci.org/jorisvandenbossche/pandas/jobs/18576746).

@jreback
Contributor

jreback commented Feb 10, 2014

you should probably add it to the 3.3 build too

@jorisvandenbossche
Member Author

The first failure is due to checking the type of the boolean column retrieved from the database. The test checks whether it is integer, while in PostgreSQL the returned type is just bool (https://github.com/pydata/pandas/blob/master/pandas/io/tests/test_sql.py#L501). I suppose it is checked as integer for SQLite, which has no bool type.

So, should I override this test in the PostgreSQL test class? But this will also be a problem for MySQL.

@mangecoeur
Contributor

Yes, it checks integer because SQLite has no bool, but it should check for a boolean type for the others. If it's wrong for the MySQL case, you should fix it for both MySQL and Postgres.

Be careful, because boolean columns with Nulls should load as object rather than boolean: pandas can't store None in a boolean column otherwise (if you coerce to boolean, None/NA gets converted to False). I think that's also tested, but it's good to make sure it works in Postgres too.

MySQL needs a pymysql requirement for Travis; I think I just never got around to adding it. I made it pymysql-specific because it's pure Python (unlike other drivers) and so easier to install; also, I had trouble pip-installing the official mysql-connector driver.


@jorisvandenbossche
Member Author

The second failure is due to reading in the iris table:

(Pdb) sql.read_table("iris", self.conn).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 150 entries, 0 to 149
Data columns (total 5 columns):
SepalLength    150 non-null object
SepalWidth     150 non-null object
PetalLength    150 non-null object
PetalWidth     150 non-null object
Name           150 non-null object

which are all object dtype (while the first four columns should be float).
However, if I read it with read_sql, it seems correct:

(Pdb) sql.read_sql("SELECT * from iris", self.conn).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 150 entries, 0 to 149
Data columns (total 5 columns):
SepalLength    150 non-null float64
SepalWidth     150 non-null float64
PetalLength    150 non-null float64
PetalWidth     150 non-null float64
Name           150 non-null object
dtypes: float64(4), object(1)

@mangecoeur
Contributor

@jorisvandenbossche weird. The read_sql version is probably correct due to pandas correctly guessing the datatype. For some reason the explicit type conversion is failing. I don't have time to work on this now, but the place to debug is the _harmonize_columns function; we need to check what the reflected column types are vs what types we're trying to convert to.
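
One quick way to see what SQLAlchemy reflects for the iris table (a minimal sketch; the connection details are those of the test database used elsewhere in this thread):

from sqlalchemy import create_engine, inspect

engine = create_engine("postgresql+psycopg2://postgres@localhost/pandas_nosetest")
insp = inspect(engine)
for col in insp.get_columns("iris"):
    # prints e.g. "SepalLength NUMERIC" -- the reflected type pandas has to convert
    print(col["name"], col["type"])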

@jorisvandenbossche
Member Author

@mangecoeur Thanks for the pointer. I will have a look at it later. If I change the type from NUMERIC to DOUBLE PRECISION in the create table command for the PostgreSQL iris table, then the test runs.

@mangecoeur
Contributor

OK, then I know the issue. We check for isinstance(col, Float), but SQLAlchemy returns a column type of Numeric for NUMERIC columns, so the check fails. Add something like or isinstance(…, Numeric) with the appropriate import from sqlalchemy and you are good to go.
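
For reference, a minimal sketch of that check (illustrative names, not the exact pandas code); note that Float is a subclass of Numeric in SQLAlchemy, so testing against Numeric also covers NUMERIC/DECIMAL columns:

from sqlalchemy.types import Float, Numeric

def _reflected_type_is_float(col_type):
    # Float subclasses Numeric, so this single check covers both
    # FLOAT/DOUBLE PRECISION and NUMERIC/DECIMAL reflected column types.
    return isinstance(col_type, (Float, Numeric))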


@jorisvandenbossche
Member Author

I added pymysql for the MySQL test, and there are also two MySQL tests failing.

The PostgreSQL test that is failing is due to the boolean column with None, which is converted to object type and not float (the None is also not converted to NaN).

@mangecoeur
Contributor

@jorisvandenbossche The behaviour you describe is correct: Boolean with None should become Object, see http://pandas.pydata.org/pandas-docs/dev/missing_data.html#missing-data-casting-rules-and-indexing

However, I can see that the tests are confusing: when testing with SQLite we look for Float, not Object. This is because we can't actually know that a column in SQLite is supposed to be treated as boolean unless we already knew that beforehand. Without extra info we have to treat it as numeric, so an integer column with Nulls gets cast to Float, and it's up to the user to know that their data represent boolean values. However, for the Postgres/MySQL case we should check for Object, since we know that the column is supposed to be Boolean. Feel free to update the tests accordingly.
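
A quick illustration of that casting rule in plain pandas (independent of the SQL code):

import pandas as pd

s = pd.Series([True, False, None])
# object dtype: the None is kept as a missing value instead of being coerced to False
print(s.dtype)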

@jreback
Contributor

jreback commented Feb 10, 2014

you guys can do some heuristic stuff that is not that expensive - something like

check the first few values; if they are 0/1 then it could be bool

then in a try/except you can astype to bool and compare for equality to the original

see core/common/_possibly_downcast
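
Something along these lines (a rough sketch of the suggested heuristic, not the actual pandas helper):

import numpy as np

def maybe_downcast_to_bool(values):
    # Cheap check on the first few values: only consider a downcast if they
    # all look like 0/1 (NaN fails this check, so such columns are left alone).
    head = values[:10]
    if not np.all(np.isin(head, [0, 1])):
        return values
    try:
        as_bool = values.astype(bool)
        # Accept the downcast only if round-tripping reproduces the original data.
        if np.array_equal(as_bool.astype(values.dtype), values):
            return as_bool
    except (TypeError, ValueError):
        pass
    return values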

@mangecoeur
Contributor

@jreback I think that's dangerous, since you could potentially turn random integer columns into bool if they happen to contain only 0s and 1s (and maybe Nulls). Data coming out of an SQLite DB just looks like a column of numbers with an arbitrary name; there is not much to hint at what it's supposed to represent.
People have some very strange data sets, so I think it's better to let them decide whether to convert their columns to boolean or not.

We could consider adding an optional coerce-boolean argument that would work like parse_dates and force columns to be loaded as boolean. That would also be handy for other DBs if you have some ugly data where booleans have been stored as the wrong data type (I've seen them stored as ints, floats, and chars; never assume whoever created the dataset knew what they were doing!).
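
In the meantime a user can do that coercion by hand after reading; a minimal sketch, assuming booleans that were stored as 0/1 floats with NULLs:

import numpy as np
import pandas as pd

df = pd.DataFrame({"flag": [1.0, 0.0, np.nan]})   # booleans stored as floats in the DB
# NaN is not in the mapping, so it stays missing and the column ends up as object dtype
df["flag"] = df["flag"].map({1.0: True, 0.0: False})
print(df["flag"])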

@jreback
Contributor

jreback commented Feb 10, 2014

I think adding a dtype= arg would be fine then; same interface as read_csv
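
For reference, the read_csv interface being referred to, which supports per-column dtype overrides:

import io
import pandas as pd

csv = io.StringIO("BoolCol,IntCol\n1,5\n0,6\n")
# force BoolCol to be read as float64 instead of the inferred int64
df = pd.read_csv(csv, dtype={"BoolCol": "float64"})
print(df.dtypes)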

@jorisvandenbossche
Member Author

  • OK, I updated the tests regarding the check for object vs float for the bool column with None (it was indeed the described behaviour). However, shouldn't the None be converted to NaN, as is done for the integer column?
  • a dtype arg seems a good idea
  • there are still two MySQL tests failing, but at the moment I can't test what is going on there.

@jorisvandenbossche
Member Author

@mangecoeur Would you have time to look at the MySQL failures?

It's failing in the test_default_type_convertion test. I get the following for the dataframe:

(Pdb) df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 8 columns):
TextCol            2 non-null object
DateCol            2 non-null datetime64[ns]
IntDateCol         2 non-null int64
FloatCol           2 non-null object
IntCol             2 non-null int64
BoolCol            2 non-null int64
IntColWithNull     1 non-null float64
BoolColWithNull    1 non-null float64
dtypes: datetime64[ns](1), float64(2), int64(3), object(2)

So the things that are wrong: the float column is read as object, bool as int, and BoolColWithNull as float instead of object.

test_read_table is also failing, but this is also due to a float column being read as an object column.

TEST: add postgresql to travis
One base class for tests with the sqlalchemy backend, so the test classes
for mysql and postgresql don't have to override tests that are
different for sqlite.
@jorisvandenbossche
Member Author

OK, I removed the activation of the MySQL tests (just this commit: jorisvandenbossche@036a3d4) from this PR and left only the PostgreSQL tests and a little refactoring, so this can be merged.

MySQL can be fixed in another PR.

@jorisvandenbossche
Member Author

Travis is happy now! (without the MySQL tests)
@mangecoeur Are you OK with my changes to the test classes (my second commit)?

@jreback
Contributor

jreback commented Feb 13, 2014

@jorisvandenbossche think we ought to add pymysql, psycopg2, etc. to print_versions? (even though they are only test deps, in a failure report you want to know)

@jorisvandenbossche
Member Author

@jreback added them: 7b0317e

@jreback
Contributor

jreback commented Feb 13, 2014

@jorisvandenbossche looks good..

@mangecoeur
Contributor

Updates to the tests look good. I'm doing some hardcore dogfooding with this and Postgres right now, so I'm finding a number of issues. I just added support for datetimes with timezone info in my private branch (since they are supported in Postgres). I need to think about what the sane thing to do is when saving to a DB without timezone support.

@jorisvandenbossche
Member Author

OK, then I am going to merge this one.

@mangecoeur Can you point to the branch with your datetime support? Then I can try out whether it solves the problems I reported (or you could already open a PR; no problem if it is still a work in progress).

jorisvandenbossche added a commit that referenced this pull request Feb 13, 2014
@jorisvandenbossche merged commit 5f17e1a into pandas-dev:master Feb 13, 2014
@jreback
Contributor

jreback commented Feb 13, 2014

would you guys like me to add the postgres test stuff to my windows builds? don't have to... but could

I just set up a postgres server, right? with any particular database setup?

@jorisvandenbossche deleted the sql-postgresql-tests branch February 14, 2014 07:54
@jorisvandenbossche
Member Author

@jreback If you can, that would certainly be welcome, I think.
You just have to add (or maybe it is already there by default) a localhost server with username postgres and no password, and then create a database called pandas_nosetest. And that should be all.
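
A minimal smoke test for that setup (connection details as described above; the exact SQLAlchemy call style may differ between versions):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://postgres@localhost/pandas_nosetest")
with engine.connect() as conn:
    # prints 1 if the pandas_nosetest database is reachable with these credentials
    print(conn.execute(text("SELECT 1")).scalar())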

@jreback
Contributor

jreback commented Feb 14, 2014

ok, running on 27-32, 27-64, 33-32, 33-64
this is just the print from 27-32

(mix of sqlalchemy 0.8.1 and 0.9.2, all the same psycopg2)
MySQL should still skip, yes?

INSTALLED VERSIONS
------------------
commit: cab2a93ec5c8ad0790cf1cb4fc3df747b393ee0f
python: 2.7.5.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.12.0
Cython: 0.19.2
numpy: 1.7.1
scipy: None
statsmodels: None
IPython: 1.1.0
sphinx: None
patsy: 0.2.1
scikits.timeseries: 0.91.3
dateutil: 2.2
pytz: 2013.8
bottleneck: 0.7.0
tables: 3.0.0
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: 0.5.0
lxml: None
bs4: 4.3.2
html5lib: 1.0b3
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.8.3
pymysql: None
psycopg2: 2.5.2 (dt dec pq3 ext)

C:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win32-2.7>type out
test_create_and_drop_table (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_execute_sql (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_invalid_flavor (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_read_sql (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_roundtrip (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_to_sql (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_to_sql_append (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_to_sql_fail (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_to_sql_replace (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_tquery (pandas.io.tests.test_sql.TestMySQL) ... SKIP
test_create_table (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_date_parsing (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_default_date_load (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_default_type_convertion (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_drop_table (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_execute_sql (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_read_sql (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_read_table (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_read_table_absent (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_read_table_columns (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_roundtrip (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_to_sql (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_to_sql_append (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_to_sql_fail (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_to_sql_replace (pandas.io.tests.test_sql.TestMySQLAlchemy) ... SKIP
test_create_table (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_date_parsing (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_default_date_load (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_default_type_convertion (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_drop_table (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_execute_sql (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_read_sql (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_read_table (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_read_table_absent (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_read_table_columns (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_roundtrip (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_to_sql (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_to_sql_append (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_to_sql_fail (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_to_sql_replace (pandas.io.tests.test_sql.TestPostgreSQLAlchemy) ... ok
test_create_table (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_date_parsing (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_default_date_load (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_default_type_convertion (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_drop_table (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_execute_sql (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_read_sql (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_read_table (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_read_table_absent (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_read_table_columns (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_roundtrip (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_to_sql (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_to_sql_append (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_to_sql_fail (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
test_to_sql_replace (pandas.io.tests.test_sql.TestSQLAlchemy) ... ok
Test case where same column appears in parse_date and index_col ... ok
Test date parsing in read_sql ... ok
test_execute_sql (pandas.io.tests.test_sql.TestSQLApi) ... ok
Test legacy name read_frame ... ok
Test legacy write frame name. ... ok
test_read_sql_iris (pandas.io.tests.test_sql.TestSQLApi) ... ok
test_roundtrip (pandas.io.tests.test_sql.TestSQLApi) ... ok
test_to_sql (pandas.io.tests.test_sql.TestSQLApi) ... ok
test_to_sql_append (pandas.io.tests.test_sql.TestSQLApi) ... ok
test_to_sql_fail (pandas.io.tests.test_sql.TestSQLApi) ... ok
test_to_sql_replace (pandas.io.tests.test_sql.TestSQLApi) ... ok
test_tquery (pandas.io.tests.test_sql.TestSQLApi) ... ok
test_create_and_drop_table (pandas.io.tests.test_sql.TestSQLite) ... ok
test_execute_sql (pandas.io.tests.test_sql.TestSQLite) ... ok
test_invalid_flavor (pandas.io.tests.test_sql.TestSQLite) ... ok
test_read_sql (pandas.io.tests.test_sql.TestSQLite) ... ok
test_roundtrip (pandas.io.tests.test_sql.TestSQLite) ... ok
test_to_sql (pandas.io.tests.test_sql.TestSQLite) ... ok
test_to_sql_append (pandas.io.tests.test_sql.TestSQLite) ... ok
test_to_sql_fail (pandas.io.tests.test_sql.TestSQLite) ... ok
test_to_sql_replace (pandas.io.tests.test_sql.TestSQLite) ... ok
test_tquery (pandas.io.tests.test_sql.TestSQLite) ... ok

----------------------------------------------------------------------
Ran 77 tests in 15.166s

OK (SKIP=25)

@jorisvandenbossche
Member Author

Nice! The tests for MySQL are still failing, so we should first fix that before enabling them.
