
BUG: data written with to_sql legacy mode (sqlite/mysql) not persistent #6846

Closed · Acanthostega opened this issue Apr 9, 2014 · 18 comments · Fixed by #6875
Labels: Bug · IO SQL (to_sql, read_sql, read_sql_query) · Regression (functionality that used to work in a prior pandas version)
Milestone: 0.14.0

Comments

@Acanthostega (Contributor)

UPDATE: this issue actually turned out to be a bug in the new legacy code, so all data written were only held in memory and never really committed to the database file itself (see discussion below).


Hi everybody,

I still have a problem writing data into an SQL database. With the following example, the data never make it into the database file, although the table structure is created (I'm assuming a pylab environment started with IPython, which provides randint and rand; see the explicit import below):

>>> import pandas as pd
>>> import sqlite3
>>> from numpy.random import randint, rand  # provided implicitly by a pylab session

>>> data = pd.DataFrame({"galid":randint(2**63-1, size=100), "objid": randint(2**63-1, size=100), "alpha": rand(100)})

>>> conn = sqlite3.connect("/tmp/bidulechouette.db")

>>> data.to_sql("DATA", conn, if_exists="replace", flavor="sqlite", index=False)

>>> # just to be sure...
>>> conn.close()
>>> conn = sqlite3.connect("/tmp/bidulechouette.db")

>>> result = pd.read_sql("SELECT objid FROM DATA;", conn)
>>> len(result)
0

I tried it on two different systems with different versions of sqlite3 and Python (3.2 and 3.4).

If I kill IPython and rerun the same code on the same database file without the if_exists option, it complains that the table already exists, even if I manually delete the sqlite3 database file. This makes me suspect that a reference to the database is kept somewhere, which is strange because IPython was killed... Or the file being written to isn't the one I expect: with a lot of data the call takes a long time, so it is clearly writing the data somewhere, which would also explain how a table can "already exist" in a deleted database...

INSTALLED VERSIONS

commit: None
python: 3.4.0.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.8-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_FR.utf8

pandas: 0.13.1-605-g61ea0a3
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.13.3
statsmodels: None
IPython: 2.0.0
sphinx: 1.2.2
patsy: None
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.2
bottleneck: None
tables: 3.1.0
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None

jorisvandenbossche added this to the 0.14.0 milestone on Apr 9, 2014
@jorisvandenbossche (Member)

See also #6843.

I can't try to reproduce it at the moment (I get OverflowError: Python int too large to convert to C long on Python 2.7, Windows, 64-bit; I think this is Windows-specific, since a C long is 32 bits on Windows).

But can you try it also with the new sqlalchemy-based functions? It should be something like:

from sqlalchemy import create_engine
engine = create_engine('sqlite:////tmp/bidulechouette.db')
data.to_sql("DATA", engine, if_exists="replace", index=False)

to see if you have the same issue with this.

@jreback (Contributor) commented Apr 9, 2014

This one is about Windows:

numpy on Windows is not friendly to large ints in the random number generation. It might be a bug, as there is no way to specify the dtype upfront (e.g. randint doesn't take a dtype argument). It could be an implementation detail as well.

So I think you would have to generate int32s, cast them to int64, then multiply to get int64-like numbers (which pandas will handle); see the sketch after the output below.

>>> np.random.randint(2**63-1,size=100)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "mtrand.pyx", line 871, in mtrand.RandomState.randint (numpy\random\mtrand\mtrand.c:6682)
OverflowError: Python int too large to convert to C long

>>> np.random.randint(2**31-1,size=100)
array([1065151596,  141587709, 1034731056, 1421678991, 1113510421,
       1053591656,  721945385,  417353460,  588311441,  451124932,
       2011103288,  606400194, 1767870596, 1577034881,  801116609,
       1094153300,  681848258, 1216074099,  828001710, 1818981627,
        583590097, 1893057734, 2041021821, 1719114523,  659638699,
         36901870, 1500212655,   80923766, 1147783744, 1802598765,
        148074008,  637763026, 2105303772, 1512947348, 1699075859,
        842315446, 1365873318,  820098773,  676695949, 1474271470,
        335374748, 1477530337, 1537800347,   87529923,  516659836,
       1551506257, 1743913044, 1156954863, 1286249176, 1279615765,
       1740947158,  448293258, 1524004071, 1527588882,  795421749,
       1231177972,  264830723,  589977924,  762902232, 1805756724,
        696292676,  915029891, 1311767606,  247202537, 1929326874,
       1238932552, 2026394145,  394577189, 1241699925,  910466670,
       1911660830, 1717398704,  761388787, 1527405720, 1132149302,
        930729795, 1350701423,   50980838, 1224099292,  705686236,
       1529564451,  747506807, 1142508936,  224622576, 1217950640,
       1208551918,  365819633,  456296798,  715871164, 1228926134,
       2030700788, 1192490214, 1276642215, 1186011869, 1479536059,
        334653481, 1961647202, 1838092157,  645948253, 1868626295])
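
A minimal sketch of that int32-to-int64 workaround (the 2**32 scaling factor here is just illustrative):

import numpy as np

# Draw values in the int32 range, cast to int64, then scale up so the
# values exercise the int64 range; (2**31 - 2) * 2**32 still fits below
# the int64 maximum of 2**63 - 1.
vals = np.random.randint(2**31 - 1, size=100).astype(np.int64) * 2**32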

@jreback (Contributor) commented Apr 9, 2014

This DOES work with @jorisvandenbossche's method (Python 3.4, numpy 1.8, on 64-bit Linux); maybe a problem with the sqlite3 dtype conversion of int64s (in the legacy code)?

@jorisvandenbossche (Member)

I can't reproduce this on Windows (Python 2.7, 64-bit), with either the legacy or the new sqlalchemy method (using the int32-to-int64 conversion and multiplication suggested by @jreback above).

About the problem of the database not being deleted properly: do you also see this with other tables (e.g. a toy example with small values)? Or does it only occur with this example, with these large values / the dtype problem?

@Acanthostega (Contributor, Author)

I confirm that it works with the new SQLAlchemy method, but not with the legacy one.

This is not a problem specific to long integers (int64), since I can reproduce it with 2**8 instead of 2**63-1.

If I don't close the connection returned by sqlite3.connect in the middle, the data can be read back (from memory). But once the IPython console is closed, the database file is still empty (apart from the table structure).

@cpcloud (Member) commented Apr 9, 2014

@Acanthostega can you try writing a small test case and running nosetests the_test.py, to see if this is an IPython issue? thx

@Acanthostega (Contributor, Author)

I tried with nosetests and with a plain python interpreter, but got the same result...

@jorisvandenbossche (Member)

@Acanthostega Can you post the test you used?

@Acanthostega (Contributor, Author)

#!/usr/bin/env python3
#-*- coding: utf-8 -*-

import pandas as pd
import numpy as np
import sqlite3

data = pd.DataFrame(
    {
        "galid": np.random.randint(2**8, size=100),
        "truc": np.random.rand(100),
    }
)

conn = sqlite3.connect("/tmp/machinchose.db")

data.to_sql("DATA", conn, if_exists="replace", index=False, flavor="sqlite")

conn.close()
conn = sqlite3.connect("/tmp/machinchose.db")

result = pd.read_sql("SELECT galid FROM DATA;", conn)

print(len(result))

I'm not familiar with nosetests, so I wrote a simple script to check the resulting database and ran it with the python interpreter. I can improve the test if you want, but I would have to read the docs a little first...

@cpcloud (Member) commented Apr 9, 2014

Just write a function called test_database, then run it with nosetests the_file_containing_test_database.py.

@jorisvandenbossche (Member)

OK, so I think the problem is not with too-large ints or with mixed dtypes or anything like that, but with the database connection not being closed correctly. Because if you leave out the conn.close() in the above script, it works for me.

@Acanthostega (Contributor, Author)

@jorisvandenbossche So, do you reproduce the problem when closing the connection to the database? Because if I don't close conn, the data can be read back (from memory), but nothing is present in the database file.

@cpcloud I rewrote the test:

#!/usr/bin/env python3
#-*- coding: utf-8 -*-

import pandas as pd
import numpy as np
import sqlite3

def test_database():
    data = pd.DataFrame(
        {
            "galid": np.random.randint(2**8, size=100),
            "truc": np.random.rand(100),
        }
    )

    conn = sqlite3.connect("/tmp/machinchose.db")

    data.to_sql("DATA", conn, if_exists="replace", index=False, flavor="sqlite")

    conn.close()
    conn = sqlite3.connect("/tmp/machinchose.db")

    result = pd.read_sql("SELECT galid FROM DATA;", conn)

    print(len(result))
    assert len(result) == len(data)

Same problem.

@jorisvandenbossche (Member)

Yes, indeed I reproduce the problem when closing and reopening the connection manually, so the data are indeed never written to the database file. I think a conn.commit() (see https://docs.python.org/2/library/sqlite3.html) is missing in the insert function in the legacy code (https://github.com/pydata/pandas/blob/master/pandas/io/sql.py#L779).
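
For reference, a minimal standalone sketch of the underlying sqlite3 behaviour (the file path and table name are just illustrative): without an explicit commit, rows inserted through a connection are rolled back when it is closed, while the CREATE TABLE survives, which matches the symptom above exactly.

import sqlite3

con = sqlite3.connect("/tmp/commit_demo.db")
con.execute("CREATE TABLE IF NOT EXISTS demo (x INTEGER)")  # runs in autocommit mode, so the table persists
con.execute("INSERT INTO demo VALUES (1)")                  # opens an implicit transaction...
# con.commit()                                              # ...which is lost without this commit
con.close()

con = sqlite3.connect("/tmp/commit_demo.db")
print(con.execute("SELECT count(*) FROM demo").fetchall())  # [(0,)]
con.close()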

@jorisvandenbossche
Copy link
Member

There was a conn.commit() in the previous version in 0.13.1 (https://github.com/pydata/pandas/blob/v0.13.1/pandas/io/sql.py#L240), so I think this is a regression.

@Acanthostega (Contributor, Author)

@jorisvandenbossche I added a call to commit() in the insert method, and the problem seems to disappear. Do you want me to propose a patch, or do you think it's better if you do it yourself? I think a test for writing and reading an SQL database with the legacy code should also be added, with the connection to the database closed before reading, to avoid any future regression. But I don't know the pandas internals well enough to propose one...

@jorisvandenbossche (Member)

@Acanthostega Good, if you want, certainly put up a PR!
A test would certainly be good too. I only don't know how to do this with sqlite, as we use an in-memory sqlite database in the tests, and I think that loses all content when the connection is closed, so we can't really test this bug that way (a file-based sketch is below).
For mysql we should certainly test it.
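
A minimal sketch of what a file-based persistence test could look like (the temp-file handling and the name test_legacy_sqlite_commit are hypothetical, not the actual test pandas ended up with):

import os
import sqlite3
import tempfile

import numpy as np
import pandas as pd

def test_legacy_sqlite_commit():
    # Write through the legacy sqlite path, close the connection, then
    # reopen the file and check that the rows actually persisted.
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        frame = pd.DataFrame({"a": np.arange(10)})
        con = sqlite3.connect(path)
        frame.to_sql("test_commit", con, flavor="sqlite", index=False)
        con.close()

        con = sqlite3.connect(path)
        result = pd.read_sql("SELECT a FROM test_commit;", con)
        con.close()
        assert len(result) == len(frame)
    finally:
        os.remove(path)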

@Acanthostega (Contributor, Author)

@jorisvandenbossche OK! If I have enough time, I will try to do it this weekend!

jorisvandenbossche changed the title from "Can't write data into SQL database with sqlite3" to "BUG: data written with to_sql legacy mode (sqlite/mysql) not persistent" on Apr 9, 2014
@jorisvandenbossche (Member)

@Acanthostega OK, let me know how it goes!

I updated the issue title/description to reflect the discussion.
