BUG: data written with to_sql legacy mode (sqlite/mysql) not persistent #6846
Comments
See also #6843. I can't try to reproduce it at the moment. But can you try it also with the new sqlalchemy-based functions, to see if you have the same issue with those? It should be something like the snippet below.
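The snippet that was suggested here was lost in extraction; the following is a plausible reconstruction of the SQLAlchemy-based approach being proposed, with the table and file names borrowed from the script later in the thread (the engine URL and column names are my assumptions):

from sqlalchemy import create_engine
import numpy as np
import pandas as pd

# Hypothetical reconstruction: write and read back through an SQLAlchemy
# engine instead of a raw sqlite3 connection.
engine = create_engine("sqlite:////tmp/machinchose.db")  # assumed path
data = pd.DataFrame({"galid": np.random.randint(2**8, size=100),
                     "truc": np.random.rand(100)})
data.to_sql("DATA", engine, if_exists="replace", index=False)
result = pd.read_sql("SELECT galid FROM DATA;", engine)
print(len(result))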
This is for Windows: numpy on Windows is not friendly to large ints with the random number generation. It might be a bug, as there is no way to specify the dtype upfront (e.g. randint doesn't take a dtype argument), but it could be an implementation detail as well. So I think you would have to generate int32s, cast them to int64, then multiply to get int64-like numbers (which pandas will handle).
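A minimal sketch of that workaround (my illustration, not code from the thread), assuming a numpy where randint has no dtype argument:

import numpy as np

# Generate int32-sized values, cast them to int64, then multiply so the
# results span an int64-like range.
small = np.random.randint(2**30, size=100)
big = small.astype(np.int64) * 2**32
print(big.dtype)  # int64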
This DOES work with @jorisvandenbossche's method (python 3.4, numpy 1.8, on 64-bit linux). Maybe a problem with the sqlite3 dtype conversion of int64s (on the legacy path)?
I can't reproduce this on Windows (python 2.7, 64-bit), with either the legacy or the new sqlalchemy method (using the conversion of int32 to int64 and multiplying, as suggested by @jreback above). About the database that is not deleted properly: do you also have this with other tables (e.g. with just a toy example with small values)? Or does it only occur with this example, with these large values/dtype problems?
I confirm that it works with the new SQLAlchemy method, but not with the legacy one. This is not a problem specific to long integers (int64), since I can reproduce it with 2**8 instead of 2**63 - 1. If I don't close the connection returned by sqlite3's connect function in the middle, the data seems to be read from memory. But once the ipython console is closed, the database file is still empty (though with the table structure).
@Acanthostega can you try by just writing a small test case and running it with nosetests?
I tried with nosetests and with the plain python interpreter, but got the same result...
@Acanthostega Can you post the test you used?
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import sqlite3

# Write a small frame through the legacy sqlite path, close the
# connection, then reopen it and read the table back.
data = pd.DataFrame(
    {
        "galid": np.random.randint(2**8, size=100),
        "truc": np.random.rand(100),
    }
)
conn = sqlite3.connect("/tmp/machinchose.db")
data.to_sql("DATA", conn, if_exists="replace", index=False, flavor="sqlite")
conn.close()
conn = sqlite3.connect("/tmp/machinchose.db")
result = pd.read_sql("SELECT galid FROM DATA;", conn)
print(len(result))

I'm not familiar with nosetests, so I wrote a simple script and used the python interpreter to check the resulting database. I can improve the test if you want, but I'll have to read the docs a little first...
just write a function whose name starts with test_ and nosetests will pick it up
OK, so I think the problem is not with too large ints, or with mixed dtype or something like that, but with not correctly closing the database connection.
@jorisvandenbossche So, do you reproduce the problem when closing the connection to the database? Since, if I don't close the connection, the data seem to be there. @cpcloud I rewrote the test:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import sqlite3

def test_database():
    # Same round trip as before, but as a nose-style test function.
    data = pd.DataFrame(
        {
            "galid": np.random.randint(2**8, size=100),
            "truc": np.random.rand(100),
        }
    )
    conn = sqlite3.connect("/tmp/machinchose.db")
    data.to_sql("DATA", conn, if_exists="replace", index=False, flavor="sqlite")
    conn.close()
    conn = sqlite3.connect("/tmp/machinchose.db")
    result = pd.read_sql("SELECT galid FROM DATA;", conn)
    print(len(result))
    assert len(result) == len(data)

Same problem.
Yes, indeed I reproduce the problem when closing and reopening the connection manually, so the data are indeed never written to the database file. I think there is a commit missing somewhere in the legacy code.
There was a commit in the old code, but it seems this was lost in the refactoring.
@jorisvandenbossche I added a call to the connection's commit method after writing, and it solves the problem.
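For illustration, a minimal sketch of the kind of fix being discussed (my example, not the actual pandas patch, and assuming the pandas version from this thread where to_sql still takes flavor): with a raw DBAPI connection the INSERTs stay in an open transaction until commit() is called, so either the legacy writer or the caller must commit explicitly before closing:

import sqlite3
import numpy as np
import pandas as pd

data = pd.DataFrame({"galid": np.random.randint(2**8, size=100)})
conn = sqlite3.connect("/tmp/machinchose.db")
data.to_sql("DATA", conn, if_exists="replace", index=False, flavor="sqlite")
conn.commit()  # without this, the rows may never reach the file on disk
conn.close()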
@Acanthostega Good, if you want, certainly try to put up a PR!
@jorisvandenbossche Ok! If I have enough time, I will try to do it this weekend!
@Acanthostega OK, let me know how it goes! I updated the issue title/description to reflect the discussion.
UPDATE: this issue actually turned out to be a bug in the new legacy code, so all databases written were only written in memory and never really committed to the database file itself (see the discussion above).
Hi everybody,
I still have a problem with writing data into an SQL database. With the following example, the resulting database file isn't written, but the structure of the table is created (I assume a pylab environment set up with ipython):
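The original snippet was lost from this issue body; below is a plausible reconstruction based on the discussion above (the thread mentions random ints up to 2**63 - 1 written through the legacy sqlite path; the exact names and sizes are my guesses):

import sqlite3
import numpy as np
import pandas as pd

# On 64-bit linux this generates int64 values; on Windows, numpy's
# randint cannot produce such large ints (see the discussion above).
data = pd.DataFrame({"galid": np.random.randint(2**63 - 1, size=100),
                     "truc": np.random.rand(100)})
conn = sqlite3.connect("/tmp/machinchose.db")
data.to_sql("DATA", conn, if_exists="replace", index=False, flavor="sqlite")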
I tried it on two different systems, with different versions of sqlite3 and python (3.2 and 3.4).
If I kill ipython and redo the same thing without the if_exists option on the same database file, it complains that the table already exists, even if I manually remove the sqlite3 database file. This makes me suppose that a reference to the database is kept somewhere, which is weird because ipython was killed... Or the file it writes to isn't the right one: with a lot of data it takes a long time, as if it is writing the data somewhere, which would explain the "existing table" in a deleted database...
INSTALLED VERSIONS
commit: None
python: 3.4.0.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.8-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_FR.utf8
pandas: 0.13.1-605-g61ea0a3
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.13.3
statsmodels: None
IPython: 2.0.0
sphinx: 1.2.2
patsy: None
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.2
bottleneck: None
tables: 3.1.0
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None