ENH: sql support via SQLAlchemy, with legacy fallback #5950

Closed
wants to merge 14 commits into from
1 change: 1 addition & 0 deletions README.md
@@ -106,6 +106,7 @@ pip install pandas
- [Cython](http://www.cython.org): Only necessary to build development version. Version 0.17.1 or higher.
- [SciPy](http://www.scipy.org): miscellaneous statistical functions
- [PyTables](http://www.pytables.org): necessary for HDF5-based storage
- [SQLAlchemy](http://www.sqlalchemy.org): for SQL database support. Version 0.8.1 or higher recommended.
- [matplotlib](http://matplotlib.sourceforge.net/): for plotting
- [statsmodels](http://statsmodels.sourceforge.net/)
- Needed for parts of `pandas.stats`
1 change: 1 addition & 0 deletions ci/requirements-2.6.txt
@@ -5,4 +5,5 @@ pytz==2013b
http://www.crummy.com/software/BeautifulSoup/bs4/download/4.2/beautifulsoup4-4.2.0.tar.gz
html5lib==1.0b2
bigquery==2.0.17
sqlalchemy==0.8.1
numexpr==1.4.2
1 change: 1 addition & 0 deletions ci/requirements-2.7.txt
@@ -19,3 +19,4 @@ scipy==0.10.0
beautifulsoup4==4.2.1
statsmodels==0.5.0
bigquery==2.0.17
sqlalchemy==0.8.1
3 changes: 1 addition & 2 deletions ci/requirements-2.7_LOCALE.txt
@@ -7,8 +7,6 @@ xlrd==0.9.2
numpy==1.6.1
cython==0.19.1
bottleneck==0.6.0
numexpr==2.1
tables==2.3.1
matplotlib==1.3.0
patsy==0.1.0
html5lib==1.0b2
@@ -17,3 +15,4 @@ scipy==0.10.0
beautifulsoup4==4.2.1
statsmodels==0.5.0
bigquery==2.0.17
sqlalchemy==0.8.1
1 change: 1 addition & 0 deletions ci/requirements-3.3.txt
@@ -14,3 +14,4 @@ lxml==3.2.1
scipy==0.12.0
beautifulsoup4==4.2.1
statsmodels==0.4.3
sqlalchemy==0.9.1
1 change: 1 addition & 0 deletions doc/source/install.rst
@@ -95,6 +95,7 @@ Optional Dependencies
version. Version 0.17.1 or higher.
* `SciPy <http://www.scipy.org>`__: miscellaneous statistical functions
* `PyTables <http://www.pytables.org>`__: necessary for HDF5-based storage
* `SQLAlchemy <http://www.sqlalchemy.org>`__: for SQL database support. Version 0.8.1 or higher recommended.
* `matplotlib <http://matplotlib.sourceforge.net/>`__: for plotting
* `statsmodels <http://statsmodels.sourceforge.net/>`__
* Needed for parts of :mod:`pandas.stats`
200 changes: 149 additions & 51 deletions doc/source/io.rst
@@ -3068,13 +3068,48 @@ SQL Queries
-----------

The :mod:`pandas.io.sql` module provides a collection of query wrappers to both
facilitate data retrieval and to reduce dependency on DB-specific API. These
wrappers only support the Python database adapters which respect the `Python
DB-API <http://www.python.org/dev/peps/pep-0249/>`__. See some
:ref:`cookbook examples <cookbook.sql>` for some advanced strategies
facilitate data retrieval and to reduce dependency on DB-specific API. Database abstraction
is provided by SQLAlchemy if installed. In addition, you will need a driver library for
your database.

For example, suppose you want to query some data with different types from a
table such as:
.. versionadded:: 0.14.0


If SQLAlchemy is not installed, a legacy fallback is provided for SQLite and MySQL.
These legacy modes require Python database adapters which respect the `Python
DB-API <http://www.python.org/dev/peps/pep-0249/>`__.

See also the :ref:`cookbook examples <cookbook.sql>` for some advanced strategies.

The key functions are:

- :func:`~pandas.io.sql.to_sql`
- :func:`~pandas.io.sql.read_sql`
- :func:`~pandas.io.sql.read_table`


In the following example, we use the `SQLite <http://www.sqlite.org/>`__ SQL database
engine. You can use a temporary SQLite database where data are stored in
"memory".

To connect with SQLAlchemy you use the :func:`create_engine` function to create an engine
object from a database URI. You only need to create the engine once per database you are
connecting to.

For more information on :func:`create_engine` and the URI formatting, see the examples
below and the SQLAlchemy `documentation <http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html>`__.

.. code-block:: python

from sqlalchemy import create_engine
from pandas.io import sql
# Create your connection.
engine = create_engine('sqlite:///:memory:')

Writing DataFrames
~~~~~~~~~~~~~~~~~~

Assuming the following data is in a DataFrame ``data``, we can insert it into
the database using :func:`~pandas.io.sql.to_sql`.


+-----+------------+-------+-------+-------+
@@ -3088,81 +3123,144 @@ table such as:
+-----+------------+-------+-------+-------+


Functions from :mod:`pandas.io.sql` can extract some data into a DataFrame. In
the following example, we use the `SQLite <http://www.sqlite.org/>`__ SQL database
engine. You can use a temporary SQLite database where data are stored in
"memory". Just do:

.. code-block:: python
.. ipython:: python
:suppress:

import sqlite3
from sqlalchemy import create_engine
from pandas.io import sql
# Create your connection.
cnx = sqlite3.connect(':memory:')
engine = create_engine('sqlite:///:memory:')

.. ipython:: python
:suppress:

c = ['id', 'Date', 'Col_1', 'Col_2', 'Col_3']
d = [(26, datetime.datetime(2010,10,18), 'X', 27.5, True),
(42, datetime.datetime(2010,10,19), 'Y', -12.5, False),
(63, datetime.datetime(2010,10,20), 'Z', 5.73, True)]

import sqlite3
from pandas.io import sql
cnx = sqlite3.connect(':memory:')
data = DataFrame(d, columns=c)

.. ipython:: python
:suppress:

sql.to_sql(data, 'data', engine)

Reading Tables
~~~~~~~~~~~~~~

:func:`~pandas.io.sql.read_table` will read a database table given the
table name and optionally a subset of columns to read.

cu = cnx.cursor()
# Create a table named 'data'.
cu.execute("""CREATE TABLE data(id integer,
date date,
Col_1 string,
Col_2 float,
Col_3 bool);""")
cu.executemany('INSERT INTO data VALUES (?,?,?,?,?)',
[(26, datetime.datetime(2010,10,18), 'X', 27.5, True),
(42, datetime.datetime(2010,10,19), 'Y', -12.5, False),
(63, datetime.datetime(2010,10,20), 'Z', 5.73, True)])
.. note::

In order to use :func:`~pandas.io.sql.read_table`, you **must** have the
SQLAlchemy optional dependency installed.

.. ipython:: python

sql.read_table('data', engine)

Let ``data`` be the name of your SQL table. With a query and your database
connection, just use the :func:`~pandas.io.sql.read_sql` function to get the
query results into a DataFrame:
You can also specify the name of the column as the DataFrame index,
and specify a subset of columns to be read.

.. ipython:: python

sql.read_sql("SELECT * FROM data;", cnx)
sql.read_table('data', engine, index_col='id')
sql.read_table('data', engine, columns=['Col_1', 'Col_2'])

You can also specify the name of the column as the DataFrame index:
And you can explicitly force columns to be parsed as dates:

.. ipython:: python

sql.read_sql("SELECT * FROM data;", cnx, index_col='id')
sql.read_sql("SELECT * FROM data;", cnx, index_col='date')
sql.read_table('data', engine, parse_dates=['Date'])

Of course, you can specify a more "complex" query.
If needed, you can explicitly specify a format string, or a dict of arguments
to pass to :func:`pandas.tseries.tools.to_datetime`.

.. code-block:: python

sql.read_table('data', engine, parse_dates={'Date': '%Y-%m-%d'})
sql.read_table('data', engine, parse_dates={'Date': {'format': '%Y-%m-%d %H:%M:%S'}})

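The format strings above use the ``strftime``/``strptime`` directive syntax. As a minimal sketch using only the standard library (this is not the pandas implementation, and the sample strings are made up for illustration), the two formats from the example accept input such as:

```python
from datetime import datetime

# Illustrative sample values; assumed for this sketch only.
date_only = datetime.strptime('2010-10-18', '%Y-%m-%d')
date_time = datetime.strptime('2010-10-18 13:45:00', '%Y-%m-%d %H:%M:%S')

print(date_only)
print(date_time)
```

A string that does not match the given format raises ``ValueError`` in ``strptime``, which is a useful way to check a format string before passing it to ``parse_dates``.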

You can check if a table exists using :func:`~pandas.io.sql.has_table`.

In addition, the class :class:`~pandas.io.sql.PandasSQLWithEngine` can be
instantiated directly for more manual control over the SQL interaction.

Querying
~~~~~~~~

You can query using raw SQL in the :func:`~pandas.io.sql.read_sql` function.
In this case you must use the SQL variant appropriate for your database.
When using SQLAlchemy, you can also pass SQLAlchemy Expression language constructs,
which are database-agnostic.

.. ipython:: python

sql.read_sql('SELECT * FROM data', engine)

sql.read_sql("SELECT id, Col_1, Col_2 FROM data WHERE id = 42;", cnx)
Of course, you can specify a more "complex" query.

.. ipython:: python
:suppress:

cu.close()
cnx.close()
sql.read_sql("SELECT id, Col_1, Col_2 FROM data WHERE id = 42;", engine)


There are a few other available functions:
You can also run a plain query without creating a dataframe with
:func:`~pandas.io.sql.execute`. This is useful for queries that don't return values,
such as INSERT. This is functionally equivalent to calling ``execute`` on the
SQLAlchemy engine or db connection object. Again, you must use the SQL syntax
variant appropriate for your database.

- ``tquery`` returns a list of tuples corresponding to each row.
- ``uquery`` does the same thing as tquery, but instead of returning results
it returns the number of related rows.
- ``write_frame`` writes records stored in a DataFrame into the SQL table.
- ``has_table`` checks if a given SQLite table exists.
.. code-block:: python

.. note::
sql.execute('SELECT * FROM table_name', engine)

sql.execute('INSERT INTO table_name VALUES(?, ?, ?, ?)', engine, params=[('id', 1, 12.2, True)])

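The ``?`` markers in the INSERT example are DB-API "qmark" placeholders, as used by ``sqlite3``; other drivers may use a different parameter style. The sketch below calls the standard-library ``sqlite3`` module directly rather than going through ``pandas.io.sql``, and assumes a hypothetical four-column ``table_name``. It shows that each row must supply exactly one value per placeholder:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Hypothetical four-column table, for illustration only.
conn.execute('CREATE TABLE table_name (name TEXT, a INTEGER, b REAL, c INTEGER)')

# One value per '?' placeholder.
conn.execute('INSERT INTO table_name VALUES (?, ?, ?, ?)', ('id', 1, 12.2, True))

# Supplying too few values for the placeholders is rejected by the driver.
try:
    conn.execute('INSERT INTO table_name VALUES (?, ?, ?, ?)', ('id', 1, 12.2))
    mismatch_rejected = False
except sqlite3.Error:
    mismatch_rejected = True

rows = conn.execute('SELECT * FROM table_name').fetchall()
conn.close()
print(rows, mismatch_rejected)
```

Passing values as parameters instead of formatting them into the SQL string also lets the driver handle quoting.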

Engine connection examples
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

from sqlalchemy import create_engine

engine = create_engine('postgresql://scott:tiger@localhost:5432/mydatabase')

engine = create_engine('mysql+mysqldb://scott:tiger@localhost/foo')

engine = create_engine('oracle://scott:[email protected]:1521/sidname')

engine = create_engine('mssql+pyodbc://mydsn')

# sqlite://<nohostname>/<path>
# where <path> is relative:
engine = create_engine('sqlite:///foo.db')

# or absolute, starting with a slash:
engine = create_engine('sqlite:////absolute/path/to/foo.db')

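The connection strings above share a common URL shape, roughly ``dialect+driver://user:password@host:port/database``. As a rough sketch, the parts of two of the examples can be pulled apart with the standard library's ``urllib.parse``. This is not SQLAlchemy's own URL parser, which may differ in detail, and ``describe`` is a hypothetical helper written for this illustration:

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a database URL into its main components (illustrative helper)."""
    parts = urlsplit(url)
    # The scheme is 'dialect' or 'dialect+driver'.
    dialect, _, driver = parts.scheme.partition('+')
    return {
        'dialect': dialect,
        'driver': driver or None,
        'username': parts.username,
        'password': parts.password,
        'host': parts.hostname,
        'port': parts.port,
        'database': parts.path.lstrip('/') or None,
    }


pg = describe('postgresql://scott:tiger@localhost:5432/mydatabase')
my = describe('mysql+mysqldb://scott:tiger@localhost/foo')
print(pg)
print(my)
```

When no driver is named after the ``+``, SQLAlchemy picks a default driver for the dialect, which is why the PostgreSQL example names only the dialect.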

Legacy
~~~~~~
To use the SQLite support without SQLAlchemy, you can create connections like so:

.. code-block:: python

import sqlite3
from pandas.io import sql
cnx = sqlite3.connect(':memory:')

And then issue the following queries, remembering to also specify the flavor of SQL
you are using.

.. code-block:: python

sql.to_sql(data, 'data', cnx, flavor='sqlite')

sql.read_sql("SELECT * FROM data", cnx, flavor='sqlite')

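For orientation, the legacy mode sits on top of a plain DB-API connection. The sketch below uses only the standard-library ``sqlite3`` module and is not how ``pandas.io.sql`` is implemented. It only illustrates the kind of steps involved in turning a query result into column-labelled records. The rows reuse the example data, with dates stored as ISO strings and booleans as integers, which are simplifying assumptions for this sketch:

```python
import sqlite3

cnx = sqlite3.connect(':memory:')
cnx.execute('CREATE TABLE data '
            '(id INTEGER, Date TEXT, Col_1 TEXT, Col_2 REAL, Col_3 INTEGER)')
cnx.executemany(
    'INSERT INTO data VALUES (?, ?, ?, ?, ?)',
    [(26, '2010-10-18', 'X', 27.5, 1),
     (42, '2010-10-19', 'Y', -12.5, 0),
     (63, '2010-10-20', 'Z', 5.73, 1)])

cur = cnx.execute('SELECT id, Col_1, Col_2 FROM data WHERE id = 42')
# cursor.description holds one 7-item sequence per column; item 0 is the name.
columns = [d[0] for d in cur.description]
records = [dict(zip(columns, row)) for row in cur.fetchall()]
cnx.close()
print(records)
```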
For now, writing your DataFrame into a database works only with
**SQLite**. Moreover, the **index** will currently be **dropped**.

.. _io.bigquery:
