enable multivalues insert #19664

Merged
merged 3 commits into from
Mar 7, 2018

Conversation

@danfrankj danfrankj commented Feb 12, 2018

Summary

Currently, when a DataFrame is pushed to a database, rows are inserted one at a time. This change enables multi-values inserts.

TODO

  • release note
  • address chunksize behavior

Reference

http://docs.sqlalchemy.org/en/rel_0_9/core/dml.html?highlight=insert%20values#sqlalchemy.sql.expression.Insert.values
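
For illustration only (this is not code from the PR), the difference between the two insert styles can be sketched with the standard-library `sqlite3` module. The table name `demo` and its columns are made up for the example; the PR itself goes through SQLAlchemy rather than building SQL strings by hand.

```python
import sqlite3

rows = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (a REAL, b REAL)")  # hypothetical table

# Row by row: one INSERT statement is executed per row.
single_sql = "INSERT INTO demo (a, b) VALUES (?, ?)"
for row in rows:
    conn.execute(single_sql, row)

# Multi-values: a single INSERT statement carries every row.
placeholders = ", ".join(["(?, ?)"] * len(rows))
multi_sql = f"INSERT INTO demo (a, b) VALUES {placeholders}"
flat_params = [value for row in rows for value in row]
conn.execute(multi_sql, flat_params)

row_count = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
conn.close()
```

Both styles store the same rows; the second one needs only one round trip to the database, which is where the time is saved on backends with a high per-statement cost.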

@danfrankj danfrankj force-pushed the df_multivalues_insert branch from 06fcfcc to af65ea5 Compare February 12, 2018 19:51
@TomAugspurger
Contributor

xref #14315 and #8953

I think the previous issues were with sqlalchemy dialects that don't support multi-row inserts, so we'll need to test that.

  • docs, release note, and tests.

@TomAugspurger TomAugspurger added the Performance (memory or execution speed performance) and IO SQL (to_sql, read_sql, read_sql_query) labels Feb 12, 2018
@danfrankj
Contributor Author

For reference, I'm using a SQLAlchemy dialect that sets supports_multivalues_insert, and inserts still happen row by row. Will add tests though.

@danfrankj danfrankj force-pushed the df_multivalues_insert branch 3 times, most recently from 89fad1c to 4cc8890 Compare February 12, 2018 22:27
codecov bot commented Feb 13, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@f33e84c). Click here to learn what that means.
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #19664   +/-   ##
=========================================
  Coverage          ?   91.71%           
=========================================
  Files             ?      150           
  Lines             ?    49104           
  Branches          ?        0           
=========================================
  Hits              ?    45035           
  Misses            ?     4069           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.09% <ø> (?)
#single 41.87% <ø> (?)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f33e84c...f298de1. Read the comment docs.

@TomAugspurger
Contributor

Ideally, in the tests we'll be able to introspect the sqlalchemy engine somehow to assert that multi-row inserts are actually happening.
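
One possible way to make such an assertion, sketched here with the standard-library `sqlite3` module rather than a SQLAlchemy engine, is to trace the executed SQL and count the INSERT statements. The helper `count_insert_statements` and the `demo` table are hypothetical names for this sketch; a real pandas test would need an equivalent hook on the SQLAlchemy side.

```python
import sqlite3


def count_insert_statements(run_inserts):
    """Run `run_inserts(conn)` and count the INSERT statements SQLite traces."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo (a REAL, b REAL)")
    traced = []
    conn.set_trace_callback(traced.append)  # called with each executed SQL string
    run_inserts(conn)
    conn.set_trace_callback(None)
    conn.close()
    return sum(1 for sql in traced if sql.lstrip().upper().startswith("INSERT"))


rows = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]


def row_by_row(conn):
    # One statement per row.
    for row in rows:
        conn.execute("INSERT INTO demo (a, b) VALUES (?, ?)", row)


def multi_values(conn):
    # One statement for all rows.
    placeholders = ", ".join(["(?, ?)"] * len(rows))
    params = [value for row in rows for value in row]
    conn.execute(f"INSERT INTO demo (a, b) VALUES {placeholders}", params)


sequential_count = count_insert_statements(row_by_row)
multi_count = count_insert_statements(multi_values)
```

Counting statements rather than inspecting the stored rows is the point: both paths leave identical data behind, so only the statement count distinguishes them.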

@jreback jreback left a comment

this needs a test

@danfrankj danfrankj force-pushed the df_multivalues_insert branch from 4cc8890 to 6bd1086 Compare February 18, 2018 07:53
pep8speaks commented Feb 18, 2018

Hello @danfrankj! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 07, 2018 at 21:54 Hours UTC

@danfrankj danfrankj force-pushed the df_multivalues_insert branch 2 times, most recently from f7f1c3d to 9b50c47 Compare February 18, 2018 16:05
@danfrankj
Contributor Author

@jreback @TomAugspurger added first stab at a test. Let me know what you think!

@TomAugspurger TomAugspurger left a comment

Test seems good, thanks.

How does this interact with chunksize? Can we say that you'll have len(df) // chunksize inserts? Is there any way to test that?
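
As a rough sketch of the arithmetic behind that question, assuming each chunk maps to exactly one INSERT statement (an assumption, not something this thread confirms), the count would be a ceiling division rather than `len(df) // chunksize`, because a final partial chunk still needs its own statement. The helper name `expected_insert_count` is hypothetical.

```python
import math


def expected_insert_count(n_rows, chunksize=None):
    """Hypothetical helper: statements issued if each chunk is one INSERT."""
    if n_rows == 0:
        return 0
    if chunksize is None:
        return 1  # everything goes out in a single statement
    return math.ceil(n_rows / chunksize)


counts = {
    "no chunksize": expected_insert_count(1000),
    "chunksize=250": expected_insert_count(1000, 250),
    "chunksize=300": expected_insert_count(1000, 300),  # 3 full chunks + 1 partial
}
```

With 1000 rows and `chunksize=300`, floor division gives 3 while the sketch gives 4, so a test built on `len(df) // chunksize` would only hold when the chunk size divides the row count evenly.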

Also need a release note.

@@ -479,6 +479,25 @@ def _transaction_test(self):
         res2 = self.pandasSQL.read_query('SELECT * FROM test_trans')
         assert len(res2) == 1

+    def _test_insert_multivalues(self):
Contributor

Add a comment with the Github issue number.

Contributor

And since this is sqlalchemy-specific, could you just define the test directly on _TestSQLAlchemy?

Contributor Author

Added the GitHub issue numbers as comments and moved the test.

@danfrankj danfrankj force-pushed the df_multivalues_insert branch from 9b50c47 to 951b74c Compare February 18, 2018 17:42
@@ -572,8 +572,11 @@ def create(self):
         else:
             self._execute_create()

-    def insert_statement(self):
-        return self.table.insert()
+    def insert_statement(self, data, conn):
Contributor

can you add a doc-string here

Contributor

Could you add a parameters and returns section as well. http://numpydoc.readthedocs.io/en/latest/format.html

pandas/io/sql.py Outdated
@@ -613,7 +616,7 @@ def insert_data(self):

     def _execute_insert(self, conn, keys, data_iter):
         data = [{k: v for k, v in zip(keys, row)} for row in data_iter]
-        conn.execute(self.insert_statement(), data)
+        conn.execute(*self.insert_statement(data, conn))
Contributor

here as well

@jreback jreback left a comment

need a whatsnew note that lists the backends where this would work.

@@ -1665,6 +1665,29 @@ class Temporary(Base):

         tm.assert_frame_equal(df, expected)

+    def test_insert_multivalues(self):
Contributor

Can you explicitly test which backends support this?

Contributor Author

@jreback I believe I've done this by adding a class variable to the below classes. Let me know if that addresses this concern

@danfrankj danfrankj force-pushed the df_multivalues_insert branch 2 times, most recently from c875d87 to 0db5d5c Compare February 23, 2018 18:28
@@ -323,6 +323,7 @@ Other Enhancements

 - ``IntervalIndex.astype`` now supports conversions between subtypes when passed an ``IntervalDtype`` (:issue:`19197`)
 - :class:`IntervalIndex` and its associated constructor methods (``from_arrays``, ``from_breaks``, ``from_tuples``) have gained a ``dtype`` parameter (:issue:`19262`)
+- :func:`pd.io.sql.to_sql` now performs a multivalue insert if the underlying connection supports this rather than inserting row by row (:issue:`14315`, :issue: `8953`)
Contributor Author

not sure if this should be an enhancement or a bugfix?

@danfrankj danfrankj force-pushed the df_multivalues_insert branch 3 times, most recently from e3953c6 to c62a9c1 Compare February 23, 2018 19:20
@danfrankj
Contributor Author

@jreback @TomAugspurger I believe I've addressed your concerns above, when you get a chance PTAL

@TomAugspurger TomAugspurger left a comment

Just some minor comments. Looks good overall.

Can you run a quick benchmark to see how things look compared to master? How much faster are things?

@@ -572,8 +572,11 @@ def create(self):
         else:
             self._execute_create()

-    def insert_statement(self):
-        return self.table.insert()
+    def insert_statement(self, data, conn):
Contributor

Could you add a parameters and returns section as well. http://numpydoc.readthedocs.io/en/latest/format.html

pandas/io/sql.py Outdated
+        dialect = getattr(conn, 'dialect', None)
+        if dialect and getattr(dialect, 'supports_multivalues_insert', False):
+            return (self.table.insert(data),)
+        return (self.table.insert(), data)
Contributor

Shouldn't need the parenthesis here.
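
To make the dispatch in the hunk above easier to follow, here is a self-contained sketch that mirrors its shape without SQLAlchemy. Everything in it is a stand-in: `choose_insert_call` is a hypothetical function, the connections are `SimpleNamespace` objects, and the returned tuples only imitate the arguments that would be unpacked into `conn.execute(*args)`.

```python
from types import SimpleNamespace


def choose_insert_call(conn, table_name, data):
    """Hypothetical stand-in for the insert_statement dispatch in the diff.

    Returns the positional arguments for conn.execute(): one element when
    all rows travel inside the statement, two when the rows are passed
    separately.
    """
    dialect = getattr(conn, "dialect", None)
    if dialect and getattr(dialect, "supports_multivalues_insert", False):
        return (("multi-values insert", table_name, tuple(data)),)
    return (("plain insert", table_name), data)


# A connection whose dialect advertises multi-values support, and one without.
multi_conn = SimpleNamespace(dialect=SimpleNamespace(supports_multivalues_insert=True))
plain_conn = SimpleNamespace(dialect=SimpleNamespace())
rows = [{"A": 1.0}, {"A": 2.0}]

multi_args = choose_insert_call(multi_conn, "demo", rows)
plain_args = choose_insert_call(plain_conn, "demo", rows)
```

In Python it is the trailing comma, not the parentheses, that creates the one-element tuple, so both return shapes can be unpacked with `*` at the call site.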

@@ -296,6 +296,8 @@ Other Enhancements
 - :class:`IntervalIndex` and its associated constructor methods (``from_arrays``, ``from_breaks``, ``from_tuples``) have gained a ``dtype`` parameter (:issue:`19262`)
 - Added :func:`SeriesGroupBy.is_monotonic_increasing` and :func:`SeriesGroupBy.is_monotonic_decreasing` (:issue:`17015`)
 - :func:`DataFrame.from_dict` now accepts a ``columns`` argument that can be used to specify the column names when ``orient='index'`` is used (:issue:`18529`)
+- :func:`pd.io.sql.to_sql` now performs a multivalue insert if the underlying connection supports this rather than inserting row by row.
+  SQL dialects supporting multivalue inserts include mysql, postgresql, sqlite and any dialect with `supports_multivalues_insert`. (:issue:`14315`, :issue:`8953`)
Contributor

"dialect" -> "SQLAlchemy dialect"

@@ -296,6 +296,8 @@ Other Enhancements
 - :class:`IntervalIndex` and its associated constructor methods (``from_arrays``, ``from_breaks``, ``from_tuples``) have gained a ``dtype`` parameter (:issue:`19262`)
 - Added :func:`SeriesGroupBy.is_monotonic_increasing` and :func:`SeriesGroupBy.is_monotonic_decreasing` (:issue:`17015`)
 - :func:`DataFrame.from_dict` now accepts a ``columns`` argument that can be used to specify the column names when ``orient='index'`` is used (:issue:`18529`)
+- :func:`pd.io.sql.to_sql` now performs a multivalue insert if the underlying connection supports this rather than inserting row by row.
Contributor

I don't know if io.sql.to_sql is part of the API list. Better to just have

:meth:`DataFrame.to_sql`

danfrankj commented Mar 2, 2018

Some profiling

Presto


In [5]: df = pd.DataFrame(np.random.randn(1000, 4), columns=list('ABCD'))

In [6]: %time df.to_sql('multi_insert_profile', presto_engine, schema='dan_frank', index=False)
INFO:pyhive.presto:SHOW COLUMNS FROM "dan_frank"."multi_insert_profile"
INFO:pyhive.presto:
CREATE TABLE "dan_frank"."multi_insert_profile" (
    "A" DOUBLE,
    "B" DOUBLE,
    "C" DOUBLE,
    "D" DOUBLE
)


INFO:pyhive.presto:INSERT INTO "dan_frank"."multi_insert_profile" ("A", "B", "C", "D") VALUES (-0.4425108530531213, -0.4582021047086419, 0.17242001384630398, -1.2917653645626361), (-0.9715964127007015, -0.1458055798883143, 0.3444250700373072, -0.35869901840257923), (0.6732070093449385, 0.3371601918897362, -0.49645678476330574, -0.8241023338536242), (-0.4845513289740901, -1.4860936235542728, 0.19123940403655423, -0.32166319533058985), (2.72221337305179, 0.31572155167450705, -0.5522159042533455, -0.28023622560479866), (-2.2406710854261345, 0.8005522925313067, -0.5762370339886204, 1.1784968768877826), (-0.06826129801094293, 0.2760723638718846, 0.526970720133034, 
... LOG TRUNCATED


CPU times: user 131 ms, sys: 9.06 ms, total: 140 ms
Wall time: 13.1 s



In [7]: presto_engine.dialect.supports_multivalues_insert = False

In [8]: %time df.to_sql('sequential_insert_profile', presto_engine, schema='dan_frank', index=False)
INFO:pyhive.presto:SHOW COLUMNS FROM "dan_frank"."sequential_insert_profile"
INFO:pyhive.presto:
CREATE TABLE "dan_frank"."sequential_insert_profile" (
    "A" DOUBLE,
    "B" DOUBLE,
    "C" DOUBLE,
    "D" DOUBLE
)


INFO:pyhive.presto:INSERT INTO "dan_frank"."sequential_insert_profile" ("A", "B", "C", "D") VALUES (-0.4425108530531213, -0.4582021047086419, 0.17242001384630398, -1.2917653645626361)
INFO:pyhive.presto:INSERT INTO "dan_frank"."sequential_insert_profile" ("A", "B", "C", "D") VALUES (-0.9715964127007015, -0.1458055798883143, 0.3444250700373072, -0.35869901840257923)
INFO:pyhive.presto:INSERT INTO "dan_frank"."sequential_insert_profile" ("A", "B", "C", "D") VALUES (0.6732070093449385, 0.3371601918897362, -0.49645678476330574, -0.8241023338536242)
INFO:pyhive.presto:INSERT INTO "dan_frank"."sequential_insert_profile" ("A", "B", "C", "D") VALUES (-0.4845513289740901, -1.4860936235542728, 0.19123940403655423, -0.32166319533058985)
INFO:pyhive.presto:INSERT INTO "dan_frank"."sequential_insert_profile" ("A", "B", "C", "D") VALUES (2.72221337305179, 0.31572155167450705, -0.5522159042533455, -0.28023622560479866)
INFO:pyhive.presto:INSERT INTO "dan_frank"."sequential_insert_profile" ("A", "B", "C", "D") VALUES (-2.2406710854261345, 0.8005522925313067, -0.5762370339886204, 1.1784968768877826)
... LOG TRUNCATED 


CPU times: user 15.7 s, sys: 1.44 s, total: 17.1 s
Wall time: 14min 57s
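
For a sense of scale, the two Presto wall times above (13.1 s with multi-values inserts, 14 min 57 s row by row, both for the same 1000-row frame) can be compared directly. The variable names are just for this calculation.

```python
# Wall times copied from the two %time runs above.
multi_values_seconds = 13.1
sequential_seconds = 14 * 60 + 57  # 14min 57s

speedup = sequential_seconds / multi_values_seconds
print(f"multi-values insert is about {speedup:.1f}x faster on this run")
```

That works out to roughly a 68-fold reduction in wall time for this single measurement; it is one run on one backend, not a general benchmark.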

MySQL

Comparable insert times

@danfrankj danfrankj force-pushed the df_multivalues_insert branch from 2dc22da to 09691d8 Compare March 5, 2018 19:09
@danfrankj
Contributor Author

@TomAugspurger did some brief profiling and added additional documentation. Anything else you think is needed for this PR?

@TomAugspurger
Contributor

@jorisvandenbossche any thoughts? This seems harmless to me, and the results are... impressive :)

@@ -340,6 +340,8 @@ Other Enhancements
 - Added option ``display.html.use_mathjax`` so `MathJax <https://www.mathjax.org/>`_ can be disabled when rendering tables in ``Jupyter`` notebooks (:issue:`19856`, :issue:`19824`)
 - :meth:`Timestamp.month_name`, :meth:`DatetimeIndex.month_name`, and :meth:`Series.dt.month_name` are now available (:issue:`12805`)
 - :meth:`Timestamp.day_name` and :meth:`DatetimeIndex.day_name` are now available to return day names with a specified locale (:issue:`12806`)
+- :meth:`DataFrame.to_sql` now performs a multivalue insert if the underlying connection supports this rather than inserting row by row.
+  SQLAlchemy dialects supporting multivalue inserts include mysql, postgresql, sqlite and any dialect with `supports_multivalues_insert`. (:issue:`14315`, :issue:`8953`)
Contributor

double back-ticks. can you also add a 'note' in io.rst about this.

@danfrankj danfrankj force-pushed the df_multivalues_insert branch from 09691d8 to 616935b Compare March 7, 2018 17:50
@danfrankj danfrankj force-pushed the df_multivalues_insert branch from 616935b to b8cbc2e Compare March 7, 2018 18:03
@danfrankj
Contributor Author

@jreback backquotes fixed and note added to io.rst

@jreback jreback added this to the 0.23.0 milestone Mar 7, 2018
@jreback jreback merged commit 7c7bd56 into pandas-dev:master Mar 7, 2018
jreback commented Mar 7, 2018

thanks!

@danfrankj
Contributor Author

Thank you guys for the reviews! Excited for my first pandas contribution :)

harisbal added a commit to harisbal/pandas that referenced this pull request Mar 12, 2018
commit df2e361
Author: Jeff Reback <[email protected]>
Date:   Sun Mar 11 18:33:25 2018 -0400

    LINT: fixing

commit f1c0b7c
Author: David Polo <[email protected]>
Date:   Sun Mar 11 22:54:27 2018 +0100

    DOC: Improved the docstring of pandas.plotting._core.FramePlotMethods… (pandas-dev#20157)

    * DOC: Improved the docstring of pandas.plotting._core.FramePlotMethods.barh()
    - Added examples section
    - Added extended summary
    - Added argument explanation

    * DOC: Improved the docstring of pandas.plotting._core.FramePlotMethods.barh()
    - Correcting PR comments

    * DOC: Improved the docstring of pandas.plotting._core.FramePlotMethods.barh()
    - Adding defaults for variables.

    * Update reference

commit 0780193
Author: Jonas Schulze <[email protected]>
Date:   Sun Mar 11 22:37:37 2018 +0100

    DOC: update the pandas.DataFrame.plot.density docstring (pandas-dev#20236)

    * DOC: update the pandas.DataFrame.plot.kde and pandas.Series.plot.kde docstrings

    Unfortunately, I was not able to compute a kernel estimate of a
    two-dimensional random variable. Hence, the example is more of an
    analysis of some independent data series.

    * DOC: extract similarities of kde docstrings

    The `DataFrame.plot.kde` and `Series.plot.kde` now use a common
    docstring, for which the differences are inserted.

commit 2718984
Author: Cihan Ceyhan <[email protected]>
Date:   Sun Mar 11 21:48:08 2018 +0100

    DOC: Update the pandas.Series.dt.round/floor/ceil docstrings (pandas-dev#20187)

    * DOC: Update the pandas.Series.dt.round/floor/ceil docstrings

    * DOC: review points fixed.

    * Add series

commit 0d86742
Author: Antonio Molina <[email protected]>
Date:   Sun Mar 11 18:57:37 2018 +0100

    DOC: Improved pandas.plotting.bootstrap_plot docstring (pandas-dev#20166)

    * Improved documentation on bootstrap_plot

    * Improved documentation on bootstrap_plot

    * Doc bootstrap_plot: Fixed some comments on pull requests

    * Added reference to wikipedia

    * Changed kwds for **kwds

    * Removed ** from kwds because of validation issues

    * Fixed forgotten break line. I think that the kwds parameter now fits what is expected @TomAugspurger. If not, sorry and indicate how it should be

    * Fixed warnings on compilation

    * Moved reference to extended description

commit a2910ad
Author: András Novoszáth <[email protected]>
Date:   Sun Mar 11 18:56:01 2018 +0100

    DOC: update the Index.get_values docstring (pandas-dev#20231)

    * DOC: update the Index.get_values docstring

    * Corrections

    * Corrected extended summary and quotes

    * Correcting spaces, extended summary, multiIndex example

    * See also correction

    * Multi ndim

commit afa6c42
Author: Marc <[email protected]>
Date:   Sun Mar 11 10:42:35 2018 -0400

    DOC: update the pandas.DataFrame.all docstring (pandas-dev#20216)

commit a44bae3
Author: Victor Villas <[email protected]>
Date:   Sun Mar 11 11:41:12 2018 -0300

    DOC: update the Series.view docstring (pandas-dev#20220)

commit 233103f
Author: David Adrián Cañones Castellano <[email protected]>
Date:   Sun Mar 11 15:40:02 2018 +0100

    DOC: update the docstring of pandas.DataFrame.from_dict (pandas-dev#20259)

commit 62bddec
Author: csfarkas <[email protected]>
Date:   Sun Mar 11 15:33:54 2018 +0100

    DOC: add docstring for Index.get_duplicates (pandas-dev#20223)

commit 8c77238
Author: adatasetaday <[email protected]>
Date:   Sun Mar 11 10:17:05 2018 -0400

    Docstring pandas.series.diff (pandas-dev#20238)

commit 4271757
Author: Aly Sivji <[email protected]>
Date:   Sun Mar 11 08:51:25 2018 -0500

    DOC: update `pandas/core/ops.py` docstring template to accept examples (pandas-dev#20246)

commit 080ef0c
Author: akosel <[email protected]>
Date:   Sun Mar 11 12:43:10 2018 +0000

    DOC: update the DataFrame.iat[] docstring (pandas-dev#20219)

    * DOC: update the DataFrame.iat[] docstring

    * Update based on PR comments

    * Update based on PR comments

    * Singular not plural

    * Update to account for use with Series. Add example using Series.

    * Update indexing.py

    * PEP8

commit 302fda4
Author: adatasetaday <[email protected]>
Date:   Sun Mar 11 08:36:21 2018 -0400

    DOC: update the pandas.DataFrame.diff docstring (pandas-dev#20227)

    * DOC: update the pandas.DataFrame.diff  docstring

    * DOC: update the pandas.DataFrame.diff docstring

    * DOC: update the pandas.DataFrame.diff docstring

    * DOC: update the pandas.DataFrame.diff docstring

    * DOC: update the pandas.DataFrame.diff docstring

    * DOC: update the pandas.DataFrame.diff  docstring

    * DOC: update the pandas.DataFrame.diff  docstring

    * DOC: update the pandas.DataFrame.diff  docstring

    * DOC: update the pandas.DataFrame.diff docstring

    * Cleanup

commit c791a84
Author: Pietro Battiston <[email protected]>
Date:   Sun Mar 11 13:07:01 2018 +0100

    DOC: pd.core.window.Expanding.kurt docstring (split from pd.core.Rolling.kurt) (pandas-dev#20064)

commit b3d6ce6
Author: Nipun Sadvilkar <[email protected]>
Date:   Sun Mar 11 17:29:33 2018 +0530

    DOC: update the pandas.date_range() docstring (pandas-dev#20143)

    * DOC: Improved the docstring of pandas.date_range()

    * Change date strings to iso format

    * Removed import pandas in Examples docstring

    * Add See Also Docstring

    * Update datetimes.py

    * Doctests

commit 6d7272a
Author: Samuel Sinayoko <[email protected]>
Date:   Sun Mar 11 11:58:09 2018 +0000

    DOC: update DataFrame.to_records (pandas-dev#20191)

    * Update to_records docstring.

    - Minor changes (missing dots, newlines) to make tests pass.
    - More examples.

    * Fix html docs.

    Missing newlines.

    * Reword datetime type information.

    * flake8 errors

    * Fix typo (duplicated type)

    * Remove unwanted blank line after Examples.

    * Fix doctests.

    ```
    (pandas_dev) sinayoks@landade:~/dev/pandas/ $ pytest --doctest-modules pandas/core/frame.py -k to_record
    ========================================================================================== test session starts ==========================================================================================
    platform darwin -- Python 3.6.4, pytest-3.4.2, py-1.5.2, pluggy-0.6.0
    rootdir: /Users/sinayoks/dev/pandas, inifile: setup.cfg
    plugins: xdist-1.22.1, forked-0.2, cov-2.5.1
    collected 43 items

    pandas/core/frame.py .                                                                                                                                                                            [100%]

    ========================================================================================== 42 tests deselected ==========================================================================================
    ```

    * Few more changes

commit 636335a
Author: Gabriel de Maeztu <[email protected]>
Date:   Sun Mar 11 12:56:48 2018 +0100

    DOC: Improved the docstring of pandas.plotting.radviz (pandas-dev#20169)

commit fbebc7f
Author: jen w <[email protected]>
Date:   Sun Mar 11 06:50:54 2018 -0500

    DOC: Update pandas.DataFrame.tail docstring (pandas-dev#20225)

commit c2864d7
Author: Stephen Childs <[email protected]>
Date:   Sun Mar 11 07:50:39 2018 -0400

    DOC: update the DataFrame.cov docstring (pandas-dev#20245)

    * DOC: Revise docstring of DataFrame cov method

    Update the docstring with some examples from
    elsewhere in the pandas documentation.

    Some of the examples use randomly generated time series
    because we need to get covariance between long series.
    Used a random seed to ensure that the results are the
    same each time.

    * DOC: Fix See Also and min_periods explanation.

    Responding to comments on PR. See also section will link
    properly and number of periods explanation clearer.

commit 90e31b9
Author: jen w <[email protected]>
Date:   Sun Mar 11 06:50:18 2018 -0500

    DOC: update pandas.DataFrame.head docstring (pandas-dev#20262)

commit fb556ed
Author: Israel Saeta Pérez <[email protected]>
Date:   Sat Mar 10 22:33:42 2018 +0100

    DOC: Improve pandas.Series.plot.kde docstring and kwargs rewording for whole file (pandas-dev#20041)

commit c3d491a
Author: Andy R. Terrel <[email protected]>
Date:   Sat Mar 10 11:48:13 2018 -0800

    DOC: update the DataFrame.head()  docstring (pandas-dev#20206)

commit dd7f567
Author: DataOmbudsman <[email protected]>
Date:   Sat Mar 10 20:15:48 2018 +0100

    DOC: update the Index.shift docstring (pandas-dev#20192)

    * DOC: updating docstring of Index.shift

    * Add See Also section to shift

    * Update link to Series.shift

commit 5b0caf4
Author: Eric O. LEBIGOT (EOL) <[email protected]>
Date:   Sat Mar 10 17:32:20 2018 +0100

    DOC: update the Series.memory_usage() docstring (pandas-dev#20086)

commit 9fb7ac9
Author: Carol Willing <[email protected]>
Date:   Sat Mar 10 08:28:54 2018 -0800

    DOC: Edit contributing to docs section (pandas-dev#20190)

commit d8181a5
Author: DaanVanHauwermeiren <[email protected]>
Date:   Sat Mar 10 17:25:20 2018 +0100

    DOC: update the Series.isin docstring (pandas-dev#20175)

commit ec631ce
Author: Riccardo Magliocchetti <[email protected]>
Date:   Sat Mar 10 17:12:41 2018 +0100

    DOC: update the pandas.Series.tail docstring (pandas-dev#20176)

commit e5e4ae9
Author: DaanVanHauwermeiren <[email protected]>
Date:   Sat Mar 10 16:41:58 2018 +0100

    DOC: update the pandas.Index.drop_duplicates and pandas.Series.drop_duplicates docstring (pandas-dev#20114)

commit d7bcb22
Author: Riccardo Magliocchetti <[email protected]>
Date:   Sat Mar 10 15:49:31 2018 +0100

    DOC: update the MultiIndex.swaplevel docstring (pandas-dev#20105)

commit 8497029
Author: Gjelt <[email protected]>
Date:   Sat Mar 10 15:41:17 2018 +0100

    DOC: Improved the docstring of pandas.DataFrame.values (pandas-dev#20065)

commit 840d432
Author: Jordi Contestí <[email protected]>
Date:   Sat Mar 10 13:24:35 2018 +0100

    DOC: Improved the docstring of Series.str.findall (pandas-dev#19982)

commit 2a0d23b
Author: Jeff Reback <[email protected]>
Date:   Sat Mar 10 06:54:19 2018 -0500

    DOC: lint

commit bf0dcb5
Author: Kate Surta <[email protected]>
Date:   Sat Mar 10 14:42:52 2018 +0300

    BUG: Check for wrong arguments in index subclasses constructors (pandas-dev#20017)

commit 4131149
Author: Stijn Van Hoey <[email protected]>
Date:   Sat Mar 10 10:15:41 2018 +0100

    DOC: Extend docstring pandas core index to_frame method (pandas-dev#20036)

commit 52cffa3
Author: William Ayd <[email protected]>
Date:   Fri Mar 9 18:06:43 2018 -0800

    Cythonized GroupBy pct_change (pandas-dev#19919)

commit da6f827
Author: William Ayd <[email protected]>
Date:   Fri Mar 9 18:03:50 2018 -0800

    Refactored GroupBy ASVs (pandas-dev#20043)

commit bd31f71
Author: William Ayd <[email protected]>
Date:   Fri Mar 9 17:53:34 2018 -0800

    Added 'displayed_only' option to 'read_html' (pandas-dev#20047)

commit ed96567
Author: Ksenia <[email protected]>
Date:   Sat Mar 10 02:40:10 2018 +0100

    TST: series/indexing tests parametrization + moving test methods (pandas-dev#20059)

commit 7c14e4f
Author: Kyle Barron <[email protected]>
Date:   Fri Mar 9 11:31:14 2018 -0500

    DOC: Add syntax highlighting to SAS code blocks in comparison_with_sas.rst (pandas-dev#20080)

    * Add syntax highlighting to SAS code blocks

    * Fix typo

commit 731d971
Author: Matthew Roeschke <[email protected]>
Date:   Fri Mar 9 03:30:22 2018 -0800

    Fix typo in apply.py (pandas-dev#20058)

commit cc1b934
Author: Matthew Roeschke <[email protected]>
Date:   Fri Mar 9 03:13:50 2018 -0800

    BUG: Retain timezone dtype with cut and qcut (pandas-dev#19890)

commit c730d08
Author: William Ayd <[email protected]>
Date:   Fri Mar 9 02:37:27 2018 -0800

    DOC: Update Kurt Docstr (pandas-dev#20044)

commit 9119d07
Author: Joris Van den Bossche <[email protected]>
Date:   Fri Mar 9 10:03:44 2018 +0100

    Temporary github PR template for sprint (pandas-dev#20055)

commit 747501a
Author: Aly Sivji <[email protected]>
Date:   Fri Mar 9 02:19:59 2018 -0600

    DOC: Improve docstring for pandas.Index.repeat (pandas-dev#19985)

commit 1d73cf3
Author: Rouz Azari <[email protected]>
Date:   Thu Mar 8 16:54:53 2018 -0800

    BUG: Dense ranking with percent now uses 100% basis (pandas-dev#15639)

commit f9fd540
Author: William Ayd <[email protected]>
Date:   Thu Mar 8 16:36:23 2018 -0800

    Added flake8 to DEV requirements (pandas-dev#20063)

commit b669112
Author: Joris Van den Bossche <[email protected]>
Date:   Thu Mar 8 14:09:12 2018 +0100

    DOC: require returns section in validation script (pandas-dev#19994)

commit 024d8b4
Author: Jeff Reback <[email protected]>
Date:   Thu Mar 8 07:08:57 2018 -0500

    TST: xfail test_time on py2 & mpl 1.4.3 (pandas-dev#20053)

commit b85f6c1
Author: Marc Garcia <[email protected]>
Date:   Thu Mar 8 11:07:08 2018 +0000

    DOC: update docstring validation script + replace api coverage script (pandas-dev#20025)

    * Improvements to validate_docstrings script: adding sections to summary, validating type and description of parameters

    * DOC: Improvements to validate docstring script (added api_coverage functionality, sections in csv and extra validations)

commit 9273bf5
Author: Joris Van den Bossche <[email protected]>
Date:   Thu Mar 8 11:14:05 2018 +0100

    DOC/CI: temp pin matplotlib for doc build (pandas-dev#20045)

commit 63ce781
Author: Jeff Reback <[email protected]>
Date:   Wed Mar 7 17:01:38 2018 -0500

    TST: xfail mpl 2.2 tests

    xref pandas-dev#20031

commit 7c7bd56
Author: Daniel Frank <[email protected]>
Date:   Wed Mar 7 13:54:46 2018 -0800

    enable multivalues insert (pandas-dev#19664)

commit f33e84c
Author: Ksenia <[email protected]>
Date:   Wed Mar 7 22:09:42 2018 +0100

    Moving tests in series/indexing to fixtures (pandas-dev#20014.1) (pandas-dev#20034)

commit 2532a49
Author: Liam3851 <[email protected]>
Date:   Wed Mar 7 13:04:22 2018 -0500

    BUG: Fixes to msgpack support. (pandas-dev#19975)

commit fd010de
Author: Guilherme Beltramini <[email protected]>
Date:   Wed Mar 7 11:33:09 2018 -0300

    to_sql also accepts Series (pandas-dev#20004)

commit 8d462ed
Author: Paul Reidy <[email protected]>
Date:   Wed Mar 7 14:32:12 2018 +0000

    EHN: Implement method argument for DataFrame.replace (pandas-dev#19894)

commit d14fae8
Author: jbrockmendel <[email protected]>
Date:   Wed Mar 7 06:19:21 2018 -0800

    cleanup ops (pandas-dev#19972)

commit 776f2be
Author: William Ayd <[email protected]>
Date:   Wed Mar 7 05:59:39 2018 -0800

    Added .pytest_cache to gitignore (pandas-dev#20021)

commit 460941f
Author: jschendel <[email protected]>
Date:   Wed Mar 7 06:57:51 2018 -0700

    Fix typos in test_interval_new (pandas-dev#20026)

commit 5782ab8
Author: Joris Van den Bossche <[email protected]>
Date:   Wed Mar 7 14:57:17 2018 +0100

    DOC: enable matplotlib plot_directive to include figures in docstrings (pandas-dev#20015)

commit dd2b224
Author: DataOmbudsman <[email protected]>
Date:   Wed Mar 7 14:56:49 2018 +0100

    DOC: updating docstring of Index.shift (pandas-dev#19996)

commit 09c416c
Author: William Ayd <[email protected]>
Date:   Wed Mar 7 05:56:16 2018 -0800

    DOC: Updated kurt docstring (for pandas sprint) (pandas-dev#19999)

commit ad15f80
Author: Kate Surta <[email protected]>
Date:   Wed Mar 7 16:55:48 2018 +0300

    TST: Fix wrong argument in TestDataFrameAlterAxes.test_set_index_dst (pandas-dev#20019)

commit f6ee9ac
Author: Jeff Reback <[email protected]>
Date:   Wed Mar 7 08:55:33 2018 -0500

    TST: xfail clip tests under numpy-dev (pandas-dev#20035)

    xref pandas-dev#19976

commit 397e296
Author: Jeff Reback <[email protected]>
Date:   Wed Mar 7 08:15:49 2018 -0500

    TST: xfail some tests for mpl 2.2 compat (pandas-dev#20033)

    xref pandas-dev#20031

commit 56939b4
Author: luzpaz <[email protected]>
Date:   Wed Mar 7 06:10:39 2018 -0500

    DOC: misc typos (pandas-dev#20029)

commit 01b91c2
Author: alinde1 <[email protected]>
Date:   Tue Mar 6 22:47:45 2018 +0100

    DOC: is confusing for ddof parameter of sem, var and std functions (pandas-dev#19986)

commit db82165
Author: Joris Van den Bossche <[email protected]>
Date:   Tue Mar 6 22:42:41 2018 +0100

    CLN/DOC: cache_readonly: remove allow_setting + preserve docstring (pandas-dev#19991)

commit e02f737
Author: Tom Augspurger <[email protected]>
Date:   Tue Mar 6 09:38:32 2018 -0600

    DOC: add doc on ExtensionArray and extending pandas (pandas-dev#19936)

commit 0ca77b3
Author: jbrockmendel <[email protected]>
Date:   Tue Mar 6 04:27:21 2018 -0800

    Datetimelike add/sub catch cases more explicitly, tests (pandas-dev#19912)

commit 0038bad
Author: Matthew Roeschke <[email protected]>
Date:   Tue Mar 6 04:25:55 2018 -0800

    month_name/day_name warnings followup (pandas-dev#20010)

commit fd63c90
Author: Ksenia <[email protected]>
Date:   Tue Mar 6 13:25:37 2018 +0100

    TST: split series/test_indexing.py (pandas-dev#18614) (pandas-dev#20006)

commit 6366bf0
Author: Jeff Reback <[email protected]>
Date:   Tue Mar 6 07:25:17 2018 -0500

    TST: clean deprecation warnings for xref pandas-dev#19980 (pandas-dev#20013)

    xfail some mpl > 2.1.2 tests

commit fe61299
Author: William Ayd <[email protected]>
Date:   Tue Mar 6 00:30:13 2018 -0800

    DOC: fixed dynamic import mechanics of make.py (pandas-dev#20005)

commit 8a084eb
Author: Grant Smith <[email protected]>
Date:   Tue Mar 6 03:29:26 2018 -0500

    CLN: deprecate the pandas.tseries.plotting.tsplot function (GH18627) (pandas-dev#19980)

commit aedbd94
Author: Jeff Reback <[email protected]>
Date:   Mon Mar 5 06:36:41 2018 -0500

    TST: text correction, xref pandas-dev#19987

commit cbffd19
Author: Bhavesh Poddar <[email protected]>
Date:   Mon Mar 5 06:34:59 2018 -0500

    fixed pytest deprecation warning (pandas-dev#19987)

commit 058a16c
Author: Matthew Roeschke <[email protected]>
Date:   Mon Mar 5 03:23:49 2018 -0800

    CLN: Use generators in builtin functions (pandas-dev#19989)

commit 607910b
Author: Matthew Roeschke <[email protected]>
Date:   Sun Mar 4 12:15:37 2018 -0800

    Add month names (pandas-dev#18164)

commit 2fad756
Author: jbrockmendel <[email protected]>
Date:   Sun Mar 4 12:00:39 2018 -0800

    transition period_helper to use pandas_datetimestruct (pandas-dev#19918)

commit 53606ff
Author: Liam3851 <[email protected]>
Date:   Sun Mar 4 14:58:22 2018 -0500

    BUG: Compat for pre-0.20 TimedeltaIndex and Float64Index pickles pandas-dev#19939 (pandas-dev#19943)

commit 0bfb61b
Author: Joris Van den Bossche <[email protected]>
Date:   Fri Mar 2 22:35:45 2018 +0100

    DOC: small updates to make.py script (pandas-dev#19951)

    * enable passing verbosity flag to sphinx

    * alias api for api.rst

commit d1f3689
Author: Joris Van den Bossche <[email protected]>
Date:   Fri Mar 2 22:33:48 2018 +0100

     DOC: fix some sphinx syntax warnings  (pandas-dev#19962)

commit 49f09cc
Author: Tom Augspurger <[email protected]>
Date:   Fri Mar 2 15:20:28 2018 -0600

    API: Added ExtensionArray constructor from scalars (pandas-dev#19913)

commit d30d165
Author: Joris Van den Bossche <[email protected]>
Date:   Fri Mar 2 22:18:10 2018 +0100

    DOC: update docstring validation script (pandas-dev#19960)

commit a7a7f8c
Author: Joris Van den Bossche <[email protected]>
Date:   Fri Mar 2 13:49:59 2018 +0100

    DOC: clarify version of ActivePython that includes pandas (pandas-dev#19964)

commit b167483
Author: Gina <[email protected]>
Date:   Fri Mar 2 05:33:49 2018 -0600

    DOC: update install.rst to include ActivePython distribution (pandas-dev#19908)

commit e6c7dea
Author: topper-123 <[email protected]>
Date:   Fri Mar 2 11:19:07 2018 +0000

    ENH: Let initialisation from dicts use insertion order for python >= 3.6 (part III) (pandas-dev#19884)

commit d615f86
Author: Marc Garcia <[email protected]>
Date:   Fri Mar 2 09:39:45 2018 +0000

    DOC: Adding script to validate docstrings, and generate list of all functions/methods with state (pandas-dev#19898)

commit 5f271eb
Author: Yian <[email protected]>
Date:   Fri Mar 2 00:13:58 2018 +0100

    BUG: Adding skipna as an option to groupby cumsum and cumprod (pandas-dev#19914)

commit 072545d
Author: David C Hall <[email protected]>
Date:   Thu Mar 1 15:06:20 2018 -0800

    ENH: Add option to disable MathJax (pandas-dev#19824). (pandas-dev#19856)

commit d44a6ec
Author: Yian <[email protected]>
Date:   Fri Mar 2 00:02:31 2018 +0100

    Making to_datetime('today') and Timestamp('today') consistent (pandas-dev#19937)

commit 87fefe2
Author: jbrockmendel <[email protected]>
Date:   Thu Mar 1 14:54:42 2018 -0800

    dispatch Series[datetime64] comparison ops to DatetimeIndex (pandas-dev#19800)

commit 9242248
Author: Matthew Roeschke <[email protected]>
Date:   Thu Mar 1 14:50:35 2018 -0800

    BUG: DataFrame.diff(axis=0) with DatetimeTZ data (pandas-dev#19773)

commit c5a1ef1
Author: Joris Van den Bossche <[email protected]>
Date:   Thu Mar 1 22:48:39 2018 +0100

    DOC: remove empty attribute/method lists from class docstrings html page (pandas-dev#19949)

commit 9958ce6
Author: jschendel <[email protected]>
Date:   Thu Mar 1 04:14:19 2018 -0700

    BUG: Preserve column metadata with DataFrame.astype (pandas-dev#19948)

commit 3b4eb8d
Author: Joris Van den Bossche <[email protected]>
Date:   Thu Mar 1 12:12:35 2018 +0100

    CLN: remove redundant clean_fill_method calls (pandas-dev#19947)

commit c8859b5
Author: Joris Van den Bossche <[email protected]>
Date:   Thu Mar 1 10:35:05 2018 +0100

    DOC: script to build single docstring page (pandas-dev#19840)

commit 52559f5
Author: Matthew Roeschke <[email protected]>
Date:   Wed Feb 28 17:32:24 2018 -0800

    ENH: Allow Timestamp to accept Nanosecond argument (pandas-dev#19889)

commit 4a27697
Author: William Ayd <[email protected]>
Date:   Wed Feb 28 17:30:18 2018 -0800

    Cythonized GroupBy any (pandas-dev#19722)

commit 96b8bb1
Author: jschendel <[email protected]>
Date:   Wed Feb 28 18:07:15 2018 -0700

    ENH: Implement DataFrame.astype('category') (pandas-dev#18099)

commit 6ef4be3
Author: Liam3851 <[email protected]>
Date:   Wed Feb 28 06:14:11 2018 -0500

    ENH: Allow literal (non-regex) replacement using .str.replace pandas-dev#16808 (pandas-dev#19584)

commit 318a287
Author: README Bot <[email protected]>
Date:   Wed Feb 28 05:07:28 2018 -0600

    Add CodeTriage badge to pandas-dev/pandas (pandas-dev#19928)

    Adds a badge showing the number of people helping this repo on CodeTriage.

commit 14a38a6
Author: Chris Catalfo <[email protected]>
Date:   Wed Feb 28 03:14:23 2018 -0500

    DOC: fixes pipe example in basics.rst due to statsmodel changes (pandas-dev#19923)

commit dfe9d4a
Author: Phil Ngo <[email protected]>
Date:   Wed Feb 28 00:05:56 2018 -0800

    DOC: fix Series.reset_index example (pandas-dev#19930)

commit 9bdc5c8
Author: William Ayd <[email protected]>
Date:   Tue Feb 27 16:16:48 2018 -0800

    Consistent Timedelta Writing for all Excel Engines (pandas-dev#19921)

commit 61211a8
Author: jbrockmendel <[email protected]>
Date:   Tue Feb 27 16:11:47 2018 -0800

    Assorted _libs cleanups (pandas-dev#19887)
pandres pushed a commit to pandres/pandas that referenced this pull request Mar 15, 2018
@tripkane

Hi all. In pandas 0.22 I could write a reasonably sized dataframe to SQL without error. Now I receive this error: "OperationalError: (sqlite3.OperationalError) too many SQL variables". I am converting a dataframe with ~20k+ rows to SQL. After looking around, I suspect the problem lies in the limit set by sqlite3: SQLITE_MAX_VARIABLE_NUMBER, which is set to 999 by default. This can apparently be changed by recompiling SQLite and adjusting this variable accordingly. I also noticed that adjusting the chunksize in DataFrame.to_sql has no effect, perhaps confirming this is the root cause.
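For readers following along, the arithmetic behind the suspected limit can be sketched as follows. This is a minimal illustration using only the standard-library `sqlite3` module, not the pandas implementation: a multi-values INSERT binds one variable per cell, so the number of rows per statement has to stay at or below the variable limit divided by the column count. The helper names `max_rows_per_statement` and `insert_multivalues`, the table `demo`, and the constant `ASSUMED_MAX_VARIABLES` are hypothetical, and the value 999 is taken from the comment above; a particular SQLite build may use a different limit.

```python
import sqlite3

# Assumption: 999 is the default SQLITE_MAX_VARIABLE_NUMBER cited in the comment
# above; the limit of a given SQLite build may differ.
ASSUMED_MAX_VARIABLES = 999


def max_rows_per_statement(n_columns, max_variables=ASSUMED_MAX_VARIABLES):
    """Largest row count one multi-values INSERT can bind (hypothetical helper)."""
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    return max(1, max_variables // n_columns)


def insert_multivalues(conn, table, columns, rows,
                       max_variables=ASSUMED_MAX_VARIABLES):
    """Insert rows with multi-values INSERTs, keeping each one under the limit."""
    step = max_rows_per_statement(len(columns), max_variables)
    row_placeholder = "(" + ", ".join("?" for _ in columns) + ")"
    statements = 0
    for start in range(0, len(rows), step):
        chunk = rows[start:start + step]
        sql = "INSERT INTO {} ({}) VALUES {}".format(
            table,
            ", ".join(columns),
            ", ".join([row_placeholder] * len(chunk)),
        )
        # Flatten the chunk: one bound parameter per cell.
        params = [value for row in chunk for value in row]
        conn.execute(sql, params)
        statements += 1
    return statements


# Illustrative data: 20,000 rows of 3 columns, roughly the size mentioned above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (a INTEGER, b INTEGER, c INTEGER)")
rows = [(i, i * 2, i * 3) for i in range(20000)]
n_statements = insert_multivalues(conn, "demo", ["a", "b", "c"], rows)
n_rows = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
```

With three columns and an assumed limit of 999, each statement carries at most 333 rows, so a frame of this shape needs many small statements rather than one large one. A single statement binding all 20,000 rows would need 60,000 variables, far above that assumed limit.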

@TomAugspurger
Contributor

@tripkane maybe make a new issue (link back here) with a reproducible example.

@tripkane

@TomAugspurger: ok thanks, will do

jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this pull request Jun 7, 2018
jorisvandenbossche added a commit that referenced this pull request Jun 7, 2018
daminisatya pushed a commit to daminisatya/pandas that referenced this pull request Jun 8, 2018
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Jun 12, 2018
TomAugspurger pushed a commit that referenced this pull request Jun 12, 2018
This reverts commit 7c7bd56.

(cherry picked from commit c460710)
david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018
Labels
IO SQL to_sql, read_sql, read_sql_query Performance Memory or execution speed performance