-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: to_sql() add parameter "method" to control insertions method (#8… #21401
ENH: to_sql() add parameter "method" to control insertions method (#8… #21401
Conversation
Hello @schettino72! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on December 27, 2018 at 23:51 Hours UTC |
There is only one problem with flake8 complaining about a long. |
Note: I am targeting this to 0.24, not 0.23.x. |
pandas/core/generic.py
Outdated
|
||
sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format( | ||
table_name, columns) | ||
cur.copy_expert(sql=sql, file=s_buf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice explanation here! Do you mind moving some of it to the parameter description? That's probably where viewers will look first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, indeed (similar to my comment above), I would at least put a basic explanation of each option in the parameter description.
And then you can refer to below for more detailed performance considerations / example ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe we can put this example in the user guide (io.rst) ?
@schettino72 : That's exactly what we would do as well. Good call! |
@schettino72 : Looks like you have a lint error according to Travis here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. Added some comments, didn't look at the tests yet
pandas/core/generic.py
Outdated
|
||
sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format( | ||
table_name, columns) | ||
cur.copy_expert(sql=sql, file=s_buf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, indeed (similar to my comment above), I would at least put a basic explanation of each option in the parameter description.
And then you can refer to below for more detailed performance considerations / example ..
pandas/core/generic.py
Outdated
This usually provides a big performance for Analytic databases | ||
like *Presto* and *Redshit*, but has worse performance for | ||
traditional SQL backend if the table contains many columns. | ||
For more information check SQLAlchemy `documention <http://docs.sqlalchemy.org/en/latest/core/dml.html?highlight=multivalues#sqlalchemy.sql.expression.Insert.values.params.*args>`__. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you start the url on the next line, I think flake8 will be OK with the too long line, like:
For more information check SQLAlchemy `documention
<http://docs.sqlalchemy.org/en/latest/core/dml.html?highlight=multivalues#sqlalchemy.sql.expression.Insert.values.params.*args>`__.
pandas/core/generic.py
Outdated
|
||
- `'default'`: Uses standard SQL `INSERT` clause | ||
- `'multi'`: Pass multiple values in a single `INSERT` clause. | ||
It uses a **special** SQL syntax not supported by all backends. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 'It' should be indented equally to the `'multi'`
of the line above (now it is one space more I think)
pandas/core/generic.py
Outdated
traditional SQL backend if the table contains many columns. | ||
For more information check SQLAlchemy `documention <http://docs.sqlalchemy.org/en/latest/core/dml.html?highlight=multivalues#sqlalchemy.sql.expression.Insert.values.params.*args>`__. | ||
- callable: with signature `(pd_table, conn, keys, data_iter)`. | ||
This can be used to implement more performant insertion based on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here about the indentation
pandas/core/generic.py
Outdated
The parameter ``method`` controls the SQL insertion clause used. | ||
Possible values are: | ||
|
||
- `'default'`: Uses standard SQL `INSERT` clause |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to say this is a single insert statement for each row?
pandas/core/generic.py
Outdated
|
||
sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format( | ||
table_name, columns) | ||
cur.copy_expert(sql=sql, file=s_buf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe we can put this example in the user guide (io.rst) ?
pandas/io/sql.py
Outdated
def insert(self, chunksize=None, method=None): | ||
|
||
# set insert method | ||
if method in (None, 'default'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the 'None' needed here?
Codecov Report
@@ Coverage Diff @@
## master #21401 +/- ##
=======================================
Coverage 91.89% 91.89%
=======================================
Files 153 153
Lines 49596 49596
=======================================
Hits 45576 45576
Misses 4020 4020
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #21401 +/- ##
==========================================
+ Coverage 92.22% 92.22% +<.01%
==========================================
Files 169 169
Lines 50897 50903 +6
==========================================
+ Hits 46940 46946 +6
Misses 3957 3957
Continue to review full report at Codecov.
|
Thanks for quick review and tips. Ready for next round 😁 |
@schettino72 : Seems like you have test failures... |
lgtm. @jorisvandenbossche merge when satisfied. |
lgtm. ping @jorisvandenbossche |
table_name = table.name | ||
|
||
sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format( | ||
table_name, columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realise this is just an example, but isn't this piece vulnerable to SQL injection? thepsycopg2.sql
module has an Identifier
object designed to handle SQL identifiers appropriately. See also Example 41-1. Quoting Values In Dynamic Queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dahlbaek no expert on this, but if you could provide the code that you would replace it with to make it more robust, that would certainly be welcome!
@schettino72 would love to see this this ship - think we'll get it into the next release? |
What's the status on this? @schettino72 there's a merge conflict with master, if you could merge master and push. |
@schettino72 The magnificent you are the only hope for this unbelievable feature to be delivered at this wonderful release. |
Sorry, I dont have the time now to work on this. It seems the conflict is only on README file. |
Waiting for this! Thanks guys. |
@jreback Merged master, changed the |
s_buf = StringIO() | ||
writer = csv.writer(s_buf) | ||
writer.writerows(data_iter) | ||
s_buf.seek(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this needs testing.
@jreback Added a test for the example in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. minor comment on the whatsnew.
+1 aside from the example. I'll push a commit updating it. |
Thanks @jreback, @TomAugspurger. And thank you @schettino72! |
Thank you all! Looking forward to using this :) |
Hooray! Thanks :) |
…ndas-dev#8… (pandas-dev#21401) * ENH: to_sql() add parameter "method" to control insertions method (pandas-dev#8953) * ENH: to_sql() add parameter "method". Fix docstrings (pandas-dev#8953) * ENH: to_sql() add parameter "method". Improve docs based on reviews (pandas-dev#8953) * ENH: to_sql() add parameter "method". Fix unit-test (pandas-dev#8953) * doc clean-up * additional doc clean-up * use dict(zip()) directly * clean up merge * default --> None * Remove stray default * Remove method kwarg * change default to None * test copy insert snippit * print debug * index=False * Add reference to documentation
…ndas-dev#8… (pandas-dev#21401) * ENH: to_sql() add parameter "method" to control insertions method (pandas-dev#8953) * ENH: to_sql() add parameter "method". Fix docstrings (pandas-dev#8953) * ENH: to_sql() add parameter "method". Improve docs based on reviews (pandas-dev#8953) * ENH: to_sql() add parameter "method". Fix unit-test (pandas-dev#8953) * doc clean-up * additional doc clean-up * use dict(zip()) directly * clean up merge * default --> None * Remove stray default * Remove method kwarg * change default to None * test copy insert snippit * print debug * index=False * Add reference to documentation
…953)
git diff upstream/master -u -- "*.py" | flake8 --diff