-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make pandas/io/sql.py work with sqlalchemy 2.0 #48576
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @cdcadman
I'm sorry I'm not so familiar with the SQLAlchemy changes, but do you mind explaining why we should filter RemovedIn20Warnings
instead of updating the code? I assume we want to be compatible with both 1.4 and 2.0 which are not compatible. But feels like it'd make more sense to for now use if
statements to support both versions.
I may surely be missing something, if you can please expand on why this approach, that would be great. Thanks!
pandas/io/sql.py
Outdated
elif "statement" in kwargs: | ||
statement = kwargs["statement"] | ||
else: | ||
statement = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this the same as statement = kwargs.get("statement")
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and I just made a new commit to address this.
My approach is to update pandas/io/sql.py so that it can support both versions. I also had to update pandas/tests/io/test_sql.py, because pandas users will need run DataFrame.to_sql within a transaction if they pass a sqlalchemy Connection under sqlalchemy 2.0. This is how I've approached the transition to sqlalchemy 2.0 in my own work: by continuing to use 1.4 while ensuring that my code would still work under 2.0. The RemovedIn20Warning provides a way of testing that the code will work under sqlalchemy 2.0, while the tests run under sqlalchemy 1.4. Another approach is to pass future=True to sqlalchemy.create_engine in order to create 2.0-style engines for testing. I hadn't thought of expanding the tests to run under both 1.4 and 2.0 sqlalchemy engines and connections, but that might be a better approach to testing the code during the transition period, since it does not require the SQLALCHEMY_WARN_20 environment variable to be set prior to the start of the tests. |
Thanks a lot for all the information @cdcadman, that's useful. There is something I still don't fully understand. Let me explain in detail, and please correct me if I'm wrong, I can surely be missing something. Let me use this example: # 1.4
session.query(User).get(42)
# 2.0
session.get(User, 42) I see two different cases. Case A: 2.0 syntax also works in 1.4 This case is easy, we should simply replace one by the other, right? Case B: 2.0 syntax doesn't work on 1.4 Then, while we want to keep compatibility with 1.4, we should have something like (code won't work, just to illustrate): if sqlalchemy.__version__ < '2.0':
session.query(User).get(42)
else:
session.get(User, 42) When we test a pandas function, if we have all the code written this way, once we finished all the changes, we shouldn't have any warning being raised, so we wouldn't need to filter any warning in our code. If we agree on this (please let me know if I missed anything), then the next question is how we make these changes. If I understand correctly, there is this flag you mention that makes alchemy start raising warnings about deprecations. I guess it can be activated locally, address the warnings, and then open PRs just with the fixes. Or we can activate it in our CI, so the warnings are visible in our tests. Either way is probably fine. If the above is correct, then I don't understand why we need to filter warnings in our code. Do you mind explaining what I'm missing please? |
@datapythonista I agree with everything you wrote. I don't think we should censor
This causes tests to fail if they raise RemovedIn20Warning. Your comment about checking sqlalchemy versions reminded me that it is possible within sqlalchemy version 1.4 to obtain a 2.0-style sqlalchemy engine by passing So I'd like to redo this PR in the next few days. I don't expect pandas/io/sql.py to change, but I will do something different with the test file:
|
Update: pandas/io/sql.py will have to change significantly from how I currently have it in this PR. |
@datapythonista I modified this PR's code, title, and original comment, and it is again ready for review. Since I am now testing 2.0-style sqlalchemy connectables instead of checking for warnings, I think this can close #40686. Since I did make a lot of changes, would you prefer that I start a new PR instead? |
Is it known when SQLAlchemy 2.0 will be released? If it comes before pandas 2.0 (late 2022/early 2023), maybe we can just bump the minimum version of SQLAlchemy to 2.0 and adopt the new syntax. I'm not in love in duplicating all the testing & having to support 2 versions of an optional dependency with different syntax. xref: #44823 |
@mroeschke I haven't seen a release date for sqlalchemy 2.0. Instead of duplicating all the tests, I could make the |
pandas/tests/io/test_sql.py
Outdated
@@ -2387,12 +2438,14 @@ class _TestMySQLAlchemy: | |||
|
|||
flavor = "mysql" | |||
port = 3306 | |||
future = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what we usually do to manage two versions of the same library is to add a flag in pandas.compat
, and sometimes a function to handle both versions behavior, that then we call from our code. Do you think this could be helpful and avoid the test duplication and passing the future
argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the test duplication with the future
argument is different from what pandas.compat
provides. The future
argument duplicates the type of sqlalchemy connectables that can be passed to read_sql_query
and DataFrame.to_sql
. Currently, the tests are already duplicated to ensure that both sqlalchemy.engine.Engine
and sqlalchemy.engine.Connection
can be passed to these methods, and the fact that these are both allowed also makes the code in pandas.io.sql
more complicated. In sqlalchemy 1.4, each of these connectables has a subclass, sqlalchemy.future.Engine
and sqlalchemy.future.Connection
. In pandas.io.sql
, I did not differentiate between future and non-future, but the test duplication ensures that all 4 of these sqlalchemy connectables work. Once pandas decides to require sqlalchemy 2.0, the future/non-future duplication will be unnecessary.
The 2.0 beta is out. |
I'm planning to make some changes to this PR. Firstly, I noticed that pandas.io.sql.execute is documented, right above this line: https://pandas.pydata.org/docs/user_guide/io.html?highlight=sql%20execute#engine-connection-examples . As it stands, my PR would make this return a context manager instead of a Results Iterable, and I don't think I need to make this change, so I will change it back. I plan to make
In As I was looking over the tests, I noticed this interesting behavior related to transactions. I like having the ability to rollback a
|
@datapythonista @mroeschke Based on your feedback, I took out all the test duplication, and instead ran the tests with sqlalchemy 2.0.0b2 installed to ensure that this can close #40686. This PR will allow pandas to work with sqlalchemy 1.4.16 (the documented minimum version) and higher, even after sqlalchemy 2.0 is released. I found a note here (written 10/13/2022) on the timing of sqlalchemy 2.0: https://www.sqlalchemy.org/blog/2022/10/13/sqlalchemy-2.0.0b1-released/
|
Thanks for your work on this @cdcadman. IMO I would still be partial on just supporting sqlalchemy 2.0 syntax/tests/min version when it becomes available. |
@mroeschke As a pandas/sqlalchemy user, it would be really helpful to get these changes into pandas sooner, so that I can get my code ready for sqlalchemy 2.0 sooner. This PR is making the kind of changes envisioned in the first paragraph of the sqlalchemy 2.0 migration document: https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#overview .
It's understandable if you are concerned that these changes will break existing code running on sqlalchemy 1.4. I made three types of changes to pandas/tests/io/test_sql.py:
Let me know if there's anything I can do to help get this merged sooner. |
Thanks @cdcadman. I misunderstood that sqlalchemy 1.4 already has 2.0 functionality, so we can adopt 2.0 syntax/functionality while pandas min version is 1.4.
Could you split these into 3 separate PRs? It's difficult for me to determine which changes correspond with a certain objective. Generally, 1 PR targeting 1 type of change is easier for review. |
@mroeschke , I split this into two commits. The tests pass after the first commit with sqlalchemy 1.4.44, but I had to modify |
pandas/io/sql.py
Outdated
@@ -1454,8 +1467,16 @@ def run_transaction(self): | |||
yield self.con | |||
|
|||
def execute(self, *args, **kwargs): | |||
"""Simple passthrough to SQLAlchemy connectable""" | |||
return self.con.execute(*args, **kwargs) | |||
"""Almost a simple passthrough to SQLAlchemy Connection""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious what 2.0 changes made this more complex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to the execute
method in sqlalchemy 2.0 are described here: https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#execute-method-more-strict-execution-options-are-more-prominent . My goal was to allow end users to pass either a string or a sqlachemy expression (like table.insert()
). Currently, either one works, but passing a string emits a RemovedIn20Warning
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Looks to be that _convert_params
is packing the sql and params into args
and then you're unpacking it here.
Might be better if execute is always defined as def execute(self, sql: str | expression, *args, **kwargs)
for better clarity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and I can also remove **kwargs
based on the existing tests. Maybe I can replace *args
as well. I will work on this and add some tests.
Did you have a specific type in mind for expression
? I don't know what to use without importing something from sqlalchemy.sql.expression
(ClauseElement | Executable
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and I can also remove **kwargs based on the existing tests. Maybe I can replace *args as well. I will work on this and add some tests.
This would be great, thank you.
Did you have a specific type in mind for expression?
The docs for read_sql
note that it should be a SQLAlchemy Selectable (select or text object)
so maybe just a expression.Select
or expression.TextClause
(don't know if we have tests for either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created a separate commit which makes the signature of PandasSQL.execute
into def execute(self, sql: str | Select | TextClause, params=None)
.
Hi Dev, If you have problem when you try the read data .....your code |
@phofl I think I addressed all the issues that got linked. Let me know if I missed anything. I think test coverage of |
Could you remove the Then we don't need the |
@mroeschke I reverted all the changes from this commit: fa78ea8. I think everything is working, since I only got one unrelated test failure: pandas/tests/io/parser/common/test_file_buffer_url.py::test_file_descriptor_leak[c_high] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. cc @phofl merge when ready
thx @cdcadman very nice! |
Thanks @phofl and @mroeschke. |
Hi there, thank you so much for fixing this compatibility issue. As already referenced elsewhere (see below), and as a maintainer of the With kind regards, References |
Ah, I see. That topic was discussed at #49857 (comment) ff., and the outcome is apparently that pandas 1.5.x will never support SQLAlchemy 2.x, right? |
Correct, there probably won’t be another 1.5.x release anyway |
Hi @cdcadman! In the sqlalchemy related implementation you changed the simple call-through to the "execute" method into a fork with Do you remember the reason for this change? It was quite unexpected because it has (among other things) impact on the parameter style I can use and the docs don't mention it. For example when using sqlalchemy with mssql pyodbc, I have to use |
There were a few constraints I was working with. One was that pandas/pandas/tests/io/test_sql.py Line 70 in f738d97
So it seemed like the best way to support existing code was to send strings to |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.I have run the following with sqlalchemy 2.0.0 to confirm compatilibility:
pytest pandas/tests/io/test_sql.py
mypy pandas\io\sql.py pandas\tests\io\test_sql.py --follow-imports=silent