Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pandasSQL transactions in sql test suite to avoid engine deadlocks #55129

Merged
merged 1 commit into from
Sep 13, 2023

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Sep 13, 2023

When trying to integrate the engine fixtures into these tests I have been running into a lot of deadlocking issues. From some research it looks like SQLAlchemy strongly discourages running doing Connectionless Execution

Note the comment:

Modern SQLAlchemy usage, especially the ORM, places a heavy stress on working within the context of a transaction at all times; the “implicit execution” concept makes the job of associating statement execution with a particular transaction much more difficult.

We probably need an overhaul of the sql module to more natively account for this, but I think it makes sense to unlock our test suite first so that getting better coverage via fixtures in #55074 would help us more confidently refactor

Here's an MRE to reproduce this deadlock locally:

import sqlalchemy as sa

from pandas.io.sql import pandasSQL_builder

engine = sa.create_engine("mysql+pymysql://root@localhost:3306/pandas")

pandasSQL = pandasSQL_builder(engine)
with pandasSQL.run_transaction() as trans:
    stmt = sa.text("CREATE TABLE test_trans (A INT, B TEXT)")
    trans.execute(stmt)

with pandasSQL.run_transaction() as trans:
    stmt = sa.text("INSERT INTO test_trans (A,B) VALUES (1, 'blah')")
    trans.execute(stmt)

result = pandasSQL.read_query("SELECT * FROM test_trans")

# Assume fixture tries to clean up all tables after function exit
with engine.begin() as conn:
    stmt = sa.text("DROP TABLE test_trans")
    conn.execute(stmt)

If you wrap the read_query call in a transaction there is no deadlock

(tip: if you ran the above locally you will probably want to kill the thread so it doesn't lock up your database. run show engine innodb status; inside the db, look for the hanging thread id and run KILL <thread_id>)

@WillAyd WillAyd changed the title pandasSQL use transactions in test suite to avoid engine deadlocks Use pandasSQL transactions in sql test suite to avoid engine deadlocks Sep 13, 2023
@mroeschke mroeschke added Testing pandas testing functions or related to the test suite IO SQL to_sql, read_sql, read_sql_query labels Sep 13, 2023
@mroeschke mroeschke added this to the 2.2 milestone Sep 13, 2023
@mroeschke mroeschke merged commit 51c2300 into pandas-dev:main Sep 13, 2023
39 checks passed
@mroeschke
Copy link
Member

Thanks @WillAyd

@WillAyd WillAyd deleted the sql-use-transactions branch September 13, 2023 23:45
hedeershowk pushed a commit to hedeershowk/pandas that referenced this pull request Sep 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants