-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace classes with pytest in test_sql #55074
Conversation
dtype_backend_data, | ||
dtype_backend_expected, | ||
): | ||
if "sqlite" in conn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is silently passed on main. What's interesting is that read_sql_table
arguably works better than read_sql
because it correctly maps the boolean columns to a boolean value. We can update the fixture to account for that, but didn't want to tackle here to try and reduce the already enormous diff
res = sql.read_sql_table("test_schema_other", self.conn, schema="other") | ||
tm.assert_frame_equal(concat([df, df], ignore_index=True), res) | ||
|
||
# specifying schema in user-provided meta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This until the end of the function doesn't even run on main so I just deleted to simplify
VALUES (1, '2021-01-01T00:00:00Z'); | ||
""" | ||
) | ||
if isinstance(self.conn, Engine): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also doesn't run on main. self.conn
is never an engine for this class but always a connection
|
||
def test_read_sql_parameter(self, sql_strings): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read_sql_parameter and read_sql_named_parameter already exist in PandasTest
, so still have to parametrize / convert that test but will render these unnecessary
|
||
def test_to_sql_empty(self, test_frame1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replaced by test_dataframe_to_sql_empty
earlier in new module
|
||
def dtype_backend_expected(self, storage, dtype_backend) -> DataFrame: | ||
@pytest.fixture | ||
def dtype_backend_expected(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the pattern I would ultimately like to follow with the types / iris tables, i.e. just create a separate fixture that takes the connection as an argument and can build / drop the tables from there @mroeschke
@pytest.mark.db | ||
@pytest.mark.parametrize("conn", all_connectable_iris) | ||
def test_read_sql_iris_no_parameter_with_percent(conn, request, sql_strings, flavor): | ||
if "mysql" in conn or "postgresql" in conn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test doesn't appear to even be running on main
The diff here is unfortunate, but I am not sure there is a lot that can be broken off piece-wise. With the new design, we have a lot more coverage of the different connection types, particularly when using engines directly (and have also uncovered some broken behavior). Total number of tests increase from 824 to 1449 There is still more that can be done to clean up the fixtures (in particular clean up standard vs iris vs type table requiring fixtures) but as is this should help us much more effectively onboard the ADBC drivers |
Hmm I'm not sure. I'll have to do another pass, but the hard part is that you end up with a lot of duplication in the intermediate state. Taking the first method of class TestSQLiteFallback(SQLiteMixIn, PandasSQLTest):
"""
Test the fallback mode against an in-memory sqlite database.
"""
flavor = "sqlite"
@pytest.fixture(autouse=True)
def setup_method(self, iris_path, types_data):
self.conn = self.connect()
self.load_iris_data(iris_path)
self.load_types_data(types_data)
self.pandasSQL = sql.SQLiteDatabase(self.conn)
def test_read_sql_parameter(self, sql_strings):
self._read_sql_iris_parameter(sql_strings) We would have to duplicate load_iris_data, load_types_data and read_sql_iris_parameter just to get that one test to run. We would also have to reimplemnt connect, which in this case fortunately isn't that different from the In the current PR there is no intermediate state that mixes fixtures, functions and class-based setup - everything is managed via the fixture. Its a big diff but a cleaner end state The issue with hanging now is from updating the fixture to test engines directly, not just connections. I did this in hopes of increasing test coverage but may just need to scale it back some more |
OK finally got rid of all the hanging behavior when using engine arguments directly. This should be good to go |
pandas/tests/io/test_sql.py
Outdated
@@ -77,6 +77,9 @@ | |||
SQLALCHEMY_INSTALLED = False | |||
|
|||
|
|||
pytestmark = [pytest.mark.db, pytest.mark.single_cpu] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this shouldn't necessary (except pytest.mark.db
used for mysql/postgres)
- The sqllite3 engine should be able to run without the
pytest.mark.db
marker since it uses:memory:
I believe i.e. these tests shouldn't be skipped in the CI ifno db
was specified - IIRC the xdist parallelization is done per file, so tests here should be run sequentially
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So are you thinking we should only mark the SQLAlchemy tests as db? Or none at all??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think the sqlalchemy tests and maybe wrap the fixture params that use sqlalchemy connections/engines in pytest.param(..., marks=pytest.mark.db)
@pytest.mark.parametrize("conn", all_connectable) | ||
def test_api_to_sql_index_label_multiindex(conn, request): | ||
conn_name = conn | ||
if "mysql" in conn_name: | ||
request.node.add_marker( | ||
pytest.mark.xfail(reason="MySQL can fail using TEXT without length as key") | ||
pytest.mark.xfail( | ||
reason="MySQL can fail using TEXT without length as key", strict=False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this still flaky (is strict=False
) still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I think it depends on the version of MySQL and may even be a MySQL versus MariaDB difference. I couldn't pinpoint exactly where this would have been allowed
https://stackoverflow.com/questions/1827063/mysql-error-key-specification-without-a-key-length
pandas/tests/io/test_sql.py
Outdated
@pytest.mark.parametrize("conn", all_connectable) | ||
def test_transactions(conn, request): | ||
if "engine" in conn: | ||
pytest.skip(reason="hangs forever in CI with engine") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still the case after the refactor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same cases below.
Also, could you use is_ci_environment()
in this condition too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry these should be removed. The comment about CI might be misleading - ultimately I think comes back to different versions having different transactional behavior
df = DataFrame({"t": [datetime(2020, 12, 31, 12)]}, dtype="datetime64[ns]") | ||
df.to_sql("test", conn, if_exists="replace", index=False) | ||
result = pd.read_sql("select * from test", conn).iloc[0, 0] | ||
assert result == "2020-12-31 12:00:00.000000" | ||
|
||
|
||
@pytest.mark.db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the test following are marked directly against the test; this is a bit different from how the other tests work which use pytest.param
num_entries = len(test_frame1) | ||
num_rows = count_rows(self.conn, "test_frame1") | ||
assert num_rows == num_entries | ||
@pytest.fixture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it could be a regular function, but can be a follow up
Very cool! It's a been a clean up that's long been needed |
* initial working test * passing mixing class removal * converted non-sqlalchemy tests * large refactor * sqlite class conversion * checkpoint * sqlitefallback conversion * fixup tests * no more test classes * factory func * most code cleanups * removed breakpoint; passing tests * fixes * fix when missing SQLAlchemy * more fixups when no SQLAlchemy * fixups * xfail -> skip * sqlite fixture use transaction for cleanup * verbose test for hangs * try skipping sqlite-sqlalchemy-memory on rollback test * sqlite sqlaclchemy memory cleanup * revert verbose logging in tests * mark all db tests * try single_cpu * skip more engine tests that can hang * try no pandasSQL without transaction * more skip * try verbose * transaction skips * remove verbose CI * CI verbose * no more hanging * reverted CI files * type ignore * cleanup skips * remove marks * mark fixtures * mark postgres fixtures
A lot of this is very intertwined so hard to break up into small diffs.