Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace classes with pytest in test_sql #55074

Merged
merged 42 commits into from
Sep 18, 2023

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Sep 8, 2023

A lot of this is very intertwined so hard to break up into small diffs.

dtype_backend_data,
dtype_backend_expected,
):
if "sqlite" in conn:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is silently passed on main. What's interesting is that read_sql_table arguably works better than read_sql because it correctly maps the boolean columns to a boolean value. We can update the fixture to account for that, but didn't want to tackle here to try and reduce the already enormous diff

res = sql.read_sql_table("test_schema_other", self.conn, schema="other")
tm.assert_frame_equal(concat([df, df], ignore_index=True), res)

# specifying schema in user-provided meta
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This until the end of the function doesn't even run on main so I just deleted to simplify

VALUES (1, '2021-01-01T00:00:00Z');
"""
)
if isinstance(self.conn, Engine):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also doesn't run on main. self.conn is never an engine for this class but always a connection


def test_read_sql_parameter(self, sql_strings):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read_sql_parameter and read_sql_named_parameter already exist in PandasTest, so still have to parametrize / convert that test but will render these unnecessary


def test_to_sql_empty(self, test_frame1):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced by test_dataframe_to_sql_empty earlier in new module


def dtype_backend_expected(self, storage, dtype_backend) -> DataFrame:
@pytest.fixture
def dtype_backend_expected():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the pattern I would ultimately like to follow with the types / iris tables, i.e. just create a separate fixture that takes the connection as an argument and can build / drop the tables from there @mroeschke

@pytest.mark.db
@pytest.mark.parametrize("conn", all_connectable_iris)
def test_read_sql_iris_no_parameter_with_percent(conn, request, sql_strings, flavor):
if "mysql" in conn or "postgresql" in conn:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't appear to even be running on main

@WillAyd
Copy link
Member Author

WillAyd commented Sep 9, 2023

The diff here is unfortunate, but I am not sure there is a lot that can be broken off piece-wise.

With the new design, we have a lot more coverage of the different connection types, particularly when using engines directly (and have also uncovered some broken behavior). Total number of tests increase from 824 to 1449

There is still more that can be done to clean up the fixtures (in particular clean up standard vs iris vs type table requiring fixtures) but as is this should help us much more effectively onboard the ADBC drivers

@WillAyd WillAyd marked this pull request as ready for review September 9, 2023 20:25
@WillAyd WillAyd requested a review from mroeschke as a code owner September 11, 2023 00:18
@WillAyd
Copy link
Member Author

WillAyd commented Sep 12, 2023

Hmm I'm not sure. I'll have to do another pass, but the hard part is that you end up with a lot of duplication in the intermediate state. Taking the first method of TestSQLiteFallback as an example:

class TestSQLiteFallback(SQLiteMixIn, PandasSQLTest):
    """
    Test the fallback mode against an in-memory sqlite database.

    """

    flavor = "sqlite"

    @pytest.fixture(autouse=True)
    def setup_method(self, iris_path, types_data):
        self.conn = self.connect()
        self.load_iris_data(iris_path)
        self.load_types_data(types_data)
        self.pandasSQL = sql.SQLiteDatabase(self.conn)

    def test_read_sql_parameter(self, sql_strings):
        self._read_sql_iris_parameter(sql_strings)

We would have to duplicate load_iris_data, load_types_data and read_sql_iris_parameter just to get that one test to run. We would also have to reimplemnt connect, which in this case fortunately isn't that different from the sqlite_buildin fixture, but then puts us in a state where we have a fixture that does part of the setup/teardown and custom functions to do the rest.

In the current PR there is no intermediate state that mixes fixtures, functions and class-based setup - everything is managed via the fixture. Its a big diff but a cleaner end state

The issue with hanging now is from updating the fixture to test engines directly, not just connections. I did this in hopes of increasing test coverage but may just need to scale it back some more

@WillAyd
Copy link
Member Author

WillAyd commented Sep 15, 2023

OK finally got rid of all the hanging behavior when using engine arguments directly. This should be good to go

@@ -77,6 +77,9 @@
SQLALCHEMY_INSTALLED = False


pytestmark = [pytest.mark.db, pytest.mark.single_cpu]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this shouldn't necessary (except pytest.mark.db used for mysql/postgres)

  1. The sqllite3 engine should be able to run without the pytest.mark.db marker since it uses :memory: I believe i.e. these tests shouldn't be skipped in the CI if no db was specified
  2. IIRC the xdist parallelization is done per file, so tests here should be run sequentially

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So are you thinking we should only mark the SQLAlchemy tests as db? Or none at all??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think the sqlalchemy tests and maybe wrap the fixture params that use sqlalchemy connections/engines in pytest.param(..., marks=pytest.mark.db)

@pytest.mark.parametrize("conn", all_connectable)
def test_api_to_sql_index_label_multiindex(conn, request):
conn_name = conn
if "mysql" in conn_name:
request.node.add_marker(
pytest.mark.xfail(reason="MySQL can fail using TEXT without length as key")
pytest.mark.xfail(
reason="MySQL can fail using TEXT without length as key", strict=False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this still flaky (is strict=False) still needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I think it depends on the version of MySQL and may even be a MySQL versus MariaDB difference. I couldn't pinpoint exactly where this would have been allowed

https://stackoverflow.com/questions/1827063/mysql-error-key-specification-without-a-key-length

@pytest.mark.parametrize("conn", all_connectable)
def test_transactions(conn, request):
if "engine" in conn:
pytest.skip(reason="hangs forever in CI with engine")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still the case after the refactor?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same cases below.

Also, could you use is_ci_environment() in this condition too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry these should be removed. The comment about CI might be misleading - ultimately I think comes back to different versions having different transactional behavior

df = DataFrame({"t": [datetime(2020, 12, 31, 12)]}, dtype="datetime64[ns]")
df.to_sql("test", conn, if_exists="replace", index=False)
result = pd.read_sql("select * from test", conn).iloc[0, 0]
assert result == "2020-12-31 12:00:00.000000"


@pytest.mark.db
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the test following are marked directly against the test; this is a bit different from how the other tests work which use pytest.param

num_entries = len(test_frame1)
num_rows = count_rows(self.conn, "test_frame1")
assert num_rows == num_entries
@pytest.fixture
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it could be a regular function, but can be a follow up

@mroeschke mroeschke added Testing pandas testing functions or related to the test suite IO SQL to_sql, read_sql, read_sql_query labels Sep 18, 2023
@mroeschke mroeschke added this to the 2.2 milestone Sep 18, 2023
@mroeschke mroeschke merged commit df7c0b7 into pandas-dev:main Sep 18, 2023
38 of 39 checks passed
@mroeschke
Copy link
Member

Very cool! It's a been a clean up that's long been needed

hedeershowk pushed a commit to hedeershowk/pandas that referenced this pull request Sep 20, 2023
* initial working test

* passing mixing class removal

* converted non-sqlalchemy tests

* large refactor

* sqlite class conversion

* checkpoint

* sqlitefallback conversion

* fixup tests

* no more test classes

* factory func

* most code cleanups

* removed breakpoint; passing tests

* fixes

* fix when missing SQLAlchemy

* more fixups when no SQLAlchemy

* fixups

* xfail -> skip

* sqlite fixture use transaction for cleanup

* verbose test for hangs

* try skipping sqlite-sqlalchemy-memory on rollback test

* sqlite sqlaclchemy memory cleanup

* revert verbose logging in tests

* mark all db tests

* try single_cpu

* skip more engine tests that can hang

* try no pandasSQL without transaction

* more skip

* try verbose

* transaction skips

* remove verbose CI

* CI verbose

* no more hanging

* reverted CI files

* type ignore

* cleanup skips

* remove marks

* mark fixtures

* mark postgres fixtures
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants