RM-39 update to sqlalchemy 1.4 #208

ryantimjohn · 2023-02-06T22:05:53Z

No description provided.

tests/integration/records/single_db/test_records_numeric.py

tests/unit/records/sources/test_table.py

ryantimjohn · 2023-02-21T22:36:26Z

@Brunope I will clean up commit history a lot here, feel free to review now or once that's done!

Brunope · 2023-02-22T00:23:10Z

tests/integration/records/table_validator.py

+        else:
+            columns = self.target_db_engine.dialect.get_columns(self.target_db_engine,
+                                                                table_name,
+                                                                schema=schema_name)


This is so weird to me, does the get_columns method take different types of objects depending on what the driver is????

According to the SQLAlchemy documentation, get_columns always takes a connection as its first argument:
https://docs.sqlalchemy.org/en/14/core/internals.html#sqlalchemy.engine.Dialect.get_columns

I'm sure you've already tried this, but can't we just pass the conn object to both?

Is this records-mover's fault for storing inconsistent types in its wrappers around sqlalchemy?

    class DBDriver(metaclass=ABCMeta):
        def __init__(self,
                     db: Union[sqlalchemy.engine.Engine, sqlalchemy.engine.Connection],
                     **kwargs) -> None:

well this is weird

Alright so I think the reason for this is that in some cases we want to instantiate a db_driver in the context of a transaction, as in this example from records/mover/records/prep_and_load.py

    with tbl.db_engine.begin() as db:
        # This second transaction ensures the table has been created
        # before non-transactional statements like Redshift's COPY
        # take place. Otherwise you'll get an error like:
        #
        # Cannot COPY into nonexistent table
        driver = tbl.db_driver(db)
        try:
            import_count = load(driver)
        except load_exception_type:
            if not tbl.drop_and_recreate_on_load_error:
                raise
            reset_before_reload()
            with tbl.db_engine.begin() as db:
                driver = tbl.db_driver(db)
                prep.prep(schema_sql=schema_sql,
                          driver=driver,
                          existing_table_handling=ExistingTableHandling.DROP_AND_RECREATE)
                import_count = load(driver)
        return MoveResult(move_count=import_count, output_urls=None)

begin() starts a transaction and returns a connection object - naming it a "db" is pretty confusing.

Anyway, other times we don't care about transactions and don't create the connection object beforehand. I'm not certain about this, but SQLAlchemy used to support "connectionless execution", which let you execute statements directly on an engine without obtaining a connection first. They've been phasing it out (see https://docs.sqlalchemy.org/en/14/core/connections.html#connectionless-execution-implicit-execution), which may be why the code used to work.
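To make the difference concrete, here is a minimal sketch using an in-memory SQLite database (an assumption for illustration only; it is not one of the databases this PR targets). The legacy connectionless call appears only as a comment, since `Engine.execute()` is deprecated in 1.4 and removed in 2.0.

```python
import sqlalchemy

# In-memory SQLite engine, used here purely for illustration.
engine = sqlalchemy.create_engine("sqlite://")

# Legacy "connectionless" style (deprecated in 1.4, removed in 2.0):
#     engine.execute("SELECT 1")

# Explicit style: check out a Connection and execute on it.
with engine.connect() as conn:
    value = conn.execute(sqlalchemy.text("SELECT 1")).scalar()
```

The explicit form runs on both 1.4 and 2.0, which is why routing calls through a Connection is the safer direction.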

I think the proper solution is to split the union from DBDriver into an optional connection argument and an Engine. If no connection is given it can create one from the Engine. Then, when a caller wants to do something like get_columns, they always use the connection attribute of DBDriver, not the Engine. This solves a bunch of confusion caused by examples such as above, where target_db_engine isn't actually an engine, it's a connection.
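A rough sketch of what that split could look like is below. `DBDriverSketch`, `db_conn`, `close`, and `column_names` are hypothetical names chosen for illustration, not records-mover's actual API, and the in-memory SQLite engine is only there to make the example self-contained.

```python
from typing import List, Optional

import sqlalchemy
from sqlalchemy.engine import Connection, Engine


class DBDriverSketch:
    """Hypothetical stand-in for DBDriver: an Engine plus an optional Connection."""

    def __init__(self, db_engine: Engine, db_conn: Optional[Connection] = None) -> None:
        self.db_engine = db_engine
        self._db_conn = db_conn
        self._owns_conn = False  # True only if this object opened the connection

    @property
    def db_conn(self) -> Connection:
        # Lazily open a connection when the caller did not supply one.
        if self._db_conn is None:
            self._db_conn = self.db_engine.connect()
            self._owns_conn = True
        return self._db_conn

    def close(self) -> None:
        # Only close connections this object created; leave the caller's alone.
        if self._owns_conn and self._db_conn is not None:
            self._db_conn.close()
            self._db_conn = None
            self._owns_conn = False

    def column_names(self, table_name: str, schema: Optional[str] = None) -> List[str]:
        # Reflection always goes through the Connection, never the Engine.
        columns = sqlalchemy.inspect(self.db_conn).get_columns(table_name, schema=schema)
        return [column["name"] for column in columns]


engine = sqlalchemy.create_engine("sqlite://")

# Caller-managed transaction: the driver reuses the caller's connection.
with engine.begin() as conn:
    conn.execute(sqlalchemy.text("CREATE TABLE items (id INTEGER, name TEXT)"))
    names_in_txn = DBDriverSketch(engine, conn).column_names("items")

# No connection supplied: the driver opens (and later closes) its own.
driver = DBDriverSketch(engine)
names = driver.column_names("items")
driver.close()
```

Tracking ownership of the connection is one possible design choice here: it keeps the driver from closing a connection that belongs to a caller's transaction.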

Should we track this work separately or do it as part of this ticket?

If we choose to put this work off until later, in the short term I think we should change the above code that I commented on (as well as the other similar occurrences) to read something like

    if isinstance(self.target_db_engine, sqlalchemy.engine.Connection):
        connection = self.target_db_engine
    else:
        connection = self.target_db_engine.connect()
    columns = connection.dialect.get_columns(connection, table_name, schema=schema_name)

Implemented this approach.
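For reference, the short-term approach could be factored into a small helper so the isinstance check lives in one place. This is only a sketch: `as_connection` and `column_names` are hypothetical names, not functions in records-mover, and the in-memory SQLite engine is an assumption made to keep the example runnable.

```python
from contextlib import contextmanager
from typing import Iterator, List, Optional, Union

import sqlalchemy
from sqlalchemy.engine import Connection, Engine


@contextmanager
def as_connection(db: Union[Engine, Connection]) -> Iterator[Connection]:
    """Yield a Connection whether the caller passed an Engine or a Connection."""
    if isinstance(db, Connection):
        # The caller owns this connection (and any transaction on it).
        yield db
    else:
        # We opened it, so we close it when the block exits.
        with db.connect() as conn:
            yield conn


def column_names(db: Union[Engine, Connection],
                 table_name: str,
                 schema: Optional[str] = None) -> List[str]:
    with as_connection(db) as conn:
        # Dialect.get_columns is documented to take a connection first.
        columns = conn.dialect.get_columns(conn, table_name, schema=schema)
        return [column["name"] for column in columns]


engine = sqlalchemy.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(sqlalchemy.text("CREATE TABLE items (id INTEGER, name TEXT)"))
    from_connection = column_names(conn, "items")
from_engine = column_names(engine, "items")
```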

ryantimjohn · 2023-02-22T18:21:36Z

@Brunope, your investigation matches my own. The only thing I'll add is that, in the most recent versions of SQLAlchemy, the .get_columns() method in the MySQL dialect base moved to calls that only work against the Connection class, whereas other dialects get columns using calls that also work against the Engine class.

mysql:
https://gitlab.istina.msu.ru/istina/sqlalchemy/-/blob/19b248c14c37c88280018423b3ac929ea101cc46/lib/sqlalchemy/dialects/mysql/base.py#L2569

postgresql (as an example):
https://gitlab.istina.msu.ru/istina/sqlalchemy/-/blob/19b248c14c37c88280018423b3ac929ea101cc46/lib/sqlalchemy/dialects/postgresql/base.py#L2830

I think your approach is correct in that it future-proofs us for when other dialects use calls that only work against Connection. I was restricting this to MySQL to limit side effects on other SQL dialects, but you're right that there shouldn't be any.

….com/bluelabsio/records-mover into RM-39-Bump-SQLAlchemy-to-version-1.4

This reverts commit bfa4a0c.

tests/integration/records/single_db/test_records_numeric.py

tests/integration/records/table_validator.py

ryantimjohn and others added 15 commits February 6, 2023 17:02

RM-39 update to sqlalchemy 1.4

b7ed499

RM-39 update to sqlalchemy 1.4 syntax

f208d22

RM-39 revert sqlalchemy version change

3c51771

RM-39 bump to 1.4 less than 2.0

0d13460

RM-39 ratchet mypy

8383ba3

RM-39 update expected postgres data types

2ced342

Merge branch 'master' into RM-39-Bump-SQLAlchemy-to-version-1.4

955d99c

RM-39 try mysqlclient

10aed1b

RM-39 try mysqldb

979378b

Revert "RM-39 try mysqldb"

9b5681e

This reverts commit 979378b.

Revert "RM-39 try mysqlclient"

8f94c3a

This reverts commit 10aed1b.

RM-39 create connection with mysql instead of engine

07cda90

RM-39 update mysql df datatypes

14c8caf

RM-39 update mysql table2table dtypes

653864b

RM-39 records_numeric to conn instead of engine

8511c04

ryantimjohn commented Feb 21, 2023

tests/integration/records/single_db/test_records_numeric.py

RM-39 update engine syntax

5144093