Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RM-39 update to sqlalchemy 1.4 #208

Merged
merged 41 commits into from
Feb 24, 2023

Conversation

ryantimjohn
Copy link
Contributor

No description provided.

@ryantimjohn ryantimjohn requested a review from Brunope February 21, 2023 22:34
@ryantimjohn ryantimjohn marked this pull request as ready for review February 21, 2023 22:35
@ryantimjohn
Copy link
Contributor Author

@Brunope I will clean up commit history a lot here, feel free to review now or once that's done!

else:
columns = self.target_db_engine.dialect.get_columns(self.target_db_engine,
table_name,
schema=schema_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so weird to me, does the get_columns method take different types of objects depending on what the driver is????

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the sqlalchemy documentation, get_columns says it always takes a connection as the first arg.
https://docs.sqlalchemy.org/en/14/core/internals.html#sqlalchemy.engine.Dialect.get_columns.

I'm sure you've already tried this, but we can't just pass the conn object to both?

Is this records-mover's fault for storing inconsistent types in its wrappers around sqlalchemy?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class DBDriver(metaclass=ABCMeta):
    def __init__(self,
                 db: Union[sqlalchemy.engine.Engine,
                           sqlalchemy.engine.Connection], **kwargs) -> None:

well this is weird

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright so I think the reason for this is that in some cases we want to instantiate a db_driver in the context of a transaction, as in this example from records/mover/records/prep_and_load.py

    with tbl.db_engine.begin() as db:
        # This second transaction ensures the table has been created
        # before non-transactional statements like Redshift's COPY
        # take place.  Otherwise you'll get an error like:
        #
        #  Cannot COPY into nonexistent table
        driver = tbl.db_driver(db)
        try:
            import_count = load(driver)
        except load_exception_type:
            if not tbl.drop_and_recreate_on_load_error:
                raise
            reset_before_reload()
            with tbl.db_engine.begin() as db:
                driver = tbl.db_driver(db)
                prep.prep(schema_sql=schema_sql,
                          driver=driver,
                          existing_table_handling=ExistingTableHandling.DROP_AND_RECREATE)
                import_count = load(driver)
        return MoveResult(move_count=import_count, output_urls=None)

begin() starts a transaction and returns a connection object - naming it a "db" is pretty confusing.

Anyway, other times we don't care about transactions and don't create the connection object beforehand. I'm kind of confused about this, but sqlalchemy had this feature "connection-less execution" which basically let you execute stuff on an engine directly without needing a connection first, but they've been phasing it out: https://docs.sqlalchemy.org/en/14/core/connections.html#connectionless-execution-implicit-execution. Maybe this is why the code used to work.

I think the proper solution is to split the union from DBDriver into an optional connection argument and an Engine. If no connection is given it can create one from the Engine. Then, when a caller wants to do something like get_columns, they always use the connection attribute of DBDriver, not the Engine. This solves a bunch of confusion caused by examples such as above, where target_db_engine isn't actually an engine, it's a connection.

Should we track this work separately or do it as part of this ticket?

If we choose to put this work off until later, in the short term I think we should change the above code that I commented on (as well as the other similar occurrences) to read something like

if isinstance(self.target.db_engine, sqlalchemy.engine.Connection):
    connection = self.target_db_engine
else:
    connection = self.target._db_engine.connect()
get_columns(connection)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented this approach.

@ryantimjohn
Copy link
Contributor Author

ryantimjohn commented Feb 22, 2023

@Brunope, your investigations on all of this match up with my own. Only thing that I'll add is that, in the most recent versions of SqlAlchemy, the .get_columns() method for the MySQL dialect base moved to calls that only work against the Connection class whereas other dialects get columns using calls that work against the Engine class.

mysql:
https://gitlab.istina.msu.ru/istina/sqlalchemy/-/blob/19b248c14c37c88280018423b3ac929ea101cc46/lib/sqlalchemy/dialects/mysql/base.py#L2569

postgresql (as an example):
https://gitlab.istina.msu.ru/istina/sqlalchemy/-/blob/19b248c14c37c88280018423b3ac929ea101cc46/lib/sqlalchemy/dialects/postgresql/base.py#L2830

I think you're approach is correct in that it future proofs us for when other dialects use calls that only work against Connection. I was limiting this to just MySQL to limit side-effects on other SQL dialects but you're right that there shouldn't be any.

@ryantimjohn ryantimjohn added this pull request to the merge queue Feb 24, 2023
Merged via the queue into master with commit dd70f22 Feb 24, 2023
@ryantimjohn ryantimjohn deleted the RM-39-Bump-SQLAlchemy-to-version-1.4 branch February 24, 2023 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants