-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RM-39 update to sqlalchemy 1.4 #208
Conversation
@Brunope I will clean up commit history a lot here, feel free to review now or once that's done! |
else: | ||
columns = self.target_db_engine.dialect.get_columns(self.target_db_engine, | ||
table_name, | ||
schema=schema_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so weird to me, does the get_columns method take different types of objects depending on what the driver is????
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the sqlalchemy documentation, get_columns says it always takes a connection as the first arg.
https://docs.sqlalchemy.org/en/14/core/internals.html#sqlalchemy.engine.Dialect.get_columns.
I'm sure you've already tried this, but we can't just pass the conn
object to both?
Is this records-mover's fault for storing inconsistent types in its wrappers around sqlalchemy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class DBDriver(metaclass=ABCMeta):
def __init__(self,
db: Union[sqlalchemy.engine.Engine,
sqlalchemy.engine.Connection], **kwargs) -> None:
well this is weird
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright so I think the reason for this is that in some cases we want to instantiate a db_driver in the context of a transaction, as in this example from records/mover/records/prep_and_load.py
with tbl.db_engine.begin() as db:
# This second transaction ensures the table has been created
# before non-transactional statements like Redshift's COPY
# take place. Otherwise you'll get an error like:
#
# Cannot COPY into nonexistent table
driver = tbl.db_driver(db)
try:
import_count = load(driver)
except load_exception_type:
if not tbl.drop_and_recreate_on_load_error:
raise
reset_before_reload()
with tbl.db_engine.begin() as db:
driver = tbl.db_driver(db)
prep.prep(schema_sql=schema_sql,
driver=driver,
existing_table_handling=ExistingTableHandling.DROP_AND_RECREATE)
import_count = load(driver)
return MoveResult(move_count=import_count, output_urls=None)
begin()
starts a transaction and returns a connection object - naming it a "db" is pretty confusing.
Anyway, other times we don't care about transactions and don't create the connection object beforehand. I'm kind of confused about this, but sqlalchemy had this feature "connection-less execution" which basically let you execute stuff on an engine
directly without needing a connection
first, but they've been phasing it out: https://docs.sqlalchemy.org/en/14/core/connections.html#connectionless-execution-implicit-execution. Maybe this is why the code used to work.
I think the proper solution is to split the union from DBDriver
into an optional connection
argument and an Engine
. If no connection
is given it can create one from the Engine
. Then, when a caller wants to do something like get_columns
, they always use the connection
attribute of DBDriver
, not the Engine
. This solves a bunch of confusion caused by examples such as above, where target_db_engine
isn't actually an engine, it's a connection.
Should we track this work separately or do it as part of this ticket?
If we choose to put this work off until later, in the short term I think we should change the above code that I commented on (as well as the other similar occurrences) to read something like
if isinstance(self.target.db_engine, sqlalchemy.engine.Connection):
connection = self.target_db_engine
else:
connection = self.target._db_engine.connect()
get_columns(connection)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented this approach.
@Brunope, your investigations on all of this match up with my own. Only thing that I'll add is that, in the most recent versions of SqlAlchemy, the postgresql (as an example): I think you're approach is correct in that it future proofs us for when other dialects use calls that only work against Connection. I was limiting this to just MySQL to limit side-effects on other SQL dialects but you're right that there shouldn't be any. |
….com/bluelabsio/records-mover into RM-39-Bump-SQLAlchemy-to-version-1.4
This reverts commit bfa4a0c.
No description provided.