Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

♻️ REFACTOR: Move to sqlachemy 1.4 (and v2 API) #5069

Closed
wants to merge 1 commit into from

Conversation

chrisjsewell
Copy link
Member

This PR is intended to update our use of sqlalchemy to the 2.0 API, which is supported by v1.4: https://docs.sqlalchemy.org/en/14/glossary.html#term-1

(note this currently builds on #5063)

@chrisjsewell
Copy link
Member Author

chrisjsewell commented Aug 12, 2021

In 95985f0, I have added an initial trial migration for the querybuilder.
I have run into an issue though, encapsulated by the added tests/orm/test_querybuilder.py::TestQueryBuilder::test_joined_node_count test:

    def do_execute(self, cursor, statement, parameters, context=None):
>       cursor.execute(statement, parameters)
E       sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) invalid reference to FROM-clause entry for table "db_dbnode_2"
E       LINE 3: ...b_dblink AS db_dblink_1 ON db_dblink_1.input_id = db_dbnode_...
E                                                                    ^
E       HINT:  There is an entry for table "db_dbnode_2", but it cannot be referenced from this part of the query.
E       
E       [SQL: SELECT count(*) AS count_1 
E       FROM (SELECT db_dbnode_1.id AS db_dbnode_1_id, db_dbnode_1.uuid AS db_dbnode_1_uuid, db_dbnode_1.node_type AS db_dbnode_1_node_type, db_dbnode_1.process_type AS db_dbnode_1_process_type, db_dbnode_1.label AS db_dbnode_1_label, db_dbnode_1.description AS db_dbnode_1_description, db_dbnode_1.ctime AS db_dbnode_1_ctime, db_dbnode_1.mtime AS db_dbnode_1_mtime, db_dbnode_1.attributes AS db_dbnode_1_attributes, db_dbnode_1.extras AS db_dbnode_1_extras, db_dbnode_1.repository_metadata AS db_dbnode_1_repository_metadata, db_dbnode_1.user_id AS db_dbnode_1_user_id, db_dbnode_1.dbcomputer_id AS db_dbnode_1_dbcomputer_id 
E       FROM db_dbnode AS db_dbnode_2, db_dbnode AS db_dbnode_1 JOIN db_dblink AS db_dblink_1 ON db_dblink_1.input_id = db_dbnode_2.id JOIN db_dbnode AS db_dbnode_1 ON db_dblink_1.output_id = db_dbnode_1.id 
E       WHERE CAST(db_dbnode_2.node_type AS VARCHAR) LIKE %(param_1)s AND CAST(db_dbnode_1.node_type AS VARCHAR) LIKE %(param_2)s) AS anon_1]
E       [parameters: {'param_1': 'data.%', 'param_2': 'data.%'}]
E       (Background on this error at: https://sqlalche.me/e/14/f405)

.tox/py38-django/lib/python3.8/site-packages/sqlalchemy/engine/default.py:717: ProgrammingError

when comparing the query statement to that output from develop you get:

diff --git a/x_old.txt b/x_new.txt
index 920060325..ba8541f94 100644
--- a/x_old.txt
+++ b/x_new.txt
@@ -1,3 +1,3 @@
 SELECT db_dbnode_1.id AS db_dbnode_1_id, db_dbnode_1.uuid AS db_dbnode_1_uuid, db_dbnode_1.node_type AS db_dbnode_1_node_type, db_dbnode_1.process_type AS db_dbnode_1_process_type, db_dbnode_1.label AS db_dbnode_1_label, db_dbnode_1.description AS db_dbnode_1_description, db_dbnode_1.ctime AS db_dbnode_1_ctime, db_dbnode_1.mtime AS db_dbnode_1_mtime, db_dbnode_1.attributes AS db_dbnode_1_attributes, db_dbnode_1.extras AS db_dbnode_1_extras, db_dbnode_1.repository_metadata AS db_dbnode_1_repository_metadata, db_dbnode_1.user_id AS db_dbnode_1_user_id, db_dbnode_1.dbcomputer_id AS db_dbnode_1_dbcomputer_id 
-FROM db_dbnode AS db_dbnode_2 JOIN db_dblink AS db_dblink_1 ON db_dblink_1.input_id = db_dbnode_2.id JOIN db_dbnode AS db_dbnode_1 ON db_dblink_1.output_id = db_dbnode_1.id 
+FROM db_dbnode AS db_dbnode_2, db_dbnode AS db_dbnode_1 JOIN db_dblink AS db_dblink_1 ON db_dblink_1.input_id = db_dbnode_2.id JOIN db_dbnode AS db_dbnode_1 ON db_dblink_1.output_id = db_dbnode_1.id 
 WHERE CAST(db_dbnode_2.node_type AS VARCHAR) LIKE %(param_1)s AND CAST(db_dbnode_1.node_type AS VARCHAR) LIKE %(param_2)s

You can see that it has added an additional alias db_dbnode_1, which is overriding the db_dbnode_2 alias: db_dbnode AS db_dbnode_2, db_dbnode AS db_dbnode_1

So my first guess, that even though I do self._raw_columns.pop(0) (which is not ideal in the first place), the second column still retains this "record" of what it was joining to

Note there is also a lot of these warnings:

tests/orm/test_querybuilder.py::TestQueryBuilder::test_joined_node_count
  /Users/chrisjsewell/Documents/GitHub/aiida_core_develop/aiida/orm/querybuilder.py:752: SAWarning: relationship 'Group_permissions.group' will copy column auth_group.id to column auth_group_permissions.group_id, which conflicts with relationship(s): 'Group.permissions' (copies auth_group.id to auth_group_permissions.group_id). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards.   To silence this warning, add the parameter 'overlaps="permissions"' to the 'Group_permissions.group' relationship.
    self.tag_to_alias_map[tag] = aliased(ormclass)

@chrisjsewell
Copy link
Member Author

cc @sphuber, in case you tried any of this in #4855

@chrisjsewell
Copy link
Member Author

chrisjsewell commented Aug 12, 2021

this is the query before/after self._raw_columns.pop(0)

SELECT db_dbnode_1.id, db_dbnode_1.uuid, db_dbnode_1.node_type, db_dbnode_1.process_type, db_dbnode_1.label, db_dbnode_1.description, db_dbnode_1.ctime, db_dbnode_1.mtime, db_dbnode_1.attributes, db_dbnode_1.extras, db_dbnode_1.repository_metadata, db_dbnode_1.user_id, db_dbnode_1.dbcomputer_id, db_dbnode_2.id AS id_1, db_dbnode_2.uuid AS uuid_1, db_dbnode_2.node_type AS node_type_1, db_dbnode_2.process_type AS process_type_1, db_dbnode_2.label AS label_1, db_dbnode_2.description AS description_1, db_dbnode_2.ctime AS ctime_1, db_dbnode_2.mtime AS mtime_1, db_dbnode_2.attributes AS attributes_1, db_dbnode_2.extras AS extras_1, db_dbnode_2.repository_metadata AS repository_metadata_1, db_dbnode_2.user_id AS user_id_1, db_dbnode_2.dbcomputer_id AS dbcomputer_id_1 
FROM db_dbnode AS db_dbnode_1 JOIN db_dblink AS db_dblink_1 ON db_dblink_1.input_id = db_dbnode_1.id JOIN db_dbnode AS db_dbnode_2 ON db_dblink_1.output_id = db_dbnode_2.id 
WHERE CAST(db_dbnode_1.node_type AS VARCHAR) LIKE 'data.%%' AND CAST(db_dbnode_2.node_type AS VARCHAR) LIKE 'data.%%'
SELECT db_dbnode_1.id, db_dbnode_1.uuid, db_dbnode_1.node_type, db_dbnode_1.process_type, db_dbnode_1.label, db_dbnode_1.description, db_dbnode_1.ctime, db_dbnode_1.mtime, db_dbnode_1.attributes, db_dbnode_1.extras, db_dbnode_1.repository_metadata, db_dbnode_1.user_id, db_dbnode_1.dbcomputer_id 
FROM db_dbnode AS db_dbnode_2, db_dbnode AS db_dbnode_1 JOIN db_dblink AS db_dblink_1 ON db_dblink_1.input_id = db_dbnode_2.id JOIN db_dbnode AS db_dbnode_1 ON db_dblink_1.output_id = db_dbnode_1.id 
WHERE CAST(db_dbnode_2.node_type AS VARCHAR) LIKE 'data.%%' AND CAST(db_dbnode_1.node_type AS VARCHAR) LIKE 'data.%%'

@sphuber
Copy link
Contributor

sphuber commented Aug 12, 2021

cc @sphuber, in case you tried any of this in #4855

I remember looking at this, but mostly that the whole logic around the private _raw_columns attribute had changed. I thought it completely changed structure and remember having to get rid of it completely. Fact that you are still using it seems to suggest I am misremembering though :D

@chrisjsewell
Copy link
Member Author

chrisjsewell commented Aug 12, 2021

Fact that you are still using it seems to suggest I am misremembering though

Yeh the _raw_columns is new in 1.4, with _entities being roughly its equivalent in 1.3

but, as per the issue above, I don't know if I can actually get it to work, and its obviously not great to be accessing private attributes anyway.
Going to look through the _build method in more detail, and think if there's a better way, rather than having to start hacking into the sqlalchemy internals

@chrisjsewell chrisjsewell linked an issue Aug 12, 2021 that may be closed by this pull request
@chrisjsewell
Copy link
Member Author

chrisjsewell commented Aug 12, 2021

Note some other issues to look into closing while I'm focussing on the querybuilder:

@chrisjsewell
Copy link
Member Author

Note:

    # Note: according to https://github.com/sqlalchemy/sqlalchemy/commit/0d1efeec475621b5c2c2aca0632b02edef54c1a6
    # automatic uniquing of rows is turned off for the new 2.0 style of ORM querying,
    # so this sqlalchemy < 1.4 hack is no longer necessary
    # self._query._has_mapper_entities = False

@giovannipizzi
Copy link
Member

For reference, re:

Note: according to sqlalchemy/sqlalchemy@0d1efee
automatic uniquing of rows is turned off for the new 2.0 style of ORM querying,
so this sqlalchemy < 1.4 hack is no longer necessary
self._query._has_mapper_entities = False

This was actually in response to an issue I had opened a lot of time ago: sqlalchemy/sqlalchemy#4395
(our workaround self._query._has_mapper_entities = False was discussed in #1600)

@chrisjsewell
Copy link
Member Author

chrisjsewell commented Sep 8, 2021

superseded by #5103

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support SQLAlchemy v1.4
3 participants