sqlalchemy: adding 1 filter makes query very slow #1389

Open
ltalirz opened this issue Apr 9, 2018 · 1 comment
Labels: topic/database-sqlalchemy, type/performance (issue related to how quickly AiiDA works)

Comments

ltalirz (Member) commented Apr 9, 2018

Consider the following AiiDA query:

from aiida.orm.querybuilder import QueryBuilder
from aiida.orm.data.cif import CifData
from aiida.orm.data.parameter import ParameterData

qb = QueryBuilder()
qb.append(CifData, project=['uuid'], tag='cifs')

filters = {'attributes.deliverable_capacity': {'>=': 175}}
project = ['attributes.density', 'attributes.deliverable_capacity', 'attributes.surface_area', 'uuid', 'attributes.name']
qb.append(ParameterData, filters=filters, project=project, descendant_of='cifs')

print(qb)
result = qb.all()

Using the CoF database available here.
In short, we filter ParameterData nodes by the deliverable_capacity attribute, requiring it to be larger than 175 (and also project the uuid of the CifData node that each ParameterData descends from).
This query runs in ~2 s on 70k ParameterData and CifData nodes on my machine.
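For reference, a minimal sketch for timing the call; it simply wraps the qb.all() from the snippet above:

import time

start = time.time()
result = qb.all()
print('{} rows in {:.1f} s'.format(len(result), time.time() - start))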

If instead I want to query for deliverable_capacity in the range from 175 to 300, i.e. if I replace the filter by

filters = {
    'attributes.deliverable_capacity':
        {'and': [{'>=': 175}, {'<': 300}]}
}

the query suddenly takes 5 minutes.

Below I paste the generated SQL for both queries:

>= 175

WITH RECURSIVE anon_6(ancestor_id, descendant_id, depth) AS
(SELECT db_dblink_1.input_id AS ancestor_id, db_dblink_1.output_id AS descendant_id, CAST(0 AS INTEGER) AS depth
FROM db_dbnode AS db_dbnode_3 JOIN db_dblink AS db_dblink_1 ON db_dblink_1.input_id = db_dbnode_3.id
WHERE CAST(db_dbnode_3.type AS VARCHAR) LIKE 'data.cif.%' AND db_dblink_1.type IN ('createlink', 'inputlink') UNION ALL SELECT anon_6.ancestor_id AS ancestor_id, db_dblink_2.output_id AS descendant_id, anon_6.depth + CAST(1 AS INTEGER) AS current_depth
FROM anon_6 JOIN db_dblink AS db_dblink_2 ON db_dblink_2.input_id = anon_6.descendant_id
WHERE db_dblink_2.type IN ('createlink', 'inputlink'))
 SELECT db_dbnode_1.uuid, db_dbnode_2.attributes #> '{density}' AS anon_1, db_dbnode_2.attributes #> '{deliverable_capacity}' AS anon_2, db_dbnode_2.attributes #> '{surface_area}' AS anon_3, db_dbnode_2.uuid, db_dbnode_2.attributes #> '{name}' AS anon_4
FROM db_dbnode AS db_dbnode_1 JOIN anon_6 AS anon_5 ON anon_5.ancestor_id = db_dbnode_1.id JOIN db_dbnode AS db_dbnode_2 ON anon_5.descendant_id = db_dbnode_2.id
WHERE CAST(db_dbnode_1.type AS VARCHAR) LIKE 'data.cif.%' AND CAST(db_dbnode_2.type AS VARCHAR) LIKE 'data.parameter.%' AND jsonb_typeof(db_dbnode_2.attributes #> %(attributes_1)s) = 'number' AND CAST(db_dbnode_2.attributes #>> '{deliverable_capacity}' AS FLOAT) >= 175

>= 175, < 300

WITH RECURSIVE anon_6(ancestor_id, descendant_id, depth) AS
(SELECT db_dblink_1.input_id AS ancestor_id, db_dblink_1.output_id AS descendant_id, CAST(0 AS INTEGER) AS depth
FROM db_dbnode AS db_dbnode_3 JOIN db_dblink AS db_dblink_1 ON db_dblink_1.input_id = db_dbnode_3.id
WHERE CAST(db_dbnode_3.type AS VARCHAR) LIKE 'data.cif.%' AND db_dblink_1.type IN ('createlink', 'inputlink') UNION ALL SELECT anon_6.ancestor_id AS ancestor_id, db_dblink_2.output_id AS descendant_id, anon_6.depth + CAST(1 AS INTEGER) AS current_depth
FROM anon_6 JOIN db_dblink AS db_dblink_2 ON db_dblink_2.input_id = anon_6.descendant_id
WHERE db_dblink_2.type IN ('createlink', 'inputlink'))
 SELECT db_dbnode_1.uuid, db_dbnode_2.attributes #> '{density}' AS anon_1, db_dbnode_2.attributes #> '{deliverable_capacity}' AS anon_2, db_dbnode_2.attributes #> '{surface_area}' AS anon_3, db_dbnode_2.uuid, db_dbnode_2.attributes #> '{name}' AS anon_4
FROM db_dbnode AS db_dbnode_1 JOIN anon_6 AS anon_5 ON anon_5.ancestor_id = db_dbnode_1.id JOIN db_dbnode AS db_dbnode_2 ON anon_5.descendant_id = db_dbnode_2.id
WHERE CAST(db_dbnode_1.type AS VARCHAR) LIKE 'data.cif.%' AND CAST(db_dbnode_2.type AS VARCHAR) LIKE 'data.parameter.%' AND jsonb_typeof(db_dbnode_2.attributes #> %(attributes_1)s) = 'number' AND CAST(db_dbnode_2.attributes #>> '{deliverable_capacity}' AS FLOAT) >= 175 AND jsonb_typeof(db_dbnode_2.attributes #> %(attributes_2)s) = 'number' AND CAST(db_dbnode_2.attributes #>> '{deliverable_capacity}' AS FLOAT) < 300

The only difference between the two queries is in the last line of the WHERE clause.
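One way to see why the extra predicate is so costly is to compare the two execution plans. A minimal sketch, assuming psycopg2 and made-up connection settings; sql is assumed to hold the query text printed by print(qb) with the %(attributes_1)s-style parameters substituted:

import psycopg2

# Print the planner's choices for one of the two query variants.
# The connection settings below are assumptions; adjust them to your setup.
conn = psycopg2.connect(dbname='aiidadb', user='aiida')
with conn.cursor() as cur:
    cur.execute('EXPLAIN ANALYZE ' + sql)  # sql: the generated query, parameters filled in
    print('\n'.join(row[0] for row in cur.fetchall()))
conn.close()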

P.S. I'm running aiida 0.11.4 and postgres 9.5.9

giovannipizzi (Member) commented

What is the speed if you just do this filter?

filters = {
    'attributes.deliverable_capacity':
        {'<': 300}
}
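For concreteness, the full test being suggested, as a sketch that re-uses the CifData/ParameterData query and the project list from the report above:

qb = QueryBuilder()
qb.append(CifData, project=['uuid'], tag='cifs')
qb.append(ParameterData,
          filters={'attributes.deliverable_capacity': {'<': 300}},
          project=project, descendant_of='cifs')
result = qb.all()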

Also, as a second, independent test (do it afterwards, though): could you run a VACUUM ANALYZE on the relevant tables in your DB (from psql), and see whether the speed changes afterwards?
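The same maintenance step expressed from Python, as a sketch assuming psycopg2 and made-up connection settings (from psql it is simply VACUUM ANALYZE db_dbnode; and VACUUM ANALYZE db_dblink;):

import psycopg2

# VACUUM cannot run inside a transaction block, so autocommit must be enabled.
conn = psycopg2.connect(dbname='aiidadb', user='aiida')  # assumed settings
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute('VACUUM ANALYZE db_dbnode;')
    cur.execute('VACUUM ANALYZE db_dblink;')
conn.close()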

@giovannipizzi giovannipizzi added this to the v1.1.0 milestone Dec 3, 2018
@sphuber sphuber removed this from the v1.1.0 milestone Feb 28, 2020
@chrisjsewell chrisjsewell added the type/performance Issue related to how quickly AiiDA works label Apr 7, 2020