Query much slower when using two filters instead of one #4856
Some more experiments on the individual query:

```python
In [1]: pk = 522398

In [5]: def query(pk, fast=False):
   ...:     if fast:
   ...:         filters = {'attributes.process_label': 'Cp2kPhonopyWorkChain'}
   ...:     else:
   ...:         filters = {'attributes.process_label': 'Cp2kPhonopyWorkChain', 'label': 'phonopy_test_1'}
   ...:     qb = QueryBuilder()
   ...:     qb.append(Node, filters={'id': pk}, tag='cif')
   ...:     qb.append(WorkChainNode, with_incoming='cif', filters=filters)
   ...:     print(qb.all())

In [8]: time query(pk, fast=True)
[[<WorkChainNode: uuid: 6708952e-3433-4da3-84f0-198e73d2507f (pk: 613039) (aiida_lsmo.workchains.cp2k_phonopy.Cp2kPhonopyWorkChain)>]]
CPU times: user 8.74 ms, sys: 0 ns, total: 8.74 ms
Wall time: 9.48 ms

In [9]: time query(pk, fast=False)
[[<WorkChainNode: uuid: 6708952e-3433-4da3-84f0-198e73d2507f (pk: 613039) (aiida_lsmo.workchains.cp2k_phonopy.Cp2kPhonopyWorkChain)>]]
CPU times: user 10.9 ms, sys: 0 ns, total: 10.9 ms
Wall time: 1.26 s
```

If I'm interpreting the difference between CPU time (of the Python process) and wall time correctly (let me know if I'm wrong), the first call waits roughly ~1 ms for PostgreSQL to return the query, while the second call waits ~1.25 s, i.e. the second PostgreSQL query is actually about a factor of 1000 slower. @giovannipizzi is this expected?
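The CPU-vs-wall-time reasoning above can be checked in miniature with plain Python: time a call that mostly waits on something external (here `time.sleep` stands in for the database round-trip), and the gap between wall time and process CPU time is the time spent waiting.

```python
import time

def timed(fn):
    """Return (wall_seconds, cpu_seconds) spent in fn()."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - wall0, time.process_time() - cpu0

# Stand-in for a query that blocks on an external service (e.g. PostgreSQL):
# wall time includes the wait, CPU time of this process stays near zero.
wall, cpu = timed(lambda: time.sleep(0.2))
print(f"wall={wall:.3f}s cpu={cpu:.3f}s waiting={wall - cpu:.3f}s")
```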
I doubt this is due to a change in the AiiDA code; most probably it is due to an increase in the DB size (if I had to bet). This is clearly an "issue" at the DB level. The SQL generated for the slow (two-filter) query is:

```sql
SELECT db_dbnode_1.id, db_dbnode_1.uuid, db_dbnode_1.node_type, db_dbnode_1.process_type, db_dbnode_1.label, db_dbnode_1.description, db_dbnode_1.ctime, db_dbnode_1.mtime, db_dbnode_1.attributes, db_dbnode_1.extras, db_dbnode_1.user_id, db_dbnode_1.dbcomputer_id
FROM db_dbnode AS db_dbnode_2
JOIN db_dblink AS db_dblink_1 ON db_dblink_1.input_id = db_dbnode_2.id
JOIN db_dbnode AS db_dbnode_1 ON db_dblink_1.output_id = db_dbnode_1.id
WHERE CAST(db_dbnode_2.node_type AS VARCHAR) LIKE '%%'
  AND db_dbnode_2.id = 1
  AND CAST(db_dbnode_1.node_type AS VARCHAR) LIKE 'process.workflow.workchain.%%'
  AND CASE WHEN (jsonb_typeof(db_dbnode_1.attributes #> %(attributes_1)s) = 'string') THEN (db_dbnode_1.attributes #>> '{process_label}') = 'Cp2kPhonopyWorkChain' ELSE false END
  AND db_dbnode_1.label = 'phonopy_test_1'
```

The fast query is identical except that it does not have the last line (`AND db_dbnode_1.label = 'phonopy_test_1'`).
Hope this helps!
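The general mechanism at play (not the actual AiiDA/PostgreSQL backend, and the real slowdown may well be a bad plan choice rather than a missing index) can be illustrated in miniature with SQLite, assuming a build with the JSON1 functions: a predicate on a plain indexed column can be answered via an index, while a predicate on an expression over a JSON blob forces a scan. The `node` table and column names here are a simplified stand-in for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, label TEXT, attributes TEXT);
    CREATE INDEX idx_node_label ON node (label);
    INSERT INTO node (label, attributes)
        VALUES ('phonopy_test_1', '{"process_label": "Cp2kPhonopyWorkChain"}');
""")

def plan(sql):
    # EXPLAIN QUERY PLAN returns rows whose last column describes each step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# A predicate on an indexed column can be answered via the index...
print(plan("SELECT id FROM node WHERE label = 'phonopy_test_1'"))
# ...while a predicate on an expression over the JSON blob forces a table scan.
print(plan("SELECT id FROM node "
           "WHERE json_extract(attributes, '$.process_label') = 'Cp2kPhonopyWorkChain'"))
```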
Hi,
when looping on:
the query becomes extremely slow when using two filters.
If I use just one filter it is acceptable: to give an idea, for a list of 1000 it takes ca. 25 minutes instead of ca. 25 seconds (60x)!
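As a quick sanity check (pure arithmetic, not AiiDA code), these totals line up with the per-call wall times measured in the comment above:

```python
# Reported totals for a list of ~1000 materials.
slow_total_s = 25 * 60   # ca. 25 minutes with two filters
fast_total_s = 25        # ca. 25 seconds with one filter

print(slow_total_s / fast_total_s)  # -> 60.0, the stated 60x slowdown
print(slow_total_s / 1000)          # -> 1.5 s per slow query, consistent with the ~1.26 s wall time above
```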
As extra info: only some of the materials in the list have this outgoing WorkChain but, judging from the tqdm progress bar I'm using, the time spent on each material is the same.
I recently updated to python-3.8.8 / aiida-core-1.6.1 and noticed this weird behaviour of the QueryBuilder.
I did not have the chance to try it with the previous version (1.5) I was using, so I'm not sure it is version-specific, but I have certainly done similar queries with multiple filters before and they were not this slow.
fyi @ltalirz
Edit (leo): the size of the db is 689681 nodes