Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace transitive closure with recursive query #258

Closed
aiida-bot opened this issue Aug 29, 2016 · 8 comments
Closed

Replace transitive closure with recursive query #258

aiida-bot opened this issue Aug 29, 2016 · 8 comments

Comments

@aiida-bot
Copy link

Originally reported by: Giovanni Pizzi (Bitbucket: pizzi, GitHub: giovannipizzi)


The transitive closure, for big graphs, can become large. We should disable it, but provide equivalent functionality when needed (especially, but not only, for the QueryBuilder).

Since we are moving to the support of PostgreSQL only, we can use recursive queries that can get a similar table automatically.


@aiida-bot
Copy link
Author

Original comment by Giovanni Pizzi (Bitbucket: pizzi, GitHub: giovannipizzi):


Considering that the columns entry_edge_id, direct_edge_id and exit_edge_id are needed only for deleting one node (or to efficiently know what is the path that is traversed), we can remove them.

We can get results analogous to the following query with the db_dbpath table:

SELECT parent_id, child_id, depth from db_dbpath where parent_id = 223389 ORDER BY depth, child_id;

using the following recursive query:

WITH RECURSIVE path_down(input_id, output_id, depth, path) AS (
 SELECT link.input_id, link.output_id, 0::INT AS depth, (link.input_id || '->' || link.output_id::TEXT) AS path FROM db_dblink AS link WHERE link.input_id = 223389
UNION ALL
 SELECT existing.input_id, newlink.output_id, existing.depth + 1 AS depth, (existing.path || '->' || newlink.output_id::TEXT) FROM path_down AS existing, db_dblink AS newlink WHERE newlink.input_id = existing.output_id
)
SELECT * FROM path_down AS path 
ORDER BY path.depth, path.output_id ASC;

and similarly, to get the parent links instead of the child links:

WITH RECURSIVE path_up(input_id, output_id, depth, path) AS (
 SELECT link.input_id, link.output_id, 0::INT AS depth, (link.input_id || '->' || link.output_id::TEXT) AS path FROM db_dblink AS link WHERE link.output_id = 223389
UNION ALL
 SELECT newlink.input_id, existing.output_id, existing.depth + 1 AS depth, (newlink.input_id::TEXT || '->' || existing.path) FROM path_up AS existing, db_dblink AS newlink WHERE newlink.output_id = existing.input_id
)
SELECT * FROM path_up AS path 
ORDER BY path.depth, path.input_id ASC;

@aiida-bot
Copy link
Author

Original comment by Giovanni Pizzi (Bitbucket: pizzi, GitHub: giovannipizzi):


A few comments on efficiency:

  1. One cannot create a view of the recursive query without any filtering (as the WHERE link.input_id = 223389 above, and/or a filter on the depth) and then filter on it, or equivalently filter selecting from the query above. In fact (at least currently, postgres 9.5) PostgreSQL will not optimize it, rather create the full table (expensive), then filter it. I've tested it, but for more comments see also here: http://stackoverflow.com/questions/5861272/postgresql-with-recursive-performance (see the comment by Denis)

  2. creating the 'path' column as per the queries above are expensive. Also, probably the string is not very useful. So two comments. First, have a different query that does not generate it if it's not needed. Second, (at least in terms of functionality, I don't know in terms of efficiency) I would have there a JSONB field instead of a TEXT field, and have a list (so one could even query it, or at least use it directly) where new entries are appended, rather than a string concatenation (with the query above, the path field contains strings of the type 223389->223390->223396->223397->223524->223561

  3. Of course, when the ordering is not needed, remove it.

@aiida-bot
Copy link
Author

Original comment by Giovanni Pizzi (Bitbucket: pizzi, GitHub: giovannipizzi):


Actually, for point 2 above, using directly PostgreSQL arrays (not JSON arrays, directly int[]) makes things more usable, but not necessarily faster. Using the following query:

WITH RECURSIVE path_down(input_id, output_id, depth, path) AS (
 SELECT link.input_id, link.output_id, 0::INT AS depth, array[link.input_id, link.output_id] AS path FROM db_dblink AS link WHERE link.input_id = 142523
UNION ALL
 SELECT existing.input_id, newlink.output_id, existing.depth + 1 AS depth, (select array_agg(t.unnest) from (select unnest from unnest(existing.path) UNION ALL (SELECT newlink.output_id::int)) as t) FROM path_down AS existing, db_dblink AS newlink WHERE newlink.input_id = existing.output_id
)
SELECT * FROM path_down AS path 
ORDER BY path.depth, path.output_id ASC;

it takes, starting on the same node (that would return 14982 rows in my example) 4.1 seconds for this query with arrays, and 4.3 with strings.

@aiida-bot
Copy link
Author

Original comment by Giovanni Pizzi (Bitbucket: pizzi, GitHub: giovannipizzi):


Final comment for today: you can also inject a function once, and then use it multiple times, for typical queries, I don't know if its more efficient. An example (not optimized!):

create function path_down(id int) returns table (id int) as $$
WITH RECURSIVE path_down(input_id, output_id, depth, path) AS (
 SELECT link.input_id, link.output_id, 1::INT AS depth, (link.input_id || '->' || link.output_id::TEXT) AS path FROM db_dblink AS link WHERE link.input_id = $1
UNION ALL
 SELECT existing.input_id, newlink.output_id, existing.depth + 1 AS depth, (existing.path || '->' || newlink.output_id::TEXT) FROM path_down AS existing, db_dblink AS newlink WHERE newlink.input_id = existing.output_id)
SELECT output_id FROM path_down;
$$ language sql stable strict rows 5 cost 1;

and then use it as

select * from path_down(223389);

@giovannipizzi
Copy link
Member

@lekah what's the current status?
@nmounet what are the features in the private branch "optional_dbpath" that might be needed? can these be ported in the public repo?

@lekah
Copy link
Contributor

lekah commented Jan 20, 2017

What we discussed during the coding week is implemented and currently in develop. The QueryBuilder has an option to use the DbPath, and if not to use recursive queries.
QueryBuilder(with_dbpath=True|False)... @nmounet Tests only include small queries. I don't think the queries can be made any faster at this stage. Some of your queries should go into a new container for all kinds of super-special performance-sensitive queries. (aiida.backends.general.abstractqueries)

lekah added a commit that referenced this issue Feb 5, 2017
nvarini pushed a commit to nvarini/aiida_core that referenced this issue Feb 7, 2017
…ts_beta and Declarative DbPathBeta to behave as DbPath does,

with queryable properties being descendant_id (instead of child_id), ancestor_id (parent_id) and depth.
Possibility to add traversed path as an array, though this will only work in Postgresql, so commented out for now.
To initialize the mapper correctly, I added imports in the __init__ of backends.sqlalchemy.model
This might be also of greater convenience when importing!
Added headers to all files in the querybuild module.
nvarini pushed a commit to nvarini/aiida_core that referenced this issue Feb 7, 2017
nvarini pushed a commit to nvarini/aiida_core that referenced this issue Feb 7, 2017
nvarini pushed a commit to nvarini/aiida_core that referenced this issue Feb 7, 2017
…ts using recursive unions in SQL. It was made more efficient by not building the whole dbpath

in memory, but the needed subset. Needs to be implemented now for ancestors, and documented.
nvarini pushed a commit to nvarini/aiida_core that referenced this issue Feb 7, 2017
…Builder switch between using the DbPath and not using it. Adapted the tests. The previous functionalities in dummy_model and in sqlalchemy.model.__init__ are now obsolete and were removed.
@giovannipizzi giovannipizzi added this to the Feb17 release (v0.8.0) milestone Feb 28, 2017
@lekah lekah modified the milestones: Pre-1.0 release, Feb17 release (v0.8.0) Mar 8, 2017
lekah added a commit that referenced this issue May 31, 2017
… the triggers, without making it yet compulsory to migrate. This related to replacing the DbPath table with recursive queries on the links, #258
lekah added a commit that referenced this issue Jun 1, 2017
lekah added a commit that referenced this issue Jun 1, 2017
…setting is with_dbpath=False. The whole option needs to be removed asap
lekah added a commit that referenced this issue Jun 1, 2017
…ango with a QueryBuilder query, so that the recursive query is used there
lekah added a commit that referenced this issue Jun 1, 2017
…ll for when working with the Django backend
lekah added a commit that referenced this issue Jun 1, 2017
lekah added a commit that referenced this issue Jun 1, 2017
lekah added a commit that referenced this issue Jun 1, 2017
…or Node.has_children, Node.has_parents, added import
lekah added a commit that referenced this issue Jun 1, 2017
…the transitive closure. In essence this is also the test for the recursive queries, so there is redundancy now, and we can remove some tests for the queries
@lekah
Copy link
Contributor

lekah commented Jun 1, 2017

The main work is done in branch tc_wo_dbpath. Tests that fail so far are backup_scripts, export_and_import, tcod_exporters (for the main reason as export_and_import). Holding me back is now the reliance on DbPath table in the import and export functionality. This probably deserves a new issue.

lekah added a commit that referenced this issue Jul 12, 2017
lekah added a commit that referenced this issue Jul 12, 2017
…Path model and the corresponding relationshio fields
lekah added a commit that referenced this issue Jul 12, 2017
lekah added a commit that referenced this issue Jul 12, 2017
…ive to explicitly not follow return links.
lekah added a commit to lekah/aiida_core that referenced this issue Jul 19, 2017
lekah added a commit that referenced this issue Jul 21, 2017
…ble to simplify the code significantly, leading to a much cleaner API of the QB
@giovannipizzi
Copy link
Member

This has been fixed in the linked issues, and anyway in 0.10.0 the transitive closure has been removed, so I close this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants