Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add query relationship considering the 'extended' provenance graph #4293

Open
greschd opened this issue Aug 7, 2020 · 5 comments
Open

Add query relationship considering the 'extended' provenance graph #4293

greschd opened this issue Aug 7, 2020 · 5 comments

Comments

@greschd
Copy link
Member

greschd commented Aug 7, 2020

Use case / problem

When creating queries with the with_ancestors or with_descendants keywords, only links in the "data provenance" (CREATE, INPUT_CALC) are considered when determining the ancestor / descendant relationship. I have encountered several cases where it would be convenient to consider the data + logical provenance. For example, finding all instances of a workchain that descend from a given structure.

Possible solution

A possible solution would be adding new keywords (working name with_extended_ancestors, with_extended_descendants) that take into account also INPUT_WORK and CALL links.

The RETURN links can not be included, because they can create loops in the provenance graph - which would lead to infinite recursions.

Describe alternatives you've considered

  • Altering the behavior of the current keywords is backwards-incompatible, and may be undesired because it will have a performance cost.
  • The RETURN links could also be included if there is some safeguard to exclude only the ones that actually do create a cycle (e.g. by node creation time). However, I think this could lead to confusing edge cases, and may not be worth the effort. Maybe we can specify in the documentation for the new feature that RETURN links may be included in the future, thus making it "technically not backwards-incompatible" if we do decide to add it.

Additional context

Discussion in #3881.

@sphuber
Copy link
Contributor

sphuber commented Aug 14, 2020

I think wanting to search for "logical" ancestors and descendants is a perfectly valid use case and should be supported.

I started to work on this and a naive approach did not seem to complicated to implement. Simply adding a with_extended_ancestors keyword and have that include all links except RETURN would already provide all use cases, because one could further narrow down the traversed paths by filtering out certain link types using edge_filters, if it were not for the fact that that is bugged. I opened #4305 for this.

Before implementing the rest (and settling all the naming), there is one more fundamental question we should answer. Currently, the implementation will return all paths that match the query path when using with_ancestors and with_descendants. The projected results then can often include duplicates. The question is whether this is expected behavior or the builder should always return the unique set of matched paths. I think in most cases, one wants just the unique set and one doesn't care about multiple paths but I could be wrong here. Additionally, I think having the current behavior as default can cause unexpected behavior.

@greschd
Copy link
Member Author

greschd commented Aug 17, 2020

Before implementing the rest (and settling all the naming), there is one more fundamental question we should answer. Currently, the implementation will return all paths that match the query path when using with_ancestors and with_descendants. The projected results then can often include duplicates. The question is whether this is expected behavior or the builder should always return the unique set of matched paths. I think in most cases, one wants just the unique set and one doesn't care about multiple paths but I could be wrong here. Additionally, I think having the current behavior as default can cause unexpected behavior.

I think there is already an ongoing discussion on that point, see #3755

As for myself, I usually put a .distinct() somewhere to avoid getting duplicates. Not sure if that might still carry the performance penalty, though.

@ltalirz
Copy link
Member

ltalirz commented Jun 20, 2022

Hey guys, I would like to revive this thread, since this I think this is an important feature.

Example use case: query for all calculations that exited with a specific exit code and get their parent workchains so you can resubmit them.

I started to work on this and a naive approach did not seem to complicated to implement

@sphuber Would you mind opening a draft PR for this?

The projected results then can often include duplicates.

As mentioned by @greschd , this question was discussed in #3755 and it seems the original intention of the authors was to include the duplicates, although I agree with @sphuber in being pretty certain this is not what most users are looking for.

As long as there is an easy workaround, I'm not too bothered about having the duplicates in a first implementation.

Another suggestion to consider might be introducing separate flags to control uniqueness/extendedness rather than creating flags like with_ancestors_unique_extended (or you might end up with 2**#variants flags).

Simply adding a with_extended_ancestors keyword and have that include all links except RETURN would already provide all use cases, because one could further narrow down the traversed paths by filtering out certain link types using edge_filters, if it were not for the fact that that is bugged

Would it make sense to add the support for ancestors already, with support for descendants to be added later?
I believe there are many use cases where the query can be written both top-down and bottom-up.

@sphuber
Copy link
Contributor

sphuber commented Jul 1, 2022

@sphuber Would you mind opening a draft PR for this?

I remember that I thought this would be relatively easy but when actually trying, it wasn't actually the case. See the discussion with @lekah in #4305 as to why. I abandoned that work and don't think I have the branch anymore.

@lekah
Copy link
Contributor

lekah commented Jul 1, 2022

Can you use the Aiida Graph Explorer instead or is this one dead?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants