Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document that CALL and RETURN links are not considered in 'with_ancestors' queries. #3881

Open
greschd opened this issue Mar 31, 2020 · 6 comments

Comments

@greschd
Copy link
Member

greschd commented Mar 31, 2020

When querying with the with_ancestors or with_descendants option, the "logical provenance" links (CALL and RETURN) are not taken into account - probably because cycles would lead to an "infinite" number of possible paths.

This should, however, be clearly documented in the QueryBuilder documentation. Currently, it just produces empty query results.

Maybe CALL links could even be allowed, since only RETURN can break causality.

To demonstrate the behavior, consider the following simple example:

from aiida import orm
from aiida.engine import workfunction, calcfunction, run_get_node

@calcfunction
def do_nothing_calc():
    return

@workfunction
def do_nothing_work():
    return do_nothing_calc()

res, node = run_get_node(do_nothing_work)

The following query will find the do_nothing_calc node:

qb = orm.QueryBuilder()
qb.append(orm.WorkFunctionNode, filters={'uuid': node.uuid}, tag='work')
qb.append(orm.CalcFunctionNode, with_incoming='work')
qb.all() # [[<CalcFunctionNode: uuid: f4e22f7f-704b-4521-af82-95c85e0985f5 (pk: 321) (__main__.do_nothing_calc)>]]

whereas using with_ancestors does not:

qb = orm.QueryBuilder()
qb.append(orm.WorkFunctionNode, filters={'uuid': node.uuid}, tag='work')
qb.append(orm.CalcFunctionNode, with_ancestors='work')
qb.all() # []
@greschd greschd changed the title Document that CALL and RETURN links are not considered in with_ancestors and _descendants. Document that CALL and RETURN links are not considered in 'with_ancestors' queries. Mar 31, 2020
@giovannipizzi
Copy link
Member

Yes, this is the expcted result.
We wanted these commands to only follow the data provenance.
I agree this should be better documented if not clear.

I wouldn't follow CALL links exactly because they are not part of the data provenance, to be consistent with everywhere else (e.g. logic when exporting, deleting, ...).
This does not prevent to define a with_call_ancestors= or something similar, to follow recursively (only) the CALL stack, if we believe this is useful (I agree it might be!).
But I wouldn't merge this into with_ancestors=, because I am not sure that finding the input of any of the workflows that eventually launched one of the calculations that is among your ancestors is really a benefit and not just confusing (or finding as a descendant the output of any of the calculations launched, eventually, by any of the descendant workflows).

@DanielMarchand
Copy link
Contributor

This bit me the other day. I want to query all the WorkChainNodes that are an ancestor of a given structure node. I can usually use with_incomming, but not always (I sometimes transform the structure prior to launching).

Definitely would be nice to have a with_call_ancestors.

@greschd
Copy link
Member Author

greschd commented Aug 6, 2020

Yeah, I've come across a few cases where this would be nice.. usually there is some workaround with using called_descendants from the "top-most" workchain, but querying would definitely be cleaner.

Following only CALL links as suggested might not be terribly useful, since there is bound to be an INPUT_* link somewhere in the relationship I'm trying to find. At least in the cases I'm aware of, that wouldn't have been sufficient.

If I understand things correctly, only RETURN links are problematic for the purpose of finding descendants / ancestors, since they can introduce cycles in the graph. I think including all of CALL_CALC, CALL_WORK, INPUT_CALC, INPUT_WORK, and CREATE links still produces a DAG sub-graph - happy to be corrected on that though @giovannipizzi @sphuber.

I think it would make sense to add an option (with_extended_ancestors, name to be improved) that traverses that graph. Or, if that might easily be too expensive, maybe we could even provide the option to select which link types are included (disallowing RETURN).

@sphuber
Copy link
Contributor

sphuber commented Aug 6, 2020

This bit me the other day. I want to query all the WorkChainNodes that are an ancestor of a given structure node. I can usually use with_incomming, but not always (I sometimes transform the structure prior to launching).

I might be misunderstanding, but this does not seem correct. You say you are looking for WorkChainNode that is an ancestor of a StructureData, but I think you were describing a StructureData being an input of the workchain. That would make the WorkChainNode a descendant and not an ancestor. If that is true, it should just follow INPUT_WORK links.

Definitely would be nice to have a with_call_ancestors.

Your use case shouldn't have anything to do with call links though. When you say you "transform" the structure prior to launching, do you mean you pass it through some calcfunction? And do you then pass that output as input to the workchain? This procedure would indeed prohibit the use of a direct with_incoming. Question if it wouldn't be better to make the calcfunction part of the workchain instead of doing it outside.

If I understand things correctly, only RETURN links are problematic for the purpose of finding descendants / ancestors, since they can introduce cycles in the graph. I think including all of CALL_CALC, CALL_WORK, INPUT_CALC, INPUT_WORK, and CREATE links still produces a DAG sub-graph

Yes, this should be correct. I think it makes sense indeed to think about an additional keyword that can target these sub graphs.

@DanielMarchand
Copy link
Contributor

DanielMarchand commented Aug 7, 2020

I might be misunderstanding, but this does not seem correct. You say you are looking for WorkChainNode that is an ancestor of a StructureData, but I think you were describing a StructureData being an input of the workchain. That would make the WorkChainNode a descendant and not an ancestor. If that is true, it should just follow INPUT_WORK links.

I might be getting confused with the terminology. Basically in a4 using 'descendant_of' works (i.e. identifies WorkChains that are the descendants of a given structure node). In v1.3 neither 'with_ancestors' [I think this is the one I should use] nor 'with_descendents' work anymore, but "with_incomming" does. I'm not sure why this is the case, only that it is the case.

Your use case shouldn't have anything to do with call links though.

I'm not sure why 'with_incoming' works and nothing else does, I thought it was related to this issue, but I could be wrong.

When you say you "transform" the structure prior to launching, do you mean you pass it through some calcfunction?

Yes, exactly, might not have been the best design choice, but I made it a long time ago and I have a reasonable size set of legacy data that it would be nice to support.

And do you then pass that output as input to the workchain? This procedure would indeed prohibit the use of a direct with_incoming. Question if it wouldn't be better to make the calcfunction part of the workchain instead of doing it outside.

You're right that this would probably be a better way to have designed it. Going in the future I could do something like this, but it would be nice to be able to support legacy calculations I made long ago.

@greschd
Copy link
Member Author

greschd commented Aug 7, 2020

Your use case shouldn't have anything to do with call links though.

I'm not sure why 'with_incoming' works and nothing else does, I thought it was related to this issue, but I could be wrong.

Yes, it's related to this issue, but not because of the CALL links: The link between the structure and workchain is of type INPUT_WORK. While with_ancestors considers only CREATE and INPUT_CALC links (which make up the "data provenance"), with_incoming considers, as far as I can tell, any kinds of links but only going up one level.

When you say you "transform" the structure prior to launching, do you mean you pass it through some calcfunction?

Yes, exactly, might not have been the best design choice, but I made it a long time ago and I have a reasonable size set of legacy data that it would be nice to support.

Eh, I think both are valid design choices. If you have an existing large-ish workchain there might be edge cases where you want to "clean up" your structure before passing it in without making it an entire new super-workchain. In any case, I think querying should be able to answer these kinds of questions.

There are two things here:

  • document the behavior of with_ancestors and with_descendants w.r.t. which link types it checks (this issue, initially)
  • add support for extended 'ancestor' and 'descendant' queries - taking all link types that will still produce a DAG. I'll make a separate issue for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants