Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove pseudo potential family plugins for specific file formats #31

Merged
merged 3 commits into from
Dec 8, 2020

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Dec 7, 2020

Fixes #25

@sphuber sphuber force-pushed the fix/025/decouple-format-from-family branch 2 times, most recently from b41cb05 to 91c9d17 Compare December 7, 2020 15:58
@zooks97
Copy link
Contributor

zooks97 commented Dec 7, 2020

Why not support PK and UUID in load_pseudo_family? Supporting this here does not preclude loading the Group directly but adds the benefit of load_pseudo_family being a one-stop-shop for users, no matter what identifier they have on hand.

@sphuber
Copy link
Contributor Author

sphuber commented Dec 7, 2020

@zooks97 following the discussion of last week, I have updated the code to decouple the family class from the pseudo types it can host. Any family can now support one or many pseudo types, even though a single instance can only host pseudos of a single type. I added the load_pseudo_profile to make loading a specific family by label and a specific pseudo type easier, as now there no longer is a specific class that can serve as a shortcut. Could you have a look if you see any potential problems in the design/implementation and whether this would simplify the implementation of the Pseudo Dojo work?

@zooks97
Copy link
Contributor

zooks97 commented Dec 7, 2020

In the same vein, it could be possible to make pseudo_type an optional argument if the user doesn't explicitly know the pseudo_type (e.g. installing SSSP, you don't specify a type because there is only UPF, so where are they supposed to get that information?) or knows that only one family exists with the given label. If there is only one family which has the given label, return that; otherwise, raise a similar exception to the case where multiple families have the same label and type.

@zooks97
Copy link
Contributor

zooks97 commented Dec 7, 2020

implementation of the Pseudo Dojo work?

Overall, it looks very good. It will simplify the PseudoDojo work as far as reducing the number of PseudoPotentialFamily subclasses, but by necessity there will still be a large chunk of logic to deal with the detailed inconsistency in PseudoDojo.

@sphuber
Copy link
Contributor Author

sphuber commented Dec 7, 2020

Why not support PK and UUID in load_pseudo_family? Supporting this here does not preclude loading the Group directly but adds the benefit of load_pseudo_family being a one-stop-shop for users, no matter what identifier they have on hand.

It is a good point. I mostly didn't do it because it complicates the implementation and behavior, but maybe you are right that having to switch the function based on what identifier you have is going to be annoying.

Thinking about this some more, I am not even sure anymore why we concluded that we needed the pseudo_type argument to have people distinguish pseudo families when they have a label. The uniqueness requirement in the database is on the tuple (label, type_string) which means that you cannot have two families of the same type with the same label. That means that if you want to create PseudoPotentialFamily instances with UpfData nodes, then you have to use different labels anyway. So essentially the ambiguity problem that I thought was there, doesn't even exist, does it?

So that means we can just remove pseudo_type from the function, meaning we can simply get rid of it entirely, because at that point it is identical to load_group essentially... Am I missing something?

@sphuber sphuber force-pushed the fix/025/decouple-format-from-family branch from 91c9d17 to 1ca1bc4 Compare December 7, 2020 17:47
@sphuber sphuber requested a review from zooks97 December 7, 2020 18:18
The initial design was to create a single pseudo potential family
subclass with corresponding entry point for each pseudo potential data
subtype available. Not only does this approach not really make sense, as
the format of the pseudo potential should be irrelevant to its content,
and so a family of UPF potentials can convey the exact same information
as one where the pseudos are formatted in PSP8 format, but it also
forces lots of subclasses and entry points to be created.

Still, having the format of the pseudos used in a pseudo family is still
useful, for example in order to query for it. Codes typically only
support a subset of pseudo formats, and so it is important to find a
family with the correct format when launching a calculation. The idea is
then to move this information to the extras of family such that it can
be queried for.

Here we first remove all existing pseudo family subclasses that only
exist because of a particular pseudo format.
Originally, each pseudopotential family could only specify a single
supported pseudopotential type, meaning a single subclass of the
`PseudoPotentialData` base class. However, this policy forced one to
create a new family subclass for each pseudopotential type that one
might want to create a family off. Semantically, this does not even make
sense, because the pseudopotential types are merely describing the file
format with which the pseudopotential is written to file. In principle,
any pseudo could be transformed from any format to any other format. A
family with a certain set of pseudopotentials in format A, could be
represented by the same pseudos but then in format B, without changing
the informational content. Forcing a different pseudo family class to be
used is nonsensical and unpractical.

The `PseudoPotentialFamily` class is updated such that the `_pseudo_type`
class attribute is renamed to `_pseudo_types` and now takes a tuple of
`PseudoPotentialData` subclasses. An instance of a family can be created
where the pseudopotentials are in any one of these formats. Note that a
family class can support multiple pseudopotential types, but any one
instance can only host pseudopotentials of a single type.
Since pseudopotential families can now optionally support many
pseudopotential formats, although each instance only supports one format
at a time, it should be easy to determine which format any particular
family hosts.

The `pseudo_type` property is added to `PseudoPotentialFamily` which
returns the entry point name of the `PseudoPotentialData` subclass that
is used by the pseudopotentials that it contains. For example, if the
family hosts only `UpfData` nodes, whose entry point string is defined
as `aiida.data:pseudo.upf`, it will return `pseudo.upf`. Note that the
entry point group `aiida.data` is stripped as it is redundant
information.

The `pseudo_type` is stored as an extra on the family instance and is
updated after the node contents are mutated, i.e. after `add_nodes`,
`remove_nodes` and `clear. Having this information stored as an extra
and not calculated each time on the fly, not only helps efficiency but
it also makes it queryable, which is important if one needs to find a
family with pseudos of a particular type.

The class does not provide a `pseudo_type` setter as this only ever
needs to be set by the class itself. Users can of course always change
the value through the `set_extra` method of the `Group` class, causing
inconsistency in the data, but there is nothing one can do to protect
against this.
@sphuber sphuber force-pushed the fix/025/decouple-format-from-family branch from 1ca1bc4 to d6b2ca5 Compare December 8, 2020 07:48
@sphuber sphuber merged commit 4808f10 into master Dec 8, 2020
@sphuber sphuber deleted the fix/025/decouple-format-from-family branch December 8, 2020 08:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make the format of pseudo potentials used for a family just an attribute
2 participants