Remove pseudo potential family plugins for specific file formats #31

sphuber · 2020-12-07T12:46:49Z

Fixes #25

zooks97 · 2020-12-07T16:22:52Z

Why not support PK and UUID in load_pseudo_family? Supporting this here does not preclude loading the Group directly but adds the benefit of load_pseudo_family being a one-stop-shop for users, no matter what identifier they have on hand.

sphuber · 2020-12-07T16:26:35Z

@zooks97 following the discussion of last week, I have updated the code to decouple the family class from the pseudo types it can host. Any family can now support one or many pseudo types, even though a single instance can only host pseudos of a single type. I added the load_pseudo_profile to make loading a specific family by label and a specific pseudo type easier, as now there no longer is a specific class that can serve as a shortcut. Could you have a look if you see any potential problems in the design/implementation and whether this would simplify the implementation of the Pseudo Dojo work?

zooks97 · 2020-12-07T16:27:00Z

In the same vein, it could be possible to make pseudo_type an optional argument if the user doesn't explicitly know the pseudo_type (e.g. installing SSSP, you don't specify a type because there is only UPF, so where are they supposed to get that information?) or knows that only one family exists with the given label. If there is only one family which has the given label, return that; otherwise, raise a similar exception to the case where multiple families have the same label and type.

zooks97 · 2020-12-07T16:29:57Z

implementation of the Pseudo Dojo work?

Overall, it looks very good. It will simplify the PseudoDojo work as far as reducing the number of PseudoPotentialFamily subclasses, but by necessity there will still be a large chunk of logic to deal with the detailed inconsistency in PseudoDojo.

sphuber · 2020-12-07T16:43:55Z

Why not support PK and UUID in load_pseudo_family? Supporting this here does not preclude loading the Group directly but adds the benefit of load_pseudo_family being a one-stop-shop for users, no matter what identifier they have on hand.

It is a good point. I mostly didn't do it because it complicates the implementation and behavior, but maybe you are right that having to switch the function based on what identifier you have is going to be annoying.

Thinking about this some more, I am not even sure anymore why we concluded that we needed the pseudo_type argument to have people distinguish pseudo families when they have a label. The uniqueness requirement in the database is on the tuple (label, type_string) which means that you cannot have two families of the same type with the same label. That means that if you want to create PseudoPotentialFamily instances with UpfData nodes, then you have to use different labels anyway. So essentially the ambiguity problem that I thought was there, doesn't even exist, does it?

So that means we can just remove pseudo_type from the function, meaning we can simply get rid of it entirely, because at that point it is identical to load_group essentially... Am I missing something?

The initial design was to create a single pseudo potential family subclass with corresponding entry point for each pseudo potential data subtype available. Not only does this approach not really make sense, as the format of the pseudo potential should be irrelevant to its content, and so a family of UPF potentials can convey the exact same information as one where the pseudos are formatted in PSP8 format, but it also forces lots of subclasses and entry points to be created. Still, having the format of the pseudos used in a pseudo family is still useful, for example in order to query for it. Codes typically only support a subset of pseudo formats, and so it is important to find a family with the correct format when launching a calculation. The idea is then to move this information to the extras of family such that it can be queried for. Here we first remove all existing pseudo family subclasses that only exist because of a particular pseudo format.

Originally, each pseudopotential family could only specify a single supported pseudopotential type, meaning a single subclass of the `PseudoPotentialData` base class. However, this policy forced one to create a new family subclass for each pseudopotential type that one might want to create a family off. Semantically, this does not even make sense, because the pseudopotential types are merely describing the file format with which the pseudopotential is written to file. In principle, any pseudo could be transformed from any format to any other format. A family with a certain set of pseudopotentials in format A, could be represented by the same pseudos but then in format B, without changing the informational content. Forcing a different pseudo family class to be used is nonsensical and unpractical. The `PseudoPotentialFamily` class is updated such that the `_pseudo_type` class attribute is renamed to `_pseudo_types` and now takes a tuple of `PseudoPotentialData` subclasses. An instance of a family can be created where the pseudopotentials are in any one of these formats. Note that a family class can support multiple pseudopotential types, but any one instance can only host pseudopotentials of a single type.

Since pseudopotential families can now optionally support many pseudopotential formats, although each instance only supports one format at a time, it should be easy to determine which format any particular family hosts. The `pseudo_type` property is added to `PseudoPotentialFamily` which returns the entry point name of the `PseudoPotentialData` subclass that is used by the pseudopotentials that it contains. For example, if the family hosts only `UpfData` nodes, whose entry point string is defined as `aiida.data:pseudo.upf`, it will return `pseudo.upf`. Note that the entry point group `aiida.data` is stripped as it is redundant information. The `pseudo_type` is stored as an extra on the family instance and is updated after the node contents are mutated, i.e. after `add_nodes`, `remove_nodes` and `clear. Having this information stored as an extra and not calculated each time on the fly, not only helps efficiency but it also makes it queryable, which is important if one needs to find a family with pseudos of a particular type. The class does not provide a `pseudo_type` setter as this only ever needs to be set by the class itself. Users can of course always change the value through the `set_extra` method of the `Group` class, causing inconsistency in the data, but there is nothing one can do to protect against this.

sphuber force-pushed the fix/025/decouple-format-from-family branch 2 times, most recently from b41cb05 to 91c9d17 Compare December 7, 2020 15:58

sphuber force-pushed the fix/025/decouple-format-from-family branch from 91c9d17 to 1ca1bc4 Compare December 7, 2020 17:47

sphuber requested a review from zooks97 December 7, 2020 18:18

sphuber added 3 commits December 8, 2020 08:47

sphuber force-pushed the fix/025/decouple-format-from-family branch from 1ca1bc4 to d6b2ca5 Compare December 8, 2020 07:48

zooks97 approved these changes Dec 8, 2020

View reviewed changes

sphuber merged commit 4808f10 into master Dec 8, 2020

sphuber deleted the fix/025/decouple-format-from-family branch December 8, 2020 08:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove pseudo potential family plugins for specific file formats #31

Remove pseudo potential family plugins for specific file formats #31

sphuber commented Dec 7, 2020 •

edited

Loading

zooks97 commented Dec 7, 2020

sphuber commented Dec 7, 2020

zooks97 commented Dec 7, 2020 •

edited

Loading

zooks97 commented Dec 7, 2020

sphuber commented Dec 7, 2020

Remove pseudo potential family plugins for specific file formats #31

Remove pseudo potential family plugins for specific file formats #31

Conversation

sphuber commented Dec 7, 2020 • edited Loading

zooks97 commented Dec 7, 2020

sphuber commented Dec 7, 2020

zooks97 commented Dec 7, 2020 • edited Loading

zooks97 commented Dec 7, 2020

sphuber commented Dec 7, 2020

sphuber commented Dec 7, 2020 •

edited

Loading

zooks97 commented Dec 7, 2020 •

edited

Loading