Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add as_dict() method to AirssProvider #2642

Merged
merged 7 commits into from
Sep 8, 2022
Merged

Add as_dict() method to AirssProvider #2642

merged 7 commits into from
Sep 8, 2022

Conversation

janosh
Copy link
Member

@janosh janosh commented Sep 1, 2022

@mkhorton This is a small follow-up to #2625. I think a convenience method to turn the most important attributes of an AirssProvider into a dict would be nice to have.

from pymatgen.io.res import AirssProvider

res_file = "/good_castep/NaLiTa2O6-28290-4142-1.res"
AirssProvider.from_file(res_file).as_dict()
sample output
{'structure': {'@module': 'pymatgen.core.structure',
  '@class': 'Structure',
  'charge': 0,
  'lattice': {'matrix': [[5.35841, 0.0, 3.2810798275095844e-16],
    [-3.420358136599889e-16, 5.585868740246209, -0.0037514850704228257],
    [0.0, 0.0, 7.70098]],
   'pbc': (True, True, True),
   'a': 5.35841,
   'b': 5.58587,
   'c': 7.70098,
   'alpha': 90.03848,
   'beta': 90.0,
   'gamma': 90.0,
   'volume': 230.5009196038728},
  'sites': [{'species': [{'element': 'Li', 'occu': 1}],
    'abc': [-0.4570528716928, 0.1473363079843, -0.2499276126345],
    'xyz': [-2.4490766782074167, 0.8230012770727893, -1.9252402763057663],
    'label': 'Li',
    'properties': {}},
   {'species': [{'element': 'Li', 'occu': 1}],
    'abc': [0.0429471283072, -0.1473363079843, 0.2499276126345],
    'xyz': [0.23012832179258363, -0.8230012770727893, 1.9252402763057663],
    'label': 'Li',
    'properties': {}},
   {'species': [{'element': 'O', 'occu': 1}],
...
 'energy': -42060.247,
 'integrated_spin_density': 0.0,
 'integrated_absolute_spin_density': 0.0,
 'spacegroup_label': 'Pmn21',
 'appearances': 1}

@ScottNotFound Feel free to jump in if I missed any important attributes.

janosh added 3 commits August 31, 2022 18:20
pymatgen/core/tests/test_trajectory.py:8: in <module>
    from pymatgen.core.trajectory import Trajectory
pymatgen/core/trajectory.py:35: in <module>
    Vector3D = tuple[float, float, float]
@ScottNotFound
Copy link
Contributor

Hmm, I don't know the situations where as_dict methods are currently used, but I assume the use case is typically for databases? I wonder if it might be worth having ResProvider inherit from MSONable like a lot of other classes do. Otherwise I think the TITL entry, REM entries, and structure are enough to recreate the object.

@janosh
Copy link
Member Author

janosh commented Sep 1, 2022

I wonder if it might be worth having ResProvider inherit from MSONable like a lot of other classes do.

I was thinking about that as well.

Otherwise I think the TITL entry, REM entries, and structure are enough to recreate the object.

Thanks. 👍

@shyuep
Copy link
Member

shyuep commented Sep 7, 2022

As a general principle, inheriting from MSONAble should be the default unless there is good reason to implement a separate as_dict. If you implement as_dict, you generally have to implement from_dict as well.

@janosh
Copy link
Member Author

janosh commented Sep 7, 2022

@shyuep Would the MSONable API work in this case? AirssProvider currently is not meant to be created through __init__:

pymatgen/pymatgen/io/res.py

Lines 395 to 397 in 9407e8d

def __init__(self, res: Res, parse_rems: Literal["gentle", "strict"] = "gentle"):
"""The :func:`from_str` and :func:`from_file` methods should be used instead of constructing this directly."""
super().__init__(res)

@shyuep
Copy link
Member

shyuep commented Sep 7, 2022

The MSONAble API does not rely on what method you use to init it. As long as res and parse_rems are part of the attributes, it will work. E.g., you rarely use the Poscar init method and mostly uses from_file. But Poscar still works with the MSONable api. All the API is meant to do is for serialization.

@ScottNotFound
Copy link
Contributor

Looking at the way MSONable works, if you make ResProvider inherit from it, then you get the as_dict and from_dict methods without having to do any implementation. They are just access wrappers so _res is all you need. Then you can override and stick any extra keys like structure in there after calling from super.

@janosh
Copy link
Member Author

janosh commented Sep 7, 2022

@ScottNotFound Thanks for pointing that out! We should probably make both ResProvider and Res inherit from MSONable? Else AirssProvider.from_file(res_file).as_dict() would have key 'res' be an instance of Res, meaning not serializable.

@ScottNotFound
Copy link
Contributor

ScottNotFound commented Sep 7, 2022

Res is a dataclass meaning it has a __init__ that includes all its attributes so it just works with MSONable apparently.

Edit: Oh I see the problem. I guess the class isn't smart enough to serialize a dataclass. You would have to have all of them inherit or you can do something with __repr__. I'll play with it.

@janosh
Copy link
Member Author

janosh commented Sep 7, 2022

Hmmm... I assumed the output from as_dict() is meant to be dicts all the way down? Seems weird to return a dict containing classes like Res and AirssTITL.

@janosh
Copy link
Member Author

janosh commented Sep 7, 2022

Edit: Oh I see the problem. I guess the class isn't smart enough to serialize a dataclass. You would have to have all of them inherit or you can do something with __repr__. I'll play with it.

Yeah, I think all the classes in pymatgen/io/res.py would have to be children of MSONable in order for the AirssProvider.as_dict() to be a dict of dicts.

@shyuep
Copy link
Member

shyuep commented Sep 7, 2022

Alternatively, you can modify the MSONAble protocol to handle dataclass? That would be better for future usability rather than just fixing the problem for this one case. MSONable was written at a time when dataclasses were not a thing....

@janosh
Copy link
Member Author

janosh commented Sep 7, 2022

At this point, the as_dict() method won't be useful for my original purpose where I just want the pymatgen Structure and top level fields like title, volume, energy, etc. spread into columns of a dataframe:

df_res = pd.DataFrame(
    [AirssProvider.from_file(file).as_dict() for file in res_files]
).set_index("seed")

Should we have another function, say AirssProvider.summary() for that in pymatgen? If not, I'll just keep it in my AIRSS project.

Screen Shot 2022-09-07 at 13 41 21

@ScottNotFound
Copy link
Contributor

Alternatively, you can modify the MSONAble protocol to handle dataclass? That would be better for future usability rather than just fixing the problem for this one case. MSONable was written at a time when dataclasses were not a thing....

I like this idea.

@shyuep
Copy link
Member

shyuep commented Sep 7, 2022

How about I suggest this:

  1. Someone can look into whether it is easy to add support for dataclasses in MSONable.
  2. You can still implement an as_dict method. I would suggest you make AirssRes MSONable in any case and override as_dict. You can have a kwarg called verbosity and default it to full verbosity and it will just delegate to super().as_dict. You then implement a lower verbosity level for your purpose. Note that the Structure.as_dict method has a similar thing - there are different verbosities. This is because the default Site objects carry the Lattice object, and for the purposes of serializing a crystal structure, it is extremely wasteful to repeat the same Lattice object on all sites.

@ScottNotFound
Copy link
Contributor

At this point, the as_dict() method won't be useful for my original purpose where I just want the pymatgen Structure and top level fields like title, volume, energy, etc. spread into columns of a dataframe:

Is there any other functionality in pymatgen that takes calculation results and supplies it in a form suitable for a dataframe? It would be nice to keep a consistent interface. I see that IStructure provides as_dataframe, perhaps an as_datarow would suffice if there is no existing similar function elsewhere?

@shyuep
Copy link
Member

shyuep commented Sep 7, 2022

@ScottNotFound I don't think there is a need for that functionality. The reason is because dataframes can take a dict.

E.g., one of the most common ways I construct a dataframe is simply:

pd.DataFrame([o.as_dict() for o in objects])

The MPRester.query method returns a list of dicts. If you simply do pd.DataFrame(MPRester().query(....whatever arguments)), you get a DataFrame. I do this often.

In essence, the as_dataframe method is rather redundant because you are replacing a single line of code.

shyuep pushed a commit that referenced this pull request Sep 7, 2022
@shyuep
Copy link
Member

shyuep commented Sep 7, 2022

@janosh I just added support for dataclasses in the MSONAble protocol. It should now work without having to mess with the subclasses.

@janosh
Copy link
Member Author

janosh commented Sep 7, 2022

@shyuep Great. I did what you suggested. as_dict() now has a verbose flag that delegates to super().as_dict() if True (the default) and has the previous behavior useful for dataframe if verbose=False. I only made ResProvider(MSONable) inherit from MSONable since it's not a dataclass. All the data classes were left as is.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.005%) to 63.517% when pulling 5faa3f9 on janosh:AirssProvider-as_dict into 7500268 on materialsproject:master.

@shyuep shyuep merged commit 00df8be into materialsproject:master Sep 8, 2022
@janosh janosh deleted the AirssProvider-as_dict branch September 8, 2022 01:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants