For MPcules: Molecule Trajectory and graph hashes #2945

espottesmith · 2023-04-19T14:48:14Z

Summary

We are nearing the inclusion of a large dataset of molecules and molecular properties to the Materials Project. This PR adds some features to pymatgen which will be useful for the molecules data pipeline in emmet, namely graph hashes and a Trajectory class that works for Molecules.

The graph hashing code, which is currently in emmet but is being moved to pymatgen because we expect it to be useful for more general users (rather than only developers), was taken (following licenses, I hope) from networkx. We took the code directly because we want to have access to a stable version of the hashing algorithm (networkx has made subtle changes to the algorithm in the past, leading to the same graphs producing different hashes).
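To illustrate the idea of a graph hash that does not depend on node numbering, here is a minimal stdlib-only sketch in the spirit of Weisfeiler-Lehman-style relabeling. It is not the code added in this PR or the networkx implementation; the function name `wl_graph_hash_sketch`, its parameters, and the adjacency-dict input format are assumptions made for illustration.

```python
import hashlib
from collections import Counter


def _digest(text: str, size: int = 16) -> str:
    """Return a short, stable hex digest of a string."""
    return hashlib.blake2b(text.encode("utf-8"), digest_size=size).hexdigest()


def wl_graph_hash_sketch(adjacency, labels, iterations=3):
    """Hash a labeled undirected graph independently of node numbering.

    adjacency: dict mapping each node to an iterable of its neighbors
               (each bond listed in both directions).
    labels:    dict mapping each node to an initial label, e.g. an element symbol.
    Hypothetical helper for illustration only.
    """
    current = {node: str(labels[node]) for node in adjacency}
    histogram = []
    for _ in range(iterations):
        updated = {}
        for node, neighbors in adjacency.items():
            # Combine a node's label with the sorted labels of its neighbors.
            neighbor_labels = sorted(current[m] for m in neighbors)
            updated[node] = _digest(current[node] + "".join(neighbor_labels))
        current = updated
        # Record how often each label occurs; sorting removes ordering effects.
        histogram.extend(sorted(Counter(current.values()).items()))
    return _digest(str(tuple(histogram)))


# Two numberings of the same water-like graph, plus a different graph.
water_a = ({0: [1, 2], 1: [0], 2: [0]}, {0: "O", 1: "H", 2: "H"})
water_b = ({"a": ["c"], "b": ["c"], "c": ["a", "b"]}, {"a": "H", "b": "H", "c": "O"})
hydroxyl = ({0: [1], 1: [0]}, {0: "O", 1: "H"})
```

Because the hash depends on every detail of the relabeling procedure, any change to that procedure changes the resulting hashes, which is the motivation given above for keeping a fixed copy of the algorithm.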

The MoleculeOptimizeTrajectory class is basically a copy-paste of the existing Trajectory class, designed for Molecule objects rather than Structure objects. This will be important if we include geometry optimization trajectories in the new MP dataset, and/or if we want to visualize atomic motion (e.g. molecular normal modes of vibration).

Checklist

Doc strings have been added in the Google docstring format. Run pydocstyle on your code.
Type annotations are highly encouraged. Run mypy path/to/file.py to type check your code.
Tests have been added for any new functionality or bug fixes.
All linting and tests pass.

Note that the CI system will run all the above checks. But it will be much more efficient if you already fix most errors prior to submitting the PR. We highly recommend installing pre-commit hooks. Simply run

janosh · 2023-04-19T21:08:29Z

Thanks @espottesmith! Could you briefly explain the major differences between Trajectory and MoleculeOptimizeTrajectory? I understand the latter is meant to handle Molecules, but I'm curious what is preventing an object-agnostic Trajectory class that handles both Structures and Molecules?

Also, are you anticipating a MoleculeMDTrajectory class or similar with different needs? If not, why not call it MoleculeTrajectory and call the other StructureTrajectory with an alias for Trajectory for backwards compat?

espottesmith · 2023-04-19T21:18:04Z

There's nothing preventing an agnostic Trajectory class, in principle. It just felt easier to make two versions, one that had to consider Lattices and another that doesn't. This is the same approach taken previously, e.g. for StructureGraph vs. MoleculeGraph.

I don't care much about the name. MoleculeTrajectory and StructureTrajectory are fine.

janosh · 2023-04-19T21:49:52Z

If there's nothing preventing it, my preference is for an agnostic Trajectory class as that minimizes LoC and will prevent API drift that could occur between MoleculeTrajectory and StructureTrajectory over time. But this is just one opinion. Happy to be overruled by other maintainers or arguments.

espottesmith · 2023-04-19T22:56:14Z

I would respectfully say that if that's a design pattern that you or others want to move towards, then y'all should have at it. As I said, this pattern is easier to write (in my opinion and experience), and as that implies, writing a unified interface would mean a nontrivial amount of additional work (that I don't want to do if I don't have to).

shyuep · 2023-04-19T23:28:59Z

I agree with @janosh that it is preferable to have a single unified interface with a consistent API. Trajectories are trajectories. Whether there is a periodic boundary condition shouldn't make any difference. And I would prefer if the original contributor of the PR, i.e., @espottesmith makes the change.

espottesmith · 2023-04-20T03:14:30Z

The interface is as unified as it reasonably can be.

janosh · 2023-04-20T03:23:38Z

Thanks a lot, Evan! I really appreciate you going the extra mile!

Btw, different topic but wanted to mention I could get behind dropping mypy. Sounds like you'd be in favor? 🤣
In my personal experience, the pain mypy adds to developing can be too high for the number of coding errors it helps avoid. If @shyuep is not against, I'm happy to make the change.

shyuep · 2023-04-20T04:21:21Z

For mypy, I think the general approach should be to not bother with type annotations where it is not necessary but to add them where you want a more stringent control of the types being passed. If you don't annotate, mypy will not check?

janosh · 2023-04-20T14:48:46Z

> If you don't annotate, mypy will not check?

That's true.
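As a small sketch of that behavior: by default mypy skips the bodies of functions that have no annotations (the `--check-untyped-defs` flag changes this). The function names below are hypothetical and only serve to contrast the two cases.

```python
from __future__ import annotations


def label_untyped(index):
    # No annotations: by default mypy does not check this body, so it stays
    # silent even though this line raises TypeError when index is an int.
    return "frame_" + index


def label_typed(index: int) -> str:
    # Annotated: mypy checks this body. Writing `"frame_" + index` here
    # would be reported as an unsupported operand type error.
    return f"frame_{index}"
```

Both functions run fine when given the input they happen to expect, so the difference only shows up when running mypy on the file.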

But unlike most (all?) other languages, I think the main beneficiaries of type annotations in Python are actually the users, not the developers. Tools like Sphinx automatically integrate type annotations into static docs, and IDEs display them on hover and when passing arguments.

So from that perspective, the more type hints the better. But the more you use them, the more mypy can be a nuisance.

shyuep · 2023-04-20T15:19:53Z

Philosophically, I see type annotations as doing two things: 1. forcing developers to be more rigorous about what they intend, and 2. helping users use the code as intended. I generally think that they are most useful for primitive types, e.g., saying a generic variable name like mode can be a string, int or something else. It becomes messy when the types are built on objects with a hierarchy or even more complex things like lists or dicts.
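A minimal sketch of the "generic variable name like mode" case, assuming a hypothetical helper `convert_frames` whose `mode` argument accepts only two string values. Nothing here is taken from pymatgen; the names and the frame layout (one flat list of floats per frame) are illustrative.

```python
from __future__ import annotations

from typing import Literal

# Hypothetical alias: the only values `mode` is meant to take.
CoordMode = Literal["positions", "displacements"]


def convert_frames(
    frames: list[list[float]], mode: CoordMode = "positions"
) -> list[list[float]]:
    """Return frames as absolute positions or as frame-to-frame displacements."""
    if mode == "positions":
        return [list(frame) for frame in frames]
    if mode == "displacements":
        if not frames:
            return []
        # Keep the first frame as the reference, then store differences.
        result = [list(frames[0])]
        for previous, current in zip(frames, frames[1:]):
            result.append([c - p for p, c in zip(previous, current)])
        return result
    raise ValueError(f"Unknown mode: {mode!r}")
```

With the annotation in place, a call such as `convert_frames(frames, mode="velocity")` would be flagged by a type checker and shown as invalid by an IDE before the code runs.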

It is definitely more effort than just not doing type annotations, but in the long run, it will be better. I am more fussy about type annotations for something like pymatgen/core but less for pymatgen/analysis. The more users who use a particular code, the more important it is that the types are properly specified.

espottesmith · 2023-04-20T18:28:54Z

Stepping away from the mypy discussion, does anyone know why one of the tests failed? It failed during the dependency install, but this PR doesn't touch any of pymatgen's dependencies (at least, as far as I can remember).

janosh · 2023-04-20T18:35:30Z

Yeah failure is unrelated to this PR. Not sure why phonopy suddenly fails to install on Mac py3.8. Not holding back this PR. Haven't had time to re-review yet.

janosh · 2023-04-22T14:35:06Z

@espottesmith One more question. Do we need the new use_molecule kwarg? I just tried removing it to see whether relying on the heuristic "if lattice is None, we're handling molecules" can work. The tests all still pass, but maybe there's some subtlety that we're not checking for? Let me know if anything seems off to you in 4c9e24c.
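A minimal sketch of the heuristic being asked about, assuming a hypothetical, heavily simplified container. `SketchTrajectory` and its fields are illustrative and are not the pymatgen `Trajectory` API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SketchTrajectory:
    """Hypothetical object-agnostic trajectory: one class for both cases."""

    species: list[str]
    # One list of (x, y, z) coordinates per frame.
    frames: list[list[tuple[float, float, float]]]
    # 3x3 lattice matrix for periodic systems, or None for molecules.
    lattice: list[list[float]] | None = None

    @property
    def is_periodic(self) -> bool:
        # The heuristic: no lattice means we are handling molecules,
        # so no separate use_molecule flag is stored.
        return self.lattice is not None

    def frame_kind(self) -> str:
        """Name of the object type each frame would be converted to."""
        return "Structure" if self.is_periodic else "Molecule"


h2 = SketchTrajectory(
    species=["H", "H"],
    frames=[[(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)]],
)
cubic = SketchTrajectory(
    species=["Na"],
    frames=[[(0.0, 0.0, 0.0)]],
    lattice=[[4.2, 0.0, 0.0], [0.0, 4.2, 0.0], [0.0, 0.0, 4.2]],
)
```

In this sketch the presence of a lattice is the only thing that distinguishes the two cases, which is why an explicit flag would be redundant here; whether that holds for every code path in the real class is the open question in the comment above.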

espottesmith and others added 22 commits March 25, 2021 10:20

Merge branch 'mpmaster'

3190126

Merge branch 'mpmaster'

bfaa9ba

Merge remote-tracking branch 'origin/master'

b352b27

Merge remote-tracking branch 'materialsproject/master'

150e1bd

Merge remote-tracking branch 'materialsproject/master'

a9fa576

Merge remote-tracking branch 'materialsproject/master'

975e4e9

Initial commit; adding graph hashing from emmet, beginning work on molecule-specific trajectory

85d88d8

Continued

af08f37

Trajectory first draft done

8ce8446

Tests

3787c7b

Merge remote-tracking branch 'materialsproject/master' into for_mpcules

d0c1ad9

Beginning to add entry classes to incorporate into Emmet

df29aa6

More test fixes for molecule trajectory

9baa12e

molecule trajectory works

f603a20

Small addition

3ef2f85

Merge remote-tracking branch 'materialsproject/master' into for_mpcules

2ffb209

Add tests for graph hashing

e2bc08b

Small tweaks to trajectory tests

e4f2e57

pre-commit auto-fixes

ef6874e

mypy is a cruelty to both programmers and programming languages

cac6a25

Merge remote-tracking branch 'origin/for_mpcules' into for_mpcules

d25d41f

Ah, yes. Let's just staple type-checking to a language built around duck-typing. That'll go great

1dc02f4

janosh added 4 commits April 19, 2023 14:09

fix common typo MSONAble->MSONable

a8ca931

tweak PR template

625fa12

fix supported types list

ff39467

fix from_molecules() return type

652fe45

espottesmith and others added 7 commits April 19, 2023 18:59

Requested change

4994b45

Merge remote-tracking branch 'origin/for_mpcules' into for_mpcules

3fe06cf

pre-commit auto-fixes

74862f4

Trying to make mypy happy, like a crying child

f73693b

And now ruff

9e6a69e

pre-commit auto-fixes

9af72c2

Please let it end

73bbeac

janosh added 2 commits April 22, 2023 07:30

faff

2ca75b8

remove use_molecule kwarg

4c9e24c

can we rely on heuristic if lattice is None, we're handling molecules?

fix mypy, refactor to_positions() and to_displacements()

176eb47

janosh added the "awaiting user" and "enhancement" labels and removed the "awaiting user" label on Apr 25, 2023

janosh merged commit a553f37 into materialsproject:master Apr 25, 2023
