For MPcules: Molecule Trajectory and graph hashes #2945
…lecule-specific trajectory
…uck-typing. That'll go great
Thanks @espottesmith! Could you briefly explain the major differences between Also, are you anticipating a MoleculeMDTrajectory class or similar with different needs? If not, why not call it |
There's nothing preventing an agnostic Trajectory class, in principle. It just felt easier to make two versions, one that had to consider Lattices and another that doesn't. This is the same approach taken previously, e.g. for StructureGraph vs. MoleculeGraph. I don't care much about the name. MoleculeTrajectory and StructureTrajectory are fine.
If there's nothing preventing it, my preference is for an agnostic |
I would respectfully say that if that's a design pattern that you or others want to move towards, then y'all should have at it. As I said, this pattern is easier to write (in my opinion and experience), and as that implies, writing a unified interface would mean a nontrivial amount of additional work (that I don't want to do if I don't have to).
I agree with @janosh that it is preferable to have a single unified interface with a consistent API. Trajectories are trajectories. Whether there is a periodic boundary condition shouldn't make any difference. And I would prefer that the original contributor of the PR, i.e., @espottesmith, make the change.
The interface is as unified as it reasonably can be.
Thanks a lot, Evan! I really appreciate you going the extra mile! Btw, different topic but wanted to mention I could get behind dropping |
For mypy, I think the general approach should be to not bother with type annotations where they are not necessary, but to add them where you want more stringent control of the types being passed. If you don't annotate, mypy will not check?
That's true. But unlike most (all?) other languages, I think the main beneficiaries of type annotations in Python are actually the users, not the developers. Tools like Sphinx automatically integrate type annos into static docs and IDEs display them on hover and when passing arguments like in this example: So from that perspective, the more type hints the better. But the more you use them, the more |
Philosophically, I see type annotations as doing two things: 1. forcing developers to be more rigorous about what they intend, and 2. helping users use the code as intended. I generally think that they are most useful for primitive types, e.g., saying a generic variable name like It is definitely more effort than just not doing type annotations, but in the long run, it will be better. I am more fussy about type annotations for something like pymatgen/core but less for pymatgen/analysis. The more users who use a particular code, the more important it is that the types are properly specified.
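To make the two points in this exchange concrete, here is a minimal sketch. The function names are hypothetical and not part of pymatgen. It relies on two facts: by default mypy does not type check the body of a function that has no annotations (unless `--check-untyped-defs` is set), and annotations remain available at runtime, which is what documentation generators and IDE tooltips draw on.

```python
from typing import List, Sequence, get_type_hints


def scale_coords(coords: Sequence[float], factor: float = 1.0) -> List[float]:
    """Annotated: mypy checks this body and every call site against the hints."""
    return [c * factor for c in coords]


def scale_coords_untyped(coords, factor=1.0):
    """Unannotated: mypy skips this body by default and treats the arguments as Any."""
    return [c * factor for c in coords]


# Annotations can be introspected at runtime; the unannotated function exposes none.
typed_hints = get_type_hints(scale_coords)
untyped_hints = get_type_hints(scale_coords_untyped)

print(typed_hints)
print(untyped_hints)
```

Both functions behave identically at runtime. The difference is only in what a type checker, a documentation build, or an editor can tell the user about them.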
Stepping away from the mypy discussion, does anyone know why one of the tests failed? It failed during the dependency install, but this PR doesn't touch any of pymatgen's dependencies (at least, as far as I can remember).
Yeah failure is unrelated to this PR. Not sure why |
Can we rely on the heuristic that if lattice is None, we're handling molecules?
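A minimal sketch of what that heuristic could look like, assuming a frame carries an optional lattice. `Frame`, `is_molecular`, and `classify_trajectory` are hypothetical stand-ins written for illustration; they are not pymatgen classes or the code in this PR.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vector = Tuple[float, float, float]


@dataclass
class Frame:
    """Hypothetical stand-in for one trajectory frame (not a pymatgen class)."""

    species: Sequence[str]
    coords: Sequence[Vector]
    lattice: Optional[Sequence[Vector]] = None  # None means no periodic cell


def is_molecular(frame: Frame) -> bool:
    """The heuristic under discussion: no lattice implies a molecule."""
    return frame.lattice is None


def classify_trajectory(frames: Sequence[Frame]) -> str:
    """Return 'molecule' or 'structure'; refuse empty or mixed trajectories."""
    kinds = {is_molecular(frame) for frame in frames}
    if not kinds:
        raise ValueError("Trajectory has no frames.")
    if len(kinds) > 1:
        raise ValueError("Trajectory mixes periodic and non-periodic frames.")
    return "molecule" if kinds.pop() else "structure"
```

One consequence of relying on this check is that a trajectory must be all-periodic or all-non-periodic, so the sketch rejects mixed input rather than guessing.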
@espottesmith One more question. Do we need the new |
Summary
We are close to adding a large dataset of molecules and molecular properties to the Materials Project. This PR adds some features to pymatgen that will be useful for the molecules data pipeline in emmet, namely graph hashes and a Trajectory class that works for Molecules.
The graph hashing code, which is currently in emmet but is being moved to pymatgen because we expect it to be useful for more general users (rather than only developers), was taken (following licenses, I hope) from networkx. We took the code directly because we want to have access to a stable version of the hashing algorithm (networkx has made subtle changes to the algorithm in the past, leading to the same graphs producing different hashes).
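For readers unfamiliar with graph hashing, here is a simplified sketch of the idea, assuming the algorithm in question is the Weisfeiler-Lehman graph hash that networkx provides. This is a standard-library illustration, not the vendored code, and its output is not expected to match networkx's hashes. The function names and the edge-list input format are assumptions made for the example, and it assumes simple undirected graphs without repeated edges.

```python
import hashlib
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Edge = Tuple[Hashable, Hashable]


def _digest(text: str, size: int = 16) -> str:
    """Hash a string to a fixed-length hex digest."""
    return hashlib.blake2b(text.encode("utf-8"), digest_size=size).hexdigest()


def wl_graph_hash(
    edges: Iterable[Edge],
    node_labels: Optional[Dict[Hashable, str]] = None,
    iterations: int = 3,
) -> str:
    """Weisfeiler-Lehman-style hash of an undirected graph (illustrative only)."""
    node_labels = node_labels or {}
    adjacency: Dict[Hashable, List[Hashable]] = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    for node in node_labels:
        adjacency.setdefault(node, [])  # keep isolated labelled nodes

    # Start from the given label (e.g. an element symbol) or the node degree.
    labels = {n: str(node_labels.get(n, len(nbrs))) for n, nbrs in adjacency.items()}

    history: List[Tuple[str, int]] = []
    for _ in range(iterations):
        # Each node's new label depends on its own label and its neighbours' labels.
        labels = {
            n: _digest(labels[n] + "".join(sorted(labels[m] for m in adjacency[n])))
            for n in adjacency
        }
        # Record the label counts; sorting removes any dependence on node order.
        history.extend(sorted(Counter(labels.values()).items()))
    return _digest(str(tuple(history)))
```

Because the result depends only on connectivity and labels, renumbering the atoms of the same molecule graph gives the same hash. It also shows why a pinned copy matters: changing any detail, such as the digest function or how labels are concatenated, changes every hash produced.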
The MoleculeOptimizeTrajectory class is basically a copy-paste of the existing Trajectory class, designed for Molecule objects rather than Structure objects. This will be important if we include geometry optimization trajectories in the new MP dataset, and/or if we want to visualize atomic motion (e.g. molecular normal modes of vibration).
Checklist
mypy path/to/file.py
to type check your code. Note that the CI system will run all the above checks. But it will be much more efficient if you already fix most errors prior to submitting the PR. We highly recommend installing
pre-commit
hooks. Simply run