A builder for classical md workflows #1010

Closed
wants to merge 134 commits into from
134 commits
4954d18
Add test files.
orionarcher Mar 30, 2024
7081d9d
Add in tests systems for classical_md builders and README files for d…
orionarcher Mar 31, 2024
e3761ad
Missed dcd in previous commit
orionarcher Mar 31, 2024
a93fb46
Remove unused test files.
orionarcher Mar 31, 2024
a9f9a4b
Add basic fixtures and new folders for utilities.
orionarcher Apr 1, 2024
3d2f528
Add functionality for labeling universe and skeleton tests.
orionarcher Apr 1, 2024
2ca6164
Move classical md schema to tasks.py in classical_md directory. Add i…
orionarcher Apr 10, 2024
b736e21
Move openmm schema to openmm/tasks.py in classical_md directory.
orionarcher Apr 10, 2024
7b2857f
Add utility functions and testing.
orionarcher Apr 10, 2024
2344ce6
Add core task doc and solvation builder.
orionarcher Apr 10, 2024
4fa1146
Remove import from atomate2
orionarcher Apr 10, 2024
bc83930
Create HexBytes type for serializing blobs and make interchange and t…
orionarcher Apr 10, 2024
1f19f9e
Check that dcd exists before loading.
orionarcher Apr 10, 2024
6ee39af
Merge branch 'refs/heads/main' into schema_updates
orionarcher Apr 10, 2024
3989a5d
Merge branch 'refs/heads/md_builders' into schema_updates
orionarcher Apr 10, 2024
5dee8cf
Small linting fix.
orionarcher Apr 10, 2024
fdae534
Merge branch 'refs/heads/only_schemas' into md_builders
orionarcher Apr 11, 2024
5409617
Fix file name being set even if it doesn't exist.
orionarcher Apr 12, 2024
b934321
Add nearly complete draft of OpenMMBuilder.
orionarcher Apr 14, 2024
215524e
Add basic end to end test for openmm builder
orionarcher Apr 14, 2024
00854c8
Add json documents serializing stores for builder tests.
orionarcher Apr 14, 2024
d0a90e2
Remove atomate2 import
orionarcher Apr 14, 2024
dc8b42a
Add init files to create package.
orionarcher Apr 14, 2024
d71504c
Recreate and move stores
orionarcher Apr 14, 2024
ee7da72
Finish process_items and rename builder to ElectrolyteBuilder
orionarcher Apr 14, 2024
fac001b
Add Na+ and Br- to test system.
orionarcher Apr 14, 2024
cdf29a8
Add identify_solute and identify_networking_solvents functions and up…
orionarcher Apr 14, 2024
1513c34
Finish process_items logic and add update_targets.
orionarcher Apr 14, 2024
d8cd276
Overwrite source keys without warning in ElectrolyteBuilder
orionarcher Apr 14, 2024
9c00065
Merge remote-tracking branch 'refs/remotes/origin/main' into md_builders
orionarcher Apr 14, 2024
2cdceea
Expect different state files from each job in calc output from_directory
orionarcher Apr 16, 2024
4c37aae
Small refactor and add instantiate)solute method + testing
orionarcher Apr 16, 2024
6bea930
Fix loading of pd.Dataframes in SolvationDoc with custom DataFrame type.
orionarcher Apr 16, 2024
d785ec7
Fix error in identify_networking_solvents
orionarcher Apr 16, 2024
d834d38
Automatically create directory when instantiating universe.
orionarcher Apr 19, 2024
5e78d3c
Allow instantiate universe to avoid downloading files.
orionarcher May 8, 2024
84da34e
Different implementation of default kernel kwargs
orionarcher May 8, 2024
da223e1
Add new EC/EMC test system.
orionarcher May 8, 2024
5608035
Exclude solute from Solute.solvents by default.
orionarcher May 13, 2024
349ecb6
Remove unused calculations file.
orionarcher May 28, 2024
a601cb8
Add ability to select analysis_classes to create_solute.
orionarcher May 28, 2024
80e770a
Add CalculationsDoc for tracking workflow history.
orionarcher May 28, 2024
3f9331b
Add high level import slot.
orionarcher May 28, 2024
147e4b7
Merge branch 'refs/heads/main' into md_builders
orionarcher May 28, 2024
3662b52
Linting.
orionarcher May 28, 2024
ced8a9c
delete unused system.
orionarcher Jun 2, 2024
523f8f5
Update ElectrolyteBuilder to accept no input for solute and calculati…
orionarcher Jun 2, 2024
99f304f
Update SolvationDoc to not require all analysis classes.
orionarcher Jun 2, 2024
d35cca5
By default, do not calculate residence times when creating solute
orionarcher Jun 2, 2024
318a5d9
Add tests and link to correct test files.
orionarcher Jun 2, 2024
0394ec1
Add improved documentation.
orionarcher Jun 2, 2024
78a6707
Merge branch 'refs/heads/main' into md_builders
orionarcher Jun 2, 2024
098747b
Update requirements
orionarcher Jun 2, 2024
d773619
Add new requirements file to CI.
orionarcher Jun 2, 2024
574e6e6
Fix typo in testing.yml
orionarcher Jun 2, 2024
3c927be
Move pip installable requirements to emmet-core setup.py.
orionarcher Jun 3, 2024
13821c4
Move conda requirements out of their own file.
orionarcher Jun 3, 2024
23136ac
mdanalysis -> MDAnalysis
orionarcher Jun 3, 2024
4bf914f
Fix test and mypy issue.
orionarcher Jun 3, 2024
fdd58db
Fix mypy issue.
orionarcher Jun 3, 2024
a3b86e5
Fix some dependencies
orionarcher Jun 4, 2024
a4d724b
Update deployment.txt
orionarcher Jun 4, 2024
1a826f6
Merge branch 'refs/heads/main' into md_builders
orionarcher Jun 4, 2024
e203076
Update python 3.9 extras
orionarcher Jun 4, 2024
ec90301
Merge branch 'main' into md_builders
tschaume Jun 6, 2024
97dff89
Update testing.yml
tschaume Jun 6, 2024
7917c3f
Merge remote-tracking branch 'personal/md_builders' into md_builders
orionarcher Jun 7, 2024
abf04d6
Merge branch 'refs/heads/main' into md_builders
orionarcher Jun 8, 2024
8f21a85
Update union types to use explicit type expressions for 3.9 compatabi…
orionarcher Jun 8, 2024
c02506f
Fix another type issue.
orionarcher Jun 8, 2024
e5c847e
Fix another type issue.
orionarcher Jun 8, 2024
0d24212
Fix another type issue.
orionarcher Jun 8, 2024
2475fcb
Rerun tests, transitory error.
orionarcher Jun 8, 2024
858608f
Try another type fix.
orionarcher Jun 8, 2024
bde99cf
Remove redundant dependencies from emmet-builders.
orionarcher Jun 16, 2024
f7a653b
Merge remote-tracking branch 'origin/main' into md_builders
orionarcher Jun 16, 2024
b9f9ca7
Add transport analysis requirement.
orionarcher Jul 20, 2024
139e033
Remove residence times from solute analysis class defaults.
orionarcher Jul 20, 2024
de6ed44
Create a SolventBenchmarkingDoc with density, viscosity, and dielectr…
orionarcher Jul 20, 2024
b5d6dbb
Update blobs and docs stores in test files
orionarcher Jul 20, 2024
9ed2363
Update blobs and docs stores in test files
orionarcher Jul 20, 2024
382ac22
Delete duplicate props entry
orionarcher Jul 20, 2024
4e19935
Split insert_blobs and instantiate_universe out of ElectrolyteBuilder…
orionarcher Jul 20, 2024
ca47463
Add a SolventBenchmarkingBuilder and corresponding testing.
orionarcher Jul 20, 2024
c32900e
Add a FauxInterchange object
orionarcher Jul 24, 2024
bf8f937
Merge branch 'refs/heads/main' into md_builders
orionarcher Jul 24, 2024
e1249d0
bug warning, unfixed
orionarcher Jul 29, 2024
3c2c2dc
replace online simulation data with locally generated data
orionarcher Jul 29, 2024
778cf45
Add test for elyte builder
orionarcher Jul 29, 2024
cb4e41d
Add READMEs for autogenerated test data
orionarcher Jul 29, 2024
e807934
Update stores
orionarcher Jul 29, 2024
57c4a35
Delete conftest
orionarcher Jul 29, 2024
7154de7
Change FauxInterchange to OpenMMInterchange
orionarcher Jul 29, 2024
ffe077b
Add more detailed viscosity data to BM builder
orionarcher Jul 29, 2024
af00afe
Remove classical_md directory and rename it openmm and openff, as app…
orionarcher Jul 29, 2024
9f9c2e7
Remove classical_md directory and rename it openmm and openff, as app…
orionarcher Jul 29, 2024
35af578
Add job_uuids to classical md task doc
orionarcher Jul 30, 2024
9c6e015
Add option to rebase traj path in builder if jobs have been moved.
orionarcher Jul 30, 2024
b5e7dfe
Delete commented out code.
orionarcher Jul 30, 2024
0831565
Refactor testing and regenerate test data to use OpenMMFlowMaker and …
orionarcher Jul 30, 2024
1b69baa
Merge branch 'refs/heads/main' into md_builders
orionarcher Jul 30, 2024
c3dd490
Add xml files with opls params
orionarcher Jul 30, 2024
ecee5ed
Refactor data generating fixture
orionarcher Jul 30, 2024
9b6be43
Add multiple interchange parsing in solvation builder
orionarcher Jul 30, 2024
4119535
OPLS data generating fixture
orionarcher Jul 30, 2024
3f0989e
Refactor SolventBuilder and BenchmarkingBuilder to reuse code from so…
orionarcher Jul 31, 2024
2b8ea17
Modify create universe to accept interchange or openmminterchange
orionarcher Jul 31, 2024
2327937
PDBxFile to PDBFile
orionarcher Jul 31, 2024
15102e2
Finish OPLS test with benchmarking builder
orionarcher Jul 31, 2024
97e0d82
Adjsut prev_task to prev_dir and regenerate input files
orionarcher Jul 31, 2024
3805cfa
Update stores data
orionarcher Aug 11, 2024
cfca4ca
Merge branch 'refs/heads/main' into md_builders
Aug 11, 2024
9d359d9
Merge remote-tracking branch 'refs/remotes/personal/md_builders' into…
Aug 11, 2024
adb2105
Move openmm_md testing out of classical_md directory.
orionarcher Aug 11, 2024
8a9bd10
Update benchmarking doc
orionarcher Aug 11, 2024
212291d
allow benchmarking builder to work with local files.
orionarcher Aug 11, 2024
daae28f
Finish SolventBenchmarkingDoc and add more robust testing.
orionarcher Aug 18, 2024
1e6e016
Raise meaningful error and add comment
orionarcher Aug 18, 2024
08f0513
Merge branch 'refs/heads/main' into md_builders
orionarcher Aug 18, 2024
3fe9e52
Lint files
orionarcher Aug 18, 2024
bff872d
Rerun tests
orionarcher Aug 18, 2024
faae827
Test update targets in benchmark builder
orionarcher Aug 18, 2024
f9cc425
Add tags and run kwargs to benchmarking builder
orionarcher Aug 18, 2024
a5a772e
Manually update transport analysis version.
orionarcher Aug 18, 2024
bd901cb
Make sure tags are put in BM doc
orionarcher Aug 18, 2024
dcc697b
Make sure sure kwargs are put in BM doc
orionarcher Aug 18, 2024
ed69b8d
Fix mypy linting error
orionarcher Aug 18, 2024
04faec8
Split out MDTaskDoc from ClassicalMDTaskDoc and migrate name of MDTas…
orionarcher Aug 30, 2024
09bb51b
Update __init__
orionarcher Aug 30, 2024
70bc422
Run pre-commit
orionarcher Aug 30, 2024
cb3d5b9
Merge branch 'main' into md_builders
tschaume Sep 4, 2024
8226a83
Merge branch 'main' into md_builders
tschaume Sep 4, 2024
dee0aae
Run pre-commit
orionarcher Sep 4, 2024
522999c
Merge branch 'materialsproject:main' into md_builders
orionarcher Sep 5, 2024
2 changes: 1 addition & 1 deletion .github/workflows/testing.yml
@@ -30,7 +30,7 @@ jobs:
python-version: ${{ matrix.python-version }}
channels: anaconda, conda-forge

- - name: Install OpenBabel
+ - name: Install all conda requirements
shell: bash -l {0}
run: |
conda install openbabel "openff-toolkit>=0.14.0" "openff-interchange>=0.3.22" sqlite -y
1 change: 1 addition & 0 deletions emmet-builders/emmet/builders/openmm/__init__.py
@@ -0,0 +1 @@
from emmet.builders.openmm.core import ElectrolyteBuilder
361 changes: 361 additions & 0 deletions emmet-builders/emmet/builders/openmm/core.py
@@ -0,0 +1,361 @@
from typing import Optional, List, Union
from pathlib import Path

import numpy as np

from maggma.core import Builder, Store
from maggma.stores import MemoryStore
from emmet.builders.openmm.utils import (
create_solute,
identify_solute,
identify_networking_solvents,
)
from emmet.core.openff.solvation import SolvationDoc
from emmet.core.openff.benchmarking import SolventBenchmarkingDoc
from emmet.core.openmm import OpenMMTaskDocument
from emmet.core.openmm.calculations import CalculationsDoc
from emmet.core.utils import jsanitize

from emmet.builders.openmm.openmm_utils import (
insert_blobs,
instantiate_universe,
resolve_traj_path,
task_doc_to_universe,
)


class ElectrolyteBuilder(Builder):
    """
    Builder to create solvation and calculations documents from OpenMM task documents.

    This class processes molecular dynamics (MD) simulations and generates
    comprehensive reports including solvation properties and calculation results.
    It leverages the OpenFF toolkit and MDAnalysis for molecular topology and trajectory
    handling, respectively.
    """

    def __init__(
        self,
        md_docs: Store,
        blobs: Store,
        solute: Optional[Store] = None,
        calculations: Optional[Store] = None,
        query: Optional[dict] = None,
        solute_analysis_classes: Union[List[str], str] = "all",
        solvation_fallback_radius: float = 3,
        chunk_size: int = 10,
    ):
        self.md_docs = md_docs
        self.blobs = blobs
        self.solute = solute or MemoryStore()
        self.calculations = calculations or MemoryStore()
        self.query = query or {}
        self.solute_analysis_classes = solute_analysis_classes
        self.solvation_fallback_radius = solvation_fallback_radius

        self.md_docs.key = "uuid"
        self.blobs.key = "blob_uuid"
        if self.solute:
            self.solute.key = "job_uuid"
        if self.calculations:
            self.calculations.key = "job_uuid"

        super().__init__(
            sources=[md_docs, blobs],
            targets=[self.solute, self.calculations],
            chunk_size=chunk_size,
        )

    # def prechunk(self, number_splits: int):  # pragma: no cover
    #     """
    #     Prechunk method to perform chunking by the key field
    #     """
    #     q = dict(self.query)
    #     keys = self.electronic_structure.newer_in(
    #         self.materials, criteria=q, exhaustive=True
    #     )
    #     N = ceil(len(keys) / number_splits)
    #     for split in grouper(keys, N):
    #         yield {"query": {self.materials.key: {"$in": list(split)}}}

    def get_items(self, local_trajectories: bool = False):
        self.logger.info("Electrolyte builder started.")

        hosts = self.md_docs.query(self.query, ["hosts"])
        flow_ids = {doc["hosts"][-1] for doc in hosts}  # top level flows

        job_groups = []
        for flow_id in flow_ids:
            # the last item in hosts should be the top level workflow
            host_match = {"$expr": {"$eq": [{"$arrayElemAt": ["$hosts", -1]}, flow_id]}}
            job_groups.append(list(self.md_docs.query(criteria=host_match)))

        items = []
        for jobs in job_groups:
            # find the job with the most calcs in the flow, presumably the last
            len_calcs = [len(job["output"]["calcs_reversed"] or []) for job in jobs]
            last_job = jobs[np.argmax(len_calcs)]

            insert_blobs(
                self.blobs, last_job["output"], include_traj=not local_trajectories
            )

            items.append(last_job)

        return items

    def get_items_from_directories(self):
        # placeholder, not yet implemented; query will be ignored
        return

    def process_items(
        self,
        items: List,
        local_trajectories: bool = False,
        rebase_traj_path: Optional[tuple[Path, Path]] = None,
    ):
        """
        Build solvation and calculations documents from the selected jobs.

        Parameters
        ----------
        items: the items from get_items
        local_trajectories: whether to look for files locally in lieu of downloading
        rebase_traj_path: useful if the launch directory has moved

        Returns
        -------
        A list of dicts, one per item, with "solute" and "calculations" keys.
        """
        self.logger.info(f"Processing {len(items)} jobs for electrolyte builder.")

        processed_items = []
        for item in items:
            # create task_doc
            task_doc = OpenMMTaskDocument.model_validate(item["output"])

            # _ is needed bc traj_path may be a tmpfile and a reference must be in scope
            traj_path, _ = resolve_traj_path(
                task_doc, local_trajectories, rebase_traj_path
            )

            u = task_doc_to_universe(task_doc, traj_path)

            # create solute_doc
            solute = create_solute(
                u,
                solute_name=identify_solute(u),
                networking_solvents=identify_networking_solvents(u),
                fallback_radius=self.solvation_fallback_radius,
                analysis_classes=self.solute_analysis_classes,
            )
            solute.run()
            solvation_doc = SolvationDoc.from_solute(
                solute, job_uuid=item["uuid"], flow_uuid=item["hosts"][-1]
            )
            calculations_doc = CalculationsDoc.from_calcs_reversed(
                task_doc.calcs_reversed,
                job_uuid=item["uuid"],
                flow_uuid=item["hosts"][-1],
            )

            # create docs
            # TODO: what cleanup do I need?
            docs = {
                "solute": jsanitize(solvation_doc.model_dump()),
                "calculations": jsanitize(calculations_doc.model_dump()),
            }

            processed_items.append(docs)

        return processed_items

    def update_targets(self, items: List):
        if len(items) > 0:
            self.logger.info(f"Found {len(items)} electrolyte docs to update.")

            solutes = [item["solute"] for item in items]
            self.solute.update(solutes)

            calculations = [item["calculations"] for item in items]
            self.calculations.update(calculations)

        else:
            self.logger.info("No items to update.")

    def instantiate_universe(
        self,
        job_uuid: str,
        traj_directory: Union[str, Path] = ".",
        overwrite_local_traj: bool = True,
    ):
        """
        Instantiate an MDAnalysis universe from a task document.

        This is useful if you want to analyze a small number of systems
        without running the whole build pipeline.

        To get a solute, call create_solute using the universe. See
        the body of process_items for the appropriate syntax.

        Args:
            job_uuid: str
                The UUID of the job.
            traj_directory: str or Path
                Directory in which to write the DCD file.
            overwrite_local_traj: bool
                Whether to overwrite the local trajectory if it exists.
        """
        return instantiate_universe(
            self.md_docs, self.blobs, job_uuid, traj_directory, overwrite_local_traj
        )
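Review note on get_items: the job selection can be illustrated without a database. The sketch below is a plain-Python approximation of the grouping logic, not the builder's actual code path. It assumes each job document carries a "hosts" list whose last entry is the top-level flow, as the comments in get_items state. The function name select_last_jobs and the sample documents are hypothetical.

```python
from collections import defaultdict


def select_last_jobs(job_docs):
    """Group job docs by top-level flow and keep the job with the most calcs.

    Hypothetical helper that mirrors the selection in get_items in memory.
    """
    groups = defaultdict(list)
    for doc in job_docs:
        # the last entry in "hosts" is assumed to be the top-level flow
        groups[doc["hosts"][-1]].append(doc)

    last_jobs = {}
    for flow_id, jobs in groups.items():
        # max() keeps the first maximal element, matching np.argmax on ties
        last_jobs[flow_id] = max(
            jobs, key=lambda job: len(job["output"]["calcs_reversed"] or [])
        )
    return last_jobs


# made-up documents, for illustration only
job_docs = [
    {"uuid": "a", "hosts": ["sub1", "flow1"], "output": {"calcs_reversed": [1]}},
    {"uuid": "b", "hosts": ["sub2", "flow1"], "output": {"calcs_reversed": [1, 2, 3]}},
    {"uuid": "c", "hosts": ["flow2"], "output": {"calcs_reversed": None}},
]
selected = select_last_jobs(job_docs)
```

Grouping in memory avoids one query per flow, but it requires loading every matching job document at once, which may not suit large stores.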


class BenchmarkingBuilder(Builder):
    """
    Builder to create solvent benchmarking documents from OpenMM task documents.

    This class processes molecular dynamics (MD) simulations and generates
    a SolventBenchmarkingDoc for the selected job of each flow.
    It leverages the OpenFF toolkit and MDAnalysis for molecular topology and trajectory
    handling, respectively.
    """

    def __init__(
        self,
        md_docs: Store,
        blobs: Store,
        benchmarking: Optional[Store] = None,
        query: Optional[dict] = None,
        chunk_size: int = 10,
    ):
        self.md_docs = md_docs
        self.blobs = blobs
        self.benchmarking = benchmarking or MemoryStore()
        self.query = query or {}

        self.md_docs.key = "uuid"
        self.blobs.key = "blob_uuid"
        self.benchmarking.key = "job_uuid"

        super().__init__(
            sources=[md_docs, blobs],
            targets=[self.benchmarking],
            chunk_size=chunk_size,
        )

    # def prechunk(self, number_splits: int):  # pragma: no cover
    #     """
    #     Prechunk method to perform chunking by the key field
    #     """
    #     q = dict(self.query)
    #     keys = self.electronic_structure.newer_in(
    #         self.materials, criteria=q, exhaustive=True
    #     )
    #     N = ceil(len(keys) / number_splits)
    #     for split in grouper(keys, N):
    #         yield {"query": {self.materials.key: {"$in": list(split)}}}

    def get_items(self, local_trajectories: bool = False):
        self.logger.info("Benchmarking builder started.")

        hosts = self.md_docs.query(self.query, ["hosts"])
        flow_ids = {doc["hosts"][-1] for doc in hosts}  # top level flows

        job_groups = []
        for flow_id in flow_ids:
            # the last item in hosts should be the top level workflow
            host_match = {"$expr": {"$eq": [{"$arrayElemAt": ["$hosts", -1]}, flow_id]}}
            job_groups.append(list(self.md_docs.query(criteria=host_match)))

        items = []
        for jobs in job_groups:
            # find the job with the most calcs in the flow, presumably the last
            len_calcs = [len(job["output"]["calcs_reversed"] or []) for job in jobs]
            last_job = jobs[np.argmax(len_calcs)]

            insert_blobs(
                self.blobs, last_job["output"], include_traj=not local_trajectories
            )

            items.append(last_job)

        return items

    def get_items_from_directories(self):
        # placeholder, not yet implemented; query will be ignored
        return

    def process_items(
        self,
        items,
        local_trajectories: bool = False,
        rebase_traj_path: Optional[tuple[Path, Path]] = None,
        **benchmarking_kwargs,
    ):
        self.logger.info(f"Processing {len(items)} jobs for benchmarking builder.")

        processed_items = []
        for item in items:
            # create task_doc
            task_doc = OpenMMTaskDocument.model_validate(item["output"])

            # _ is needed bc traj_path may be a tmpfile and a reference must be in scope
            traj_path, _ = resolve_traj_path(
                task_doc, local_trajectories, rebase_traj_path
            )

            u = task_doc_to_universe(task_doc, traj_path)

            benchmarking_doc = SolventBenchmarkingDoc.from_universe(
                u,
                temperature=task_doc.calcs_reversed[0].input.temperature,
                density=task_doc.calcs_reversed[0].output.density[-1],
                job_uuid=item["uuid"],
                flow_uuid=item["hosts"][-1],
                tags=task_doc.tags,
                **benchmarking_kwargs,
            )

            del u

            docs = {
                "benchmarking": jsanitize(benchmarking_doc.model_dump()),
            }

            processed_items.append(docs)

        return processed_items

    def update_targets(self, items: List):
        if len(items) > 0:
            self.logger.info(f"Found {len(items)} benchmarking docs to update.")

            benchmarking_docs = [item["benchmarking"] for item in items]
            self.benchmarking.update(benchmarking_docs)

        else:
            self.logger.info("No items to update.")

    def instantiate_universe(
        self,
        job_uuid: str,
        traj_directory: Union[str, Path] = ".",
        overwrite_local_traj: bool = True,
    ):
        """
        Instantiate an MDAnalysis universe from a task document.

        This is useful if you want to analyze a small number of systems
        without running the whole build pipeline.

        Args:
            job_uuid: str
                The UUID of the job.
            traj_directory: str or Path
                Directory in which to write the DCD file.
            overwrite_local_traj: bool
                Whether to overwrite the local trajectory if it exists.
        """
        return instantiate_universe(
            self.md_docs, self.blobs, job_uuid, traj_directory, overwrite_local_traj
        )
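Review note on rebase_traj_path: the diff does not show resolve_traj_path, so the following is only a guess at what rebasing a trajectory path could look like. It assumes the tuple is ordered (old root, new root), which the docstring does not confirm. The helper name rebase_path and the example paths are hypothetical. PurePosixPath is used so the example behaves the same on any platform.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def rebase_path(
    path: PurePosixPath,
    rebase: Optional[Tuple[PurePosixPath, PurePosixPath]] = None,
) -> PurePosixPath:
    """Swap the old root of `path` for a new root (hypothetical helper)."""
    if rebase is None:
        return path
    # assumption: the tuple is ordered (old_root, new_root)
    old_root, new_root = rebase
    # relative_to raises ValueError if path does not lie under old_root
    return new_root / path.relative_to(old_root)


# made-up paths, for illustration only
moved = rebase_path(
    PurePosixPath("/scratch/run1/job_0/trajectory.dcd"),
    (PurePosixPath("/scratch/run1"), PurePosixPath("/archive/md")),
)
```

Letting relative_to raise on a mismatched root surfaces a wrong rebase argument immediately instead of silently producing a path that does not exist.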