From b3ea8d3980d62032ed33ae75fb0c7539fafcebe3 Mon Sep 17 00:00:00 2001
From: Janosh Riebesell
Date: Fri, 14 Apr 2023 07:33:43 -0700
Subject: [PATCH] tweak contributing guide

---
 models/chgnet/test_chgnet.py        |  4 ++--
 readme.md                           |  6 +++---
 scripts/compile_metrics.py          |  1 -
 site/package.json                   |  2 +-
 site/src/routes/contribute/+page.md | 36 ++++++++++++++++++------------------
 site/src/routes/preprint/+page.md   |  2 +-
 6 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/models/chgnet/test_chgnet.py b/models/chgnet/test_chgnet.py
index 94dd3b09..4fde72c7 100644
--- a/models/chgnet/test_chgnet.py
+++ b/models/chgnet/test_chgnet.py
@@ -1,6 +1,6 @@
-"""Get chgnet formation energy predictions on WBM test set.
+"""Get CHGNet formation energy predictions on WBM test set.
 To slurm submit this file: python path/to/file.py slurm-submit
-Requires git cloning and then pip installing chgnet from source:
+Requires git cloning and then pip installing CHGNet from source:
 git clone https://github.com/CederGroupHub/chgnet
 pip install -e ./chgnet.
 """
diff --git a/readme.md b/readme.md
index b2468a04..82d6d6c3 100644
--- a/readme.md
+++ b/readme.md
@@ -15,14 +15,14 @@

 > TL;DR: We benchmark ML models on crystal stability prediction from unrelaxed structures, finding interatomic potentials in particular to be a valuable addition to high-throughput discovery pipelines.

-Matbench Discovery is an [interactive leaderboard](https://janosh.github.io/matbench-discovery) and associated [PyPI package](https://pypi.org/project/matbench-discovery) which together make it easy to benchmark ML energy models on a task designed to closely simulate a high-throughput discovery campaign for new stable inorganic crystals.
+Matbench Discovery is an [interactive leaderboard](https://janosh.github.io/matbench-discovery/models) and associated [PyPI package](https://pypi.org/project/matbench-discovery) which together make it easy to benchmark ML energy models on a task designed to closely simulate a high-throughput discovery campaign for new stable inorganic crystals.

 So far, we've tested 8 models covering multiple methodologies ranging from random forests with structure fingerprints to graph neural networks, from one-shot predictors to iterative Bayesian optimizers and interatomic potential-based relaxers. We find [CHGNet](https://github.com/CederGroupHub/chgnet) ([paper](https://doi.org/10.48550/arXiv.2302.14231)) to achieve the highest F1 score of 0.59, $R^2$ of 0.61 and a discovery acceleration factor (DAF) of 3.06 (meaning a 3x higher rate of stable structures compared to dummy selection in our already enriched search space).

 We believe our results show that ML models have become robust enough to deploy them as triaging steps to more effectively allocate compute in high-throughput DFT relaxations. This work provides valuable insights for anyone looking to build large-scale materials databases.

-We welcome contributions that add new models to the leaderboard through [GitHub PRs](https://github.com/janosh/matbench-discovery/pulls). See the [usage and contributing guide](https://janosh.github.io/matbench-discovery/contribute) for details.
+We welcome contributions that add new models to the leaderboard through GitHub PRs. See the [contributing guide](https://janosh.github.io/matbench-discovery/contribute) for details.
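To make the DAF quoted above concrete: it is the model's hit rate among the materials it flags as stable, divided by the hit rate of random selection from the same pool. A toy calculation in Python (all numbers below are made up for illustration; only the DAF of 3.06 cited above comes from the actual benchmark):

```py
# toy numbers for illustration, not actual benchmark data
n_stable, n_total = 33_000, 257_000  # assumed count of truly stable crystals in the pool
prevalence = n_stable / n_total  # hit rate of dummy (random) selection

n_flagged, n_hits = 10_000, 3_900  # model flags 10k candidates, of which 3.9k prove stable
precision = n_hits / n_flagged  # hit rate of model-guided selection

daf = precision / prevalence  # ratio of hit rates: ~3x more discoveries per DFT relaxation
print(f"{daf=:.2f}")  # daf=3.04
```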
 Anyone interested in joining this effort, please [open a GitHub discussion](https://github.com/janosh/matbench-discovery/discussions) or [reach out privately](mailto:janosh@lbl.gov?subject=Matbench%20Discovery).

-For detailed results and analysis, check out the [preprint](https://janosh.github.io/matbench-discovery/preprint) and [supplementary material](https://janosh.github.io/matbench-discovery/si).
+For detailed results and analysis, check out our [preprint](https://janosh.github.io/matbench-discovery/preprint) and [SI](https://janosh.github.io/matbench-discovery/si).
diff --git a/scripts/compile_metrics.py b/scripts/compile_metrics.py
index aa71cc6d..96d583fc 100644
--- a/scripts/compile_metrics.py
+++ b/scripts/compile_metrics.py
@@ -129,7 +129,6 @@
 dummy_clf_preds = dummy_clf.predict(np.zeros(len(df_wbm)))
 true_clf = df_wbm[each_true_col] < 0
 each_true = df_wbm[each_true_col]
-pd.Series(dummy_clf_preds).value_counts()
 dummy_metrics = stable_metrics(
     each_true, np.array([1, -1])[dummy_clf_preds.astype(int)]
 )
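A side note on the `compile_metrics.py` hunk above: the `np.array([1, -1])[dummy_clf_preds.astype(int)]` expression uses fancy indexing to turn boolean dummy-classifier output into surrogate energy values that `stable_metrics` can score like real predictions. A standalone sketch of just that trick:

```py
import numpy as np

dummy_clf_preds = np.array([True, False, True])  # stand-in for dummy_clf.predict(...)

# False -> index 0 -> +1 (predicted unstable), True -> index 1 -> -1 (predicted stable)
fake_energies = np.array([1, -1])[dummy_clf_preds.astype(int)]
print(fake_energies)  # [-1  1 -1]
```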
diff --git a/site/package.json b/site/package.json
index bbee6069..e8a8d6fb 100644
--- a/site/package.json
+++ b/site/package.json
@@ -19,7 +19,7 @@
     "@iconify/svelte": "^3.1.1",
     "@rollup/plugin-yaml": "^4.0.1",
     "@sveltejs/adapter-static": "^2.0.2",
-    "@sveltejs/kit": "^1.15.4",
+    "@sveltejs/kit": "^1.15.5",
     "@sveltejs/vite-plugin-svelte": "^2.0.4",
     "@typescript-eslint/eslint-plugin": "^5.58.0",
     "@typescript-eslint/parser": "^5.58.0",
diff --git a/site/src/routes/contribute/+page.md b/site/src/routes/contribute/+page.md
index f1d74250..62ff39d3 100644
--- a/site/src/routes/contribute/+page.md
+++ b/site/src/routes/contribute/+page.md
@@ -6,7 +6,7 @@
 ## 🔨   Installation

-The recommended way to acquire the training and test sets for this benchmark is through its Python package [available on PyPI](https://pypi.org/project/{name}):
+The recommended way to acquire the training and test sets for this benchmark is through our Python package [available on PyPI](https://pypi.org/project/{name}):

 ```zsh
 pip install matbench-discovery
 ```
@@ -53,27 +53,27 @@ assert list(df_wbm) == [
 ]
 ```

-`"wbm-summary"` column glossary:
-
-1. `formula`: A compound's unreduced alphabetical formula
-1. `n_sites`: Number of sites in the structure's unit cell
-1. `volume`: Relaxed structure volume in cubic Angstrom
-1. `uncorrected_energy`: Raw VASP-computed energy
-1. `e_form_per_atom_wbm`: Original formation energy per atom from [WBM paper]
-1. `e_hull_wbm`: Original energy above the convex hull in (eV/atom) from [WBM paper]
-1. `bandgap_pbe`: PBE-level DFT band gap from [WBM paper]
-1. `uncorrected_energy_from_cse`: Should be the same as `uncorrected_energy`. There are 2 cases where the absolute difference reported in the summary file and in the computed structure entries exceeds 0.1 eV (`wbm-2-3218`, `wbm-1-56320`) which we attribute to rounding errors.
-1. `e_form_per_atom_mp2020_corrected`: Matbench Discovery takes these as ground truth for the formation energy. Includes MP2020 energy corrections (latest correction scheme at time of release).
-1. `e_correction_per_atom_mp2020`: [`MaterialsProject2020Compatibility`](https://pymatgen.org/pymatgen.entries.compatibility.html#pymatgen.entries.compatibility.MaterialsProject2020Compatibility) energy corrections in eV/atom.
-1. `e_correction_per_atom_mp_legacy`: Legacy [`MaterialsProjectCompatibility`](https://pymatgen.org/pymatgen.entries.compatibility.html#pymatgen.entries.compatibility.MaterialsProjectCompatibility) energy corrections in eV/atom. Having both old and new corrections allows updating predictions from older models like MEGNet that were trained on MP formation energies treated with the old correction scheme.
-1. `e_above_hull_mp2020_corrected_ppd_mp`: Energy above hull distances in eV/atom after applying the MP2020 correction scheme. The convex hull in question is the one spanned by all ~145k Materials Project `ComputedStructureEntries`. Matbench Discovery takes these as ground truth for material stability. Any value above 0 is assumed to be an unstable/metastable material.
+`"wbm-summary"` columns:
+
+1. **`formula`**: A compound's unreduced alphabetical formula
+1. **`n_sites`**: Number of sites in the structure's unit cell
+1. **`volume`**: Relaxed structure volume in cubic Angstrom
+1. **`uncorrected_energy`**: Raw VASP-computed energy
+1. **`e_form_per_atom_wbm`**: Original formation energy per atom from [WBM paper]
+1. **`e_hull_wbm`**: Original energy above the convex hull in eV/atom from [WBM paper]
+1. **`bandgap_pbe`**: PBE-level DFT band gap from [WBM paper]
+1. **`uncorrected_energy_from_cse`**: Should be the same as `uncorrected_energy`. There are 2 cases where the absolute difference reported in the summary file and in the computed structure entries exceeds 0.1 eV (`wbm-2-3218`, `wbm-1-56320`), which we attribute to rounding errors.
+1. **`e_form_per_atom_mp2020_corrected`**: Matbench Discovery takes these as ground truth for the formation energy. Includes MP2020 energy corrections (the latest correction scheme at time of release).
+1. **`e_correction_per_atom_mp2020`**: [`MaterialsProject2020Compatibility`](https://pymatgen.org/pymatgen.entries.compatibility.html#pymatgen.entries.compatibility.MaterialsProject2020Compatibility) energy corrections in eV/atom.
+1. **`e_correction_per_atom_mp_legacy`**: Legacy [`MaterialsProjectCompatibility`](https://pymatgen.org/pymatgen.entries.compatibility.html#pymatgen.entries.compatibility.MaterialsProjectCompatibility) energy corrections in eV/atom. Having both old and new corrections allows updating predictions from older models like MEGNet that were trained on MP formation energies treated with the old correction scheme.
+1. **`e_above_hull_mp2020_corrected_ppd_mp`**: Energy above hull distances in eV/atom after applying the MP2020 correction scheme. The convex hull in question is the one spanned by all ~145k Materials Project `ComputedStructureEntries`. Matbench Discovery takes these as ground truth for material stability. Any value above 0 is assumed to be an unstable/metastable material.

 ## 📥   Direct Download

 You can also download the data files directly from GitHub:

-1. [`2022-10-19-wbm-summary.csv`]({repo}/blob/-/data/wbm/2022-10-19-wbm-summary.csv): Computed material properties only, no structures. Available properties are VASP energy, formation energy, energy above the convex hull, volume, band gap, number of sites per unit cell, and more. e_form_per_atom and e_above_hull each have 3 separate columns for old, new and no Materials Project energy corrections.
+1. [`2022-10-19-wbm-summary.csv`]({repo}/blob/-/data/wbm/2022-10-19-wbm-summary.csv): Computed material properties only, no structures. Available properties are VASP energy, formation energy, energy above the convex hull, volume, band gap, number of sites per unit cell, and more.
 1. [`2022-10-19-wbm-init-structs.json`]({repo}/blob/-/data/wbm/2022-10-19-wbm-init-structs.json): Unrelaxed WBM structures
 1. [`2022-10-19-wbm-cses.json`]({repo}/blob/-/data/wbm/2022-10-19-wbm-cses.json): Relaxed WBM structures along with final VASP energies
 1. [`2023-01-10-mp-energies.json.gz`]({repo}/blob/-/data/mp/2023-01-10-mp-energies.json.gz): Materials Project formation energies and energies above convex hull
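Tying the glossary to the download links: a minimal sketch of loading the summary file with pandas and deriving stability labels. It assumes the CSV's index column is named `material_id`, which is not shown in the excerpt above:

```py
import pandas as pd

# path assumes you downloaded 2022-10-19-wbm-summary.csv as linked above
df_wbm = pd.read_csv("2022-10-19-wbm-summary.csv", index_col="material_id")

# per the glossary, any e_above_hull value above 0 counts as unstable/metastable
e_above_hull = df_wbm["e_above_hull_mp2020_corrected_ppd_mp"]
df_wbm["is_stable"] = e_above_hull <= 0
print(df_wbm.is_stable.value_counts())
```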
@@ -87,7 +87,7 @@

 To deploy a new model on this benchmark and add it to our leaderboard, please create a pull request to the `main` branch of [{repo}]({repo}) that includes at least these 3 required files:

-1. `<yyyy-mm-dd>-<model_name>-preds.(json|csv).gz`: Your model's energy predictions for all ~250k WBM compounds as compressed JSON or CSV. The recommended way to create this file is with `pandas.DataFrame.to_{json|csv}('<yyyy-mm-dd>-<model_name>-preds.(json|csv).gz')`. JSON is preferred over CSV if your model not only predicts energies (floats) but also Python objects like e.g. pseudo-relaxed structures (see the M3GNet and BOWSR test scripts).
+1. `<yyyy-mm-dd>-<model_name>-preds.(json|csv).gz`: Your model's energy predictions for all ~250k WBM compounds as compressed JSON or CSV. The recommended way to create this file is with `pandas.DataFrame.to_{json|csv}('<yyyy-mm-dd>-<model_name>-preds.(json|csv).gz')`. JSON is preferred over CSV if your model not only predicts energies (floats) but also objects like relaxed structures. See e.g. the [M3GNet]({repo}/blob/-/models/m3gnet/test_m3gnet.py) and [CHGNet]({repo}/blob/-/models/chgnet/test_chgnet.py) test scripts.
 1. `test_<model_name>.(py|ipynb)`: The Python script or Jupyter notebook that generated the energy predictions. Ideally, this file should have comments explaining at a high level what the code is doing and how the model works so others can understand and reproduce your results. If the model deployed on this benchmark was trained specifically for this purpose (i.e. if you wrote any training/fine-tuning code while preparing your PR), please also include it as `train_<model_name>.(py|ipynb)`.
 1. `metadata.yml`: A file to record all relevant metadata of your algorithm like model name and version, authors, package requirements, relevant citations/links to publications, notes, etc. Here's a template:

@@ -181,7 +181,7 @@ And you're done! Once tests pass and the PR is merged, your model will be added

 - the exact code in the script that launched the run, and
 - which versions of dependencies were installed in the environment your model ran in.

-This information can be very useful for someone looking to reproduce your results or compare their model to yours i.t.o. computational cost. We therefore strongly recommend tracking all runs that went into a model submission to Matbench Discovery with WandB so that the runs can be copied over to our WandB project at for everyone to inspect. This also allows us to include your model in more detailed analysis found in the [SI]({homepage}/si).
+This information can be useful for others looking to reproduce your results or compare their model to yours in terms of computational cost. We therefore strongly recommend tracking all runs that went into a model submission with WandB so that the runs can be copied over to our WandB project for everyone to inspect. This also allows us to include your model in some of the more detailed analysis found in the [SI]({homepage}/si).
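To illustrate the first required file, here's a sketch of writing predictions with `pandas.DataFrame.to_csv` as recommended above. The material IDs, values and file name are placeholders:

```py
import pandas as pd

# placeholder predictions mapping material_id -> predicted formation energy (eV/atom)
preds = {"wbm-1-1": -0.52, "wbm-1-2": 0.17, "wbm-1-3": 0.03}

df_preds = pd.DataFrame({"e_form_per_atom_pred": preds})
df_preds.index.name = "material_id"

# compression is inferred from the .gz suffix; prefer to_json() if you also
# store non-float objects like relaxed structures
df_preds.to_csv("2023-04-14-my-model-preds.csv.gz")
```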
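And a minimal sketch of the WandB tracking recommended above (the project name and config keys are illustrative assumptions, not prescribed by the benchmark):

```py
import wandb

run = wandb.init(
    project="matbench-discovery",  # assumed project name, for illustration
    config={"model": "my-model", "version": "0.1.0"},  # record versions/hyperparams
)
wandb.log({"epoch": 1, "loss": 0.42})  # log whatever your train/test script produces
run.finish()
```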
 ## 😵‍💫   Troubleshooting
diff --git a/site/src/routes/preprint/+page.md b/site/src/routes/preprint/+page.md
index 013f5a92..705c4cd3 100644
--- a/site/src/routes/preprint/+page.md
+++ b/site/src/routes/preprint/+page.md
@@ -164,7 +164,7 @@ The reason CGCNN+P achieves better regression metrics than CGCNN but is still wo

 {/if}

-> @label:fig:cumulative-clf-metrics Running precision and recall over the course of a simulated discovery campaign. This figure highlights how different models perform better or worse depending on the length of the discovery campaign. Length here is an integer measuring how many DFT relaxations you have compute budget for.
+> @label:fig:cumulative-clf-metrics Cumulative precision and recall over the course of a simulated discovery campaign. This figure highlights how different models perform better or worse depending on the length of the discovery campaign. Length here is the number of DFT relaxations you have compute budget for.

 @Fig:cumulative-clf-metrics simulates ranking materials from most to least stable according to model predictions and going down the list, calculating the precision and recall of correctly identified stable materials at each step, i.e. exactly how these models could be used in a prospective materials discovery campaign.
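For readers who want to reproduce the gist of that cumulative precision/recall figure, here's a self-contained sketch on synthetic data. A real run would substitute model predictions and the DFT hull distances from the WBM summary file:

```py
import numpy as np

rng = np.random.default_rng(0)
e_above_hull_true = rng.normal(0.1, 0.2, size=1000)  # synthetic DFT hull distances (eV/atom)
e_above_hull_pred = e_above_hull_true + rng.normal(0, 0.1, size=1000)  # noisy fake "model"

is_stable = e_above_hull_true <= 0
ranking = np.argsort(e_above_hull_pred)  # most to least stable according to the model

hits = np.cumsum(is_stable[ranking])  # true stable materials found after n relaxations
n_screened = np.arange(1, len(ranking) + 1)
precision, recall = hits / n_screened, hits / is_stable.sum()
print(precision[:5], recall[-1])  # recall reaches 1.0 once the full list is screened
```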