Add ALIGNN FF (aborted) #47
Conversation
- with minor tweaks to pbenner's original alignn_relax.py
- with parallelization recommendations based on available cores
- uses ALIGNN FF relaxed structures generated by alignn_ff_relax.py
models/alignn_ff/metadata.yml (outdated):

```yaml
doi: https://doi.org/10.1039/D2DD00096B
preprint: https://arxiv.org/abs/2209.05554
requirements:
  ase: 3.22.0
```
@pbenner Could you check if these package version numbers are correct (i.e. that they match the ones you were using for this submission)?
Yes, the versions are correct, except for ase -> 3.22.1
models/alignn_ff/metadata.yml (outdated):

```yaml
  pandas: 2.0.1
  scikit-learn: 1.2.2
  torch: 1.9.0+cu111
trained_for_benchmark: false
```
@pbenner Just to clarify, you used alignnff_wt10?
Exactly, it was called best_model.pt before:
alignn/ff/best_model.pt → alignn/ff/alignnff_wt10/best_model.pt
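For reference, a minimal sketch of loading that checkpoint as an ASE calculator through alignn's FF entry point (whether `default_path()` resolves to the `alignnff_wt10` directory depends on the installed alignn version, so treat that as an assumption):

```python
# Hedged sketch: load the ALIGNN-FF checkpoint as an ASE calculator.
from alignn.ff.ff import AlignnAtomwiseCalculator, default_path

model_dir = default_path()  # assumed to resolve to the alignnff_wt10 directory here
calc = AlignnAtomwiseCalculator(path=model_dir)
# attach to an ase.Atoms object: atoms.calc = calc; atoms.get_potential_energy()
```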
Not sure if it makes sense to test some of the newly added models. I can give it a try, or at least check if the convergence is better.
models/alignn_ff/alignn_ff_relax.py (outdated):

```python
import numpy as np
import pandas as pd
from pqdm.processes import pqdm
```
Can we work around this new dependency? I'm not sure how it differs from tqdm. Is tqdm lacking parallel process support?
Yes, you can just use tqdm. For parallel processing pqdm has a bit nicer output, that's all.
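For illustration, a hedged sketch of what the swap might look like (`relax_one` is a stand-in for the per-structure relaxation in alignn_ff_relax.py, not its actual function name):

```python
# Hedged sketch: replace pqdm with tqdm over a multiprocessing pool.
from multiprocessing import Pool

from tqdm import tqdm

def relax_one(task: int) -> int:
    return task  # placeholder for the real ALIGNN FF relaxation work

if __name__ == "__main__":
    tasks = list(range(100))
    # pqdm equivalent: results = pqdm(tasks, relax_one, n_jobs=4)
    with Pool(processes=4) as pool:
        results = list(tqdm(pool.imap(relax_one, tasks), total=len(tasks)))
```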
@pbenner I started evaluating the ALIGNN FF predictions and they look bad enough that I think we may have made a mistake. Are we sure ALIGNN FF was trained on the MP legacy correction scheme (…)?
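For context, a hedged sketch of where the two MP correction schemes live in pymatgen (my illustration, not code from this PR; actually applying them requires run metadata such as Hubbard U values and POTCAR symbols, omitted here):

```python
# Hedged sketch: the legacy vs. current MP energy correction schemes in pymatgen.
from pymatgen.entries.compatibility import (
    MaterialsProject2020Compatibility,  # current scheme
    MaterialsProjectCompatibility,  # legacy scheme in question here
)

legacy, modern = MaterialsProjectCompatibility(), MaterialsProject2020Compatibility()
# corrected = legacy.process_entries(entries)  # returns corrected copies, dropping incompatible entries
```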
2023-07-11-alignn-ff-wbm-IS2RE.csv.gz rename pred_col
I think I forgot to remove the energy corrections for the custom ALIGNN model. Uploaded two new files: …
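To illustrate what removing the corrections amounts to, a hedged sketch (the column names are hypothetical placeholders, not the actual columns of the uploaded files):

```python
# Hedged sketch: subtract previously applied per-atom MP corrections from predictions.
import pandas as pd

df_preds = pd.read_csv("2023-07-11-alignn-ff-wbm-IS2RE.csv.gz")
# hypothetical column names for illustration only
df_preds["e_form_per_atom_uncorrected"] = (
    df_preds["e_form_per_atom_pred"] - df_preds["e_correction_per_atom"]
)
```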
Just to clarify, the model you trained from scratch is the one that produced …?
By "custom model without energy correction" I meant that I didn't apply them. It's this Just also added a few comments to the script. |
Since ALIGNN-FF was pre-trained on the JARVIS data, which is DFT simulation data computed with the OptB88vdW functional (with van der Waals correction), the results are not comparable and I would drop it. Would you consider adding training data (forces/stresses) to Matbench Discovery that would allow users to train a force field model? I think this would be very valuable and would make the benchmark applicable to a wider class of models for which no good pre-trained model exists.
That's one option. We could also add it anyway with a big disclaimer, just to highlight the importance of homogeneous DFT settings between train and test sets.
Yes, that data is definitely part of the MBD training set. I have a local copy of the energies, forces, stresses and magmoms for every ionic step in the MP database; it's ~14.7 GB. I was planning to upload it to Figshare at some point. But considering my local data is identical to the MPtraj dataset used to train CHGNet (~11.3 GB, already on Figshare), except for some cleaning which most people will benefit from, I think we might as well just point to that.
I wouldn't do this. I think it should be obvious to anyone using DFT that without a compatibility scheme you should not make any kind of comparison.
I converted the MPtraj dataset to an ALIGNN-compatible format: …
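A hedged sketch of what such a conversion could look like (both the MPtraj keys read here and the output schema the ALIGNN FF trainer expects are assumptions based on my reading of the two projects, not the script actually used):

```python
# Hedged sketch: map MPtraj frames to JARVIS-style dicts for ALIGNN FF training.
import json

from jarvis.core.atoms import pmg_to_atoms
from pymatgen.core import Structure

with open("MPtrj_2022.9_full.json") as file:  # file name is an assumption
    mptrj = json.load(file)  # {material_id: {frame_id: frame_dict}}

records = []
for frames in mptrj.values():
    for frame_id, frame in frames.items():
        structure = Structure.from_dict(frame["structure"])
        records.append(
            dict(
                jid=frame_id,
                atoms=pmg_to_atoms(structure).to_dict(),
                total_energy=frame["uncorrected_total_energy"],
                forces=frame["force"],
                stresses=frame["stress"],
            )
        )

with open("id_prop.json", "w") as file:  # output file name is also an assumption
    json.dump(records, file)
```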
Thanks for your input @JaGeo! Always good to hear others' thoughts. I'm leaning the same way.
@pbenner That's great! Thanks for the update! Even if that doesn't pan out, I might still merge this PR just to keep a record that we tried to add ALIGNN-FF, show how far we got, and explain in a readme why we aborted.
I tried to optimize the ALIGNN-FF training. Unfortunately, the model is quite resource-hungry: the 12 GB of CHGNet training data turn into 600 GB of graph data. That still fits into memory, but it prevents me from using the accelerator for multi-GPU training, since that creates a copy of the data for each process. Also, a batch size of 16 already fills up all 80 GB of GPU memory, and training is quite slow with such a small batch size (roughly a day per epoch). I'm now fine-tuning the ALIGNN-FF WT10 model on the CHGNet data, which hopefully takes fewer epochs than training a new model from scratch.
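For illustration, a hedged sketch of warm-starting from the WT10 checkpoint with plain torch (not the alignn training CLI; the layout of the state dict inside best_model.pt is an assumption and varies between alignn versions):

```python
# Hedged sketch: warm-start fine-tuning from the WT10 checkpoint instead of random init.
import torch

state = torch.load("alignn/ff/alignnff_wt10/best_model.pt", map_location="cpu")
print(type(state))  # inspect first: may be a raw state_dict or a {"model": ...} wrapper
# model.load_state_dict(state["model"] if "model" in state else state)
```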
Thanks for trying @pbenner! I think we've done everything that can reasonably be expected here to get ALIGNN FF into MBD. Rather than spend more time on it, how about we call it quits and move on to wrapping up the preprint?
Yes, agreed. I also saw that the WT10 model trained on JARVIS has a worse initial loss on the CHGNet data than the untrained model, which indicates that these two datasets are very much incompatible. I'll let it rest for now...
All credit for this PR goes to @pbenner. Thanks for this submission! 🙏
This PR copies over a subset of files in pbenner/matbench-discovery#621d4f1a6.
Update 2023-08-09
This model submission was aborted due to ongoing technical challenges. See the model readme for details.