
Add ALIGNN FF (aborted) #47

Merged: janosh merged 11 commits from alignn-ff into main on Aug 9, 2023
Conversation

@janosh (Owner) commented on Jul 12, 2023

All credit for this PR goes to @pbenner. Thanks for this submission! 🙏

This PR copies over a subset of files from commit pbenner/matbench-discovery@621d4f1a6.

Update 2023-08-09

This model submission was aborted due to ongoing technical challenges. See the model readme for details.

@janosh added the "new model" (Model submission) label on Jul 12, 2023
doi: https://doi.org/10.1039/D2DD00096B
preprint: https://arxiv.org/abs/2209.05554
requirements:
  ase: 3.22.0
@janosh (Owner, Author) commented on Jul 12, 2023

@pbenner Could you check if these package version numbers are correct (i.e. match the ones you were using for this submission)?

@pbenner (Collaborator) replied

Yes, the versions are correct, except for ase, which should be 3.22.1.

  pandas: 2.0.1
  scikit-learn: 1.2.2
  torch: 1.9.0+cu111
trained_for_benchmark: false
@janosh (Owner, Author) commented

@pbenner Just to clarify, you used alignnff_wt10?

@pbenner (Collaborator) replied

Exactly, it was called best_model.pt before:
alignn/ff/best_model.pt → alignn/ff/alignnff_wt10/best_model.pt
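For context, a minimal sketch of loading this checkpoint as an ASE calculator, assuming the alignn.ff API of the time (AlignnAtomwiseCalculator plus default_path, which resolves to the alignnff_wt10 directory):

from alignn.ff.ff import AlignnAtomwiseCalculator, default_path
from ase.build import bulk

model_path = default_path()  # directory containing best_model.pt for alignnff_wt10
atoms = bulk("Cu")  # toy structure purely for illustration
atoms.calc = AlignnAtomwiseCalculator(path=model_path)
print(atoms.get_potential_energy())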

@pbenner (Collaborator) commented

Not sure if it makes sense to test some of the newly added models. I can give it a try and at least check whether the convergence is better.


import numpy as np
import pandas as pd
from pqdm.processes import pqdm
@janosh (Owner, Author) commented

Can we work around this new dependency? I'm not sure how it differs from tqdm. Is tqdm lacking parallel-process support?

@pbenner (Collaborator) replied

Yes, you can just use tqdm. For parallel processing, pqdm has a bit nicer output, that's all.
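A minimal sketch of that swap, wrapping a standard process pool with tqdm instead of pqdm (the worker function and inputs are illustrative placeholders, not code from this PR):

from concurrent.futures import ProcessPoolExecutor

from tqdm import tqdm

def process_one(item: int) -> int:
    # placeholder for the real per-structure work (e.g. an ASE relaxation)
    return item**2

if __name__ == "__main__":
    items = list(range(100))
    with ProcessPoolExecutor() as pool:
        # tqdm wraps the ordered result iterator, giving a progress bar much like pqdm's
        results = list(tqdm(pool.map(process_one, items), total=len(items)))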

@janosh (Owner, Author) commented on Jul 14, 2023

@pbenner I started evaluating the ALIGNN FF predictions and they look bad enough that I think we may have made a mistake. Are we sure ALIGNN FF was trained on the MP legacy correction scheme (MaterialsProjectCompatibility)? Perhaps they trained on raw VASP energies? We'll have to consult the paper or check the code base.

[figure: alignn-ff rolling hull distance MAE]
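For reference, a sketch of how the legacy scheme adjusts an entry's energy via pymatgen; the entry metadata below is illustrative, whereas the actual evaluation processes WBM/MP ComputedStructureEntries:

from pymatgen.entries.compatibility import MaterialsProjectCompatibility
from pymatgen.entries.computed_entries import ComputedEntry

# a GGA+U Fe2O3 entry carrying the metadata the legacy scheme inspects (illustrative values)
entry = ComputedEntry(
    "Fe2O3",
    energy=-67.0,
    parameters={
        "run_type": "GGA+U",
        "is_hubbard": True,
        "hubbards": {"Fe": 5.3, "O": 0.0},
        "potcar_symbols": ["PBE Fe_pv", "PBE O"],
    },
    data={"oxide_type": "oxide"},
)
corrected = MaterialsProjectCompatibility().process_entries([entry])
if corrected:  # entries failing pymatgen's compatibility checks are dropped
    print(corrected[0].correction)  # summed anion + Hubbard U corrections in eV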

Commit: 2023-07-11-alignn-ff-wbm-IS2RE.csv.gz rename pred_col
@pbenner (Collaborator) commented on Jul 15, 2023

I think I forgot to remove the energy corrections for the custom ALIGNN model. I uploaded two new files:

  • 2023-07-15-custom-alignn-relaxed-wbm-IS2RE.csv.gz: custom model without energy correction
  • 2023-07-15-mp_e_form_alignn-relaxed-wbm-IS2RE.csv.gz: pre-trained model with correction applied

But still, I would like to check the ALIGNN-FF model in detail. I will do that next week. Maybe it makes sense to train it from scratch.

@janosh (Owner, Author) commented on Jul 15, 2023

Just to clarify, the model you trained from scratch that produced 2023-07-15-custom-alignn-relaxed-wbm-IS2RE.csv.gz was trained on MP ComputedStructureEntries that already include the 2020 MP energy correction scheme. So the predictions it makes are also corrected energies? By "custom model without energy correction", do you mean you subtracted the energy corrections again from every model prediction?

@pbenner (Collaborator) commented on Jul 16, 2023

By "custom model without energy correction" I meant that I didn't apply them. It's this if-statement that I forgot to add to your script:
https://github.com/pbenner/matbench-discovery/blob/25616eeba7e71b0909feccddfef88dfc63c60314/models/alignn/test_alignn_relaxed.py#L118
You can find the changes here in this commit:
pbenner@25616ee
And I re-executed the script twice with the corresponding model_name variable set. Hope this makes sense.

Just also added a few comments to the script.
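A hedged reconstruction of the kind of branch described above; the real one is in the linked commit, and all names and numbers here are placeholders:

# Placeholder reconstruction, not the actual test_alignn_relaxed.py logic.
model_name = "mp_e_form_alignn"  # the pre-trained model; "custom_alignn" was trained on corrected energies

e_pred = {"wbm-1-1": -1.23, "wbm-1-2": -0.87}  # model-predicted energies, eV/atom (dummy values)
corrections = {"wbm-1-1": -0.50, "wbm-1-2": 0.0}  # per-entry MP energy corrections, eV/atom (dummy values)

if model_name == "mp_e_form_alignn":
    # the pre-trained model predicts uncorrected energies, so apply corrections here;
    # the custom model already learned corrected energies, so it skips this branch
    for key, corr in corrections.items():
        e_pred[key] += corr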

@pbenner (Collaborator) commented on Jul 21, 2023

Since ALIGNN-FF was pre-trained on the JARVIS data, which is DFT simulation data computed with the OptB88vdW functional (with van der Waals correction), the results are not comparable and I would drop it. Would you consider adding training data (forces/stresses) to Matbench Discovery that allows the user to train a force field model? I think this would be very valuable and would make the benchmark applicable to a wider class of models for which no good pre-trained model exists.

@janosh (Owner, Author) commented on Jul 21, 2023

> Since ALIGNN-FF was pre-trained on the JARVIS data, which is DFT simulation data computed with the OptB88vdW functional (with van der Waals correction), the results are not comparable and I would drop it.

That's one option. We could also add it anyway with a big disclaimer just to highlight the importance of homogeneous DFT settings between train and test set.

> Would you consider adding training data (forces/stresses) to Matbench Discovery that allows the user to train a force field model?

Yes, that data is definitely part of the MBD training set. I have a local copy of the energies, forces, stresses and magmoms for every ionic step in the MP database. It's ~14.7 GB. I was planning to upload it to figshare at some point.

But considering my local data is identical to the MPtraj dataset used to train CHGNet (~11.3 GB on Figshare), except for some cleaning that most people will benefit from, I think we might as well just point to that.

@JaGeo commented on Jul 21, 2023

> That's one option. We could also add it to the benchmark anyway with a big disclaimer just to highlight how important homogeneous DFT settings between train and test set are.

I wouldn't do this. I think it should be obvious to anyone using DFT that without a compatibility scheme you should not do any kind of comparison.

@pbenner (Collaborator) commented on Jul 23, 2023

I converted the MPtraj dataset to an ALIGNN-compatible format:
https://github.com/pbenner/matbench-discovery/blob/alignn/models/alignn/make_train_data_ff.py
It will probably take too long to train ALIGNN-FF, but I'm just giving it a try.
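Roughly, the conversion turns each MPtraj frame into the id/atoms/energy/forces/stresses records that ALIGNN-FF training consumes. A sketch under assumed field names (the actual logic is in make_train_data_ff.py linked above; neither the MPtraj keys nor the output keys are copied from it):

# Sketch only: the MPtraj field names ("structure", "uncorrected_total_energy", "force",
# "stress") and the output record keys are assumptions, not taken from make_train_data_ff.py.
import json

from jarvis.core.atoms import pymatgen_to_atoms
from pymatgen.core import Structure

with open("MPtrj_2022.9_full.json") as file:  # the CHGNet training-data release on Figshare
    mp_trj = json.load(file)

records = []
for material_id, frames in mp_trj.items():
    for frame_id, frame in frames.items():
        records.append({
            "jid": frame_id,
            "atoms": pymatgen_to_atoms(Structure.from_dict(frame["structure"])).to_dict(),
            "total_energy": frame["uncorrected_total_energy"],
            "forces": frame["force"],
            "stresses": frame["stress"],
        })

with open("id_prop.json", "w") as file:
    json.dump(records, file)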

@janosh (Owner, Author) commented on Jul 25, 2023

> I wouldn't do this. I think it should be obvious to anyone using DFT that without a compatibility scheme you should not do any kind of comparison.

Thanks for your input @JaGeo! Always good to hear others' thoughts. I'm leaning the same way.

> It will probably take too long to train ALIGNN-FF, but I'm just giving it a try.

@pbenner That's great! Thanks for the update!

Even if that doesn't pan out, I might still merge this PR just to keep a record that we tried to add ALIGNN-FF and show how far we got, as well as explain in a readme why we aborted.

@pbenner (Collaborator) commented on Aug 6, 2023

I tried to optimize the ALIGNN-FF training. Unfortunately, the model is quite resource-hungry: the 12 GB of CHGNet training data turns into 600 GB of graph data. That still fits into memory, but it prevents me from using the accelerator for training on multiple GPUs, since that creates a copy for each process. Also, a batch size of 16 already fills all 80 GB of GPU memory, and training is quite slow with such a small batch size (roughly a day per epoch).

I'm now fine-tuning the ALIGNN-FF WT10 model on the CHGNet data, which hopefully takes fewer epochs than training a new model from scratch.
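A generic PyTorch sketch of that setup: initialize from the pre-trained checkpoint rather than random weights, then continue training on the new data. The tiny model below is a stand-in, not the ALIGNN-FF architecture or its training code:

import torch
from torch import nn

class TinyModel(nn.Module):
    """Stand-in for the ALIGNN-FF architecture; real fine-tuning would load alignnff_wt10/best_model.pt."""
    def __init__(self) -> None:
        super().__init__()
        self.layer = nn.Linear(4, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layer(x)

# pretend this checkpoint holds the JARVIS-trained WT10 weights
torch.save(TinyModel().state_dict(), "best_model.pt")

model = TinyModel()
model.load_state_dict(torch.load("best_model.pt"))  # start from pre-trained weights, not random init
# fine-tuning typically uses a smaller learning rate than training from scratch
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)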

@janosh (Owner, Author) commented on Aug 8, 2023

> I'm now fine-tuning the ALIGNN-FF WT10 model on the CHGNet data, which hopefully takes fewer epochs than training a new model from scratch.

Thanks for trying @pbenner! I think we've done everything that can reasonably be expected here to get ALIGNN FF into MBD. Rather than spend more time on it, how about we call it quits and move on to wrapping up the preprint?

@pbenner (Collaborator) commented on Aug 9, 2023

Yes, agreed. I also saw that the WT10 model trained on JARVIS has a worse initial loss on the CHGNet data than the untrained model, which indicates that these two datasets are very much incompatible. I'll let it rest for now...

janosh merged commit d324b02 into main on Aug 9, 2023
janosh deleted the alignn-ff branch on Aug 9, 2023
janosh changed the title from "Add ALIGNN FF" to "Add ALIGNN FF (aborted)" on Aug 10, 2023