All input, intermediate, and output data are already available via DVC, so you can selectively reproduce only the parts you need.
See ENVIRONMENT.md
The pre-trained models are available in the repository. A notebook showing how to use the models for predictions, along with data generation, is here. The weights and data indexes are in DVC:
dvc pull datasets/checkpoints/combined_mixed_all_train/formation_energy_per_site/megnet_pytorch/sparse/05-12-2022_19-50-53/d6b7ce45/0.pth.dvc datasets/checkpoints/combined_mixed_all_train/homo_lumo_gap_min/megnet_pytorch/sparse/05-12-2022_19-50-53/831cc496/0.pth.dvc csv-cif-low-density-8x8 csv-cif-no-spin-500-data csv-cif-spin-500-data
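If you just want to peek inside a pulled checkpoint, here is a minimal sketch, assuming the .pth files are ordinary PyTorch checkpoints (what exactly was saved depends on the training code):

```python
# Minimal sketch: inspect a pulled checkpoint. Assumes the .pth files
# are ordinary PyTorch checkpoints; what was saved (a state dict, a
# whole model object, ...) depends on the training code.
import torch

ckpt = torch.load(
    "datasets/checkpoints/combined_mixed_all_train/formation_energy_per_site/"
    "megnet_pytorch/sparse/05-12-2022_19-50-53/d6b7ce45/0.pth",
    map_location="cpu",
)
print(type(ckpt))
```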
This step extracts the computed energy and HOMO-LUMO gap values from the raw VASP output and saves the unrelaxed structures in a uniform way; converts the structures from the standard CIF format to a fast, platform-specific pickle storage; preprocesses the target values, e.g. computes the formation energy per site; and produces the sparse defect-only representations.
dvc pull -R datasets/POSCARs datasets/raw_vasp/high_density_defects datasets/raw_vasp/dichalcogenides8x8_vasp_nus_202110 datasets/csv_cif/low_density_defects_Innopolis-v1/{MoS2,WSe2}
parallel --delay 3 -j6 dvc repro processed-high-density@{} ::: hBN_spin GaSe_spin BP_spin InSe_spin MoS2 WSe2
parallel --delay 3 -j2 dvc repro processed-low-density@{} ::: MoS2 WSe2
Note that, unlike GNU Make, DVC currently doesn't parallelize execution internally, so we use GNU parallel; `--delay 3` avoids a DVC lock race.
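For illustration only, a minimal sketch of the CIF-to-pickle conversion performed by the preprocessing step above (the helper below is hypothetical; the real logic lives in the DVC stages):

```python
# Hypothetical sketch of the CIF -> pickle conversion; the actual
# implementation is in the DVC pipeline stages.
from pathlib import Path
import pandas as pd
from pymatgen.core import Structure

def cifs_to_pickle(cif_dir: str, out_path: str) -> None:
    # Parse each CIF once, then store all structures in a single pickle,
    # which loads much faster than re-parsing CIFs on every run.
    structures = {p.stem: Structure.from_file(str(p))
                  for p in sorted(Path(cif_dir).glob("*.cif"))}
    pd.Series(structures, name="initial_structure").to_pickle(out_path)
```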
The next step computes matminer descriptors, to be used with CatBoost. Assuming the resources are available, it takes around 3 days; you might want to parallelize it according to your computing setup, or skip it altogether if you don't plan to run CatBoost.
dvc repro matminer
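As a rough illustration of what a matminer featurizer does (the "magpie" preset below is just a common example; the descriptor set actually used is configured in the pipeline):

```python
# Illustration only: a composition-based matminer featurizer with the
# common "magpie" preset; the pipeline's actual descriptor set differs.
from matminer.featurizers.composition import ElementProperty
from pymatgen.core import Composition

featurizer = ElementProperty.from_preset("magpie")
values = featurizer.featurize(Composition("MoS2"))
print(list(zip(featurizer.feature_labels(), values))[:3])
```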
We use the train/validation split to do a random search for hyperparameters:
dvc pull -R processed-high-density processed-low-density datasets/processed/{high,low}_density_defects datasets/experiments/combined_mixed_weighted_test.dvc datasets/experiments/combined_mixed_weighted_validation.dvc
python scripts/generate_trials_for_tuning.py --model-name megnet_pytorch --mode random --n-steps 50
python scripts/generate_trials_for_tuning.py --model-name megnet_pytorch/sparse --mode random --n-steps 50
python scripts/generate_trials_for_tuning.py --model-name catboost --mode random --n-steps 50
python scripts/generate_trials_for_tuning.py --model-name gemnet --mode random --n-steps 50
python scripts/generate_trials_for_tuning.py --model-name schnet --mode random --n-steps 50
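The five commands above can equivalently be written as a loop:

```bash
# Same as the five commands above, written as a loop.
for model in megnet_pytorch megnet_pytorch/sparse catboost gemnet schnet; do
    python scripts/generate_trials_for_tuning.py \
        --model-name "$model" --mode random --n-steps 50
done
```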
This will create `trials/<model_name>/<date>` folders with the trials. Alternatively, you can pull our trials with `dvc pull -R trials`.
The next step is running those trials with the `combined_mixed_weighted_validation` experiment, according to your compute environment. For example, on a single GPU:
python run_experiments.py --experiments combined_mixed_weighted_validation --trials trials/megnet_pytorch/sparse/05-12-2022_19-34-37/0ff69f1c --gpus 0
or on CPU:
python run_experiments.py --experiments combined_mixed_weighted_validation --trials trials/megnet_pytorch/sparse/05-12-2022_19-34-37/0ff69f1c --cpu
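If you have several GPUs, one possible pattern (a sketch, not part of our scripts) is to let GNU parallel map its job slot number to a GPU id:

```bash
# Sketch: spread the trial folders over two GPUs. {%} is GNU parallel's
# 1-based job slot, so $(( {%} - 1 )) yields GPU ids 0 and 1 when the
# job's shell evaluates it.
parallel -j2 python run_experiments.py \
    --experiments combined_mixed_weighted_validation \
    --trials {} --gpus '$(( {%} - 1 ))' \
    ::: trials/megnet_pytorch/sparse/05-12-2022_19-34-37/*
```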
For running trials for a specific model on a single node, we have a script:
python scripts/hyperparam_tuning.py --model-name megnet_pytorch --experiment combined_mixed_weighted_validation --wandb-entity hse_lambda --trials-folder trials/megnet_pytorch/sparse/05-12-2022_19-34-37/
There is also a script, `scripts/ASPIRE-1/run_grid_search.sh`, for running on an HPC, but it is specific to our cluster.
Use `find_best_trial.py` for every model, e.g.:
python scripts/find_best_trial.py --experiment combined_mixed_weighted_validation --trials-folder megnet_pytorch/sparse/05-12-2022_19-50-53
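Spelled out as a loop (the first folder name is from our trials; the `<date>` placeholders stand for the folders your own runs produced):

```bash
# Sketch: select the best trial for every model. Replace the <date>
# placeholders with the actual folders under trials/.
for folder in megnet_pytorch/sparse/05-12-2022_19-50-53 megnet_pytorch/<date> \
              catboost/<date> gemnet/<date> schnet/<date>; do
    python scripts/find_best_trial.py \
        --experiment combined_mixed_weighted_validation --trials-folder "$folder"
done
```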
Since some models (thankfully, not ours) exhibit instability, we repeat training several times for each model, with the same parameters and training data. To fit this into the infrastructure, we copy the trials. This step was only done on ASPIRE-1 and the Constructor Research Platform, so it would require some modifications to run on a different cluster (e.g. replacing `qsub` with `sbatch`). Note that CatBoost is deterministic by default, so you need to change the random seed manually in the copies of the trials.
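A hedged sketch of re-seeding the copied trials; the YAML format, the file location, and the `random_seed` key are all assumptions about the trial files (`random_seed` is, however, CatBoost's own parameter name):

```python
# Hedged sketch: give each copied CatBoost trial a distinct seed.
# Assumptions: the trial copies are YAML files under the (hypothetical)
# folder below and carry a "random_seed" key; adapt to the real schema.
from pathlib import Path
import yaml

copies = sorted(Path("trials/catboost/stability_copies").glob("*.yaml"))
for seed, trial in enumerate(copies):
    params = yaml.safe_load(trial.read_text())
    params["random_seed"] = seed  # CatBoost's parameter name
    trial.write_text(yaml.safe_dump(params))
```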
cd scripts/ASPIRE-1
xargs -a stability_trials.txt -L1 ./run_stability_trials.sh
The format of `stability_trials.txt` is:

trial target total_repeats parallel_runs_per_GPU experiment

for example:

megnet_pytorch/sparse/05-12-2022_19-50-53/d6b7ce45 formation_energy_per_site 12 4 combined_mixed_weighted_test
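`xargs -L1` invokes the script once per line, so the example line above expands to:

```bash
./run_stability_trials.sh \
    megnet_pytorch/sparse/05-12-2022_19-50-53/d6b7ce45 \
    formation_energy_per_site 12 4 combined_mixed_weighted_test
```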
For the experiment where MoS2 structures with two S vacancies go to test, and everything else to train:
xargs -a MoS2_V2_E.txt -L1 ./run_stability_trials.sh
Manually prepare the model configurations (aka trials) in `../trials/megnet_pytorch/ablation_study`, list them in a `.txt` file, and run the experiments:
cd scripts/ASPIRE-1
xargs -a ablation_stability.txt -L1 ./run_stability_trials.sh
If you generated your own trials, replace the trial names below with yours. Main results:
python scripts/summary_table_lean.py --experiment combined_mixed_weighted_test --targets formation_energy_per_site --stability-trials stability/schnet/25-11-2022_16-52-31/71debf15 stability/catboost/29-11-2022_13-16-01/02e5eda9 stability/gemnet/16-11-2022_20-05-04/b5723f85 stability/megnet_pytorch/sparse/05-12-2022_19-50-53/d6b7ce45 stability/megnet_pytorch/25-11-2022_11-38-18/1baefba7 --separate-by target --column-format-re 'stability/(?P<name>.+)/.+/.+' --paper-results --multiple 1000
python scripts/summary_table_lean.py --experiment combined_mixed_weighted_test --targets homo_lumo_gap_min --stability-trials stability/schnet/25-11-2022_16-52-31/2a52dbe8 stability/catboost/29-11-2022_13-16-01/1b1af67c stability/gemnet/16-11-2022_20-05-04/c366c47e stability/megnet_pytorch/sparse/05-12-2022_19-50-53/831cc496 stability/megnet_pytorch/25-11-2022_11-38-18/1baefba7 --separate-by target --column-format-re 'stability/(?P<name>.+)/.+/.+' --paper-results --multiple 1000
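For reference, what the `--column-format-re` pattern extracts from a stability trial path (how `summary_table_lean.py` uses the `name` group is defined in the script itself):

```python
# Quick check of the --column-format-re pattern on one trial path.
import re

pattern = re.compile(r"stability/(?P<name>.+)/.+/.+")
m = pattern.match("stability/megnet_pytorch/sparse/05-12-2022_19-50-53/d6b7ce45")
print(m.group("name"))  # megnet_pytorch/sparse
```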
Ablation:
python scripts/summary_table_lean.py --experiment combined_mixed_weighted_test --targets formation_energy_per_site --stability-trials stability/megnet_pytorch/sparse/05-12-2022_19-50-53/d6b7ce45 stability/megnet_pytorch/25-11-2022_11-38-18/1baefba7 stability/megnet_pytorch/ablation_study/d6b7ce45-sparse stability/megnet_pytorch/ablation_study/d6b7ce45-sparse-z stability/megnet_pytorch/ablation_study/d6b7ce45-sparse-z-were --separate-by target --print-std --paper-ablation-energy --multiple 1000
python scripts/summary_table_lean.py --experiment combined_mixed_weighted_test --targets homo_lumo_gap_min --stability-trials stability/megnet_pytorch/sparse/05-12-2022_19-50-53/831cc496 stability/megnet_pytorch/25-11-2022_11-38-18/1baefba7 stability/megnet_pytorch/ablation_study/831cc496-sparse{,-z,-z-were} --separate-by target --print-std --paper-ablation-homo-lumo --multiple 1000
The result tables and plots can be produced with the notebooks:
ai4material_design/notebooks/Results tables.ipynb
ai4material_design/notebooks/MoS2_V2_plot.ipynb
To train on the whole 2DMD dataset, run:
parallel -j 2 python run_experiments.py --output-folder /output --targets {1} --trials {2} --experiments combined_mixed_all_train --gpus 0 --n-jobs 4 --save-checkpoints ::: formation_energy_per_site homo_lumo_gap_min :::+ megnet_pytorch/sparse/05-12-2022_19-50-53/d6b7ce45 megnet_pytorch/sparse/05-12-2022_19-50-53/831cc496
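GNU parallel pairs the `:::` and `:::+` lists positionally, so this runs the following two commands, two at a time:

```bash
python run_experiments.py --output-folder /output --targets formation_energy_per_site \
    --trials megnet_pytorch/sparse/05-12-2022_19-50-53/d6b7ce45 \
    --experiments combined_mixed_all_train --gpus 0 --n-jobs 4 --save-checkpoints
python run_experiments.py --output-folder /output --targets homo_lumo_gap_min \
    --trials megnet_pytorch/sparse/05-12-2022_19-50-53/831cc496 \
    --experiments combined_mixed_all_train --gpus 0 --n-jobs 4 --save-checkpoints
```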