Skip to content

Implementation of multimode transformer architecture used for graph2graph, spectra2graph, and graph+spectra2rgaph prediction tasks

Notifications You must be signed in to change notification settings

Savoie-Research-Group/MultiModalTransformer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 

Repository files navigation

Vocabulary: This folder contains character to index and index to character conversion dictionaries.

Script: This folder contains scripts used to train the model and evaluate the model performance under multiple cases.

Database access:

[1] Simulated spectroscopy database: This is the raw spectra database, refer to corresponding spectra by database['spec_category']['molecule_smiles'] eg. database['ms']['c1ccccc1']

https://figshare.com/articles/dataset/MS_IR_H-NMR_Spectra_Database/25513513

[2] Training/Validation/Test dataset splitting: Including the split of real and null reactions for training, validation and test that can be directly fed into the model training and evaluation.

https://figshare.com/articles/dataset/Training_Validation_Test_set_split/25511056

[3] Model checkpoints:

https://figshare.com/articles/dataset/Model_checkpoints/25513519

Detailed code usage scenario:

[1] Retrain the regular model: 1) install pytorch 1.12.1 (gpu version); 2) replace content in config_train.txt with correct file paths and select training mode; 3) run "python train_script.py -c config_train.txt" on gpu.

[2] Retrain the random drop model: Step 1) and 2) same as [1]; 3) run "python train_script_random_drop.py -c config_train.txt" on gpu; 4) 6 drop lists txt files (MS/IR/NMR for train/validation sets) will be generated under the current directory to indicate what examples are dropped with what spectral sources.

[3] Evaluate performance on regular model: 1) install pytorch 1.13.0 (cpu version) and rdkit 2020.09.1; 2) replace content in config_eval.txt with correct file paths and select evaluation mode; 3) run "python test_script.py -c config_eval.txt".

[4] Evaluate decisiveness behavior of model: step 1) same as [3]; 2) replace content in config_eval_decisive.txt with correct file paths and select evaluation mode; 3) run "python test_decisive.py -c config_eval_decisive.txt"; 4) 5 txt files will be created in the desiganted path reflecting the decoded SMILES as well as the decisive strings of reactants, MS, IR and H-NMR.

[5] Evaluate model performance with noised spectra: step 1) same as [3]; 2) replace content in config_eval_noise.txt with correct file paths, select evaluation mode, noise level and which spectral inputs are noised; 3) run "python test_noise.py -c config_eval_noise.txt".

[6] Evaluate model performance on solvent dataset: step 1) and 2) same as [3]; 3) run "python test_sol.py -c config_eval_sol.txt".

[7] Evaluate model performance on multi dataset: step 1) and 2) same as [3]; 3) uncomment section indicated in calc_acc.py to print out results before running the script; 4) run "python test_multi.py -c config_eval_multi.txt". 5) A .txt file will be generated with top5 smiles prediction generated (XXX means invalid smiles) ending with the ground truth smiles separated by ';'.

About

Implementation of multimode transformer architecture used for graph2graph, spectra2graph, and graph+spectra2rgaph prediction tasks

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages