This repository contains code for predicting the aqueous solubility of organic molecules with machine learning models. The models and dataset are based on the research paper "Predicting Aqueous Solubility of Organic Molecules Using Deep Learning Models with Varied Molecular Representations".
- Download Data: Download the dataset from this link and save it as `data.csv` in the `./data` folder.
- Generate Features:
  - Generate Pybel coordinates and Molecular Dynamics (MDM) features by running `create_data.py` in the `./data` folder:

        cd ./data
        python create_data.py
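
Before running `create_data.py`, it can help to confirm that the dataset is where the script expects it. The check below is a minimal sketch and not part of the repository: the helper name `dataset_ready` is hypothetical, and it only verifies that the file exists and contains a header row plus at least one data row, since this README does not list the expected columns.

```python
import csv
from pathlib import Path


def dataset_ready(path):
    """Return True if `path` is a CSV file with a header and at least one data row.

    Hypothetical helper for illustration; it does not validate column names.
    """
    path = Path(path)
    if not path.is_file():
        return False
    with path.open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)     # first row, or None for an empty file
        first_row = next(reader, None)  # first data row, if any
    return bool(header) and bool(first_row)


if __name__ == "__main__":
    # Path taken from the download step, relative to the repository root.
    print("data.csv ready:", dataset_ready("data/data.csv"))
```

Run it from the repository root; a result of `False` suggests the download step needs to be repeated before generating features.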
- Train Models:
  - To train the MDM model, run `train.py` in the `./mdm` folder:

        cd ../mdm
        python train.py

  - To train the GNN model, run `train.py` in the `./gnn` folder:

        cd ../gnn
        python train.py

  - To train the SMI model, run `train.py` in the `./smi` folder:

        cd ../smi
        python train.py
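
The three training runs follow the same pattern, so they can be scripted. The following is a minimal sketch, assuming each `train.py` runs without command-line arguments from inside its own folder, as the commands above suggest. The function names `training_jobs` and `run_training` are hypothetical and do not exist in the repository.

```python
import subprocess
import sys
from pathlib import Path

# Folder names come from the training steps above.
MODEL_FOLDERS = ("mdm", "gnn", "smi")


def training_jobs(repo_root="."):
    """Return (working_directory, command) pairs, one per model folder."""
    root = Path(repo_root)
    return [(root / name, [sys.executable, "train.py"]) for name in MODEL_FOLDERS]


def run_training(repo_root="."):
    """Run each train.py in its own folder, stopping at the first failure."""
    for folder, command in training_jobs(repo_root):
        print(f"Training in {folder} ...")
        # check=True raises CalledProcessError if train.py exits non-zero.
        subprocess.run(command, cwd=folder, check=True)
```

Calling `run_training()` from the repository root mirrors the three `cd` and `python train.py` pairs. Using `sys.executable` keeps the runs in the same Python environment as the calling script.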
- Make Predictions:
  - Use the `predict.ipynb` file in each model's folder to make predictions:

        cd ../mdm
        jupyter notebook predict.ipynb

    Repeat the above steps for the `gnn` and `smi` folders.
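
Repeating the prediction step for all three folders can also be done without opening each notebook by hand. The sketch below uses `jupyter nbconvert` to execute a notebook in place. It assumes the `predict.ipynb` notebooks run top to bottom without manual input, which this README does not confirm, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def notebook_command(notebook="predict.ipynb"):
    """Build a `jupyter nbconvert` command that executes a notebook in place."""
    return ["jupyter", "nbconvert", "--to", "notebook",
            "--execute", "--inplace", notebook]


def run_predictions(repo_root=".", folders=("mdm", "gnn", "smi")):
    """Execute predict.ipynb inside each model folder, one after another."""
    for name in folders:
        # cwd is set so relative paths inside the notebook resolve as they
        # would after `cd` into the model folder.
        subprocess.run(notebook_command(), cwd=Path(repo_root) / name, check=True)
```

Calling `run_predictions()` from the repository root executes the three notebooks in sequence and saves their outputs back into the same files.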
- Ensemble Models:
  - To ensemble the models, run the following scripts:

        cd ../ensemble
        python CV.py
        python Optuna.py
        python KNN.py
- Compare Predictions:
  - To compare predictions from individual models with ensemble methods, use the `ensemble_prediction.ipynb` notebook:

        jupyter notebook ensemble_prediction.ipynb
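
As a generic illustration of what such a comparison involves, the sketch below scores several sets of predictions against the same targets. It is not the logic of `ensemble_prediction.ipynb`, `CV.py`, `Optuna.py`, or `KNN.py`, which this README does not describe. RMSE is assumed as the metric, an unweighted mean stands in for the ensemble methods, and all numbers are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mean_ensemble(predictions):
    """Average per-sample predictions across models (unweighted)."""
    return [sum(column) / len(column) for column in zip(*predictions.values())]


# Made-up values for illustration only; they are not results from the paper.
y_true = [-2.0, -3.5, -1.0]
predictions = {
    "mdm": [-2.2, -3.0, -1.1],
    "gnn": [-1.8, -3.8, -0.7],
    "smi": [-2.4, -3.6, -1.3],
}

scores = {name: rmse(y_true, pred) for name, pred in predictions.items()}
scores["mean ensemble"] = rmse(y_true, mean_ensemble(predictions))

# Print models from lowest to highest error.
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {score:.3f}")
```

With real data, `y_true` and each entry of `predictions` would be replaced by the measured solubilities and the outputs saved from each model's prediction notebook.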
For detailed instructions on running the models, featurizing the data, and other specifics, refer to the research paper linked above. The methods described there are essential for understanding and using this repository effectively.