SoftServeInc/affinity-by-GNN

affinity-sampling

Introduction

This repository contains the code for the paper focused on two aspects of designing AI-driven tools for early-stage drug discovery. First, it reports a novel graph neural network (GNN) architecture for predicting the affinity of a small-molecule ligand to a protein target. Second, it shows how naive application of commonly used performance evaluation strategies can yield overly optimistic performance metrics for a given ML model.

Installation

The code depends on a number of packages, and a specific combination of their versions gives good performance in terms of speed. We recommend using the conda package manager to install these dependencies automatically. To do so, ensure that conda is installed (a minimal conda distribution should be sufficient), clone the repository, and run

conda env create -f environment.yml

This creates an environment called affgnn and installs all necessary packages into it. After that, use

conda activate affgnn

to activate the new environment. Note, however, that this configuration also requires CUDA to be available on the system (the packages to be installed use CUDA 10.2).
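As a quick sanity check before training, the sketch below tests whether an NVIDIA driver is visible on the machine. It relies only on the Python standard library and on the `nvidia-smi` tool that normally ships with the NVIDIA driver. The helper name `nvidia_driver_visible` is ours, not part of this repository, and a positive result does not by itself guarantee compatibility with CUDA 10.2.

```python
import shutil
import subprocess


def nvidia_driver_visible() -> bool:
    """Return True if `nvidia-smi` is on PATH and exits successfully.

    Hypothetical helper, not part of the repository.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The tool is not installed or not on PATH.
        return False
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("NVIDIA driver visible:", nvidia_driver_visible())
```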

Configuration and use

The code can be run as

python train.py

which first trains the model on the folds marked 0, 1, 2, and 3, then makes predictions on fold 4 (the fold ids can be set through the argv_valFold and argv_testFold keys in affinity_module/config.py), and saves the predicted affinities as text files.
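To illustrate the fold logic described above, here is a minimal sketch of splitting a fold-labelled table into training, validation, and test subsets. It is not the code used by train.py. The column names (`ligand`, `pdb`, `uniprot`, `fold`), the example rows, and the function `split_by_fold` are all assumptions made for illustration, as is the choice to train on every fold that is neither the validation nor the test fold.

```python
import csv
import io

# Made-up example table; the real master .csv may use different column names.
EXAMPLE_CSV = """ligand,pdb,uniprot,fold
lig0,pdb0,prot0,0
lig1,pdb1,prot1,1
lig2,pdb2,prot2,2
lig3,pdb3,prot3,3
lig4,pdb4,prot4,4
"""


def split_by_fold(rows, val_fold, test_fold, fold_column="fold"):
    """Split rows into (train, val, test) lists by their fold id.

    Rows whose fold is neither val_fold nor test_fold go to train.
    If val_fold == test_fold, the same rows appear in both val and test.
    """
    train, val, test = [], [], []
    for row in rows:
        fold = int(row[fold_column])
        if fold == val_fold:
            val.append(row)
        if fold == test_fold:
            test.append(row)
        if fold not in (val_fold, test_fold):
            train.append(row)
    return train, val, test


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
    train, val, test = split_by_fold(rows, val_fold=4, test_fold=4)
    print("train folds:", sorted({r["fold"] for r in train}))
    print("test folds:", sorted({r["fold"] for r in test}))
```

With both ids set to 4, this reproduces the behaviour described above: folds 0 to 3 are used for training and fold 4 for prediction.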

The input data must be supplied in two forms: a .csv file defining the ligand structures, the PDB codes and UniProt IDs of the receptors, and the distribution of the data over five folds; and .dssp files defining the secondary structure elements of the target proteins.

The location of the input .csv file and of the folder containing the receptor .dssp files should then be set by editing affinity_module/config.py (see the master_data_table and dssp_files_path keys, respectively).
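Before starting a training run, it can help to confirm that every receptor referenced in the table has a matching .dssp file in the configured folder. The sketch below does this with the standard library only. It assumes that each file is named `<PDB code>.dssp`, which the repository does not confirm, and the function `missing_dssp_files` is a hypothetical helper rather than part of affinity_module.

```python
import tempfile
from pathlib import Path


def missing_dssp_files(pdb_codes, dssp_dir):
    """Return the PDB codes that have no `<code>.dssp` file in dssp_dir.

    Assumption: files are named exactly after the PDB code (case-sensitive).
    """
    dssp_dir = Path(dssp_dir)
    return [code for code in pdb_codes if not (dssp_dir / f"{code}.dssp").is_file()]


if __name__ == "__main__":
    # Self-contained demo using a temporary folder and made-up codes.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "aaaa.dssp").write_text("")
        print(missing_dssp_files(["aaaa", "bbbb"], tmp))
```

In practice, the list of codes would come from the PDB column of the file referenced by master_data_table, and the folder from dssp_files_path.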

To prepare receptor data in .dssp format, the underlying receptor structures, in either PDB or (preferably) CIF format, should be processed with the DSSP program, e.g.,

mkdssp --output-format dssp protein.cif > protein.dssp
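When many receptors need to be processed, the command above can be wrapped in a small loop. The following sketch runs mkdssp with the same arguments on every .cif file in a folder and writes one .dssp file per structure. The function names and folder layout are assumptions made for illustration, and the script expects mkdssp to be available on PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def dssp_command(structure: Path) -> List[str]:
    """Build the mkdssp invocation shown above for one structure file."""
    return ["mkdssp", "--output-format", "dssp", str(structure)]


def convert_folder(src_dir: Path, dst_dir: Path) -> List[Path]:
    """Run mkdssp on every .cif file in src_dir; return the written paths."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for structure in sorted(src_dir.glob("*.cif")):
        # check=True raises CalledProcessError if mkdssp fails on this file.
        result = subprocess.run(
            dssp_command(structure), capture_output=True, text=True, check=True
        )
        target = dst_dir / (structure.stem + ".dssp")
        # Write the file only after mkdssp has finished successfully.
        target.write_text(result.stdout)
        written.append(target)
    return written


if __name__ == "__main__":
    # Hypothetical folder names; adjust to the local layout.
    for path in convert_folder(Path("structures"), Path("dssp")):
        print("wrote", path)
```

Capturing the output before writing avoids leaving a truncated .dssp file behind when mkdssp fails on a structure.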
