The sequence-to-drug concept adds a new perspective on drug design. It can serve as an alternative to structure-based drug design (SBDD), particularly for proteins that do not yet have high-quality 3D structures available.

lifanchen-simm/transformerCPI2.0

TransformerCPI2.0

We release only the inference models. TransformerCPI2.0 builds on TransformerCPI, whose code is fully released. The details of TransformerCPI2.0 are described in our paper, https://doi.org/10.1038/s41467-023-39856-w, published in Nature Communications. Trained models are currently available.

Setup and dependencies

environment.yaml defines the conda environment for this project.

Inference

predict.py runs inference; its inputs are a protein sequence and a compound SMILES string. featurizer.py tokenizes and encodes the protein sequence and the compound. mutation_analysis.py conducts drug mutation analysis to predict binding sites. substitution_analysis.py conducts substitution analysis.
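The general idea of probing a sequence-based model with mutations can be sketched as follows. This is only an illustrative, generic in-silico scan and not the procedure implemented in mutation_analysis.py, which is not described in this README. The names `mutation_scan`, `score_fn`, and `toy_score` are hypothetical; `score_fn` stands in for any callable that returns an interaction score for a protein sequence and a SMILES string.

```python
from typing import Callable, List, Tuple


def mutation_scan(
    sequence: str,
    smiles: str,
    score_fn: Callable[[str, str], float],
    substitute: str = "A",
) -> List[Tuple[int, str, float]]:
    """Replace each residue with `substitute` and record the score drop.

    Returns (1-based position, original residue, baseline - mutated score).
    Residues that already equal `substitute` are skipped.
    """
    baseline = score_fn(sequence, smiles)
    results = []
    for i, residue in enumerate(sequence):
        if residue == substitute:
            continue  # substitution would leave the sequence unchanged
        mutated = sequence[:i] + substitute + sequence[i + 1:]
        results.append((i + 1, residue, baseline - score_fn(mutated, smiles)))
    return results


def toy_score(sequence: str, smiles: str) -> float:
    """Placeholder scorer (hypothetical): fraction of tryptophan residues.

    It ignores the compound entirely and exists only to make the sketch runnable.
    """
    return sequence.count("W") / len(sequence)


if __name__ == "__main__":
    for position, residue, delta in mutation_scan("AWGW", "CCO", toy_score):
        print(position, residue, delta)
```

Positions with the largest score drop would be the candidates to inspect first; in practice `toy_score` would be replaced by a wrapper around a real model's prediction.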

Trained models

Trained models are freely available at https://drive.google.com/drive/folders/1X7i1eO-EykCQcvqMeWeB7QXT3E9eLG08?usp=sharing. The current open-source version aims only to reproduce the results reported in the article, so inference speed is limited.

Requirements

python = 3.8.8

pytorch = 1.9

tape-proteins = 0.5

rdkit = 2021.03.5

numpy = 1.19.5

scikit-learn = 0.24.1
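A quick way to compare an existing environment against the versions listed above is sketched below. It uses only the standard library's `importlib.metadata`. The distribution names are assumptions (for example, pytorch is assumed to be registered as `torch`), and a conda-installed rdkit may not expose metadata under the name `rdkit` or may report its version in a different form; `check_requirements` and `installed_version` are hypothetical helper names.

```python
from importlib import metadata

# Versions copied from the Requirements list above. The keys are assumed
# distribution names and may differ from how a package is actually registered.
PINNED = {
    "torch": "1.9",
    "tape-proteins": "0.5",
    "rdkit": "2021.03.5",
    "numpy": "1.19.5",
    "scikit-learn": "0.24.1",
}


def installed_version(dist_name):
    """Return the installed version string, or None if no metadata is found."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_requirements(pinned=PINNED, lookup=installed_version):
    """Map each distribution name to 'ok', 'missing', or a mismatch message."""
    report = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found is None:
            report[name] = "missing"
        elif found == wanted or found.startswith(wanted + "."):
            # Treat "1.9.0" as satisfying a pin of "1.9".
            report[name] = "ok"
        else:
            report[name] = f"mismatch (found {found}, expected {wanted})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

The `lookup` parameter is injectable so the comparison logic can be exercised without the packages installed.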
