Protein Family Classification

This repository aims to reproduce the results from this paper. The project uses TensorFlow, scikit-learn, NumPy, pandas and NLTK. The model achieved an F1 score of 0.83 on the cv dataset. The dataset used is UniProtKB/Swiss-Prot. During preprocessing, families with fewer than 200 examples and sequences longer than 1000 residues were removed. GloVe was used to create the embeddings.
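The preprocessing filter described above can be sketched as follows. This is a minimal illustration, not the code in utils/script1.py: it assumes the records have already been parsed into (sequence, family) pairs, and it applies the length filter before counting family sizes, which is an assumption because the README does not state the order. The function name filter_records is hypothetical.

```python
from collections import Counter

# Thresholds from the README; the (sequence, family) record layout is an assumption.
MIN_FAMILY_SIZE = 200   # families with fewer examples are dropped
MAX_SEQ_LENGTH = 1000   # sequences longer than this are dropped


def filter_records(records, min_family_size=MIN_FAMILY_SIZE, max_seq_length=MAX_SEQ_LENGTH):
    """Hypothetical helper: keep (sequence, family) pairs that pass both filters."""
    records = list(records)
    # Drop over-long sequences first (assumed order), then count family sizes on what remains.
    short = [(seq, fam) for seq, fam in records if len(seq) <= max_seq_length]
    counts = Counter(fam for _, fam in short)
    return [(seq, fam) for seq, fam in short if counts[fam] >= min_family_size]


if __name__ == "__main__":
    data = [("M" * 10, "A")] * 3 + [("M" * 2000, "A")] + [("M" * 5, "B")]
    # With a toy threshold of 3, only the three short family-A sequences survive.
    print(filter_records(data, min_family_size=3))
```

Counting after the length filter means a family can fall below the threshold because some of its sequences were too long; if the original scripts count first, swap the two steps.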

Getting Started

Downloading the dataset: for details, check out the README file in the data directory.

Prerequisites

TensorFlow
scikit-learn
NumPy
pandas
NLTK

All the libraries can be installed using pip3. A shell script to install all dependencies will be available in this repository.
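Until that install script is available, a quick way to see which prerequisites are missing is to look up their import names. This is a minimal sketch using only the standard library; note that scikit-learn is imported as sklearn even though pip3 installs it as scikit-learn. The helper name missing_modules is hypothetical.

```python
import importlib.util

# Import names of the prerequisites listed above (scikit-learn imports as "sklearn").
REQUIRED_MODULES = ["tensorflow", "sklearn", "numpy", "pandas", "nltk"]


def missing_modules(modules=REQUIRED_MODULES):
    """Hypothetical helper: return the top-level modules that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
        print("Install the corresponding packages with pip3.")
    else:
        print("All prerequisites found.")
```

find_spec only checks that a module can be located, so it does not import TensorFlow or verify its version.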

Steps to run the model

To run the model, do the following:

chmod +x run.sh
./run.sh

If something goes wrong, check the script run.sh. It performs the following steps:

Download the dataset into the data folder and rename it to uniprot-all.tab.
In the utils folder, run script1.py.
In the data folder, clone GloVe and build it with the "make" command.
Run GloVe with the appropriate parameters (see lines 17, 19, 21 and 23 of run.sh).
In the utils folder, run script2.py.
Run model.py.
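The order and working directories of the Python steps above can be sketched as follows. This is a hypothetical helper, not part of the repository, and it does not replace run.sh: the GloVe clone, build and training steps are left out because their parameters live in run.sh, so a real (non dry-run) execution assumes the GloVe output is already in place, for example from an earlier run of run.sh. By default it only prints the commands.

```python
import subprocess
import sys
from pathlib import Path


def python_stages(repo_root="."):
    """Hypothetical helper: ordered (working directory, command) pairs for the Python steps."""
    root = Path(repo_root)
    return [
        (root / "utils", [sys.executable, "script1.py"]),
        # The GloVe steps from run.sh run here, between script1.py and script2.py.
        (root / "utils", [sys.executable, "script2.py"]),
        (root, [sys.executable, "model.py"]),
    ]


def run_stages(repo_root=".", dry_run=True):
    """Print each stage; when dry_run is False, also execute it and stop on the first failure."""
    dataset = Path(repo_root) / "data" / "uniprot-all.tab"
    if not dry_run and not dataset.is_file():
        raise FileNotFoundError(f"expected dataset at {dataset}")
    for cwd, cmd in python_stages(repo_root):
        print(f"[{cwd}] {' '.join(cmd)}")
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    run_stages(".", dry_run=True)
```

Running each script with cwd set mirrors the "go to the folder, then run" wording of the steps, which matters if the scripts use relative paths.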

This runs the model on the dataset.
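The README does not say how the F1 score of 0.83 was averaged across families. Assuming macro averaging (the unweighted mean of per-family F1 scores, with F1 = 2TP / (2TP + FP + FN)), the metric can be sketched as below. For that definition, scikit-learn's f1_score(y_true, y_pred, average="macro") computes the same quantity; the helper macro_f1 is hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Hypothetical helper: unweighted mean of per-class F1 over classes in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted family p, but it was wrong
            fn[t] += 1  # true family t was missed
    labels = set(y_true) | set(y_pred)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # F1(a) = 2/3, F1(b) = 4/5, so the macro average is 11/15.
    print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```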

Train time

Each epoch took approximately 4 seconds on a Tesla K80 with a batch size of 128.
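For rough planning, the number of mini-batches per epoch and the total training time follow directly from the figures above. This is a back-of-the-envelope sketch: the dataset size and the number of epochs are inputs you supply, since neither is stated here, and the helper names are hypothetical.

```python
import math

SECONDS_PER_EPOCH = 4.0  # approximate figure reported above (Tesla K80, batch size 128)
BATCH_SIZE = 128


def steps_per_epoch(num_examples, batch_size=BATCH_SIZE):
    """Mini-batches needed to see every example once (the last batch may be partial)."""
    return math.ceil(num_examples / batch_size)


def estimated_training_seconds(epochs, seconds_per_epoch=SECONDS_PER_EPOCH):
    """Total wall-clock estimate, ignoring data loading and evaluation overhead."""
    return epochs * seconds_per_epoch


if __name__ == "__main__":
    print(steps_per_epoch(1000))           # 1000 / 128 = 7.8, rounded up to 8 batches
    print(estimated_training_seconds(50))  # 50 epochs at ~4 s each
```

For example, 50 epochs at about 4 seconds each comes to roughly 200 seconds.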

Authors

Sudhanshu Ranjan and Udayraj Deshmukh.

Similar repo

If you are interested, check out the repo for Protein Secondary Structure Prediction.

s1998/protein-family-classification
