DeeplyEssential is a deep neural network for identifying essential genes in bacterial species. The dataset used to train the network contains 30 bacterial species collected from DEG.
- Python 2.7
- keras==2.1.5
- numpy==1.14.2
- pandas==0.22.0
- scikit-learn==0.19.1
- tensorflow==1.6.0
DeeplyEssential takes 6 parameters:
- Essential gene directory path. The directory contains
  - An essential gene sequence file
  - An essential protein sequence file
  - A gene annotation file
- Non-essential gene directory path. This directory contains
  - A non-essential gene sequence file
  - A non-essential protein sequence file
  - A gene annotation file
- Path to the gene cluster file produced by OrthoMCL (sample given, orthoMCL.txt)
- Text file containing bacteria species information (sample given, dataset.txt)
- Experiment option
  - '-gp' for Gram Positive (GP) Dataset
  - '-gn' for Gram Negative (GN) Dataset
  - '-c' for GP + GN Dataset
- Name of the experiment
$ python main.py <essential gene dir> <non-essential gene dir> <cluster gene file> <dataset> -c <experiment name>
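The six parameters are positional, so a swapped path or a mistyped option is easy to miss. The following is a minimal sketch of a wrapper that assembles the argument list in the documented order and checks the inputs before launching a run. It is not part of DeeplyEssential: `build_command`, `missing_paths`, and the example paths `data/essential` and `data/non_essential` are hypothetical names chosen for illustration.

```python
import os

# Experiment options documented for main.py.
EXPERIMENT_OPTIONS = {
    "-gp": "Gram Positive (GP)",
    "-gn": "Gram Negative (GN)",
    "-c": "GP + GN",
}


def build_command(essential_dir, non_essential_dir, cluster_file,
                  dataset_file, option, experiment_name):
    """Return the argument list for main.py in the documented order.

    Hypothetical helper; it only assembles the command and does not run it.
    """
    if option not in EXPERIMENT_OPTIONS:
        raise ValueError(
            "option must be one of: " + ", ".join(sorted(EXPERIMENT_OPTIONS))
        )
    return ["python", "main.py", essential_dir, non_essential_dir,
            cluster_file, dataset_file, option, experiment_name]


def missing_paths(essential_dir, non_essential_dir, cluster_file, dataset_file):
    """Return the inputs that do not exist on disk (directories, then files)."""
    problems = []
    for path in (essential_dir, non_essential_dir):
        if not os.path.isdir(path):
            problems.append(path)
    for path in (cluster_file, dataset_file):
        if not os.path.isfile(path):
            problems.append(path)
    return problems


if __name__ == "__main__":
    # Example paths are placeholders, not files shipped with the project.
    cmd = build_command("data/essential", "data/non_essential",
                        "orthoMCL.txt", "dataset.txt", "-c", "combined_run")
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.call` once `missing_paths` comes back empty, which avoids quoting problems with paths that contain spaces.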
The datasets were collected from DEG. Update: the data in the current version of DEG may have changed. Download the exact data used for this project from https://drive.google.com/drive/folders/1zhtTP164Ae6MVHrB7A38z8C48tSe0W83?usp=sharing
DeeplyEssential generates a report containing the experiment name, basic statistics about the dataset, and evaluation metrics for each iteration of the experiment. A sample (sample_output.tab) is provided.