The DeepNovo branch contains a PyTorch re-implementation of DeepNovo.
The PointNovo branch contains the implementation of our proposed PointNovo model. The software has been tested on Ubuntu 16.04 and 18.04.
python >= 3.6
pytorch >= 1.0
dataclasses, biopython, pyteomics, cython
For database search, you also need to install Percolator.
The ABRF DDA spectra file can be downloaded here. The PXD008844 and PXD010559 spectra for training, validation, and testing, as well as the EThcD NIST antibody sequence data, can be found here.
The 9 species data (published with the DeepNovo paper) can also be downloaded here.
Note that our implementation represents training samples in a slightly different format: peptides are stored in a CSV file and spectra are stored in MGF files. We also include a script for converting the file format (data_format_converter.py in the PointNovo branch).
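To make the CSV plus MGF layout concrete, here is a minimal sketch of how peptide annotations might be paired with their spectra. It is not the repository's loader: the CSV column names `scans` and `seq`, the use of the `SCANS` field as the join key, and the helper names `read_mgf` and `pair_peptides_with_spectra` are all assumptions made for illustration.

```python
import csv
import io


def read_mgf(handle):
    """Yield one dict per spectrum from an MGF-formatted text stream."""
    spectrum = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "peaks": []}
        elif line == "END IONS":
            yield spectrum
            spectrum = None
        elif spectrum is not None:
            if "=" in line:
                # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                spectrum["params"][key.upper()] = value
            else:
                # Peak line: "<m/z> <intensity>"
                mz, intensity = line.split()[:2]
                spectrum["peaks"].append((float(mz), float(intensity)))


def pair_peptides_with_spectra(csv_handle, mgf_handle):
    """Join CSV rows to spectra; 'scans' and 'seq' are hypothetical columns."""
    spectra = {s["params"].get("SCANS"): s for s in read_mgf(mgf_handle)}
    pairs = []
    for row in csv.DictReader(csv_handle):
        spectrum = spectra.get(row["scans"])
        if spectrum is not None:
            pairs.append((row["seq"], spectrum))
    return pairs


# Tiny in-memory example (made-up values, not real data).
MGF_TEXT = """BEGIN IONS
TITLE=example
PEPMASS=500.25
CHARGE=2+
SCANS=17
100.1 10.0
200.2 20.0
END IONS
"""
CSV_TEXT = "scans,seq\n17,PEPTIDE\n"

pairs = pair_peptides_with_spectra(io.StringIO(CSV_TEXT), io.StringIO(MGF_TEXT))
for peptide, spectrum in pairs:
    print(peptide, spectrum["params"]["PEPMASS"], len(spectrum["peaks"]))
```

For real files, pyteomics (already listed as a dependency) provides an MGF reader; the hand-written parser above only keeps the sketch free of third-party imports.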
Like DeepNovo, PointNovo uses the knapsack algorithm to further limit the search space. This means that when performing de novo sequencing, the program needs to either read or create a knapsack matrix based on the selected PTMs (a one-time computation). Pre-built knapsack matrix files can be found here:
You can use a symbolic link to choose which knapsack file to use, e.g.
ln -s fix_C_var_NMQ_knapsack.npy knapsack.npy
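The knapsack idea mentioned above can be illustrated with a small sketch. This is a simplified, assumed version and not the code in this repository: it uses three residues with masses rounded to integers, a one-dimensional table instead of a matrix, and hypothetical function names (`build_knapsack`, `allowed_next_residues`). The real matrix is presumably built at a finer mass resolution and over the full residue set, including the selected PTMs.

```python
def build_knapsack(residue_masses, max_mass):
    """Return a table where reachable[m] is True if integer mass m
    can be written as a sum of residue masses (with repetition)."""
    reachable = [False] * (max_mass + 1)
    reachable[0] = True  # the empty sequence has mass 0
    for mass in range(1, max_mass + 1):
        reachable[mass] = any(
            r <= mass and reachable[mass - r] for r in residue_masses.values()
        )
    return reachable


def allowed_next_residues(residue_masses, reachable, remaining_mass):
    """Residues that can be appended such that the mass left over
    afterwards can still be filled exactly by further residues."""
    return sorted(
        aa
        for aa, r in residue_masses.items()
        if r <= remaining_mass and reachable[remaining_mass - r]
    )


# Toy alphabet: nominal (integer-rounded) residue masses, for illustration only.
TOY_MASSES = {"G": 57, "A": 71, "S": 87}

table = build_knapsack(TOY_MASSES, max_mass=300)

# With 128 mass units left, G (leaves 71) and A (leaves 57) are viable; S is not.
print(allowed_next_residues(TOY_MASSES, table, 128))  # ['A', 'G']
```

Because the table depends only on the residue masses, it can be computed once and reused for every spectrum, which is why a pre-built file can be read from disk instead of being rebuilt on each run.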
make build
make train
On an RTX 2080 Ti GPU, it takes around 0.3 seconds to train a batch of 16 annotated spectra. By default, the trained model is saved under the ./train directory.
make denovo
On an RTX 2080 Ti GPU, it takes around 0.4 seconds to process a batch of 16 annotated spectra.
make test
This script is borrowed from the original DeepNovo implementation. It generates the metrics defined in the paper.
make db