PMPred is a linux based tool for scoring the deleteriousness of nonsense variants as well as insertion/deletions variants in the human genome (currently supported builds: GRCh37/hg19).
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. To predict the pathogenic effect of mutations, PMPred measures the significance of changes between molecular functions of the wildtype and altered sequences. Indeed, the method calculates the pathogenic rate of frameshift indels and nonsense mutations in terms of function loss without specific training. In addition, it calculates the position of variants on target genes and conducts domain prediction analysis to provide a detailed information of each variant. As a result of this process, an annotated list regarding the consequences of these variants on conserved domains and their pathogenic rate is provided to the user.
1.Convert VCF file to ANNOVAR input format (convert2annovar.pl)
2.Variant annotation (annotate_variation.pl)
3.Infer mutated protein sequence(coding_change.pl)
4.Convert sequences to fasta format (annovar2unigo)
5.Remove empty lines (grep)
6.Function prediction (UniGOPred)
7.Domain prediction (InterProScan)
8.Pathogenicity prediction (prediction.py)
Users who have a list of nonsense or indel variants that they wish to functionally characterise can use the PMPred package to run the classification algorithms.
Data must be a VCF containing phased genotype data; no other formats are currently supported.
The output format is a csv file reporting all variant according to pathogenic rate in ascending order. It has the following fields:
1.Variant
2.Transcript stable ID
3.Gene Symbol
4.Pvalue
5.Correlation
6.Location(Distance from beginning)
7.Pathogenic class
8.Wildtype domain
9.Altered domain
1.Install python
Follow the instructions on python.org to install Python version 3.0 or newer. Pip, python's package manager, must be included in the installation.
2.Install Nextflow
Download the distribution package by copying and pasting this command in your terminal:
curl -fsSL https://get.nextflow.io | bash
It creates the nextflow executable file in the current directory. You may want to move it to a folder accessible from your $PATH.
Nextflow documentation is available at this link http://docs.nextflow.io
3.Download Annovar data
Annovar data is available at this link https://www.openbioinformatics.org/annovar/annovar_download_form.php
4.Install InterProScan
To install the InterProScan 5 software you then need to complete the following steps:
- Obtaining the core InterProScan software
mkdir my_interproscan cd my_interproscan wget https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.59-91.0/interproscan-5.59-91.0-64-bit.tar.gz wget https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.59-91.0/interproscan-5.59-91.0-64-bit.tar.gz.md5
(Direct link: https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.59-91.0/interproscan-5.59-91.0-64-bit.tar.gz)
- Extract the tar ball:
tar -pxvzf interproscan-5.59-91.0-*-bit.tar.gz
This is a completely self-contained version that includes member database specific binaries and model / signature files. This should run 'out of the box' on a Linux system. Note that it excludes analyses that contain components for which you are obliged to acquire your own license.
- Index hmm models
Before you run interproscan for the first time, you should run the command:
python3 setup.py interproscan.properties
This command will press and index the hmm models to prepare them into a format used by hmmscan.
For more information on downloading, installing and running InterProScan please see the InterProScan readthedocs.
5.Download UniGOPred data
UniGOPred data is available at this link ?
6.Dependencies
Other dependencies are available in requirements.txt file.
pip install -r requirements.txt
7.Start running your own analysis!
nextflow run main.nf --input test.vcf