Genomic Notation Translator
This script takes a variant string in HGVS format and translates it to genomic notation (or, with -o, to coding DNA or protein notation) using TransVar (transvar.readthedocs.io). It is useful for certain use-case-specific batch processing.
Gnott itself is a simple Python script: just download it and run it with Python. The only prerequisite is having TransVar installed.
See: http://transvar.readthedocs.io/en/latest/download_and_install.html
And: http://transvar.readthedocs.io/en/latest/quick_start.html
Or you can just run these:
sudo pip install transvar
#or locally: pip install --user transvar
# set up databases
transvar config --download_anno --refversion hg19
# download the reference genome
transvar config --download_ref --refversion hg19
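If you are unsure whether the installation worked, a quick check from Python is whether the `transvar` executable is on your PATH. This is a minimal sketch using only the standard library; `transvar_available` is a hypothetical helper, not part of gnott.

```python
import shutil


def transvar_available() -> bool:
    """Return True if a `transvar` executable can be found on PATH."""
    # shutil.which returns the executable's full path, or None if it is absent
    return shutil.which("transvar") is not None


if __name__ == "__main__":
    if transvar_available():
        print("transvar found")
    else:
        print("transvar not found; install it with pip first")
```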
This script can, in theory, take any input that TransVar accepts. The only requirement is that the string contains a ":x" sequence, where 'x' is one of the characters g, c or p, denoting the type of encoding. However, I wrote gnott for a very specific use case, so only the formats shown below have been tested.
This script skips mutations ending with 'X' or 'Fs', because TransVar's output for this type of input is not useful for the use case this script was originally written for.
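The ":x" requirement can be sketched as a regular-expression search. This illustrates the idea and is not gnott's actual code: `detect_level` is a hypothetical name, and the pattern assumes the level letter is followed by a dot, as in standard HGVS (`g.`, `c.`, `p.`).

```python
import re

# Matches ":g.", ":c." or ":p." and captures the level letter.
# Assumption: the level letter is followed by a dot, as in standard HGVS.
_LEVEL_RE = re.compile(r":([gcp])\.")


def detect_level(hgvs: str) -> str:
    """Return 'g', 'c' or 'p' depending on the ':x.' marker in the string."""
    match = _LEVEL_RE.search(hgvs)
    if match is None:
        raise ValueError(f"no ':g.', ':c.' or ':p.' marker in {hgvs!r}")
    return match.group(1)


print(detect_level("NM_000492.3:p.Gly480Cys"))  # p
print(detect_level("chr7:g.117199563G>T"))      # g
```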
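The skip rule amounts to a suffix check. A minimal sketch, assuming the check is applied to the variant text exactly as written (case-sensitive, so a lower-case 'fs' would not be skipped); `should_skip` is a hypothetical name.

```python
# Suffixes skipped per the rule above: 'X' usually denotes a stop codon
# in 1-letter notation, 'Fs' a frameshift.
SKIPPED_SUFFIXES = ("X", "Fs")


def should_skip(variant: str) -> bool:
    """Return True for variants whose notation ends with 'X' or 'Fs'."""
    # str.endswith accepts a tuple and matches any of its entries
    return variant.strip().endswith(SKIPPED_SUFFIXES)


for v in ["p.R130X", "p.G132D", "p.K267Fs"]:
    print(v, should_skip(v))
# p.R130X True
# p.G132D False
# p.K267Fs True
```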
usage: gnott.py \[-h] \[-o {g,c,p,pp}] \[-debug] string
- -o {g,c,p,pp} Denotes the format of the output string.
- -debug Runs in debug mode, showing more details about progress and errors.
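For reference, the documented options map naturally onto Python's `argparse`. The parser below is a sketch that reproduces the option names listed above, not gnott's own source; leaving `-o` at `None` lets the default be decided later from the input level.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser matching the documented gnott.py options."""
    parser = argparse.ArgumentParser(prog="gnott.py")
    parser.add_argument("string", help="variant in HGVS format")
    parser.add_argument(
        "-o",
        choices=["g", "c", "p", "pp"],
        default=None,  # resolved later: 'g', or 'p' for genomic input
        help="format of the output string",
    )
    parser.add_argument(
        "-debug",
        action="store_true",
        help="show more details about progress and errors",
    )
    return parser


args = build_parser().parse_args(["NM_000492.3:p.Gly480Cys", "-o", "c"])
print(args.string, args.o, args.debug)  # NM_000492.3:p.Gly480Cys c False
```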
python gnott.py 'NM_000492.3:p.Gly480Cys'
chr7:g.117199563G>T
Use -o to modify the output format.
Possible values: 'g', 'c', 'p', 'pp'
- g - genomic reference sequence (ignored if the input sequence is in g.)
- c - coding DNA reference sequence
- p - protein reference sequence, 1-letter coding
- pp - protein reference sequence, 3-letter coding
DEFAULTS TO: 'g', or 'p' if the input sequence is in g.
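The defaulting rules for -o, including the genomic-input case, can be expressed as a small function. This is a hypothetical helper written from the description in this README, not code taken from gnott; values the README does not discuss are passed through unchanged.

```python
from typing import Optional


def resolve_output_format(input_level: str, requested: Optional[str]) -> str:
    """Pick the output format from the input level and the -o value.

    Default is 'g', except for genomic ('g') input, where the default
    is 'p' and a requested 'g' is ignored.
    """
    if input_level == "g":
        # g -> g would be a no-op, so fall back to protein output
        if requested is None or requested == "g":
            return "p"
        return requested
    return requested if requested is not None else "g"


print(resolve_output_format("p", None))  # g
print(resolve_output_format("g", "g"))   # p
print(resolve_output_format("g", "c"))   # c
```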
python gnott.py 'NM_000492.3:p.Gly480Cys' -o c
chr7:c.1438G>T
python gnott.py 'NM_000492.3:c.1438G>T' -o p
chr7:p.G480C
INPUT TYPE (g/c/p) is detected automatically. See above for c/p type input examples.
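Automatic detection pairs naturally with TransVar's three annotation subcommands: `panno` (protein), `canno` (cDNA) and `ganno` (genomic). The sketch below builds such a command line. The mapping and the choice of `--refseq` (RefSeq transcripts, matching the `NM_` accessions used here) are assumptions about how a wrapper like gnott might call TransVar, and `build_transvar_command` is a hypothetical name.

```python
import re
from typing import List

# Assumed mapping from HGVS level to TransVar subcommand.
SUBCOMMANDS = {"p": "panno", "c": "canno", "g": "ganno"}


def build_transvar_command(hgvs: str) -> List[str]:
    """Build (but do not run) a TransVar command line for one variant."""
    match = re.search(r":([gcp])\.", hgvs)
    if match is None:
        raise ValueError(f"cannot detect input level of {hgvs!r}")
    subcommand = SUBCOMMANDS[match.group(1)]
    # --refseq restricts annotation to RefSeq transcripts (assumed choice)
    return ["transvar", subcommand, "-i", hgvs, "--refseq"]


print(build_transvar_command("NM_000492.3:p.Gly480Cys"))
# ['transvar', 'panno', '-i', 'NM_000492.3:p.Gly480Cys', '--refseq']
```

The returned list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` to execute TransVar and capture its output.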
If the input is a genomic-level annotation (g), the output is the HGVS-coded protein and mutation.
In this case, the output modifier -o can only be c or p, and defaults to p. If modifier g is passed, it is ignored.
python gnott.py 'chr7:g.117199563G>T' -o c
NM_000492:c.1438G>T
The script gnottFile.py reads the lines of a file and calls gnott with the correct input for each of them. It expects the first line of the supplied file to be an HGVS reference sequence identifier (in the example below, the RefSeq transcript NM_000314), with the variants on the following lines.
Example of a valid input file:
NM_000314
L70V
D326N
N276S
G132D
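Conceptually, each variant line is joined to the reference identifier on the first line using the level given by -i. A minimal sketch of that step, with `build_queries` as a hypothetical name; skipping blank lines is an assumption rather than documented behavior.

```python
from typing import Iterable, List


def build_queries(lines: Iterable[str], level: str) -> List[str]:
    """Turn an input file's lines into full HGVS strings.

    The first non-empty line is the reference identifier; every following
    non-empty line is a variant written without a level prefix.
    """
    cleaned = [line.strip() for line in lines if line.strip()]
    if not cleaned:
        return []
    reference, variants = cleaned[0], cleaned[1:]
    return [f"{reference}:{level}.{variant}" for variant in variants]


example = ["NM_000314", "L70V", "D326N", "N276S", "G132D"]
print(build_queries(example, "p"))
# ['NM_000314:p.L70V', 'NM_000314:p.D326N', 'NM_000314:p.N276S', 'NM_000314:p.G132D']
```

For a real file, pass the open file handle instead of a list, e.g. `with open(filename) as fh: queries = build_queries(fh, "p")`.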
Arguments are similar to gnott.py, with an extra -i argument:
usage: gnottFile.py \[-h] \[-i {g,c,p,a}] \[-o {g,c,p,pp}] \[-debug] filename
- -i {g,c,p,a} Denotes the format of the variants in the file.
- -o {g,c,p,pp} Denotes the format of the output string.
- -debug Runs in debug mode, showing more details about progress and errors.
This tells gnott what it is looking at in the input file. The example above requires the value "p" for the -i argument to be processed correctly.
If a single file contains multiple levels of annotation, include the level prefix on each line. E.g.:
NM_000314
p.L70V
p.D326N
and run gnottFile.py with "-i a". The "a" value detects the level automatically for each line, but the prefix must be present on every line.
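With "-i a" the level comes from each line's own prefix. The sketch below checks that every variant line starts with `g.`, `c.` or `p.` and rejects lines without one; `build_queries_auto` is a hypothetical name, and raising an error on a missing prefix is an assumption about how such lines might be handled.

```python
import re
from typing import Iterable, List

# Each variant line must start with its own level prefix, e.g. "p.L70V".
_PREFIX_RE = re.compile(r"^[gcp]\.")


def build_queries_auto(lines: Iterable[str]) -> List[str]:
    """Join prefixed variant lines to the reference on the first line."""
    cleaned = [line.strip() for line in lines if line.strip()]
    if not cleaned:
        return []
    reference, variants = cleaned[0], cleaned[1:]
    queries = []
    for variant in variants:
        if not _PREFIX_RE.match(variant):
            raise ValueError(f"missing g./c./p. prefix: {variant!r}")
        queries.append(f"{reference}:{variant}")
    return queries


print(build_queries_auto(["NM_000314", "p.L70V", "p.D326N"]))
# ['NM_000314:p.L70V', 'NM_000314:p.D326N']
```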