Merge pull request #194 from cancervariants/staging
Staging
korikuzma authored Nov 23, 2021
2 parents 0fea32d + 3883597 commit d44e5f1
Showing 111 changed files with 8,137 additions and 1,782 deletions.
9 changes: 4 additions & 5 deletions Pipfile
@@ -12,7 +12,7 @@ coveralls = "*"
coverage = "*"
flake8-docstrings = "*"
pre-commit = "*"
variation-normalization = {editable = true, path = "."}
variation-normalizer = {editable = true, path = "."}
pyyaml = "*"
jupyter = "*"
ipykernel = "*"
@@ -31,10 +31,9 @@ uvicorn = "*"
pydantic = "*"
uvloop = "*"
httptools = "*"
"ga4gh.vrs" = {version = "==0.7.0rc3", extras = ["extras"]}
gene-normalizer = ">=0.1.21"
"ga4gh.vrs" = {version = ">=0.7.2", extras = ["extras"]}
gene-normalizer = ">=0.1.23"
pyliftover = "*"
boto3 = "*"
"ga4gh.vrsatile.pydantic" = "*"
"ga4gh.vrsatile.pydantic" = ">=0.0.5"
pandas = "*"
jsonschema = ">=2.3, <4.0"
38 changes: 15 additions & 23 deletions README.md
@@ -1,6 +1,6 @@
# Variation Normalization

Services and guidelines for normalizing variation terms into [VRS (v1.1.1)](https://vrs.ga4gh.org/en/1.1.1) and [VRSATILE (latest)](https://vrsatile.readthedocs.io/en/latest/) compatible representations.
Services and guidelines for normalizing variation terms into [VRS (v1.2.0)](https://vrs.ga4gh.org/en/1.2.0) and [VRSATILE (latest)](https://vrsatile.readthedocs.io/en/latest/) compatible representations.

Public OpenAPI endpoint: https://normalize.cancervariants.org/variation

@@ -13,12 +13,19 @@ pip install variation-normalizer
## About
Variation Normalization works by using four main steps: tokenization, classification, validation, and translation. During tokenization, we split strings on whitespace and parse to determine the type of token. During classification, we specify the order of tokens a classification can have. We then do validation checks such as ensuring references for a nucleotide or amino acid matches the expected value and validating a position exists on the given transcript. During translation, we return a VRS Allele object.
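
The first two steps can be illustrated with a toy sketch. This is not the library's implementation: the token names, the regular expressions, and the `tokenize` and `classify` functions are all hypothetical and only cover the `BRAF V600E` style of query mentioned in this README.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical token patterns, for illustration only.
PROTEIN_SUB = re.compile(r"^(?:p\.)?[A-Z]\d+[A-Z]$")  # e.g. V600E or p.V600E
GENE_SYMBOL = re.compile(r"^[A-Z][A-Z0-9-]*$")         # e.g. BRAF

Token = Tuple[str, str]  # (token type, original text)


def tokenize(query: str) -> List[Token]:
    """Split the query on whitespace and assign each piece a token type."""
    tokens = []
    for part in query.split():
        if PROTEIN_SUB.match(part):
            tokens.append(("ProteinSubstitution", part))
        elif GENE_SYMBOL.match(part):
            tokens.append(("GeneSymbol", part))
        else:
            tokens.append(("Unknown", part))
    return tokens


def classify(tokens: List[Token]) -> Optional[str]:
    """Return a classification if the token types appear in an accepted order."""
    kinds = [kind for kind, _ in tokens]
    if kinds == ["GeneSymbol", "ProteinSubstitution"]:
        return "ProteinSubstitution"
    return None


print(classify(tokenize("BRAF V600E")))
```

Validation and translation are left out because they depend on reference sequence data and the VRS models.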

Variation Normalization is limited to the following types of variants represented as HGVS expressions and text representations (ex: `BRAF V600E`):

* **protein (p.)**: substitution, deletion, insertion, deletion-insertion
* **coding DNA (c.)**: substitution, deletion, insertion, deletion-insertion
* **genomic (g.)**: substitution, deletion, ambiguous deletion, insertion, deletion-insertion, duplication

We are working towards adding more types of variations, coordinates, and representations.

### Endpoints
#### /toVRS
The `/toVRS` endpoint returns a list of valid [Alleles](https://vrs.ga4gh.org/en/1.1.1/terms_and_model.html#allele).
The `/toVRS` endpoint returns a list of valid VRS [Variations](https://vrs.ga4gh.org/en/1.2.0/terms_and_model.html#variation).

#### /normalize
The `/normalize` endpoint returns a [Variation Descriptor](https://vrsatile.readthedocs.io/en/latest/value_object_descriptor/vod_index.html#variation-descriptor) containing the MANE Transcript, if one is found.
The `/normalize` endpoint returns a [Variation Descriptor](https://vrsatile.readthedocs.io/en/latest/value_object_descriptor/vod_index.html#variation-descriptor) containing the MANE Transcript, if one is found. If a genomic query is not given a gene, `normalize` will return its GRCh38 representation.
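
As a rough sketch of how a request URL for either endpoint might be put together: the snippet below assumes that the endpoints live under the public base URL given above and that the query string is passed in a parameter named `q`. Both are assumptions, not something this README states, so check the OpenAPI page before relying on them. `build_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Public base URL taken from this README.
BASE_URL = "https://normalize.cancervariants.org/variation"


def build_url(endpoint: str, query: str) -> str:
    """Build a request URL for `toVRS` or `normalize`.

    Assumption: the variation string is sent as the `q` query parameter.
    """
    return f"{BASE_URL}/{endpoint}?{urlencode({'q': query})}"


print(build_url("normalize", "BRAF V600E"))
print(build_url("toVRS", "BRAF V600E"))
```

The resulting URL can then be fetched with any HTTP client.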

The steps for retrieving MANE Transcript data is as follows:
1. Map starting annotation layer to genomic
@@ -30,17 +37,6 @@ The steps for retrieving MANE Transcript data is as follows:
3. Longest Compatible Remaining Transcript
4. Map back to starting annotation layer

#### Limitations
Variation Normalization is limited to the following types of variants represented as HGVS expressions and text representations (ex: `BRAF V600E`):

* **protein (p.)**: substitution, deletion, insertion, deletion-insertion
* **coding DNA (c.)**: substitution, deletion, insertion, deletion-insertion\
*Note: c. coordinates will be returned as r. coordinates in the VRS and VRSATILE objects*
* **genomic (g.)**: substitution, deletion, insertion, deletion-insertion\
*Note: If a genomic query is not given a gene, `normalize` will return its GRCh38 representation.*

We are working towards adding more types of variants, coordinates, and representations.

## Backend Services

Variation Normalization relies on some local data caches which you will need to set up. It uses pipenv to manage its environment, which you will also need to install.
@@ -52,17 +48,13 @@ pipenv lock
pipenv sync
```

### Setting up Gene Normalizer
Variation Normalization relies on data from [Gene Normalization](https://github.com/cancervariants/gene-normalization. You must have Gene Normalization's DynamoDB running for the application to work.
### Gene Normalizer

You must run the following when loading the database:

```commandline
python3 -m gene.cli --update_all --update_merged
```
Variation Normalization relies on data from [Gene Normalization](https://github.com/cancervariants/gene-normalization). You must load all sources _and_ merged concepts.

For more information, visit see the [README](https://github.com/cancervariants/gene-normalization/blob/main/README.md).
You must also have Gene Normalization's DynamoDB running for the application to work.

For more information about the gene-normalizer, visit the [README](https://github.com/cancervariants/gene-normalization/blob/main/README.md).

### SeqRepo
Variation Normalization relies on [seqrepo](https://github.com/biocommons/biocommons.seqrepo), which you must download yourself.
33 changes: 33 additions & 0 deletions docs/hgvs_dup_del_mode.md
@@ -0,0 +1,33 @@
# HGVS Dup Del Mode

This mode helps us interpret deletions and duplications that are represented as HGVS expressions.

## Default Characteristics

- If endpoints are ambiguous: cnv (copies attribute)
  - handling X chromosome
    - base 1-2
    - Duplication: Definite Range = 2, 3
    - Deletion: Definite Range = 0, 1
  - handling Y chromosome
    - base of 1
    - Duplication: Number = 2
    - Deletion: Number = 0
  - handling 1 – 22 chromosome
    - base of 2
    - Duplication: Number = 3
    - Deletion: Number = 1
- elif len del or dup > 100bp: (use outermost coordinates)
  - repeated_seq_expr with a derived_seq_expr subject (Allele)
- else:
  - literal_seq_expr (normalized LiteralSequenceExpression Allele)
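
The default rules above can be sketched in Python. This is only an illustration of the decision order and the copy-number arithmetic listed here; the function names, the `"dup"`/`"del"` labels, and the tuple return values are hypothetical and are not the normalizer's API.

```python
from typing import Optional, Tuple, Union

Copies = Union[int, Tuple[int, int]]


def baseline_copies(chromosome: str) -> Copies:
    """Baseline copy number: X is 1-2, Y is 1, chromosomes 1-22 are 2."""
    chrom = chromosome.upper()
    if chrom.startswith("CHR"):
        chrom = chrom[3:]
    if chrom == "X":
        return (1, 2)  # definite range, since X may have 1 or 2 copies
    if chrom == "Y":
        return 1
    if chrom.isdigit() and 1 <= int(chrom) <= 22:
        return 2
    raise ValueError(f"unsupported chromosome: {chromosome}")


def copies_after(chromosome: str, variant_type: str) -> Copies:
    """A duplication adds one copy to the baseline; a deletion removes one."""
    delta = {"dup": 1, "del": -1}[variant_type]
    base = baseline_copies(chromosome)
    if isinstance(base, tuple):
        return (base[0] + delta, base[1] + delta)
    return base + delta


def default_representation(
    variant_type: str, chromosome: str, length: int, ambiguous_endpoints: bool
) -> Tuple[str, Optional[Copies]]:
    """Pick the default representation in the order given in the list above."""
    if ambiguous_endpoints:
        return ("cnv", copies_after(chromosome, variant_type))
    if length > 100:
        return ("repeated_seq_expr", None)
    return ("literal_seq_expr", None)


print(default_representation("dup", "X", 50, True))
print(default_representation("del", "7", 250, False))
```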

# Notes

- Ambiguous ranges are of the form:
  - `(#_#)_(#_#)`
  - `(?_#)_(#_?)`
  - `(?_#)_#`
  - `#_(#_?)`
- We do not normalize any ambiguous ranges
- We do not change the molecular context for ambiguous ranges.
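
One way to recognize the four ambiguous forms is with a regular expression. The sketch below assumes that `#` stands for a plain integer position and that the position string has already been separated from the accession and the `del`/`dup` suffix; `is_ambiguous_range` is a hypothetical helper, not part of the normalizer.

```python
import re

_NUM = r"\d+"  # assumption: `#` is a plain integer position

# The four forms listed in the notes above, in the same order.
_AMBIGUOUS_FORMS = [
    rf"\({_NUM}_{_NUM}\)_\({_NUM}_{_NUM}\)",  # (#_#)_(#_#)
    rf"\(\?_{_NUM}\)_\({_NUM}_\?\)",          # (?_#)_(#_?)
    rf"\(\?_{_NUM}\)_{_NUM}",                 # (?_#)_#
    rf"{_NUM}_\({_NUM}_\?\)",                 # #_(#_?)
]
_AMBIGUOUS_RE = re.compile("|".join(f"(?:{form})" for form in _AMBIGUOUS_FORMS))


def is_ambiguous_range(position: str) -> bool:
    """Return True if the whole position string matches an ambiguous form."""
    return _AMBIGUOUS_RE.fullmatch(position) is not None


print(is_ambiguous_range("(?_200)_(300_?)"))
print(is_ambiguous_range("100_200"))
```
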
84 changes: 42 additions & 42 deletions requirements-dev.txt
@@ -10,8 +10,7 @@

-i https://pypi.org/simple
-e .
-e .
anyio==3.3.4; python_full_version >= '3.6.2'
anyio==3.4.0; python_full_version >= '3.6.2'
appdirs==1.4.4
appnope==0.1.2; sys_platform == 'darwin'
argcomplete==1.12.3
@@ -21,13 +20,13 @@ asgiref==3.4.1; python_version >= '3.6'
attrs==21.2.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'
babel==2.9.1; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
backcall==0.2.0
backports.entry-points-selectable==1.1.0; python_version >= '2.7'
backports.entry-points-selectable==1.1.1; python_version >= '2.7'
beautifulsoup4==4.10.0; python_version >= '3.1'
biocommons.seqrepo==0.6.4
bioutils==0.5.5; python_version >= '3.6'
bleach==4.1.0; python_version >= '3.6'
boto3==1.19.10
botocore==1.22.10; python_version >= '3.6'
boto3==1.20.11
botocore==1.23.11; python_version >= '3.6'
bs4==0.0.1
canonicaljson==1.5.0; python_version ~= '3.5'
certifi==2021.10.8
@@ -38,8 +37,8 @@ click==8.0.3; python_version >= '3.6'
colorama==0.4.4; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'
coloredlogs==15.0.1; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'
configparser==5.1.0; python_version >= '3.6'
coverage[toml]==6.1.1
coveralls==3.3.0
coverage[toml]==6.1.2
coveralls==3.3.1
cssselect==1.1.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
cycler==0.11.0; python_version >= '3.6'
debugpy==1.5.1; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'
@@ -51,59 +50,60 @@ docutils==0.18; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2
entrypoints==0.3; python_version >= '2.7'
fake-useragent==0.1.11
fastapi==0.70.0
filelock==3.3.2; python_version >= '3.6'
filelock==3.4.0; python_version >= '3.6'
flake8-docstrings==1.6.0
flake8==4.0.1
frozendict==2.0.7; python_version >= '3.6'
ga4gh.vrs[extras]==0.7.0rc3
ga4gh.vrsatile.pydantic==0.0.3
gene-normalizer==0.1.22
fonttools==4.28.2; python_version >= '3.7'
frozendict==2.1.0; python_version >= '3.6'
ga4gh.vrs[extras]==0.7.2
ga4gh.vrsatile.pydantic==0.0.5
gene-normalizer==0.1.23
gffutils==0.10.1
h11==0.12.0; python_version >= '3.6'
hgvs==1.5.1
httptools==0.3.0
humanfriendly==10.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'
identify==2.3.3; python_full_version >= '3.6.1'
identify==2.4.0; python_full_version >= '3.6.1'
idna==3.3; python_version >= '3'
importlib-metadata==4.8.1; python_version >= '3.6'
importlib-metadata==4.8.2; python_version < '3.10'
inflection==0.5.1; python_version >= '3.5'
iniconfig==1.1.1
ipykernel==6.5.0
ipykernel==6.5.1
ipython-genutils==0.2.0
ipython==7.29.0; python_version >= '3.7'
ipywidgets==7.6.5
jedi==0.18.0; python_version >= '3.6'
jinja2==3.0.2; python_version >= '3.6'
jedi==0.18.1; python_version >= '3.6'
jinja2==3.0.3; python_version >= '3.6'
jmespath==0.10.0; python_version >= '2.6' and python_version not in '3.0, 3.1, 3.2, 3.3'
json5==0.9.6
jsonschema==3.2.0
jupyter-client==7.0.6; python_full_version >= '3.6.1'
jupyter-client==7.1.0; python_full_version >= '3.6.1'
jupyter-console==6.4.0; python_version >= '3.6'
jupyter-core==4.9.1; python_version >= '3.6'
jupyter-server==1.11.2; python_version >= '3.6'
jupyter-server==1.12.0; python_version >= '3.6'
jupyter==1.0.0
jupyterlab-pygments==0.1.2
jupyterlab-server==2.8.2; python_version >= '3.6'
jupyterlab-widgets==1.0.2; python_version >= '3.6'
jupyterlab==3.2.1
jupyterlab==3.2.4
keyring==23.2.1; python_version >= '3.6'
kiwisolver==1.3.2; python_version >= '3.7'
lxml==4.6.4
markdown==3.3.4; python_version >= '3.6'
markdown==3.3.6; python_version >= '3.6'
markupsafe==2.0.1; python_version >= '3.6'
matplotlib-inline==0.1.3; python_version >= '3.5'
matplotlib==3.4.3
matplotlib==3.5.0
mccabe==0.6.1
mistune==0.8.4
nbclassic==0.3.4; python_version >= '3.6'
nbclient==0.5.4; python_full_version >= '3.6.1'
nbconvert==6.2.0; python_version >= '3.7'
nbclient==0.5.9; python_full_version >= '3.6.1'
nbconvert==6.3.0; python_version >= '3.7'
nbformat==5.1.3; python_version >= '3.5'
nest-asyncio==1.5.1; python_version >= '3.5'
nodeenv==1.6.0
notebook==6.4.5; python_version >= '3.6'
numpy==1.21.3; python_version < '3.10' and platform_machine != 'aarch64' and platform_machine != 'arm64'
packaging==21.2; python_version >= '3.6'
notebook==6.4.6; python_version >= '3.6'
numpy==1.21.4; python_version < '3.10' and platform_machine != 'aarch64' and platform_machine != 'arm64'
packaging==21.3; python_version >= '3.6'
pandas==1.3.4
pandocfilters==1.5.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
parse==1.19.0
@@ -112,37 +112,37 @@ parso==0.8.2; python_version >= '3.6'
pexpect==4.8.0; sys_platform != 'win32'
pickleshare==0.7.5
pillow==8.4.0; python_version >= '3.6'
pkginfo==1.7.1
pkginfo==1.8.1
platformdirs==2.4.0; python_version >= '3.6'
pluggy==1.0.0; python_version >= '3.6'
pre-commit==2.15.0
prometheus-client==0.12.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
prompt-toolkit==3.0.22; python_full_version >= '3.6.2'
psycopg2-binary==2.9.1; python_version >= '3.6'
psycopg2-binary==2.9.2; python_version >= '3.6'
ptyprocess==0.7.0
py==1.10.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
py==1.11.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'
pycodestyle==2.8.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'
pycparser==2.20; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
pycparser==2.21; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
pydantic==1.8.2
pydocstyle==6.1.1; python_version >= '3.6'
pyee==8.2.2
pyfaidx==0.6.3.1
pyflakes==2.4.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
pygments==2.10.0; python_version >= '3.5'
pyliftover==0.4
pyparsing==2.4.7; python_version >= '2.6' and python_version not in '3.0, 3.1, 3.2, 3.3'
pyppeteer==0.2.6; python_full_version >= '3.6.1' and python_full_version < '4.0.0'
pyparsing==3.0.6; python_version >= '3.6'
pyppeteer==0.2.6; python_version < '4' and python_full_version >= '3.6.1'
pyquery==1.4.3
pyrsistent==0.18.0; python_version >= '3.6'
pysam==0.17.0
pysam==0.18.0
pytest-cov==3.0.0
pytest==6.2.5
python-dateutil==2.8.2; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
python-jsonschema-objects==0.3.10
python-jsonschema-objects==0.4.1
pytz==2021.3
pyyaml==6.0; python_version >= '3.6'
pyzmq==22.3.0; python_version >= '3.6'
qtconsole==5.1.1; python_version >= '3.6'
qtconsole==5.2.0; python_version >= '3.6'
qtpy==1.11.2; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4, 3.5'
readme-renderer==30.0
requests-html==0.10.0; python_version >= '3.6'
@@ -151,11 +151,11 @@ requests==2.26.0
rfc3986==1.5.0
s3transfer==0.5.0; python_version >= '3.6'
send2trash==1.8.0
simplejson==3.17.5; python_version >= '2.5' and python_version not in '3.0, 3.1, 3.2, 3.3'
simplejson==3.17.6; python_version >= '2.5' and python_version not in '3.0, 3.1, 3.2, 3.3'
six==1.16.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
sniffio==1.2.0; python_version >= '3.5'
snowballstemmer==2.1.0
soupsieve==2.3; python_version >= '3.6'
snowballstemmer==2.2.0
soupsieve==2.3.1; python_version >= '3.6'
sqlparse==0.4.2; python_version >= '3.5'
starlette==0.16.0; python_version >= '3.6'
tabulate==0.8.9
@@ -166,9 +166,9 @@ tomli==1.2.2; python_version >= '3.6'
tornado==6.1; python_version >= '3.5'
tqdm==4.62.3; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
traitlets==5.1.1; python_version >= '3.7'
twine==3.5.0
typing-extensions==3.10.0.2
urllib3==1.26.7; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4' and python_full_version < '4.0.0'
twine==3.6.0
typing-extensions==4.0.0
urllib3==1.26.7; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4' and python_version < '4'
uvicorn==0.15.0
uvloop==0.16.0
virtualenv==20.10.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'