refactor!: clean up app #474

korikuzma · 2023-07-27T18:14:30Z

Close #162, #332, #429, #119, #428, #189, #309, #414, #475, #427

@jsstevenson I'm really sorry for making such a large PR. There are some places (validators/translators) where I want to use shared methods to follow the DRY principle, but since the PR is already so large, I'm opening it now and will look at doing this while you review.

Notes:

- Mainly focused on cleanup related to the to_vrs and normalize endpoints. Did not really look at the gnomad_vcf_to_protein or copy_number_variation modules
- Remove to canonical variation (no longer supported)
- Combined tests for tokenizers/classifiers/validators/translators into one module
- Removed amino_acids.csv (accidentally left in)
- Name changes
  - Coding DNA → cDNA
  - Polypeptide truncation → Protein Stop Gain
  - Silent Mutation → Reference Agree
  - Uncertain/Range → Ambiguous
  - HGVSDupDelModeEnum → HGVSDupDelModeOption
- Validators no longer do any kind of translation to VRS representations. Translators do this work
- Classifier only returns exact matches and returns a single classification rather than a list
- Use regex patterns (in variation/regex.py) rather than multiple if/else conditions
- Remove unused code
- Create variation schemas for supported variation types, with consistent field naming
- Clean up instance variables in classes
- Only run fully justified allele normalization on VRS Alleles. Do not run it on VRS Copy Number
- Pulled tokenize, classify, validate, and translate out of their subdirectories (variation/tokenizers, variation/classifiers, variation/validators, variation/translators) and moved them to the app root
- baseline_copies is required in /hgvs_to_copy_number_count
- cool-seq-tool update
  - Removes file path params from QueryHandler; these can be set via environment variables
  - QueryHandler accepts only uta_db_url as a param and removes uta_db_pwd
- New dependencies for linting
  - ruff (replaced flake8)
  - black
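The classifier notes above (regex patterns instead of if/else chains, exact matches only, a single classification instead of a list) can be illustrated with a minimal sketch. The `ClassificationType` members and the patterns are hypothetical and far narrower than whatever lives in variation/regex.py; the point is the shape: an ordered table of compiled patterns, `re.fullmatch` so only exact matches count, and an `Optional` return instead of a list.

```python
import re
from enum import Enum
from typing import Optional


class ClassificationType(str, Enum):
    """Hypothetical classification labels (not the project's real enum)."""

    PROTEIN_STOP_GAIN = "protein_stop_gain"
    PROTEIN_SUBSTITUTION = "protein_substitution"
    CDNA_SUBSTITUTION = "cdna_substitution"
    GENOMIC_DELETION = "genomic_deletion"


# Illustrative patterns only. Order matters: stop gain is checked before
# substitution, because "Ter" also looks like a three-letter amino acid code.
PATTERNS = [
    (ClassificationType.PROTEIN_STOP_GAIN,
     re.compile(r"p\.[A-Z][a-z]{2}\d+(?:Ter|\*)")),
    (ClassificationType.PROTEIN_SUBSTITUTION,
     re.compile(r"p\.[A-Z][a-z]{2}\d+[A-Z][a-z]{2}")),
    (ClassificationType.CDNA_SUBSTITUTION,
     re.compile(r"c\.\d+[ACGT]>[ACGT]")),
    (ClassificationType.GENOMIC_DELETION,
     re.compile(r"g\.\d+(?:_\d+)?del")),
]


def classify(token: str) -> Optional[ClassificationType]:
    """Return one classification for an exact (full-string) match, else None."""
    for classification, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return classification
    return None
```

Because the first full match wins, pattern order encodes precedence, which is what the nested if/else logic used to do implicitly.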

variation/classifiers/classifier.py

korikuzma · 2023-08-01T19:58:16Z

@jsstevenson Thanks for all your feedback so far! So happy to get another set of eyes to find things I've missed

variation/validators/validator.py

- Remove unused classification_type method
- Update type hints / return types
- Remove del_or_dup and use AltType
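Replacing `del_or_dup` with `AltType` lets callers branch on one enum instead of a separate flag. A sketch of that pattern, assuming a `str`-mixin enum so members still compare equal to plain strings (useful when values arrive as JSON); the members listed and the `copy_change_direction` helper are hypothetical, not the project's actual definitions:

```python
from enum import Enum


class AltType(str, Enum):
    """Hypothetical subset of alteration types; the real AltType may differ."""

    DELETION = "deletion"
    DUPLICATION = "duplication"
    INSERTION = "insertion"
    DELINS = "delins"


def copy_change_direction(alt_type: AltType) -> str:
    """Hypothetical helper that branches on AltType instead of a del_or_dup flag."""
    if alt_type is AltType.DELETION:
        return "loss"
    if alt_type is AltType.DUPLICATION:
        return "gain"
    # Other alteration types carry no copy number direction in this sketch.
    raise ValueError(f"{alt_type.value} has no copy number direction")
```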

korikuzma · 2023-08-01T20:27:00Z

I might look into removing AltType and using an existing field in the models instead. I'm also thinking of removing CoordinateType and just using MoleculeContext, although I think molecule context is being removed once we move to 2.0-alpha (at least I don't see it in the schema yet).

- Assumes the gene normalizer works correctly. Only checks that gene_context is present if it is given in the test fixture


tests/test_normalize.py

jsstevenson

@korikuzma astounding work! I think I'm caught up through f422b4d, so I'll just do an approve now in the interest of keeping this moving

Pipfile

korikuzma

We need to make this change; otherwise the RSE default won't work as expected. We should add a test for dups > 100 bp.

korikuzma · 2023-08-07T20:30:17Z

Adding TODOs in #163. Going to merge this into refactor.

- Refactor app (#474)
  - Mainly focused on cleanup related to the to_vrs and normalize endpoints. Did not really look at the gnomad_vcf_to_protein or copy_number_variation modules
  - Remove to canonical variation (no longer supported)
  - Combined tests for tokenizers/classifiers/validators/translators into one module
  - Removed amino_acids.csv (accidentally left in)
  - Name changes
    - Coding DNA → cDNA
    - Polypeptide truncation → Protein Stop Gain
    - Silent Mutation → Reference Agree
    - Uncertain/Range → Ambiguous
    - HGVSDupDelModeEnum → HGVSDupDelModeOption
  - Validators no longer do any kind of translation to VRS representations. Translators do this work
  - Classifier only returns exact matches and returns a single classification rather than a list
  - Use regex patterns (in variation/regex.py) rather than multiple if/else conditions
  - Remove unused code
  - Create variation schemas for supported variation types, with consistent field naming
  - Clean up instance variables in classes
  - Only run fully justified allele normalization on VRS Alleles. Do not run it on VRS Copy Number
  - Pulled tokenize, classify, validate, and translate out of their subdirectories (variation/tokenizers, variation/classifiers, variation/validators, variation/translators) and moved them to the app root
  - baseline_copies is required in /hgvs_to_copy_number_count
  - cool-seq-tool update
    - Removes file path params from QueryHandler; these can be set via environment variables
    - QueryHandler accepts only uta_db_url as a param and removes uta_db_pwd
  - New dependencies for linting
    - ruff (replaced flake8)
    - black
- Add more support for gnomAD VCF expressions in normalize (#479, #489)
- Remove pyliftover from dependencies (covered by cool-seq-tool) (#480)
- Fix default mode for HGVS dup del mode with respect to RSE (#482)
- Fix default HGVS dup del mode: dels should be Alleles with LSE (#484)
- Use cool-seq-tool AnnotationLayer and remove CoordinateType (#485)
- Remove structural type from variation descriptor (#487)
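The note about running fully justified allele normalization only on VRS Alleles can be sketched as a type check in front of a simplified, VOCA-style expansion. Everything here is an assumption for illustration: `SimpleAllele` and `SimpleCopyNumber` are stand-ins rather than the VRS models, coordinates are 0-based interbase on a plain string, and `fully_justify` is a small reimplementation, not the library code the service calls.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class SimpleAllele:
    """Stand-in for a VRS Allele: replace seq[start:end] with state (0-based interbase)."""

    start: int
    end: int
    state: str


@dataclass(frozen=True)
class SimpleCopyNumber:
    """Stand-in for a VRS Copy Number variation over seq[start:end]."""

    start: int
    end: int
    copies: int


def fully_justify(seq: str, allele: SimpleAllele) -> SimpleAllele:
    """Expand an insertion or deletion over its whole repeat region (simplified VOCA)."""
    s, e = allele.start, allele.end
    ref, alt = seq[s:e], allele.state
    if ref == alt:
        return allele  # reference agree: nothing to normalize
    # Trim the shared suffix, then the shared prefix (ref stays equal to seq[s:e]).
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt, e = ref[:-1], alt[:-1], e - 1
    while ref and alt and ref[0] == alt[0]:
        ref, alt, s = ref[1:], alt[1:], s + 1
    if ref and alt:
        return SimpleAllele(s, e, alt)  # substitution/delins: no shifting applies
    unit = ref or alt  # the deleted or inserted sequence
    n = len(unit)
    # Count how far the indel can slide left/right while describing the same sequence.
    left = 0
    while s - left - 1 >= 0 and seq[s - left - 1] == unit[-1 - (left % n)]:
        left += 1
    right = 0
    while e + right < len(seq) and seq[e + right] == unit[right % n]:
        right += 1
    new_s, new_e = s - left, e + right
    # Expanded state = left flank + trimmed alt + right flank.
    return SimpleAllele(new_s, new_e, seq[new_s:s] + alt + seq[e:new_e])


def normalize(
    seq: str, variation: Union[SimpleAllele, SimpleCopyNumber]
) -> Union[SimpleAllele, SimpleCopyNumber]:
    """Fully justify Alleles only; copy number variations are returned unchanged."""
    if isinstance(variation, SimpleAllele):
        return fully_justify(seq, variation)
    return variation
```

For example, deleting one A from the AAA run in CAAAG expands to cover the whole run (reference AAA, state AA), while a copy number object passes through untouched.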


korikuzma added 30 commits May 10, 2023 10:28

refactor: update TokenType + remove protein termination classifier

b17d52c

refactor: Remove TokenMatchType (not necessary)

008b632

forgot to remove additional match_type

c2a47bb

refactor: clean up handling unknown tokens

111d104

refactor: Rename GeneMatchToken to GeneToken

cc79fd1

refactor: create AltType str enum

5cd918d

refactor: update TokenType type

095c9e9

refactor: remove LookupType enum (not used)

b79410c

wip: add work for tokenizers

67ab56a

wip: storing progress

944d6db

wip: store progress for delins

a4c2bf7

wip: store progress for cdna insertion

8b76419

wip: store progress for ref agree

7857767

wip: minor cleanup of validators

6778521

wip: store progress for protein stop gain

d567c5a

wip: store initial work for genomic dup

59e7942

wip: store progress for genomic ambiguous dups

4dd8043

wip: progress for genomic del

b652db8

wip: remove canonical variation work

d2cf750

wip: fix classifiers

d909427

wip: handle if mane none in translators

80dd57c

wip: tmp stop running gh actions

1b39114

wip: more progress for normalize

8dd9d92

wip: store more progress

df8e845

wip: add genomic del ambiguous

450981e

Merge branch 'main' into issue-332-kori-merge-main

8310cdf

wip: fix tokenizers

8512eea

wip: clean up classifier tests

6ee5110

wip: rename coding dna --> cdna

666ab23

wip: storing progress for dup1

f194615

jsstevenson reviewed Aug 1, 2023

variation/classifiers/classifier.py

jsstevenson reviewed Aug 1, 2023

variation/validators/validator.py (outdated)

pr review changes

1bb95ae

- Remove unused classification_type method
- Update type hints / return types
- Remove del_or_dup and use AltType

This was referenced Aug 1, 2023

Remove /to_canonical_variation endpoint #454

Closed

server error on /to_vrs and /normalize for "NC_000010.11-87925523-C-G" #438

Open

korikuzma added 3 commits August 1, 2023 16:45

update invalid tests for normalize + put in todo reminder

ce7bbfb

tests: add test for genomic delins change w gene

ae7ef17

tests: stop checking exact gene normalizer response

4bac65d

- Assumes the gene normalizer works correctly. Only checks that gene_context is present if it is given in the test fixture
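The test change in this commit (trust the gene normalizer, only check that gene_context exists when the fixture has one) could be expressed as a small assertion helper. The helper name and the flat dictionary layout are hypothetical; the real fixtures may nest gene_context inside other fields, and comparing the remaining fields exactly is an assumption of this sketch.

```python
from typing import Any, Dict


def assert_gene_context(actual: Dict[str, Any], expected: Dict[str, Any]) -> None:
    """Hypothetical helper: check gene_context presence only, not its exact content."""
    if "gene_context" in expected:
        # Trust the gene normalizer's output; only require that something came back.
        assert actual.get("gene_context"), "expected a gene_context in the response"
    # Everything other than gene_context is still compared exactly.
    rest_actual = {k: v for k, v in actual.items() if k != "gene_context"}
    rest_expected = {k: v for k, v in expected.items() if k != "gene_context"}
    assert rest_actual == rest_expected
```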