Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor!: clean up app #474

Merged
merged 120 commits into from
Aug 7, 2023
Merged

refactor!: clean up app #474

merged 120 commits into from
Aug 7, 2023

Conversation

korikuzma
Copy link
Member

@korikuzma korikuzma commented Jul 27, 2023

Close #162, #332, #429, #119, #428, #189, #309, #414, #475, #427

@jsstevenson I'm realllly sorry for making this large of a PR. I'm going to open now. There's some places (validators/translators) where I want to use shared methods for DRY principle. However, I'm going to open up now since it's so large and look at doing this while you review.

Notes:

  • Mainly focused on cleanup related to to_vrs and normalize endpoints. Did not really look at gnomad_vcf_to_protein or copy_number_variation modules
  • Remove to canonical variation (no longer support)
  • Combined tests for tokenizers/classifiers/validators/translators into one module
  • Removed amino_acids.csv (accidentally left in)
  • Names changes
    • Coding DNA → cDNA
    • Polypeptide truncation → Protein Stop Gain
    • Silent Mutation → Reference Agree
    • Uncertain/Range → Ambiguous
    • HGVSDupDelModeEnum → HGVSDupDelModeOption
  • Validators no longer do any kind of translations to VRS representations. Translators will do this work
  • Classifier only returns exact matches and only returns a single classification rather than a list
  • Use regex patterns (in variation/regex.py) rather than multiple if/else conditions
  • Remove unused code
  • Create variation schemas for supported variation types. Uses consistent field naming
  • Cleaning up instance variables in classes
  • Only run fully justified allele normalization on VRS Alleles. Do not run on VRS Copy Number
  • Pulled tokenize, classify, validate, translate outside of subdirectories (variation/tokenizers, variation/classifiers, variation/validators, variation/translators) and moved to app root
  • baseline_copies is required in /hgvs_to_copy_number_count
  • cool-seq-tool update
    • Removes file path params from QueryHandler, can set these via environment variables
    • QueryHandler accepts only uta_db_url as param and removes uta_db_pwd
  • new dependencies for linting
    • ruff (replaced flake8)
    • black

@korikuzma
Copy link
Member Author

@jsstevenson Thanks for all your feedback so far! So happy to get another set of eyes to find things I've missed

- Remove unused classification_type method
- Update type hints / return types
- Remove del_or_dup and use AltType
@korikuzma
Copy link
Member Author

korikuzma commented Aug 1, 2023

Might look to see about removing AltType and seeing if we can use an existing field in the models. Also think I may remove CoordinateType and just use MoleculeContext. Although I think molecule context is being removed once we move to 2.0-alpha (at least I don't see it in the schema yet).

README.md Show resolved Hide resolved
@jsstevenson jsstevenson self-requested a review August 2, 2023 13:30
Copy link
Member

@jsstevenson jsstevenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@korikuzma astounding work! I think I'm caught up through f422b4d, so I'll just do an approve now in the interest of keeping this moving

@korikuzma korikuzma changed the base branch from main to refactor August 2, 2023 13:47
Pipfile Show resolved Hide resolved
Copy link
Member Author

@korikuzma korikuzma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to make this change, otherwise RSE default won't work as expected. Should add a test for dup > 100 bps
Screenshot 2023-08-06 at 5 51 40 PM

@korikuzma korikuzma mentioned this pull request Aug 7, 2023
@korikuzma
Copy link
Member Author

Adding todo's in #163 . Going to merge this into refactor

@korikuzma korikuzma merged commit 77d3ef3 into refactor Aug 7, 2023
@korikuzma korikuzma deleted the issue-332-kori branch August 7, 2023 20:30
korikuzma added a commit that referenced this pull request Aug 25, 2023
- Refactor app (#474)
  - Mainly focused on cleanup related to to_vrs and normalize endpoints. Did not really look at gnomad_vcf_to_protein or copy_number_variation modules
  - Remove to canonical variation (no longer support)
  - Combined tests for tokenizers/classifiers/validators/translators into one module
  - Removed amino_acids.csv (accidentally left in)
  - Names changes
    - Coding DNA → cDNA
    - Polypeptide truncation → Protein Stop Gain
    - Silent Mutation → Reference Agree
    - Uncertain/Range → Ambiguous
    - HGVSDupDelModeEnum → HGVSDupDelModeOption
  - Validators no longer do any kind of translations to VRS representations. Translators will do this work
  - Classifier only returns exact matches and only returns a single classification rather than a list
  - Use regex patterns (in variation/regex.py) rather than multiple if/else conditions
  - Remove unused code
  - Create variation schemas for supported variation types. Uses consistent field naming
  - Cleaning up instance variables in classes
  - Only run fully justified allele normalization on VRS Alleles. Do not run on VRS Copy Number
  - Pulled tokenize, classify, validate, translate outside of subdirectories (variation/tokenizers, variation/classifiers, variation/validators, variation/translators) and moved to app root 
  - baseline_copies is required in /hgvs_to_copy_number_count
  - cool-seq-tool update
      - Removes file path params from QueryHandler, can set these via environment variables
      - QueryHandler accepts only uta_db_url as param and removes uta_db_pwd
   - new dependencies for linting
    - ruff (replaced flake8)
    - black 
- Add more support for gnomad vcf expressions in normalize (#479, #489)
- Remove pyliftover from deps (covered by cool-seq-tool) (#480) 
- Fix default mode for hgvs dup del mode wrt rse (#482)
- Fix default HGVS dup del mode - dels should be allele w lse (#484)
- Use cool-seq-tool AnnotationLayer and rm CoordinateType (#485)
- Remove structural type from varaition descriptor (#487)
@korikuzma korikuzma mentioned this pull request Aug 25, 2023
korikuzma added a commit that referenced this pull request Sep 22, 2023
- Mainly focused on cleanup related to to_vrs and normalize endpoints. Did not really look at gnomad_vcf_to_protein or copy_number_variation modules
- Remove to canonical variation (no longer support)
- Combined tests for tokenizers/classifiers/validators/translators into one module
- Removed amino_acids.csv (accidentally left in)
- Names changes
  - Coding DNA → cDNA
  - Polypeptide truncation → Protein Stop Gain
  - Silent Mutation → Reference Agree
  - Uncertain/Range → Ambiguous
  - HGVSDupDelModeEnum → HGVSDupDelModeOption
- Validators no longer do any kind of translations to VRS representations. Translators will do this work
- Classifier only returns exact matches and only returns a single classification rather than a list
- Use regex patterns (in variation/regex.py) rather than multiple if/else conditions
- Remove unused code
- Create variation schemas for supported variation types. Uses consistent field naming
- Cleaning up instance variables in classes
- Only run fully justified allele normalization on VRS Alleles. Do not run on VRS Copy Number
- Pulled tokenize, classify, validate, translate outside of subdirectories (variation/tokenizers, variation/classifiers, variation/validators, variation/translators) and moved to app root 
- baseline_copies is required in /hgvs_to_copy_number_count
- cool-seq-tool update
    - Removes file path params from QueryHandler, can set these via environment variables
    - QueryHandler accepts only uta_db_url as param and removes uta_db_pwd
- new dependencies for linting
  - ruff (replaced flake8)
  - black
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment