Structural Variant normalization and Representation #818
VT (abandoned?) and bcftools don't normalize symbolic variants; I raised samtools/bcftools#1919. I think this means we may need to convert symbolic variants to explicit ref/alt, normalize them, and then convert them back. One problem is making sure we re-load a dup as a dup rather than an ins (keep track of the old allele in INFO?)
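A minimal sketch of that round trip, under some assumptions: `VcfRecord` is a made-up container (not a VariantGrid class), `reference` is a plain 0-based string standing in for a real FASTA lookup, and `ORIG_SVTYPE` is a hypothetical INFO key used to remember whether the record started life as a `<DEL>` or a `<DUP>`. The dup is written as an insertion of the duplicated copy after POS, which is exactly why the original type has to be stored somewhere if we want it back.

```python
from dataclasses import dataclass, field


@dataclass
class VcfRecord:
    chrom: str
    pos: int  # 1-based VCF POS (the padding base for symbolic alleles)
    ref: str
    alt: str
    info: dict = field(default_factory=dict)


ORIG_KEY = "ORIG_SVTYPE"  # hypothetical INFO key, not an existing field


def expand_symbolic(rec: VcfRecord, reference: str) -> VcfRecord:
    """Rewrite <DEL>/<DUP> as explicit REF/ALT so a normalizer can process it.

    The affected region is POS+1..END (1-based, inclusive).
    """
    if rec.alt not in ("<DEL>", "<DUP>"):
        return rec
    end = int(rec.info["END"])
    padding = reference[rec.pos - 1]   # 1-based POS -> 0-based index
    region = reference[rec.pos:end]    # 1-based POS+1..END
    info = {k: v for k, v in rec.info.items() if k != "END"}
    info[ORIG_KEY] = rec.alt.strip("<>")  # remember "DEL" or "DUP"
    if rec.alt == "<DEL>":
        return VcfRecord(rec.chrom, rec.pos, padding + region, padding, info)
    # Tandem duplication written as an insertion of the copy right after POS
    return VcfRecord(rec.chrom, rec.pos, padding, padding + region, info)


def collapse_symbolic(rec: VcfRecord) -> VcfRecord:
    """Turn a (normalized) explicit record back into its original symbolic type."""
    orig = rec.info.get(ORIG_KEY)
    if orig is None:
        return rec
    info = {k: v for k, v in rec.info.items() if k != ORIG_KEY}
    if orig == "DEL":
        end = rec.pos + len(rec.ref) - 1  # last deleted base
    else:
        # Assumes the insertion was left-aligned, so the inserted copy matches
        # the reference bases immediately after POS.
        end = rec.pos + len(rec.alt) - 1
    info["END"] = str(end)
    return VcfRecord(rec.chrom, rec.pos, rec.ref[0], f"<{orig}>", info)
```

The explicit record in between is what would go through VT/bcftools; if normalization moves POS, `collapse_symbolic` recomputes END from the normalized alleles rather than trusting the old value.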
END
Should END be max(len(ref), len(alt)) or abs(len(ref) - len(alt))? If you are looking at overlaps, max seems better.

HISTORICAL
If > 1kb then describe it as a CNV. Should we go through historical data and convert those variants to CNVs? We can find long ones via:
Sequence.objects.order_by("-length").first().variant_set.all().get().pk

NORMALIZATION
We should convert CNV to DUP.
----------
NC_000003.11:g.128204049_128206714del
As it is > 1kb, it should be represented in the DB as
We need to do this when creating a variant:
if length >= settings.VARIANT_STRUCTURAL_SIZE:
We'll then run this VCF through VT (do this anyway?) so it will get normalized.

Normalize CNVs
We may have to explicitly write out the ref/alt and run it through VT.
When we are calling write_vcf_from_tuples, I think we actually want to write the proper ref/alt as
write_vcf_from_tuples
----------
Search for "chrom, position, ref"
get_results_from_variant_tuples

NC_000003.11:g.128204049_128206714del - can't create it, as it errors because it needs an END:
Variant = 3:128204042 > T
In [11]: name.get_vcf_coords()

Example large CNV normalization
NC_000003.11:g.128204049_128206714del resolved to 3:128204042 TCT...GCC>T
The ClinGen Allele Registry agrees (it also normalizes to the same start):
"hgvs":["NC_000003.11:g.128204049_128206714del","CM000665.1:g.128204049_128206714del"],
SPDI NC_000003.12:128481050:GG:TT
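A rough sketch of the two candidate lengths and the structural-size check, for comparison. `STRUCTURAL_SIZE` stands in for `settings.VARIANT_STRUCTURAL_SIZE` and is assumed to be 1000 (the "> 1kb" above); `maybe_symbolic_deletion` is a hypothetical helper that only handles simple padded deletions and leaves insertions/dups alone, since telling an ins from a dup needs the reference.

```python
STRUCTURAL_SIZE = 1000  # stand-in for settings.VARIANT_STRUCTURAL_SIZE (assumed 1kb)


def span_length(ref: str, alt: str) -> int:
    """max(len(ref), len(alt)): the footprint, which suits overlap queries."""
    return max(len(ref), len(alt))


def net_length(ref: str, alt: str) -> int:
    """abs(len(ref) - len(alt)): the net size change (0 for SNVs/MNVs)."""
    return abs(len(ref) - len(alt))


def end_from_length(pos: int, length: int) -> int:
    """END as the last position covered, counting POS itself."""
    return pos + length - 1


def maybe_symbolic_deletion(pos: int, ref: str, alt: str,
                            structural_size: int = STRUCTURAL_SIZE):
    """Return (pos, ref, alt, end), using a <DEL> ALT for large deletions.

    Only simple deletions sharing the first (padding) base are converted;
    anything else comes back unchanged with end=None.
    """
    is_deletion = len(alt) == 1 and len(ref) > 1 and ref[0] == alt[0]
    if is_deletion and net_length(ref, alt) >= structural_size:
        # For a symbolic <DEL>, END is the last deleted reference base
        return pos, ref[0], "<DEL>", end_from_length(pos, len(ref))
    return pos, ref, alt, None
```

For a 1200bp deletion at POS 100 (REF = padding base + 1200 bases), the footprint gives END 1300 and the net length gives 1299. For an SNV the net length is 0, so END would land before POS, which is one more reason max reads better for overlaps.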
However, the HGVS call doesn't appear to be normalized: it has inserted sequence "CTCCCG" but the same prefix on the deleted sequence.
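Trimming the bases shared by the deleted and inserted sequences is the minimal step that would have removed that "CTCCCG" prefix. This is a sketch, not how VT or the HGVS library does it. Trimming the prefix first leaves the change at the 3'-most position within the given window, which matches the HGVS 3' preference for that window only; a full 3' shift would also need the flanking reference.

```python
def trim_alleles(start: int, ref: str, alt: str) -> tuple:
    """Strip bases shared by ref and alt; return (start, ref, alt).

    `start` is the 1-based position of ref[0]; it moves right by the number
    of leading bases removed.
    """
    # Common prefix first (keeps the change 3'-most within this window)
    i = 0
    while i < len(ref) and i < len(alt) and ref[i] == alt[i]:
        i += 1
    ref, alt, start = ref[i:], alt[i:], start + i
    # Then any common suffix
    j = 0
    while j < len(ref) and j < len(alt) and ref[-1 - j] == alt[-1 - j]:
        j += 1
    if j:
        ref, alt = ref[:len(ref) - j], alt[:len(alt) - j]
    return start, ref, alt
```

With a made-up deleted sequence that starts with the same prefix, `trim_alleles(100, "CTCCCGAAAA", "CTCCCG")` reduces the delins to a plain deletion of "AAAA" starting at 106.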
Found some things that sounded useful:
@davmlaw Just found this slide deck: https://drive.google.com/file/d/1w4uIdBJX6duSDni8XlOc84tF4h1CbMa_/view
We use CAR (ClinGen Allele Registry) for unique IDs; unfortunately they say:
We'd also have to use their own invented format. Strangely, it supports this:
GRCh37 (chr7:132348499-132359000)x1
But not this:
NC_000007.13:g.132348499_132359000del
which gives: VariationTooLong
Converting CNV to DUP (which we need to do for TSO500) is handled in issue https://github.com/SACGF/variantgrid_sapath/issues/304. The other variant normalization work is now handled in #1014.
A commit referencing this issue includes "issue #818 - don't uniq on preprocess anymore".
Split out of #54 Structural Variant - copy number variation
It seems to be possible to write the same underlying genetic mutation in different ways using VCF files:
Say you have the same contig/position/ref base but the ALT is different:
The downside of <CNV> is that there is no representation in HGVS. Maybe it's possible via ISCN?
I don't think you can represent it as SPDI unless you just use pure numbers for the last 2 fields.
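For reference, a sketch of going from VCF coordinates to a simple (not canonicalized) SPDI string. SPDI positions are 0-based, so after dropping shared leading bases the position is POS - 1 plus the number of bases dropped. The `deletion_as_length` option assumes, as we read the SPDI description, that the deleted field may be written as a length; the inserted field stays as sequence here, so this only covers part of the "pure numbers" idea. `vcf_to_spdi` is a hypothetical helper.

```python
def vcf_to_spdi(accession: str, pos: int, ref: str, alt: str,
                deletion_as_length: bool = False) -> str:
    """Build sequence:position:deletion:insertion from a VCF-style record.

    No repeat expansion is done, so this is not NCBI's canonical SPDI.
    """
    # Drop shared leading bases (e.g. the VCF padding base)
    i = 0
    while i < len(ref) and i < len(alt) and ref[i] == alt[i]:
        i += 1
    position = pos - 1 + i  # VCF is 1-based, SPDI is 0-based interbase
    deleted, inserted = ref[i:], alt[i:]
    deleted_field = str(len(deleted)) if deletion_as_length else deleted
    return f"{accession}:{position}:{deleted_field}:{inserted}"
```

A 3bp deletion at VCF POS 100 with REF `ACGT` and ALT `A` would come out as `NC_000001.11:100:3:` with the length option on.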
Would it be a good idea to convert the <CNV> to a <DEL> if FC < 1 and a <DUP> if FC > 1?
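The proposed rule is small enough to write down. This sketch assumes fold change is available as an `FC` INFO value and that `SVTYPE` should be updated to match; both field names are assumptions about the input. A fold change of exactly 1 is left as `<CNV>` since the rule above doesn't say what to do with it.

```python
def cnv_to_del_dup(alt: str, info: dict) -> tuple:
    """Map <CNV> to <DEL> (FC < 1) or <DUP> (FC > 1); return (alt, info)."""
    if alt != "<CNV>" or "FC" not in info:
        return alt, info
    fc = float(info["FC"])
    if fc < 1:
        new_type = "DEL"
    elif fc > 1:
        new_type = "DUP"
    else:
        return alt, info  # FC == 1: no copy-number change, leave untouched
    # Copy rather than mutate the caller's dict
    return f"<{new_type}>", {**info, "SVTYPE": new_type}
```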
They do NOT annotate the same, i.e. the DEL is impact=HIGH while the CNV is impact=MODIFIER.
Advantages of leaving it are:
Advantages of normalizing it are:
If you convert <CNV> to <DEL/DUP>, then you can represent it as HGVS.
I guess it comes down to whether you think a deletion or dup at the same site are the same - I would say not?
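Once a record is a DEL or DUP, producing the g. HGVS string is mechanical. In this sketch (`hgvs_g_del_dup` is a hypothetical helper) start/end must already be HGVS positions, i.e. 3'-shifted and without a padding base; they are not the VCF POS. That difference is why the example in the notes sits at VCF 3:128204042 while the HGVS starts at 128204049.

```python
def hgvs_g_del_dup(accession: str, start: int, end: int, kind: str) -> str:
    """Build a g. HGVS deletion/duplication from 1-based, inclusive coords."""
    if kind not in ("del", "dup"):
        raise ValueError(f"unsupported kind: {kind}")
    if start > end:
        raise ValueError("start must be <= end")
    # A single base is written as one position, a range as start_end
    location = str(start) if start == end else f"{start}_{end}"
    return f"{accession}:g.{location}{kind}"
```

`hgvs_g_del_dup("NC_000003.11", 128204049, 128206714, "del")` reproduces the deletion discussed above.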
TODO: We should check how <CNV> vs <DEL> vs <DUP> are handled in VEP annotation - does it matter?
Does it make sense to call it a DUP if there are, say, 4 copies now? Is that just homozygous for a DUP?
gnomAD structural variants include CNV (as well as INS/DUP etc)