From which COSMIC download file do the "Existing_variation" values come? #1691

Open
MGCarta opened this issue Jun 5, 2024 · 3 comments

MGCarta commented Jun 5, 2024

Describe the issue

Hi, this is not a specific problem with the VEP tool but more a question of understanding.
I have a variant in PTEN with HGVSc NM_000314.8:c.634+5G>C which, when annotated with the VEP command-line tool, gives the following "Existing_variation" value:

rs138336847&COSV64291709&COSV64294394&COSV64310386&CS010097&CS991491

If I search for the variant on the website, it appears as expected; however, I do not understand which COSMIC download file VEP uses to retrieve these values.
I can see from the header of my VEP-annotated VCF file that the COSMIC version used is v98.
I downloaded the Cosmic_NonCodingVariants_Tsv_v98_GRCh38.tar file from the COSMIC website (I assume that this is a non-coding variant and should therefore be listed there), but the variant is not reported there under any of the identifiers in the "Existing_variation" value from VEP.
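
For readers following along, the "Existing_variation" value above is a list of identifiers joined with "&". A minimal Python sketch for splitting it and grouping the identifiers by likely source is shown here. The prefix rules are assumptions made for illustration: "rs" is taken to mean dbSNP and "COSV" to mean COSMIC, and everything else (such as the CS… identifiers, which presumably come from HGMD) is placed in an "other" bucket. The function name is hypothetical and not part of VEP.

```python
from collections import defaultdict


def group_existing_variation(value: str) -> dict[str, list[str]]:
    """Split an Existing_variation string on '&' and group IDs by a guessed source."""
    groups = defaultdict(list)
    for identifier in filter(None, value.split("&")):
        if identifier.startswith("rs"):
            source = "dbSNP"      # assumption: rs prefix means dbSNP
        elif identifier.startswith("COSV"):
            source = "COSMIC"     # assumption: COSV prefix means COSMIC
        else:
            source = "other"      # e.g. the CS... IDs, presumably HGMD
        groups[source].append(identifier)
    return dict(groups)


ids = group_existing_variation(
    "rs138336847&COSV64291709&COSV64294394&COSV64310386&CS010097&CS991491"
)
for source, members in ids.items():
    print(source, members)
```

Grouping by prefix makes it easier to see which identifiers one would expect to find in a COSMIC download at all.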

Additional information

Please fill in the following sections to help us find the source of your issue as quickly as possible.

System

  • VEP version: 111
  • VEP Cache version: homo_sapiens_refseq/111_GRCh38
  • Perl version: perl 5, version 30, subversion 0 (v5.30.0)
  • OS: Ubuntu 20.04.6
  • tabix installed: yes Version: 1.10.2-3ubuntu0.1

Full VEP command line

vep --af --af_1kg --af_gnomade --af_gnomadg --assembly GRCh38 --cache --canonical --database 0 --dir [PATH]/.vep --domains --fasta [PATH]/Homo_sapiens.GRCh38.dna.toplevel.fa.gz --force_overwrite --fork 4 --hgvs --hgvsg --hgvsg_use_accession --input_file [PATH]/benchmark_table_union.txt --mane --no_intergenic --numbers --offline --output_file [PATH]/benchmark_table_union_annotated.vcf --plugin [PATH]/spliceai_scores.raw.indel.hg38.vcf.gz,cutoff=0.5 --pubmed --refseq --symbol --vcf

Full error message

No error message

Data files (if applicable)

No data files

@nuno-agostinho nuno-agostinho self-assigned this Jun 7, 2024
@nuno-agostinho
Contributor

Hi @MGCarta,

Existing_variation is populated by the --check_existing flag to identify known co-located variants. VEP by default uses a normalisation-based allele matching algorithm to identify known variants that match input variants.

However, for some data sources (COSMIC, HGMD), Ensembl is not licensed to redistribute allele-specific data, so VEP will report the existence of co-located variants with unknown alleles without carrying out allele matching. In order to disable this behaviour and exclude these variants, you can use the --exclude_null_alleles flag.

Please refer to our public documentation: Existing or colocated variants.

I just want to add that the data we use is provided directly by the COSMIC team, so it may not contain exactly the same information as the files you pointed to.
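
To check whether a particular download contains the identifiers at all, one option is to scan it for exact field matches. The sketch below is only an illustration: it assumes the file is tab-separated, makes no assumption about column names, and the function name and the sample rows are made up for the demonstration (they are not real COSMIC records).

```python
from typing import Iterable


def find_ids(lines: Iterable[str], wanted: set[str]) -> dict[str, int]:
    """Count, per identifier, the lines where it appears as a whole tab-separated field."""
    hits = {identifier: 0 for identifier in wanted}
    for line in lines:
        fields = set(line.rstrip("\n").split("\t"))
        for identifier in wanted & fields:
            hits[identifier] += 1
    return hits


# Made-up rows purely for illustration; a real run would iterate over the
# extracted TSV instead, e.g. `with gzip.open(path, "rt") as handle: find_ids(handle, ...)`.
sample_rows = [
    "GENE\tID\tPOSITION\n",
    "PTEN\tCOSV64291709\t1\n",
    "PTEN\tCOSV00000001\t2\n",
]
hits = find_ids(sample_rows, {"COSV64291709", "COSV64294394"})
print(hits)
```

An identifier with a count of zero is absent from that file, which would be consistent with the data VEP receives differing from the public download.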

I will try to get in contact with the COSMIC team to understand why those identifiers are not available in COSMIC release v98. Hope this was helpful for now.

Kind regards,
Nuno

@MGCarta
Author

MGCarta commented Jun 8, 2024

Hi @nuno-agostinho,
and thank you very much for your explanation.

  1. If I understand correctly, when the variant given as input to VEP is known, Existing_variation can list co-located variants that VEP selected on the basis of both allele and genomic coordinate.
  2. For some databases, such as COSMIC, co-located variants are selected by coordinate only, not by allele. Is this right?
  3. Therefore, if the variant input to VEP is C>A, could Existing_variation contain COSMIC entries at the same genomic position that are C>A as well as C>G?
  4. If I want to exclude COSMIC entries with a different allele, I have to combine the --check_existing and --exclude_null_alleles parameters. But does that mean I won't get any COSMIC entries at all?

Best,
Giulia

@jamie-m-a
Contributor

Hi @MGCarta

Sorry for the delay in responding!

Yes, you are correct in all of your assumptions.
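
The behaviour confirmed above can be sketched with a small Python model. This is a simplified illustration, not VEP's implementation: real allele matching involves normalisation, and all class names, function names, and identifiers below are hypothetical. A known variant whose allele is `None` stands for a "null allele" record of the kind described for COSMIC and HGMD.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Known:
    identifier: str
    pos: int
    alt: Optional[str]  # None models a record whose alleles are not available


def existing_variation(pos, alt, known, exclude_null_alleles=False):
    """Return IDs of co-located known variants under the simplified rules above."""
    out = []
    for k in known:
        if k.pos != pos:
            continue                      # not co-located
        if k.alt is None:
            if not exclude_null_alleles:  # reported on position alone
                out.append(k.identifier)
        elif k.alt == alt:                # reported only when the allele matches
            out.append(k.identifier)
    return out


known = [
    Known("rs_example_A", 100, "A"),
    Known("rs_example_G", 100, "G"),
    Known("COSV_example", 100, None),
]
default = existing_variation(100, "A", known)
filtered = existing_variation(100, "A", known, exclude_null_alleles=True)
print(default)
print(filtered)
```

In this model the null-allele entry is listed for any input allele at the position, and switching on the exclusion removes every such entry rather than only those with a different allele.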
