Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add new tool virheat #6005

Merged
merged 6 commits into from
May 15, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions tools/virheat/.shed.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
name: virheat
owner: iuc
description: "generates a heatmap of allele frequencies from vcf files"
homepage_url: https://github.com/jonas-fuchs/virHEAT
long_description: |
VirHEAT tool generates multi-sample variant-frequency plots from SnpEff-annotated viral variant lists. The tool provides a condensed look at variant frequencies after mapping raw reads to a viral/bacterial reference genome and compares multiple vcf files at the same time.
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/master/tools/virheat
categories:
- Visualization
- Variant Analysis
48 changes: 48 additions & 0 deletions tools/virheat/macros.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
<?xml version="1.0"?>
<macros>
<token name="@TOOL_VERSION@">0.6</token>
<token name="@VERSION_SUFFIX@">0</token>
<token name="@PROFILE@">21.01</token>
<xml name="biotools">
<xrefs>
<xref type="bio.tools">virheat</xref>
</xrefs>
</xml>
<xml name="requirements">
<requirements>
<requirement type="package" version="@TOOL_VERSION@">virheat</requirement>
<yield/>
</requirements>
</xml>
<xml name="version">
<version_command>virheat --version</version_command>
</xml>
<token name="@HELP_HEADER@"><![CDATA[
What it does
============
This tool generates multi-sample variant-frequency plots from SnpEff-annotated
viral variant lists. The tool provides a condensed look at variant frequencies after mapping raw reads to a viral/bacterial reference genome and compares multiple vcf files at the same time.
The tool expects either the length of a reference genome or the gff3 file if the sequence annotation is needed.
Additionally, to analyze whether mutations are sufficiently covered and to display non-covered cells in grey, first create per-base coverage TSV files for each BAM file with Qualimap and provide them in the same folder as the VCF files. Give them the same name as your VCF files.
The tool expects input variant lists in VCF format. VCF datasets as produced by standard variant callers with
- variant allele frequencies encoded in an ``AF`` INFO field
- variant functional genomic effects annotated using SnpEff's EFF format
]]></token>
<xml name="citations">
<citations>
<citation type="bibtex">@unpublished{Fuchs2020,
author = {Fuchs, Jonas},
title = {},
year = {2020},
note = {Multi-sample annotated viral variant-frequency plots based on the R pheatmap package.},
address = {Institute for Virology, University of Freiburg}
}</citation>
</citations>
</xml>
</macros>
65 changes: 65 additions & 0 deletions tools/virheat/test-data/SARS-CoV-2.gff3
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
##sequence-region NC_045512.2 1 29903
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=2697049
NC_045512.2 RefSeq region 1 29903 . + . ID=NC_045512.2:1..29903;Dbxref=taxon:2697049;collection-date=Dec-2019;country=China;gb-acronym=SARS-CoV-2;gbkey=Src;genome=genomic;isolate=Wuhan-Hu-1;mol_type=genomic RNA;nat-host=Homo sapiens;old-name=Wuhan seafood market pneumonia virus
NC_045512.2 RefSeq five_prime_UTR 1 265 . + . ID=id-NC_045512.2:1..265;gbkey=5'UTR
NC_045512.2 RefSeq gene 266 21555 . + . ID=gene-GU280_gp01;Dbxref=GeneID:43740578;Name=ORF1ab;gbkey=Gene;gene=ORF1ab;gene_biotype=protein_coding;locus_tag=GU280_gp01
NC_045512.2 RefSeq CDS 266 13468 . + 0 ID=cds-YP_009724389.1;Parent=gene-GU280_gp01;Dbxref=GenBank:YP_009724389.1,GeneID:43740578;Name=YP_009724389.1;Note=pp1ab%3B translated by -1 ribosomal frameshift;exception=ribosomal slippage;gbkey=CDS;gene=ORF1ab;locus_tag=GU280_gp01;product=ORF1ab polyprotein;protein_id=YP_009724389.1
NC_045512.2 RefSeq CDS 13468 21555 . + 0 ID=cds-YP_009724389.1;Parent=gene-GU280_gp01;Dbxref=GenBank:YP_009724389.1,GeneID:43740578;Name=YP_009724389.1;Note=pp1ab%3B translated by -1 ribosomal frameshift;exception=ribosomal slippage;gbkey=CDS;gene=ORF1ab;locus_tag=GU280_gp01;product=ORF1ab polyprotein;protein_id=YP_009724389.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 266 805 . + . ID=id-YP_009724389.1:1..180;Note=nsp1%3B produced by both pp1a and pp1ab;Parent=cds-YP_009724389.1;gbkey=Prot;product=leader protein;protein_id=YP_009725297.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 806 2719 . + . ID=id-YP_009724389.1:181..818;Note=produced by both pp1a and pp1ab;Parent=cds-YP_009724389.1;gbkey=Prot;product=nsp2;protein_id=YP_009725298.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 2720 8554 . + . ID=id-YP_009724389.1:819..2763;Note=former nsp1%3B conserved domains are: N-terminal acidic (Ac)%2C predicted phosphoesterase%2C papain-like proteinase%2C Y-domain%2C transmembrane domain 1 (TM1)%2C adenosine diphosphate-ribose 1''-phosphatase (ADRP)%3B produced by both pp1a and pp1ab;Parent=cds-YP_009724389.1;gbkey=Prot;product=nsp3;protein_id=YP_009725299.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 8555 10054 . + . ID=id-YP_009724389.1:2764..3263;Note=nsp4B_TM%3B contains transmembrane domain 2 (TM2)%3B produced by both pp1a and pp1ab;Parent=cds-YP_009724389.1;gbkey=Prot;product=nsp4;protein_id=YP_009725300.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 10055 10972 . + . ID=id-YP_009724389.1:3264..3569;Note=nsp5A_3CLpro and nsp5B_3CLpro%3B main proteinase (Mpro)%3B mediates cleavages downstream of nsp4. 3D structure of the SARSr-CoV homolog has been determined (Yang et al.%2C 2003)%3B produced by both pp1a and pp1ab;Parent=cds-YP_009724389.1;gbkey=Prot;product=3C-like proteinase;protein_id=YP_009725301.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 10973 11842 . + . ID=id-YP_009724389.1:3570..3859;Note=nsp6_TM%3B putative transmembrane domain%3B produced by both pp1a and pp1ab;Parent=cds-YP_009724389.1;gbkey=Prot;product=nsp6;protein_id=YP_009725302.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 11843 12091 . + . ID=id-YP_009724389.1:3860..3942;Note=produced by both pp1a and pp1ab;Parent=cds-YP_009724389.1;gbkey=Prot;product=nsp7;protein_id=YP_009725303.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 12092 12685 . + . ID=id-YP_009724389.1:3943..4140;Note=produced by both pp1a and pp1ab;Parent=cds-YP_009724389.1;gbkey=Prot;product=nsp8;protein_id=YP_009725304.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 12686 13024 . + . ID=id-YP_009724389.1:4141..4253;Note=ssRNA-binding protein%3B produced by both pp1a and pp1ab;Parent=cds-YP_009724389.1;gbkey=Prot;product=nsp9;protein_id=YP_009725305.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 13025 13441 . + . ID=id-YP_009724389.1:4254..4392;Note=nsp10_CysHis%3B formerly known as growth-factor-like protein (GFL)%3B produced by both pp1a and pp1ab;Parent=cds-YP_009724389.1;gbkey=Prot;product=nsp10;protein_id=YP_009725306.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 13442 13468 . + . ID=id-YP_009724389.1:4393..5324;Note=nsp12%3B NiRAN and RdRp%3B produced by pp1ab only;Parent=cds-YP_009724389.1;gbkey=Prot;product=RNA-dependent RNA polymerase;protein_id=YP_009725307.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 13468 16236 . + . ID=id-YP_009724389.1:4393..5324;Note=nsp12%3B NiRAN and RdRp%3B produced by pp1ab only;Parent=cds-YP_009724389.1;gbkey=Prot;product=RNA-dependent RNA polymerase;protein_id=YP_009725307.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 16237 18039 . + . ID=id-YP_009724389.1:5325..5925;Note=nsp13_ZBD%2C nsp13_TB%2C and nsp_HEL1core%3B zinc-binding domain (ZD)%2C NTPase/helicase domain (HEL)%2C RNA 5'-triphosphatase%3B produced by pp1ab only;Parent=cds-YP_009724389.1;gbkey=Prot;product=helicase;protein_id=YP_009725308.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 18040 19620 . + . ID=id-YP_009724389.1:5926..6452;Note=nsp14A2_ExoN and nsp14B_NMT%3B produced by pp1ab only;Parent=cds-YP_009724389.1;gbkey=Prot;product=3'-to-5' exonuclease;protein_id=YP_009725309.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 19621 20658 . + . ID=id-YP_009724389.1:6453..6798;Note=nsp15-A1 and nsp15B-NendoU%3B produced by pp1ab only;Parent=cds-YP_009724389.1;gbkey=Prot;product=endoRNAse;protein_id=YP_009725310.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 20659 21552 . + . ID=id-YP_009724389.1:6799..7096;Note=nsp16_OMT%3B 2'-o-MT%3B produced by pp1ab only;Parent=cds-YP_009724389.1;gbkey=Prot;product=2'-O-ribose methyltransferase;protein_id=YP_009725311.1
NC_045512.2 RefSeq CDS 266 13483 . + 0 ID=cds-YP_009725295.1;Parent=gene-GU280_gp01;Dbxref=GenBank:YP_009725295.1,GeneID:43740578;Name=YP_009725295.1;Note=pp1a;gbkey=CDS;gene=ORF1ab;locus_tag=GU280_gp01;product=ORF1a polyprotein;protein_id=YP_009725295.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 266 805 . + . ID=id-YP_009725295.1:1..180;Note=nsp1%3B produced by both pp1a and pp1ab;Parent=cds-YP_009725295.1;gbkey=Prot;product=leader protein;protein_id=YP_009742608.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 806 2719 . + . ID=id-YP_009725295.1:181..818;Note=produced by both pp1a and pp1ab;Parent=cds-YP_009725295.1;gbkey=Prot;product=nsp2;protein_id=YP_009742609.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 2720 8554 . + . ID=id-YP_009725295.1:819..2763;Note=former nsp1%3B conserved domains are: N-terminal acidic (Ac)%2C predicted phosphoesterase%2C papain-like proteinase%2C Y-domain%2C transmembrane domain 1 (TM1)%2C adenosine diphosphate-ribose 1''-phosphatase (ADRP)%3B produced by both pp1a and pp1ab;Parent=cds-YP_009725295.1;gbkey=Prot;product=nsp3;protein_id=YP_009742610.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 8555 10054 . + . ID=id-YP_009725295.1:2764..3263;Note=nsp4B_TM%3B contains transmembrane domain 2 (TM2)%3B produced by both pp1a and pp1ab;Parent=cds-YP_009725295.1;gbkey=Prot;product=nsp4;protein_id=YP_009742611.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 10055 10972 . + . ID=id-YP_009725295.1:3264..3569;Note=nsp5A_3CLpro and nsp5B_3CLpro%3B main proteinase (Mpro)%3B mediates cleavages downstream of nsp4. 3D structure of the SARSr-CoV homolog has been determined (Yang et al.%2C 2003)%3B produced by both pp1a and pp1ab;Parent=cds-YP_009725295.1;gbkey=Prot;product=3C-like proteinase;protein_id=YP_009742612.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 10973 11842 . + . ID=id-YP_009725295.1:3570..3859;Note=nsp6_TM%3B putative transmembrane domain%3B produced by both pp1a and pp1ab;Parent=cds-YP_009725295.1;gbkey=Prot;product=nsp6;protein_id=YP_009742613.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 11843 12091 . + . ID=id-YP_009725295.1:3860..3942;Note=produced by both pp1a and pp1ab;Parent=cds-YP_009725295.1;gbkey=Prot;product=nsp7;protein_id=YP_009742614.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 12092 12685 . + . ID=id-YP_009725295.1:3943..4140;Note=produced by both pp1a and pp1ab;Parent=cds-YP_009725295.1;gbkey=Prot;product=nsp8;protein_id=YP_009742615.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 12686 13024 . + . ID=id-YP_009725295.1:4141..4253;Note=ssRNA-binding protein%3B produced by both pp1a and pp1ab;Parent=cds-YP_009725295.1;gbkey=Prot;product=nsp9;protein_id=YP_009742616.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 13025 13441 . + . ID=id-YP_009725295.1:4254..4392;Note=nsp10_CysHis%3B formerly known as growth-factor-like protein (GFL)%3B produced by both pp1a and pp1ab;Parent=cds-YP_009725295.1;gbkey=Prot;product=nsp10;protein_id=YP_009742617.1
NC_045512.2 RefSeq mature_protein_region_of_CDS 13442 13480 . + . ID=id-YP_009725295.1:4393..4405;Note=produced by pp1a only;Parent=cds-YP_009725295.1;gbkey=Prot;product=nsp11;protein_id=YP_009725312.1
NC_045512.2 RefSeq stem_loop 13476 13503 . + . ID=id-GU280_gp01;Dbxref=GeneID:43740578;function=Coronavirus frameshifting stimulation element stem-loop 1;gbkey=stem_loop;gene=ORF1ab;inference=COORDINATES: profile:Rfam-release-14.1:RF00507%2CInfernal:1.1.2;locus_tag=GU280_gp01
NC_045512.2 RefSeq stem_loop 13488 13542 . + . ID=id-GU280_gp01-2;Dbxref=GeneID:43740578;function=Coronavirus frameshifting stimulation element stem-loop 2;gbkey=stem_loop;gene=ORF1ab;inference=COORDINATES: profile:Rfam-release-14.1:RF00507%2CInfernal:1.1.2;locus_tag=GU280_gp01
NC_045512.2 RefSeq gene 21563 25384 . + . ID=gene-GU280_gp02;Dbxref=GeneID:43740568;Name=S;gbkey=Gene;gene=S;gene_biotype=protein_coding;gene_synonym=spike glycoprotein;locus_tag=GU280_gp02
NC_045512.2 RefSeq CDS 21563 25384 . + 0 ID=cds-YP_009724390.1;Parent=gene-GU280_gp02;Dbxref=GenBank:YP_009724390.1,GeneID:43740568;Name=YP_009724390.1;Note=structural protein%3B spike protein;gbkey=CDS;gene=S;locus_tag=GU280_gp02;product=surface glycoprotein;protein_id=YP_009724390.1
NC_045512.2 RefSeq gene 25393 26220 . + . ID=gene-GU280_gp03;Dbxref=GeneID:43740569;Name=ORF3a;gbkey=Gene;gene=ORF3a;gene_biotype=protein_coding;locus_tag=GU280_gp03
NC_045512.2 RefSeq CDS 25393 26220 . + 0 ID=cds-YP_009724391.1;Parent=gene-GU280_gp03;Dbxref=GenBank:YP_009724391.1,GeneID:43740569;Name=YP_009724391.1;gbkey=CDS;gene=ORF3a;locus_tag=GU280_gp03;product=ORF3a protein;protein_id=YP_009724391.1
NC_045512.2 RefSeq gene 26245 26472 . + . ID=gene-GU280_gp04;Dbxref=GeneID:43740570;Name=E;gbkey=Gene;gene=E;gene_biotype=protein_coding;locus_tag=GU280_gp04
NC_045512.2 RefSeq CDS 26245 26472 . + 0 ID=cds-YP_009724392.1;Parent=gene-GU280_gp04;Dbxref=GenBank:YP_009724392.1,GeneID:43740570;Name=YP_009724392.1;Note=ORF4%3B structural protein%3B E protein;gbkey=CDS;gene=E;locus_tag=GU280_gp04;product=envelope protein;protein_id=YP_009724392.1
NC_045512.2 RefSeq gene 26523 27191 . + . ID=gene-GU280_gp05;Dbxref=GeneID:43740571;Name=M;gbkey=Gene;gene=M;gene_biotype=protein_coding;locus_tag=GU280_gp05
NC_045512.2 RefSeq CDS 26523 27191 . + 0 ID=cds-YP_009724393.1;Parent=gene-GU280_gp05;Dbxref=GenBank:YP_009724393.1,GeneID:43740571;Name=YP_009724393.1;Note=ORF5%3B structural protein;gbkey=CDS;gene=M;locus_tag=GU280_gp05;product=membrane glycoprotein;protein_id=YP_009724393.1
NC_045512.2 RefSeq gene 27202 27387 . + . ID=gene-GU280_gp06;Dbxref=GeneID:43740572;Name=ORF6;gbkey=Gene;gene=ORF6;gene_biotype=protein_coding;locus_tag=GU280_gp06
NC_045512.2 RefSeq CDS 27202 27387 . + 0 ID=cds-YP_009724394.1;Parent=gene-GU280_gp06;Dbxref=GenBank:YP_009724394.1,GeneID:43740572;Name=YP_009724394.1;gbkey=CDS;gene=ORF6;locus_tag=GU280_gp06;product=ORF6 protein;protein_id=YP_009724394.1
NC_045512.2 RefSeq gene 27394 27759 . + . ID=gene-GU280_gp07;Dbxref=GeneID:43740573;Name=ORF7a;gbkey=Gene;gene=ORF7a;gene_biotype=protein_coding;locus_tag=GU280_gp07
NC_045512.2 RefSeq CDS 27394 27759 . + 0 ID=cds-YP_009724395.1;Parent=gene-GU280_gp07;Dbxref=GenBank:YP_009724395.1,GeneID:43740573;Name=YP_009724395.1;gbkey=CDS;gene=ORF7a;locus_tag=GU280_gp07;product=ORF7a protein;protein_id=YP_009724395.1
NC_045512.2 RefSeq gene 27756 27887 . + . ID=gene-GU280_gp08;Dbxref=GeneID:43740574;Name=ORF7b;gbkey=Gene;gene=ORF7b;gene_biotype=protein_coding;locus_tag=GU280_gp08
NC_045512.2 RefSeq CDS 27756 27887 . + 0 ID=cds-YP_009725318.1;Parent=gene-GU280_gp08;Dbxref=GenBank:YP_009725318.1,GeneID:43740574;Name=YP_009725318.1;gbkey=CDS;gene=ORF7b;locus_tag=GU280_gp08;product=ORF7b;protein_id=YP_009725318.1
NC_045512.2 RefSeq gene 27894 28259 . + . ID=gene-GU280_gp09;Dbxref=GeneID:43740577;Name=ORF8;gbkey=Gene;gene=ORF8;gene_biotype=protein_coding;locus_tag=GU280_gp09
NC_045512.2 RefSeq CDS 27894 28259 . + 0 ID=cds-YP_009724396.1;Parent=gene-GU280_gp09;Dbxref=GenBank:YP_009724396.1,GeneID:43740577;Name=YP_009724396.1;gbkey=CDS;gene=ORF8;locus_tag=GU280_gp09;product=ORF8 protein;protein_id=YP_009724396.1
NC_045512.2 RefSeq gene 28274 29533 . + . ID=gene-GU280_gp10;Dbxref=GeneID:43740575;Name=N;gbkey=Gene;gene=N;gene_biotype=protein_coding;locus_tag=GU280_gp10
NC_045512.2 RefSeq CDS 28274 29533 . + 0 ID=cds-YP_009724397.2;Parent=gene-GU280_gp10;Dbxref=GenBank:YP_009724397.2,GeneID:43740575;Name=YP_009724397.2;Note=ORF9%3B structural protein;gbkey=CDS;gene=N;locus_tag=GU280_gp10;product=nucleocapsid phosphoprotein;protein_id=YP_009724397.2
NC_045512.2 RefSeq gene 29558 29674 . + . ID=gene-GU280_gp11;Dbxref=GeneID:43740576;Name=ORF10;gbkey=Gene;gene=ORF10;gene_biotype=protein_coding;locus_tag=GU280_gp11
NC_045512.2 RefSeq CDS 29558 29674 . + 0 ID=cds-YP_009725255.1;Parent=gene-GU280_gp11;Dbxref=GenBank:YP_009725255.1,GeneID:43740576;Name=YP_009725255.1;gbkey=CDS;gene=ORF10;locus_tag=GU280_gp11;product=ORF10 protein;protein_id=YP_009725255.1
NC_045512.2 RefSeq stem_loop 29609 29644 . + . ID=id-GU280_gp11;Dbxref=GeneID:43740576;function=Coronavirus 3' UTR pseudoknot stem-loop 1;gbkey=stem_loop;gene=ORF10;inference=COORDINATES: profile::Rfam-release-14.1:RF00165%2CInfernal:1.1.2;locus_tag=GU280_gp11
NC_045512.2 RefSeq stem_loop 29629 29657 . + . ID=id-GU280_gp11-2;Dbxref=GeneID:43740576;function=Coronavirus 3' UTR pseudoknot stem-loop 2;gbkey=stem_loop;gene=ORF10;inference=COORDINATES: profile::Rfam-release-14.1:RF00165%2CInfernal:1.1.2;locus_tag=GU280_gp11
NC_045512.2 RefSeq three_prime_UTR 29675 29903 . + . ID=id-NC_045512.2:29675..29903;gbkey=3'UTR
NC_045512.2 RefSeq stem_loop 29728 29768 . + . ID=id-NC_045512.2:29728..29768;Note=basepair exception: alignment to the Rfam model implies coordinates 29740:29758 form a noncanonical C:T basepair%2C but the homologous positions form a highly conserved C:G basepair in other viruses%2C including SARS (NC_004718.3);function=Coronavirus 3' stem-loop II-like motif (s2m);gbkey=stem_loop;inference=COORDINATES: profile:Rfam-release-14.1:RF00164%2CInfernal:1.1.2

Loading