Home
Personalized Cancer and Network Explorer (PeCaX) is a tool for identifying patient-specific cancer mechanisms by providing a complete mutational profile from variants to networks. It employs ClinVAP to perform clinical variant annotation, which focuses on processing, filtering and prioritizing the variants to find the disrupted genes that are involved in carcinogenesis and to identify actionable variants from the mutational landscape of a patient. In addition, it creates networks showing the connections between the driver genes and the genes in their neighbourhood and automatically performs pathway enrichment analysis using pathway resources (SBML4j). Its interactive visualisation (BioGraphVisart) supports easy network exploration. For two patient-specific networks, the patient similarity (node overlap) and a merged network graph can be calculated.
There are multiple options for input to use PeCaX.
A mandatory Variant Call Format (VCF) file containing somatic variants (SNVs) is analyzed and annotated with ClinVAP. The results are structured in a report (.json format) and displayed in interactive tables (ClinVAP Report) once the pipeline has finished. An example file can be found here.
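Before uploading, it can help to check that a file follows the basic VCF layout. The following is a minimal sketch in plain Python and is not part of PeCaX or ClinVAP. The function name `read_vcf_records` is hypothetical, only the eight fixed VCF columns are read, and the sample record is illustrative data.

```python
# Minimal VCF reader sketch (not PeCaX/ClinVAP code).
# It checks the header line and returns the eight fixed columns per record.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def read_vcf_records(lines):
    """Yield one dict per data line of a VCF given as an iterable of strings."""
    header_seen = False
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#"):
            columns = line[1:].split("\t")
            if columns[:8] != VCF_COLUMNS:
                raise ValueError("unexpected VCF header line")
            header_seen = True
            continue
        if not header_seen:
            raise ValueError("data line found before the #CHROM header")
        fields = line.split("\t")
        if len(fields) < 8:
            raise ValueError("data line has fewer than 8 columns")
        record = dict(zip(VCF_COLUMNS, fields[:8]))
        record["POS"] = int(record["POS"])
        yield record


# Illustrative input; in practice the lines would come from open("sample.vcf").
sample = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "17\t7577121\t.\tG\tA\t50\tPASS\t.\n",
]
records = list(read_vcf_records(sample))
print(len(records), records[0]["FILTER"])
```

Sample and genotype columns after INFO are ignored here, since the sketch only verifies the general structure of the file.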
In addition to the SNV information, CNV information can be analyzed and annotated. It can be provided in an optional .tsv file, which must have the same column names as the example file here.
The human genome assembly version that was used in the original NGS pipeline while calling the variants needs to be given. The default value is GRCh37.
ClinVAP can analyse the SNV information according to a diagnosis. This should be entered as an ICD10 code.
This is an optional field that is only needed if a diagnosis is provided. The default value is sort.
- sort: Option to sort results based on their diagnosis similarity score.
- filter: Option to only get the results of a certain cancer type given as diagnosis.
- sort, filter, prioritize: Option to get sorted results for evidence levels A, B and C, and filtered results for evidence levels D and E.
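One way to read the three options above is sketched below. This is an interpretation of the descriptions, not ClinVAP's implementation: the `Result` fields, the mode strings and the function `apply_mode` are all hypothetical, and the similarity score is assumed to be "higher means closer to the diagnosis".

```python
from dataclasses import dataclass


@dataclass
class Result:
    drug: str
    evidence_letter: str      # "A" to "E"
    matches_diagnosis: bool   # hypothetical flag: entry is annotated with the given cancer type
    similarity: float         # hypothetical diagnosis similarity score, higher = closer


def apply_mode(results, mode="sort"):
    """Return the results ordered or reduced according to the chosen mode."""
    def by_similarity(r):
        return -r.similarity

    if mode == "sort":
        # Keep everything, most similar diagnosis first.
        return sorted(results, key=by_similarity)
    if mode == "filter":
        # Keep only entries for the given cancer type.
        return [r for r in results if r.matches_diagnosis]
    if mode == "sort,filter,prioritize":
        # Levels A-C are kept and sorted; levels D-E are kept only on a diagnosis match.
        strong = sorted(
            (r for r in results if r.evidence_letter in {"A", "B", "C"}),
            key=by_similarity,
        )
        weak = [
            r for r in results
            if r.evidence_letter in {"D", "E"} and r.matches_diagnosis
        ]
        return strong + weak
    raise ValueError(f"unknown mode: {mode}")
```

Python's `sorted` is stable, so entries with equal similarity keep their original order.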
The report of an analyzed VCF file can be downloaded as a JSON file. This file can be loaded again to display the tables it contains.
For each .vcf-file analyzed with PeCaX a job id is assigned and the results are stored in a database. The results can be accessed by this job id afterwards.
Clinical Variant Annotation Pipeline (ClinVAP) generates concise case reports from long lists of somatic mutations by first annotating them functionally and clinically, and then prioritizing and filtering them based on their variant effect together with clinical actionability (https://doi.org/10.1093/bioinformatics/btz924). For more information, visit the ClinVAP documentation.
Functional annotation predicts the effect of mutations at the protein level and prioritizes the variants based on their effect or importance. For this we use the Ensembl Variant Effect Predictor (VEP).
Clinical annotation selects clinically relevant variants or mutated genes together with the therapeutics that target them, either mechanistically or supported by clinical evidence. It uses a background knowledgebase developed in-house from various publicly available data sources.
Variant annotation is done with Ensembl VEP in offline mode to ensure data privacy. The VEP plugins SIFT and PolyPhen are used to predict the variant effect on the protein. Variants are filtered based on whether they pass the quality measures of the NGS pipeline used to generate the VCF files and on their effects as predicted by VEP, SIFT and PolyPhen. The remaining variants are passed to the reporting application.
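The filtering step described above could look roughly like the following sketch. The exact rules ClinVAP applies are not stated here, so the rule used below (keep a variant if it passed the NGS quality filter and at least one predictor flags it as damaging) is an assumption, as are the dictionary keys and the function name `keep_variant`. The label strings follow the prediction terms that VEP reports for SIFT and PolyPhen.

```python
# Hypothetical filter rule, not ClinVAP's actual criteria.
DAMAGING_POLYPHEN = {"probably_damaging", "possibly_damaging"}


def keep_variant(variant):
    """Return True if the annotated variant should go on to the reporting step."""
    if variant.get("filter") != "PASS":
        return False  # failed the quality measures of the NGS pipeline
    sift = variant.get("sift") or ""
    polyphen = variant.get("polyphen") or ""
    # SIFT reports "deleterious" or "deleterious_low_confidence" for damaging calls.
    flagged_by_sift = sift.startswith("deleterious")
    flagged_by_polyphen = polyphen in DAMAGING_POLYPHEN
    return flagged_by_sift or flagged_by_polyphen


# Illustrative records only.
variants = [
    {"id": "v1", "filter": "PASS", "sift": "deleterious", "polyphen": "benign"},
    {"id": "v2", "filter": "PASS", "sift": "tolerated", "polyphen": "benign"},
    {"id": "v3", "filter": "LowQual", "sift": "deleterious", "polyphen": "probably_damaging"},
]
remaining = [v["id"] for v in variants if keep_variant(v)]
print(remaining)
```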
The reporting application uses the background knowledgebase to perform the clinical annotation. The knowledgebase is created by integrating publicly available databases and contains driver gene annotations, clinically relevant known pharmacogenomic effects, adverse effect information and mechanistic drug targets.
The database is queried with the observed variants to identify therapeutics with known effects directly on the variants, and with the gene names to identify driver genes and therapeutics with known effects on the disrupted genes.
It also enables users to filter the results further by providing the diagnosis as an ICD10 code. This adds one more layer of prioritization by selecting the results associated with the provided cancer type.
An evidence level is also calculated for the results to show the significance of the associations between the reported targets and drugs and their observed effect.
After variant and clinical annotation are complete, the ClinVAP Report is shown in the GUI and can be downloaded in JSON or PDF format.
The results of the annotation and analysis with ClinVAP are represented in interactive tables (see table below). If information from external sources is available for a gene in the table, the related web links are listed in a dropdown menu next to the gene name. Next to each table, the related gene network is displayed, if applicable. Each table can be downloaded as a .pdf file. The tables contain information about:
Somatic Mutations in Known Driver Genes
List of cancer driver genes along with the mutations observed in the patient. The Consequence column provides the predicted effects of the variants on the protein sequence. The Tumor type column lists the cohorts in which the gene is identified as a driver. The VAF (variant allele frequency) column shows the proportion of the variant allele relative to the coverage of that locus. The Reference column lists the driver gene sources that catalogued the corresponding gene as a driver. Driver gene information is obtained from Vogelstein et al., UniProt, TSGene, IntOGen and COSMIC.
Somatic Mutations with Known Pharmacogenetic Effect
List of drugs with the evidence of targeting the observed variant of the mutated gene, and the documented drug response for the given mutational profile. Evidence level letter represents: A = validated association, B = clinical evidence, C = case study, D = preclinical evidence, E = inferential association. Evidence level number represents the matching type between the observed variant and the database result: 1 = same variant, 2 = different variant, same consequence, 3 = different variant, different consequence, same gene. The information is obtained from CIViC, CGI and DrugBank.
Somatic Mutations in Pharmaceutical Target proteins
- Pharmacogenomics Summary of Drugs Targeting Affected Genes: Therapies that have evidence of targeting the affected gene. Evidence level letter represents: A = validated association, B = clinical evidence, C = case study, D = preclinical evidence, E = inferential association. Evidence level number represents the matching type between the observed variant and the database result: 1 = same variant, 2 = different variant, same consequence, 3 = different variant, different consequence, same gene. The information is obtained from CIViC, CGI and DrugBank.
- Summary of Cancer Drugs Targeting Affected Genes: List of cancer drugs targeting the mutated gene. Information is obtained from DrugBank, Therapeutic Target Database, IUPHAR, and Santos et al.
Adverse Effects
List of drugs with known adverse effects on the observed variants and disrupted genes.
References
The publications of the reference IDs given in the tables.
Appendix
All the somatic variants of the patient with their dbSNP and COSMIC IDs.
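Two of the columns described in the table above follow simple rules that can be written down directly: the VAF is the share of reads supporting the variant allele, and the evidence level combines a letter with a number. The sketch below is illustrative only. The function names are hypothetical, and it assumes the evidence level is written as a two-character code such as "B2", which may differ from how the report stores it.

```python
# Meanings taken from the table descriptions above.
EVIDENCE_LETTER = {
    "A": "validated association",
    "B": "clinical evidence",
    "C": "case study",
    "D": "preclinical evidence",
    "E": "inferential association",
}
MATCH_TYPE = {
    "1": "same variant",
    "2": "different variant, same consequence",
    "3": "different variant, different consequence, same gene",
}


def variant_allele_frequency(alt_reads, coverage):
    """Proportion of reads carrying the variant allele at a locus."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return alt_reads / coverage


def decode_evidence_level(code):
    """Split an assumed code like 'B2' into (letter meaning, match type)."""
    code = code.strip().upper()
    if len(code) != 2 or code[0] not in EVIDENCE_LETTER or code[1] not in MATCH_TYPE:
        raise ValueError(f"not a valid evidence level: {code!r}")
    return EVIDENCE_LETTER[code[0]], MATCH_TYPE[code[1]]


print(variant_allele_frequency(30, 120))  # 0.25
print(decode_evidence_level("B2"))
```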
If you use ClinVAP in your work, please cite the following article:
Sürün, B., Schärfe, C.P., Divine, M.R., Heinrich, J., Toussaint, N.C., Zimmermann, L., Beha, J. and Kohlbacher, O., 2020. ClinVAP: a reporting strategy from variants to therapeutic options. Bioinformatics, 36(7), pp.2316-2317.
SBML4j is a service for persisting biological models and pathways in SBML format in a graph database. The models are integrated into one unified knowledge graph from which network mappings are created. Depending on the given models, four types of network mappings are available: regulatory, signalling, protein-protein interaction and metabolic mappings. These mappings can then be explored, annotated and searched. A user can run graph algorithms on the mappings and retrieve the created subgraphs in the GraphML format. SBML4j is written in Java as a Spring Boot application and the data is stored in a Neo4j graph database instance.
A REST interface is provided to interact with the pathways as well as with the network mappings. Communication with the REST API is JSON based and networks are provided in the GraphML format. The full documentation of the API can be found at Swaggerhub. A client can annotate a mapping with arbitrary data on nodes and relationships. SBML4j also offers methods to filter a network by type of node or relationship and by individual entities. For one or multiple named nodes (i.e. genes) a network context can be calculated, which is stored as a separate network. This enables a user to get a representation of the network surroundings of a gene of interest or to get the up- and/or downstream genes in the given models across standard pathway boundaries. With multiple supplied nodes, the same query calculates a minimal network containing all the given genes, provided they have a known network context.
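To illustrate what a network context around one or more genes means, the sketch below collects the up- and/or downstream neighbourhood of seed genes with a breadth-first search over a directed edge list. It is a generic graph example and not SBML4j's algorithm or API: the function `network_context` and its parameters `max_depth` and `direction` are hypothetical, the edge list is toy data, and the minimal-network query for multiple genes is not covered.

```python
from collections import deque


def network_context(edges, seeds, max_depth=1, direction="both"):
    """Return (nodes, edges) within max_depth steps of the seed genes.

    edges: list of (source, target) pairs of a directed network
    direction: "up" (predecessors), "down" (successors) or "both"
    """
    successors, predecessors = {}, {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)
        predecessors.setdefault(target, set()).add(source)

    def neighbours(node):
        found = set()
        if direction in ("down", "both"):
            found |= successors.get(node, set())
        if direction in ("up", "both"):
            found |= predecessors.get(node, set())
        return found

    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the requested distance
        for other in neighbours(node):
            if other not in depth:
                depth[other] = depth[node] + 1
                queue.append(other)

    nodes = set(depth)
    # Keep every original edge whose two endpoints were both reached.
    context_edges = [(s, t) for s, t in edges if s in nodes and t in nodes]
    return nodes, context_edges


# Toy network for illustration only.
toy_edges = [("EGFR", "KRAS"), ("KRAS", "BRAF"), ("BRAF", "MAP2K1"), ("TP53", "CDKN1A")]
nodes, context = network_context(toy_edges, ["KRAS"], max_depth=1, direction="down")
print(sorted(nodes), context)
```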
The SBML4j database used in PeCaX is built from 40 cancer-related pathways from the KEGG pathway database. The pathways are integrated with each other and a model-spanning network mapping is created, which is used as the basis for the presented networks. The networks are enriched with drug-target information from DrugBank. This drug-target information is initially assembled using MyDrug (developed at the University of Tübingen, to be published) and added to the SBML4j mappings. Get in touch with us if you want to know more about MyDrug.
Find out more about SBML4j on GitHub or get started with the pre-built docker image (docker pull thortiede/sbml4j:latest).
BioGraphVisart is a web-based tool to interactively visualize networks, especially with biological background. It receives the network from SBML4j as a GraphML-file and displays it.
The nodes of the network represent the genes and the related drugs targeting those genes. If multiple drugs have the same gene as target, they are collapsed into one square node, which can be expanded by clicking on it. The nodes are colored according to whether they are listed in the related table. For the table Somatic Mutations in Known Driver Genes, the driver genes in the network are additionally annotated with their driver type, which is displayed when the mouse is moved over a node.
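The collapsing of drug nodes can be pictured as grouping drugs by their target gene. The sketch below is a plain Python illustration of that idea and not BioGraphVisart code; the function name `collapse_drug_nodes` and the returned structure are hypothetical, and the drug-gene pairs are example data.

```python
from collections import defaultdict


def collapse_drug_nodes(drug_targets):
    """Group drugs by target gene.

    drug_targets: iterable of (drug, gene) pairs
    Returns {gene: {"collapsed": bool, "drugs": [drug, ...]}}, where
    "collapsed" is True when more than one drug targets the gene.
    """
    drugs_by_gene = defaultdict(list)
    for drug, gene in drug_targets:
        if drug not in drugs_by_gene[gene]:
            drugs_by_gene[gene].append(drug)
    return {
        gene: {"collapsed": len(drugs) > 1, "drugs": drugs}
        for gene, drugs in drugs_by_gene.items()
    }


# Example pairs for illustration.
pairs = [("Vemurafenib", "BRAF"), ("Dabrafenib", "BRAF"), ("Erlotinib", "EGFR")]
grouped = collapse_drug_nodes(pairs)
print(grouped["BRAF"]["collapsed"], grouped["EGFR"]["collapsed"])
```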
The edges between nodes represent the interaction(s) between two genes or a drug and a gene. The type of interaction can be seen in the interaction legend or by moving the mouse over an edge of interest. Multiple edges between two nodes are by default collapsed into one. This can be disabled in the interactions legend.
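Collapsing multiple edges between two nodes amounts to keeping one edge per node pair and remembering all interaction types it stands for. The following sketch shows this with plain Python data. It is not BioGraphVisart's implementation: the function name `collapse_edges` is hypothetical, edges are assumed to be directed (source, target, interaction) triples, and the interaction names are illustrative.

```python
def collapse_edges(edges):
    """Merge parallel edges into one entry per (source, target) pair.

    edges: iterable of (source, target, interaction) triples
    Returns {(source, target): [interaction, ...]} in first-seen order.
    """
    merged = {}
    for source, target, interaction in edges:
        interactions = merged.setdefault((source, target), [])
        if interaction not in interactions:
            interactions.append(interaction)
    return merged


# Illustrative edges only.
raw_edges = [
    ("EGFR", "KRAS", "activation"),
    ("EGFR", "KRAS", "phosphorylation"),
    ("KRAS", "BRAF", "activation"),
]
collapsed = collapse_edges(raw_edges)
print(len(raw_edges), "edges collapsed into", len(collapsed))
```

Expanding the edges again is then just a matter of emitting one edge per stored interaction type.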
The most common KEGG pathways for the displayed genes can be calculated, and the corresponding genes can be grouped by pathway and highlighted by coloured squares.
The network can be downloaded in the file formats PNG and SVG.
BioGraphVisart can also be used as a standalone application; see https://kohlbacherlab.github.io/BioGraphVisart/.
For more details have a look at the BioGraphVisart Documentation.
The ClinVAP report displayed in the tables can be downloaded in JSON format (which can be used as input again) and in PDF format. Each table can also be downloaded individually as a PDF.
Each displayed network can be downloaded individually in PNG and SVG format from the BioGraphVisart interface. Additionally, all generated networks can be downloaded together in GraphML format. A .graphml file can be used as input for BioGraphVisart as a standalone application (https://kohlbacherlab.github.io/BioGraphVisart/).
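Downloaded GraphML files can also be compared outside the GUI. The sketch below reads two GraphML documents with Python's standard library and computes the shared nodes and a merged graph. It is not the method PeCaX uses internally: it assumes that the `id` attribute of each node identifies the gene, it reports the Jaccard index as one possible overlap score, and the function names and the two inline example graphs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard GraphML XML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def read_graphml(text):
    """Return (node ids, edges) of a GraphML document given as a string."""
    root = ET.fromstring(text)
    nodes = {n.get("id") for n in root.iterfind(".//g:node", NS)}
    edges = {(e.get("source"), e.get("target")) for e in root.iterfind(".//g:edge", NS)}
    return nodes, edges


def node_overlap(nodes_a, nodes_b):
    """Return the shared nodes and their Jaccard index (shared / union)."""
    shared = nodes_a & nodes_b
    union = nodes_a | nodes_b
    return shared, (len(shared) / len(union) if union else 0.0)


def merge_graphs(graph_a, graph_b):
    """Union of the nodes and edges of two (nodes, edges) graphs."""
    return graph_a[0] | graph_b[0], graph_a[1] | graph_b[1]


# Two tiny example networks for illustration.
PATIENT_1 = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns"><graph edgedefault="directed">'
    '<node id="TP53"/><node id="MDM2"/><node id="KRAS"/>'
    '<edge source="MDM2" target="TP53"/></graph></graphml>'
)
PATIENT_2 = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns"><graph edgedefault="directed">'
    '<node id="TP53"/><node id="KRAS"/><node id="BRAF"/>'
    '<edge source="KRAS" target="BRAF"/></graph></graphml>'
)

g1, g2 = read_graphml(PATIENT_1), read_graphml(PATIENT_2)
shared, jaccard = node_overlap(g1[0], g2[0])
merged_nodes, merged_edges = merge_graphs(g1, g2)
print(sorted(shared), jaccard, len(merged_nodes), len(merged_edges))
```

If the node ids in a real export are internal identifiers rather than gene symbols, the gene name would have to be read from the node's `data` elements instead.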
Disclaimer: The report created by ClinVAP is intended as a hypothesis generating framework and thus for research use only, and not for diagnostic or clinical purposes. Information provided in the report does not replace a physician's medical judgment and usage is entirely at your own risk. The providers of this resource shall in no event be liable for any direct, indirect, incidental, consequential, or exemplary damages.