GenerGener edited this page Apr 5, 2020 · 44 revisions

Pangenome and variation graph

We will use tools supporting the variation graph data model, as described on the pangenome tools and workflows page, to build and distribute pangenome data structures from SARS-CoV-2 genomes. These models are useful for diagnostic and resequencing applications. They can also help us generate assemblies from raw sequencing data.
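For readers new to the data model, here is a minimal sketch of a variation graph: nodes carry sequence, edges connect nodes, and each genome is a named path through the graph. The class, function, and path names are hypothetical and the sequences are invented toy data; this is not the API of any of the pangenome tools.

```python
from dataclasses import dataclass, field


@dataclass
class VariationGraph:
    """Toy variation graph (hypothetical, for illustration only)."""
    nodes: dict = field(default_factory=dict)  # node id -> sequence
    edges: set = field(default_factory=set)    # (from_id, to_id) pairs
    paths: dict = field(default_factory=dict)  # path name -> list of node ids

    def add_node(self, node_id, seq):
        self.nodes[node_id] = seq

    def add_path(self, name, node_ids):
        # Embedding a genome as a path also records the edges it walks.
        for a, b in zip(node_ids, node_ids[1:]):
            self.edges.add((a, b))
        self.paths[name] = list(node_ids)

    def path_sequence(self, name):
        # Spell out a genome by concatenating the sequence of each node.
        return "".join(self.nodes[n] for n in self.paths[name])


def build_toy_graph():
    # Two invented genomes that differ by one SNP (A vs G) in the middle.
    g = VariationGraph()
    g.add_node(1, "ACGT")
    g.add_node(2, "A")
    g.add_node(3, "G")
    g.add_node(4, "TTCA")
    g.add_path("genome_ref", [1, 2, 4])
    g.add_path("genome_alt", [1, 3, 4])
    return g


graph = build_toy_graph()
print(graph.path_sequence("genome_ref"))
print(graph.path_sequence("genome_alt"))
```

The shared flanking nodes are stored once, and the variant site appears as a "bubble" with two alternative nodes. This is what lets many closely related genomes be represented compactly.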

Communication

Josiah: After some debate, I decided to create a second project for a Pangenome Browser. This depends on Variation graph construction, but it's certainly a different set of tasks that can be carried out independently. In order for a browser to be effective, we must have annotations aggregated/curated as a third task. I would still like us to coordinate closely.

Specific use cases

In my personal opinion (Ben Busby), it may be particularly productive to look at the less conserved satellite genes near the S locus.

We may also want to look at amino acid 614 of the spike protein. It sits in an unstructured loop, presumably between putative transmembrane helices, and may be involved in immune evasion. We should look at the correspondence between SARS-1 and SARS-2 at this position.
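To find this residue in a genome-coordinate view such as a graph path, the amino acid position must be converted to nucleotide coordinates. A minimal sketch, assuming the S coding sequence starts at 1-based position 21563 of the reference genome NC_045512.2 (check this against the annotation you use); the function name is hypothetical:

```python
# Assumed 1-based start of the S coding sequence in NC_045512.2.
# Verify against your own annotation before relying on it.
S_GENE_START = 21563


def codon_coordinates(cds_start, aa_position):
    """Return the 1-based (first, last) genome coordinates of a codon.

    Assumes an unspliced coding sequence on the forward strand.
    """
    if aa_position < 1:
        raise ValueError("amino acid positions are 1-based")
    first = cds_start + 3 * (aa_position - 1)
    return first, first + 2


first, last = codon_coordinates(S_GENE_START, 614)
print(f"spike residue 614 -> genome positions {first}-{last}")
```

Under that assumed start coordinate, residue 614 maps to positions 23402 to 23404. The same arithmetic applies to any other residue of interest in an unspliced forward-strand gene.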

Being able to see these loci in the context of each other, as well as of the SARS-CoV-2 genomes in general, may be very beneficial for subclassifying the virus with respect to the human response.

Seconding Ben Busby's point. These types of variable regions are seen in other RNA viruses, such as HIV. Some are also tied to biological switches believed to control tropism between tissues (the V3 region of gp120). We might find something like that for SARS-CoV-2 if we look at variants at the protein/amino acid level. It would also be interesting to see if we can look at variants near recently annotated RNA features. See some sources below.

HIV receptor tropism RE: V3 (some papers)

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3195025/

V3 and V2

And in HIV-2 RNA preprint

Participants
