Haruo Suzuki edited this page Apr 8, 2020 · 21 revisions

Creating a tool for phylogenetic analysis of COVID-19 sequence data

Communication

For the time being, there is a #phylogeny channel on the Slack group (check out the [email protected] group for the invitation link). During the BioHackathon, we'll update this section.

Resources

Please check out the Datasets and Tools page.

If you have new resources in mind, please add them there directly.

Tools (brainstorm section) - particular tools can be added on the Resources page

  • multiple sequence alignment tools, e.g. Clustal Omega, MUSCLE, MAFFT
  • phylogenetic inference tools, e.g. PhyML, RAxML, IQ-TREE, MrBayes (Bayesian), BEAST or BEAST2 (Bayesian)
  • evolutionary rate analysis tools, e.g. PAML, HyPhy (phyphy: Python wrapper for HyPhy)
  • tree visualization tools, e.g. ETE toolkit (Python API; has a wrapper for PAML)

Ideas for projects

  • Working on the phylogeny of COVID-19 (similar to this analysis, and more connected to this article in terms of receptors and conserved sites).
  • To be implemented as a rerunnable workflow for when new sequence data become available
  • Easily deployable, runnable in public cloud
  • Connected to other COVID-19 analysis workflows and their emerging I/O standards
  • Comparing phylogenies and compositional features (e.g. G+C, k-mer, and codon composition)
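
As a starting point for the compositional comparison mentioned in the last item, here is a minimal Python sketch that computes G+C content, k-mer frequencies, and codon counts for a single nucleotide sequence. The function names (`gc_content`, `kmer_frequencies`, `codon_counts`) are hypothetical and not part of any existing tool. The sketch assumes plain sequence strings (already parsed from FASTA) and, for codon counts, an in-frame coding sequence.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def _normalize(seq: str) -> str:
    """Upper-case the sequence and treat RNA 'U' as 'T'."""
    return seq.upper().replace("U", "T")


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    counts = Counter(_normalize(seq))
    acgt = sum(counts[b] for b in BASES)
    if acgt == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / acgt


def kmer_frequencies(seq: str, k: int = 2) -> dict:
    """Relative frequency of every possible k-mer.

    Windows containing ambiguous bases (e.g. N) are skipped.
    """
    seq = _normalize(seq)
    allowed = set(BASES)
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= allowed
    )
    total = sum(counts.values())
    kmers = ("".join(p) for p in product(BASES, repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


def codon_counts(cds: str) -> Counter:
    """Count codons in frame 0; a trailing partial codon is dropped."""
    cds = _normalize(cds)
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))
```

Applying these functions to every genome (or every gene) gives feature vectors that could then be compared against the inferred tree, for example by checking whether compositionally similar sequences also cluster together phylogenetically.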

The current list of SARS-CoV-2 sequences in GenBank can be used for this purpose. If this is developed as a workflow, it can connect to the "main" public sequence resource deliverable/task, and possibly also to the biostatistics and Machine Learning ones.

As for the technical implementation, it would make sense to build this as a rerunnable workflow (e.g. in Snakemake or CWL), which would also connect it to the Workflows activity. As the available sequence data continue to grow, some of the analysis steps will become computationally expensive (for example, running BranchSiteREL or similar analyses). Hence, we should plan for scaling out to HPC or cloud infrastructure.
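
To illustrate what "rerunnable" could mean in practice, here is a minimal Python sketch of a two-step pipeline (alignment, then tree inference) that reruns a step only when its output is missing or older than its input, in the style of make or Snakemake. It is not a replacement for a real workflow engine. The helper names (`is_stale`, `plan`, `run`) are hypothetical, and the sketch assumes that `mafft` and `iqtree` are on the `PATH` (IQ-TREE 2 installs may name the binary `iqtree2`).

```python
import subprocess
from pathlib import Path


def is_stale(output: Path, inputs: list) -> bool:
    """True if output is missing or older than any input (make-style check)."""
    if not output.exists():
        return True
    out_mtime = output.stat().st_mtime
    return any(Path(p).stat().st_mtime > out_mtime for p in inputs)


def plan(fasta: Path, workdir: Path) -> list:
    """Describe the ordered steps without running anything."""
    aln = workdir / (fasta.stem + ".aln.fasta")
    # IQ-TREE writes the ML tree to <alignment>.treefile by default.
    tree = Path(str(aln) + ".treefile")
    return [
        {
            "name": "align",
            "inputs": [fasta],
            "output": aln,
            # MAFFT writes the alignment to stdout.
            "cmd": ["mafft", "--auto", str(fasta)],
            "capture_stdout": True,
        },
        {
            "name": "tree",
            "inputs": [aln],
            "output": tree,
            # -redo overwrites results from a previous finished run;
            # this step is only invoked when the tree is stale.
            "cmd": ["iqtree", "-s", str(aln), "-redo"],
            "capture_stdout": False,
        },
    ]


def run(fasta: Path, workdir: Path) -> list:
    """Run only the stale steps; return the names of the steps executed."""
    workdir.mkdir(parents=True, exist_ok=True)
    executed = []
    for step in plan(fasta, workdir):
        output = step["output"]
        if not is_stale(output, step["inputs"]):
            continue
        if step["capture_stdout"]:
            # Write to a temporary file first so a failed run does not
            # leave a partial output that looks up to date.
            tmp = output.with_name(output.name + ".tmp")
            with open(tmp, "w") as handle:
                subprocess.run(step["cmd"], stdout=handle, check=True)
            tmp.replace(output)
        else:
            subprocess.run(step["cmd"], check=True)
        executed.append(step["name"])
    return executed
```

Calling `run(Path("sequences.fasta"), Path("results"))` (both paths are placeholders) a second time without changing the input should execute nothing, while adding new sequences to the FASTA file triggers both steps again. Snakemake or CWL would provide the same behaviour plus cluster and cloud execution, which is what the scaling concern above calls for.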

Participants