Karin Lagesen edited this page Aug 17, 2017 · 11 revisions

Disclaimer

This is pre-publication software under active development. Use it at your own risk. Bug reports are welcome, but support cannot be guaranteed at this time.

The Bifrost genomic epidemiology pipeline

Pipeline for analyzing genomic read sets for public and animal health purposes.

Author: Karin Lagesen, @karinlag

Contact information: please submit an issue, and the author will get back to you.


Synopsis

This software uses the Nextflow.io workflow system to run analyses suited to genomic epidemiology and comparative microbiology. With Nextflow, the same pipeline can run on a local computer and on a cluster without any changes to the code.

Installation

For installation, see the installation pages. Please note: at the time of writing (August 2017), this software has only been tested on Ubuntu and on the University of Oslo Abel cluster (i.e. under Slurm).

How to run

For details on how to run, see the Run pages. The pipeline consists of a run script that can launch several different tracks. For each track, a Nextflow script, a template config file and a template profile file are provided. The profile file must be adjusted once for each compute system so that it points to the right locations for software, etc.; after that, it should not need further modification. The template config file should be modified for each run to specify the settings for that run, such as input data, species, databases needed, software options, etc.
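The split between a per-system profile file and a per-run config file can be sketched as follows. This is a minimal illustration only, not Bifrost's actual run script: the function name and all file names are hypothetical, and passing one `-c` option per file to `nextflow run` is an assumption that should be checked against the installed Nextflow version. The command is only assembled here, not executed.

```python
def build_nextflow_command(track_script, profile_file, run_config):
    """Assemble (but do not run) a Nextflow command line for one track.

    All three arguments are hypothetical file names; the real run script
    may combine the track script, profile and config differently.
    """
    return [
        "nextflow", "run", str(track_script),
        # Assumption: each -c adds one configuration file to the run.
        "-c", str(profile_file),  # per-system settings, adjusted once
        "-c", str(run_config),    # per-run settings, edited for every run
    ]


# Hypothetical file names, used for illustration only.
command = build_nextflow_command("track.nf", "system.config", "run.config")
print(" ".join(command))
```

Keeping the two files separate means that moving to a new compute system only touches the profile file, while starting a new analysis only touches the config file.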

Current capabilities

The pipeline has been developed as a series of tracks, where each track has a specific input and a set of logically connected analyses. Each track comes with its own Nextflow script and a separate config file, which is used to specify the inputs and software options for a specific run.

The current pipeline contains the following tracks:

  • Track One: Quality control
    • FastQC is run on all input files, followed by an analysis of the results to help pinpoint bad runs.
  • Track Two: Assembly
    • Sequences are trimmed with Trimmomatic before assembly with SPAdes. The user can modify most Trimmomatic and SPAdes options as they see fit. Once all assemblies are completed, QUAST is run to evaluate the results.
  • Track Three: MLST, virulence and AMR annotation
    • ARIBA is used to annotate MLST, virulence and AMR directly from reads. The user must specify the species and which AMR and virulence databases to use and, in the case of E. coli, which of the two available MLST schemes to use.
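The settings that Track Three requires can be checked before a run is started. The sketch below is an illustration under stated assumptions: the setting names (`species`, `amr_db`, `vir_db`, `mlst_scheme`) and the accepted spellings of E. coli are hypothetical and are not taken from the Bifrost config templates.

```python
def missing_track_three_settings(settings):
    """Return the names of required settings that are absent or empty.

    The key names are hypothetical placeholders, not Bifrost's own.
    """
    required = ["species", "amr_db", "vir_db"]
    species = (settings.get("species") or "").strip().lower()
    # Only E. coli has two MLST schemes, so only then must one be chosen.
    if species in {"e. coli", "escherichia coli"}:
        required.append("mlst_scheme")
    return [key for key in required if not settings.get(key)]


# Hypothetical run settings; the MLST scheme has been left out on purpose.
run_settings = {"species": "Escherichia coli", "amr_db": "card", "vir_db": "vfdb_core"}
print(missing_track_three_settings(run_settings))
```

A check like this reports every missing setting at once, rather than letting the run fail on the first one.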

Planned features

The following features are planned for future releases:

  • Species identification
  • SNP tree analyses, probably both with parsnp and kSNP
  • Pan-genome analysis, probably using Roary

For University of Oslo Abel users

This software is already available on the UiO Abel cluster. Please see the University of Oslo Abel pages for how to run the software on the cluster.

