Home
Thanks for checking out the bohra wiki! Here you will find information on installation options, pipeline options and setting up a run, along with some sample reports.
Bohra is a microbial genomics pipeline, designed predominantly for use in public health, but it may also be useful in research settings. At a minimum, the pipeline takes as input a tab-delimited file with the isolate IDs followed by the paths to READ1 and READ2, where the reads are Illumina reads (other platforms are not supported at this stage).
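As a rough illustration of the input file described above, the following Python sketch parses a three-column, tab-delimited table of isolate ID, READ1 path and READ2 path. It is not part of bohra: the `Isolate` type, the `parse_input_table` function, the example paths and the absence of a header row are all assumptions made for the example, so check the input documentation for the exact format bohra expects.

```python
import csv
import io
from typing import NamedTuple


class Isolate(NamedTuple):
    """One row of the input table (hypothetical helper type, not a bohra class)."""
    isolate_id: str
    read1: str
    read2: str


def parse_input_table(handle) -> list[Isolate]:
    """Parse tab-delimited rows of: isolate ID, path to READ1, path to READ2.

    Assumes there is no header row. Only the first three columns are used.
    """
    isolates = []
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or all(not cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) < 3:
            raise ValueError(
                f"line {line_no}: expected 3 tab-separated columns, got {len(row)}"
            )
        isolate_id, read1, read2 = (cell.strip() for cell in row[:3])
        isolates.append(Isolate(isolate_id, read1, read2))
    return isolates


# Example input with made-up isolate IDs and paths.
example = io.StringIO(
    "isolate_A\t/data/isolate_A_R1.fastq.gz\t/data/isolate_A_R2.fastq.gz\n"
    "isolate_B\t/data/isolate_B_R1.fastq.gz\t/data/isolate_B_R2.fastq.gz\n"
)
isolates = parse_input_table(example)
print(isolates[0].isolate_id, isolates[0].read1)
```

A check like this can catch rows with missing columns before a run is set up; it does not verify that the read files exist.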
Bohra was inspired by Nullarbor (https://github.com/tseemann/nullarbor) and is intended for use in public health microbiology labs for the analysis of short reads from microbiological samples. The workflow itself is written in Nextflow, with a 'runner' tool for setup written in Python.
Overall, the bohra workflow has 7 modules:
- Read assessment - assesses the input sequence data: number of reads, Q score, estimated genome size and coverage. This module also uses kraken2 to determine k-mer ID.
- Preview - this module uses mash and quicktree to provide a rapid overview of the sequences you provided.
- SNPs - uses the reads and reference provided to identify variants, calculate the core genome and determine pairwise SNPs.
- Phylogeny - uses the results of the SNPs module to build a maximum-likelihood (ML) tree using iqtree.
- Assemble - assembles the reads provided using shovill, spades or skesa, then annotates the assemblies and determines their quality metrics. If you provide pre-made assemblies, only annotation and quality assessment are performed.
- AMR and typing - this module uses assemblies to identify AMR determinants and to perform MLST and serotyping.
- Panaroo - this module uses the gff files created in module 5 (Assemble) to calculate the pangenome of the dataset. This is only performed as part of the full pipeline.
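The relationships between the modules listed above can be sketched as a small dependency graph. This is only an illustration in Python of the dependencies stated in the list (Phylogeny consumes SNPs output; AMR and typing and Panaroo consume Assemble output); it is not how the Nextflow workflow is implemented, and the `MODULE_INPUTS` mapping and `run_order` function are hypothetical names introduced for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each module maps to the modules whose output it consumes, as stated in the
# list above. Modules with no stated upstream module map to an empty set.
MODULE_INPUTS = {
    "Read assessment": set(),
    "Preview": set(),
    "SNPs": set(),
    "Phylogeny": {"SNPs"},
    "Assemble": set(),
    "AMR and typing": {"Assemble"},
    "Panaroo": {"Assemble"},
}


def run_order(modules):
    """Return one ordering in which every module follows its upstream modules."""
    return list(TopologicalSorter(modules).static_order())


order = run_order(MODULE_INPUTS)
print(order)
```

Any ordering returned places SNPs before Phylogeny and Assemble before both AMR and typing and Panaroo; the relative order of unrelated modules is not significant.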