Assembly, binning and annotation of metagenomes.
This pipeline assembles, bins and annotates metagenomes. It supports both short and long reads, trims low-quality bases and adapters from the reads with fastp (https://github.com/OpenGene/fastp) and Porechop (https://github.com/rrwick/Porechop), and performs basic QC with FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
The pipeline then:
- assigns taxonomy to reads using Centrifuge (https://ccb.jhu.edu/software/centrifuge/) and/or Kraken2 (https://ccb.jhu.edu/software/kraken2/)
- performs assembly using MEGAHIT (https://github.com/voutcn/megahit) and SPAdes (http://cab.spbu.ru/software/spades/), and checks the quality of the assemblies using QUAST (http://quast.sourceforge.net/quast)
- performs metagenome binning using MetaBAT2 (https://bitbucket.org/berkeleylab/metabat/src/master/), and checks the quality of the genome bins using BUSCO (https://busco.ezlab.org/)
Furthermore, the pipeline creates various reports in the specified results directory, including a MultiQC (https://multiqc.info/) report that summarizes some of the findings and the software versions used.
The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, which make installation trivial and results highly reproducible.
The nf-core/mag pipeline comes with documentation about the pipeline, found in the docs/ directory:
- Installation
- Pipeline configuration
- Running the pipeline
- Output and how to interpret the results
- Troubleshooting
This pipeline was written by Hadrien Gourlé at SLU and Daniel Straub (@d4straub).
Long read processing was inspired by caspargross/HybridAssembly, written by Caspar Gross (@caspargross).