grp-bork/gunc_workflow

GUNC workflow

Developed by the Bork Group in collaboration with nf-core
Raise an issue or contact us

See our other Software & Services
The development of this workflow was supported by NFDI4Microbiota.

Description

The GUNC workflow is a Nextflow workflow that detects chimerism and contamination in prokaryotic genomes caused by mis-binning of contigs from unrelated lineages. It is based on the CheckM and GUNC (Genome UNClutterer) tools. GUNC computes an entropy-based score from the taxonomic assignment and the contig location of every gene in a genome.
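
To give an intuition for what an entropy-based score over taxonomic assignments and contig locations can capture, here is a toy sketch. It is not GUNC's implementation: the function names, the input layout (one `(contig, taxon)` pair per gene), and the exact normalisation are assumptions made for illustration only. The score approaches 1 when each contig carries a single taxon but the genome as a whole mixes several, and 0 when taxa are spread evenly across contigs.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def toy_separation_score(genes):
    """Toy score for a genome given as (contig_id, taxon) pairs, one per gene.

    Returns 1 - H(taxon | contig) / H(taxon), or 0.0 when all genes
    share the same taxon. Hypothetical helper, not part of GUNC.
    """
    taxa = [taxon for _, taxon in genes]
    h_taxa = entropy(taxa)
    if h_taxa == 0:
        return 0.0
    by_contig = defaultdict(list)
    for contig, taxon in genes:
        by_contig[contig].append(taxon)
    # Conditional entropy: per-contig entropy weighted by gene share.
    h_cond = sum(len(t) / len(genes) * entropy(t) for t in by_contig.values())
    return 1 - h_cond / h_taxa


# Two contigs, each from a different lineage: taxa separate cleanly by contig.
chimeric = [("c1", "A")] * 4 + [("c2", "B")] * 4
# Both taxa appear on both contigs: contig identity says nothing about taxon.
mixed = [("c1", "A"), ("c1", "B"), ("c2", "A"), ("c2", "B")]

print(toy_separation_score(chimeric))
print(toy_separation_score(mixed))
```

In this toy setting, the first genome looks like two lineages binned together, while the second shows no contig-level separation of taxa.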

Citation

This workflow: DOI

Also cite:

Orakov A, Fullam A, Coelho LP, et al. GUNC: detection of chimerism and contamination in prokaryotic genomes. Genome Biol. 2021;22(1):178. Published 2021 Jun 13. doi:10.1186/s13059-021-02393-0
Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25(7):1043-1055. doi:10.1101/gr.186072.114
Ewels PA, Peltzer A, Fillinger S, et al. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020;38(3):276-278. doi:10.1038/s41587-020-0439-x

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.


Overview

  1. Run CheckM (CheckM)
  2. Run GUNC (GUNC)

Usage

Cloud-based Workflow Manager (CloWM)

This workflow will be available on the CloWM platform (coming soon).

Command-Line Interface (CLI)

You can run the pipeline using:

nextflow run gunc \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>
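
If you launch the pipeline from a script, the same command can be assembled programmatically. The sketch below only builds and prints the command; the helper name and the example values (`docker`, `results`) are assumptions, and only the options shown above are used.

```python
import shlex


def build_gunc_command(profile, samplesheet, outdir):
    """Assemble the nextflow invocation shown above as an argument list.

    Hypothetical helper: it does not run anything, it only returns argv.
    """
    return [
        "nextflow", "run", "gunc",
        "-profile", profile,
        "--input", samplesheet,
        "--outdir", outdir,
    ]


# Example values are placeholders; substitute your own profile and paths.
command = build_gunc_command("docker", "samplesheet.csv", "results")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once Nextflow is installed and on your `PATH`.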

Input files

The input is a CSV samplesheet describing your input data, formatted as follows:

samplesheet.csv:

id,group,assembler,fasta
test_minigut,0,MEGAHIT,https://github.com/nf-core/test-datasets/raw/mag/assemblies/MEGAHIT-test_minigut.contigs.fa.gz

Each row represents a metagenomic bin.
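
A samplesheet in this layout can be written and sanity-checked with a few lines of Python. This is a minimal sketch: the column names come from the example above, but the helper names and the assumption that `id` and `fasta` must be non-empty are illustrative and not taken from the pipeline's own validation.

```python
import csv
import io

# Column order taken from the example samplesheet above.
EXPECTED_COLUMNS = ["id", "group", "assembler", "fasta"]


def write_samplesheet(rows, handle):
    """Write one row per metagenomic bin to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=EXPECTED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def check_samplesheet(handle):
    """Return the parsed rows, raising ValueError on an unexpected layout."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = list(reader)
    for line_no, row in enumerate(rows, start=2):
        # Assumption: every bin needs at least an id and a FASTA location.
        missing = [col for col in ("id", "fasta") if not row[col]]
        if missing:
            raise ValueError(f"line {line_no}: missing {missing}")
    return rows


rows = [{
    "id": "test_minigut",
    "group": "0",
    "assembler": "MEGAHIT",
    "fasta": "https://github.com/nf-core/test-datasets/raw/mag/assemblies/MEGAHIT-test_minigut.contigs.fa.gz",
}]

buffer = io.StringIO()
write_samplesheet(rows, buffer)
buffer.seek(0)
parsed = check_samplesheet(buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, open the target with `open("samplesheet.csv", "w", newline="")` and pass that handle to `write_samplesheet`.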