Usage
nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --reads <path to reads>
Assembling a bacterial genome involves many steps. Donut Falls supports Nanopore sequencing of isolates with and without corresponding Illumina fastq files. For metagenomic samples, we recommend nf-core's mag pipeline. Our typical use case is sequencing isolates on a GridION and using MinKNOW for basecalling and fastq.gz file generation.
---
Basic nanopore workflow
---
flowchart LR
A[isolate bacteria] --> D[sequence]
D --> E[basecalling]
E --> F[Donut Falls]
F --> G[analysis]
Final results are placed in the directory named by params.outdir (default = donut_falls), which can be adjusted on the command line or in an input file.
Fastq files are the inputs for Donut Falls. Nanopore fastq files are required, and Illumina fastq files are optional (unless specifying hybrid assembly). Donut Falls should also work on fastq files downloaded from the SRA (see Test for more information).
The most straightforward way to get input files into Donut Falls is to put all the Nanopore sequencing files into the same directory, and then specify that directory on the command line.
nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --reads <directory of nanopore reads>
We tried matching Illumina reads to Nanopore reads in a variety of ways, but decided it was too difficult to maintain. Instead, a sample sheet that matches Nanopore reads with Illumina reads can be used as input. This is the only way to input Illumina fastq reads for polishing or hybrid assembly.
The sample file has two required columns and two optional columns:
- 'sample' designates the name used for the isolate that was sequenced
- 'fastq' designates the Nanopore fastq.gz file
- 'fastq_1' and 'fastq_2' are optional and designate the forward and reverse Illumina reads
A typical sample file with both Nanopore and Illumina reads
sample,fastq,fastq_1,fastq_2
test,nanopore.fastq.gz,illumina_1.fastq.gz,illumina_2.fastq.gz
An acceptable sample file for just Nanopore reads
sample,fastq
test,long_reads_low_depth.fastq.gz
An acceptable sample file where one sample does not have Illumina reads
sample,fastq,fastq_1,fastq_2
sample1,sample1.fastq.gz,sample1_R1.fastq.gz,sample1_R2.fastq.gz
sample2,sample2.fastq.gz,,
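The sample sheet layouts above can also be generated programmatically. The following is a minimal sketch, not part of Donut Falls: the helper name `write_sample_sheet` is hypothetical, and the check that `fastq_1` and `fastq_2` are given together is an assumption based on them being forward and reverse reads.

```python
import csv
import io

REQUIRED = ["sample", "fastq"]
OPTIONAL = ["fastq_1", "fastq_2"]


def write_sample_sheet(rows, handle):
    """Write rows (dicts) in the sample sheet layout described above.

    Hypothetical helper: Illumina columns are only written when at least
    one row uses them, and stay empty for Nanopore-only samples.
    """
    for row in rows:
        missing = [col for col in REQUIRED if not row.get(col)]
        if missing:
            raise ValueError(f"row {row} is missing required column(s): {missing}")
        # Assumption: Illumina reads are supplied as a forward/reverse pair.
        if bool(row.get("fastq_1")) != bool(row.get("fastq_2")):
            raise ValueError(f"row {row} needs both fastq_1 and fastq_2, or neither")

    has_illumina = any(row.get("fastq_1") for row in rows)
    columns = REQUIRED + (OPTIONAL if has_illumina else [])

    writer = csv.DictWriter(handle, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({col: row.get(col) or "" for col in columns})


buffer = io.StringIO()
write_sample_sheet(
    [
        {"sample": "sample1", "fastq": "sample1.fastq.gz",
         "fastq_1": "sample1_R1.fastq.gz", "fastq_2": "sample1_R2.fastq.gz"},
        {"sample": "sample2", "fastq": "sample2.fastq.gz"},
    ],
    buffer,
)
print(buffer.getvalue(), end="")
```

Writing to a file handle instead of `io.StringIO` (for example `open("SampleSheet.csv", "w", newline="")`) produces a file that can be passed with `--sample_sheet`.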
The default workflow should run without changes, but adjusting a few parameters can improve the results.
- params.medaka_options (default is '')
  - Medaka performs best when told which basecaller model was used. The model name generally has the format {pore}_{device}_{caller variant}_{caller version} and is specified with -m.
  - Example for data from MinION R9.4.1 flowcells using the fast Guppy basecaller version 3.0.3:
    params.medaka_options = '-m r941_min_fast_g303'
- params.filtlong_options (default is '--min_length 1000 --keep_percent 95')
  - Too much coverage can actually harm assembly, so it is better to subsample reads to 50-100X coverage.
  - Example for E. coli with an estimated genome size of 5M: at 50X coverage, the desired number of bases is about 250M (5,000,000 x 50):
    params.filtlong_options = '--min_length 1000 --target_bases 250000000'
- params.remove (default is remove.txt)
  - This is only used with Trycycler. This csv file specifies which sequences to remove from a cluster during the reconcile step. The format of this file is 'sample,cluster,sequence-to-remove.fasta'.
  - Example:
    params.remove = 'remove.csv'
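The arithmetic behind the filtlong and Medaka examples above can be sketched as follows. The function names are hypothetical helpers for illustration and are not part of Donut Falls, Filtlong, or Medaka; they only assemble the option strings shown in the examples.

```python
def target_bases(genome_size_bp, coverage):
    """Bases to keep so the reads cover the genome at the requested depth."""
    return int(genome_size_bp * coverage)


def filtlong_options(genome_size_bp, coverage=50, min_length=1000):
    """Build a value for params.filtlong_options (hypothetical helper)."""
    bases = target_bases(genome_size_bp, coverage)
    return f"--min_length {min_length} --target_bases {bases}"


def medaka_model(pore, device, caller_variant, caller_version):
    """Join the four parts of a Medaka model name with underscores."""
    return "_".join([pore, device, caller_variant, caller_version])


# E. coli, roughly 5 Mb genome, subsampled to 50X
print(filtlong_options(5_000_000))
# MinION R9.4.1 flowcell, fast Guppy basecaller version 3.0.3
print("-m " + medaka_model("r941", "min", "fast", "g303"))
```

Raising `coverage` to 100 gives the upper end of the recommended range (500,000,000 bases for the same genome).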
Donut Falls currently offers several assembly options, selected with 'params.assembler'.
De novo assembly of nanopore reads (with or without polishing):
- flye (default)
- miniasm
- raven
- lr_unicycler (yes, unicycler has a long-read only mode)
Hybrid assembly (requires Illumina reads):
- unicycler
- masurca
Donut Falls has two profiles for "easy" command line container management.
- docker : uses Docker to manage containers in the workflow
docker.enabled = true
docker.runOptions = "-u \$(id -u):\$(id -g)"
- singularity : uses Singularity to manage containers in the workflow
singularity.enabled = true
singularity.autoMounts = true
Config files are a reproducible way to ensure that the same parameters are shared each time a workflow is run. It is common to specify paths to databases and solidify parameter values in config files.
To get a copy of a template config file with every editable parameter, run the following command:
nextflow run UPHL-BioNGS/Donut_Falls --config_file true
This will create a config file named edit_me.config in the current directory. This file can be renamed and edited without altering the original workflow. The parameters (also known as params) in this file are all preceded by //, which indicates that they are not in use. The // must be removed for that line to be taken into consideration by the workflow.
To use this config file at runtime, specify it with -c on the command line.
nextflow run UPHL-BioNGS/Donut_Falls -c edit_me.config
This master config file can also be found at Donut_Falls/configs/donut_falls_template.config.
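Removing the // markers can be done in any text editor. As an illustration only, the sketch below uncomments selected parameters in a config text. It assumes the template contains lines of the form `//params.name = value`; the actual layout of edit_me.config may differ, and `enable_params` is a hypothetical helper.

```python
import re

# Assumed line layout: optional indent, '//', then "params.<name> = <value>"
COMMENTED_PARAM = re.compile(r"^(\s*)//\s*(params\.(\w+)\s*=.*)$")


def enable_params(config_text, names):
    """Return config_text with the '//' removed from the named params."""
    wanted = set(names)
    out = []
    for line in config_text.splitlines():
        match = COMMENTED_PARAM.match(line)
        if match and match.group(3) in wanted:
            # Keep the indentation, drop the comment marker.
            line = match.group(1) + match.group(2)
        out.append(line)
    return "\n".join(out) + "\n"


template = """\
//params.outdir = 'donut_falls'
//params.assembler = 'flye'
//params.reads = ''
"""
print(enable_params(template, ["assembler"]), end="")
```

Only the named parameters are activated; every other line, including other commented params, is left untouched.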
// input summary file from nanopore sequencing run
params.sequencing_summary = workflow.launchDir + "/*sequencing_summary*txt"
// sample sheet with information about samples and their corresponding files
params.sample_sheet = ''
// specifies which subworkflow to use
params.assembler = 'flye' // or 'miniasm' or 'lr_unicycler' or 'raven' or 'unicycler' or 'masurca' or 'trycycler'
// where the results are saved
params.outdir = 'donut_falls'
// for Trycycler reconcile
params.remove = 'remove.txt'
// directory of nanopore fastq files for input
params.reads = ''
nextflow run UPHL-BioNGS/Donut_Falls -profile singularity,test
nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --reads reads --assembler flye
nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sample_sheet SampleSheet.csv --assembler miniasm
Hybrid assembly with unicycler using docker to manage containers and a sample sheet named 'SampleSheet.csv'
nextflow run UPHL-BioNGS/Donut_Falls -profile docker --sample_sheet SampleSheet.csv --assembler unicycler
The config file
docker.enabled = true
docker.runOptions = "-u \$(id -u):\$(id -g)"
params.assembler = 'flye'
params.flye_options = '--meta'
params.sample_sheet = 'SampleSheet.csv'
The command line
nextflow run UPHL-BioNGS/Donut_Falls -c config.config
Trycycler is not an assembler, but is still designated by the 'assembler' parameter. Trycycler is a useful tool that reconciles assemblies generated by other assemblers into a consensus sequence, but it has manual steps. Please see the Trycycler subworkflow wiki page for more information.
nextflow run UPHL-BioNGS/Donut_Falls -profile docker --sample_sheet SampleSheet.csv --assembler trycycler --remove outliers.txt -resume
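The file passed with --remove follows the 'sample,cluster,sequence-to-remove.fasta' format described for params.remove. A minimal sketch of writing such a file is below. The helper name, the cluster name, and the fasta name are all hypothetical, and whether a header line is expected is not stated here, so this sketch writes none.

```python
import csv
import io


def write_remove_file(entries, handle):
    """Write (sample, cluster, fasta) tuples, one comma-separated line each.

    Assumption: no header row, one line per sequence to remove.
    """
    writer = csv.writer(handle, lineterminator="\n")
    for sample, cluster, fasta in entries:
        writer.writerow([sample, cluster, fasta])


buffer = io.StringIO()
write_remove_file(
    [
        # Hypothetical names, for illustration only
        ("sample1", "cluster_001", "contig_to_drop.fasta"),
    ],
    buffer,
)
print(buffer.getvalue(), end="")
```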