Young edited this page Apr 20, 2023 · 9 revisions

Usage

nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --reads <path to reads>

Assembling a bacterial genome involves many steps. Donut Falls supports Nanopore sequencing of isolates with and without corresponding Illumina fastq files. For metagenomic samples, we recommend nf-core/mag. Our typical use case is sequencing isolates on a GridION and using MinKNOW for basecalling and fastq.gz file generation.

---
Basic nanopore workflow
---
flowchart LR

A[isolate bacteria] --> D[sequence]
D --> E[basecalling]
E --> F[Donut Falls]
F --> G[analysis]

Final results are placed in the directory given by params.outdir (default = donut_falls), which can be set on the command line or in a config file.

Prepare input files

Fastq files are the inputs for Donut Falls. Nanopore fastq files are required, and Illumina fastq files are optional (unless specifying hybrid assembly). Donut Falls should also work on fastq files downloaded from the SRA (see Test for more information).

Reading in fastq files from a directory

The most straightforward way to get input files into Donut Falls is to put all the Nanopore sequencing files into the same directory, and then specify that directory on the command line.

nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --reads <directory of nanopore reads>

Using a sample sheet

We tried matching Illumina reads to Nanopore reads automatically in several ways, but this proved too difficult to maintain. Instead, a sample sheet that matches Nanopore reads with Illumina reads can be used as input. This is the only way to input Illumina fastq reads for polishing or hybrid assembly.

The sample sheet has two required columns and two optional columns:

  • 'sample' designates the name used for the isolate that was sequenced
  • 'fastq' designates the Nanopore fastq.gz file
  • 'fastq_1' and 'fastq_2' are optional and designate the forward and reverse Illumina reads

A typical sample file with both Nanopore and Illumina reads

sample,fastq,fastq_1,fastq_2
test,nanopore.fastq.gz,illumina_1.fastq.gz,illumina_2.fastq.gz

An acceptable sample file for just Nanopore reads

sample,fastq
test,long_reads_low_depth.fastq.gz

An acceptable sample file where one sample does not have Illumina reads

sample,fastq,fastq_1,fastq_2
sample1,sample1.fastq.gz,sample1_R1.fastq.gz,sample1_R2.fastq.gz
sample2,sample2.fastq.gz,,
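
The column rules above can be checked before launching a run. The following is a minimal sketch, not part of Donut Falls: the function name `check_sample_sheet` is hypothetical, and the rule that 'fastq_1' and 'fastq_2' must be given together is an assumption based on the examples shown here.

```python
import csv
import io

REQUIRED = ("sample", "fastq")       # required columns named in this page
ILLUMINA = ("fastq_1", "fastq_2")    # optional Illumina columns


def check_sample_sheet(text):
    """Return a list of problems found in sample sheet text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = [f"missing required column '{c}'" for c in REQUIRED if c not in header]
    if problems:
        return problems
    for row in reader:
        line_no = reader.line_num
        for col in REQUIRED:
            if not (row.get(col) or "").strip():
                problems.append(f"line {line_no}: empty '{col}'")
        # Assumption: Illumina reads are paired, so both files or neither.
        r1, r2 = ((row.get(c) or "").strip() for c in ILLUMINA)
        if bool(r1) != bool(r2):
            problems.append(f"line {line_no}: only one Illumina read file given")
    return problems


sheet = """sample,fastq,fastq_1,fastq_2
sample1,sample1.fastq.gz,sample1_R1.fastq.gz,sample1_R2.fastq.gz
sample2,sample2.fastq.gz,,
"""
print(check_sample_sheet(sheet))  # prints []
```

The check only inspects the text of the sheet; it does not confirm that the listed files exist.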

Recommended parameters to adjust

The default workflow should run without changes, but adjusting the following parameters can improve results.

  • params.medaka_options (default is '')
    • Medaka performs best when told which model matches the basecaller that was used. The model name generally has the format {pore}_{device}_{caller variant}_{caller version} and is specified with -m.
    • Example for data from MinION R9.4.1 flow cells using the fast Guppy basecaller version 3.0.3: params.medaka_options = '-m r941_min_fast_g303'
  • params.filtlong_options (default is '--min_length 1000 --keep_percent 95')
    • Too much coverage can actually harm assembly, so it is better to subsample reads to 50-100X coverage.
    • Example for E. coli with an estimated genome size of 5 Mb: at 50X coverage, the desired number of bases is about 250M, so params.filtlong_options = '--target_bases 250000000'
  • params.remove (default is remove.txt)
    • This is only used with Trycycler. This csv file specifies which sequences to remove from a cluster during the reconcile step. The format of this file is 'sample,cluster,sequence-to-remove.fasta'.
    • Example: params.remove = 'remove.csv'
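
The arithmetic behind the filtlong example and the Medaka model naming pattern can be sketched as follows. This is illustrative only: the helper names are hypothetical, and the default of 50X is simply the lower end of the 50-100X range recommended above.

```python
def target_bases(genome_size_bp, coverage=50):
    """Number of bases to keep: estimated genome size times desired coverage."""
    return int(genome_size_bp * coverage)


def filtlong_options(genome_size_bp, coverage=50):
    """Build a value for params.filtlong_options using --target_bases."""
    return f"--target_bases {target_bases(genome_size_bp, coverage)}"


def medaka_options(pore, device, variant, version):
    """Build a value for params.medaka_options from the model name parts."""
    return f"-m {pore}_{device}_{variant}_{version}"


# E. coli, about 5 Mb, subsampled to 50X
print(filtlong_options(5_000_000))                    # --target_bases 250000000
# MinION R9.4.1, fast Guppy 3.0.3
print(medaka_options("r941", "min", "fast", "g303"))  # -m r941_min_fast_g303
```

Setting params.filtlong_options replaces the default string, so any default flags that should still apply need to be included in the new value.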

Switching assemblers

Several options are currently available for Donut Falls, selected with 'params.assembler'.

De novo assembly of nanopore reads (with or without polishing):

  • flye
  • miniasm (with minipolish)
  • raven
  • lr_unicycler

Hybrid assembly (requires Illumina reads):

  • unicycler
  • masurca

Choosing a profile

Donut Falls has two profiles for "easy" command line container management.

  • docker : uses Docker to manage containers in the workflow
docker.enabled = true
docker.runOptions = "-u \$(id -u):\$(id -g)"
  • singularity : uses Singularity to manage containers in the workflow
singularity.enabled = true
singularity.autoMounts = true

Using a config file

Config files are a reproducible way to ensure that the same parameters are shared each time a workflow is run. It is common to specify paths to databases and solidify parameter values in config files.

To get a copy of a template config file with every editable parameter, run the following command

nextflow run UPHL-BioNGS/Donut_Falls --config_file true

This will create a config file named edit_me.config in the current directory. This file can be renamed and edited without altering the original workflow. The parameters (also known as params) in this file are all preceded by //, which indicates that they are not in use. The // must be removed for a line to be used by the workflow.

To use this config file during runtime, simply specify the config file with -c on the command line.

nextflow run UPHL-BioNGS/Donut_Falls -c edit_me.config
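
Removing the // by hand is easy for one or two params. For scripted setups, the edit could be automated along these lines. This is a sketch under assumptions: the helper `enable_param` is hypothetical, and the template is assumed to contain lines of the form `//params.name = value`, which may not match the real file exactly.

```python
import re


def enable_param(config_text, name, value):
    """Uncomment the first '//params.<name> = ...' line and set its value."""
    pattern = re.compile(
        rf"^[ \t]*//[ \t]*params\.{re.escape(name)}[ \t]*=.*$", re.MULTILINE
    )
    replacement = f"params.{name} = {value!r}"
    # A function replacement avoids backslash interpretation in the new value.
    new_text, count = pattern.subn(lambda match: replacement, config_text, count=1)
    if count == 0:
        raise KeyError(f"params.{name} not found as a commented line")
    return new_text


# Assumed shape of two lines from the template
template = "//params.assembler = 'flye'\n//params.outdir = 'donut_falls'\n"
print(enable_param(template, "assembler", "raven"), end="")
```

Only string values are handled here; the other commented lines are left untouched.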

This master config file can also be found at Donut_Falls/configs/donut_falls_template.config.

Relevant parameters (params) including external files and directories and outputs

// input summary file from nanopore sequencing run
params.sequencing_summary          = workflow.launchDir + "/*sequencing_summary*txt"
// sample sheet with information about samples and their corresponding files
params.sample_sheet                = ''
// specifies which subworkflow to use
params.assembler                   = 'flye' // or 'miniasm' or 'lr_unicycler' or 'raven' or 'unicycler' or 'masurca' or 'trycycler'
// where the results are saved
params.outdir                      = 'donut_falls'
// for Trycycler reconcile
params.remove                      = 'remove.txt'
// directory of nanopore fastq files for input
params.reads                       = ''

Examples

Running a test profile (there are several: test, test1, test2, etc.)

nextflow run UPHL-BioNGS/Donut_Falls -profile singularity,test

Default usage: flye assembly with fastq files in the directory 'reads'

nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --reads reads --assembler flye

Assembly with miniasm and minipolish using a sample sheet named 'SampleSheet.csv'

nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sample_sheet SampleSheet.csv --assembler miniasm

Hybrid assembly with unicycler using docker to manage containers and a sample sheet named 'SampleSheet.csv'

nextflow run UPHL-BioNGS/Donut_Falls -profile docker --sample_sheet SampleSheet.csv --assembler unicycler
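
When runs are launched from a script, the command lines above can be assembled programmatically. The sketch below uses only the flags shown on this page. The function `donut_falls_command` is hypothetical, and treating 'unicycler' and 'masurca' as the assemblers that need a sample sheet is an assumption based on the hybrid assembly requirement for Illumina reads.

```python
import shlex

# Assumption: these assemblers need Illumina reads, which can only be
# supplied through a sample sheet.
HYBRID_ASSEMBLERS = {"unicycler", "masurca"}


def donut_falls_command(assembler="flye", profile="singularity",
                        reads=None, sample_sheet=None):
    """Return the nextflow command as a list of arguments."""
    if reads is None and sample_sheet is None:
        raise ValueError("provide a reads directory or a sample sheet")
    if assembler in HYBRID_ASSEMBLERS and sample_sheet is None:
        raise ValueError(f"{assembler} needs a sample sheet with Illumina reads")
    argv = ["nextflow", "run", "UPHL-BioNGS/Donut_Falls", "-profile", profile]
    if reads is not None:
        argv += ["--reads", reads]
    if sample_sheet is not None:
        argv += ["--sample_sheet", sample_sheet]
    argv += ["--assembler", assembler]
    return argv


cmd = donut_falls_command("unicycler", "docker", sample_sheet="SampleSheet.csv")
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with paths that contain spaces.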

Using a config file to set all params, including container management and sample sheet

The config file

docker.enabled = true
docker.runOptions = "-u \$(id -u):\$(id -g)"
params.assembler = 'flye'
params.flye_options = '--meta'
params.sample_sheet = 'SampleSheet.csv'

The command line

nextflow run UPHL-BioNGS/Donut_Falls -c config.config

Trycycler

Trycycler is not an assembler, but it is still selected with the 'assembler' parameter. Trycycler reconciles consensus sequences generated by other assemblers, but it has manual steps. Please see the Trycycler subworkflow wiki page for more information.

nextflow run UPHL-BioNGS/Donut_Falls -profile docker --sample_sheet SampleSheet.csv --assembler trycycler --remove outliers.txt -resume