
Commit

Merge pull request #240 from AntoniaSchuster/dev
Add Prodigal
AntoniaSchuster authored Oct 29, 2021
2 parents d63ce61 + 9b7ad0c commit bf83c44
Showing 10 changed files with 211 additions and 0 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

- [#240](https://github.com/nf-core/mag/pull/240) - Add Prodigal to predict protein-coding genes for assemblies

### `Changed`

### `Fixed`
3 changes: 3 additions & 0 deletions CITATIONS.md
@@ -55,6 +55,9 @@
* [Porechop](https://github.com/rrwick/Porechop)

* [Prodigal](https://pubmed.ncbi.nlm.nih.gov/20211023/)
> Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010 Mar 8;11:119. doi: 10.1186/1471-2105-11-119. PMID: 20211023; PMCID: PMC2848648.
* [SAMtools](https://doi.org/10.1093/bioinformatics/btp352)
> Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., … 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics , 25(16), 2078–2079. doi: 10.1093/bioinformatics/btp352.
1 change: 1 addition & 0 deletions README.md
@@ -33,6 +33,7 @@ The pipeline then:

* assigns taxonomy to reads using [Centrifuge](https://ccb.jhu.edu/software/centrifuge/) and/or [Kraken2](https://github.com/DerrickWood/kraken2/wiki)
* performs assembly using [MEGAHIT](https://github.com/voutcn/megahit) and [SPAdes](http://cab.spbu.ru/software/spades/), and checks their quality using [Quast](http://quast.sourceforge.net/quast)
* predicts protein-coding genes for the assemblies using [Prodigal](https://github.com/hyattpd/Prodigal)
* performs metagenome binning using [MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/), and checks the quality of the genome bins using [Busco](https://busco.ezlab.org/)
* assigns taxonomy to bins using [GTDB-Tk](https://github.com/Ecogenomics/GTDBTk) and/or [CAT](https://github.com/dutilh/CAT)

6 changes: 6 additions & 0 deletions conf/modules.config
@@ -161,5 +161,11 @@ params {
        'multiqc' {
            args = ""
        }
        prodigal {
            args = "-p meta"
            publish_dir = "Prodigal"
            output_format = "gff"
            publish_by_meta = ['assembler', 'id']
        }
    }
}
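
The `prodigal` block above is plain Nextflow configuration, so it can in principle be overridden from a user-supplied config. The fragment below is a minimal sketch of that idea, not part of this commit: the file name `custom.config` is hypothetical, and it assumes the file is passed with `-c custom.config` and that nested `params.modules` entries are merged with the pipeline defaults. It switches the annotation format to GenBank, which is one of the formats accepted by Prodigal's `-f` option.

```groovy
// custom.config (hypothetical file name), assumed to be passed with `-c custom.config`
params {
    modules {
        prodigal {
            args            = "-p meta"               // unchanged from this commit
            publish_dir     = "Prodigal"              // unchanged from this commit
            output_format   = "gbk"                   // assumption: GenBank instead of the "gff" set in this commit
            publish_by_meta = ['assembler', 'id']     // unchanged from this commit
        }
    }
}
```

With such an override, the annotation file emitted by the module would be named `[prefix].gbk` instead of `[prefix].gff`, because the module builds the file name from `output_format`.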
16 changes: 16 additions & 0 deletions docs/output.md
@@ -13,6 +13,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
* [Quality control](#quality-control) of input reads - trimming and contaminant removal
* [Taxonomic classification of trimmed reads](#taxonomic-classification-of-trimmed-reads)
* [Assembly](#assembly) of trimmed reads
* [Protein-coding gene prediction](#gene-prediction) of assemblies
* [Binning](#binning) of assembled contigs
* [Taxonomic classification of binned genomes](#taxonomic-classification-of-binned-genomes)
* [Additional summary for binned genomes](#additional-summary-for-binned-genomes)
@@ -214,6 +215,21 @@ SPAdesHybrid is a part of the [SPAdes](http://cab.spbu.ru/software/spades/) soft

</details>

## Gene prediction

Protein-coding genes are predicted for each assembly.

<details markdown="1">
<summary>Output files</summary>

* `Prodigal/`
    * `[sample/group].gff`: Gene coordinates in GFF format.
    * `[sample/group].faa`: Protein translations of all predicted genes from all sequences, in multi-FASTA format.
    * `[sample/group].fna`: Nucleotide sequences of the predicted genes, written in the DNA alphabet rather than mRNA (so you will see 'T' in the output and not 'U').
    * `[sample/group]_all.txt`: All potential genes with their scores, including information about start positions.

</details>
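
As a quick sanity check on the GFF output, the predicted genes can be counted per contig. The snippet below is a minimal Groovy sketch under stated assumptions: the function name `countCdsPerContig` is hypothetical, the example records are made up, and it assumes that each predicted gene appears as a tab-separated `CDS` feature line.

```groovy
// Count CDS features per contig from the lines of a GFF file.
// `countCdsPerContig` is a hypothetical helper, not part of the pipeline.
def countCdsPerContig(List<String> gffLines) {
    def counts = [:]
    gffLines.each { line ->
        // Skip comment/header lines and blank lines
        if (line.startsWith('#') || !line.trim()) return
        def fields = line.split('\t')
        // GFF columns: seqid, source, type, start, end, score, strand, phase, attributes
        if (fields.size() >= 9 && fields[2] == 'CDS') {
            counts[fields[0]] = (counts[fields[0]] ?: 0) + 1
        }
    }
    return counts
}

// Made-up records for illustration; for real output read the file instead,
// e.g. new File('sample1.gff').readLines()
def example = [
    '##gff-version  3',
    'contig_1\tProdigal_v2.6.3\tCDS\t3\t98\t5.5\t+\t0\tID=1_1;partial=10',
    'contig_1\tProdigal_v2.6.3\tCDS\t150\t620\t40.2\t-\t0\tID=1_2;partial=00',
    'contig_2\tProdigal_v2.6.3\tCDS\t10\t400\t33.1\t+\t0\tID=2_1;partial=00'
]
println countCdsPerContig(example)
```

Comparing these counts between assemblers or samples gives a rough impression of how much coding sequence each assembly recovered.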

## Binning

### Contig sequencing depth
3 changes: 3 additions & 0 deletions modules.json
@@ -8,6 +8,9 @@
            },
            "fastqc": {
                "git_sha": "e937c7950af70930d1f34bb961403d9d2aa81c7d"
            },
            "prodigal": {
                "git_sha": "49da8642876ae4d91128168cd0db4f1c858d7792"
            }
        }
    }
}
78 changes: 78 additions & 0 deletions modules/nf-core/modules/prodigal/functions.nf
@@ -0,0 +1,78 @@
//
// Utility functions used in nf-core DSL2 module files
//

//
// Extract name of software tool from process name using $task.process
//
def getSoftwareName(task_process) {
    return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()
}

//
// Extract name of module from process name using $task.process
//
def getProcessName(task_process) {
    return task_process.tokenize(':')[-1]
}

//
// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules
//
def initOptions(Map args) {
    def Map options = [:]
    options.args = args.args ?: ''
    options.args2 = args.args2 ?: ''
    options.args3 = args.args3 ?: ''
    options.publish_by_meta = args.publish_by_meta ?: []
    options.publish_dir = args.publish_dir ?: ''
    options.publish_files = args.publish_files
    options.suffix = args.suffix ?: ''
    return options
}

//
// Tidy up and join elements of a list to return a path string
//
def getPathFromList(path_list) {
    def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries
    paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and leading/trailing slashes
    return paths.join('/')
}

//
// Function to save/publish module results
//
def saveFiles(Map args) {
    def ioptions = initOptions(args.options)
    def path_list = [ ioptions.publish_dir ?: args.publish_dir ]

    // Do not publish versions.yml unless running from pytest workflow
    if (args.filename.equals('versions.yml') && !System.getenv("NF_CORE_MODULES_TEST")) {
        return null
    }
    if (ioptions.publish_by_meta) {
        def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta
        for (key in key_list) {
            if (args.meta && key instanceof String) {
                def path = key
                if (args.meta.containsKey(key)) {
                    path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key]
                }
                path = path instanceof String ? path : ''
                path_list.add(path)
            }
        }
    }
    if (ioptions.publish_files instanceof Map) {
        for (ext in ioptions.publish_files) {
            if (args.filename.endsWith(ext.key)) {
                def ext_list = path_list.collect()
                ext_list.add(ext.value)
                return "${getPathFromList(ext_list)}/$args.filename"
            }
        }
    } else if (ioptions.publish_files == null) {
        return "${getPathFromList(path_list)}/$args.filename"
    }
}
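
To make the effect of `publish_by_meta` easier to follow, the sketch below condenses the path-building part of `saveFiles` into one function. It is a simplified illustration, not the module code: `publishPath` is a hypothetical name, the `publish_files` handling and the `versions.yml` check are left out, and the example `meta` values are assumptions.

```groovy
// Hypothetical, simplified stand-in for the path-building logic of saveFiles():
// joins a publish directory, one path element per meta key, and the file name.
def publishPath(String publishDir, List keys, Map meta, String filename) {
    def parts = [publishDir]
    keys.each { key ->
        // Fall back to the key name itself when the meta map lacks the key, as saveFiles() does
        def value = meta.containsKey(key) ? meta[key] : key
        // Boolean meta values become "<key>_<value>", e.g. "single_end_true"
        if (value instanceof Boolean) value = "${key}_${value}".toString()
        // Anything that is not a plain String is dropped, mirroring saveFiles()
        parts << (value instanceof String ? value : '')
    }
    // Drop empty entries and strip leading/trailing slashes before joining
    def cleaned = parts.findAll { it?.trim() }.collect { it.trim().replaceAll('^/+|/+$', '') }
    return "${cleaned.join('/')}/${filename}".toString()
}

// Example values are assumptions for illustration only
def meta = [id: 'sample1', assembler: 'MEGAHIT']
println publishPath('Prodigal', ['assembler', 'id'], meta, 'sample1.gff')
```

Under these assumptions, a `publish_dir` of `Prodigal` combined with `publish_by_meta = ['assembler', 'id']` places each file in a subdirectory per assembler and per sample or group.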
48 changes: 48 additions & 0 deletions modules/nf-core/modules/prodigal/main.nf
@@ -0,0 +1,48 @@
// Import generic module functions
include { initOptions; saveFiles; getSoftwareName; getProcessName } from './functions'

params.options = [:]
options = initOptions(params.options)

process PRODIGAL {
    tag "$meta.id"
    label 'process_low'
    publishDir "${params.outdir}",
        mode: params.publish_dir_mode,
        saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) }

    conda (params.enable_conda ? "bioconda::prodigal=2.6.3" : null)
    if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
        container "https://depot.galaxyproject.org/singularity/prodigal:2.6.3--h516909a_2"
    } else {
        container "quay.io/biocontainers/prodigal:2.6.3--h516909a_2"
    }

    input:
    tuple val(meta), path(genome)
    val(output_format)

    output:
    tuple val(meta), path("${prefix}.${output_format}"), emit: gene_annotations
    tuple val(meta), path("${prefix}.fna"), emit: nucleotide_fasta
    tuple val(meta), path("${prefix}.faa"), emit: amino_acid_fasta
    tuple val(meta), path("${prefix}_all.txt"), emit: all_gene_annotations
    path "versions.yml" , emit: versions

    script:
    prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}"
    """
    prodigal -i "${genome}" \\
        $options.args \\
        -f $output_format \\
        -d "${prefix}.fna" \\
        -o "${prefix}.${output_format}" \\
        -a "${prefix}.faa" \\
        -s "${prefix}_all.txt"

    cat <<-END_VERSIONS > versions.yml
    ${getProcessName(task.process)}:
        ${getSoftwareName(task.process)}: \$(prodigal -v 2>&1 | sed -n 's/Prodigal V\\(.*\\):.*/\\1/p')
    END_VERSIONS
    """
}
41 changes: 41 additions & 0 deletions modules/nf-core/modules/prodigal/meta.yml
@@ -0,0 +1,41 @@
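name: prodigal
description: Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program
keywords:
  - gene finding
tools:
  - prodigal:
      description: Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program
      homepage: {}
      documentation: {}
      tool_dev_url: {}
      doi: ""
      licence: ["GPL v3"]

input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - genome:
      type: file
      description: Nucleotide sequences to predict genes for, e.g. an assembly
  - output_format:
      type: string
      description: Output format for the gene annotations, passed to Prodigal's `-f` option (e.g. "gff")

output:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - gene_annotations:
      type: file
      description: Gene annotations in the requested output format
  - nucleotide_fasta:
      type: file
      description: Nucleotide sequences of the predicted genes
      pattern: "*.fna"
  - amino_acid_fasta:
      type: file
      description: Protein translations of the predicted genes
      pattern: "*.faa"
  - all_gene_annotations:
      type: file
      description: All potential genes with scores
      pattern: "*_all.txt"
  - versions:
      type: file
      description: File containing software versions
      pattern: "versions.yml"

authors:
  - "@grst"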
13 changes: 13 additions & 0 deletions workflows/mag.nf
@@ -107,6 +107,7 @@ include { GTDBTK } from '../subworkflows/local/gtdbtk'
include { FASTQC as FASTQC_RAW } from '../modules/nf-core/modules/fastqc/main' addParams( options: modules['fastqc_raw'] )
include { FASTQC as FASTQC_TRIMMED } from '../modules/nf-core/modules/fastqc/main' addParams( options: modules['fastqc_trimmed'] )
include { FASTP } from '../modules/nf-core/modules/fastp/main' addParams( options: modules['fastp'] )
include { PRODIGAL } from '../modules/nf-core/modules/prodigal/main' addParams( options: modules['prodigal'] )

////////////////////////////////////////////////////
/* -- Create channel for reference databases -- */
@@ -466,6 +467,18 @@ workflow MAG {
        ch_software_versions = ch_software_versions.mix(QUAST.out.version.first().ifEmpty(null))
    }

    /*
    ================================================================================
    Predict proteins
    ================================================================================
    */

    PRODIGAL (
        ch_assemblies,
        modules['prodigal']['output_format']
    )
    ch_software_versions = ch_software_versions.mix(PRODIGAL.out.versions.first().ifEmpty(null))

    /*
    ================================================================================
    Binning
