Merge pull request #7 from nf-core/dev
Dev
d4straub authored Jul 14, 2020
2 parents c1d6dff + 8c1c101 commit b08d336
Showing 4 changed files with 55 additions and 19 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,8 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- Add host read removal with Bowtie 2 and a corresponding custom section to MultiQC
- Add separate MultiQC section for FastQC after preprocessing
- Add social preview image
- Export depth.txt.gz into result folder
- Compress assembly files
- Add MetaBAT2 RNG seed parameter `--metabat_rng_seed` and set the default to 1 which ensures reproducible binning results
- Add parameters `--megahit_fix_cpu_1`, `--spades_fix_cpus` and `--spadeshybrid_fix_cpus` to ensure reproducible results from assembly tools

@@ -25,6 +27,9 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- Fixed channel joining for multiple samples causing MetaBAT2 error [#32](https://github.com/nf-core/mag/issues/32)
- Update MetaBAT2 from v2.13 to v2.15
- Fix number of threads used by MetaBAT2 program `jgi_summarize_bam_contig_depths`
- Fix SPAdes memory conversion issue [#70](https://github.com/nf-core/mag/pull/70)
- No more ignoring errors in SPAdes assembly
- No more ignoring of BUSCO errors
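
The SPAdes memory conversion fix listed above replaces a regex-based string cleanup with `task.memory.toGiga()` in `main.nf`. A minimal Python sketch of why the old approach is fragile follows. It assumes the memory value renders as text such as `64 GB` or `1 TB`. The helper names `old_conversion` and `to_giga` are hypothetical stand-ins for illustration, not Nextflow API.

```python
import re


def old_conversion(memory_text):
    """Mimic the old approach: strip whitespace and the letters G and B."""
    return re.sub(r"[\sGB]", "", memory_text)


def to_giga(num_bytes):
    """Stand-in for a unit-aware conversion: whole gibibytes from a byte count."""
    return num_bytes // 2**30


# Works only while the value happens to be expressed in GB.
print(old_conversion("64 GB"))  # 64
# Any other unit leaves a stray letter that is not a valid number.
print(old_conversion("1 TB"))   # 1T
# A numeric conversion does not depend on how the value is printed.
print(to_giga(2**40))           # 1024
```

The numeric conversion always yields a plain integer, which is why the sketch treats it as the safer input for a `--memory` argument.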

### `Deprecated`

10 changes: 6 additions & 4 deletions conf/base.config
@@ -24,7 +24,7 @@ process {
time = 4.h
}
withName: busco {
errorStrategy = { task.exitStatus in [143,137] ? 'retry' : 'ignore' }
errorStrategy = { task.exitStatus in [143,137] ? 'retry' : 'finish' }
}
withName: phix_download_db {
time = 4.h
@@ -91,13 +91,15 @@ process {
cpus = { check_spades_cpus (10, task.attempt) }
memory = { check_max (64.GB * (2**(task.attempt-1)), 'memory' ) }
time = { check_max (24.h * (2**(task.attempt-1)), 'time' ) }
errorStrategy = { task.exitStatus in [143,137,1] ? 'retry' : 'ignore' }
errorStrategy = { task.exitStatus in [143,137,1] ? 'retry' : 'finish' }
maxRetries = 5
}
withName: spadeshybrid {
cpus = { check_spadeshybrid_cpus (10, task.attempt) }
memory = { check_max (64.GB * (2**(task.attempt-1)), 'memory' ) }
time = { check_max (24.h * (2**(task.attempt-1)), 'time' ) }
errorStrategy = { task.exitStatus in [143,137,1] ? 'retry' : 'ignore' }
errorStrategy = { task.exitStatus in [143,137,1] ? 'retry' : 'finish' }
maxRetries = 5
}
withName: bowtie2 {
cpus = { check_max (2 * task.attempt, 'cpus' ) }
@@ -112,4 +114,4 @@ process {
withName:get_software_versions {
cache = false
}
}
}
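
The retry settings for `spades` and `spadeshybrid` in the config above can be read as a small decision rule plus a doubling resource request. The Python sketch below illustrates that reading and is not the pipeline's code. The cap `max_gb` is a hypothetical value standing in for whatever limit `check_max` applies.

```python
# Exit codes that trigger a retry for spades/spadeshybrid in the config above.
RETRY_EXIT_CODES = {143, 137, 1}


def error_strategy(exit_status):
    """Retry on the listed exit codes; otherwise finish instead of ignoring."""
    return "retry" if exit_status in RETRY_EXIT_CODES else "finish"


def requested_memory_gb(attempt, base_gb=64, max_gb=128):
    """Memory doubles with each attempt; max_gb is an assumed cap, not from the source."""
    return min(base_gb * 2 ** (attempt - 1), max_gb)


for attempt in (1, 2, 3):
    print(attempt, requested_memory_gb(attempt))
print(error_strategy(137), error_strategy(2))
```

With `finish`, a non-retryable failure lets already submitted tasks complete and then stops the run with an error, whereas the previous `ignore` setting let the pipeline continue without the failed task's output.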
18 changes: 15 additions & 3 deletions docs/output.md
@@ -123,7 +123,7 @@ Trimmed (short) reads are assembled with both megahit and SPAdes. Hybrid assembl

**Output directory: `results/MEGAHIT`**

- `${sample}.contigs.fasta`: metagenome assembly in fasta format
- `${sample}.contigs.fa.gz`: compressed metagenome assembly in fasta format
- `${sample}_QC/`: directory containing QUAST files

### SPAdes
@@ -132,7 +132,9 @@ Trimmed (short) reads are assembled with both megahit and SPAdes. Hybrid assembl

**Output directory: `results/SPAdes`**

- `${sample}_contigs.fasta`: metagenome assembly in fasta format
- `${sample}_scaffolds.fasta.gz`: compressed assembled scaffolds in fasta format
- `${sample}_graph.gfa.gz`: compressed assembly graph in gfa format
- `${sample}_contigs.fasta.gz`: compressed assembled contigs in fasta format
- `${sample}_QC/`: directory containing QUAST files

### SPAdesHybrid
@@ -141,7 +143,9 @@ SPAdesHybrid is a part of the SPAdes software and is used when the user provides

**Output directory: `results/SPAdesHybrid`**

- `${sample}_contigs.fasta`: metagenome assembly in fasta format
- `${sample}_scaffolds.fasta.gz`: compressed assembled scaffolds in fasta format
- `${sample}_graph.gfa.gz`: compressed assembly graph in gfa format
- `${sample}_contigs.fasta.gz`: compressed assembled contigs in fasta format
- `${sample}_QC/`: directory containing QUAST files

### Quast
@@ -154,6 +158,14 @@ SPAdesHybrid is a part of the SPAdes software and is used when the user provides

## Binning

### Contig sequencing depth

Sequencing depth per contig and sample is generated by `jgi_summarize_bam_contig_depths --outputDepth`. The values correspond to `(sum of exactly aligned bases) / ((contig length)-2*75)`. For example, for two reads aligned exactly with `10` and `9` bases on a 1000 bp long contig, the depth is calculated as `(10+9)/(1000-2*75)` (the 1000 bp contig length minus 75 bp from each end, which are excluded).

**Output directory: `results/GenomeBinning`**

- `<assembler>-<assembly>-depth.txt.gz`: Sequencing depth for each contig and sample, only for short reads.
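
The depth formula above can be checked with a short Python sketch. It only reproduces the documented arithmetic and is not the implementation of `jgi_summarize_bam_contig_depths`. The function name `contig_depth` and the guard for very short contigs are assumptions added for illustration.

```python
def contig_depth(aligned_bases, contig_length, edge=75):
    """Sum of exactly aligned bases divided by the contig length
    minus `edge` bases excluded from each end."""
    effective_length = contig_length - 2 * edge
    if effective_length <= 0:
        # Assumed guard; how the real tool treats short contigs is not stated here.
        raise ValueError("contig is too short for the excluded edges")
    return sum(aligned_bases) / effective_length


# Two reads with 10 and 9 exactly aligned bases on a 1000 bp contig.
depth = contig_depth([10, 9], 1000)
print(round(depth, 4))  # 0.0224, i.e. 19 / 850
```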

### Metabat

[metabat](https://bitbucket.org/berkeleylab/metabat) recovers genome bins (that is, contigs/scaffolds that all belong to the same organism) from metagenome assemblies. Additionally, Quast is run again on all the genome bins.
41 changes: 29 additions & 12 deletions main.nf
@@ -888,14 +888,17 @@ process krona {
process megahit {
tag "$name"
publishDir "${params.outdir}/", mode: 'copy',
saveAs: {filename -> filename.indexOf(".fastq.gz") == -1 ? "Assembly/$filename" : null}
saveAs: {filename ->
if (filename.indexOf(".log") > 0 || filename.indexOf(".contigs.fa.gz") > 0 ) "Assembly/$filename"
else null}

input:
set val(name), file(reads) from trimmed_reads_megahit

output:
set val("MEGAHIT"), val("$name"), file("MEGAHIT/${name}.contigs.fa") into (assembly_megahit_to_quast, assembly_megahit_to_metabat)
file("MEGAHIT/*.log")
file("MEGAHIT/${name}.contigs.fa.gz")

when:
!params.skip_megahit
@@ -905,6 +908,7 @@ process megahit {
if ( !params.megahit_fix_cpu_1 || task.cpus == 1 )
"""
megahit -t "${task.cpus}" $input -o MEGAHIT --out-prefix "${name}"
gzip -c "MEGAHIT/${name}.contigs.fa" > "MEGAHIT/${name}.contigs.fa.gz"
"""
else
error "ERROR: '--megahit_fix_cpu_1' was specified, but not successfully applied. Likely this is caused by changed process properties in a custom config file."
@@ -922,27 +926,30 @@ process megahit {
process spadeshybrid {
tag "$id"
publishDir "${params.outdir}/", mode: 'copy', pattern: "${id}*",
saveAs: {filename -> filename.indexOf(".fastq.gz") == -1 ? "Assembly/SPAdesHybrid/$filename" : null}
saveAs: {filename ->
if (filename.indexOf(".log") > 0 || filename.indexOf("_scaffolds.fasta.gz") > 0 || filename.indexOf("_graph.gfa.gz") > 0 || filename.indexOf("_contigs.fasta.gz") > 0 ) "Assembly/SPAdesHybrid/$filename"
else null}

input:
set id, file(lr), file(sr) from files_pre_spadeshybrid

output:
set id, val("SPAdesHybrid"), file("${id}_graph.gfa") into assembly_graph_spadeshybrid
set val("SPAdesHybrid"), val("$id"), file("${id}_scaffolds.fasta") into (assembly_spadeshybrid_to_quast, assembly_spadeshybrid_to_metabat)
file("${id}_contigs.fasta")
file("${id}_log.txt")
file("${id}_contigs.fasta.gz")
file("${id}_scaffolds.fasta.gz")
file("${id}_graph.gfa.gz")

when:
params.manifest && !params.single_end && !params.skip_spadeshybrid

script:
def maxmem = "${task.memory.toString().replaceAll(/[\sGB]/,'')}"
maxmem = task.memory.toGiga()
if ( !params.spadeshybrid_fix_cpus || task.cpus == params.spadeshybrid_fix_cpus )
"""
metaspades.py \
--threads "${task.cpus}" \
--memory "$maxmem" \
--memory $maxmem \
--pe1-1 ${sr[0]} \
--pe1-2 ${sr[1]} \
--nanopore ${lr} \
@@ -951,6 +958,9 @@ process spadeshybrid {
mv spades/scaffolds.fasta ${id}_scaffolds.fasta
mv spades/contigs.fasta ${id}_contigs.fasta
mv spades/spades.log ${id}_log.txt
gzip "${id}_contigs.fasta"
gzip "${id}_graph.gfa"
gzip -c "${id}_scaffolds.fasta" > "${id}_scaffolds.fasta.gz"
"""
else
error "ERROR: '--spadeshybrid_fix_cpus' was specified, but not successfully applied. Likely this is caused by changed process properties in a custom config file."
@@ -960,34 +970,39 @@ process spadeshybrid {
process spades {
tag "$id"
publishDir "${params.outdir}/", mode: 'copy', pattern: "${id}*",
saveAs: {filename -> filename.indexOf(".fastq.gz") == -1 ? "Assembly/SPAdes/$filename" : null}

saveAs: {filename ->
if (filename.indexOf(".log") > 0 || filename.indexOf("_scaffolds.fasta.gz") > 0 || filename.indexOf("_graph.gfa.gz") > 0 || filename.indexOf("_contigs.fasta.gz") > 0 ) "Assembly/SPAdes/$filename"
else null}
input:
set id, file(sr) from trimmed_reads_spades

output:
set id, val("SPAdes"), file("${id}_graph.gfa") into assembly_graph_spades
set val("SPAdes"), val("$id"), file("${id}_scaffolds.fasta") into (assembly_spades_to_quast, assembly_spades_to_metabat)
file("${id}_contigs.fasta")
file("${id}_log.txt")
file("${id}_contigs.fasta.gz")
file("${id}_scaffolds.fasta.gz")
file("${id}_graph.gfa.gz")

when:
!params.single_end && !params.skip_spades

script:
def maxmem = "${task.memory.toString().replaceAll(/[\sGB]/,'')}"
maxmem = task.memory.toGiga()
if ( !params.spades_fix_cpus || task.cpus == params.spades_fix_cpus )
"""
metaspades.py \
--threads "${task.cpus}" \
--memory "$maxmem" \
--memory $maxmem \
--pe1-1 ${sr[0]} \
--pe1-2 ${sr[1]} \
-o spades
mv spades/assembly_graph_with_scaffolds.gfa ${id}_graph.gfa
mv spades/scaffolds.fasta ${id}_scaffolds.fasta
mv spades/contigs.fasta ${id}_contigs.fasta
mv spades/spades.log ${id}_log.txt
gzip "${id}_contigs.fasta"
gzip "${id}_graph.gfa"
gzip -c "${id}_scaffolds.fasta" > "${id}_scaffolds.fasta.gz"
"""
else
error "ERROR: '--spades_fix_cpus' was specified, but not successfully applied. Likely this is caused by changed process properties in a custom config file."
@@ -1072,6 +1087,7 @@ process metabat {
output:
set val(assembler), val(sample), file("MetaBAT2/*.fa") into (metabat_bins, metabat_bins_for_cat, metabat_bins_quast_bins)
file("MetaBAT2/discarded/*")
file("${assembler}-${assembly}-depth.txt.gz")

when:
!params.skip_binning
@@ -1080,6 +1096,7 @@ process metabat {
def name = "${assembler}-${sample}"
"""
OMP_NUM_THREADS=${task.cpus} jgi_summarize_bam_contig_depths --outputDepth depth.txt ${bam}
gzip -c depth.txt > "${assembler}-${assembly}-depth.txt.gz"
metabat2 -t "${task.cpus}" -i "${assembly}" -a depth.txt -o "MetaBAT2/${name}" -m ${min_size} --unbinned --seed ${params.metabat_rng_seed}
#save unbinned contigs above thresholds into individual files, dump others in one file
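
The new `saveAs` closures in the `spades` and `spadeshybrid` processes publish a file only when its name contains one of a few markers at a position greater than 0. A minimal Python sketch of that filter follows. `publish_path` and the sample name `sample1` are hypothetical. `str.find` is used because, like `indexOf`, it returns -1 when the marker is absent.

```python
# Markers checked by the saveAs closures shown in the diff.
PUBLISHED_MARKERS = (".log", "_scaffolds.fasta.gz", "_graph.gfa.gz", "_contigs.fasta.gz")


def publish_path(filename, subdir="Assembly/SPAdes"):
    """Return the publish location, or None when the file is not published."""
    # "> 0" rejects both a missing marker (-1) and a match at position 0.
    if any(filename.find(marker) > 0 for marker in PUBLISHED_MARKERS):
        return f"{subdir}/{filename}"
    return None


print(publish_path("sample1_scaffolds.fasta.gz"))  # Assembly/SPAdes/sample1_scaffolds.fasta.gz
print(publish_path("sample1_scaffolds.fasta"))     # None
print(publish_path("sample1_log.txt"))             # None
```

Under this reading, a log renamed to `${id}_log.txt` does not contain `.log` and so would not be published, which may be worth checking against the intended behaviour.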