Commit

Merge pull request #494 from jfy133/save-mapped-reads
Add ability to save assembly-mapped reads
jfy133 authored Aug 16, 2023
2 parents e67b6a1 + 6bdbad0 commit 2090c56
Showing 5 changed files with 28 additions and 12 deletions.
5 changes: 3 additions & 2 deletions CHANGELOG.md
@@ -7,13 +7,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### `Added`

-- [#395](https://github.com/nf-core/mag/pull/395) - Add support for fast domain-level classification of bins using Tiara, to allow bins to be separated into eukaryotic and prokaryotic-specific processes.
+- [#395](https://github.com/nf-core/mag/pull/395) - Adds support for fast domain-level classification of bins using Tiara, to allow bins to be separated into eukaryotic and prokaryotic-specific processes.
 - [#422](https://github.com/nf-core/mag/pull/422) - Adds support for normalization of read depth with BBNorm (added by @erikrikarddaniel and @fabianegli)
 - [#439](https://github.com/nf-core/mag/pull/439) - Adds ability to enter the pipeline at the binning stage by providing a CSV of pre-computed assemblies (by @prototaxites)
 - [#459](https://github.com/nf-core/mag/pull/459) - Adds ability to skip damage correction step in the ancient DNA workflow and just run pyDamage (by @jfy133)
 - [#364](https://github.com/nf-core/mag/pull/364) - Adds geNomad nf-core modules for identifying viruses in assemblies (by @PhilPalmer and @CarsonJM)
 - [#481](https://github.com/nf-core/mag/pull/481) - Adds MetaEuk for annotation of eukaryotic MAGs, and MMSeqs2 to enable downloading databases for MetaEuk (by @prototaxites)
 - [#437](https://github.com/nf-core/mag/pull/429) - `--gtdb_db` also now supports directory input of an pre-uncompressed GTDB archive directory (reported by @alneberg, fix by @jfy133)
+- [#494](https://github.com/nf-core/mag/pull/494) - Adds support for saving the BAM files from Bowtie2 mapping of input reads back to assembly (fix by @jfy133)

 ### `Changed`

@@ -38,7 +39,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [#449](https://github.com/nf-core/mag/pull/447) - Fix results file overwriting in Ancient DNA workflow (reported by @alexhbnr, fix by @jfy133)
 - [#470](https://github.com/nf-core/mag/pull/470) - Fix binning preparation from running even when binning was requested to be skipped (reported by @prototaxites, fix by @jfy133)
 - [#480](https://github.com/nf-core/mag/pull/480) - Improved `-resume` reliability through better meta map preservation (reported by @prototaxites, fix by @jfy133)
-- [#493](https://github.com/nf-core/mag/pull/493) - Update `METABAT2` nf-core module so that it reduced the number of unnecessary file moves, enabling virtual filesystems, fix by @adamrtalbot)
+- [#493](https://github.com/nf-core/mag/pull/493) - Update `METABAT2` nf-core module so that it reduced the number of unnecessary file moves, enabling virtual filesystems (fix by @adamrtalbot)

 ### `Dependencies`

14 changes: 11 additions & 3 deletions conf/modules.config
@@ -305,9 +305,17 @@ process {
 ext.args = params.bowtie2_mode ? params.bowtie2_mode : params.ancient_dna ? '--very-sensitive -N 1' : ''
 ext.prefix = { "${meta.id}.assembly" }
 publishDir = [
-path: { "${params.outdir}/Assembly/${assembly_meta.assembler}/QC/${assembly_meta.id}" },
-mode: params.publish_dir_mode,
-pattern: "*.log"
+[
+path: { "${params.outdir}/Assembly/${assembly_meta.assembler}/QC/${assembly_meta.id}" },
+mode: params.publish_dir_mode,
+pattern: "*.log"
+],
+[
+path: { "${params.outdir}/Assembly/${assembly_meta.assembler}/QC/${assembly_meta.id}" },
+mode: params.publish_dir_mode,
+pattern: "*.{bam,bai}",
+enabled: params.save_assembly_mapped_reads
+],
 ]
 }

3 changes: 3 additions & 0 deletions docs/output.md
@@ -193,6 +193,7 @@ Trimmed (short) reads are assembled with both megahit and SPAdes. Hybrid assembl
 - `QC/[sample/group]/`: Directory containing QUAST files and Bowtie2 mapping logs
 - `MEGAHIT-[sample].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the sample that the metagenome was assembled from, only present if `--coassemble_group` is not set.
 - `MEGAHIT-[sample/group]-[sampleToMap].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the respective sample ("sampleToMap").
+- `MEGAHIT-[sample].[bam/bai]`: Optionally saved BAM file of the Bowtie2 mapping of reads against the assembly.

 </details>

@@ -211,6 +212,7 @@ Trimmed (short) reads are assembled with both megahit and SPAdes. Hybrid assembl
 - `QC/[sample/group]/`: Directory containing QUAST files and Bowtie2 mapping logs
 - `SPAdes-[sample].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the sample that the metagenome was assembled from, only present if `--coassemble_group` is not set.
 - `SPAdes-[sample/group]-[sampleToMap].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the respective sample ("sampleToMap").
+- `SPAdes-[sample].[bam/bai]`: Optionally saved BAM file of the Bowtie2 mapping of reads against the assembly.

 </details>

@@ -229,6 +231,7 @@ SPAdesHybrid is a part of the [SPAdes](http://cab.spbu.ru/software/spades/) soft
 - `QC/[sample/group]/`: Directory containing QUAST files and Bowtie2 mapping logs
 - `SPAdesHybrid-[sample].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the sample that the metagenome was assembled from, only present if `--coassemble_group` is not set.
 - `SPAdesHybrid-[sample/group]-[sampleToMap].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the respective sample ("sampleToMap").
+- `SPAdesHybrid-[sample].[bam/bai]`: Optionally saved BAM file of the Bowtie2 mapping of reads against the assembly.

 </details>

1 change: 1 addition & 0 deletions nextflow.config
@@ -43,6 +43,7 @@ params {
 // binning options
 bowtie2_mode = null
 binning_map_mode = 'group'
+save_assembly_mapped_reads = false
 skip_binning = false
 min_contig_size = 1500
 min_length_unbinned_contigs = 1000000
17 changes: 10 additions & 7 deletions nextflow_schema.json
@@ -513,8 +513,7 @@
 },
 "skip_gtdbtk": {
 "type": "boolean",
-"description": "Skip the running of GTDB, as well as the automatic download of the database",
-"default": "false"
+"description": "Skip the running of GTDB, as well as the automatic download of the database"
 },
 "gtdb_db": {
 "type": "string",
@@ -523,23 +522,23 @@
 },
 "gtdbtk_min_completeness": {
 "type": "number",
-"default": 50.0,
+"default": 50,
 "description": "Min. bin completeness (in %) required to apply GTDB-tk classification.",
 "help_text": "Completeness assessed with BUSCO analysis (100% - %Missing). Must be greater than 0 (min. 0.01) to avoid GTDB-tk errors. If too low, GTDB-tk classification results can be impaired due to not enough marker genes!",
 "minimum": 0.01,
 "maximum": 100
 },
 "gtdbtk_max_contamination": {
 "type": "number",
-"default": 10.0,
+"default": 10,
 "description": "Max. bin contamination (in %) allowed to apply GTDB-tk classification.",
 "help_text": "Contamination approximated based on BUSCO analysis (%Complete and duplicated). If too high, GTDB-tk classification results can be impaired due to contamination!",
 "minimum": 0,
 "maximum": 100
 },
 "gtdbtk_min_perc_aa": {
 "type": "number",
-"default": 10.0,
+"default": 10,
 "description": "Min. fraction of AA (in %) in the MSA for bins to be kept.",
 "minimum": 0,
 "maximum": 100
@@ -553,7 +552,7 @@
 },
 "gtdbtk_pplacer_cpus": {
 "type": "number",
-"default": 1.0,
+"default": 1,
 "description": "Number of CPUs used for the by GTDB-Tk run tool pplacer.",
 "help_text": "A low number of CPUs helps to reduce the memory required/reported by GTDB-Tk. See also the [GTDB-Tk documentation](https://ecogenomics.github.io/GTDBTk/faq.html#gtdb-tk-reaches-the-memory-limit-pplacer-crashes)."
 },
@@ -649,7 +648,6 @@
 "properties": {
 "run_virus_identification": {
 "type": "boolean",
-"default": false,
 "description": "Run virus identification."
 },
 "genomad_min_score": {
@@ -715,6 +713,11 @@
 "description": "Bowtie2 alignment mode",
 "help_text": "Bowtie2 alignment mode options, for example: `--very-fast` , `--very-sensitive-local -N 1` , ..."
 },
+"save_assembly_mapped_reads": {
+"type": "boolean",
+"description": "Save the output of mapping raw reads back to assembled contigs",
+"help_text": "Specify to save the BAM and BAI files generated when mapping input reads back to the assembled contigs (performed in preparation for binning and contig depth estimations)."
+},
 "bin_domain_classification": {
 "type": "boolean",
 "description": "Enable domain-level (prokaryote or eukaryote) classification of bins using Tiara. Processes which are domain-specific will then only receive bins matching the domain requirement.",
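
Usage note: `save_assembly_mapped_reads` defaults to `false` in `nextflow.config`, and the new `publishDir` entry for `*.{bam,bai}` is gated on it through `enabled:`, so the BAM/BAI files are only published once the parameter is switched on. A minimal sketch of a user config that enables it, assuming a hypothetical file name `custom.config` (not part of this commit) supplied to Nextflow with `-c`:

```groovy
// custom.config (hypothetical file name, not part of this commit)
params {
    // Publish the *.bam / *.bai files from mapping reads back to the assembly.
    // Per the publishDir entry in conf/modules.config, they land next to the
    // Bowtie2 logs under <outdir>/Assembly/<assembler>/QC/<id>/
    save_assembly_mapped_reads = true
}
```

Passing `--save_assembly_mapped_reads` on the command line should have the same effect, since Nextflow maps `--<name>` options onto `params.<name>`.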