Merge pull request #692 from muabnezor/add_chopper_nanoq
Add chopper and nanoq options for longread preprocessing
muabnezor authored Nov 22, 2024
2 parents cd5ebae + e978c23 commit ebb4283
Showing 21 changed files with 964 additions and 30 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

- [#692](https://github.com/nf-core/mag/pull/692) - Added Nanoq as optional longread filtering tool (added by @muabnezor)
- [#692](https://github.com/nf-core/mag/pull/692) - Added chopper as optional longread filtering tool and/or phage lambda removal tool (added by @muabnezor)
- [#708](https://github.com/nf-core/mag/pull/708) - Added `--exclude_unbins_from_postbinning` parameter to exclude unbinned contigs from post-binning processes, speeding up Prokka in some cases (added by @dialvarezs)

### `Changed`
@@ -17,6 +19,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Dependencies`

| Tool | Previous version | New version |
| ------- | ---------------- | ----------- |
| chopper | | 0.9.0 |
| nanoq | | 0.10.0 |

### `Deprecated`

## 3.2.1 [2024-10-30]
8 changes: 8 additions & 0 deletions CITATIONS.md
@@ -40,6 +40,10 @@

> Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P., & Tyson, G. W. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25(7), 1043–1055. doi: 10.1101/gr.186072.114
- [Chopper](https://doi.org/10.1093/bioinformatics/bty149)

> De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018 Aug 1;34(15):2666-2669. doi: 10.1093/bioinformatics/bty149
- [CONCOCT](https://doi.org/10.1038/nmeth.3103)

> Alneberg, J., Bjarnason, B. S., de Bruijn, I., Schirmer, M., Quick, J., Ijaz, U. Z., Lahti, L., Loman, N. J., Andersson, A. F., & Quince, C. (2014). Binning metagenomic contigs by coverage and composition. Nature Methods, 11(11), 1144–1146. doi: 10.1038/nmeth.3103
@@ -114,6 +118,10 @@

> De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M., & Van Broeckhoven, C. (2018). NanoPack: visualizing and processing long-read sequencing data. Bioinformatics, 34(15), 2666-2669. doi: 10.1093/bioinformatics/bty149.
- [Nanoq](https://doi.org/10.21105/joss.02991)

> Steinig, E., Coin, L. (2022). Nanoq: ultra-fast quality control for nanopore reads. Journal of Open Source Software, 7(69), 2991, doi: 10.21105/joss.02991
- [Porechop](https://github.com/rrwick/Porechop)

- [Porechop-abi](https://github.com/bonsai-team/Porechop_ABI)
60 changes: 57 additions & 3 deletions conf/modules.config
@@ -183,22 +183,76 @@ process {
"--min_length ${params.longreads_min_length}",
"--keep_percent ${params.longreads_keep_percent}",
"--trim",
"--length_weight ${params.longreads_length_weight}"
"--length_weight ${params.longreads_length_weight}",
params.longreads_min_quality ? "--min_mean_q ${params.longreads_min_quality}" : '',
].join(' ').trim()
publishDir = [
path: { "${params.outdir}/QC_longreads/Filtlong" },
mode: params.publish_dir_mode,
pattern: "*_filtlong.fastq.gz",
enabled: params.save_filtlong_reads
enabled: params.save_filtered_longreads
]
ext.prefix = { "${meta.id}_run${meta.run}_filtlong" }
}

withName: NANOQ {
ext.args = [
"--min-len ${params.longreads_min_length}",
params.longreads_min_quality ? "--min-qual ${params.longreads_min_quality}": '',
"-vv"
].join(' ').trim()
publishDir = [
[
path: { "${params.outdir}/QC_longreads/Nanoq" },
mode: params.publish_dir_mode,
pattern: "*_nanoq_filtered.fastq.gz",
enabled: params.save_filtered_longreads
],
[
path: { "${params.outdir}/QC_longreads/Nanoq" },
mode: params.publish_dir_mode,
pattern: "*_nanoq_filtered.stats"
]
]
ext.prefix = { "${meta.id}_run${meta.run}_nanoq_filtered" }
}

withName: NANOLYSE {
publishDir = [[path: { "${params.outdir}/QC_longreads/NanoLyse" }, mode: params.publish_dir_mode, pattern: "*.log"], [path: { "${params.outdir}/QC_longreads/NanoLyse" }, mode: params.publish_dir_mode, pattern: "*_nanolyse.fastq.gz", enabled: params.save_lambdaremoved_reads]]
publishDir = [
[
path: { "${params.outdir}/QC_longreads/NanoLyse" },
mode: params.publish_dir_mode, pattern: "*.log"
],
[
path: { "${params.outdir}/QC_longreads/NanoLyse" },
mode: params.publish_dir_mode, pattern: "*_nanolyse.fastq.gz",
enabled: params.save_lambdaremoved_reads
]
]
ext.prefix = { "${meta.id}_run${meta.run}_lambdafiltered" }
}

withName: CHOPPER {
ext.args2 = [
params.longreads_min_quality ? "--quality ${params.longreads_min_quality}": '',
params.longreads_min_length ? "--minlength ${params.longreads_min_length}": ''
].join(' ').trim()
publishDir = [
[
path: { "${params.outdir}/QC_longreads/Chopper" },
mode: params.publish_dir_mode,
pattern: "*.log"
],
[
path: { "${params.outdir}/QC_longreads/Chopper" },
mode: params.publish_dir_mode,
pattern: "*_chopper.fastq.gz",
enabled: params.save_lambdaremoved_reads || params.save_filtered_longreads
]
]
ext.prefix = { "${meta.id}_run${meta.run}_chopper" }
}

withName: NANOPLOT_RAW {
ext.prefix = 'raw'
ext.args = {
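
The `ext.args` entries in this hunk add an optional flag only when the corresponding parameter is set. The following is a minimal shell sketch of that pattern, not the pipeline's own code: the function names `nanoq_args` and `chopper_args` are hypothetical, the flag names are copied from the diff, and only the set-versus-unset case is modeled (Groovy also treats `0` as unset, which this sketch does not).

```shell
#!/bin/sh
# Hypothetical helpers that mirror the `cond ? "--flag value" : ''` entries
# in conf/modules.config. Flag names are taken from the diff; everything else
# (function names, argument order, example values) is an assumption.

# Build the nanoq argument string: length filter always, quality filter optional.
nanoq_args() {
    min_length="$1"
    min_quality="${2:-}"
    args="--min-len ${min_length}"
    # Append the quality flag only when a threshold was supplied.
    if [ -n "${min_quality}" ]; then
        args="${args} --min-qual ${min_quality}"
    fi
    args="${args} -vv"
    printf '%s\n' "${args}"
}

# Build the chopper argument string: both filters are optional.
chopper_args() {
    min_length="${1:-}"
    min_quality="${2:-}"
    args=""
    if [ -n "${min_quality}" ]; then
        args="--quality ${min_quality}"
    fi
    if [ -n "${min_length}" ]; then
        # Insert a separating space only if a flag is already present.
        args="${args:+${args} }--minlength ${min_length}"
    fi
    printf '%s\n' "${args}"
}

nanoq_args 1000 10     # prints: --min-len 1000 --min-qual 10 -vv
nanoq_args 1000        # prints: --min-len 1000 -vv
chopper_args 1000 10   # prints: --quality 10 --minlength 1000
```

In the Groovy version, an unset value leaves an empty string in the list, so `join(' ')` produces a doubled space between the remaining flags. That is harmless on a command line; the sketch simply skips the entry instead.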
24 changes: 19 additions & 5 deletions docs/output.md
@@ -109,25 +109,39 @@ The pipeline uses Nanolyse to map the reads against the Lambda phage and removes

</details>

### Filtlong and porechop
### Long read adapter removal

The pipeline uses filtlong and porechop to perform quality control of the long reads that are eventually provided with the TSV input file.
The pipeline uses porechop_abi or porechop to perform adapter trimming of the long reads that are optionally provided with the TSV input file.

<details markdown="1">
<summary>Output files</summary>

- `QC_longreads/porechop/`
- `[sample]_[run]_porechop_trimmed.fastq.gz`: If `--longread_adaptertrimming_tool 'porechop'`, the adapter trimmed FASTQ files from porechop
- `[sample]_[run]_porechop-abi_trimmed.fastq.gz`: If `--longread_adaptertrimming_tool 'porechop_abi'`, the adapter trimmed FASTQ files from porechop_ABI
- `QC_longreads/filtlong/`

</details>

### Long read filtering

The pipeline uses filtlong, chopper, or nanoq for quality filtering of long reads, specified with `--longread_filtering_tool <filtlong|chopper|nanoq>`. Only filtlong is capable of filtering long reads against short reads, and is therefore currently recommended in the hybrid mode. If chopper is selected as long read filtering tool, Lambda Phage removal will be performed with chopper as well, instead of nanolyse.

<details markdown="1">
<summary>Output files</summary>

- `QC_longreads/Filtlong/`
- `[sample]_[run]_filtlong.fastq.gz`: The length and quality filtered reads in FASTQ from Filtlong
- `QC_longreads/Nanoq/`
- `[sample]_[run]_nanoq_filtered.fastq.gz`: The length and quality filtered reads in FASTQ from Nanoq
- `QC_longreads/Chopper/`
- `[sample]_[run]_chopper.fastq.gz`: The length and quality filtered, optionally phage lambda removed reads in FASTQ from Chopper

</details>

Trimmed and filtered FASTQ output directories and files will only exist if `--save_porechop_reads` and/or `--save_filtlong_reads` (respectively) are provided to the run command .
Trimmed and filtered FASTQ output directories and files will only exist if `--save_porechop_reads` and/or `--save_filtered_longreads` (respectively) are provided to the run command.

No direct host read removal is performed for long reads.
However, since within this pipeline filtlong uses a read quality based on k-mer matches to the already filtered short reads, reads not overlapping those short reads might be discarded.
However, since within this pipeline filtlong uses a read quality based on k-mer matches to the already filtered short reads, reads not overlapping those short reads might be discarded. Note that this only applies when using filtlong as long read filtering tool.
The lower the parameter `--longreads_length_weight`, the higher the impact of the read qualities for filtering.
For further documentation see the [filtlong online documentation](https://github.com/rrwick/Filtlong).
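
As a usage sketch for the long read filtering options documented in this diff, the command below selects nanoq and keeps the filtered reads. The parameter names come from the diff; the profile, samplesheet name, output directory, and threshold values are placeholders chosen for illustration. The command is assembled and printed rather than executed, so it can be inspected first.

```shell
#!/bin/sh
# Hypothetical invocation: samplesheet.csv, results, docker, 1000 and 10 are
# placeholder values, not taken from the commit.
cmd="nextflow run nf-core/mag \
  -profile docker \
  --input samplesheet.csv \
  --outdir results \
  --longread_filtering_tool nanoq \
  --longreads_min_length 1000 \
  --longreads_min_quality 10 \
  --save_filtered_longreads"

# Print the assembled command for inspection.
printf '%s\n' "$cmd"
```

With `--longread_filtering_tool chopper`, the documentation in this diff states that Lambda phage removal is also handled by chopper instead of nanolyse.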

10 changes: 10 additions & 0 deletions modules.json
@@ -62,6 +62,11 @@
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"]
},
"chopper": {
"branch": "master",
"git_sha": "22737835af2db3dd0d5b6b332e75e160d0199fae",
"installed_by": ["modules"]
},
"concoct/concoct": {
"branch": "master",
"git_sha": "baa30accc6c50ea8a98662417d4f42ed18966353",
@@ -212,6 +217,11 @@
"git_sha": "3135090b46f308a260fc9d5991d7d2f9c0785309",
"installed_by": ["modules"]
},
"nanoq": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"porechop/abi": {
"branch": "master",
"git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48",
5 changes: 5 additions & 0 deletions modules/nf-core/chopper/environment.yml


56 changes: 56 additions & 0 deletions modules/nf-core/chopper/main.nf


64 changes: 64 additions & 0 deletions modules/nf-core/chopper/meta.yml

