Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move long read preprocessing into a subworkflow #674

Merged
merged 37 commits into from
Oct 11, 2024

Conversation

muabnezor
Copy link
Contributor

@muabnezor muabnezor commented Sep 20, 2024

This PR moves long read preprocessing into a subworkflow. It also adds the option to run porechop-abi, instead of porechop, for long read adaptertrimming.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@jfy133 jfy133 marked this pull request as draft September 20, 2024 12:57
Copy link

github-actions bot commented Sep 20, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 2cfd5e5

+| ✅ 308 tests passed       |+
#| ❔   2 tests were ignored |#
!| ❗   5 tests had warnings |!

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file [TODO: try and test using for --host_fasta and --host_genome]
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-10-04 08:52:39

@jfy133
Copy link
Member

jfy133 commented Sep 20, 2024

@muabnezor feel free to open the PR to 'ready for review' when ready :) but very brief skim look good!

@muabnezor muabnezor added the WIP Work in progress label Sep 23, 2024
@muabnezor
Copy link
Contributor Author

Hi,

In this pull request I exchanged porechop with porechop-abi for preprocessing of long-reads, I think this is an ok exchange. I also added the module chopper, but decided it is better to integrate this tool in a future PR. Maybe it would have a cleaner PR just focusing on refactoring the code, and waiting with introducing new modules altogether. Should I move the short-read preprocessing to a subworkflow in this PR, or wait with this?

Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good @muabnezor !

I have some requests first:

Please don't remove PORECHOP_PORECHOP or FILTLONG yet, I would like to keep them in for backwards compatibility (maybe some people don't 'trust' yet the newer tools), please have them alongside the new tools but wrapped in an if statement, and a user can select their preferred tool with a new param (e.g. --longread_preprocessing_tool or something).

Please also don't forget:

  1. Update CHANGELOG
  2. Add any new tools to citations.md
  3. Add yourself to the list of contibutors/developers on the GitHub README !

Hi

In this pull request I exchanged porechop with porechop-abi for preprocessing of long-reads, I think this is an ok exchange. I also added the module chopper, but decided it is better to integrate this tool in a future PR. Maybe it would have a cleaner PR just focusing on refactoring the code, and waiting with introducing new modules altogether. Should I move the short-read preprocessing to a subworkflow in this PR, or wait with this?

Yes, lets keep it clean and do one PR per thing you've suggested once you've addressed by comments here (short-read prepocessing next, then chopper), good plan :)!

subworkflows/local/lr_preprocessing.nf Outdated Show resolved Hide resolved
subworkflows/local/lr_preprocessing.nf Outdated Show resolved Hide resolved
subworkflows/local/lr_preprocessing.nf Outdated Show resolved Hide resolved
@muabnezor
Copy link
Contributor Author

Sounds great @jfy133! To run any newer modules, they need to be specified with an option, otherwise, we’ll leave the functionality as is for now. I’ll take care of that, along with your additional comments. I apologize for the limited time I have this week to work on this, but I’ll get to it as soon as I can.

@jfy133
Copy link
Member

jfy133 commented Sep 24, 2024

Sounds perfect!

No worries about the time, this is already great progress 👍

…long read preprocessing tools. Currently only has the option to specify porechop_abi, but I decided this is a nice solution for future tools that the user might want to chose among
@muabnezor
Copy link
Contributor Author

muabnezor commented Sep 24, 2024

What's new

  • Added parameter longread_preprocessing_tools a comma-separated string
  • Added filtlong and porechop to multiqc
  • changed local filtlong to nf-core/filtlong

@jfy133
Copy link
Member

jfy133 commented Sep 25, 2024

What's new

* Added parameter longread_preprocessing_tools a comma-separated string

* Added filtlong and porechop to multiqc

* changed local filtlong to nf-core/filtlong

Great, thanks for adding filtling/porechop!

While it's clever from a conciseness PoV I personally don't like this as it's more error prone by users. For example, we then can't get nice dropdowns on the nf-core launch GUI (and other GUI launch platforms).

image

Finally it's inconsistent with how other parts of the pipeline select a given tool.

I would be grateful if you switch back to one parameter per preprocessing step.

(Note however, I much prefer this development approach of you taking the initiative of a possible optimisation and incorporating it in as a proof of concept, even if I disagree, so if it doesn't annoy you to revert, please continue to take this approch :) )

@muabnezor
Copy link
Contributor Author

Finally it's inconsistent with how other parts of the pipeline select a given tool.

I would be grateful if you switch back to one parameter per preprocessing step.

No problems @jfy133, and I agree, makes more sense to have one parameter per tool. I'll add a --lr_adaptertrimming_tool parameter instead.

I just need to check the multiqc stuff, as filtlong doesn't seem to get included in the report for some reason. Then I'll update all additional documents, and the PR should be ready for review.

@jfy133
Copy link
Member

jfy133 commented Sep 25, 2024

Finally it's inconsistent with how other parts of the pipeline select a given tool.
I would be grateful if you switch back to one parameter per preprocessing step.

No problems @jfy133, and I agree, makes more sense to have one parameter per tool. I'll add a --lr_adaptertrimming_tool parameter instead.

Great :) following the other parameters please don't use shorthand: --longreads_adaptertrimming_tool is fine :)

I just need to check the multiqc stuff, as filtlong doesn't seem to get included in the report for some reason. Then I'll update all additional documents, and the PR should be ready for review.

Huh interesting, I'm pretty sure it does get displayed in MultiQC in taxprofiler, it may only be included in the GeneralStats table?

@muabnezor
Copy link
Contributor Author

Still scratching my head about the filtlong log and MultiQC. The log file is passed to MultiQC in the same way as in TaxProfiler, but no filtlong data is included in the MultiQC report.

@muabnezor muabnezor changed the title Work in Progress: Move long read preprocessing into a subworkflow Move long read preprocessing into a subworkflow Oct 1, 2024
@muabnezor
Copy link
Contributor Author

This should be ready @jfy133. Think I'm getting a grip on nf-core now, next PR should be a lot cleaner!

@muabnezor
Copy link
Contributor Author

Have you had time to look at this yet @jfy133?

@jfy133
Copy link
Member

jfy133 commented Oct 11, 2024

Sorry, didn't get to it before because of the meta-omics hackathon and preparing for the nextflow summit, but I have time now!

@jfy133 jfy133 linked an issue Oct 11, 2024 that may be closed by this pull request
Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, LGTM to me now @muabnezor !

I'm going to make one change to the output docs for you (something that wasn't documented previously), then you can merge in!

I'll give the green tick once I've made the addition

This is a very nice and clean PR, thank you very much! Hopefully the first of many :D

CHANGELOG.md Outdated
### `Changed`

- [#674](https://github.com/nf-core/mag/pull/674/files) - Changed to porechop-abi as default adapter trimming tool for long reads. User can still use porechop if prefered.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [#674](https://github.com/nf-core/mag/pull/674/files) - Changed to porechop-abi as default adapter trimming tool for long reads. User can still use porechop if prefered.
- [#674](https://github.com/nf-core/mag/pull/674/files) - Changed to porechop-abi as default adapter trimming tool for long reads. User can still use porechop if prefered. (added by @muabnezor)

@@ -116,6 +116,8 @@

- [Porechop](https://github.com/rrwick/Porechop)

- [Porechop-abi](https://github.com/bonsai-team/Porechop_ABI)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [Porechop-abi](https://github.com/bonsai-team/Porechop_ABI)
- [Porechop-abi](https://github.com/bonsai-team/Porechop_ABI)
> Bonenfant, Q., Noé, L., & Touzet, H. (2022). Porechop_ABI: discovering unknown adapters in ONT sequencing reads for downstream trimming. In BioRxiv. doi: 10.1101/2022.07.07.499093

CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
@jfy133
Copy link
Member

jfy133 commented Oct 11, 2024

@nf-core-bot fix linting

@jfy133 jfy133 self-requested a review October 11, 2024 11:04
Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once tests pass, you can merge @muabnezor ! Thanks again! Lets chat on Slack on next step if unclear :)

@jfy133
Copy link
Member

jfy133 commented Oct 11, 2024

Ah wait, I've not done the template merge yet so it won't let you pass because of the linting, I'll merge for you (and next PR hopefully you can merge in yourself).

@jfy133 jfy133 merged commit db33efd into nf-core:dev Oct 11, 2024
14 of 15 checks passed
@jfy133 jfy133 mentioned this pull request Oct 27, 2024
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Porechop_ABI as alternative to Porechop
5 participants