Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shortread subworkflow #716

Open
wants to merge 12 commits into
base: dev
Choose a base branch
from

Conversation

muabnezor
Copy link
Contributor

@muabnezor muabnezor commented Nov 23, 2024

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

This PR moves the short read preprocessing into a subworkflow

@muabnezor muabnezor added the WIP Work in progress label Nov 23, 2024
Copy link

This PR is against the master branch ❌

  • Do not close this PR
  • Click Edit and change the base to dev
  • This CI test will remain failed until you push a new commit

Hi @muabnezor,

It looks like this pull-request is has been made against the muabnezor/mag master branch.
The master branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to master are only allowed if they come from the muabnezor/mag dev branch.

You do not need to close this PR, you can change the target branch to dev by clicking the "Edit" button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.

Thanks again for your contribution!

@muabnezor muabnezor changed the base branch from master to dev November 23, 2024 06:40
@muabnezor muabnezor removed the WIP Work in progress label Nov 24, 2024
@muabnezor
Copy link
Contributor Author

Think this is ready for review @jfy133. Simple PR moving short read preprocessing into subworkflow.

Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a straight forward conversion to me! thank you @muabnezor you're making the code base much cleaner!

Comment on lines 5 to 16
include { FASTQC as FASTQC_RAW } from '../../modules/nf-core/fastqc/main'
include { FASTQC as FASTQC_TRIMMED } from '../../modules/nf-core/fastqc/main'
include { FASTP } from '../../modules/nf-core/fastp/main'
include { ADAPTERREMOVAL as ADAPTERREMOVAL_PE } from '../../modules/nf-core/adapterremoval/main'
include { ADAPTERREMOVAL as ADAPTERREMOVAL_SE } from '../../modules/nf-core/adapterremoval/main'
include { BOWTIE2_REMOVAL_BUILD as BOWTIE2_HOST_REMOVAL_BUILD } from '../../modules/local/bowtie2_removal_build'
include { BOWTIE2_REMOVAL_ALIGN as BOWTIE2_HOST_REMOVAL_ALIGN } from '../../modules/local/bowtie2_removal_align'
include { BOWTIE2_REMOVAL_BUILD as BOWTIE2_PHIX_REMOVAL_BUILD } from '../../modules/local/bowtie2_removal_build'
include { BOWTIE2_REMOVAL_ALIGN as BOWTIE2_PHIX_REMOVAL_ALIGN } from '../../modules/local/bowtie2_removal_align'
include { CAT_FASTQ } from '../../modules/nf-core/cat/fastq/main'
include { SEQTK_MERGEPE } from '../../modules/nf-core/seqtk/mergepe/main'
include { BBMAP_BBNORM } from '../..//modules/nf-core/bbmap/bbnorm/main'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please align :)

If you use VSCode, you should install the Nextflow extension which now includes a formatter you can activate via the command palette and should do that for you :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. thanks for suggestion.

ch_multiqc_files = ch_multiqc_files.mix(FASTQC_RAW.out.zip)

ch_bowtie2_removal_host_multiqc = Channel.empty()
if (!params.assembly_input) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (!params.assembly_input) in the future, it feels like we can move away from this. Could the assembly_input logic be determined by the users input?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about this actually myself:

If you've pre-assembled reads, they should already be cleaned. Which means you could skip this entire subworkflow, so maybe this if else statement could be pushed to the main mag.nf around the new subworkflow? @prototaxites what do you think, is that a reasonable assuemption to make (i.e., skip preprocessing of reads)

But what do you mean by 'users input'? This is basically that, if you give the additional extra sheet to --assembly_input (that contains the paths to contig FASTAS) then we can skip assembly. It's not a boolean flag, if that is what you were meaning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what do you mean by 'users input'? This is basically that, if you give the additional extra sheet to --assembly_input (that contains the paths to contig FASTAS) then we can skip assembly. It's not a boolean flag, if that is what you were meaning.

Ah, then this makes more sense. I thought it was a boolean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you've pre-assembled reads, they should already be cleaned. Which means you could skip this entire subworkflow, so maybe this if else statement could be pushed to the main mag.nf around the new subworkflow?

yes, I actually did this first, but then moved it back into the subworkflow. Not sure what makes the most sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you've pre-assembled reads, they should already be cleaned. Which means you could skip this entire subworkflow, so maybe this if else statement could be pushed to the main mag.nf around the new subworkflow? @prototaxites what do you think, is that a reasonable assuemption to make (i.e., skip preprocessing of reads)

Yes, I think this probably makes the most sense. While I'm sure there's someone out there with the use-case that they want to map a new set of reads against an existing assembly? I think we should assume that the reads accompanying pre-computed assemblies are already clean - someone with that use case probably knows what they are doing!

Personally I'm in favour of putting the if!(assembly_input) line in the main WF rather than the subWF; the subWF code should only operate when we actually want to use it. Plus I think it helps make the logic of the main WF clearer by stating when and when not things are run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for input. I think it makes sense to have it in the main workflow also. If you think this looks fine @jfy133 and @prototaxites I'll go ahead and merge.

ch_versions = ch_versions.mix(FASTQC_RAW.out.versions.first())
ch_multiqc_files = ch_multiqc_files.mix(FASTQC_RAW.out.zip)

ch_bowtie2_removal_host_multiqc = Channel.empty()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why we have a dedicated channel for that... maybe can be mixed directly into ch_multiqc_files

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't even seem to be used anyway so can be removed :)

ch_versions = ch_versions.mix(FASTQC_RAW.out.versions.first())
ch_multiqc_files = ch_multiqc_files.mix(FASTQC_RAW.out.zip)

ch_bowtie2_removal_host_multiqc = Channel.empty()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ch_bowtie2_removal_host_multiqc = Channel.empty()

def new_reads = meta.single_end ? reads[0] : reads.flatten()
[meta, new_reads]
}
SHORTREAD_PREPROCESSING(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this stil needs the !params.assembly_input bit? And the else should then be assigning the raw reads directly to ch_short_reads and ch_short_reads_assembly etc (assuming it has the right channel construct)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is there, line 191, and 205?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants