
Maximum number of prediction jobs #116

Merged
merged 11 commits into nf-core:dev from maximum_chunk_number on May 8, 2024

Conversation

tillenglert
Collaborator

I added a dynamic increase of the chunk sizes to limit the number of generated PREDICT_EPITOPES processes. The default value for maximum_process_num is 1000, which is tuned to the CFC2.0 HPC Cluster of Tübingen.
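A minimal sketch of the idea described above, assuming hypothetical names (`computeChunkSize`, `totalEntries`, `requestedChunkSize`, `maxProcessNum`); this is illustrative only, not the merged pipeline code:

```groovy
// Sketch: grow the chunk size whenever the requested size would produce
// more chunks (i.e. prediction jobs) than the configured maximum.
// All names here are assumptions for illustration.
def computeChunkSize(long totalEntries, long requestedChunkSize, long maxProcessNum) {
    long chunks = Math.ceil(totalEntries / (double) requestedChunkSize) as long
    if (chunks <= maxProcessNum) {
        // Requested chunk size already respects the job limit.
        return requestedChunkSize
    }
    // Otherwise enlarge the chunk size so that at most maxProcessNum chunks result.
    return Math.ceil(totalEntries / (double) maxProcessNum) as long
}

assert computeChunkSize(10_000, 5, 1000) == 10   // 5 would yield 2000 chunks -> grow to 10
```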

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/metapep branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@tillenglert tillenglert changed the title Maximum_chunk_number Maximum number of prediction jobs Apr 19, 2024

github-actions bot commented Apr 19, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit b2947e8

  • ✅ 178 tests passed
  • ❗ 6 tests had warnings

❗ Test warnings:

  • files_exist - File not found: conf/igenomes.config
  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in output.md: Write this documentation describing your workflow's output
  • pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required

✅ Tests passed:

Run details

  • nf-core/tools version 2.13.1
  • Run at 2024-05-08 10:32:08

@tillenglert tillenglert requested a review from skrakau April 19, 2024 07:43
@tillenglert
Collaborator Author

tillenglert commented Apr 26, 2024

@skrakau I addressed your code review comments, but I now needed to implement another check for the parameter max_task_num, because using a value below the number of chosen alleles makes the pipeline fail with a divide-by-zero error.

I originally wanted to add this check within the Nextflow logic, e.g. in the utils subworkflow, as a general test before any task is executed. However, as the alleles may not be unique between rows and multiple alleles can be assigned per row, I needed to use the alleles.tsv file created by the check_samplesheet_and_create_tables process. Since I had to read in another file and didn't want to obscure the checking process too much, I created a file within that process containing the number of alleles, which is now used to throw an error within the Process_Input subworkflow.

If you have suggestions to handle this error differently please let me know! Otherwise this PR is ready for review! 👍
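A hypothetical illustration of the reported failure mode (the pipeline's actual arithmetic may differ); variable names are assumptions, not pipeline code:

```groovy
// If max_task_num is below the number of alleles, an integer job budget per
// allele can become zero, and a later division by that budget throws an error.
def maxTaskNum = 5       // value chosen below the number of alleles
def nAlleles   = 8
def jobsPerAllele = maxTaskNum.intdiv(nAlleles)   // == 0
assert jobsPerAllele == 0
// Any subsequent "x / jobsPerAllele" or "x.intdiv(jobsPerAllele)" then fails
// with an ArithmeticException: Division by zero.
```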

@skrakau
Copy link
Member

skrakau commented May 3, 2024

I originally wanted to add this check within the Nextflow logic, e.g. in the utils subworkflow, as a general test before any task is executed. However, as the alleles may not be unique between rows and multiple alleles can be assigned per row, I needed to use the alleles.tsv file created by the check_samplesheet_and_create_tables process. Since I had to read in another file and didn't want to obscure the checking process too much, I created a file within that process containing the number of alleles, which is now used to throw an error within the Process_Input subworkflow.

If you have suggestions to handle this error differently please let me know! Otherwise this PR is ready for review! 👍

Would it somehow be possible to avoid writing another file within check_samplesheet_and_create_tables? For example, by checking the number of alleles in alleles.tsv directly within PROCESS_INPUT, e.g. using the countLines operator or similar?

@tillenglert
Copy link
Collaborator Author

tillenglert commented May 3, 2024

Would it somehow be possible to avoid writing another file within check_samplesheet_and_create_tables? For example, by checking the number of alleles in alleles.tsv directly within PROCESS_INPUT, e.g. using the countLines operator or similar?

Absolutely true and way cleaner than my solution! I adjusted it to use the already existing alleles.tsv! Thank you!
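For reference, a hedged sketch of what such a check could look like inside PROCESS_INPUT, assuming a channel named ch_alleles carrying the alleles.tsv and the parameter name max_task_num; subworkflow, channel, and error wording are illustrative, not the merged code:

```groovy
// Illustrative only: names are assumptions, not the actual pipeline code.
workflow CHECK_MAX_TASK_NUM {
    take:
    ch_alleles          // channel emitting the alleles.tsv produced upstream

    main:
    ch_alleles.map { alleles_tsv ->
        // Subtract the header line of alleles.tsv from the line count.
        def n_alleles = alleles_tsv.countLines() - 1
        if (params.max_task_num < n_alleles) {
            error("--max_task_num (${params.max_task_num}) must be at least the number of alleles (${n_alleles}).")
        }
        return alleles_tsv
    }
}
```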

@tillenglert tillenglert force-pushed the maximum_chunk_number branch from fc52d45 to f5cecb9 on May 8, 2024 07:54
@tillenglert tillenglert requested a review from skrakau May 8, 2024 10:23
@skrakau skrakau left a comment
Member


Thanks, looks good! :)

@tillenglert tillenglert merged commit ec9292e into nf-core:dev May 8, 2024
13 checks passed
@tillenglert tillenglert deleted the maximum_chunk_number branch May 8, 2024 15:21