From 6fe565c5ea445886b0ff215e896f29da0c0dc843 Mon Sep 17 00:00:00 2001 From: Till Englert Date: Fri, 16 Aug 2024 12:58:22 +0200 Subject: [PATCH 01/20] Add nf-test citation and sort alphabetically --- CITATIONS.md | 50 +++++++++++++++++++++++++++----------------------- 1 file changed, 27 insertions(+), 23 deletions(-) diff --git a/CITATIONS.md b/CITATIONS.md index 91acc5db..9991137e 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -8,28 +8,20 @@ > Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311. -## Pipeline tools +## [nf-test](https://www.biorxiv.org/content/10.1101/2024.05.25.595877v1) -- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/) +> L. Forer, S. Schönherr Improving the Reliability and Quality of Nextflow Pipelines with nf-test. bioRxiv 2024.05.25.595877; doi: 10.1101/2024.05.25.595877 - > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. +## Pipeline tools - [Entrez](https://pubmed.ncbi.nlm.nih.gov/15608257/) > Maglott D, Ostell J, Pruitt KD, Tatusova T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D54-8. doi: 10.1093/nar/gki031. Update in: Nucleic Acids Res. 2007 Jan;35(Database issue):D26-31. PMID: 15608257; PMCID: PMC539985. -- [Prodigal](https://pubmed.ncbi.nlm.nih.gov/20211023/) - - > Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010 Mar 8;11:119. doi: 10.1186/1471-2105-11-119. PMID: 20211023; PMCID: PMC2848648. 
- - [Epytope](https://academic.oup.com/bioinformatics/article/32/13/2044/1743767) > Schubert, B., Walzer, M., Brachvogel, H-P., Sozolek, A., Mohr, C., and Kohlbacher, O. (2016). FRED 2 - An Immunoinformatics Framework for Python. Bioinformatics 2016; doi: 10.1093/bioinformatics/btw113 -- [SYFPEITHI](https://pubmed.ncbi.nlm.nih.gov/10602881/) - - > Hans-Georg Rammensee, Jutta Bachmann, Niels Nikolaus Emmerich, Oskar Alexander Bachor, Stefan Stevanovic: SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics (1999) 50: 213-219 - - [MHCflurry](https://dx.doi.org/10.1016/j.cels.2018.05.014) > Timothy J. O’Donnell, Alex Rubinsteyn, Maria Bonsack, Angelika B. Riemer, Uri Laserson, Jeff Hammerbacher. MHC flurry: open-source class I MHC binding affinity prediction. Cell systems 7(1), 129-132 (2018). doi: 10.1016/j.cels.2018.05.014. @@ -38,8 +30,20 @@ > Xiaoshan M. Shao, Rohit Bhattacharya, Justin Huang, I.K. Ashok Sivakumar, Collin Tokheim, Lily Zheng, Dylan Hirsch, Benjamin Kaminow, Ashton Omdahl, Maria Bonsack, Angelika B. Riemer, Victor E. Velculescu, Valsamo Anagnostou, Kymberleigh A. Pagel and Rachel Karchin. High-throughput prediction of MHC class i and ii neoantigens with MHCnuggets. Cancer Immunology Research 8(3), 396-408 (2020). doi: 10.1158/2326-6066.CIR-19-0464. +- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/) + + > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. + - [pigz](https://zlib.net/pigz/) +- [Prodigal](https://pubmed.ncbi.nlm.nih.gov/20211023/) + + > Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010 Mar 8;11:119. doi: 10.1186/1471-2105-11-119. PMID: 20211023; PMCID: PMC2848648. 
+ +- [SYFPEITHI](https://pubmed.ncbi.nlm.nih.gov/10602881/) + + > Hans-Georg Rammensee, Jutta Bachmann, Niels Nikolaus Emmerich, Oskar Alexander Bachor, Stefan Stevanovic: SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics (1999) 50: 213-219 + ## Python Packages - [Python](https://www.python.org/) @@ -50,35 +54,31 @@ > Cock PA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B and de Hoon MJL (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25, 1422-1423. https://doi.org/10.1093/bioinformatics/btp163. -- [pandas](https://doi.org/10.5281/zenodo.3509134) - - > The pandas development team. (2023). pandas-dev/pandas: Pandas (v2.0.3). Zenodo. https://doi.org/10.5281/zenodo.8092754 - - [numpy](https://www.nature.com/articles/s41586-020-2649-2) > Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020). DOI: 10.1038/s41586-020-2649-2. https://www.nature.com/articles/s41586-020-2649-2. +- [pandas](https://doi.org/10.5281/zenodo.3509134) + + > The pandas development team. (2023). pandas-dev/pandas: Pandas (v2.0.3). Zenodo. https://doi.org/10.5281/zenodo.8092754 + ## R Packages - [R](https://www.R-project.org/) > R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/. -- [ggplot2](https://cran.r-project.org/package=ggplot2) +- [data.table](https://cran.r-project.org/package=data.table) - > H. Wickham (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. + > Dowle Matt (2022). data.table: Extension of 'data.frame'. - [dplyr](https://dplyr.tidyverse.org) > Wickham H, François R, Henry L, Müller K, Vaughan D (2023). dplyr: A Grammar of Data Manipulation. https://dplyr.tidyverse.org, https://github.com/tidyverse/dplyr. 
-- [data.table](https://cran.r-project.org/package=data.table) - - > Dowle Matt (2022). data.table: Extension of 'data.frame'. - -- [stringr](https://stringr.tidyverse.org) +- [ggplot2](https://cran.r-project.org/package=ggplot2) - > Wickham H (2022). stringr: Simple, Consistent Wrappers for Common String Operations. https://stringr.tidyverse.org, https://github.com/tidyverse/stringr. + > H. Wickham (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. - [ggpubr](https://cran.r-project.org/package=ggpubr) @@ -88,6 +88,10 @@ > Trevor L Davis (2022). optparse: Command Line Option Parser. +- [stringr](https://stringr.tidyverse.org) + + > Wickham H (2022). stringr: Simple, Consistent Wrappers for Common String Operations. https://stringr.tidyverse.org, https://github.com/tidyverse/stringr. + ## Software packaging/containerisation tools - [Anaconda](https://anaconda.com) From 54fcb6cc88a70e543f94b40b40e2f1dadbc7ac36 Mon Sep 17 00:00:00 2001 From: Till Englert Date: Fri, 16 Aug 2024 12:59:52 +0200 Subject: [PATCH 02/20] Suggestion by @mashehu --- CHANGELOG.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 433a092e..e052592e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,7 +5,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## v1.0.0 - [2022-01-20] -First release of [nf-core/metapep](https://nf-co.re/metapep), created based on [nf-core](https://nf-co.re) standards and [nf-core/tools](https://nf-co.re/tools) template version 1.14.1. +First release of [nf-core/metapep](https://nf-co.re/metapep), based on [nf-core](https://nf-co.re) standards and [nf-core/tools](https://nf-co.re/tools) template version 1.14.1. 
### `Added` From bbdd342b66938b6d45caa580cb3e2330456ad620 Mon Sep 17 00:00:00 2001 From: Till Englert Date: Fri, 16 Aug 2024 13:00:47 +0200 Subject: [PATCH 03/20] include nf-core/setup_nf-test for github ci --- .github/workflows/ci.yml | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 8db25783..e41fbb31 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -10,7 +10,7 @@ on: env: NXF_ANSI_LOG: false - NFTEST_VER: "0.8.4" + NFT_VER: "0.8.4" concurrency: group: "${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}" @@ -50,9 +50,9 @@ jobs: uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1 - name: Install nf-test - run: | - wget -qO- https://code.askimed.com/install/nf-test | bash -s $NFTEST_VER - sudo mv nf-test /usr/local/bin/ + uses: nf-core/setup-nf-test@v1 + with: + version: ${{ env.NFT_VER }} - name: Run nf-test run: | @@ -101,9 +101,9 @@ jobs: uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1 - name: Install nf-test - run: | - wget -qO- https://code.askimed.com/install/nf-test | bash -s $NFTEST_VER - sudo mv nf-test /usr/local/bin/ + uses: nf-core/setup-nf-test@v1 + with: + version: ${{ env.NFT_VER }} - name: Run nf-test env: From df13fce4127d291c22bbbdf3dd5e5725472598de Mon Sep 17 00:00:00 2001 From: Till Englert Date: Fri, 16 Aug 2024 14:30:16 +0200 Subject: [PATCH 04/20] Add metapep logo without patching mutliqc module --- modules/nf-core/multiqc/main.nf | 1 - modules/nf-core/multiqc/multiqc.diff | 13 ------------- workflows/metapep.nf | 5 +---- 3 files changed, 1 insertion(+), 18 deletions(-) delete mode 100644 modules/nf-core/multiqc/multiqc.diff diff --git a/modules/nf-core/multiqc/main.nf b/modules/nf-core/multiqc/main.nf index 61bb95bb..47ac352f 100644 --- a/modules/nf-core/multiqc/main.nf +++ b/modules/nf-core/multiqc/main.nf @@ -8,7 +8,6 @@ process MULTIQC { 
input: path multiqc_files, stageAs: "?/*" - path custom_logo path(multiqc_config) path(extra_multiqc_config) path(multiqc_logo) diff --git a/modules/nf-core/multiqc/multiqc.diff b/modules/nf-core/multiqc/multiqc.diff deleted file mode 100644 index a02a2334..00000000 --- a/modules/nf-core/multiqc/multiqc.diff +++ /dev/null @@ -1,13 +0,0 @@ -Changes in module 'nf-core/multiqc' ---- modules/nf-core/multiqc/main.nf -+++ modules/nf-core/multiqc/main.nf -@@ -8,6 +8,7 @@ - - input: - path multiqc_files, stageAs: "?/*" -+ path custom_logo - path(multiqc_config) - path(extra_multiqc_config) - path(multiqc_logo) - -************************************************************ diff --git a/workflows/metapep.nf b/workflows/metapep.nf index 7496f9e3..af2898b8 100644 --- a/workflows/metapep.nf +++ b/workflows/metapep.nf @@ -283,9 +283,7 @@ workflow METAPEP { Channel.empty() ch_multiqc_logo = params.multiqc_logo ? Channel.fromPath(params.multiqc_logo, checkIfExists: true) : - Channel.empty() - ch_metapep_logo = Channel.fromPath( - "$projectDir/assets/nf-core-metapep_logo_light.png", checkIfExists: true) + Channel.fromPath("$projectDir/assets/nf-core-metapep_logo_light.png") summary_params = paramsSummaryMap( workflow, parameters_schema: "nextflow_schema.json") @@ -309,7 +307,6 @@ workflow METAPEP { MULTIQC ( ch_multiqc_files.collect(), - ch_metapep_logo.collect(), ch_multiqc_config.toList(), ch_multiqc_custom_config.toList(), ch_multiqc_logo.toList() From 99fc5062c429806c028ad2797d02dba2d1bc16bd Mon Sep 17 00:00:00 2001 From: Till Englert Date: Fri, 16 Aug 2024 14:46:06 +0200 Subject: [PATCH 05/20] Add minimum values to chunk sizes, peptide lenghts and removed overlooked ncbi_key and email --- nextflow.config | 4 ---- nextflow_schema.json | 31 ++++++++++++++----------------- 2 files changed, 14 insertions(+), 21 deletions(-) diff --git a/nextflow.config b/nextflow.config index 426cac9a..c0349a53 100644 --- a/nextflow.config +++ b/nextflow.config @@ -15,10 +15,6 @@ params { // 
predict proteins prodigal_mode = 'meta' - // download proteins - ncbi_key = null - ncbi_email = null - // generate peptides min_pep_len = 9 max_pep_len = 11 diff --git a/nextflow_schema.json b/nextflow_schema.json index 99a9064f..3dd5f71b 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -251,27 +251,19 @@ "description": "Define pipeline options.", "help_text": "These options modify the behaviour of tools used in the pipeline", "properties": { - "ncbi_key": { - "type": "string", - "description": "Required for downloading proteins from ncbi.", - "fa_icon": "fas fa-id-card" - }, - "ncbi_email": { - "type": "string", - "description": "Required for downloading proteins from ncbi.", - "fa_icon": "fas fa-id-card" - }, "min_pep_len": { "type": "integer", "default": 9, "description": "Minimum length of produced peptides.", - "fa_icon": "fas fa-cogs" + "fa_icon": "fas fa-cogs", + "minimum": 1 }, "max_pep_len": { "type": "integer", "default": 11, "description": "Maximum length of produced peptides.", - "fa_icon": "fas fa-cogs" + "fa_icon": "fas fa-cogs", + "minimum": 2 }, "allow_inconsistent_pep_lengths": { "type": "boolean", @@ -325,33 +317,38 @@ "type": "integer", "default": 4000000, "description": "Maximum chunk size (#peptides) for epitope prediction jobs.", - "fa_icon": "fas fa-cogs" + "fa_icon": "fas fa-cogs", + "minimum": 1 }, "pred_chunk_size_scaling": { "type": "integer", "default": 10, "description": "Scaling factor for `prediction_chunk_size` parameter for usage in python scripts to reduce memory usage when handling DataFrames.", "hidden": true, - "fa_icon": "fas fa-cogs" + "fa_icon": "fas fa-cogs", + "minimum": 1 }, "downstream_chunk_size": { "type": "integer", "default": 7500000, "description": "Maximum chunk size (#epitope predictions) for processing of downstream visualisations.", - "fa_icon": "fas fa-cogs" + "fa_icon": "fas fa-cogs", + "minimum": 1 }, "max_task_num": { "type": "integer", "default": 1000, "description": "Maximum number of tasks 
submitted by `PREDICT_EPITOPES` process", - "fa_icon": "fas fa-cogs" + "fa_icon": "fas fa-cogs", + "minimum": 1 }, "pred_buffer_files": { "type": "integer", "default": 1000, "description": "Number of files, which are merged in `MERGE_PREDICTION_BUFFER`", "hidden": true, - "fa_icon": "fas fa-cogs" + "fa_icon": "fas fa-cogs", + "minimum": 1 }, "hide_pvalue": { "type": "boolean", From 6b50995fa60a849c8889362f9aee8bf81556a110 Mon Sep 17 00:00:00 2001 From: Till Englert Date: Thu, 5 Sep 2024 12:50:26 +0200 Subject: [PATCH 06/20] Code alignments --- main.nf | 2 +- modules/local/collect_stats.nf | 4 ++-- workflows/metapep.nf | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/main.nf b/main.nf index eba1e8a6..6702d18c 100644 --- a/main.nf +++ b/main.nf @@ -17,7 +17,7 @@ nextflow.enable.dsl = 2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -include { METAPEP } from './workflows/metapep' +include { METAPEP } from './workflows/metapep' include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_nfcore_metapep_pipeline' include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_metapep_pipeline' diff --git a/modules/local/collect_stats.nf b/modules/local/collect_stats.nf index 1a9ec946..04468ed1 100644 --- a/modules/local/collect_stats.nf +++ b/modules/local/collect_stats.nf @@ -14,8 +14,8 @@ process COLLECT_STATS { path(conditions ) output: - path "stats.txt", emit: ch_stats - path "versions.yml", emit: versions + path "stats.txt", emit: ch_stats + path "versions.yml", emit: versions script: """ diff --git a/workflows/metapep.nf b/workflows/metapep.nf index af2898b8..697204ba 100644 --- a/workflows/metapep.nf +++ b/workflows/metapep.nf @@ -34,7 +34,7 @@ include { PLOT_SCORE_DISTRIBUTION } from '../modules/local/plot_score_ include { PREPARE_ENTITY_BINDING_RATIOS } from '../modules/local/prepare_entity_binding_ratios' include { PLOT_ENTITY_BINDING_RATIOS } from 
'../modules/local/plot_entity_binding_ratios' -include { PROCESS_INPUT } from '../subworkflows/local/process_input' +include { PROCESS_INPUT } from '../subworkflows/local/process_input' /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 2d036748ce6c533cfddca256767b31d656a81260 Mon Sep 17 00:00:00 2001 From: Till Englert Date: Thu, 5 Sep 2024 12:52:27 +0200 Subject: [PATCH 07/20] remove duplicated publishdir in module --- modules/local/plot_entity_binding_ratios.nf | 2 -- 1 file changed, 2 deletions(-) diff --git a/modules/local/plot_entity_binding_ratios.nf b/modules/local/plot_entity_binding_ratios.nf index 1442b9ee..07a34d87 100644 --- a/modules/local/plot_entity_binding_ratios.nf +++ b/modules/local/plot_entity_binding_ratios.nf @@ -6,8 +6,6 @@ process PLOT_ENTITY_BINDING_RATIOS { 'https://depot.galaxyproject.org/singularity/mulled-v2-0be74e7b0c2e289bc8098b1491baf4f181012b1c:a1635746bc2c13635cbea8c29bd5a2837bdd7cd5-0' : 'biocontainers/mulled-v2-0be74e7b0c2e289bc8098b1491baf4f181012b1c:a1635746bc2c13635cbea8c29bd5a2837bdd7cd5-0' }" - publishDir "${params.outdir}/figures", mode: params.publish_dir_mode - input: each path(prep_entity_binding_ratios) path alleles From dddc96de46e60c0d8b7e3aa7ab7fe3d691419e37 Mon Sep 17 00:00:00 2001 From: Till Englert Date: Thu, 5 Sep 2024 13:03:48 +0200 Subject: [PATCH 08/20] shortened python package version calls --- modules/local/assign_nucl_entity_weights.nf | 2 +- modules/local/check_samplesheet_create_tables.nf | 2 +- modules/local/collect_stats.nf | 2 +- modules/local/create_protein_tsv.nf | 2 +- modules/local/download_proteins.nf | 2 +- modules/local/finalize_microbiome_entities.nf | 2 +- modules/local/generate_peptides.nf | 6 +++--- modules/local/generate_protein_and_entity_ids.nf | 6 +++--- modules/local/merge_predictions.nf | 2 +- modules/local/merge_predictions_buffer.nf | 2 +- modules/local/predict_epitopes.nf | 2 +- modules/local/prepare_entity_binding_ratios.nf | 2 +- 
modules/local/prepare_score_distribution.nf | 2 +- modules/local/split_pred_tasks.nf | 2 +- modules/local/unify_model_lengths.nf | 2 +- 15 files changed, 19 insertions(+), 19 deletions(-) diff --git a/modules/local/assign_nucl_entity_weights.nf b/modules/local/assign_nucl_entity_weights.nf index e8453e56..7bcbe9c7 100644 --- a/modules/local/assign_nucl_entity_weights.nf +++ b/modules/local/assign_nucl_entity_weights.nf @@ -27,7 +27,7 @@ process ASSIGN_NUCL_ENTITY_WEIGHTS { cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python --version | sed 's/Python //g') - pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") + pandas: \$(python -c "import pandas; print(pandas.__version__)") END_VERSIONS """ } diff --git a/modules/local/check_samplesheet_create_tables.nf b/modules/local/check_samplesheet_create_tables.nf index 0d3eb321..ddfdc5d4 100644 --- a/modules/local/check_samplesheet_create_tables.nf +++ b/modules/local/check_samplesheet_create_tables.nf @@ -60,7 +60,7 @@ process CHECK_SAMPLESHEET_CREATE_TABLES { cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python --version | sed 's/Python //g') - pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") + pandas: \$(python -c "import pandas; print(pandas.__version__)") epytope: \$(echo \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('epytope').version)")) mhcflurry: \$mhcflurry_version mhcnuggets: \$mhcnuggets_version diff --git a/modules/local/collect_stats.nf b/modules/local/collect_stats.nf index 04468ed1..f3c3659d 100644 --- a/modules/local/collect_stats.nf +++ b/modules/local/collect_stats.nf @@ -28,7 +28,7 @@ process COLLECT_STATS { cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python --version | sed 's/Python //g') - pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") + pandas: \$(python -c 
"import pandas; print(pandas.__version__)") END_VERSIONS """ diff --git a/modules/local/create_protein_tsv.nf b/modules/local/create_protein_tsv.nf index 8a30113f..02ead03f 100644 --- a/modules/local/create_protein_tsv.nf +++ b/modules/local/create_protein_tsv.nf @@ -24,7 +24,7 @@ process CREATE_PROTEIN_TSV { cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python --version | sed 's/Python //g') - biopython: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('biopython').version)") + biopython: \$(python -c "import Bio; print(Bio.__version__)") END_VERSIONS """ } diff --git a/modules/local/download_proteins.nf b/modules/local/download_proteins.nf index 35f5788e..e68b6cc1 100644 --- a/modules/local/download_proteins.nf +++ b/modules/local/download_proteins.nf @@ -41,7 +41,7 @@ process DOWNLOAD_PROTEINS { cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python --version | sed 's/Python //g') - biopython: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('biopython').version)") + biopython: \$(python -c "import Bio; print(Bio.__version__)") END_VERSIONS """ } diff --git a/modules/local/finalize_microbiome_entities.nf b/modules/local/finalize_microbiome_entities.nf index 49c95523..ce0303a4 100644 --- a/modules/local/finalize_microbiome_entities.nf +++ b/modules/local/finalize_microbiome_entities.nf @@ -29,7 +29,7 @@ process FINALIZE_MICROBIOME_ENTITIES { cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python --version | sed 's/Python //g') - pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") + pandas: \$(python -c "import pandas; print(pandas.__version__)") END_VERSIONS """ } diff --git a/modules/local/generate_peptides.nf b/modules/local/generate_peptides.nf index 689eb967..65d55e33 100644 --- a/modules/local/generate_peptides.nf +++ b/modules/local/generate_peptides.nf @@ -31,9 +31,9 @@ process GENERATE_PEPTIDES { cat 
<<-END_VERSIONS > versions.yml "${task.process}": python: \$(python --version | sed 's/Python //g') - pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") - biopython: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('biopython').version)") - numpy: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('numpy').version)") + pandas: \$(python -c "import pandas; print(pandas.__version__)") + biopython: \$(python -c "import Bio; print(Bio.__version__)") + numpy: \$(python -c "import numpy; print(numpy.__version__)") END_VERSIONS """ } diff --git a/modules/local/generate_protein_and_entity_ids.nf b/modules/local/generate_protein_and_entity_ids.nf index 828476e7..c40e088a 100644 --- a/modules/local/generate_protein_and_entity_ids.nf +++ b/modules/local/generate_protein_and_entity_ids.nf @@ -42,9 +42,9 @@ process GENERATE_PROTEIN_AND_ENTITY_IDS { cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python --version | sed 's/Python //g') - pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") - biopython: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('biopython').version)") - numpy: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('numpy').version)") + pandas: \$(python -c "import pandas; print(pandas.__version__)") + biopython: \$(python -c "import Bio; print(Bio.__version__)") + numpy: \$(python -c "import numpy; print(numpy.__version__)") END_VERSIONS """ } diff --git a/modules/local/merge_predictions.nf b/modules/local/merge_predictions.nf index c021ed2d..80ddf6ba 100644 --- a/modules/local/merge_predictions.nf +++ b/modules/local/merge_predictions.nf @@ -25,7 +25,7 @@ process MERGE_PREDICTIONS { cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python --version | sed 's/Python //g') - pandas: \$(python -c "import pkg_resources; 
print(pkg_resources.get_distribution('pandas').version)") + pandas: \$(python -c "import pandas; print(pandas.__version__)") END_VERSIONS """ } diff --git a/modules/local/merge_predictions_buffer.nf b/modules/local/merge_predictions_buffer.nf index 449f6c83..e78cc6fd 100644 --- a/modules/local/merge_predictions_buffer.nf +++ b/modules/local/merge_predictions_buffer.nf @@ -28,7 +28,7 @@ process MERGE_PREDICTIONS_BUFFER { cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python --version | sed 's/Python //g') - pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") + pandas: \$(python -c "import pandas; print(pandas.__version__)") END_VERSIONS """ } diff --git a/modules/local/predict_epitopes.nf b/modules/local/predict_epitopes.nf index 55e4d93d..53cef5c9 100644 --- a/modules/local/predict_epitopes.nf +++ b/modules/local/predict_epitopes.nf @@ -82,7 +82,7 @@ process PREDICT_EPITOPES { "${task.process}": python: \$(python --version 2>&1 | sed 's/Python //g') epytope: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('epytope').version)") - pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") + pandas: \$(python -c "import pandas; print(pandas.__version__)") mhcflurry: \$mhcflurry_version mhcnuggets: \$mhcnuggets_version syfpeithi: \$syfpeithi_version diff --git a/modules/local/prepare_entity_binding_ratios.nf b/modules/local/prepare_entity_binding_ratios.nf index 22b2f00c..e6216cdc 100644 --- a/modules/local/prepare_entity_binding_ratios.nf +++ b/modules/local/prepare_entity_binding_ratios.nf @@ -44,7 +44,7 @@ process PREPARE_ENTITY_BINDING_RATIOS { cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python --version | sed 's/Python //g') - pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") + pandas: \$(python -c "import pandas; print(pandas.__version__)") END_VERSIONS """ 
} diff --git a/modules/local/prepare_score_distribution.nf b/modules/local/prepare_score_distribution.nf index 881f86fc..fb16b716 100644 --- a/modules/local/prepare_score_distribution.nf +++ b/modules/local/prepare_score_distribution.nf @@ -39,7 +39,7 @@ process PREPARE_SCORE_DISTRIBUTION { cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python --version | sed 's/Python //g') - pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") + pandas: \$(python -c "import pandas; print(pandas.__version__)") END_VERSIONS """ } diff --git a/modules/local/split_pred_tasks.nf b/modules/local/split_pred_tasks.nf index d4952430..ed0f2c6a 100644 --- a/modules/local/split_pred_tasks.nf +++ b/modules/local/split_pred_tasks.nf @@ -45,7 +45,7 @@ process SPLIT_PRED_TASKS { cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python --version | sed 's/Python //g') - pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") + pandas: \$(python -c "import pandas; print(pandas.__version__)") END_VERSIONS """ diff --git a/modules/local/unify_model_lengths.nf b/modules/local/unify_model_lengths.nf index a34511de..a0800c85 100644 --- a/modules/local/unify_model_lengths.nf +++ b/modules/local/unify_model_lengths.nf @@ -32,7 +32,7 @@ process UNIFY_MODEL_LENGTHS { cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python --version | sed 's/Python //g') - pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") + pandas: \$(python -c "import pandas; print(pandas.__version__)") epytope: \$(echo \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('epytope').version)")) syfpeithi: $syfpeithi_version END_VERSIONS From e9685704b185bb98e386ed94e2eb77080ca4c8fc Mon Sep 17 00:00:00 2001 From: Till Englert Date: Mon, 30 Sep 2024 08:06:29 +0200 Subject: [PATCH 09/20] Licensing --- LICENSE | 2 +- 
bin/assign_entity_weights.py | 1 + bin/check_samplesheet_create_tables.py | 1 + bin/collect_stats.py | 1 + bin/concat_tsv.py | 1 + bin/download_proteins_entrez.py | 3 ++- bin/epytope_predict.py | 1 + bin/fasta_to_tsv.py | 1 + bin/finalize_microbiome_entities.py | 1 + bin/gen_prediction_chunks.py | 1 + bin/generate_peptides.py | 1 + bin/generate_protein_and_entity_ids.py | 1 + bin/plot_entity_binding_ratios.R | 1 + bin/plot_score_distribution.R | 1 + bin/prepare_entity_binding_ratios.py | 1 + bin/prepare_score_distribution.py | 1 + bin/show_supported_models.py | 1 + bin/unify_model_lengths.py | 2 +- 18 files changed, 19 insertions(+), 3 deletions(-) diff --git a/LICENSE b/LICENSE index bb43d8b9..16073369 100644 --- a/LICENSE +++ b/LICENSE @@ -1,6 +1,6 @@ MIT License -Copyright (c) Sabrina Krakau, Leon Kuchenbecker and Till Englert +Copyright (c) Sabrina Krakau, Leon Kuchenbecker, and Till Englert Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal diff --git a/bin/assign_entity_weights.py b/bin/assign_entity_weights.py index 5f68f09c..c4a2a0e0 100755 --- a/bin/assign_entity_weights.py +++ b/bin/assign_entity_weights.py @@ -1,4 +1,5 @@ #!/usr/bin/env python3 +# Written by Sabrina Krakau, Leon Kuchenbecker, and Till Englert under the MIT license import argparse import sys diff --git a/bin/check_samplesheet_create_tables.py b/bin/check_samplesheet_create_tables.py index b62e2685..1f4b425a 100755 --- a/bin/check_samplesheet_create_tables.py +++ b/bin/check_samplesheet_create_tables.py @@ -1,4 +1,5 @@ #!/usr/bin/env python +# Written by Sabrina Krakau, Leon Kuchenbecker, and Till Englert under the MIT license import argparse import sys diff --git a/bin/collect_stats.py b/bin/collect_stats.py index 1638cf68..558950f1 100755 --- a/bin/collect_stats.py +++ b/bin/collect_stats.py @@ -1,4 +1,5 @@ #!/usr/bin/env python3 +# Written by Sabrina Krakau, Leon Kuchenbecker, and 
Till Englert under the MIT license import argparse import sys diff --git a/bin/concat_tsv.py b/bin/concat_tsv.py index 174a2612..61bb2731 100755 --- a/bin/concat_tsv.py +++ b/bin/concat_tsv.py @@ -1,4 +1,5 @@ #!/usr/bin/env python3 +# Written by Sabrina Krakau, Leon Kuchenbecker, and Till Englert under the MIT license import argparse import sys diff --git a/bin/download_proteins_entrez.py b/bin/download_proteins_entrez.py index 741355cf..795abd7c 100755 --- a/bin/download_proteins_entrez.py +++ b/bin/download_proteins_entrez.py @@ -1,6 +1,7 @@ #!/usr/bin/env python3 +# Written by Sabrina Krakau, Leon Kuchenbecker, and Till Englert under the MIT license -# for each strain: select largest assembly (for now) +# for each strain: select largest assembly or given specific (for now) import argparse import csv diff --git a/bin/epytope_predict.py b/bin/epytope_predict.py index e00dae8b..ca27d272 100755 --- a/bin/epytope_predict.py +++ b/bin/epytope_predict.py @@ -1,4 +1,5 @@ #!/usr/bin/env python3 +# Written by Sabrina Krakau, Leon Kuchenbecker, and Till Englert under the MIT license import argparse import contextlib diff --git a/bin/fasta_to_tsv.py b/bin/fasta_to_tsv.py index 4c780943..c2f57454 100755 --- a/bin/fasta_to_tsv.py +++ b/bin/fasta_to_tsv.py @@ -1,4 +1,5 @@ #!/usr/bin/env python3 +# Written by Sabrina Krakau, Leon Kuchenbecker, and Till Englert under the MIT license import argparse import gzip diff --git a/bin/finalize_microbiome_entities.py b/bin/finalize_microbiome_entities.py index 98447002..b610525d 100755 --- a/bin/finalize_microbiome_entities.py +++ b/bin/finalize_microbiome_entities.py @@ -1,4 +1,5 @@ #!/usr/bin/env python3 +# Written by Sabrina Krakau, Leon Kuchenbecker, and Till Englert under the MIT license import argparse import sys diff --git a/bin/gen_prediction_chunks.py b/bin/gen_prediction_chunks.py index 80f7bce1..12dbbc5d 100755 --- a/bin/gen_prediction_chunks.py +++ b/bin/gen_prediction_chunks.py @@ -1,4 +1,5 @@ #!/usr/bin/env python3 +# 
Written by Sabrina Krakau, Leon Kuchenbecker, and Till Englert under the MIT license import argparse import os diff --git a/bin/generate_peptides.py b/bin/generate_peptides.py index 097f7c70..0f2f7f8a 100755 --- a/bin/generate_peptides.py +++ b/bin/generate_peptides.py @@ -1,4 +1,5 @@ #!/usr/bin/env python3 +# Written by Sabrina Krakau, Leon Kuchenbecker, and Till Englert under the MIT license import argparse import gzip diff --git a/bin/generate_protein_and_entity_ids.py b/bin/generate_protein_and_entity_ids.py index 793fe1c3..90f11075 100755 --- a/bin/generate_protein_and_entity_ids.py +++ b/bin/generate_protein_and_entity_ids.py @@ -1,4 +1,5 @@ #!/usr/bin/env python3 +# Written by Sabrina Krakau, Leon Kuchenbecker, and Till Englert under the MIT license # NOTE # entrez proteins of all microbiome input files already within one file (proteins.entrez.tsv.gz) diff --git a/bin/plot_entity_binding_ratios.R b/bin/plot_entity_binding_ratios.R index 2935dad3..301db280 100755 --- a/bin/plot_entity_binding_ratios.R +++ b/bin/plot_entity_binding_ratios.R @@ -1,4 +1,5 @@ #!/usr/bin/env Rscript +# Written by Sabrina Krakau, Leon Kuchenbecker, and Till Englert under the MIT license library(optparse) library(ggplot2) diff --git a/bin/plot_score_distribution.R b/bin/plot_score_distribution.R index 29c92d68..a999a2b8 100755 --- a/bin/plot_score_distribution.R +++ b/bin/plot_score_distribution.R @@ -1,4 +1,5 @@ #!/usr/bin/env Rscript +# Written by Sabrina Krakau, Leon Kuchenbecker, and Till Englert under the MIT license library(ggplot2) library(data.table) diff --git a/bin/prepare_entity_binding_ratios.py b/bin/prepare_entity_binding_ratios.py index a4fb8773..0608765f 100755 --- a/bin/prepare_entity_binding_ratios.py +++ b/bin/prepare_entity_binding_ratios.py @@ -1,4 +1,5 @@ #!/usr/bin/env python3 +# Written by Sabrina Krakau, Leon Kuchenbecker, and Till Englert under the MIT license import argparse import datetime diff --git a/bin/prepare_score_distribution.py 
b/bin/prepare_score_distribution.py index 0cd5be80..33c8d1d9 100755 --- a/bin/prepare_score_distribution.py +++ b/bin/prepare_score_distribution.py @@ -1,4 +1,5 @@ #!/usr/bin/env python3 +# Written by Sabrina Krakau, Leon Kuchenbecker, and Till Englert under the MIT license import argparse import datetime diff --git a/bin/show_supported_models.py b/bin/show_supported_models.py index 05dc4c61..b7957515 100755 --- a/bin/show_supported_models.py +++ b/bin/show_supported_models.py @@ -1,4 +1,5 @@ #!/usr/bin/env python +# Written by Sabrina Krakau, Leon Kuchenbecker, and Till Englert under the MIT license # This script originates from the nf-core/epitopeprediction pipeline and is modified and refactored for use in nf-core/metapep diff --git a/bin/unify_model_lengths.py b/bin/unify_model_lengths.py index ffba09d0..44c7da55 100755 --- a/bin/unify_model_lengths.py +++ b/bin/unify_model_lengths.py @@ -1,5 +1,5 @@ #!/usr/bin/env python3 - +# Written by Sabrina Krakau, Leon Kuchenbecker, and Till Englert under the MIT license import argparse import sys From 83911008be87ff62816a734912bac1b02ed7365d Mon Sep 17 00:00:00 2001 From: Till Englert Date: Tue, 1 Oct 2024 09:39:05 +0200 Subject: [PATCH 10/20] gen_prediction_chunks change to main() and move global var --- bin/gen_prediction_chunks.py | 31 +++++++++++++++---------------- 1 file changed, 15 insertions(+), 16 deletions(-) diff --git a/bin/gen_prediction_chunks.py b/bin/gen_prediction_chunks.py index 12dbbc5d..fdd69f29 100755 --- a/bin/gen_prediction_chunks.py +++ b/bin/gen_prediction_chunks.py @@ -8,7 +8,8 @@ import pandas as pd #################################################################################################### - +global cur_chunk +#################################################################################################### def parse_args(): """Parses the command line arguments specified by the user.""" @@ -81,14 +82,11 @@ def parse_args(): return parser.parse_args() -def write_chunks(data, alleles, 
max_task_per_allele, remainder=False, pbar=None): +def write_chunks(data, alleles, max_task_per_allele, max_chunk_size, outdir, remainder=False, pbar=None): """Takes data in form of a table of peptide_id, peptide_sequence and identical allele_name values. The data is partitioned into chunks and written into individual output files, prepended with a comment line (#) indicating the allele name.""" - global cur_chunk - - max_chunk_size = args.max_chunk_size # Dynamically increase the chunk size dependent on the maximum number of allowed processes. if len(data)/max_chunk_size > max_task_per_allele: @@ -104,14 +102,14 @@ def write_chunks(data, alleles, max_task_per_allele, remainder=False, pbar=None) for start in range(0, len(data), max_chunk_size): # if not handling remainder: only write out full chunks here if remainder or len(data) - start >= max_chunk_size: - with open(os.path.join(args.outdir, "peptides_" + str(cur_chunk).rjust(5, "0") + ".txt"), "w") as outfile: + with open(os.path.join(outdir, "peptides_" + str(globals()["cur_chunk"]).rjust(5, "0") + ".txt"), "w") as outfile: print(f"#{allele_name}#{data.iloc[0].allele_id}", file=outfile) write = data.iloc[start : start + max_chunk_size] written = written.append(data.index[start : start + max_chunk_size]) if pbar: pbar.update(len(write)) write[["peptide_id", "peptide_sequence"]].to_csv(outfile, sep="\t", index=False) - cur_chunk = cur_chunk + 1 + globals()["cur_chunk"] = globals()["cur_chunk"] + 1 # delete chunks that were written out already data.drop(written, inplace=True) @@ -119,7 +117,9 @@ def write_chunks(data, alleles, max_task_per_allele, remainder=False, pbar=None) #################################################################################################### -try: +def main(): + + # Parse command line arguments args = parse_args() @@ -194,7 +194,7 @@ def write_chunks(data, alleles, max_task_per_allele, remainder=False, pbar=None) print("\nInfo: proteins_allele_info", flush=True) 
proteins_allele_info.info(verbose=False, memory_usage=print_mem) - cur_chunk = 0 + globals()["cur_chunk"] = 0 requests = 0 keep = pd.DataFrame() @@ -233,7 +233,7 @@ def write_chunks(data, alleles, max_task_per_allele, remainder=False, pbar=None) keep = ( pd.concat([keep, to_predict], ignore_index=True) .groupby("allele_id", group_keys=False) - .apply(lambda x: write_chunks(x, alleles, max_task_per_allele)) + .apply(lambda x: write_chunks(x, alleles, max_task_per_allele, max_chunk_size=args.max_chunk_size, outdir=args.outdir)) ) # use group_keys=False to avoid generation of extra index with "allele_id" @@ -241,11 +241,10 @@ def write_chunks(data, alleles, max_task_per_allele, remainder=False, pbar=None) keep.info(verbose=False, memory_usage=print_mem) # Write out remaining peptides - keep.groupby("allele_id", group_keys=False).apply(lambda x: write_chunks(x, alleles, max_task_per_allele, remainder=True)) + keep.groupby("allele_id", group_keys=False).apply(lambda x: write_chunks(x, alleles, max_task_per_allele, remainder=True, max_chunk_size=args.max_chunk_size, outdir=args.outdir)) # We're happy if we got here - print(f"All done. Written {requests} peptide prediction requests into {cur_chunk} chunks.") - sys.exit(0) -except KeyboardInterrupt: - print("\nUser aborted.", file=sys.stderr) - sys.exit(1) + print(f"All done. 
Written {requests} peptide prediction requests into {globals()['cur_chunk']} chunks.") + +if __name__ == "__main__": + sys.exit(main()) From 59e7a2c8d8bd1c17ce4a3ce80d7d13f06865577b Mon Sep 17 00:00:00 2001 From: Till Englert Date: Tue, 1 Oct 2024 09:47:30 +0200 Subject: [PATCH 11/20] Add verbose error to gen_prediction_chunks --- bin/gen_prediction_chunks.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/bin/gen_prediction_chunks.py b/bin/gen_prediction_chunks.py index fdd69f29..aab293c2 100755 --- a/bin/gen_prediction_chunks.py +++ b/bin/gen_prediction_chunks.py @@ -94,7 +94,7 @@ def write_chunks(data, alleles, max_task_per_allele, max_chunk_size, outdir, rem max_chunk_size = int(len(data)/max_task_per_allele)+1 # Make sure that all peptides end up in chunks if remainder and len(data) > max_chunk_size: - print("ERROR: Something went wrong!", file=sys.stderr) + print("ERROR: Something went wrong! The remainder is larger than the allowed chunk size.", file=sys.stderr) sys.exit(1) allele_name = alleles[alleles["allele_id"] == data.iloc[0].allele_id]["allele_name"].iloc[0] From d3571cc712328bba83b80ac9d72af30fefc3ae76 Mon Sep 17 00:00:00 2001 From: Till Englert Date: Tue, 1 Oct 2024 13:50:01 +0200 Subject: [PATCH 12/20] Fix indentations in modules --- modules/local/check_samplesheet_create_tables.nf | 12 ++++++------ modules/local/collect_stats.nf | 2 +- modules/local/create_protein_tsv.nf | 4 ++-- modules/local/download_proteins.nf | 12 ++++++------ modules/local/epytope_show_supported_models.nf | 2 +- modules/local/finalize_microbiome_entities.nf | 4 ++-- modules/local/generate_peptides.nf | 8 ++++---- modules/local/generate_protein_and_entity_ids.nf | 14 +++++++------- modules/local/merge_predictions.nf | 6 +++--- modules/local/merge_predictions_buffer.nf | 6 +++--- modules/local/plot_entity_binding_ratios.nf | 4 ++-- modules/local/plot_score_distribution.nf | 6 +++--- modules/local/predict_epitopes.nf | 6 +++---
modules/local/prepare_score_distribution.nf | 4 ++-- modules/local/split_pred_tasks.nf | 12 ++++++------ modules/local/unify_model_lengths.nf | 6 +++--- modules/local/unpack_bin_archives.nf | 4 ++-- 17 files changed, 56 insertions(+), 56 deletions(-) diff --git a/modules/local/check_samplesheet_create_tables.nf b/modules/local/check_samplesheet_create_tables.nf index ddfdc5d4..df5342c2 100644 --- a/modules/local/check_samplesheet_create_tables.nf +++ b/modules/local/check_samplesheet_create_tables.nf @@ -11,12 +11,12 @@ process CHECK_SAMPLESHEET_CREATE_TABLES { path samplesheet output: - path "microbiomes.tsv" , emit: microbiomes // microbiome_id, microbiome_path, microbiome_type, weights_path, microbiome_bare_id - path "conditions.tsv" , emit: conditions // condition_id, condition_name, microbiome_id - path "alleles.tsv" , emit: alleles // allele_id, allele_name - path "conditions_alleles.tsv" , emit: conditions_alleles // condition_id, allele_id - path "samplesheet.valid.csv" , emit: samplesheet_valid - path "versions.yml" , emit: versions + path "microbiomes.tsv" , emit: microbiomes // microbiome_id, microbiome_path, microbiome_type, weights_path, microbiome_bare_id + path "conditions.tsv" , emit: conditions // condition_id, condition_name, microbiome_id + path "alleles.tsv" , emit: alleles // allele_id, allele_name + path "conditions_alleles.tsv", emit: conditions_alleles // condition_id, allele_id + path "samplesheet.valid.csv" , emit: samplesheet_valid + path "versions.yml" , emit: versions when: task.ext.when == null || task.ext.when diff --git a/modules/local/collect_stats.nf b/modules/local/collect_stats.nf index f3c3659d..1ae53e07 100644 --- a/modules/local/collect_stats.nf +++ b/modules/local/collect_stats.nf @@ -14,7 +14,7 @@ process COLLECT_STATS { path(conditions ) output: - path "stats.txt", emit: ch_stats + path "stats.txt" , emit: ch_stats path "versions.yml", emit: versions script: diff --git a/modules/local/create_protein_tsv.nf 
b/modules/local/create_protein_tsv.nf index 02ead03f..fcd9411e 100644 --- a/modules/local/create_protein_tsv.nf +++ b/modules/local/create_protein_tsv.nf @@ -11,8 +11,8 @@ process CREATE_PROTEIN_TSV { tuple val(meta), path(protein_fasta) output: - tuple val(meta), path("proteins.pred_${meta.id}*.tsv.gz") , emit: ch_pred_proteins // Emit protein tsv - path "versions.yml" , emit: versions + tuple val(meta), path("proteins.pred_${meta.id}*.tsv.gz"), emit: ch_pred_proteins // Emit protein tsv + path "versions.yml" , emit: versions script: name = meta.bin_basename ? "${meta.id}.${meta.bin_basename}" : "${meta.id}" diff --git a/modules/local/download_proteins.nf b/modules/local/download_proteins.nf index e68b6cc1..3b65608a 100644 --- a/modules/local/download_proteins.nf +++ b/modules/local/download_proteins.nf @@ -16,12 +16,12 @@ process DOWNLOAD_PROTEINS { path microbiome_files output: - path "proteins.entrez.tsv.gz" , emit: ch_entrez_proteins - path "taxa_assemblies.tsv" , emit: ch_entrez_assemblies - path "entities_proteins.entrez.tsv" , emit: ch_entrez_entities_proteins // protein_tmp_id (accessionVersion), entity_name (taxon_id) - path "microbiomes_entities.entrez.tsv" , emit: ch_entrez_microbiomes_entities // entity_name, microbiome_id, entity_weight - path "download_proteins.log" , emit: log - path "versions.yml" , emit: versions + path "proteins.entrez.tsv.gz" , emit: ch_entrez_proteins + path "taxa_assemblies.tsv" , emit: ch_entrez_assemblies + path "entities_proteins.entrez.tsv" , emit: ch_entrez_entities_proteins // protein_tmp_id (accessionVersion), entity_name (taxon_id) + path "microbiomes_entities.entrez.tsv", emit: ch_entrez_microbiomes_entities // entity_name, microbiome_id, entity_weight + path "download_proteins.log" , emit: log + path "versions.yml" , emit: versions script: def microbiome_ids = microbiome_ids.join(' ') diff --git a/modules/local/epytope_show_supported_models.nf b/modules/local/epytope_show_supported_models.nf index 7ba4431c..5f1243f7 
100644 --- a/modules/local/epytope_show_supported_models.nf +++ b/modules/local/epytope_show_supported_models.nf @@ -7,7 +7,7 @@ process EPYTOPE_SHOW_SUPPORTED_MODELS { 'biocontainers/epytope:3.3.1--pyh7cba7a3_0' }" output: - path "*.txt", emit: txt + path "*.txt" , emit: txt path "versions.yml", emit: versions script: diff --git a/modules/local/finalize_microbiome_entities.nf b/modules/local/finalize_microbiome_entities.nf index ce0303a4..7627f1c0 100644 --- a/modules/local/finalize_microbiome_entities.nf +++ b/modules/local/finalize_microbiome_entities.nf @@ -13,8 +13,8 @@ process FINALIZE_MICROBIOME_ENTITIES { path(entities) output: - path "microbiomes_entities.tsv" , emit: ch_microbiomes_entities // entity_id, microbiome_id, entity_weight - path "versions.yml" , emit: versions + path "microbiomes_entities.tsv", emit: ch_microbiomes_entities // entity_id, microbiome_id, entity_weight + path "versions.yml" , emit: versions script: diff --git a/modules/local/generate_peptides.nf b/modules/local/generate_peptides.nf index 65d55e33..53593b4a 100644 --- a/modules/local/generate_peptides.nf +++ b/modules/local/generate_peptides.nf @@ -13,13 +13,13 @@ process GENERATE_PEPTIDES { val(peptide_lengths) output: - path "peptides.tsv.gz", emit: ch_peptides // peptide_id, peptide_sequence - path "proteins_peptides.tsv", emit: ch_proteins_peptides // protein_id, peptide_id, count - path "versions.yml", emit: versions + path "peptides.tsv.gz" , emit: ch_peptides // peptide_id, peptide_sequence + path "proteins_peptides.tsv", emit: ch_proteins_peptides // protein_id, peptide_id, count + path "versions.yml" , emit: versions //file "proteins_lengths.tsv" script: - def mem_log_level = params.memory_usage_log_deep ? "--mem_log_level_deep" : "" + def mem_log_level = params.memory_usage_log_deep ? 
"--mem_log_level_deep" : "" """ generate_peptides.py -i $proteins \\ -p "peptides.tsv.gz" \\ diff --git a/modules/local/generate_protein_and_entity_ids.nf b/modules/local/generate_protein_and_entity_ids.nf index c40e088a..a06ddee5 100644 --- a/modules/local/generate_protein_and_entity_ids.nf +++ b/modules/local/generate_protein_and_entity_ids.nf @@ -15,15 +15,15 @@ process GENERATE_PROTEIN_AND_ENTITY_IDS { path(entrez_microbiomes_entities) output: - path "proteins.tsv.gz" , emit: ch_proteins - path "entities_proteins.tsv" , emit: ch_entities_proteins - path "entities.tsv" , emit: ch_entities - path "microbiomes_entities.no_weights.tsv" , emit: ch_microbiomes_entities_noweights // microbiome_id, entitiy_id (no weights yet!) - path "versions.yml" , emit: versions + path "proteins.tsv.gz" , emit: ch_proteins + path "entities_proteins.tsv" , emit: ch_entities_proteins + path "entities.tsv" , emit: ch_entities + path "microbiomes_entities.no_weights.tsv", emit: ch_microbiomes_entities_noweights // microbiome_id, entitiy_id (no weights yet!) 
+ path "versions.yml" , emit: versions script: - predicted_proteins_microbiome_ids = predicted_proteins_meta.collect { meta -> meta.id }.join(' ') - predicted_proteins_bin_basenames = predicted_proteins_meta.collect { meta -> meta.bin_basename ?: "__ISASSEMBLY__" }.join(' ') + predicted_proteins_microbiome_ids = predicted_proteins_meta.collect { meta -> meta.id }.join(' ') + predicted_proteins_bin_basenames = predicted_proteins_meta.collect { meta -> meta.bin_basename ?: "__ISASSEMBLY__" }.join(' ') """ generate_protein_and_entity_ids.py \ diff --git a/modules/local/merge_predictions.nf b/modules/local/merge_predictions.nf index 80ddf6ba..9bbb1a28 100644 --- a/modules/local/merge_predictions.nf +++ b/modules/local/merge_predictions.nf @@ -12,9 +12,9 @@ process MERGE_PREDICTIONS { path prediction_warnings output: - path "predictions.tsv.gz", emit: ch_predictions - path "prediction_warnings.log", emit: ch_prediction_warnings - path "versions.yml", emit: versions + path "predictions.tsv.gz" , emit: ch_predictions + path "prediction_warnings.log", emit: ch_prediction_warnings + path "versions.yml" , emit: versions script: def chunk_size = params.prediction_chunk_size * params.pred_chunk_size_scaling diff --git a/modules/local/merge_predictions_buffer.nf b/modules/local/merge_predictions_buffer.nf index e78cc6fd..6ba77147 100644 --- a/modules/local/merge_predictions_buffer.nf +++ b/modules/local/merge_predictions_buffer.nf @@ -11,9 +11,9 @@ process MERGE_PREDICTIONS_BUFFER { path prediction_warnings output: - path "predictions.buffer_*.tsv", emit: ch_predictions_merged_buffer - path "prediction_warnings.buffer_*.log", emit: ch_prediction_warnings_merged_buffer - path "versions.yml", emit: versions + path "predictions.buffer_*.tsv" , emit: ch_predictions_merged_buffer + path "prediction_warnings.buffer_*.log", emit: ch_prediction_warnings_merged_buffer + path "versions.yml" , emit: versions script: def chunk_size = params.prediction_chunk_size * 
params.pred_chunk_size_scaling diff --git a/modules/local/plot_entity_binding_ratios.nf b/modules/local/plot_entity_binding_ratios.nf index 07a34d87..cf8014e2 100644 --- a/modules/local/plot_entity_binding_ratios.nf +++ b/modules/local/plot_entity_binding_ratios.nf @@ -11,8 +11,8 @@ process PLOT_ENTITY_BINDING_RATIOS { path alleles output: - path "entity_binding_ratios.*.pdf", emit: ch_plot_entity_binding_ratios - path "versions.yml", emit: versions + path "entity_binding_ratios.*.pdf", emit: ch_plot_entity_binding_ratios + path "versions.yml" , emit: versions script: def hide_pvalue = params.hide_pvalue ? "TRUE" : "FALSE" diff --git a/modules/local/plot_score_distribution.nf b/modules/local/plot_score_distribution.nf index 942f8dba..e50881c9 100644 --- a/modules/local/plot_score_distribution.nf +++ b/modules/local/plot_score_distribution.nf @@ -12,12 +12,12 @@ process PLOT_SCORE_DISTRIBUTION { path conditions output: - path "prediction_score_distribution.*.pdf", emit: ch_plot_score_distribution - path "versions.yml", emit: versions + path "prediction_score_distribution.*.pdf", emit: ch_plot_score_distribution + path "versions.yml" , emit: versions script: def syfpeithi_threshold = params.syfpeithi_score_threshold - def mhcfn_threshold = params.mhcflurry_mhcnuggets_score_threshold + def mhcfn_threshold = params.mhcflurry_mhcnuggets_score_threshold """ [[ ${prep_scores} =~ prediction_scores.allele_(.*).tsv ]]; allele_id="\${BASH_REMATCH[1]}" diff --git a/modules/local/predict_epitopes.nf b/modules/local/predict_epitopes.nf index 53cef5c9..bf18cafb 100644 --- a/modules/local/predict_epitopes.nf +++ b/modules/local/predict_epitopes.nf @@ -12,9 +12,9 @@ process PREDICT_EPITOPES { path(peptides) output: - path "*predictions.tsv", emit: ch_epitope_predictions - path "*pred_warnings.log", emit: ch_epitope_prediction_warnings - path "versions.yml", emit: versions + path "*predictions.tsv" , emit: ch_epitope_predictions + path "*pred_warnings.log", emit: 
ch_epitope_prediction_warnings + path "versions.yml" , emit: versions script: """ diff --git a/modules/local/prepare_score_distribution.nf b/modules/local/prepare_score_distribution.nf index fb16b716..2159721a 100644 --- a/modules/local/prepare_score_distribution.nf +++ b/modules/local/prepare_score_distribution.nf @@ -22,8 +22,8 @@ process PREPARE_SCORE_DISTRIBUTION { path "versions.yml" , emit: versions script: - def chunk_size = params.downstream_chunk_size - def mem_log_level = params.memory_usage_log_deep ? "--mem_log_level_deep" : "" + def chunk_size = params.downstream_chunk_size + def mem_log_level = params.memory_usage_log_deep ? "--mem_log_level_deep" : "" """ prepare_score_distribution.py --predictions "$predictions" \\ --protein-peptide-occ "$proteins_peptides" \\ diff --git a/modules/local/split_pred_tasks.nf b/modules/local/split_pred_tasks.nf index ed0f2c6a..c5a04d7a 100644 --- a/modules/local/split_pred_tasks.nf +++ b/modules/local/split_pred_tasks.nf @@ -20,14 +20,14 @@ process SPLIT_PRED_TASKS { // and thus to enumerate, which (peptide, allele) combinations have to be predicted. output: - path "peptides_*.txt", emit: ch_epitope_prediction_chunks - path "versions.yml", emit: versions + path "peptides_*.txt", emit: ch_epitope_prediction_chunks + path "versions.yml" , emit: versions script: - def max_chunk_num = params.max_task_num - def pred_chunk_size = params.prediction_chunk_size - def proc_chunk_size = params.prediction_chunk_size * params.pred_chunk_size_scaling - def mem_log_level = params.memory_usage_log_deep ? "--mem_log_level_deep" : "" + def max_chunk_num = params.max_task_num + def pred_chunk_size = params.prediction_chunk_size + def proc_chunk_size = params.prediction_chunk_size * params.pred_chunk_size_scaling + def mem_log_level = params.memory_usage_log_deep ? 
"--mem_log_level_deep" : "" """ gen_prediction_chunks.py --peptides "$peptides" \\ --protein-peptide-occ "$proteins_peptides" \\ diff --git a/modules/local/unify_model_lengths.nf b/modules/local/unify_model_lengths.nf index a0800c85..8724c735 100644 --- a/modules/local/unify_model_lengths.nf +++ b/modules/local/unify_model_lengths.nf @@ -11,9 +11,9 @@ process UNIFY_MODEL_LENGTHS { path samplesheet_valid output: - path "*_unify_peptide_lengths.log" , emit: log - env unified_peptide_lengths , emit: unified_pep_lens - path "versions.yml" , emit: versions + path "*_unify_peptide_lengths.log", emit: log + env unified_peptide_lengths , emit: unified_pep_lens + path "versions.yml" , emit: versions when: task.ext.when == null || task.ext.when diff --git a/modules/local/unpack_bin_archives.nf b/modules/local/unpack_bin_archives.nf index 5a1f5daf..85092be9 100644 --- a/modules/local/unpack_bin_archives.nf +++ b/modules/local/unpack_bin_archives.nf @@ -11,8 +11,8 @@ process UNPACK_BIN_ARCHIVES { tuple val(meta), path(microbiome_path) output: - tuple val(meta), path("unpacked/*") , emit: ch_microbiomes_bins_archives_unpacked - path "versions.yml" , emit: versions + tuple val(meta), path("unpacked/*"), emit: ch_microbiomes_bins_archives_unpacked + path "versions.yml" , emit: versions script: """ From fdb594add45e85f0ec50fa966d152464280ef06c Mon Sep 17 00:00:00 2001 From: Till Englert Date: Tue, 1 Oct 2024 13:52:12 +0200 Subject: [PATCH 13/20] Indent In main workflow --- workflows/metapep.nf | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/workflows/metapep.nf b/workflows/metapep.nf index 697204ba..7ffda992 100644 --- a/workflows/metapep.nf +++ b/workflows/metapep.nf @@ -156,13 +156,13 @@ workflow METAPEP { // Split prediction tasks (peptide, allele) into chunks of peptides that are to // be predicted against the same allele for parallel prediction SPLIT_PRED_TASKS ( - GENERATE_PEPTIDES.out.ch_peptides, - 
GENERATE_PEPTIDES.out.ch_proteins_peptides, - GENERATE_PROTEIN_AND_ENTITY_IDS.out.ch_entities_proteins, - FINALIZE_MICROBIOME_ENTITIES.out.ch_microbiomes_entities, - PROCESS_INPUT.out.ch_conditions, - PROCESS_INPUT.out.ch_conditions_alleles, - PROCESS_INPUT.out.ch_alleles + GENERATE_PEPTIDES.out.ch_peptides, + GENERATE_PEPTIDES.out.ch_proteins_peptides, + GENERATE_PROTEIN_AND_ENTITY_IDS.out.ch_entities_proteins, + FINALIZE_MICROBIOME_ENTITIES.out.ch_microbiomes_entities, + PROCESS_INPUT.out.ch_conditions, + PROCESS_INPUT.out.ch_conditions_alleles, + PROCESS_INPUT.out.ch_alleles ) ch_versions = ch_versions.mix(SPLIT_PRED_TASKS.out.versions) From 48d049e8a129a78f97cb01e5f8753598c6e2c5bc Mon Sep 17 00:00:00 2001 From: Till Englert Date: Tue, 1 Oct 2024 13:55:49 +0200 Subject: [PATCH 14/20] More Indents --- modules/local/assign_nucl_entity_weights.nf | 4 ++-- modules/local/create_protein_tsv.nf | 2 +- modules/local/download_proteins.nf | 16 ++++++++-------- modules/local/finalize_microbiome_entities.nf | 4 ++-- modules/local/generate_protein_and_entity_ids.nf | 10 +++++----- 5 files changed, 18 insertions(+), 18 deletions(-) diff --git a/modules/local/assign_nucl_entity_weights.nf b/modules/local/assign_nucl_entity_weights.nf index 7bcbe9c7..58f5b916 100644 --- a/modules/local/assign_nucl_entity_weights.nf +++ b/modules/local/assign_nucl_entity_weights.nf @@ -12,8 +12,8 @@ process ASSIGN_NUCL_ENTITY_WEIGHTS { path weights_files output: - path "microbiomes_entities.nucl.tsv", emit: ch_nucl_microbiomes_entities // entity_name, microbiome_id, entity_weight - path "versions.yml" , emit: versions + path "microbiomes_entities.nucl.tsv", emit: ch_nucl_microbiomes_entities // entity_name, microbiome_id, entity_weight + path "versions.yml" , emit: versions script: diff --git a/modules/local/create_protein_tsv.nf b/modules/local/create_protein_tsv.nf index fcd9411e..11350378 100644 --- a/modules/local/create_protein_tsv.nf +++ b/modules/local/create_protein_tsv.nf @@ -12,7 +12,7 
@@ process CREATE_PROTEIN_TSV { output: tuple val(meta), path("proteins.pred_${meta.id}*.tsv.gz"), emit: ch_pred_proteins // Emit protein tsv - path "versions.yml" , emit: versions + path "versions.yml" , emit: versions script: name = meta.bin_basename ? "${meta.id}.${meta.bin_basename}" : "${meta.id}" diff --git a/modules/local/download_proteins.nf b/modules/local/download_proteins.nf index 3b65608a..dadb5d78 100644 --- a/modules/local/download_proteins.nf +++ b/modules/local/download_proteins.nf @@ -12,16 +12,16 @@ process DOWNLOAD_PROTEINS { secret "NCBI_KEY" input: - val microbiome_ids - path microbiome_files + val microbiome_ids + path microbiome_files output: - path "proteins.entrez.tsv.gz" , emit: ch_entrez_proteins - path "taxa_assemblies.tsv" , emit: ch_entrez_assemblies - path "entities_proteins.entrez.tsv" , emit: ch_entrez_entities_proteins // protein_tmp_id (accessionVersion), entity_name (taxon_id) - path "microbiomes_entities.entrez.tsv", emit: ch_entrez_microbiomes_entities // entity_name, microbiome_id, entity_weight - path "download_proteins.log" , emit: log - path "versions.yml" , emit: versions + path "proteins.entrez.tsv.gz" , emit: ch_entrez_proteins + path "taxa_assemblies.tsv" , emit: ch_entrez_assemblies + path "entities_proteins.entrez.tsv" , emit: ch_entrez_entities_proteins // protein_tmp_id (accessionVersion), entity_name (taxon_id) + path "microbiomes_entities.entrez.tsv", emit: ch_entrez_microbiomes_entities // entity_name, microbiome_id, entity_weight + path "download_proteins.log" , emit: log + path "versions.yml" , emit: versions script: def microbiome_ids = microbiome_ids.join(' ') diff --git a/modules/local/finalize_microbiome_entities.nf b/modules/local/finalize_microbiome_entities.nf index 7627f1c0..b78d32d6 100644 --- a/modules/local/finalize_microbiome_entities.nf +++ b/modules/local/finalize_microbiome_entities.nf @@ -13,8 +13,8 @@ process FINALIZE_MICROBIOME_ENTITIES { path(entities) output: - path 
"microbiomes_entities.tsv", emit: ch_microbiomes_entities // entity_id, microbiome_id, entity_weight - path "versions.yml" , emit: versions + path "microbiomes_entities.tsv", emit: ch_microbiomes_entities // entity_id, microbiome_id, entity_weight + path "versions.yml" , emit: versions script: diff --git a/modules/local/generate_protein_and_entity_ids.nf b/modules/local/generate_protein_and_entity_ids.nf index a06ddee5..29ea17a2 100644 --- a/modules/local/generate_protein_and_entity_ids.nf +++ b/modules/local/generate_protein_and_entity_ids.nf @@ -15,11 +15,11 @@ process GENERATE_PROTEIN_AND_ENTITY_IDS { path(entrez_microbiomes_entities) output: - path "proteins.tsv.gz" , emit: ch_proteins - path "entities_proteins.tsv" , emit: ch_entities_proteins - path "entities.tsv" , emit: ch_entities - path "microbiomes_entities.no_weights.tsv", emit: ch_microbiomes_entities_noweights // microbiome_id, entitiy_id (no weights yet!) - path "versions.yml" , emit: versions + path "proteins.tsv.gz" , emit: ch_proteins + path "entities_proteins.tsv" , emit: ch_entities_proteins + path "entities.tsv" , emit: ch_entities + path "microbiomes_entities.no_weights.tsv", emit: ch_microbiomes_entities_noweights // microbiome_id, entitiy_id (no weights yet!) 
+ path "versions.yml" , emit: versions script: predicted_proteins_microbiome_ids = predicted_proteins_meta.collect { meta -> meta.id }.join(' ') From ebb62642a9f7bd804746fbd4f5ba937f364e91ba Mon Sep 17 00:00:00 2001 From: Till Englert Date: Tue, 1 Oct 2024 13:56:03 +0200 Subject: [PATCH 15/20] remove echo of allele_id --- modules/local/plot_entity_binding_ratios.nf | 1 - modules/local/plot_score_distribution.nf | 1 - 2 files changed, 2 deletions(-) diff --git a/modules/local/plot_entity_binding_ratios.nf b/modules/local/plot_entity_binding_ratios.nf index cf8014e2..a76e15bb 100644 --- a/modules/local/plot_entity_binding_ratios.nf +++ b/modules/local/plot_entity_binding_ratios.nf @@ -19,7 +19,6 @@ process PLOT_ENTITY_BINDING_RATIOS { """ [[ ${prep_entity_binding_ratios} =~ entity_binding_ratios.allele_(.*).tsv ]]; allele_id="\${BASH_REMATCH[1]}" - echo \$allele_id plot_entity_binding_ratios.R \\ -r $prep_entity_binding_ratios \\ diff --git a/modules/local/plot_score_distribution.nf b/modules/local/plot_score_distribution.nf index e50881c9..ea4fd92c 100644 --- a/modules/local/plot_score_distribution.nf +++ b/modules/local/plot_score_distribution.nf @@ -21,7 +21,6 @@ process PLOT_SCORE_DISTRIBUTION { """ [[ ${prep_scores} =~ prediction_scores.allele_(.*).tsv ]]; allele_id="\${BASH_REMATCH[1]}" - echo \$allele_id plot_score_distribution.R \\ $prep_scores \\ From a6398694a8d9e9b3496bd8388dcb81ea71447a8d Mon Sep 17 00:00:00 2001 From: Till Englert Date: Tue, 1 Oct 2024 14:13:56 +0200 Subject: [PATCH 16/20] found another indent mishap --- subworkflows/local/process_input.nf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/subworkflows/local/process_input.nf b/subworkflows/local/process_input.nf index 7757ca66..dba7287b 100644 --- a/subworkflows/local/process_input.nf +++ b/subworkflows/local/process_input.nf @@ -146,7 +146,7 @@ workflow PROCESS_INPUT { // Concatenate the channels and remove redundant entries for nucleotide based inputs // In case of 
co-assembly the input fasta will be used for prediction only once - ch_nucl_input = ch_assembly_input.concat(ch_bins_archives_input, ch_bins_folders_input).unique() + ch_nucl_input = ch_assembly_input.concat(ch_bins_archives_input, ch_bins_folders_input).unique() ch_nucl_input.dump(tag:"nucl") // #################################################################################################### From 6b1aafd1619c190708b088ca396fca5ba8ab8ae9 Mon Sep 17 00:00:00 2001 From: Till Englert Date: Tue, 1 Oct 2024 14:14:04 +0200 Subject: [PATCH 17/20] Add tags to the plot modules --- modules/local/plot_entity_binding_ratios.nf | 1 + modules/local/plot_score_distribution.nf | 1 + 2 files changed, 2 insertions(+) diff --git a/modules/local/plot_entity_binding_ratios.nf b/modules/local/plot_entity_binding_ratios.nf index a76e15bb..445d0151 100644 --- a/modules/local/plot_entity_binding_ratios.nf +++ b/modules/local/plot_entity_binding_ratios.nf @@ -1,4 +1,5 @@ process PLOT_ENTITY_BINDING_RATIOS { + tag "$prep_entity_binding_ratios" label 'process_medium_memory' conda "conda-forge::r-ggplot2=3.4.2 conda-forge::r-data.table=1.14.8 conda-forge::r-dplyr=1.1.2 conda-forge::r-stringr=1.5.0 conda-forge::r-ggpubr=0.6.0 conda-forge::r-optparse=1.7.3" diff --git a/modules/local/plot_score_distribution.nf b/modules/local/plot_score_distribution.nf index ea4fd92c..c5d1f62b 100644 --- a/modules/local/plot_score_distribution.nf +++ b/modules/local/plot_score_distribution.nf @@ -1,4 +1,5 @@ process PLOT_SCORE_DISTRIBUTION { + tag "$prep_scores" label 'process_medium_memory' conda "bioconda::bioconductor-alphabeta=1.8.0" From d77b4cb4a7cf3041b142cd0ca17381329db0cd24 Mon Sep 17 00:00:00 2001 From: Till Englert Date: Tue, 1 Oct 2024 21:09:14 +0200 Subject: [PATCH 18/20] fix linting --- LICENSE | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/LICENSE b/LICENSE index 16073369..bb43d8b9 100644 --- a/LICENSE +++ b/LICENSE @@ -1,6 +1,6 @@ MIT License -Copyright (c) Sabrina Krakau, 
Leon Kuchenbecker, and Till Englert +Copyright (c) Sabrina Krakau, Leon Kuchenbecker and Till Englert Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal From 516714e81474ed76e41e6262381c34b12e0df5a2 Mon Sep 17 00:00:00 2001 From: Till Englert Date: Tue, 1 Oct 2024 21:21:34 +0200 Subject: [PATCH 19/20] Add fancy release name --- CHANGELOG.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index e052592e..362d38a8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,7 +3,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## v1.0.0 - [2022-01-20] +## v1.0.0 nf-core/metapep "Golden Megalodon" - [2022-01-20] First release of [nf-core/metapep](https://nf-co.re/metapep), based on [nf-core](https://nf-co.re) standards and [nf-core/tools](https://nf-co.re/tools) template version 1.14.1. From 954de9953d764a77a96c01db512830a3f8f3e447 Mon Sep 17 00:00:00 2001 From: Till Englert Date: Tue, 1 Oct 2024 21:22:02 +0200 Subject: [PATCH 20/20] missed semicolon --- CHANGELOG.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 362d38a8..f8ed7d95 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,7 +3,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## v1.0.0 nf-core/metapep "Golden Megalodon" - [2022-01-20] +## v1.0.0 - nf-core/metapep "Golden Megalodon" - [2022-01-20] First release of [nf-core/metapep](https://nf-co.re/metapep), based on [nf-core](https://nf-co.re) standards and [nf-core/tools](https://nf-co.re/tools) template version 1.14.1.