From 54c71c68df495a9d45e80f835dfbf81c9920632f Mon Sep 17 00:00:00 2001 From: EngyNasr Date: Wed, 19 Jun 2024 10:47:51 +0200 Subject: [PATCH 1/4] correcting some training explanation in the preprocessing workflow, and adding a missing step in the long version of the training --- .../tutorial.md | 27 ++++++++++++++----- 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md b/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md index 18647f7df36a8e..4be61bbbb8f642 100644 --- a/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md +++ b/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md @@ -198,6 +198,14 @@ In this tutorial we use similar tools as described in the tutorial ["Quality con > > This step, as it does not require the results of FastQC to run, can be launched even if FastQC is not ready > {: .comment} > + > 3. {% tool [MultiQC](toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.11+galaxy0) %} with the following parameters: + > - In *"Results"*: + > - {% icon param-repeat %} *"Insert Results"* + > - *"Which tool was used generate logs?"*: `FastQC` + > - In *"FastQC output"*: + > - {% icon param-repeat %} *"Insert FastQC output"* + > - *"Type of FastQC output?"*: `Raw data` + > - {% icon param-files %} *"FastQC output"*: collection of `Raw data` outputs of **FastQC** {% icon tool %} {: .hands_on} @@ -215,7 +223,7 @@ In this tutorial we use similar tools as described in the tutorial ["Quality con > > 2. 
{% tool [fastp](toolshed.g2.bx.psu.edu/repos/iuc/fastp/fastp/0.20.1+galaxy0) %} with the following parameters: > - *"Single-end or paired reads"*: `Single-end` - > - {% icon param-files %} *"Input 1"*: outputs of **Porechop** {% icon tool %} + > - {% icon param-files %} *"Input 1"*: output collection of **Porechop** {% icon tool %} > - In *Output Options* > - *"Output JSON report"*: `Yes` > @@ -232,12 +240,12 @@ In this tutorial we use similar tools as described in the tutorial ["Quality con > Final quality checks > 1. {% tool [FastQC](toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.73+galaxy0) %} with the following parameters: - > - {% icon param-files %} *"Raw read data from your current history"*: outputs of **fastp** {% icon tool %} + > - {% icon param-files %} *"Raw read data from your current history"*: output collection of **fastp** {% icon tool %} > > 2. {% tool [NanoPlot](toolshed.g2.bx.psu.edu/repos/iuc/nanoplot/nanoplot/1.28.2+galaxy1) %} with the following parameters: > - *"Select multifile mode"*: `batch` > - *"Type of the file(s) to work on"*: `fastq` - > - {% icon param-files %} *"files"*: outputs of **fastp** {% icon tool %} + > - {% icon param-files %} *"files"*: output collection of **fastp** {% icon tool %} > > 3. 
{% tool [MultiQC](toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.11+galaxy0) %} with the following parameters: > - In *"Results"*: @@ -246,17 +254,17 @@ In this tutorial we use similar tools as described in the tutorial ["Quality con > - In *"FastQC output"*: > - {% icon param-repeat %} *"Insert FastQC output"* > - *"Type of FastQC output?"*: `Raw data` - > - {% icon param-files %} *"FastQC output"*: 4 `Raw data` outputs of **FastQC** {% icon tool %} + > - {% icon param-files %} *"FastQC output"*: collection of `Raw data` outputs of **FastQC** {% icon tool %} done after fastp > - {% icon param-repeat %} *"Insert Results"* > - *"Which tool was used generate logs?"*: `fastp` - > - {% icon param-files %} *"Output of fastp"*: `JSON report` outputs of **fastp** {% icon tool %} + > - {% icon param-files %} *"Output of fastp"*: `JSON report` output of **fastp** {% icon tool %} {: .hands_on} > > -> Inspect the HTML output of **MultiQC** for `Barcode10` +> Inspect the two HTML outputs of **MultiQC** for `Barcode10`, before and after preprocessing, tagged `MultiQC_Before_Preprocessing` and `MultiQC_After_Preprocessing` > > 1. How many sequences does `Barcode10` contain before and after trimming? > 2. What is the quality score over the reads before and after trimming? And the mean score? @@ -267,8 +275,13 @@ In this tutorial we use similar tools as described in the tutorial ["Quality con > > 1. Before trimming the file has 114,986 sequences and After trimming the file has 91,434 sequences > > 2. The "Per base sequence quality" is globally medium: the quality score stays above 20 over the entire length of reads after trimming, while quality below 20 could be seen before trimming specially at the beginning and the end of the reads.
> > +> > Sequence quality of Barcode 10 and Barcode 11 before preprocessing: +> > > > ![Sequence Quality of Barcode 10 and Barcode 11 Before Trimming](./images/multiqc_per_base_sequence_quality_plot_barcode10_barcode11_before_trimming.png) > > +> > +> > Sequence quality of Barcode 10 and Barcode 11 after preprocessing: +> > > > ![Sequence Quality of Barcode 10 and Barcode 11 After Trimming](./images/multiqc_per_base_sequence_quality_plot_barcode10_barcode11_after_trimming.png) > > > > 3. After checking what is wrong, e.g. before trimming, we should think about the errors reported by **FastQC**: they may come from the type of sequencing or what we sequenced (check the ["Quality control" training]({% link topics/sequence-analysis/tutorials/quality-control/tutorial.md %}): [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) for more details). However, despite these challenges, we can already see sequences getting slightly better after the trimming and filtering, so now we can proceed with our analyses. @@ -311,7 +324,7 @@ In this tutorial we use: > {: .hands_on} -2. Assign filted reads, after mapping (non __chicken__ reads), to taxa using **Kraken2** ({% cite Wood2014 %}) and **Kalamari**, a database of completed assemblies for metagenomics-related tasks used widely in contamination and host filtering +2. Assign filtered reads, after mapping (non __chicken__ reads), to taxa using **Kraken2** ({% cite Wood2014 %}) as a further contamination detection step using the **Kalamari** database. The **Kalamari** database includes mitochondrial sequences of various known hosts, including food hosts.
From 56f79079677e95a64567889f4c06802504afb611 Mon Sep 17 00:00:00 2001 From: EngyNasr Date: Wed, 19 Jun 2024 10:53:38 +0200 Subject: [PATCH 2/4] one more typo --- .../pathogen-detection-from-nanopore-foodborne-data/tutorial.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md b/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md index 4be61bbbb8f642..962a308cd375a0 100644 --- a/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md +++ b/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md @@ -254,7 +254,7 @@ In this tutorial we use similar tools as described in the tutorial ["Quality con > - In *"FastQC output"*: > - {% icon param-repeat %} *"Insert FastQC output"* > - *"Type of FastQC output?"*: `Raw data` - > - {% icon param-files %} *"FastQC output"*: collection of `Raw data` outputs of **FastQC** {% icon tool %} done after fastp + > - {% icon param-files %} *"FastQC output"*: collection of `Raw data` output of **FastQC** {% icon tool %} done after **fastp** > - {% icon param-repeat %} *"Insert Results"* > - *"Which tool was used generate logs?"*: `fastp` > - {% icon param-files %} *"Output of fastp"*: `JSON report` output of **fastp** {% icon tool %} From e2070467bc95215a95c0663de1a5f2f02d7c3c24 Mon Sep 17 00:00:00 2001 From: Engy Nasr Date: Wed, 19 Jun 2024 22:54:34 +0200 Subject: [PATCH 3/4] Update topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md Co-authored-by: paulzierep --- .../pathogen-detection-from-nanopore-foodborne-data/tutorial.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md b/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md index 962a308cd375a0..3c8d94080c2989 
100644 --- a/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md +++ b/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md @@ -195,7 +195,7 @@ In this tutorial we use similar tools as described in the tutorial ["Quality con > - {% icon param-files %} *"Data input files"*: `Samples` collection created from the imported Fastq.qz files > > > - > > This step, as it does not require the results of FastQC to run, can be launched even if FastQC is not ready + > > The `NanoPlot` step, as it does not require the results of FastQC to run, can be launched even if FastQC is not ready > {: .comment} > > 3. {% tool [MultiQC](toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.11+galaxy0) %} with the following parameters: From fea16c6949ebaafe1f872bfe3a5f05958a15e7f9 Mon Sep 17 00:00:00 2001 From: EngyNasr Date: Wed, 19 Jun 2024 23:47:55 +0200 Subject: [PATCH 4/4] updating to include all previous comments --- .../tutorial.md | 8 ++++---- ....yml => allele_based_pathogen_identification-test.yml} | 2 +- ...ication.ga => allele_based_pathogen_identification.ga} | 5 ++--- 3 files changed, 7 insertions(+), 8 deletions(-) rename topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/workflows/{nanopore_allele_based_pathogen_identification-test.yml => allele_based_pathogen_identification-test.yml} (92%) rename topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/workflows/{nanopore_allele_based_pathogen_identification.ga => allele_based_pathogen_identification.ga} (97%) diff --git a/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md b/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md index 3c8d94080c2989..4342d5ab0a436a 100644 --- a/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md +++ 
b/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.md @@ -160,7 +160,7 @@ We will run all these steps using a single workflow, then discuss each step and > {% snippet faqs/galaxy/workflows_import.md %} > > 2. Run **Workflow 1: Nanopore Preprocessing** {% icon workflow %} using the following parameters -> - *"Samples Profile"*: `PacBio/Oxford Nanopore read to reference mapping` +> - *"Samples Profile"*: `PacBio/Oxford Nanopore read to reference mapping`, which is the technique used for sequencing the samples. > > - {% icon param-files %} *"Collection of all samples"*: `Samples` collection created from the imported Fastq.qz files > @@ -296,7 +296,7 @@ In this tutorial we use similar tools as described in the tutorial ["Quality con Generally, we are not interested in the food (host) sequences, rather only those originating from the pathogen itself. It is an important to get rid of all host sequences and to only retain sequences that might include a pathogen, both in order to speed up further steps and to avoid host sequences compromising the analysis. -In this tutorial, we know the samples come from __chicken__ meat spiked with **_Salmonella_** so we already know what will we get as the host and the main pathogen. +In this tutorial, we know the samples come from __chicken__ meat spiked with **_Salmonella_**, so we already know what we will get as the host and the main pathogen. If the host is not known, **Kraken2** with the **Kalamari** database can be used to detect it. In this tutorial we use: 1. Map reads to __chicken__ reference genome using **Map with minimap2** and **Chicken (Gallus gallus): galGal6** built in reference genome of __chicken__, and we move forward with the unmapped ones.
@@ -310,7 +310,7 @@ In this tutorial we use: > - *"Using reference genome"*: `Chicken (Gallus gallus): galGal6` > - *"Single or Paired-end reads"*: `Single` > - {% icon param-file %} *"Select fastq dataset"*: `out1` (output of **fastp** {% icon tool %}) - > - *"Select a profile of preset options"*: `PacBio/Oxford Nanopore read to reference mapping (-Hk19) (map-pb)` + > - *"Select a profile of preset options"*: `PacBio/Oxford Nanopore read to reference mapping (-Hk19) (map-pb)`, which is the technique used for sequencing the samples. > - In *"Alignment options"*: > - *"Customize spliced alignment mode?"*: `No, use profile setting or leave turned off` > @@ -900,7 +900,7 @@ In this training, we are testing _Salmonella enterica_, with different strains o > Allele based Pathogenic Identification > > 1. **Import the workflow** into Galaxy -> - Copy the URL (e.g. via right-click) of [this workflow]({{ site.baseurl }}{{ page.dir }}workflows/nanopore_allele_based_pathogen_identification.ga) or download it to your computer. +> - Copy the URL (e.g. via right-click) of [this workflow]({{ site.baseurl }}{{ page.dir }}workflows/allele_based_pathogen_identification.ga) or download it to your computer. 
> - Import the workflow into Galaxy > > {% snippet faqs/galaxy/workflows_import.md %} diff --git a/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/workflows/nanopore_allele_based_pathogen_identification-test.yml b/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/workflows/allele_based_pathogen_identification-test.yml similarity index 92% rename from topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/workflows/nanopore_allele_based_pathogen_identification-test.yml rename to topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/workflows/allele_based_pathogen_identification-test.yml index 10c1294bf91c7e..043c84517cd59e 100644 --- a/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/workflows/nanopore_allele_based_pathogen_identification-test.yml +++ b/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/workflows/allele_based_pathogen_identification-test.yml @@ -1,4 +1,4 @@ -- doc: Test outline for nanopore_allele_based_pathogen_identification +- doc: Test outline for allele_based_pathogen_identification job: Reference Genome of Tested Strain: class: File diff --git a/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/workflows/nanopore_allele_based_pathogen_identification.ga b/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/workflows/allele_based_pathogen_identification.ga similarity index 97% rename from topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/workflows/nanopore_allele_based_pathogen_identification.ga rename to topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/workflows/allele_based_pathogen_identification.ga index 4d41cf3073e7ae..df0ee2575c7665 100644 --- a/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/workflows/nanopore_allele_based_pathogen_identification.ga +++ 
b/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/workflows/allele_based_pathogen_identification.ga @@ -101,9 +101,9 @@ "format-version": "0.1", "license": "MIT", "release": "0.1", - "name": "Nanopore Allele-based Pathogen Identification", + "name": "Allele-based Pathogen Identification", "report": { - "markdown": "# Nanopore - Allele based Pathogen Identification Workflow Report\nBelow are the results for the Allele based Pathogenic Identification Workflow\n\nThis workflow was run on:\n\n```galaxy\ngenerate_time()\n```\n\nWith Galaxy version:\n\n```galaxy\ngenerate_galaxy_version()\n```\n\n## Workflow Inputs\nThe Perprocessing workflow main output (Collection of all samples reads after quality retaining and hosts filtering), and a FASTA file of the reference genome of the main Pathogen identified in the Gene based Pathogen Identification workflow, or per-known to the user.\n\n## Workflow Output: \n\n### All variants found per sample against the reference genome\n\n```galaxy\nhistory_dataset_display(output=\"extracted_fields_from_the_vcf_output\")\n```\n\n### Number of variants per sample\n\n```galaxy\nhistory_dataset_display(output=\"number_of_variants_per_sample\")\n```\n\n### Mapping mean depth per sample\n\n```galaxy\nhistory_dataset_display(output=\"mapping_mean_depth_per_sample\")\n```\n\n### Mapping coverage per sample\n\n```galaxy\nhistory_dataset_display(output=\"mapping_coverage_percentage_per_sample\")\n```\n" + "markdown": "# Allele based Pathogen Identification Workflow Report\nBelow are the results for the Allele based Pathogenic Identification Workflow\n\nThis workflow was run on:\n\n```galaxy\ngenerate_time()\n```\n\nWith Galaxy version:\n\n```galaxy\ngenerate_galaxy_version()\n```\n\n## Workflow Inputs\nThe Preprocessing workflow main output (Collection of all samples reads after quality retaining and hosts filtering), and a FASTA file of the reference genome of the main Pathogen identified in the Gene based Pathogen 
Identification workflow, or already known to the user.\n\n## Workflow Output: \n\n### All variants found per sample against the reference genome\n\n```galaxy\nhistory_dataset_display(output=\"extracted_fields_from_the_vcf_output\")\n```\n\n### Number of variants per sample\n\n```galaxy\nhistory_dataset_display(output=\"number_of_variants_per_sample\")\n```\n\n### Mapping mean depth per sample\n\n```galaxy\nhistory_dataset_display(output=\"mapping_mean_depth_per_sample\")\n```\n\n### Mapping coverage per sample\n\n```galaxy\nhistory_dataset_display(output=\"mapping_coverage_percentage_per_sample\")\n```\n" }, "steps": { "0": { @@ -1320,7 +1320,6 @@ "name:Collection", "name:microGalaxy", "name:PathoGFAIR", - "name:Nanopore", "name:IWC" ], "uuid": "deb94861-ed4d-41fe-881a-8565c6b8fa82",