Merge pull request #5059 from EngyNasr/PathogenTrainingJune2024Update2
Pathogen detection from (direct Nanopore) sequencing data using Galaxy - Foodborne Edition
paulzierep authored Jun 20, 2024
2 parents 61e7b9b + fea16c6 commit 374441d
Showing 3 changed files with 28 additions and 16 deletions.
@@ -171,7 +171,7 @@ We will run all these steps using a single workflow, then discuss each step and
> {% snippet faqs/galaxy/workflows_import.md %}
>
> 2. Run **Workflow 1: Nanopore Preprocessing** {% icon workflow %} using the following parameters:
> - *"Samples Profile"*: `PacBio/Oxford Nanopore read to reference mapping`
> - *"Samples Profile"*: `PacBio/Oxford Nanopore read to reference mapping`, which is the technique used for sequencing the samples.
>
>    - {% icon param-files %} *"Collection of all samples"*: `Samples` collection created from the imported fastq.gz files
>
@@ -206,9 +206,17 @@ In this tutorial we use similar tools as described in the tutorial ["Quality con
>    - {% icon param-files %} *"Data input files"*: `Samples` collection created from the imported fastq.gz files
>
> > <comment-title></comment-title>
> > This step, as it does not require the results of FastQC to run, can be launched even if FastQC is not ready
> > The `NanoPlot` step does not require the results of FastQC, so it can be launched even if FastQC is not ready.
> {: .comment}
>
> 3. {% tool [MultiQC](toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.11+galaxy0) %} with the following parameters:
> - In *"Results"*:
> - {% icon param-repeat %} *"Insert Results"*
> - *"Which tool was used generate logs?"*: `FastQC`
> - In *"FastQC output"*:
> - {% icon param-repeat %} *"Insert FastQC output"*
> - *"Type of FastQC output?"*: `Raw data`
> - {% icon param-files %} *"FastQC output"*: collection of `Raw data` outputs of **FastQC** {% icon tool %}
{: .hands_on}
</div>
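NanoPlot summarises read-length and read-quality distributions for long reads. As a rough illustration of one of the length metrics it reports, the sketch below computes read count, mean read length and read-length N50 (the length at which the longest reads together cover at least half of all sequenced bases) from plain sequence strings. `read_length_stats` is a hypothetical helper name, and this is not NanoPlot's own code.

```python
def read_length_stats(sequences):
    """Return (read count, mean read length, N50) for a list of read sequences."""
    # Longest reads first, so the running total reaches half the bases as early as possible
    lengths = sorted((len(seq) for seq in sequences), reverse=True)
    if not lengths:
        return 0, 0.0, 0
    total = sum(lengths)
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        # N50: first length at which the accumulated bases reach half of the total
        if running * 2 >= total:
            n50 = length
            break
    return len(lengths), total / len(lengths), n50


if __name__ == "__main__":
    # Toy reads, for illustration only
    print(read_length_stats(["ACGT" * 250, "ACGT" * 100, "ACGT" * 50]))
```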
@@ -226,7 +234,7 @@ In this tutorial we use similar tools as described in the tutorial ["Quality con
>
> 2. {% tool [fastp](toolshed.g2.bx.psu.edu/repos/iuc/fastp/fastp/0.20.1+galaxy0) %} with the following parameters:
> - *"Single-end or paired reads"*: `Single-end`
> - {% icon param-files %} *"Input 1"*: outputs of **Porechop** {% icon tool %}
> - {% icon param-files %} *"Input 1"*: output collection of **Porechop** {% icon tool %}
> - In *Output Options*
> - *"Output JSON report"*: `Yes`
>
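Setting *"Output JSON report"* to `Yes` produces the file that MultiQC later parses for its fastp section. A minimal sketch of reading the read totals before and after filtering from such a report, assuming the `summary.before_filtering.total_reads` and `summary.after_filtering.total_reads` fields that fastp writes (check them against the report produced by your fastp version); `fastp_read_counts` is a hypothetical helper name.

```python
import json


def fastp_read_counts(report_text):
    """Return (reads before filtering, reads after filtering, fraction kept) from a fastp JSON report."""
    report = json.loads(report_text)
    # Assumed layout: {"summary": {"before_filtering": {...}, "after_filtering": {...}}}
    summary = report["summary"]
    before = summary["before_filtering"]["total_reads"]
    after = summary["after_filtering"]["total_reads"]
    kept = after / before if before else 0.0
    return before, after, kept


if __name__ == "__main__":
    with open("fastp_report.json") as handle:  # hypothetical file name
        print(fastp_read_counts(handle.read()))
```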
@@ -243,12 +251,12 @@ In this tutorial we use similar tools as described in the tutorial ["Quality con
> <hands-on-title> Final quality checks </hands-on-title>
> 1. {% tool [FastQC](toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.73+galaxy0) %} with the following parameters:
> - {% icon param-files %} *"Raw read data from your current history"*: outputs of **fastp** {% icon tool %}
> - {% icon param-files %} *"Raw read data from your current history"*: output collection of **fastp** {% icon tool %}
>
> 2. {% tool [NanoPlot](toolshed.g2.bx.psu.edu/repos/iuc/nanoplot/nanoplot/1.28.2+galaxy1) %} with the following parameters:
> - *"Select multifile mode"*: `batch`
> - *"Type of the file(s) to work on"*: `fastq`
> - {% icon param-files %} *"files"*: outputs of **fastp** {% icon tool %}
> - {% icon param-files %} *"files"*: output collection of **fastp** {% icon tool %}
>
> 3. {% tool [MultiQC](toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.11+galaxy0) %} with the following parameters:
> - In *"Results"*:
@@ -257,17 +265,17 @@ In this tutorial we use similar tools as described in the tutorial ["Quality con
> - In *"FastQC output"*:
> - {% icon param-repeat %} *"Insert FastQC output"*
> - *"Type of FastQC output?"*: `Raw data`
> - {% icon param-files %} *"FastQC output"*: 4 `Raw data` outputs of **FastQC** {% icon tool %}
>        - {% icon param-files %} *"FastQC output"*: collection of `Raw data` outputs of **FastQC** {% icon tool %} run on the **fastp** output
> - {% icon param-repeat %} *"Insert Results"*
> - *"Which tool was used generate logs?"*: `fastp`
> - {% icon param-files %} *"Output of fastp"*: `JSON report` outputs of **fastp** {% icon tool %}
> - {% icon param-files %} *"Output of fastp"*: `JSON report` output of **fastp** {% icon tool %}
{: .hands_on}
</div>
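FastQC's "Per base sequence quality" plot summarises Phred scores by read position. The sketch below shows the underlying idea under simplifying assumptions: quality strings use the Phred+33 (Sanger) encoding, positions are not binned (FastQC groups positions for long reads), and each position's mean is taken over however many reads reach it. Positions whose mean falls below Q20 are flagged, the same threshold discussed in the answer to the question that follows. Both function names are hypothetical.

```python
def per_position_mean_quality(quality_strings, offset=33):
    """Mean Phred score at each read position, across reads of varying length."""
    sums, counts = [], []
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            score = ord(char) - offset  # decode Phred+33
            if pos == len(sums):  # first read long enough to reach this position
                sums.append(0)
                counts.append(0)
            sums[pos] += score
            counts[pos] += 1
    return [total / n for total, n in zip(sums, counts)]


def positions_below(means, threshold=20):
    """Indices (0-based) of positions whose mean quality is below the threshold."""
    return [pos for pos, mean in enumerate(means) if mean < threshold]


if __name__ == "__main__":
    means = per_position_mean_quality(["IIII5+", "II55"])  # toy quality strings
    print(means, positions_below(means))
```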
> <question-title></question-title>
>
> Inspect the HTML output of **MultiQC** for `Barcode10`
> Inspect the two HTML outputs of **MultiQC** for `Barcode10`, before and after preprocessing, tagged `MultiQC_Before_Preprocessing` and `MultiQC_After_Preprocessing`.
>
> 1. How many sequences does `Barcode10` contain before and after trimming?
> 2. What is the quality score over the reads before and after trimming? And the mean score?
@@ -278,8 +286,13 @@ In this tutorial we use similar tools as described in the tutorial ["Quality con
> > 1. Before trimming, the file has 114,986 sequences; after trimming, it has 91,434 sequences.
> > 2. The "Per base sequence quality" is globally medium: after trimming, the quality score stays above 20 over the entire length of the reads, while before trimming, quality below 20 can be seen, especially at the beginning and the end of the reads.
> >
> > Sequence quality of Barcode 10 and Barcode 11 before preprocessing:
> >
> > ![Sequence Quality of Barcode 10 and Barcode 11 Before Trimming](./images/multiqc_per_base_sequence_quality_plot_barcode10_barcode11_before_trimming.png)
> >
> >
> > Sequence quality of Barcode 10 and Barcode 11 after preprocessing:
> >
> > ![Sequence Quality of Barcode 10 and Barcode 11 After Trimming](./images/multiqc_per_base_sequence_quality_plot_barcode10_barcode11_after_trimming.png)
> >
> > 3. After checking what is wrong, e.g. before trimming, we should think about the errors reported by **FastQC**: they may come from the type of sequencing or what we sequenced (check the ["Quality control" training]({% link topics/sequence-analysis/tutorials/quality-control/tutorial.md %}): [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) for more details). However, despite these challenges, we can already see sequences getting slightly better after the trimming and filtering, so now we can proceed with our analyses.
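Counting reads in the raw and preprocessed files is a quick cross-check of the numbers MultiQC reports. A minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines) with optional gzip compression; `count_fastq_reads` and `retention` are illustrative names, not Galaxy tools.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file (gzip-compressed if the name ends in .gz), 4 lines per record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def retention(before, after):
    """Return (reads removed, percentage of reads retained)."""
    return before - after, 100.0 * after / before


if __name__ == "__main__":
    removed, pct = retention(114986, 91434)  # Barcode10 counts from the answer above
    print(f"{removed} reads removed, {pct:.1f}% retained")
```

With the `Barcode10` counts from the answer, `retention(114986, 91434)` gives 23,552 reads removed and about 79.5% of reads retained.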
@@ -294,7 +307,7 @@ In this tutorial we use similar tools as described in the tutorial ["Quality con
Generally, we are not interested in the food (host) sequences, but only in those originating from the pathogen itself. It is important to remove all host sequences and retain only sequences that might come from a pathogen, both to speed up further steps and to prevent host sequences from compromising the analysis.
In this tutorial, we know the samples come from __chicken__ meat spiked with **_Salmonella_**, so we already know what we will get as the host and the main pathogen.
In this tutorial, we know the samples come from __chicken__ meat spiked with **_Salmonella_**, so we already know what we will get as the host and the main pathogen. If the host is not known, **Kraken2** with the **Kalamari** database can be used to detect it.
In this tutorial we use:
1. Map reads to the __chicken__ reference genome using **Map with minimap2** with the built-in **Chicken (Gallus gallus): galGal6** reference genome, and move forward with the unmapped reads.
@@ -308,7 +321,7 @@ In this tutorial we use:
> - *"Using reference genome"*: `Chicken (Gallus gallus): galGal6`
> - *"Single or Paired-end reads"*: `Single`
> - {% icon param-file %} *"Select fastq dataset"*: `out1` (output of **fastp** {% icon tool %})
> - *"Select a profile of preset options"*: `PacBio/Oxford Nanopore read to reference mapping (-Hk19) (map-pb)`
> - *"Select a profile of preset options"*: `PacBio/Oxford Nanopore read to reference mapping (-Hk19) (map-pb)`, which is the technique used for sequencing the samples.
> - In *"Alignment options"*:
> - *"Customize spliced alignment mode?"*: `No, use profile setting or leave turned off`
>
@@ -322,7 +335,7 @@ In this tutorial we use:
>
{: .hands_on}
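Keeping the reads that did not map to galGal6 relies on the "segment unmapped" bit (`0x4`) of the SAM FLAG field. The workflow does this with its own Galaxy tool step, which is not shown in this diff; the sketch below only illustrates the FLAG test on SAM text lines. It skips secondary (`0x100`) and supplementary (`0x800`) records so that each read is judged by its primary record. `unmapped_read_names` is a hypothetical helper.

```python
UNMAPPED = 0x4                      # SAM FLAG: segment unmapped
SECONDARY_OR_SUPPLEMENTARY = 0x900  # 0x100 | 0x800


def unmapped_read_names(sam_lines):
    """Names of reads whose primary record is unmapped (candidate non-host reads)."""
    names = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & SECONDARY_OR_SUPPLEMENTARY:
            continue
        if flag & UNMAPPED:
            names.append(fields[0])
    return names


if __name__ == "__main__":
    with open("mapped_to_galGal6.sam") as handle:  # hypothetical file name
        print(len(unmapped_read_names(handle)), "unmapped reads")
```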
2. Assign filtered reads, after mapping (non __chicken__ reads), to taxa using **Kraken2** ({% cite Wood2014 %}) and **Kalamari**, a database of completed assemblies for metagenomics-related tasks, widely used in contamination and host filtering
2. Assign filtered reads, after mapping (non __chicken__ reads), to taxa using **Kraken2** ({% cite Wood2014 %}) for further contamination detection with the **Kalamari** database. The **Kalamari** database includes mitochondrial sequences of various known hosts, including food hosts.
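Kraken2's per-read output is tab-separated, with the classification status (`C` or `U`), the read ID and the assigned taxonomy ID in the first three columns. As a sketch of how that output could be tallied to spot remaining host or contaminant reads, assuming numeric taxonomy IDs (the default, i.e. without `--use-names`); `summarise_kraken2` is an illustrative name, and the Galaxy workflow may summarise Kraken2 results differently.

```python
from collections import Counter


def summarise_kraken2(lines):
    """Return (Counter of classified reads per taxonomy ID, number of unclassified reads)."""
    per_taxon = Counter()
    unclassified = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:  # skip malformed or empty lines
            continue
        status, taxid = fields[0], fields[2]
        if status == "U":
            unclassified += 1
        else:
            per_taxon[taxid] += 1
    return per_taxon, unclassified


if __name__ == "__main__":
    with open("kraken2_output.tsv") as handle:  # hypothetical file name
        counts, n_unclassified = summarise_kraken2(handle)
    print(counts.most_common(5), n_unclassified)
```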
<div class="Long-Version" markdown="1">
@@ -898,7 +911,7 @@ In this training, we are testing _Salmonella enterica_, with different strains o
> <hands-on-title>Allele based Pathogenic Identification</hands-on-title>
>
> 1. **Import the workflow** into Galaxy
> - Copy the URL (e.g. via right-click) of [this workflow]({{ site.baseurl }}{{ page.dir }}workflows/nanopore_allele_based_pathogen_identification.ga) or download it to your computer.
> - Copy the URL (e.g. via right-click) of [this workflow]({{ site.baseurl }}{{ page.dir }}workflows/allele_based_pathogen_identification.ga) or download it to your computer.
> - Import the workflow into Galaxy
>
> {% snippet faqs/galaxy/workflows_import.md %}
@@ -1,4 +1,4 @@
- doc: Test outline for nanopore_allele_based_pathogen_identification
- doc: Test outline for allele_based_pathogen_identification
job:
Reference Genome of Tested Strain:
class: File
@@ -101,9 +101,9 @@
"format-version": "0.1",
"license": "MIT",
"release": "0.1",
"name": "Nanopore Allele-based Pathogen Identification",
"name": "Allele-based Pathogen Identification",
"report": {
"markdown": "# Nanopore - Allele based Pathogen Identification Workflow Report\nBelow are the results for the Allele based Pathogenic Identification Workflow\n\nThis workflow was run on:\n\n```galaxy\ngenerate_time()\n```\n\nWith Galaxy version:\n\n```galaxy\ngenerate_galaxy_version()\n```\n\n## Workflow Inputs\nThe Preprocessing workflow main output (collection of all sample reads after quality control and host filtering), and a FASTA file of the reference genome of the main pathogen identified in the Gene based Pathogen Identification workflow, or already known to the user.\n\n## Workflow Output: \n\n### All variants found per sample against the reference genome\n\n```galaxy\nhistory_dataset_display(output=\"extracted_fields_from_the_vcf_output\")\n```\n\n### Number of variants per sample\n\n```galaxy\nhistory_dataset_display(output=\"number_of_variants_per_sample\")\n```\n\n### Mapping mean depth per sample\n\n```galaxy\nhistory_dataset_display(output=\"mapping_mean_depth_per_sample\")\n```\n\n### Mapping coverage per sample\n\n```galaxy\nhistory_dataset_display(output=\"mapping_coverage_percentage_per_sample\")\n```\n"
"markdown": "# Allele based Pathogen Identification Workflow Report\nBelow are the results for the Allele based Pathogenic Identification Workflow\n\nThis workflow was run on:\n\n```galaxy\ngenerate_time()\n```\n\nWith Galaxy version:\n\n```galaxy\ngenerate_galaxy_version()\n```\n\n## Workflow Inputs\nThe Preprocessing workflow main output (collection of all sample reads after quality control and host filtering), and a FASTA file of the reference genome of the main pathogen identified in the Gene based Pathogen Identification workflow, or already known to the user.\n\n## Workflow Output: \n\n### All variants found per sample against the reference genome\n\n```galaxy\nhistory_dataset_display(output=\"extracted_fields_from_the_vcf_output\")\n```\n\n### Number of variants per sample\n\n```galaxy\nhistory_dataset_display(output=\"number_of_variants_per_sample\")\n```\n\n### Mapping mean depth per sample\n\n```galaxy\nhistory_dataset_display(output=\"mapping_mean_depth_per_sample\")\n```\n\n### Mapping coverage per sample\n\n```galaxy\nhistory_dataset_display(output=\"mapping_coverage_percentage_per_sample\")\n```\n"
},
"steps": {
"0": {
@@ -1320,7 +1320,6 @@
"name:Collection",
"name:microGalaxy",
"name:PathoGFAIR",
"name:Nanopore",
"name:IWC"
],
"uuid": "deb94861-ed4d-41fe-881a-8565c6b8fa82",
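The workflow report lists mean mapping depth and coverage percentage per sample. The sketch below shows one common way to derive both numbers. It assumes per-position depths in the three-column layout printed by `samtools depth -a` (reference name, position, depth, with zero-depth positions included) and counts a position as covered when its depth is at least 1. The workflow itself may compute these values with different tools, thresholds or options, and `depth_and_coverage` is a hypothetical name.

```python
def depth_and_coverage(depth_lines):
    """Return (mean depth, % of positions with depth >= 1) from 'ref<TAB>pos<TAB>depth' lines."""
    positions = 0
    total_depth = 0
    covered = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:  # skip malformed or empty lines
            continue
        depth = int(fields[2])
        positions += 1
        total_depth += depth
        if depth > 0:
            covered += 1
    if positions == 0:
        return 0.0, 0.0
    return total_depth / positions, 100.0 * covered / positions


if __name__ == "__main__":
    with open("sample_depth.tsv") as handle:  # hypothetical samtools depth -a output
        mean_depth, coverage_pct = depth_and_coverage(handle)
    print(f"mean depth {mean_depth:.1f}, coverage {coverage_pct:.1f}%")
```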