fix #10
tavareshugo committed Dec 8, 2023
1 parent d3fec9f commit 6f0abd1
Showing 3 changed files with 22 additions and 20 deletions.
5 changes: 3 additions & 2 deletions materials/10-phylogenetics.md
@@ -272,8 +272,9 @@ iqtree \
```

- We extract the variant sites and count of invariant sites using `SNP-sites`.
- We specify as input the `aligned_pseudogenomes_masked_snps.fas` produced in the previous step of the script by running `SNP-sites`.
- We specify the number of constant sites, also generated from the previous exercise. We can use `$(cat results/snp-sites/constant_sites.txt)` to directly add the contents of `constant_sites.txt` without having to open the file to obtain these numbers.
- As input to both `snp-sites` steps, we use the `aligned_pseudogenomes_masked_snps.fas` produced in the previous step of our analysis.
- The input alignment used in `iqtree` is the one from the previous step.
- The number of constant sites was specified in the script as `$(cat results/snp-sites/constant_sites.txt)`. This adds the contents of the `constant_sites.txt` file directly to the command, without having to open the file to obtain these numbers.
- We use as prefix for our output files "Nam_TB" (since we are using the "Namibian TB" data), so all the output file names will be named as such.
- We automatically detect the number of threads/CPUs for parallel computation.
- We specify the maximum amount of memory and threads/CPUs to use for computation.
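
The `$(cat ...)` construct mentioned in the notes above is ordinary shell command substitution. A minimal sketch of what it does, assuming a scratch file in place of `results/snp-sites/constant_sites.txt` and made-up counts (neither is taken from the course data):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch file standing in for results/snp-sites/constant_sites.txt
tmpfile="$(mktemp)"
printf '%s\n' '10,20,30,40' > "$tmpfile"   # made-up counts, for illustration only

# $(cat FILE) is replaced by the contents of FILE (trailing newlines are stripped)
const_sites="$(cat "$tmpfile")"

# Show the option as it would be expanded on the iqtree command line
echo "-fconst ${const_sites}"

rm -f "$tmpfile"
```

The shell performs the substitution before the command runs, so `iqtree` only ever sees the expanded numbers, never the file name.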
28 changes: 11 additions & 17 deletions materials/23-panaroo.md
@@ -182,18 +182,6 @@ combined_protein_cdhit_out.txt.clstr gene_presence_absence.Rtab
core_alignment_filtered_header.embl gene_presence_absence.csv
```

The script included other code - which we didn't need to fix - that extracted the variable sites from our alignment (as detailed in [Building phylogenetic trees](10-phylogenetics.md)). These steps took as input the alignment file generated by Panaroo:

```bash
# extract variable sites
snp-sites results/panaroo/core_gene_alignment.aln > results/snp-sites/core_gene_alignment_snps.aln

# count invariant sites
snp-sites -C results/panaroo/core_gene_alignment.aln > results/snp-sites/constant_sites.txt
```

These two steps of the analysis produce the two files that we want to use in the next exercise.

:::
:::

@@ -210,10 +198,9 @@ Produce a tree from the core genome alignment from the previous step.
- Run the script using `bash scripts/03-run_iqtree.sh`. Several messages will be printed on the screen while `IQ-TREE` runs.

:::{.callout-hint}
For IQ-TREE:
For _SNP-sites_:

- The constant sites can be obtained by looking at the output of the `snp-sites` in `results/snp-sites/constant_sites.txt` (or in the `preprocessed` folder if you are still waiting for your `Panaroo` analysis to finish).
- The input alignment should be the output from the `snp-sites` program in `results/snp-sites/core_gene_alignment_snps.aln` (or in the `preprocessed` folder if you are still waiting for your `Panaroo` analysis to finish).
- The input alignment should be the output from the `panaroo` program found in `results/panaroo/` (or in the `preprocessed` folder if you are still waiting for your analysis to finish).

:::

@@ -227,6 +214,12 @@ The fixed script is:
# create output directory
mkdir -p results/iqtree/

# extract variable sites
snp-sites results/panaroo/core_gene_alignment_filtered.aln > results/snp-sites/core_gene_alignment_snps.aln

# count invariant sites
snp-sites -C results/panaroo/core_gene_alignment_filtered.aln > results/snp-sites/constant_sites.txt

# Run iqtree
iqtree \
-fconst $(cat results/snp-sites/constant_sites.txt) \
@@ -239,8 +232,9 @@ iqtree \
-bb 1000
```

- We specify as input the `core_gene_alignment_snps.aln` produced in the previous exercise by running `Panaroo` followed by `SNP-sites`.
- We specify the number of constant sites, also generated from the previous exercise. We can use `$(cat results/snp-sites/constant_sites.txt)` to directly add the contents of `constant_sites.txt` without having to open the file to obtain these numbers.
- We extract the variant sites and count of invariant sites using `SNP-sites`.
- As input to both `snp-sites` steps, we use the `core_gene_alignment_filtered.aln` alignment produced by _Panaroo_ in the previous step of our analysis.
- The number of constant sites was specified with `$(cat results/snp-sites/constant_sites.txt)` to directly add the contents of the `constant_sites.txt` file, without having to open the file to obtain these numbers.
- We use as prefix for our output files "School_Staph" (since we are using the data from the schools Staph paper), so all the output file names will be named as such.
- We automatically detect the number of threads/CPUs for parallel computation.

9 changes: 8 additions & 1 deletion materials/28-recombination.md
@@ -150,6 +150,12 @@ Now that we have created a recombination-masked alignment, we can extract the va
- Fix the script provided in `scripts/04-run_iqtree.sh`. See @sec-iqtree if you need a hint of how to fix the code in the script.
- Run the script using `bash scripts/04-run_iqtree.sh`. Several messages will be printed on the screen while `IQ-TREE` runs.

:::{.callout-hint}
For _SNP-sites_:

- The input alignment should be the output from the `gubbins` program found in `results/gubbins/` (or in the `preprocessed` folder if you are still waiting for your analysis to finish).
:::

:::{.callout-answer}

The fixed script is:
@@ -180,7 +186,8 @@ iqtree \
-bb 1000
```

- The script starts by extracting variable sites using _SNP-sites_. We used as input the `aligned_pseudogenomes_masked_snps.fas` file produced in the previous exercise.
- We extract the variant sites and count of invariant sites using `SNP-sites`.
- As input to both `snp-sites` steps, we use the `aligned_pseudogenomes_masked_snps.fas` file produced in the previous exercise.
- The next step runs _IQ-tree_:
- We specify the number of constant sites, also generated from the previous exercise. We can use `$(cat results/snp-sites/constant_sites.txt)` to directly add the contents of `constant_sites.txt` without having to open the file to obtain these numbers.
- We use as prefix for our output files "sero1" (since we are using the data from the Chaguza serotype 1 paper), so all the output file names will be named as such.
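
To make "count of invariant sites" concrete, here is a rough sketch of the idea on a toy alignment. It is a simplified stand-in written in `awk`, not the actual `snp-sites` implementation. It assumes one sequence per line and a comma-separated A,C,G,T output order, and the sequences are made up:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Toy alignment (made up for illustration): 3 sequences, 6 columns
aln="$(mktemp)"
cat > "$aln" <<'EOF'
>s1
ACGTAC
>s2
ACGTAA
>s3
ACCTAC
EOF

# Tally columns where every sequence carries the same base
counts="$(awk '
  /^>/ { next }                      # skip FASTA header lines
  { seqs[++n] = toupper($0) }        # assumes one sequence per line
  END {
    len = length(seqs[1])
    for (i = 1; i <= len; i++) {
      base = substr(seqs[1], i, 1)
      constant = 1
      for (j = 2; j <= n; j++)
        if (substr(seqs[j], i, 1) != base) { constant = 0; break }
      if (constant) tally[base]++
    }
    # Report constant-site counts in the assumed order A,C,G,T
    printf "%d,%d,%d,%d\n", tally["A"], tally["C"], tally["G"], tally["T"]
  }
' "$aln")"

echo "$counts"   # columns 3 and 6 vary, so this prints 2,1,0,1
rm -f "$aln"
```

In the real workflow these numbers come from `snp-sites -C`, and the resulting file is what `$(cat results/snp-sites/constant_sites.txt)` feeds to the `-fconst` option.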
