readme update7
stuber committed Sep 19, 2024
1 parent c64acb4 commit 0974549
Showing 3 changed files with 157 additions and 102 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -56,7 +56,7 @@ This step combines the VCF files from Step 1 to create SNP matrices and construc
conda create -c conda-forge -c bioconda -n vsnp3 vsnp3=3.25
```

For detailed Anaconda setup instructions, see [conda instructions](./docs/instructions/conda_instructions.md).
For detailed Miniconda setup instructions, see [conda instructions](./docs/instructions/conda_instructions.md).

### Verification

213 changes: 129 additions & 84 deletions docs/instructions/additional_tools.md
@@ -1,24 +1,36 @@
# Additional Programs
# Additional Bioinformatics Tools for Genomic Analysis

Many programs can be used to help identify reads. Three programs useful to use alongside vSNP are Mashtree, kSNP and Kraken.
## Table of Contents
1. [Introduction](#introduction)
2. [Example Dataset](#example-dataset)
3. [Mashtree](#mashtree)
4. [kSNP](#ksnp)
5. [Kraken/Krona](#krakenkrona)
6. [SRA Tools](#sra-tools)

Best results from vSNP are provided when a sample is less than 1,000 SNPs from a reference. If a sample is too distant from a reference, alignment errors can require time-consuming corrections. Good reference selection is important for best results. Mashtree and kSNP can help in reference selection. [Mashtree](https://github.com/lskatz/mashtree) and [kSNP](https://pubmed.ncbi.nlm.nih.gov/25913206) are reference-independent phylogenetic tree building programs. Mashtree is very fast; kSNP is slower, but its results may be more accurate, and additional information is provided to help qualify results.
## Introduction

[Kraken](https://ccb.jhu.edu/software/kraken2/) uses k-mers to identify reads. If a sample is not behaving as expected or contamination is suspected, Kraken is a powerful tool for determining read identification quickly. When used with Krona, an easy-to-read HTML file is provided.
In genomic analysis, particularly when working with vSNP (a variant calling and phylogenetic analysis tool), several complementary programs can enhance your workflow. This guide focuses on three tools: Mashtree, kSNP, and Kraken, along with instructions for using SRA Tools to obtain sequence data.

Below are brief installation and usage instructions for these tools. See their individual links for more detail. The scripts provided for kSNP and Kraken are examples only. Users should make updates as needed.
### Why use these tools?

# Example Dataset
- **Reference Selection**: vSNP performs best when samples are within 1,000 SNPs of a reference. Mashtree and kSNP can aid in selecting appropriate references.
- **Phylogenetic Analysis**: Both Mashtree and kSNP build reference-independent phylogenetic trees, offering different trade-offs between speed and accuracy.
- **Read Identification**: Kraken excels at rapid read identification, crucial for detecting contamination or unexpected sample composition.

## FASTAs for reference-free tree building
## Example Dataset

```
cd ~; mkdir tree_test; cd tree_test
```
Before we dive into the tools, let's set up an example dataset to work with.

Make `list` with the following
### Preparing FASTA files for reference-free tree building

```
```bash
# Create and navigate to a working directory
cd ~
mkdir tree_test && cd tree_test

# Create a list of accession numbers
cat << EOF > accession_list.txt
NC_000962
NC_018143
NZ_CP017594
@@ -27,127 +39,160 @@ NC_015758
NC_002945
NZ_CP039850
NZ_LR882497
```
EOF

Download list

```
for i in `cat list`; do vsnp3_download_fasta_gbk_gff_by_acc.py -a $i -f; done
# Download FASTA files using vSNP3
while read -r acc; do
    vsnp3_download_fasta_gbk_gff_by_acc.py -a "$acc" -f
done < accession_list.txt
```
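
Before building trees, it can help to confirm that every accession in the list actually produced a FASTA file. The helper below is a minimal sketch, not part of vSNP3: the function name `missing_fastas` is hypothetical, and it assumes each download is saved in the working directory under a name that starts with the accession and ends in `.fasta`. Adjust the pattern if your files are named differently.

```bash
# Print each accession from LIST that has no non-empty FASTA in DIR.
# Assumption: downloads are named <accession>*.fasta (e.g. NC_000962.fasta).
missing_fastas() (
    list=$1
    dir=${2:-.}
    while IFS= read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue            # skip blank lines
        found=no
        for f in "$dir/$acc"*.fasta; do
            [ -s "$f" ] && found=yes         # file exists and is not empty
        done
        [ "$found" = yes ] || printf '%s\n' "$acc"
    done < "$list"
)
```

Running `missing_fastas accession_list.txt ~/tree_test` prints nothing when all downloads are present; any accession it prints can be downloaded again.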

vsnp3 available from github
```
cd ~; git clone https://github.com/USDA-VS/vsnp3.git
```
Note: Ensure you have vSNP3 installed. If not, you can install it following these [instructions](https://github.com/USDA-VS/vSNP3).

Building Mashtree, kSNP and Kraken in their own conda environments ensures installation dependencies do not conflict. Scripts provided in the cloned vsnp3 repo above are needed since conda environments are independent.
## Mashtree

# Mashtree
Mashtree is a rapid method for creating phylogenetic trees based on MinHash distances.

Create conda environment
### Installation and Usage

```
```bash
# Create and activate a conda environment for Mashtree
conda create -n mashtree -c conda-forge -c bioconda mashtree
```
```
conda activate mashtree
```
Change to directory with test files

```
# Navigate to the directory with test files
cd ~/tree_test
```
Build tree from FASTAs

```
# Build a tree from FASTA files
mashtree --sketch-size 1000000 --numcpus 4 *.fasta > mashtree.tre
```

## kSNP

# kSNP
kSNP is a SNP-based approach to phylogenetic tree construction that doesn't require genome alignment or a reference genome.

As of late 2023, kSNP needs to be downloaded from [SourceForge](https://sourceforge.net/projects/ksnp/files/).
### Installation

There is a new version of kSNP as of 2023, kSNP4.1.
As of late 2023, kSNP4.1 needs to be downloaded from [SourceForge](https://sourceforge.net/projects/ksnp/files/).

Choose the prebuilt binary for your environment, download and unzip.
1. Download the prebuilt binary for your environment.
2. Unzip the file and place it in your desired location (e.g., `${HOME}`).
3. Add kSNP to your PATH:
```bash
   # Append to your shell startup file (use ~/.bashrc instead if your shell is bash)
   echo 'export PATH="${HOME}/kSNP4/kSNP4.1pkg:$PATH"' >> ~/.zshrc
   source ~/.zshrc
```
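
After updating `PATH`, a quick check that the kSNP programs can be found saves a failed run later. The sketch below is illustrative only: `check_tools` is a hypothetical helper name, and it relies solely on the shell's built-in `command -v`.

```bash
# Report every named command that cannot be found on PATH.
# Returns 0 when all are present, 1 otherwise.
check_tools() (
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf 'missing: %s\n' "$tool" >&2
            status=1
        fi
    done
    return "$status"
)
```

For the kSNP steps in this guide, `check_tools MakeKSNP4infile Kchooser4 kSNP4` should print nothing once the PATH entry is in effect in a new shell.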

Place unzipped file in desired location (${HOME} will work)
### Usage

Add to PATH, `PATH="${HOME}/kSNP4/kSNP4.1pkg":$PATH`

Change directory to location of FASTA files
```bash
# Navigate to the directory with FASTA files
cd ~/tree_test

```
# Prepare input file
MakeKSNP4infile -indir ./ -outfile myInfile S
```
```
Kchooser4 -in myInfile
```
```
kSNP4 -in myInfile -outdir run -CPU 8 -k 21 -core -ML -min_frac 0.8
```

# Kraken/Krona

Create conda environment
# Choose optimal k-mer size
Kchooser4 -in myInfile

```
conda create -n kraken -c conda-forge -c bioconda kraken2 krona krakentools wget pandas pigz
# Run kSNP
kSNP4 -in myInfile -outdir ksnp_run -CPU 8 -k 21 -core -ML -min_frac 0.8
```

After the conda install it will provide additional setup instructions for these programs.
## Kraken/Krona

[Download](https://benlangmead.github.io/aws-indexes/k2) Kraken database.
Kraken is a system for ultrafast metagenomic sequence classification using exact k-mer matches. Krona provides interactive visualization of the results.

There are many Databases to choose from. If unsure and download speeds allow try the standard database. If a smaller database is necessary Standard-8 may be a good option. Look at site for exact database naming.
### Installation

Example download
```
cd ~; wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240112.tar.gz
```
```
mkdir k2_standard_08gb; tar -xzf k2_standard_08gb_*.tar.gz -C k2_standard_08gb
```
```bash
# Create and activate a conda environment for Kraken
conda create -n kraken -c conda-forge -c bioconda kraken2 krona krakentools wget pandas pigz
conda activate kraken

If needed link database to conda environment and download taxonomy.
# Download Kraken database (example using standard-8 database)
cd ~
wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240112.tar.gz
mkdir k2_standard_08gb
tar -xzf k2_standard_08gb_*.tar.gz -C k2_standard_08gb

```
# Link database and update taxonomy (adjust paths as needed; a default Miniconda install lives in ~/miniconda3 rather than ~/anaconda3)
rm -rf ${HOME}/anaconda3/envs/kraken/opt/krona/taxonomy
ln -s ${HOME}/k2_standard_08gb ${HOME}/anaconda3/envs/kraken/opt/krona/taxonomy
ktUpdateTaxonomy.sh
```
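
A truncated download or a failed extraction can leave the database directory incomplete. A Kraken 2 database is made up of three index files, `hash.k2d`, `opts.k2d`, and `taxo.k2d`, so their presence is a reasonable first check. The helper below is a sketch under that assumption; the name `check_kraken_db` is hypothetical and it does not validate the file contents.

```bash
# Check that DIR holds the three non-empty index files Kraken 2 reads.
# Returns 0 when all are present, 1 otherwise.
check_kraken_db() (
    db=$1
    status=0
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [ ! -s "$db/$f" ]; then
            printf 'missing or empty: %s\n' "$db/$f" >&2
            status=1
        fi
    done
    return "$status"
)
```

With the example download above, `check_kraken_db ~/k2_standard_08gb` should return without printing anything.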

Just an Example. Supply your specific path to wrapper.
```
~/anaconda3/envs/vsnp3/bin/vsnp3_kraken2_wrapper.py -r1 SRR6046640_R1.fastq.gz -r2 SRR6046640_R2.fastq.gz --database ~/k2_standard_08gb
Additional prebuilt Kraken databases are available [here](https://benlangmead.github.io/aws-indexes/k2).

### Usage

Here's an example using a wrapper script (adjust the path to your specific location):

```bash
./vsnp3/bin/vsnp3_kraken2_wrapper.py -r1 SRR6046640_R1.fastq.gz -r2 SRR6046640_R2.fastq.gz --database ~/k2_standard_08gb
```

## SRA Tools

SRA Tools allow you to access data from the NCBI Sequence Read Archive.

### Installation

```bash
conda create -n sra-tools -c conda-forge -c bioconda sra-tools
conda activate sra-tools
```
conda create -n sra-tools -c conda-forge -c bioconda -n sra-tools
```
```

### Usage

#### Basic Usage

```bash
# Download and split FASTQ files
fasterq-dump --split-files -O . SRR26282520
```
```
wget https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR6046640/SRR6046640
```
```

# Alternative method
wget https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR6046640/SRR6046640
fastq-dump --split-files SRR6046640
```
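
For paired-end runs, `fasterq-dump --split-files` typically writes uncompressed files named `<accession>_1.fastq` and `<accession>_2.fastq`, while the Kraken wrapper example in this guide uses `_R1.fastq.gz` and `_R2.fastq.gz` names. The sketch below bridges the two; `compress_pairs` is a hypothetical helper, the target naming is an assumption taken from that example, and the original `.fastq` files are left in place.

```bash
# Compress <acc>_1.fastq and <acc>_2.fastq in the current directory
# into <acc>_R1.fastq.gz and <acc>_R2.fastq.gz (originals are kept).
compress_pairs() (
    acc=$1
    for n in 1 2; do
        src="${acc}_${n}.fastq"
        if [ ! -f "$src" ]; then
            printf 'not found: %s\n' "$src" >&2
            return 1
        fi
        gzip -c "$src" > "${acc}_R${n}.fastq.gz"
    done
)
```

For example, `compress_pairs SRR6046640` run in the download directory would produce `SRR6046640_R1.fastq.gz` and `SRR6046640_R2.fastq.gz`.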
### macOS
```

#### Platform-Specific Instructions

##### macOS

If you've downloaded the SRA Toolkit directly:

```bash
~/sratoolkit.3.0.7-mac64/bin/fasterq-dump -S SRR6046640
```
### Docker
Download Docker. It must be running.
```

##### Docker

Ensure Docker is installed and running, then:

```bash
docker pull ncbi/sra-tools
docker run -t --rm -v $PWD:/output:rw -w /output ncbi/sra-tools fasterq-dump -e 2 -p SRR6046640
```
### Singularity
```

##### Singularity

```bash
singularity pull docker://ncbi/sra-tools
singularity run sra-tools_latest.sif fasterq-dump -e 2 -p SRR6046640
```

## Conclusion

These tools form a powerful suite for genomic analysis, complementing vSNP3 and each other. By mastering Mashtree, kSNP, Kraken/Krona, and SRA Tools, you'll be well-equipped to handle a wide range of genomic analysis tasks efficiently.

Remember to always check for the latest versions and updates of these tools, as bioinformatics software evolves rapidly.

For more detailed information on each tool, please refer to their respective documentation:

- [Mashtree GitHub](https://github.com/lskatz/mashtree)
- [kSNP Documentation](https://sourceforge.net/projects/ksnp/files/)
- [Kraken2 Manual](https://github.com/DerrickWood/kraken2/wiki/Manual)
- [SRA Tools Documentation](https://github.com/ncbi/sra-tools/wiki)

Happy analyzing!
44 changes: 27 additions & 17 deletions docs/instructions/conda_instructions.md
@@ -1,44 +1,54 @@
# Anaconda Installation
# Miniconda Installation

Linux environment is needed to install and use the [Anaconda package manager](https://www.anaconda.com/products/distribution).
A Linux environment is needed to install and use [Miniconda](https://docs.conda.io/en/latest/miniconda.html), a minimal installer for conda.

`wget` links below are only for example. One should check for updated distributions.
The `wget` links below are examples. Always check for the latest distributions on the official Miniconda website.

If using a Mac download the Mac distribution, Mac 64-Bit Command Line installer
If using a Mac, download the Mac distribution:

```
wget https://repo.anaconda.com/archive/Anaconda3-2022.05-MacOSX-x86_64.sh
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
```

If using WSL, download the Linux 64-Bit Installer
If using WSL or Linux, download the Linux 64-Bit Installer:

```
wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
```

Install Anaconda using the downloaded file.
Install Miniconda using the downloaded file:

```
bash ./Anaconda3-2022.05-*-x86_64.sh
bash Miniconda3-latest-*-x86_64.sh
```

Press `Enter` to review agreement. Exit agreement, `q`. Accept terms, `yes`. Press enter to install in default home directory. After installation agree to `conda init`.
Follow the prompts:
1. Press `Enter` to review the license agreement.
2. Press `q` to exit the agreement view.
3. Type `yes` to accept the terms.
4. Press `Enter` to confirm the default installation location or enter a custom path.
5. When asked if you wish to initialize Miniconda3, type `yes`.

Close and reopen terminal.
Close and reopen your terminal for the changes to take effect.

# Anaconda Environment
# Conda Environment

Do not install packages in base. Instead make an environment.
It's best practice not to install packages in the base environment. Instead, create a new environment for your project.

Summary of commands to [manage environments](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html)
Summary of commands to [manage environments](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html):

Create new environment
Create a new environment:
```
conda create --name myenv
```

Activate the environment:
```
conda activate myenv
```
For vSNP3 see [README](../../README.md)

[Additional Tools](../../docs/instructions/additional_tools.md)
For vSNP3, please refer to the [README](../../README.md).

For information on additional tools, see [Additional Tools](../../docs/instructions/additional_tools.md).

Remember, Miniconda provides a minimal set of packages. If you need additional packages, you can install them using `conda install` within your activated environment.
