readme update7

USDA-VS · Sep 19, 2024 · commit 0974549
1 parent c64acb4

Showing 3 changed files with 157 additions and 102 deletions.
diff --git a/README.md b/README.md
@@ -56,7 +56,7 @@ This step combines the VCF files from Step 1 to create SNP matrices and construc
 conda create -c conda-forge -c bioconda -n vsnp3 vsnp3=3.25
 ```
 
-For detailed Anaconda setup instructions, see [conda instructions](./docs/instructions/conda_instructions.md).
+For detailed Miniconda setup instructions, see [conda instructions](./docs/instructions/conda_instructions.md).
 
 ### Verification
 

diff --git a/docs/instructions/additional_tools.md b/docs/instructions/additional_tools.md
@@ -1,24 +1,36 @@
-# Additional Programs
+# Additional Bioinformatics Tools for Genomic Analysis
 
-Many programs can be used to help identify reads.  Three programs useful to use alongside vSNP are Mashtree, kSNP and Kraken.  
+## Table of Contents
+1. [Introduction](#introduction)
+2. [Example Dataset](#example-dataset)
+3. [Mashtree](#mashtree)
+4. [kSNP](#ksnp)
+5. [Kraken/Krona](#krakenkrona)
+6. [SRA Tools](#sra-tools)
 
-Best results from vSNP are provided when a sample is less than 1,000 SNPs from a reference.  If a sample is too distant from a reference the alignment error can cause time consuming corrections.  Good reference selection is important for best results.  Mashtree and kSNP can help in reference selection.  [Mashtree](https://github.com/lskatz/mashtree) and [kSNP](https://pubmed.ncbi.nlm.nih.gov/25913206) are reference independent phylogenetic tree building programs.  Mashtree is very fast, kSNP is slower but results may be more accurate and additional information is provide to help qualify results.  
+## Introduction
 
-[Kraken](https://ccb.jhu.edu/software/kraken2/) uses kmers to identify reads.  If a sample is not behaving as expected or contamination is suspected Kraken is a powerful tool for determining read identification quickly.  When used with Krona an easy to read HTML file is provided.
+In genomic analysis, particularly when working with vSNP (variant calling and phylogenetic analysis tool), several complementary programs can significantly enhance your workflow. This guide focuses on three powerful tools: Mashtree, kSNP, and Kraken, along with instructions for using SRA Tools to obtain sequence data.
 
-Below are brief installation and usage insturctions for these tools.  See their individual links for more detail.  The scripts provided for kSNP and Kraken are only for example.  Users should make updates as needed.
+### Why use these tools?
 
-# Example Dataset
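+- **Reference Selection**: vSNP performs best when a sample is fewer than 1,000 SNPs from its reference; with a more distant reference, alignment errors can require time-consuming corrections. Mashtree and kSNP can help you select an appropriate reference.
+- **Phylogenetic Analysis**: Both Mashtree and kSNP build reference-independent phylogenetic trees. Mashtree is very fast; kSNP is slower, but its results may be more accurate and it provides additional information to help qualify them.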
+- **Read Identification**: Kraken excels at rapid read identification, crucial for detecting contamination or unexpected sample composition.
 
-## FASTAs for reference-free tree building
+## Example Dataset
 
-```
-cd ~; mkdir tree_test; cd tree_test
-```
+Before we dive into the tools, let's set up an example dataset to work with.
 
-Make `list` with the following
+### Preparing FASTA files for reference-free tree building
 
-```
+```bash
+# Create and navigate to a working directory
+cd ~
+mkdir tree_test && cd tree_test
+
+# Create a list of accession numbers
+cat << EOF > accession_list.txt
 NC_000962
 NC_018143
 NZ_CP017594
@@ -27,127 +39,160 @@ NC_015758
 NC_002945
 NZ_CP039850
 NZ_LR882497
-```
+EOF
 
-Download list
-
-```
-for i in `cat list`; do vsnp3_download_fasta_gbk_gff_by_acc.py -a $i -f; done
+# Download FASTA files using vSNP3
+while read -r acc; do
+    vsnp3_download_fasta_gbk_gff_by_acc.py -a "$acc" -f
+done < accession_list.txt
 ```
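If the accession list is edited by hand, it can pick up blank lines, `#` comments, or Windows line endings that would be passed straight to the download script. A more defensive reader is sketched below as an assumption about how you might harden the loop; `read_accessions` is a hypothetical helper, not part of vSNP3, and only cleans the list while leaving the download call unchanged.

```bash
# Hypothetical helper (not part of vSNP3): print one accession per line,
# dropping carriage returns, "#" comments, surrounding whitespace and blank lines.
read_accessions() {
    tr -d '\r' < "$1" \
        | sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | grep -v '^$'
}

# Usage (inside the vsnp3 environment):
# read_accessions accession_list.txt | while read -r acc; do
#     vsnp3_download_fasta_gbk_gff_by_acc.py -a "$acc" -f
# done
```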
 
-vsnp3 available from github
-```
-cd ~; git clone https://github.com/USDA-VS/vsnp3.git
-```
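+Note: `vsnp3_download_fasta_gbk_gff_by_acc.py` is provided by vSNP3, so run this step with the `vsnp3` conda environment activated. If vSNP3 is not installed, follow these [instructions](https://github.com/USDA-VS/vsnp3). Mashtree, Kraken, and SRA Tools are each installed in their own conda environment so that their dependencies do not conflict.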
 
-Building Mashtree, kSNP and Kraken in their own conda environments ensures installation dependencies do not conflict.  Scripts provided in the cloned vsnp3 repo above are needed since conda environments are independent.
+## Mashtree
 
-# Mashtree
+Mashtree is a rapid method for creating phylogenetic trees based on MinHash distances.
 
-Create conda environment
+### Installation and Usage
 
-```
+```bash
+# Create and activate a conda environment for Mashtree
 conda create -n mashtree -c conda-forge -c bioconda mashtree
-```
-```
 conda activate mashtree
-```
-Change to directory with test files
 
-```
+# Navigate to the directory with test files
 cd ~/tree_test
-```
-Build tree from FASTAs
 
-```
+# Build a tree from FASTA files
 mashtree --sketch-size 1000000 --numcpus 4 *.fasta > mashtree.tre
 ```
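As a quick sanity check on the Mashtree output, you can compare the number of leaves in `mashtree.tre` with the number of input FASTA files. This sketch relies on a property of the Newick format: sibling nodes are separated by commas, so a single tree with N leaves contains exactly N-1 commas. `count_newick_leaves` is a hypothetical helper and assumes taxon labels contain no commas, which holds for accession-based file names.

```bash
# Hypothetical helper: count leaves in a single-tree Newick file.
# A tree with N leaves has N-1 commas (assumes labels contain no commas).
count_newick_leaves() {
    commas=$(tr -cd ',' < "$1" | wc -c | tr -d ' ')
    echo $((commas + 1))
}

# Usage: compare the two numbers
# count_newick_leaves mashtree.tre
# ls *.fasta | wc -l
```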
 
+## kSNP
 
-# kSNP
+kSNP is a SNP-based approach to phylogenetic tree construction that doesn't require genome alignment or a reference genome.
 
-As of late 2023 kSNP needs to be download from [sourceforge](https://sourceforge.net/projects/ksnp/files/).
+### Installation
 
-There is a new version of kSNP as of 2023, kSNP4.1.
+As of late 2023, kSNP4.1 needs to be downloaded from [SourceForge](https://sourceforge.net/projects/ksnp/files/).
 
-Choose the prebuild binary for your environment, download and unzip.
+1. Download the prebuilt binary for your environment.
+2. Unzip the file and place it in your desired location (e.g., `${HOME}`).
+3. Add kSNP to your PATH (this example appends to `~/.zshrc`; use `~/.bashrc` instead if your shell is bash):
+   ```bash
+   echo 'export PATH="${HOME}/kSNP4/kSNP4.1pkg:$PATH"' >> ~/.zshrc
+   source ~/.zshrc
+   ```
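To confirm that the PATH change took effect, open a new terminal and check that the kSNP programs resolve. The function below is a hypothetical convenience built on the POSIX `command -v` builtin; the program names passed to it are the ones used in the usage steps that follow.

```bash
# Hypothetical helper: report any named program that is not on PATH.
# Returns 0 if every program is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_tools MakeKSNP4infile Kchooser4 kSNP4
```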
 
-Place unzipped file in desired location (${HOME} will work)
+### Usage
 
-Add to PATH, `PATH="${HOME}/kSNP4/kSNP4.1pkg":$PATH`
-
-Change directory to location of FASTA files
+```bash
+# Navigate to the directory with FASTA files
+cd ~/tree_test
 
-```
+# Prepare input file
 MakeKSNP4infile -indir ./ -outfile myInfile S
-```
-```
-Kchooser4 -in myInfile
-```
-```
-kSNP4 -in myInfile -outdir run -CPU 8 -k 21 -core -ML -min_frac 0.8
-```
 
-# Kraken/Krona
-
-Create conda environment
+# Choose optimal k-mer size
+Kchooser4 -in myInfile
 
-```
-conda create -n kraken -c conda-forge -c bioconda kraken2 krona krakentools wget pandas pigz
+# Run kSNP (set -k to the optimal value reported by Kchooser4; 21 is an example)
+kSNP4 -in myInfile -outdir ksnp_run -CPU 8 -k 21 -core -ML -min_frac 0.8
 ```
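`MakeKSNP4infile` generates the input list for you. If you want direct control over genome names, a minimal sketch of building the file by hand follows. It assumes the kSNP input layout is one genome per line as `<absolute path><TAB><genome name>`; verify that against the kSNP4 manual before relying on it. `make_ksnp_infile` is hypothetical and names each genome after its FASTA file.

```bash
# Hypothetical helper: write a kSNP-style infile, one line per FASTA:
# "<absolute path><TAB><file name without .fasta>" (layout is an assumption).
make_ksnp_infile() {
    indir=$1
    outfile=$2
    : > "$outfile"                      # start with an empty file
    for f in "$indir"/*.fasta; do
        [ -e "$f" ] || continue         # no matches: glob stays literal
        abs=$(cd "$(dirname "$f")" && pwd)/$(basename "$f")
        printf '%s\t%s\n' "$abs" "$(basename "$f" .fasta)" >> "$outfile"
    done
}

# Usage: make_ksnp_infile ~/tree_test myInfile
```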
 
-After the conda install it will provide additional setup instructions for these programs.
+## Kraken/Krona
 
-[Download](https://benlangmead.github.io/aws-indexes/k2) Kraken database.
+Kraken is a system for ultrafast metagenomic sequence classification using exact k-mer matches. Krona provides interactive visualization of the results.
 
-There are many Databases to choose from.  If unsure and download speeds allow try the standard database.  If a smaller database is necessary Standard-8 may be a good option.  Look at site for exact database naming.
+### Installation
 
-Example download
-```
-cd ~; wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240112.tar.gz
-```
-```
-mkdir k2_standard_08gb; tar -xzf k2_standard_08gb_*.tar.gz -C k2_standard_08gb
-```
+```bash
+# Create and activate a conda environment for Kraken
+conda create -n kraken -c conda-forge -c bioconda kraken2 krona krakentools wget pandas pigz
+conda activate kraken
 
-If needed link database to conda environment and download taxonomy.
+# Download Kraken database (example using standard-8 database)
+cd ~
+wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240112.tar.gz
+mkdir k2_standard_08gb
+tar -xzf k2_standard_08gb_*.tar.gz -C k2_standard_08gb
 
-```
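+# If needed, link the database and update taxonomy. These paths assume an
+# install in ~/anaconda3; with Miniconda use ~/miniconda3, or ${CONDA_PREFIX}
+# while the kraken environment is active.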
 rm -rf ${HOME}/anaconda3/envs/kraken/opt/krona/taxonomy
 ln -s ${HOME}/k2_standard_08gb ${HOME}/anaconda3/envs/kraken/opt/krona/taxonomy
 ktUpdateTaxonomy.sh
 ```
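Before running Kraken, it can help to confirm that the database extracted completely. A Kraken 2 database directory holds `hash.k2d`, `opts.k2d`, and `taxo.k2d`; the hypothetical `check_k2_db` function sketched here reports any of these that are missing or empty.

```bash
# Hypothetical helper: check that a Kraken 2 database directory contains
# non-empty hash.k2d, opts.k2d and taxo.k2d files.
check_k2_db() {
    rc=0
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [ ! -s "$1/$f" ]; then
            echo "missing or empty: $1/$f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check_k2_db ~/k2_standard_08gb
```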
 
-Just an Example.  Supply your specific path to wrapper.
-```
-~/anaconda3/envs/vsnp3/bin/vsnp3_kraken2_wrapper.py -r1 SRR6046640_R1.fastq.gz -r2 SRR6046640_R2.fastq.gz --database ~/k2_standard_08gb
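+Additional prebuilt Kraken 2 databases are available from the [Kraken 2 index page](https://benlangmead.github.io/aws-indexes/k2). If unsure and download speeds allow, try the Standard database; Standard-8 is a smaller option. Check the site for exact database file names.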
+
+### Usage
+
+Here's an example using the `vsnp3_kraken2_wrapper.py` script provided by vSNP3 (adjust the path to where the script is located on your system):
+
+```bash
+./vsnp3/bin/vsnp3_kraken2_wrapper.py -r1 SRR6046640_R1.fastq.gz -r2 SRR6046640_R2.fastq.gz --database ~/k2_standard_08gb
 ```
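For several paired-end samples in one directory, a dry-run loop can print one wrapper command per sample so you can review them before running anything. This sketch assumes the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming from the wrapper example; `kraken_batch_cmds` is hypothetical and only pairs file names, it does not call Kraken itself.

```bash
# Hypothetical helper: print one wrapper command per R1/R2 pair in the
# current directory. $1 = wrapper script path, $2 = Kraken database directory.
kraken_batch_cmds() {
    wrapper=$1
    db=$2
    for r1 in *_R1.fastq.gz; do
        [ -e "$r1" ] || continue                  # no R1 files at all
        r2=${r1%_R1.fastq.gz}_R2.fastq.gz
        if [ -e "$r2" ]; then
            echo "$wrapper -r1 $r1 -r2 $r2 --database $db"
        else
            echo "no mate for $r1" >&2
        fi
    done
}

# Review first, then run the commands one after another:
# kraken_batch_cmds /path/to/vsnp3_kraken2_wrapper.py ~/k2_standard_08gb | sh
```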
 
 ## SRA Tools
+
+SRA Tools allow you to access data from the NCBI Sequence Read Archive.
+
+### Installation
+
+```bash
+conda create -n sra-tools -c conda-forge -c bioconda sra-tools
+conda activate sra-tools
 ```
-conda create -n sra-tools -c conda-forge -c bioconda -n sra-tools
-```
-```
+
+### Usage
+
+#### Basic Usage
+
+```bash
+# Download and split FASTQ files
 fasterq-dump --split-files -O . SRR26282520
-```
-```
-wget https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR6046640/SRR6046640 
-```
-```
+
+# Alternative: download the run file from the SRA AWS mirror, then convert it with fastq-dump
+wget https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR6046640/SRR6046640
 fastq-dump --split-files SRR6046640
 ```
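`fasterq-dump --split-files` writes uncompressed `<accession>_1.fastq` and `<accession>_2.fastq` files, while the Kraken wrapper example uses `_R1.fastq.gz` / `_R2.fastq.gz` names. A minimal sketch that compresses and renames one pair follows; `compress_pairs` is hypothetical. It uses `pigz` if it is on PATH (the `kraken` environment includes it) and otherwise falls back to `gzip`.

```bash
# Hypothetical helper: compress <acc>_1.fastq / <acc>_2.fastq to
# <acc>_R1.fastq.gz / <acc>_R2.fastq.gz, deleting each uncompressed file
# only after it was compressed successfully.
compress_pairs() {
    acc=$1
    # Prefer pigz (parallel gzip) when available, else fall back to gzip
    if command -v pigz >/dev/null 2>&1; then gz=pigz; else gz=gzip; fi
    for n in 1 2; do
        src="${acc}_${n}.fastq"
        if [ ! -e "$src" ]; then
            echo "not found: $src" >&2
            return 1
        fi
        "$gz" -c "$src" > "${acc}_R${n}.fastq.gz" || return 1
        rm -- "$src"
    done
}

# Usage:
# fasterq-dump --split-files -O . SRR26282520 && compress_pairs SRR26282520
```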
-### macOS
-```
+
+#### Platform-Specific Instructions
+
+##### macOS
+
+If you downloaded the SRA Toolkit binaries directly from NCBI (adjust the version in the path to match yours):
+
+```bash
 ~/sratoolkit.3.0.7-mac64/bin/fasterq-dump -S SRR6046640
 ```
-### Docker
-Download Docker.  It must be running.
-```
+
+##### Docker
+
+Ensure Docker is installed and running, then:
+
+```bash
 docker pull ncbi/sra-tools
 docker run -t --rm -v $PWD:/output:rw -w /output ncbi/sra-tools fasterq-dump -e 2 -p SRR6046640
 ```
-### Singularity
-```
+
+##### Singularity
+
+```bash
 singularity pull docker://ncbi/sra-tools
 singularity run sra-tools_latest.sif fasterq-dump -e 2 -p SRR6046640
 ```
+
+## Conclusion
+
+These tools complement vSNP3 and each other: Mashtree and kSNP help with reference selection and reference-free trees, Kraken/Krona with read identification and contamination checks, and SRA Tools with retrieving public sequence data.
+
+Remember to always check for the latest versions and updates of these tools, as bioinformatics software evolves rapidly.
+
+For more detailed information on each tool, please refer to their respective documentation:
+
+- [Mashtree GitHub](https://github.com/lskatz/mashtree)
+- [kSNP (SourceForge downloads)](https://sourceforge.net/projects/ksnp/files/)
+- [Kraken2 Manual](https://github.com/DerrickWood/kraken2/wiki/Manual)
+- [SRA Tools Documentation](https://github.com/ncbi/sra-tools/wiki)
+
+Happy analyzing!
diff --git a/docs/instructions/conda_instructions.md b/docs/instructions/conda_instructions.md
@@ -1,44 +1,54 @@
-# Anaconda Installation
+# Miniconda Installation
 
-Linux environment is needed to install and use the [Anaconda package manager](https://www.anaconda.com/products/distribution).  
+A Unix-like environment (Linux, macOS, or WSL on Windows) is needed to install and use [Miniconda](https://docs.conda.io/en/latest/miniconda.html), a minimal installer for conda.
 
-`wget` links below are only for example.  One should check for updated distributions.
+`wget` links below are examples only. Always check for the latest distributions on the official Miniconda website.
 
-If using a Mac download the Mac distribution, Mac 64-Bit Command Line installer
+If using a Mac, download the Mac distribution:
 
 ```
-wget https://repo.anaconda.com/archive/Anaconda3-2022.05-MacOSX-x86_64.sh
+wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
 ```
 
-If using WSL, download the Linux 64-Bit Installer
+If using WSL or Linux, download the Linux 64-Bit Installer:
 
 ```
-wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh
+wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
 ```
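If you script the download, the installer name can be derived from `uname`. This sketch follows the file-name pattern of the two `wget` links shown; `miniconda_installer` is hypothetical. On ARM machines `uname -m` reports `arm64` or `aarch64`, so it would produce an ARM installer name: check that such a file exists on the Miniconda site, and adjust the `*-x86_64.sh` pattern in the install step to match.

```bash
# Hypothetical helper: print the Miniconda installer file name for an OS/arch,
# defaulting to the current machine as reported by uname.
miniconda_installer() {
    os=${1:-$(uname -s)}
    arch=${2:-$(uname -m)}
    case "$os" in
        Darwin) os=MacOSX ;;            # macOS installers use "MacOSX"
        Linux) ;;                       # Linux (including WSL) keeps its name
        *) echo "unsupported OS: $os" >&2; return 1 ;;
    esac
    echo "Miniconda3-latest-${os}-${arch}.sh"
}

# Usage:
# wget "https://repo.anaconda.com/miniconda/$(miniconda_installer)"
```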
 
-Install Anaconda using the downloaded file.
+Install Miniconda using the downloaded file:
 
 ```
-bash ./Anaconda3-2022.05-*-x86_64.sh
+bash Miniconda3-latest-*-x86_64.sh
 ```
 
-Press `Enter` to review agreement.  Exit agreement, `q`.  Accept terms, `yes`.  Press enter to install in default home directory.  After installation agree to `conda init`.
+Follow the prompts:
+1. Press `Enter` to review the license agreement.
+2. Press `q` to exit the agreement view.
+3. Type `yes` to accept the terms.
+4. Press `Enter` to confirm the default installation location or enter a custom path.
+5. When asked if you wish to initialize Miniconda3, type `yes`.
 
-Close and reopen terminal.
+Close and reopen your terminal for the changes to take effect.
 
-# Anaconda Environment
+# Conda Environment
 
-Do not install packages in base.  Instead make an environment.  
+It's best practice not to install packages in the base environment. Instead, create a new environment for your project.
 
-Summary of commands to [manage environments](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html)
+Summary of commands to [manage environments](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html):
 
-Create new environment
+Create a new environment:
 ```
 conda create --name myenv
 ```
+
+Activate the environment:
 ```
 conda activate myenv
 ```
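To guard scripts against installing into base by accident, you can check the `CONDA_DEFAULT_ENV` variable, which `conda activate` sets to the name of the active environment. The hypothetical `require_non_base_env` function sketched here fails when no environment is active or when `base` is active.

```bash
# Hypothetical helper: fail unless a non-base conda environment is active.
require_non_base_env() {
    case "${CONDA_DEFAULT_ENV:-}" in
        ""|base)
            echo "activate a project environment first (e.g. conda activate myenv)" >&2
            return 1 ;;
    esac
    return 0
}

# Usage: require_non_base_env && conda install <package>
```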
-For vSNP3 see [README](../../README.md)
 
-[Additional Tools](../../docs/instructions/additional_tools.md)
+For vSNP3, please refer to the [README](../../README.md).
+
+For information on additional tools, see [Additional Tools](../../docs/instructions/additional_tools.md).
+
+Remember, Miniconda provides a minimal set of packages. If you need additional packages, you can install them using `conda install` within your activated environment.