Metabarcoding is the barcoding of DNA/RNA (or eDNA/eRNA) in a manner that allows for the simultaneous identification of many taxa within the same sample. The main difference between barcoding and metabarcoding is that metabarcoding does not focus on one specific organism, but instead aims to determine species composition within a sample.
Here I present a bioinformatics metabarcoding analysis pipeline that starts from raw paired-end (PE) FASTQ data and uses DADA2 and QIIME 2. Run a quality control check before running any of the scripts; I recommend FastQC and MultiQC.
Be sure to replace the variables in each script with your own values.
This analysis was run on a Slurm HPC cluster.
Conda and Singularity are used only for convenience. The environment can be set up with:
singularity pull docker://quay.io/qiime2/amplicon:2023.9
wget https://raw.githubusercontent.com/qiime2/distributions/dev/latest/passed/qiime2-amplicon-ubuntu-latest-conda.yml
conda env create -n qiime2-dev --file qiime2-amplicon-ubuntu-latest-conda.yml
conda install -c bioconda cutadapt
conda install -c bioconda biom-format
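Because the jobs ran on Slurm, one way to call QIIME 2 from the container pulled above is a small wrapper at the top of each job script. This is a sketch, not the original setup. The `#SBATCH` resource values are placeholders, `amplicon_2023.9.sif` is assumed to be the file name written by `singularity pull`, and `qiime_sif` and `DRY_RUN` (which prints the command instead of running it) are hypothetical names.

```bash
#!/usr/bin/env bash
#SBATCH --job-name=qiime2
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=12:00:00
set -euo pipefail

# ASSUMPTION: default image name produced by `singularity pull docker://quay.io/qiime2/amplicon:2023.9`
QIIME_SIF="${QIIME_SIF:-amplicon_2023.9.sif}"

# Hypothetical helper: run any qiime subcommand inside the container.
# With DRY_RUN=1 the command is printed instead of executed.
qiime_sif() {
  ${DRY_RUN:+echo} singularity exec "$QIIME_SIF" qiime "$@"
}
```

Pipeline steps go below the function, for example `qiime_sif tools peek demux.qza`, and the file is submitted with `sbatch`.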
The SILVA 138 SSURef NR99 full-length sequences and taxonomy used to train the classifier are available here:
https://docs.qiime2.org/2023.9/data-resources/
- Cutadapt documentation: https://cutadapt.readthedocs.io/en/stable/
- DADA2 documentation: https://www.bioconductor.org/packages/release/bioc/manuals/dada2/man/dada2.pdf
- QIIME 2 documentation: https://docs.qiime2.org/2023.9/
A bash script to remove primers with cutadapt
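A minimal sketch of primer removal for one paired-end sample, assuming files named `SAMPLE_R1.fastq.gz`/`SAMPLE_R2.fastq.gz`. The 515F/806R (16S V4) primers are only an example; replace them with your own. `trim_primers` and `DRY_RUN` are hypothetical names. `-g` and `-G` remove a 5' primer from R1 and R2, and `--discard-untrimmed` drops reads in which the primer was not found.

```bash
#!/usr/bin/env bash
set -euo pipefail

# ASSUMPTION: example 515F/806R 16S V4 primers; override via the environment or edit.
FWD_PRIMER="${FWD_PRIMER:-GTGYCAGCMGCCGCGGTAA}"
REV_PRIMER="${REV_PRIMER:-GGACTACNVGGGTWTCTAAT}"

# Hypothetical helper: remove 5' primers from one paired-end sample.
# Usage: trim_primers SAMPLE_R1.fastq.gz SAMPLE_R2.fastq.gz OUTDIR
# DRY_RUN=1 prints the commands instead of running them.
trim_primers() {
  local r1="${1:?R1 file required}" r2="${2:?R2 file required}" outdir="${3:?output dir required}"
  local name
  name=$(basename "$r1" _R1.fastq.gz)
  ${DRY_RUN:+echo} mkdir -p "$outdir"
  ${DRY_RUN:+echo} cutadapt \
    -g "$FWD_PRIMER" -G "$REV_PRIMER" \
    --discard-untrimmed \
    -j "${SLURM_CPUS_PER_TASK:-1}" \
    -o "$outdir/${name}_R1.fastq.gz" -p "$outdir/${name}_R2.fastq.gz" \
    "$r1" "$r2"
}
```

All samples can then be processed with a loop such as `for r1 in raw/*_R1.fastq.gz; do trim_primers "$r1" "${r1/_R1/_R2}" trimmed; done`.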
A bash script to remove adapters with cutadapt
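Adapter read-through occurs when the insert is shorter than the read, so adapters are trimmed from the 3' end with `-a` (R1) and `-A` (R2). This sketch assumes Illumina TruSeq adapters (the sequences given in the cutadapt documentation; replace them if your library prep differs) and an arbitrary 50 bp minimum length. `trim_adapters`, `MIN_LEN` and `DRY_RUN` are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# ASSUMPTION: Illumina TruSeq adapters; replace with the adapters of your library prep.
ADAPTER_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
MIN_LEN="${MIN_LEN:-50}"   # ASSUMPTION: arbitrary minimum read length after trimming

# Hypothetical helper: remove 3' adapters from one paired-end sample.
# Usage: trim_adapters SAMPLE_R1.fastq.gz SAMPLE_R2.fastq.gz OUTDIR
# DRY_RUN=1 prints the commands instead of running them.
trim_adapters() {
  local r1="${1:?R1 file required}" r2="${2:?R2 file required}" outdir="${3:?output dir required}"
  local name
  name=$(basename "$r1" _R1.fastq.gz)
  ${DRY_RUN:+echo} mkdir -p "$outdir"
  ${DRY_RUN:+echo} cutadapt \
    -a "$ADAPTER_R1" -A "$ADAPTER_R2" \
    -m "$MIN_LEN" \
    -j "${SLURM_CPUS_PER_TASK:-1}" \
    -o "$outdir/${name}_R1.fastq.gz" -p "$outdir/${name}_R2.fastq.gz" \
    "$r1" "$r2"
}
```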
A bash script to import the FASTQ files into a QIIME 2 artifact (.qza file), which is easier and faster to work with in the following steps
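A sketch of the import step, assuming a `PairedEndFastqManifestPhred33V2` manifest (a tab-separated file listing absolute paths) and the same `SAMPLE_R1/R2.fastq.gz` naming. `make_manifest`, `import_reads` and `DRY_RUN` are hypothetical helpers.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a PairedEndFastqManifestPhred33V2 manifest
# for every SAMPLE_R1.fastq.gz / SAMPLE_R2.fastq.gz pair in a directory.
make_manifest() {
  local indir="${1:?input dir required}" manifest="${2:?manifest path required}"
  local r1 name abs
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > "$manifest"
  for r1 in "$indir"/*_R1.fastq.gz; do
    [[ -e "$r1" ]] || continue            # glob matched nothing
    name=$(basename "$r1" _R1.fastq.gz)
    abs=$(cd "$(dirname "$r1")" && pwd)   # manifest requires absolute paths
    printf '%s\t%s\t%s\n' "$name" "$abs/${name}_R1.fastq.gz" "$abs/${name}_R2.fastq.gz" >> "$manifest"
  done
}

# Hypothetical helper: import the reads listed in the manifest into a .qza artifact.
# DRY_RUN=1 prints the command instead of running it.
import_reads() {
  local manifest="${1:?manifest required}" out="${2:-demux.qza}"
  ${DRY_RUN:+echo} qiime tools import \
    --type 'SampleData[PairedEndSequencesWithQuality]' \
    --input-path "$manifest" \
    --input-format PairedEndFastqManifestPhred33V2 \
    --output-path "$out"
}
```

For example, `make_manifest clean manifest.tsv && import_reads manifest.tsv demux.qza`, then `qiime demux summarize --i-data demux.qza --o-visualization demux.qzv` to inspect read quality before choosing DADA2 truncation lengths.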
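A bash script to denoise the reads with DADA2. The output is a set of amplicon sequence variants (ASVs), which the literature generally prefers over OTUs because they resolve single-nucleotide differences and can be compared directly across studies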
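A sketch of the denoising call. The truncation lengths are arguments because they depend on the quality profiles of your run. 5' trimming is set to 0 on the assumption that primers were already removed with cutadapt. `--p-n-threads 0` lets DADA2 use all available cores. `denoise_dada2` and `DRY_RUN` are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: DADA2 denoising of paired-end reads.
# Usage: denoise_dada2 demux.qza TRUNC_LEN_F TRUNC_LEN_R [OUTDIR]
# DRY_RUN=1 prints the commands instead of running them.
denoise_dada2() {
  local demux="${1:?demux .qza required}" trunc_f="${2:?forward truncation length}" \
        trunc_r="${3:?reverse truncation length}" outdir="${4:-dada2}"
  ${DRY_RUN:+echo} mkdir -p "$outdir"
  # ASSUMPTION: primers already removed, so nothing is trimmed at the 5' end.
  ${DRY_RUN:+echo} qiime dada2 denoise-paired \
    --i-demultiplexed-seqs "$demux" \
    --p-trim-left-f 0 --p-trim-left-r 0 \
    --p-trunc-len-f "$trunc_f" --p-trunc-len-r "$trunc_r" \
    --p-n-threads "${SLURM_CPUS_PER_TASK:-0}" \
    --o-table "$outdir/table.qza" \
    --o-representative-sequences "$outdir/rep-seqs.qza" \
    --o-denoising-stats "$outdir/denoising-stats.qza"
  # Tabulate per-sample read counts kept at each DADA2 stage.
  ${DRY_RUN:+echo} qiime metadata tabulate \
    --m-input-file "$outdir/denoising-stats.qza" \
    --o-visualization "$outdir/denoising-stats.qzv"
}
```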
A bash script to extract reference reads from the SILVA database using the PCR primers, so that the classifier is trained only on the amplified region
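A sketch of the extraction step on the SILVA 138 NR99 sequences artifact, reusing the example 515F/806R primers (an assumption; use the primers you trimmed). The 100 to 400 bp length limits are example values to adapt to your amplicon. `extract_ref_reads` and `DRY_RUN` are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# ASSUMPTION: same example 515F/806R primers as the trimming step.
FWD_PRIMER="${FWD_PRIMER:-GTGYCAGCMGCCGCGGTAA}"
REV_PRIMER="${REV_PRIMER:-GGACTACNVGGGTWTCTAAT}"

# Hypothetical helper: keep only the primer-bounded region of each reference sequence.
# Usage: extract_ref_reads SILVA_SEQS.qza [OUT.qza]
# DRY_RUN=1 prints the command instead of running it.
extract_ref_reads() {
  local seqs="${1:?reference sequences .qza required}" out="${2:-ref-seqs.qza}"
  ${DRY_RUN:+echo} qiime feature-classifier extract-reads \
    --i-sequences "$seqs" \
    --p-f-primer "$FWD_PRIMER" \
    --p-r-primer "$REV_PRIMER" \
    --p-min-length 100 \
    --p-max-length 400 \
    --o-reads "$out"
}
```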
A bash script to train a Naive Bayes classifier. The output is classifier.qza
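A sketch that trains on the extracted reads and the matching SILVA taxonomy artifact. Training on SILVA can need a lot of memory, so the Slurm memory request may need to be raised. `train_classifier` and `DRY_RUN` are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: fit a Naive Bayes classifier on primer-extracted reference reads.
# Usage: train_classifier REF_READS.qza REF_TAXONOMY.qza [OUT.qza]
# DRY_RUN=1 prints the command instead of running it.
train_classifier() {
  local ref_reads="${1:?reference reads .qza required}" ref_tax="${2:?reference taxonomy .qza required}"
  local out="${3:-classifier.qza}"
  ${DRY_RUN:+echo} qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "$ref_reads" \
    --i-reference-taxonomy "$ref_tax" \
    --o-classifier "$out"
}
```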
A bash script to test the previously trained classifier on our data, i.e. to assign taxonomy to the representative sequences
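A sketch that classifies the DADA2 representative sequences and tabulates the result for inspection. `classify_reads` and `DRY_RUN` are hypothetical names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: assign taxonomy with a trained classifier.
# Usage: classify_reads CLASSIFIER.qza REP_SEQS.qza [OUT.qza]
# DRY_RUN=1 prints the commands instead of running them.
classify_reads() {
  local classifier="${1:?classifier .qza required}" rep_seqs="${2:?rep-seqs .qza required}"
  local out="${3:-taxonomy.qza}"
  ${DRY_RUN:+echo} qiime feature-classifier classify-sklearn \
    --i-classifier "$classifier" \
    --i-reads "$rep_seqs" \
    --p-n-jobs "${SLURM_CPUS_PER_TASK:-1}" \
    --o-classification "$out"
  # Viewable table of assignments and confidence values.
  ${DRY_RUN:+echo} qiime metadata tabulate \
    --m-input-file "$out" \
    --o-visualization "${out%.qza}.qzv"
}
```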
A bash script to:
- export a taxa bar plot
- keep only sequences classified at least to the phylum level
- filter out chloroplast sequences
- export taxonomic counts at all levels
- collapse groups of features that share the same taxonomic assignment through the specified level
- convert the tables to .tsv
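The tasks above can be sketched as one function, assuming a SILVA-style taxonomy in which the phylum rank carries the `p__` prefix and there are seven ranks (domain to species). `taxa_outputs`, the output layout and `DRY_RUN` are hypothetical. The QIIME 2 filtering search terms are case-insensitive, so `chloroplast` also matches `Chloroplast`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: bar plot, filtering, per-level collapse and TSV export.
# Usage: taxa_outputs TABLE.qza TAXONOMY.qza METADATA.tsv [OUTDIR]
# DRY_RUN=1 prints the commands instead of running them.
taxa_outputs() {
  local table="${1:?table required}" taxonomy="${2:?taxonomy required}" \
        metadata="${3:?metadata required}" outdir="${4:-taxa}"
  local level
  ${DRY_RUN:+echo} mkdir -p "$outdir"
  # Interactive taxa bar plot
  ${DRY_RUN:+echo} qiime taxa barplot \
    --i-table "$table" --i-taxonomy "$taxonomy" \
    --m-metadata-file "$metadata" \
    --o-visualization "$outdir/taxa-bar-plots.qzv"
  # Keep features with a phylum assignment and drop chloroplasts
  ${DRY_RUN:+echo} qiime taxa filter-table \
    --i-table "$table" --i-taxonomy "$taxonomy" \
    --p-include p__ --p-exclude chloroplast \
    --o-filtered-table "$outdir/table-filtered.qza"
  # Collapse at every rank, export to BIOM, then convert to TSV
  for level in 1 2 3 4 5 6 7; do
    ${DRY_RUN:+echo} qiime taxa collapse \
      --i-table "$outdir/table-filtered.qza" --i-taxonomy "$taxonomy" \
      --p-level "$level" \
      --o-collapsed-table "$outdir/table-L$level.qza"
    ${DRY_RUN:+echo} qiime tools export \
      --input-path "$outdir/table-L$level.qza" \
      --output-path "$outdir/L$level"
    ${DRY_RUN:+echo} biom convert \
      -i "$outdir/L$level/feature-table.biom" \
      -o "$outdir/table-L$level.tsv" --to-tsv
  done
}
```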
A bash script to generate a phylogenetic tree for phylogenetic diversity analyses
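A sketch using QIIME 2's MAFFT + FastTree pipeline. It aligns the representative sequences, masks ambiguously aligned or uninformative columns, infers a tree and roots it at its midpoint. The rooted tree is the one used by UniFrac and Faith's PD. `build_tree` and `DRY_RUN` are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: alignment, masking, tree inference and midpoint rooting.
# Usage: build_tree REP_SEQS.qza [OUTDIR]
# DRY_RUN=1 prints the commands instead of running them.
build_tree() {
  local rep_seqs="${1:?rep-seqs .qza required}" outdir="${2:-phylogeny}"
  ${DRY_RUN:+echo} mkdir -p "$outdir"
  ${DRY_RUN:+echo} qiime phylogeny align-to-tree-mafft-fasttree \
    --i-sequences "$rep_seqs" \
    --p-n-threads "${SLURM_CPUS_PER_TASK:-1}" \
    --o-alignment "$outdir/aligned-rep-seqs.qza" \
    --o-masked-alignment "$outdir/masked-aligned-rep-seqs.qza" \
    --o-tree "$outdir/unrooted-tree.qza" \
    --o-rooted-tree "$outdir/rooted-tree.qza"
}
```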
A bash script running the core metrics method, which rarefies a feature table to a user-specified depth, computes QIIME 2's default alpha and beta diversity metrics, and generates PCoA plots with Emperor for each beta diversity metric
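A sketch using the phylogenetic variant, `core-metrics-phylogenetic`, which also takes the rooted tree. Whether the original script used this or the non-phylogenetic `core-metrics` is an assumption. The sampling depth is an argument: samples with fewer reads than this depth are dropped, so it is usually chosen from `qiime feature-table summarize` output. `core_metrics` and `DRY_RUN` are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: rarefy, compute default alpha/beta metrics and Emperor PCoA plots.
# Usage: core_metrics TABLE.qza ROOTED_TREE.qza METADATA.tsv DEPTH [OUTDIR]
# DRY_RUN=1 prints the command instead of running it.
core_metrics() {
  local table="${1:?table required}" tree="${2:?rooted tree required}" \
        metadata="${3:?metadata required}" depth="${4:?sampling depth required}" \
        outdir="${5:-core-metrics-results}"
  ${DRY_RUN:+echo} qiime diversity core-metrics-phylogenetic \
    --i-phylogeny "$tree" \
    --i-table "$table" \
    --p-sampling-depth "$depth" \
    --m-metadata-file "$metadata" \
    --output-dir "$outdir"
}
```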
A bash script to calculate alpha diversity metrics that are not computed by default, such as Chao1, Simpson, and ACE
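A sketch that loops `qiime diversity alpha` over the three metrics named above, using the metric names `chao1`, `simpson` and `ace`. Passing the rarefied table written by the core metrics step (`rarefied_table.qza`) keeps the depth consistent with the default metrics. `extra_alpha` and `DRY_RUN` are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: compute non-default alpha diversity metrics.
# Usage: extra_alpha TABLE.qza [OUTDIR]
# DRY_RUN=1 prints the commands instead of running them.
extra_alpha() {
  local table="${1:?table required}" outdir="${2:-alpha-extra}"
  local metric
  ${DRY_RUN:+echo} mkdir -p "$outdir"
  for metric in chao1 simpson ace; do
    ${DRY_RUN:+echo} qiime diversity alpha \
      --i-table "$table" \
      --p-metric "$metric" \
      --o-alpha-diversity "$outdir/${metric}_vector.qza"
  done
}
```

Because DADA2 does not output singleton ASVs, richness estimators that rely on singletons (Chao1, ACE) should be interpreted with care.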
A bash script to test alpha diversity group significance
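A sketch that runs `alpha-group-significance` (Kruskal-Wallis tests across the categorical metadata columns) on each alpha diversity vector passed in. `alpha_significance` and `DRY_RUN` are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: one significance visualization per alpha diversity vector.
# Usage: alpha_significance METADATA.tsv VECTOR.qza [VECTOR.qza ...]
# DRY_RUN=1 prints the commands instead of running them.
alpha_significance() {
  local metadata="${1:?metadata required}"
  shift
  local vector
  for vector in "$@"; do
    ${DRY_RUN:+echo} qiime diversity alpha-group-significance \
      --i-alpha-diversity "$vector" \
      --m-metadata-file "$metadata" \
      --o-visualization "${vector%.qza}-group-significance.qzv"
  done
}
```

For example, `alpha_significance sample-metadata.tsv core-metrics-results/shannon_vector.qza core-metrics-results/faith_pd_vector.qza`.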
A bash script to test beta diversity group significance
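A sketch of `beta-group-significance` with PERMANOVA and pairwise tests on a single metadata column. The column name (for example a hypothetical `treatment`), `beta_significance` and `DRY_RUN` are assumptions.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: PERMANOVA (plus pairwise tests) on one distance matrix.
# Usage: beta_significance DISTANCE_MATRIX.qza METADATA.tsv COLUMN
# DRY_RUN=1 prints the command instead of running it.
beta_significance() {
  local distance_matrix="${1:?distance matrix required}" metadata="${2:?metadata required}" \
        column="${3:?metadata column required}"
  ${DRY_RUN:+echo} qiime diversity beta-group-significance \
    --i-distance-matrix "$distance_matrix" \
    --m-metadata-file "$metadata" \
    --m-metadata-column "$column" \
    --p-method permanova \
    --p-pairwise \
    --o-visualization "${distance_matrix%.qza}-${column}-significance.qzv"
}
```

For example, `beta_significance core-metrics-results/unweighted_unifrac_distance_matrix.qza sample-metadata.tsv treatment`.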