https://github.com/TGenNorth/ASAP/tree/public/INSTALL
The Amplicon Sequencing Analysis Pipeline (ASAP) is a highly customizable, automated way to examine amplicon sequencing data. The important details of the amplicon targets are described in a text-based input file written in JavaScript Object Notation (JSON) [1]. This data includes the target name, amplicon sequence (or sequences in the case of gene variant assays), any known SNPs or regions of interest (ROIs) within the target, and what the presence of this target or SNP signifies. This file can be hand-generated or created from an Excel spreadsheet using a provided template and Python script. The sequenced reads are processed by performing adapter, and optionally, quality trimming using Trimmomatic [2], and then aligned to the reference amplicon sequences extracted from the JSON file using one of several alignment packages (BWA-MEM [3], bowtie2 [4], and NovoAlign [5] are currently supported). The resulting BAM [6] files are analyzed with a custom Python script using the pysam [7] and scikit-bio [8] libraries to aid in analysis. This script combines the alignment data in the BAM file with the assay data in the JSON file and interprets the results. The output is an XML file with complete details for each assay against each sample. These details include number of reads aligning to each target, any SNPs found above a user-defined threshold, and the nucleotide distribution at each of these SNP positions. For ROI assays, the output includes the sequence distribution at each of the regions of interest -- both the DNA sequences and translated into amino acid sequences. Also, each assay target is assigned a significance if it meets the requirements laid out in the JSON file (i.e. a particular SNP or amino acid change is present) To make this output easier for the user to interpret, a number of XSLT [9] stylesheets are provided for transforming the XML output into other, more readable formats, including Excel spreadsheets, web pages, and PDF documents. Additionally, the use of XSLT stylesheets allows for multiple different views of the same data, from clinical summaries showing only the most important or relevant results to full researcher summaries containing all of the data.
0) ASAP Help
Can be generated from Excel spreadsheet template, or for simple cases, directly from multifasta file.
typical usage: asap -h
full usage: asap [-h] <subfunction> [-h]
asap.cmdParser -- Handles all the cmd line options for the various parts of asap
- optional arguments:
-h, --help show this help message and exit -V, --version show program's version number and exit
1) Generating JSON File
Can be generated from Excel spreadsheet template, or for simple cases, directly from multifasta file.
typical usage: asap prepareJSONInput -x <EXCEL_FILE> -o <OUTPUT_JSON_FILE>
full usage: asap prepareJSONInput [-h] (-f FILE | -x FILE) -o FILE [-w WORKSHEET] [-V]
asap.prepareJSONInput -- Create a JSON input file for ASAP from a multifasta or Excel spreadsheet
- optional arguments:
-h, --help show this help message and exit -w WORKSHEET, --worksheet WORKSHEET Excel worksheet to use, the first one in the file will be used if not specified -V, --version show program's version number and exit - required arguments:
-f FILE, --fasta FILE fasta file containing amplicon sequences. -x FILE, --excel FILE Excel file of assay data. -o FILE, --out FILE output JSON file to write. [REQUIRED]
2) Running ASAP
typical usage: asap analyzeAmplicons -n <RUN_NAME> -j <PATH_TO_JSON_FILE> -r <DIRECTORY_OF_READ_FILES> -o <OUTPUT_DIRECTORY> <other options>
<RUN_NAME>
can be whatever you want, the final output file will be: <OUTPUT_DIRECTORY>/<RUN_NAME>_analysis.xml
You can also change the depth (default 100), proportion (default 0.1), breadth (default 0.8) filters using the -d
, -p
and -b
options
full usage: asap analyzeAmplicons [-h] -n NAME -j JSON [-r DIR | --bam-dir DIR] [-o DIR] [-s JOB_MANAGER] [--submitter-args ARGS] [--smor] [--trim | --no-trim] [-s ADAPTERS] [-q [QUAL]] [-m LEN] [-a ALIGNER] [--aligner-args ARGS] [-d DEPTH] [--breadth BREADTH] [-p PROPORTION] [-i PERCID] [-V]
asap.analyzeAmplicons -- Align and interpret amplicon sequencing reads
- optional arguments:
-h, --help show this help message and exit -V, --version show program's version number and exit - required arguments:
-n NAME, --name NAME name for this run. [REQUIRED] -j JSON, --json JSON JSON file of assay descriptions. [REQUIRED] - optional arguments:
-r DIR, --read-dir DIR directory of read files to analyze. --bam-dir DIR directory of bam files to analyze. -o DIR, --out-dir DIR directory to write output files to. [default: pwd] -s JOB_MANAGER, --submitter JOB_MANAGER cluster job submitter to use (PBS, SLURM, SLURM_NO_ARRAY, SGE, TASK, none). [default: SLURM] --submitter-args ARGS additional arguments to pass to the job submitter, enclosed in "". --smor perform SMOR analysis with overlapping reads. [default: False] - read trimming options:
--trim perform adapter trimming on reads. [default: True] --no-trim do not perform adapter trimming. -s ADAPTERS, --adapter-sequences ADAPTERS location of the adapter sequence file to use for trimming. -q QUAL, --qual QUAL perform quality trimming [default: False], optional parameter can be used to customize quality trimming parameters to trimmomatic. [default: SLIDINGWINDOW:5:20] -m LEN, --minlen LEN minimum read length to keep after trimming. [default: 80] - read mapping options:
-a ALIGNER, --aligner ALIGNER aligner to use for read mapping, supports bowtie2, novoalign, and bwa. [default: bowtie2] --aligner-args ARGS additional arguments to pass to the aligner, enclosed in "". -d DEPTH, --depth DEPTH minimum read depth required to consider a position covered. [default: 100] -b BREADTH, --breadth BREADTH minimum breadth of coverage required to consider an amplicon as present. [default: 0.8] -p PROPORTION, --proportion PROPORTION minimum proportion required to call a SNP at a given position. [default: 0.1] -i PERCID, --identity PERCID minimum percent identity required to align a read to a reference amplicon sequence. [default: 0]
This command will ultimately generate the xml file. To convert this into more better things, run:
3) Formatting Output
typical usage asap formatOutput -s <XSLT_FILE> -x <XML_OUTPUT_FILE> -o <MAIN_OUTPUT_FILE_TO_WRITE>
This will generate all the html files, which you can open directly in your web browser. Some xslt files are available in the output_transforms
directory.
full usage: asap formatOutput [-h] -s FILE -x FILE [-o FILE] [-t] [-V]
asap.formatOutput -- Apply an XSLT transformation on the XML output to generate a more user-friendly output
- optional arguments:
-h, --help show this help message and exit -t, --text output plain text -V, --version show program's version number and exit - required arguments:
-s FILE, --stylesheet FILE XSLT stylesheet to use for transforming the output. [REQUIRED] -x FILE, --xml FILE XML output file to transform. [REQUIRED] -o FILE, --out FILE output file to write. [REQUIRED]
For information about external tools that are required, or can be utilized, and those versions that have been tested to work with ASAP, refer to the included "INSTALL" document.
Copyright © The Translational Genomics Research Institute See the included "LICENSE" document.
Darrin Lemmer ([email protected]) | TGen North | 3051 W Shamrell Blvd Ste 106 | Flagstaff, AZ 86001-9435
[1] | JSON: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf |
[2] | Trimmomatic: Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics, btu170. |
[3] | BWA-MEM: http://bio-bwa.sourceforge.net - There’s a publication for BWA-SW, and BWA short read aligner, but not for BWA-MEM. Maybe the short read aligner paper should be referenced here? The details are at this link. |
[4] | Bowtie2: Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012, 9:357-359. |
[5] | NovoAlign: http://www.novocraft.com - seems there should be a better reference, but I haven’t found one. |
[6] | SAM format/SAMtools: Li, Heng et al. “The Sequence Alignment/Map Format and SAMtools.” Bioinformatics 25.16 (2009): 2078–2079. PMC. Web. 9 Nov. 2015. |
[7] | Pysam: https://github.com/pysam-developers/pysam |
[8] | Scikit-bio: http://scikit-bio.org |
[9] | XSLT: http://www.w3.org/TR/xslt20/ |