How to set up NGSANE
NGSANE is a lightweight framework designed for use within a High Performance Compute (HPC) cluster environment. While it can be used on a desktop machine, the effort of setting up and compiling the different bioinformatics software suites pays off best on a large HPC cluster. NGSANE has been tested on:
- CentOS release 6.2 on a 1300 CPU cluster
- SuSE Linux Enterprise Server 11.2 on a 1752 core cluster (48GB-92GB memory nodes)
- SuSE Linux Enterprise Server 11.2 on an SGI UV1000 large shared-memory multiprocessor with 512 virtual CPUs (64GB memory per node)
- your system?
Depending on the use case, the following software and data sets have to be installed before using NGSANE:
- bioinformatics software for NGS processing
- reference and annotation data (e.g. reference genomes)
- module - command interface to the Modules package (optional)
Different pipelines in NGSANE leverage different software packages. These packages need to be compiled on the target system prior to using the NGSANE framework. A list of software required for the different pipelines is shown below. An example of how to install the necessary software under Ubuntu is in the install-script.
Reference genomes and annotation data are usually shared between users and research groups and should be stored in a location that every user can read. Illumina iGenomes provides ready-to-use reference sequences and annotation files for various model organisms. You need to arrange everything in a common folder, for example by symlinking the genome files into the top-level folder of the genome build:
user@host:/iGenomes/UCSC_hg19/Homo_sapiens/UCSC/hg19> for i in $(ls */*/genome.* */WholeGenomeFasta/genome.dict ); do ln -sf $i $(basename $i) ; done
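The one-liner above can also be written as a small function that does not parse the output of ls and that tolerates patterns without a match. This is only a sketch: link_genome_files is a hypothetical helper name, not part of NGSANE, and the glob patterns are simply the ones used in the command above.

```shell
# Sketch: symlink the iGenomes genome files into the top-level folder of a
# genome build. link_genome_files is a hypothetical helper, not part of NGSANE.
link_genome_files() {
    # Run in a subshell so the caller's working directory stays untouched.
    (
        cd "$1" || exit 1
        for f in */*/genome.* */WholeGenomeFasta/genome.dict; do
            [ -e "$f" ] || continue   # skip patterns that matched nothing
            ln -sf "$f" "$(basename "$f")"
        done
    )
}

# Usage, with the folder from the example above:
# link_genome_files /iGenomes/UCSC_hg19/Homo_sapiens/UCSC/hg19
```

The links are relative, so the genome folder can be moved as a whole without breaking them.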
However, other annotations are required for specific analysis tasks (e.g. BAM annotation); see the example annotation in the smokebox.
The modules package makes dynamic modification of the user's environment easy. It is recommended to create module files for any bioinformatic software.
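As an illustration, a minimal module file for one tool could be generated as sketched below. The helper name write_modulefile, the tool name, the version, and all paths are hypothetical; the generated file only uses the standard modulefile commands module-whatis and prepend-path and assumes the tool's executables live in a bin folder under its install prefix.

```shell
# Sketch: write a minimal module file for one tool.
# Arguments: NAME VERSION INSTALL_PREFIX MODULE_ROOT (all chosen by you).
# write_modulefile is a hypothetical helper, not part of NGSANE or Modules.
write_modulefile() {
    # Subshell keeps the helper variables out of the caller's environment.
    (
        name=$1 version=$2 prefix=$3 root=$4
        mkdir -p "$root/$name" || exit 1
        cat > "$root/$name/$version" <<EOF
#%Module1.0
module-whatis "$name $version"
prepend-path PATH "$prefix/bin"
EOF
    )
}

# Usage (hypothetical tool and paths):
# write_modulefile mytool 1.0 /opt/mytool/1.0 "$HOME/modulefiles"
# module use "$HOME/modulefiles"
# module load mytool/1.0
```

Real module files often also set library or manual paths; what is needed depends on the tool.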
NGSANE needs to know which software (modules) to use for the different pipelines. The conf/sample_header.sh script can be used as a template to specify a default set of software for any given pipeline:
- copy the conf/sample_header.sh to conf/header.sh
- fill in the module to use for each pipeline
- adjust the default resources for each pipeline as required.
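The first step can be sketched as a small function that refuses to overwrite an existing conf/header.sh, so local edits are not lost when the step is repeated. init_header is a hypothetical helper name, and the usage line assumes that $ngsane_root points at your NGSANE folder; the module and resource settings still have to be filled in by hand.

```shell
# Sketch: create conf/header.sh from the template unless it already exists.
# init_header is a hypothetical helper, not part of NGSANE.
init_header() {
    if [ -e "$1/header.sh" ]; then
        echo "$1/header.sh already exists, leaving it untouched" >&2
        return 0
    fi
    cp "$1/sample_header.sh" "$1/header.sh"
}

# Usage (assumes $ngsane_root is set to the NGSANE folder):
# init_header "$ngsane_root/conf"
```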
The pages linked below list the software required to run pipelines in NGSANE. NGSANE has been tested with the indicated versions. Other versions may work as well. See the list of supported software for the respective version of NGSANE.
[Supported Software NGSANE v0.5.x](Supported Software v0.5.x)
[Supported Software NGSANE v0.4.x](Supported Software v0.4.x)
- go to the smokebox in the NGSANE folder
-bash-4.1$ cd $ngsane_root/smokebox
-bash-4.1$ tree .
- download the reference data and untar it
-bash-4.1$ mkdir referenceData && cd referenceData
-bash-4.1$ tar -xvzf referenceData.tgz
-bash-4.1$ tree .
- have the NGSANE version you want to test in the path, e.g.
-bash-4.1$ module load ngsane
- if you are on an HPC cluster (qsub), run make to submit the available analyses in the smokebox
-bash-4.1$ make
- if you are on a local machine, run make with the direct parameter to execute everything sequentially (this may take a while)
-bash-4.1$ make SB_MODE="direct"
- once all jobs have finished (check with qstat)
-bash-4.1$ make test
PASS result/diffCHIPSEQ.txt
PASS result/diffTOPHATCUFFHTSEQ.txt
PASS result/diffVARCALLS.txt
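The smokebox steps above could be wrapped in two small helpers, sketched here under stated assumptions. Both function names are hypothetical. smokebox_make_args picks the direct mode when no qsub command is found, and all_passed assumes that make test prints nothing but result lines like the ones shown above, so any line not starting with PASS counts as a failure.

```shell
# Sketch: print the extra make argument for the smokebox run.
# Prints nothing if the scheduler command (default: qsub) is available,
# and SB_MODE=direct otherwise. Hypothetical helper, not part of NGSANE.
smokebox_make_args() {
    if command -v "${1:-qsub}" >/dev/null 2>&1; then
        :   # scheduler found: a plain "make" submits the jobs
    else
        echo 'SB_MODE=direct'
    fi
}

# Sketch: read the output of "make test" on stdin and succeed only if there
# is at least one line and every line starts with "PASS ".
# Assumption: the output contains result lines only.
all_passed() {
    awk '!/^PASS / { bad++ } END { exit (NR == 0 || bad > 0) }'
}

# Usage inside the smokebox folder:
# make $(smokebox_make_args)      # left unquoted on purpose
# make test | all_passed && echo "smokebox OK"
```

If your make test run prints additional lines, adjust the awk pattern to look only at the result lines.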