
How to setup ngsane

Gurado edited this page Oct 28, 2014 · 19 revisions

NGSANE is a lightweight framework designed for use within a High Performance Compute (HPC) cluster environment. While it can be used on a desktop machine, the effort to set up and compile the different bioinformatics software suites pays off best on a large HPC cluster. NGSANE has been tested on

  • CentOS release 6.2 on a 1300 CPU cluster
  • SuSE Linux Enterprise Server 11.2 on a 1752 core cluster (48GB-92GB memory nodes)
  • SuSE Linux Enterprise Server 11.2 on an SGI UV1000 large shared-memory multiprocessor with 512 virtual CPUs (64GB memory per node)
  • your system?

Pre-requisites:

Depending on the use case, the following software and data sets have to be installed before using NGSANE:

  1. bioinformatics software for NGS processing
  2. reference and annotation data (e.g. reference genomes)
  3. module - command interface to the Modules package (optional)

1. bioinformatics software

Different pipelines in NGSANE leverage different software packages. These packages need to be compiled on the target system prior to using the NGSANE framework. A list of software required for the different pipelines is shown below. An example of how to install the necessary software under Ubuntu is in the install-script.

2. reference and annotation data

Reference genomes and annotation data are usually shared between users and research groups and should be stored in a location that every user can read. Illumina iGenomes provides ready-to-use reference sequences and annotation files for various model organisms. Arrange the files in a common folder, for example by symlinking them like so

user@host:/iGenomes/UCSC_hg19/Homo_sapiens/UCSC/hg19> for i in $(ls */*/genome.* */WholeGenomeFasta/genome.dict ); do ln -sf $i $(basename $i) ; done

However, other annotations are required for specific analysis tasks (e.g. BAM annotation); see the example annotation from the smokebox.
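The symlink one-liner above parses the output of `ls`, which breaks on paths containing spaces and prints an error when a pattern matches nothing. A minimal bash sketch of the same idea using plain globs follows. The function name `link_genome_files` is hypothetical, and the glob patterns are copied from the command above; adjust them to your iGenomes layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of NGSANE: create symlinks in the genome
# folder for every genome.* file found two levels down, plus genome.dict.
link_genome_files() {
    local base="$1"
    (
        # Work inside a subshell so cd and shopt do not leak to the caller.
        cd "$base" || exit 1
        shopt -s nullglob   # unmatched patterns expand to nothing
        for f in */*/genome.* */WholeGenomeFasta/genome.dict; do
            # Relative link target, link name is the bare file name.
            ln -sf "$f" "$(basename "$f")"
        done
    )
}
```

Called as `link_genome_files /iGenomes/UCSC_hg19/Homo_sapiens/UCSC/hg19`, it leaves the links in that folder. If the same file name occurs in several subfolders, the last match wins, as with the original loop.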

3. modules

The modules package makes dynamic modification of the user's environment easy. It is recommended to create module files for all bioinformatics software.
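Because the Modules package is optional, a script may want to work both with and without it. The following bash sketch checks whether a tool is already on the PATH and, if not, tries `module load` when the `module` command exists. The helper name `require_tool` is hypothetical, and it assumes the module is named after the tool unless a second argument says otherwise.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of NGSANE.
# Usage: require_tool <executable> [module-name]
require_tool() {
    local tool="$1"
    local modname="${2:-$1}"   # assumption: module name defaults to tool name

    # Already available, nothing to do.
    if command -v "$tool" >/dev/null 2>&1; then
        return 0
    fi

    # Try the Modules package only if its command is defined in this shell.
    if command -v module >/dev/null 2>&1; then
        module load "$modname" >/dev/null 2>&1
    fi

    # Report whether the tool can be found now.
    command -v "$tool" >/dev/null 2>&1
}
```

For example, `require_tool ngsane || echo "ngsane not available" >&2` reports a missing installation before any pipeline is started.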


Configuration

NGSANE needs to know which software (modules) to use for the different pipelines. The conf/sample_header.sh script can be used as a template to specify a default set of software for any given pipeline.

  1. copy the conf/sample_header.sh to conf/header.sh
  2. fill in the module to use for each pipeline
  3. adjust the default resources for each pipeline as required.
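Step 1 above can be scripted so that an already customised conf/header.sh is never overwritten. This is a minimal bash sketch; the function name `init_header` is hypothetical, and its argument is assumed to be the top-level folder of your NGSANE checkout.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of NGSANE: copy the template to
# conf/header.sh unless a header.sh already exists.
init_header() {
    local root="$1"
    local template="$root/conf/sample_header.sh"
    local target="$root/conf/header.sh"

    if [ ! -f "$template" ]; then
        echo "template not found: $template" >&2
        return 1
    fi
    if [ -f "$target" ]; then
        # Keep local edits from an earlier setup.
        echo "keeping existing $target"
        return 0
    fi
    cp "$template" "$target"
    echo "created $target"
}
```

After `init_header "$ngsane_root"`, steps 2 and 3 remain manual: open conf/header.sh in an editor and fill in the modules and resources for each pipeline.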

Pipelines

The lists below show the software required to run the pipelines in NGSANE. NGSANE has been tested with the indicated versions; other versions may work as well. See the list of supported software for your version of NGSANE.

[Supported Software NGSANE v0.5.x](Supported Software v0.5.x)

[Supported Software NGSANE v0.4.x](Supported Software v0.4.x)

Test NGSANE by running the smokebox

  1. go to the smokebox in the NGSANE folder
-bash-4.1$ cd $ngsane_root/smokebox
-bash-4.1$ tree  .
  2. download the reference data and untar it
-bash-4.1$ mkdir referenceData && cd referenceData
-bash-4.1$ tar -xvzf referenceData.tgz
-bash-4.1$ tree  .
  3. have the NGSANE version you want to test in the path, e.g.
-bash-4.1$ module load ngsane
  4. if you are on an HPC cluster (qsub), run make to submit the available analyses in the smokebox
-bash-4.1$ make

if you are on a local machine, run make with the direct parameter to execute everything sequentially (this may take a while).

-bash-4.1$ make SB_MODE="direct"
  5. once all jobs have finished (check with qstat), run the tests
-bash-4.1$ make test
PASS result/diffCHIPSEQ.txt
PASS result/diffTOPHATCUFFHTSEQ.txt
PASS result/diffVARCALLS.txt
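The choice between cluster and local mode, and the final check of the test output, can be sketched in bash as follows. Both function names are hypothetical. The sketch assumes that the presence of `qsub` on the PATH means jobs should be submitted, and that `make test` prints only result lines like the ones above; any non-empty line that does not start with `PASS ` is treated as a failure, since the format of failing lines is not documented here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of NGSANE.

# Print the make invocation for this machine: plain "make" when qsub is on
# the PATH (jobs are submitted), otherwise the sequential direct mode.
smokebox_make_cmd() {
    if command -v qsub >/dev/null 2>&1; then
        echo 'make'
    else
        echo 'make SB_MODE="direct"'
    fi
}

# Read "make test" output on stdin. Succeed only if there is at least one
# result line and every non-empty line starts with "PASS ".
all_passed() {
    local line seen=0
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        case "$line" in
            "PASS "*) seen=1 ;;
            *) echo "not passed: $line" >&2; return 1 ;;
        esac
    done
    [ "$seen" -eq 1 ]
}
```

In the smokebox folder, `smokebox_make_cmd` shows which command to run, and `make test | all_passed && echo "smokebox OK"` gives a single yes/no answer once all jobs have finished.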