Brittany N. Lasseigne, PhD edited this page Feb 20, 2017 · 30 revisions

aRNApipe: A balanced, efficient and distributed pipeline for processing RNA-seq data in high performance computing environments

aRNApipe is a project-oriented pipeline for processing RNA-seq data in high performance cluster environments. The framework is highly modular and has been designed to be deployed on HPC environments using IBM Platform LSF, although it can be migrated to other workload managers. The main features of aRNApipe are:

  • Automation and synchronization of a broad range of RNA-seq primary analyses, including quality control metrics, transcript alignment, count generation, fusion identification, variant calling and differential alternative splicing detection.
  • A project-oriented, dynamic approach that lets users easily update analyses to include or exclude samples or to enable additional processing modules.
  • Centralized parametrization files, which guarantee that all libraries are processed using the same workflow and the same parameters.
  • Use of HPC clusters to distribute the workload across different nodes.
  • Independent setting of the computational resources assigned to each processing module, to optimize the use of the available resources.
  • Generation of interactive web reports for sample and project tracking.
  • Management of the genome assemblies available for an analysis.
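
The combination of per-module resources and workload distribution listed above can be pictured as one LSF submission per sample and module. The following is a minimal sketch under stated assumptions, not aRNApipe's actual code: the module names, resource values, log path and the `build_bsub` helper are all hypothetical. Only the `bsub` options used (`-J`, `-n`, `-q`, `-o`, `-w`) are standard LSF options.

```python
# Hypothetical per-module resources (cores, queue); values are illustrative only
# and do not come from the aRNApipe configuration file.
MODULE_RESOURCES = {
    "qc": {"cores": 1, "queue": "normal"},
    "star": {"cores": 8, "queue": "normal"},
    "counts": {"cores": 2, "queue": "normal"},
}


def build_bsub(project, sample, module, command, depends_on=None):
    """Return the argument list for one LSF job (hypothetical helper).

    Each module gets its own core count and queue, and an optional
    dependency makes the job wait until an earlier job has finished.
    """
    res = MODULE_RESOURCES[module]
    job_name = f"{project}_{module}_{sample}"
    args = [
        "bsub",
        "-J", job_name,                 # job name
        "-n", str(res["cores"]),        # number of cores for this module
        "-q", res["queue"],             # target queue
        "-o", f"logs/{job_name}.log",   # assumed log location
    ]
    if depends_on:
        # Standard LSF dependency expression: run after the named job is done.
        args += ["-w", f"done({depends_on})"]
    args.append(command)
    return args


# Example: an alignment job that waits for the QC job of the same sample.
cmd = build_bsub("proj", "s1", "star", "run_star.sh s1", depends_on="proj_qc_s1")
```

The argument list could be passed to `subprocess.run` on a host where `bsub` is available; porting to another workload manager would mean replacing only this helper.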

Updates (v1.2):

  • Demonstrations of how to build a genome reference and how to run aRNApipe in different scenarios have been added to the project wiki.
  • A complete guide to installing the aRNApipe dependencies has been added to the project wiki.
  • aRNApipe has been updated to work with the most recent versions of STAR (v2.5.2b), STAR-Fusion (v0.8.0) and cutadapt (v1.8.1).
  • aRNApipe now supports differential alternative splicing detection using jSplice. Sample files can now be extended with phenotype columns to perform this type of analysis.
  • aRNApipe now supports computing insert-size statistics on paired-end RNA-seq data using PicardTools.
  • The Spider generates new report sections, including Kallisto statistics, insert-size distribution statistics, and links to the reports generated with jSplice.
  • The BARCODE header in the sample files (samples.list) has been replaced by SampleID.
  • The configuration file includes new arguments.
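
Existing projects whose samples.list still uses the old BARCODE header need that one column name changed. A minimal sketch of such a migration, assuming the file is tab-delimited with the header on the first line; the helper names and the other column names in the example are hypothetical and are not part of aRNApipe.

```python
def migrate_header(header_line, old="BARCODE", new="SampleID", sep="\t"):
    """Rename the old column name in a header line, keeping other columns."""
    newline = "\n" if header_line.endswith("\n") else ""
    fields = header_line.rstrip("\r\n").split(sep)
    return sep.join(new if field == old else field for field in fields) + newline


def migrate_samples_text(text):
    """Apply the header rename to the first line only; data rows are untouched."""
    lines = text.splitlines(keepends=True)
    if lines:
        lines[0] = migrate_header(lines[0])
    return "".join(lines)


# Example with made-up column names and file paths.
old_text = "BARCODE\tFASTQ_1\tFASTQ_2\ns1\ts1_R1.fq\ts1_R2.fq\n"
new_text = migrate_samples_text(old_text)
```

Only the header is rewritten, so a sample that happens to be named BARCODE in a data row would be left as it is.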