quandico

Abstract

The tool quandico applies statistical methods to detect copy number variations (CNV) in a sample by comparing the read counts from next generation sequencing (NGS) performed after PCR-enrichment of regions of interest, typically a set of genes with known or expected relevance for the sample, e.g. genes that play a role in cancer. Counts from a normal control (ideally matched normal from the same individual, e.g. healthy tissue) are required.

A detailed description can be found in the publication of this tool: Quantitative analysis of differences in copy numbers using read depth obtained from PCR-enriched samples and controls: BMC Bioinformatics 2015, 16:17.

Implementation

The main script is written in R and there are two helper scripts and one main driver script written in Perl.

Workflow

PCR-enrichment of target regions, sequencing and read-mapping (SAM/BAM file)
Extract counts for every PCR amplicon (qgetcounts)
Cluster read-counts into regions (qcluster)
Compare counts per cluster and perform CNV calling (quandico)

Steps 2, 3 and 4 can be performed with one single command line call of the Perl driver script quandico. Step 4 alone can be performed inside R.

Requirements

Running quandico requires R with some commonly available packages from CRAN. Please visit The R-project homepage for advice how to install R on your system. Dependencies for quandico should be installed automatically when installing the package.

Accessory applications such as a command-line driver script (quandico), the script to extract counts of mapped reads (qgetcounts) and the region clustering script (qcluster) require Perl. Perl is already installed on most Linux systems, please check Perl.org for details. If you are using Windows we recommend Strawberry Perl. All dependencies should automatically be installed from CPAN.

Installation

Fast

Download the install.sh shell script for Unix/Linux. Check the configuration in the first section of that script, adjust parameters and then start it in a clean directory, e.g.:

$ mkdir quandico
$ cd quandico
$ wget https://github.com/reineckef/quandico/raw/master/install.sh
$ sh install.sh

This is not well tested and may fail on various systems for various reasons.

Detailed

Install these R packages, either by using R> install.packages('name') inside an R session or by running the install command from a command line: $ R CMD INSTALL 'name':

MASS
ggplot2
grid
gridExtra
naturalsort
scales

The string n.m will be used for the version number (major.minor) of quandico. This was 1.14 by the time of writing.

Download the latest release and install the packaged R code of quandico (file quandico_n.m.tar.gz):

$ R CMD INSTALL quandico_n.m.tar.gz

optional but recommended

Download the latest release and install the Perl module QUANDICO (file QUANDICO-vn.m.tar.gz) using make on Linux, and dmake on Windows:

$ tar xvfz QUANDICO-vn.m.tar.gz
$ cd QUANDICO-vn.m
$ perl Makefile.PL
$ [d]make
$ [d]make test
$ [d]make install # run this as sudo/root for system install

Alternatively, you can use cpanm from App::cpanminus to install to module directly from the downloaded archive. We recommend to use --verbose mode:

$ cpanm --verbose QUANDICO-vn.m.tar.gz # run this as sudo/root for system install

To start from mapped reads in SAM/BAM format, the Perl script qgetcounts will call samtools (version 1.1 or later) to extract the counts. Therefore, samtools needs to be installed, please visit Samtools for advice.

Example Data

Example data files to test the tools are available on Google drive. To start from mapped reads (in SAM or BAM files, please note that this step additionally requires samtools version >= 1.1).

CNA902Y.bed amplicon coordinates
M62_NA13019.bam example case (sample) ⚠️ 252 MB !
M62_NA12878.bam example control (reference) ⚠️ 222 MB !

Please note, these files can also be downloaded using wget (sometimes these links do not work, for unclear reason):

$ wget --no-check-certificate https://googledrive.com/host/0BzLnl09R3GITQjBUZUFVcy1BNFk -O CNA902Y.bed
$ wget --no-check-certificate https://googledrive.com/host/0BzLnl09R3GITMzNyakhveTh3UVE -O M62_NA13019.bam
$ wget --no-check-certificate https://googledrive.com/host/0BzLnl09R3GITSnU1TlVRSjRXRHM -O M62_NA12878.bam

To skip count extraction and start with clustering, the extracted counts of these samples are also available:

M62_NA13019.tsv control counts (sample)
M62_NA12878.tsv control counts (reference)

Please note, these files can also be downloaded using wget (sometimes these links do not work, for unclear reason):

$ wget --no-check-certificate https://googledrive.com/host/0BzLnl09R3GITUDZ0aXFBd2pDR0k -O M62_NA13019.tsv
$ wget --no-check-certificate https://googledrive.com/host/0BzLnl09R3GITWU9xTndtZE5iOEE -O M62_NA12878.tsv

A file with gene names and coordinates is required if clusters should be named using gene names. For human assemblies GRCh37 (hg19) and GRCh38 (hg38), these file can be downloaded from UCSC:

refGene.txt.gz assembly hg19 (5 MB)
refGene.txt.gz assembly hg38 (5 MB)

Clustered count files ready for processing with the final step are these:

Please note, these files can also be downloaded using wget (sometimes these links do not work, for unclear reason):

$ wget --no-check-certificate https://googledrive.com/host/0BzLnl09R3GITYTduNDY4azJNZXM -O M62_NA13019.clustered
$ wget --no-check-certificate https://googledrive.com/host/0BzLnl09R3GITWm1FS0duczVlejQ -O M62_NA12878.clustered

Running

You can run all steps separately:

# extract counts
$ qgetcounts -i M62_NA13019.bam -a CNA902Y.bed > M62_NA13019.tsv
$ qgetcounts -i M62_NA12878.bam -a CNA902Y.bed > M62_NA12878.tsv

# cluster the counts
$ qcluster -i M62_NA13019.tsv [--names refGene.txt] > M62_NA13019.clustered
$ qcluster -i M62_NA12878.tsv [--names refGene.txt] > M62_NA12878.clustered

# call copy numbers
$ quandico --no-cluster \
   -s data=M62_NA13019.clustered \   # file with clustered counts
   -r data=M62_NA12878.clustered \   # file with clustered counts
   -s x=2 -s y=0 -r x=2 -r y=0   \   # sample (-s) and reference (-r) are female
   [--cp names=refGene.txt]          # optional for naming of clusters

Alternatively, all this can be done using one single command:

$ quandico -s map=M62_NA13019.bam -s x=2 -s y=0 \ # sample
           -r map=M62_NA12878.bam -r x=2 -r y=0 \ # reference
           -a CNA902Y.bed                       \ # amplicons
           -d results -b 13019_vs_12878         \ # output location and name
           [--cp names=refGene.txt]               # optional cluster names

Example Output

In the folder 'ExampleOutput' the result files obtained by running the above mentioned steps with sample NA13019 and reference NA12878 from sequencing run M62 are presented. These are:

NA13019_overview.pdf an overview for all clusters on one page
NA13019.pdf a detailed plot for every cluster
NA13019.vcf VCF of called variants
NA13019.csv the called variants in tabular format

An image from one page (number 19) from the detailed PDF shows a loss (top) and a gain (bottom) event in that sample on chromsome X:

Help

The Perl command-line tools present a help and usage information when run without arguments (or by explicitly using -h or --help).

The R function provides help (after installation) by doing:

R> library(quandico)
R> ?quandico

Issues

Please report bugs to the author Frank Reinecke or file them on quandico's home on GitHub.

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
Data		Data
ExampleOutput		ExampleOutput
PerlModule		PerlModule
RPackage		RPackage
LICENSE		LICENSE
README.md		README.md
install.sh		install.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

quandico

Abstract

Implementation

Workflow

Requirements

Installation

Fast

Detailed

Example Data

Running

Example Output

Help

Issues

About

Releases

Packages

Languages

License

jingquanlim/quandico

Folders and files

Latest commit

History

Repository files navigation

quandico

Abstract

Implementation

Workflow

Requirements

Installation

Fast

Detailed

Example Data

Running

Example Output

Help

Issues

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages