===============================================================================
A Tool for Robust Annotation of Topologically Associating Domain (TAD) Boundaries.
===============================================================================
RobusTAD calculates TAD boundary scores for every bin in the genome based on an interaction frequency matrix from Hi-C data. It also calls significant TAD boundaries.
The only required input for RobusTAD is an interaction frequency (IF) matrix for a given chromosome.
File properties:
- Tab-separated
- Square matrix: number of rows should be equal to number of columns
- Can contain raw or normalized counts
- Column and row names are optional (pass -H when the file has them)
An example input file is included in Extras.
Example:
Option 1 (with column and row names):
chr1-0 chr1-50 chr1-100 chr1-150 chr1-200
chr1-0 20 1 4 8 1
chr1-50 5 25 8 2 0
chr1-100 4 7 18 3 6
chr1-150 3 2 7 30 8
chr1-200 5 2 1 9 27
Option 2 (without column and row names):
20 1 4 8 1
5 25 8 2 0
4 7 18 3 6
3 2 7 30 8
5 2 1 9 27
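The file properties above can be checked before running RobusTAD. The following is a minimal Python sketch and is not part of RobusTAD; the function name read_if_matrix and its has_names argument are hypothetical. It assumes that a file in the Option 1 layout has one line of column names followed by rows that each start with a row name.

```python
import csv
import io


def read_if_matrix(text, has_names=False):
    """Parse a tab-separated IF matrix and check that it is square.

    has_names=True assumes the first line holds column names and every
    following line starts with a row name (the Option 1 layout).
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if has_names:
        # Drop the header line, then the leading row name of each row.
        rows = [r[1:] for r in rows[1:]]
    matrix = [[float(v) for v in r] for r in rows]
    n = len(matrix)
    if n == 0 or any(len(r) != n for r in matrix):
        raise ValueError("IF matrix must be square and non-empty")
    return matrix


# Option 2 layout: values only, tab-separated.
example = "20\t1\t4\n5\t25\t8\n4\t7\t18\n"
matrix = read_if_matrix(example)
print(len(matrix), "x", len(matrix[0]))
```

A file that fails this check would most likely need to be fixed before it is passed to RobusTAD with -i.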
RobusTAD outputs 2 files:
- Boundary Scores: BoundaryScores_*.txt
- Significant Boundaries: TADBoundaryCalls_*.txt
BoundaryScores_*.txt contains the Right, Left and Final scores for all bins in the provided IF matrix.
The LeftBoundaryScore for bin b captures the evidence that there is a TAD starting between bins b-1 and b. The RightBoundaryScore for bin b describes the evidence that there is a TAD ending between bins b and b+1.
The Final TADscore is an integration of both Right and Left scores (max(R, L)). While this is more convenient than comparing Right and Left scores across samples, it should be understood that some details are lost through this simplification. For best results, use the Right and Left scores separately.
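To illustrate what is lost in the combined score, here is a small Python sketch (not RobusTAD code; the function name final_tad_score and the two illustrative bins are assumptions). It applies max(R, L) as described above and shows that a bin with strong Left evidence and a bin with strong Right evidence end up with the same Final score.

```python
def final_tad_score(left, right):
    """Combine Left and Right boundary scores as max(R, L)."""
    return max(left, right)


# Values taken from the example format in this README.
print(final_tad_score(-0.27130659072985, -0.241287485278218))

# Illustrative values only: one bin looks like a TAD start, the other
# like a TAD end, yet both collapse to the same Final score.
start_like = final_tad_score(left=0.8, right=0.1)
end_like = final_tad_score(left=0.1, right=0.8)
print(start_like == end_like)
```

This is why comparing the Left and Right columns separately can distinguish cases that the TADscore column cannot.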
Example format:
coordinates LeftBoundaryScore RightBoundaryScore TADscore
chr1-0 -0.27130659072985 -0.241287485278218 -0.241287485278218
chr1-50 -0.27130659072985 -0.241287485278218 -0.241287485278218
chr1-100 -0.27130659072985 -0.241287485278218 -0.241287485278218
chr1-150 -0.27130659072985 -0.241287485278218 -0.241287485278218
TADBoundaryCalls_*.txt contains the Right, Left and Final scores for bins in the provided IF matrix that are called as TAD boundaries. Calls are made based on locating peaks in the boundary score profile that are above the set threshold.
Example format:
coordinates LeftBoundaryScore RightBoundaryScore TADscore
chr1-100 -0.27130659072985 -0.241287485278218 -0.241287485278218
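The idea of calling boundaries at peaks above a threshold can be sketched in Python as follows. This is an illustration only and not RobusTAD's implementation: the function name call_peaks is hypothetical, the cutoff is given here as an absolute score (RobusTAD derives its threshold from a percentile set with -T), the first and last bins are skipped, and the scores are made up.

```python
def call_peaks(scores, cutoff):
    """Return indices of bins whose score is a local maximum above cutoff.

    The first and last bins are skipped because they lack a neighbour
    on one side.
    """
    calls = []
    for i in range(1, len(scores) - 1):
        above = scores[i] > cutoff
        is_peak = scores[i] > scores[i - 1] and scores[i] >= scores[i + 1]
        if above and is_peak:
            calls.append(i)
    return calls


# Made-up boundary score profile for seven consecutive bins.
scores = [0.1, 0.4, 0.9, 0.3, 0.2, 0.6, 0.5]
print(call_peaks(scores, cutoff=0.5))  # [2, 5]
```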
Example output files are included in Extras.
RobusTAD is written in R, so a working R installation is required.
RobusTAD also requires the "optparse" package to parse command-line options. You can install it in R using:
install.packages("optparse", repos="http://cran.us.r-project.org")
To run RobusTAD:
Rscript RobusTAD.R -i InputMatrix [options]
An example IF matrix is included in Extras. You can download it and test RobusTAD using:
Rscript RobusTAD.R -i Extras/IFmatrix_GM12878_Rao_Mbo_Chr20_50kb.txt
Example output files are also included in Extras.
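When RobusTAD is run from a pipeline, the command line can be assembled programmatically. The Python sketch below is one possible wrapper and is not shipped with RobusTAD; the function name robustad_command is hypothetical. It uses only flags listed in the Options section of this README (-i, -H, -n, -o, -b) and assumes RobusTAD.R is in the current directory.

```python
import shlex


def robustad_command(input_path, header=False, norm="raw",
                     out_dir=None, binsize=None):
    """Build the Rscript command for RobusTAD as a list of arguments."""
    if norm not in ("raw", "norm"):
        raise ValueError("norm must be 'raw' or 'norm'")
    cmd = ["Rscript", "RobusTAD.R", "-i", input_path]
    if header:
        cmd.append("-H")
    if norm != "raw":  # raw is the documented default, so it is omitted
        cmd += ["-n", norm]
    if out_dir is not None:
        cmd += ["-o", out_dir]
    if binsize is not None:
        cmd += ["-b", str(binsize)]
    return cmd


cmd = robustad_command(
    "Extras/IFmatrix_GM12878_Rao_Mbo_Chr20_50kb.txt", out_dir="results"
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where R and optparse are installed.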
==============================================================================================================================================================
RobusTAD calculates TAD Boundary scores for each bin on a chromosome.
Input: interaction frequency matrix for a chromosome.
Output: 2 files:
I- file with TAD boundary scores (BoundaryScores_*): contains Right Boundary scores, Left Boundary scores and Final combined score (max(R, L)).
II- file with TAD boundary calls (TADBoundaryCalls_*) identified by looking for local maxima above the set threshold.
Usage: RobusTAD.R -i InputMatrix [options]
==============================================================================================================================================================
Options:
-i INPUT, --input=INPUT
Interaction Frequency Matrix. Must be a square matrix: number of columns = number of rows
-H, --header
include -H if input contains a header/column names
-n NORM, --norm=NORM
indicates if IF matrix is raw or normalized [default = raw]; [options: {raw, norm}]
-o OUTDIR, --outDir=OUTDIR
output directory name
-b BINSIZE, --binsize=BINSIZE
binsize or resolution used in Hi-C analysis in kb [default = 50]
-r MINRATIO, --minRatio=MINRATIO
minimum ratio of Within to Across IF values to contribute to boundary score calculation [default = 1.5]
-w MINWIN, --minWin=MINWIN
minimum window around the bin used to calculate the TAD score in kb [default = 250]
-W MAXWIN, --maxWin=MAXWIN
maximum window around the bin used to calculate the TAD score in kb [default = 500]
-T THRESHOLD, --threshold=THRESHOLD
data percentile of TAD scores used to calculate threshold in order to call significant TAD boundaries. [default = 0.2]; [options: 0-1];
the lower the threshold, the more stringent the TAD calls
-h, --help
Show this help message and exit
==============================================================================================================================================================
RobusTAD is available under a GPL license and comes with no warranties @ https://github.com/rdali/RobusTAD
==============================================================================================================================================================
RobusTAD is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation.
RobusTAD is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.