Skip to content

dixonlab/hic_breakfinder

Repository files navigation

PREREQUISITES:

For hic_breakfinder to run, you will need to have the eigen c++ library and the bamtools libraries installed.

Eigen can be found here:
http://eigen.tuxfamily.org/index.php?title=Main_Page

and bamtools can be found here:
https://github.com/pezmaster31/bamtools

After installation, make sure to add the bamtools libraries to your LD_LIBRARY_PATH.
For instance, if you are running bash, you can modify the .bashrc file in your home directory to include the line:
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/bamtools/lib



INSTALLATION:

To install, run:

./configure
make
make install

If you want to install somewhere that isn't "/usr/local/bin/" run:
./configure --prefix=/path/to/install/directory

If your version of eigen is installed in a non-traditional location, you will need to run configure as follows:
./configure CPPFLAGS="-I /path/to/eigen"

If your version of bamtools is installed in a non-traditional location, you will need to run configure as follows:
./configure CPPFLAGS="-I /path/to/bamtools/include" LDFLAGS="-L/path/to/bamtools/lib/"

If both eigen and bamtools are installed in non-traditional locations, you willn eed to run the configure file as follows:
./configure CPPFLAGS="-I /path/to/bamtools/include -I /path/to/eigen" LDFLAGS="-L/path/to/bamtools/lib/"

After installation, the hic_breakfinder executable should be stored in the bin directory.




RUNNING hic_breakfinder:

To run, simply type:
./hic_breakfinder

It will require 3 input files, a bam file, an inter-chromosomal expectation file, and an intra-chromosomal expectation file. We strongly 
recommend using the b38 build of the human genome for using Hi-C to identify structural variants.  

A note on the bam file: hic_breakfinder will expect to find pair information for each read in the bam file. It will also expect that the data will have already been filtered to remove low quality alignments. We have listed several scripts that can be used to align/post-process Hi-C data in the following repository: https://github.com/dixonlab/bwa_mem_hic_aligner

We have provided these expectation files in hg38 coordinates for users.  These are in the associated files link below.

hic_breakfinder has the option of being run at 1kb final resolution.  For this, add the option --min-1kb when executing the command.  
For 1kb resolution, the Hi-C experiment should be performed with frequent cutting enzymes like MboI and DpnII.  Using enzymes that cut
less frequently than 1kb (i.e. HindIII or NcoI) will lead to problems if trying to identify breaks at 1kb.  Further, if the library has 
high sequence coverage (>100 million reads), using --min-1kb can substantially slow down the run time.

If you want to check that the software is running correctly, we have a bam file from K562 and an associated example_output.txt file 
showing the results of running hic_breakfinder using the --min-1kb command on this bam file in the associated files link.

For any questions/comments/concerns, please email [email protected]

Associated files:
https://salkinstitute.box.com/s/m8oyv2ypf8o3kcdsybzcmrpg032xnrgx

Description of the output files:
Hi-C breakfinder will produce lists of structural variant predictions at different resolutions (bin sizes). The final output file will be
named $name.breaks.txt (where $name is the parameter that is given using the --name argument). It will also produce a series of
intermediate calls at different resolutions (1Mb, 100kb, 10kb, optional 1kb for inter-chromosomal events, 100kb, 10kb, optional 1kb for
interchromosomal events). These files are labelled as $name\_10kb.breaks.txt or $name\_10kb_intra.breaks.txt (for the case of the 10kb
calls). 

The first line in the example_output.txt file looks like this:
90.3714	chr13	93252000	93363000	-	chr9	131176000	131280000	+	1kb

Hi-C breakfinder aims to find sub-matrices of the original matrix that are most likely containing a structural rearrangement. Therefore, each prediction represents a sub-matrix, reporting the positions of the column and row coordinates of the sub-matrix.

Here we will describe the meaning of column of the line:
1 - Log-odds score of the rearrangement call. This is can be thought of as the "strength" of the call.
2 - column chromosome
3 - column start
4 - column end
5 - column strand
6 - row chromosome
7 - row start
8 - row end
9 - row strand
10 - resolution of the call (minimal bin size for which this call is made).

The strand predictions are mean to be estimates of whether the rearrangement breakpoint is at a given edge of the sub-matrix. A "+" value means that we predict the "end" coordinate to be closests to the actually breakpoint, and a "-" value indicates we believe the "start" coordinate is closest to the breakpoint.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published