- Download the code in "bin/eHiC/" from this repository
- Download the reference files for eHiC (mm10/hg19 genome build)
wget http://hiview.case.edu/ssz20/tmp.HiCorr.ref/eHiC_HiCorr.tar.gz
tar -xvf eHiC_HiCorr.tar.gz
- Check the preprocessing steps for eHiC data (mapping and fragment filtering; the outputs are cis and trans 500bp fragment loop files)
- Run HiCorr on eHiC data:
bash eHiC.sh eHiC_HiCorr/ bin/eHiC/ <frag_loop.name.cis> <frag_loop.name.trans> <outputname> <hg19/mm10>
# specify the paths of the downloaded, unzipped reference files and of the scripts
# input the two fragment loop files generated in the preprocessing step
# specify the output name prefix
# specify the genome build; the provided reference only includes hg19 and mm10
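Before launching eHiC.sh it can help to confirm that the arguments described above actually point at something. The following is a minimal sketch, not part of HiCorr: `check_ehic_args` is a hypothetical helper name, and it only checks that the two directories and two loop files exist and that the genome build is one of the two the reference supports.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not shipped with HiCorr).
# check_ehic_args REF_DIR SCRIPT_DIR CIS_FILE TRANS_FILE GENOME
# Returns 0 if the arguments look usable, 1 otherwise.
check_ehic_args() {
    [ -d "$1" ] || { echo "reference directory not found: $1" >&2; return 1; }
    [ -d "$2" ] || { echo "script directory not found: $2" >&2; return 1; }
    [ -f "$3" ] || { echo "cis loop file not found: $3" >&2; return 1; }
    [ -f "$4" ] || { echo "trans loop file not found: $4" >&2; return 1; }
    case "$5" in
        hg19|mm10) ;;  # the only builds in the provided reference
        *) echo "unsupported genome build: $5" >&2; return 1 ;;
    esac
    return 0
}

# Example usage (commented out; file names are placeholders):
# check_ehic_args eHiC_HiCorr/ bin/eHiC/ my.cis my.trans hg19 \
#     && bash eHiC.sh eHiC_HiCorr/ bin/eHiC/ my.cis my.trans my_output hg19
```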
eHiC mode corrects bias of eHi-C data. It takes two fragment-end-pair files as input (use HiCorr's eHiC-QC mode if you need to generate these files) and outputs an anchor_pair file.
- The two input files: one contains intra-chromosome looping fragment-end pairs (cis pairs), and the other contains inter-chromosome looping fragment-end pairs (trans pairs).
- Intra-chromosome looping pairs need to have 4 tab-delimited columns, in the following format:
frag_end_id_1 frag_end_id_2 observed_reads_count distance_between_two_fragments
- Inter-chromosome looping pairs need to have 3 tab-delimited columns, in the following format:
frag_end_id_1 frag_end_id_2 observed_reads_count
- These two files need to be sorted before you run the pipeline (sort -k1 -k2).
- The final result of eHiC mode is an anchor-to-anchor looping pairs file, which has 5 columns:
anchor_id_1 anchor_id_2 observed_reads_count expected_reads_count p_value
See sample file here: http://hiview.case.edu/test/sample/anchor_2_anchor.loop.IMR90.p_val.sample
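The input requirements listed above (4 tab-delimited columns for cis pairs, 3 for trans pairs, both files sorted) can be checked before running the pipeline. The sketch below is only an illustration: `check_columns` and `sort_pairs` are hypothetical helper names, and the sort command is copied as written in the list (sort -k1 -k2) with no extra options.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of HiCorr) for preparing fragment loop files.

# check_columns FILE N
# Succeeds only if every line of FILE has exactly N tab-delimited fields.
check_columns() {
    awk -F'\t' -v n="$2" 'NF != n { bad = 1; exit } END { exit bad + 0 }' "$1"
}

# sort_pairs IN OUT
# Sorts IN with the command given in this README and writes the result to OUT.
sort_pairs() {
    sort -k1 -k2 "$1" > "$2"
}

# Example usage (commented out; file names are placeholders):
# check_columns my.cis 4   && sort_pairs my.cis my.cis.sorted
# check_columns my.trans 3 && sort_pairs my.trans my.trans.sorted
```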
To run the eHiC mode:
./HiCorr eHiC <cis_loop_file> <trans_loop_file> <name_of_your_data> <reference_genome>
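Once a run has finished, the 5-column anchor-to-anchor file can be screened on its last column. This is a rough sketch rather than part of HiCorr: `filter_pairs` is a hypothetical helper, the cutoff is whatever you choose, and the name of the output file is taken as an argument because it depends on your run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HiCorr).
# filter_pairs FILE CUTOFF
# Prints the rows of a 5-column anchor pair file whose 5th column (p_value)
# is numerically below CUTOFF. Rows without exactly 5 fields are skipped.
filter_pairs() {
    awk -v cutoff="$2" 'NF == 5 && ($5 + 0) < (cutoff + 0)' "$1"
}

# Example usage (commented out; the file name and cutoff are placeholders):
# filter_pairs my_anchor_pairs.txt 0.01 > my_anchor_pairs.filtered.txt
```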
eHiC-QC mode takes a pair of fastq.gz files as input, aligns and processes the eHiC reads, and outputs fragment-end-pair files for further analysis. This mode also outputs summary numbers that serve as a quality check for eHiC experiments.
Make sure to name your fastq.gz files as <name>.R1.fastq.gz and <name>.R2.fastq.gz.
You need to have Bowtie (http://bowtie-bio.sourceforge.net/index.shtml) and samtools (http://www.htslib.org/) installed, since HiCorr calls Bowtie to do the alignments.
You also need a Bowtie index and a fa.fai file.
To run the eHiC-QC mode, you need 4 arguments, counting the mode name:
./HiCorr eHiC-QC <bowtie_index> <fa.fai> <name>
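The prerequisites for eHiC-QC mode can be checked up front. The sketch below makes several assumptions that are not stated in this README: the helper names are hypothetical, the executables are assumed to be called `bowtie` and `samtools` on your PATH, the paired files are assumed to end in .R1.fastq.gz and .R2.fastq.gz, and the directory holding them is passed explicitly because the README does not say where HiCorr looks for them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of HiCorr).

# have_tool NAME
# Succeeds if NAME resolves to a command on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# check_fastq_pair DIR NAME
# Succeeds if both DIR/NAME.R1.fastq.gz and DIR/NAME.R2.fastq.gz exist.
# The R1/R2 suffixes are an assumption about the expected naming.
check_fastq_pair() {
    [ -f "$1/$2.R1.fastq.gz" ] && [ -f "$1/$2.R2.fastq.gz" ]
}

# Example usage (commented out; "mysample" and the paths are placeholders):
# have_tool bowtie && have_tool samtools && check_fastq_pair . mysample \
#     && ./HiCorr eHiC-QC /path/to/bowtie_index /path/to/genome.fa.fai mysample
```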