Preprint link: https://www.biorxiv.org/content/10.1101/2023.04.04.535480v1
In this repo, you can find:
- Code and tutorials for running scoring methods on both experimental and predicted contact frequency maps.
- Datasets of:
  - Scores for in silico insertions and deletions throughout the genome
  - Scores comparing windows around differentially expressed genes between ESC and HFF on chromosomes 21 and 22
Download this repo:
git clone https://github.com/pollardlab/contact_map_scoring.git
We provide scripts to run all 25 scoring functions on experimental maps. Because the functions differ in input type and programming language, they are split across multiple scripts as follows:
Methods that take as input matrices corresponding to contact frequency maps of a given region. These include the following 13 methods:
- Correlation
- MSE
- SSIM
- Contact directionality (corr)
- Distance enrichment (corr)
- Eigenvector (corr)
- Insulation (corr)
- Insulation (mse)
- Loops
- SCC
- TADs
- Triangle (corr)
- Triangle (mse)
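The two Insulation methods compare per-bin insulation tracks rather than raw contacts. The following is a minimal NumPy sketch of a square-window insulation score and the resulting corr/mse comparison. It is for illustration only: the window size, the log2 mean normalization, and the names `insulation_track` and `insulation_scores` are our own assumptions, not the repository's implementation.

```python
import numpy as np


def insulation_track(mat, window=5):
    """Per-bin insulation: mean contact in a window x window square off the diagonal.

    Bins closer than `window` to the matrix edge are NaN. The track is
    log2-normalized by its own mean (an assumed convention).
    """
    mat = np.asarray(mat, dtype=float)
    n = mat.shape[0]
    track = np.full(n, np.nan)
    for i in range(window, n - window):
        # contacts between the `window` bins upstream and downstream of bin i
        square = mat[i - window:i, i + 1:i + window + 1]
        track[i] = np.nanmean(square)
    with np.errstate(divide="ignore"):
        return np.log2(track / np.nanmean(track))


def insulation_scores(map_a, map_b, window=5):
    """Pearson correlation and MSE between the insulation tracks of two maps."""
    a = insulation_track(map_a, window)
    b = insulation_track(map_b, window)
    keep = np.isfinite(a) & np.isfinite(b)  # drop edge bins and empty squares
    corr = np.corrcoef(a[keep], b[keep])[0, 1]
    mse = float(np.mean((a[keep] - b[keep]) ** 2))
    return corr, mse
```

A low track value marks a likely domain boundary. A correlation-based comparison rewards boundaries that shift together, while MSE also penalizes global changes in insulation strength.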
Script: code/run_methods_that_use_contact_maps.py
Dependencies:
- scipy: for Correlation, Contact directionality (corr), Distance enrichment (corr), Eigenvector (corr), Insulation (corr), SCC, and Triangle (corr)
- skimage: for Contact directionality (corr), SSIM, Triangle (corr), and Triangle (mse)
- sklearn: for Eigenvector only
- cooltools: for TADs only
- hicrep: for SCC only
For package versions, please refer to environment.yml.
- Generate the input files outlined below
- Edit the variables in the script following the instructions there
- Run script in the terminal
python run_methods_that_use_contact_maps.py
To run these methods on predicted maps instead of experimental ones, follow the instructions in Running_Scoring_Functions_on_Predicted_Maps.ipynb.
Input:
- Windows to score: a tab-delimited text file with columns chrom, start, end.
- Two cool files to be compared at the provided windows.
Output: a tab-delimited text file with the same rows as the input windows file, plus one added score column per method.
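For orientation, the two simplest scores (Correlation and MSE) can be sketched on a pair of equally sized NumPy arrays for one window. This is a hypothetical sketch, not the code in run_methods_that_use_contact_maps.py. It assumes the maps were already fetched, for example with cooler's `Cooler(uri).matrix(balance=True).fetch(region)`. The helper names and the `min_diag` cutoff for near-diagonal bins are illustrative choices.

```python
import numpy as np


def upper_triangle(mat, min_diag=2):
    """Upper-triangle entries of a square map, skipping the first `min_diag` diagonals."""
    mat = np.asarray(mat, dtype=float)
    rows, cols = np.triu_indices(mat.shape[0], k=min_diag)
    return mat[rows, cols]


def correlation_and_mse(map_a, map_b, min_diag=2):
    """Pearson correlation and mean squared error between two same-size maps."""
    a = upper_triangle(map_a, min_diag)
    b = upper_triangle(map_b, min_diag)
    keep = np.isfinite(a) & np.isfinite(b)  # drop NaN bins (e.g. filtered by balancing)
    a, b = a[keep], b[keep]
    corr = np.corrcoef(a, b)[0, 1]
    mse = float(np.mean((a - b) ** 2))
    return corr, mse
```

Only the upper triangle is used because contact maps are symmetric, so counting both halves would weight every contact twice.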
HiCcompare and TADcompare:
- Generate the input files following the instructions below
- Edit the variables in the script following the instructions there
- Run script in the terminal
Rscript run_tad_hic_compare.R
Input: two chromosome-specific text files generated from cool files.
- Create a text file from a cool file:
cooler dump --join ESC_MicroC.mcool::/resolutions/2048 > ESC_MicroC_2048.txt
- Create a chromosome-specific text file from the genome-wide text file:
awk -F '\t' '$1 == "chr21"{ print }' ESC_MicroC_2048.txt > ESC_MicroC_2048_chr21.txt
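When several chromosomes are needed, the awk filter above can be replaced by a single pass over the dump. Below is a hedged Python sketch. `split_by_chrom` is a hypothetical helper. Like the awk command, it checks only the first column (chrom1) and writes matching lines unchanged.

```python
def split_by_chrom(dump_path, chroms, prefix):
    """Write <prefix>_<chrom>.txt for each chrom, keeping lines whose first column matches.

    Equivalent to running `awk -F '\t' '$1 == "<chrom>"{ print }'` once per
    chromosome, but reads the genome-wide file only once.
    """
    out_paths = {c: f"{prefix}_{c}.txt" for c in chroms}
    handles = {c: open(p, "w") for c, p in out_paths.items()}
    try:
        with open(dump_path) as src:
            for line in src:
                chrom = line.split("\t", 1)[0]
                if chrom in handles:
                    handles[chrom].write(line)
    finally:
        for h in handles.values():
            h.close()
    return out_paths
```

For example, `split_by_chrom("ESC_MicroC_2048.txt", ["chr21", "chr22"], "ESC_MicroC_2048")` would write ESC_MicroC_2048_chr21.txt and ESC_MicroC_2048_chr22.txt.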
Output: tab-delimited text files with results for each method, experiment, chromosome, and resolution, saved in new directories created separately for HiCcompare and TADcompare.
Directions for running the dcHiC script can be printed with the help flag:
sh run_dcHiC.sh --h
Example of running dcHiC on MicroC data between ESC and HFF at 2,048 bp resolution in 1 Mb example windows:
sh run_dcHiC.sh \
-C ../data/experimental_maps/example_input/ESC_MicroC.mcool \
-c ../data/experimental_maps/example_input/HFF_MicroC.mcool \
-r ../data/experimental_maps/example_DEG_windows_noheader.bed \
-P ESC_MicroC \
-p HFF_MicroC \
-g ../data/GRCh38_EBV_norandom_noEBV.chrom.sizes \
-b 2048 \
-o ../data/experimental_maps/example_output/dcHiC_scores \
-d softwares/dcHiC
The following files are necessary for running dcHiC:
- Two genome-wide contact maps in .mcool format
- A file with windows of interest without a header and with columns: chrom, start, end
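dcHiC (like Selfish and Arrowhead below) expects the windows file without a header, whereas HiC1Dmetrics expects one. The following is a minimal sketch for deriving a headerless copy. `strip_header` is a hypothetical helper. It assumes a first line is a header when its start column is not an integer.

```python
def strip_header(src_path, dst_path):
    """Copy a tab-delimited windows file, dropping a header line if present."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for i, line in enumerate(src):
            fields = line.rstrip("\n").split("\t")
            # Assumption: a first line whose start column is not an integer
            # is a header such as "chrom\tstart\tend".
            is_header = i == 0 and (len(fields) < 2 or not fields[1].isdigit())
            if not is_header:
                dst.write(line)
```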
Output: tab-delimited text files with scores for each provided genomic window in columns: dcHiC_mse, dcHiC_pearson, dcHiC_spearman.
Selfish:
- Edit the variables in the script following the instructions there
- Run script in the terminal
sh run_selfish.sh
The following files are necessary for running Selfish:
- Two genome-wide contact maps in .mcool format
- A file with windows of interest without a header and with columns: chrom, start, end
Output: tab-delimited text files with the number of differential chromatin interactions between the two contact maps for each provided genomic window.
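Conceptually, this output reduces to counting differential interactions per window. Below is a hypothetical sketch. The `(chrom, pos1, pos2)` tuples are a simplification assumed for illustration, not Selfish's actual output format. The requirement that both anchors fall inside the window is likewise our assumption.

```python
def count_in_windows(windows, interactions):
    """Count interactions with both anchors inside each half-open window [start, end).

    windows: iterable of (chrom, start, end).
    interactions: iterable of (chrom, pos1, pos2), a hypothetical stand-in
    for a list of differential interactions.
    """
    interactions = list(interactions)
    counts = {}
    for chrom, start, end in windows:
        counts[(chrom, start, end)] = sum(
            1
            for c, p1, p2 in interactions
            if c == chrom and start <= p1 < end and start <= p2 < end
        )
    return counts
```

This brute-force loop is O(windows × interactions). For genome-wide inputs, sorting interactions by position or using an interval index would be preferable.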
Arrowhead:
- Edit the variables in the script following the instructions there
- Run script in the terminal
sh run_Arrowhead.sh
The following files are necessary for running Arrowhead:
- Two genome-wide contact maps in .hic format
- A file with windows of interest without a header and with columns: chrom, start, end
Output: tab-delimited text files listing, for each provided genomic window, the TADs shared (within 20 kb), gained, or lost between the two contact maps, and the corresponding ratio.
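The shared/gained/lost classification can be sketched as follows. This assumes a TAD counts as shared when both of its boundaries lie within 20 kb of a TAD in the other map. The function name and the ratio definition (shared TADs over all distinct TADs) are hypothetical and may differ from what run_Arrowhead.sh computes.

```python
def compare_tads(tads_a, tads_b, tol=20_000):
    """Classify TADs (chrom, start, end) as shared, gained in B, or lost from A.

    Assumed rule: a TAD matches if a TAD on the same chromosome in the other
    map has both boundaries within `tol` bp.
    """
    def has_match(tad, others):
        return any(
            tad[0] == o[0] and abs(tad[1] - o[1]) <= tol and abs(tad[2] - o[2]) <= tol
            for o in others
        )

    shared = [t for t in tads_a if has_match(t, tads_b)]
    lost = [t for t in tads_a if not has_match(t, tads_b)]
    gained = [t for t in tads_b if not has_match(t, tads_a)]
    total = len(shared) + len(lost) + len(gained)
    ratio = len(shared) / total if total else float("nan")  # hypothetical ratio
    return shared, gained, lost, ratio
```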
Directions for running the CHESS script can be printed with the help flag:
sh run_chess.sh --h
Example of running CHESS on MicroC data between ESC and HFF at 2,048 bp resolution in 1 Mb example windows:
sh run_chess.sh \
-C ../data/experimental_maps/example_input/ESC_MicroC.mcool \
-c ../data/experimental_maps/example_input/HFF_MicroC.mcool \
-R ../data/experimental_maps/example_DEG_windows.bedpe \
-b 2048 \
-t 8 \
-o chess \
-d ../data/experimental_maps/example_output/
The following files are necessary for running CHESS:
- Two genome-wide contact maps in .mcool format
- A file with windows of interest in BEDPE format with columns: chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2.
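When the same windows are compared in both maps, each BEDPE row simply repeats the window coordinates. Below is a minimal sketch under that assumption. `windows_to_bedpe` is hypothetical. The name (a running index), score (".") and strand ("+") values are placeholders chosen for illustration, so check them against the CHESS documentation.

```python
def windows_to_bedpe(windows, out_path):
    """Write each (chrom, start, end) window as a BEDPE pair with identical coordinates.

    Columns: chrom1 start1 end1 chrom2 start2 end2 name score strand1 strand2.
    """
    with open(out_path, "w") as out:
        for i, (chrom, start, end) in enumerate(windows):
            fields = [chrom, start, end, chrom, start, end, i, ".", "+", "+"]
            out.write("\t".join(map(str, fields)) + "\n")
```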
Output: tab-delimited text files with scores for each provided genomic window in columns: SN, ssim, z_ssim (only the ssim scores were used in our downstream analysis).
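For intuition about the ssim column, here is the structural similarity formula computed once over a whole map. This is a simplification: skimage's `structural_similarity` (and, as far as we know, CHESS) evaluates SSIM in local windows and averages the results, so values will differ. `global_ssim` is a hypothetical name, and k1 = 0.01, k2 = 0.03 are the usual constants.

```python
import numpy as np


def global_ssim(x, y, data_range=None, k1=0.01, k2=0.03):
    """SSIM = ((2*mu_x*mu_y + c1)(2*cov_xy + c2)) / ((mu_x^2 + mu_y^2 + c1)(var_x + var_y + c2)).

    Computed over the full arrays, with no sliding window.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if data_range is None:
        data_range = max(x.max(), y.max()) - min(x.min(), y.min())
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

Identical maps score 1, and anti-correlated maps can score below 0. A constant map pair gives data_range = 0 and an undefined (NaN) result, so empty windows need special handling.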
Directions for running the HiC1Dmetrics script can be printed with the help flag:
python run_HiC1Dmetrics.py -h
Example of running HiC1Dmetrics on MicroC data between ESC and HFF at 2,048 bp resolution in 1 Mb example windows:
python run_HiC1Dmetrics.py \
-f ../data/experimental_maps/example_input/ESC_MicroC.mcool \
-s ../data/experimental_maps/example_input/HFF_MicroC.mcool \
-i ../data/experimental_maps/example_DEG_windows \
-a ESC \
-b HFF \
-r 2048 \
-w 1000000 \
-c chr21 \
-n 8 \
-o ../data/experimental_maps/example_output/HiC1Dmetrics.tsv
The following files are necessary for running HiC1Dmetrics:
- Two genome-wide contact maps in .mcool format
- A file with windows of interest with a header and with columns: chrom, start, end.
Output: tab-delimited text files with scores for each provided genomic window in columns: ISC, CIC, SSC, deltaDLR, CD (HiC1Dmetrics).
- Python packages and tools: Python v3.10.12, cooler v0.9.2, cooltools v0.5.4, Matplotlib v3.7.2, NumPy v1.23.5, Pandas v1.5.3, scipy v1.10.1, Seaborn v0.12.2, h5py v3.8.0, hicrep v0.2.6, sklearn v1.0.2, skimage v0.19.3, Arrowhead of Juicer Tools v1.8.9, CHESS v0.3.8, HiC1Dmetrics v0.2.5, Selfish v1.14.0.
- R tools and packages: RStudio v0.16.0, HiCcompare v1.26.0, TADcompare v1.14.0, dcHiC v1.
- conda v4.12.0.
- Red Hat Enterprise Linux 8.9.
- CPUs were used for all steps except scoring predicted maps.
Please email us at [email protected] or [email protected] if you have any questions or concerns.