Compared to previous section, the result of each function described below depends on the entire set of ranges in the input GRanges object.
+
Compared to previous section, the result of each function described below depends on the entire set of ranges in the input GRanges object.
@@ -920,7 +920,7 @@
Subsetting by overlaps between a query and a subject
-
To directly subset ranges from query overlapping with ranges from a subject (e.g. to only keep peaks overlapping a TSS), we can use the subsetByOverlaps function.
+
To directly subset ranges from query overlapping with ranges from a subject (e.g. to only keep peaks overlapping a TSS), we can use the subsetByOverlaps function. The output of subsetByOverlaps is a subset of the original GRanges object provided as a query, with retained ranges being unmodified.
subsetByOverlaps(peaks, TSSs)
## GRanges object with 3 ranges and 1 metadata column:
@@ -932,19 +932,6 @@
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
-
-
-
-
-
-
-Note
-
-
-
-
The output of subsetByOverlaps is a subset of the original GRanges object provided as a query, with retained ranges being unmodified.
-
-
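As a minimal sketch of the behaviour described above, the following uses two small, hypothetical GRanges objects (`toy_peaks` and `toy_tss`, not the `peaks` and `TSSs` objects of this chapter) to check that the ranges kept by subsetByOverlaps are taken from the query without modification.

```r
library(GenomicRanges)

# Hypothetical toy data, for illustration only
toy_peaks <- GRanges(c("chr1:100-200", "chr1:500-600", "chr1:900-1000"))
toy_tss   <- GRanges(c("chr1:150-150", "chr1:950-950"))

# Keep only the query ranges that overlap at least one subject range
kept <- subsetByOverlaps(toy_peaks, toy_tss)
kept

# Every retained range is one of the original query ranges, unchanged
all(kept %in% toy_peaks)
```

The retained ranges keep their original coordinates; they are not trimmed to the overlapping portion.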
Counting overlaps between a query and a subject
@@ -1138,16 +1125,6 @@
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
The way GInteractions objects are printed in an R console mimics that of GRanges, but pairs two “ends” (a.k.a. anchors) of an interaction together, each end being represented as a separate GRanges range.
-
-
-
-
-
-
-Notes
-
-
-
Note that it is possible to have interactions joining two identical anchors.
@@ -1187,8 +1164,6 @@
## regions: 7 ranges and 0 metadata columns
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
-
-
2.2.2 GInteractions specific slots
Compared to GRanges, extra slots are available for GInteractions objects, e.g. anchors and regions.
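The two slots can be explored on a small example. This is only a sketch: `regs` and `gi` are hypothetical toy objects, built here with the index-based GInteractions constructor, and the third interaction deliberately joins a region to itself.

```r
library(InteractionSet)

# Hypothetical toy regions, for illustration only
regs <- GRanges(c("chr1:1-100", "chr1:201-300", "chr1:401-500"))

# Each interaction is a pair of indices into `regs`;
# the third one joins region 2 to itself (two identical anchors)
gi <- GInteractions(c(1, 1, 2), c(2, 3, 2), regions = regs)

regions(gi)             # the set of regions shared by all interactions
anchors(gi)             # a list of two GRanges: $first and $second
anchors(gi, id = TRUE)  # the same anchors, as integer indices into regions(gi)
```

Storing anchors as indices into a shared set of regions avoids duplicating a range every time it takes part in a new interaction.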
@@ -1431,17 +1406,7 @@
## [1] 0 2000 3000 1000 NA
Note that for “trans” inter-chromosomal interactions, i.e. interactions with anchors on different chromosomes, the notion of genomic distance is meaningless and for this reason, pairdist returns a NA value.
-
-
-
-
-
-
-Advanced pairdist arguments
-
-
-
-
The type argument can be tweaked to specify which type of “distance” should be computed:
+
The type argument of the pairdist() function can be tweaked to specify which type of “distance” should be computed:
mid: The distance between the midpoints of the two regions (rounded down to the nearest integer) is returned (Default).
@@ -1451,10 +1416,7 @@
span: The distance between the furthermost points of the two regions is computed.
diag: The difference between the anchor indices is returned. This corresponds to a diagonal on the interaction space when bins are used in the ‘regions’ slot of ‘x’.
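The effect of the type argument, and the NA returned for trans interactions, can be checked on a toy object. This is a sketch under stated assumptions: `regs` and `gi` are hypothetical and contain one cis and one trans interaction.

```r
library(InteractionSet)

# Hypothetical toy regions, for illustration only
regs <- GRanges(c("chr1:1-100", "chr1:1001-1100", "chr2:1-100"))

# Interaction 1 is cis (chr1 to chr1), interaction 2 is trans (chr1 to chr2)
gi <- GInteractions(c(1, 1), c(2, 3), regions = regs)

pairdist(gi)                 # default type = "mid"; NA for the trans interaction
pairdist(gi, type = "gap")   # distance between the closest points of the anchors
pairdist(gi, type = "span")  # distance between the furthermost points
pairdist(gi, type = "diag")  # difference between the anchor indices
```

For the cis interaction, "gap" is expected to be the smallest value and "span" the largest, since they measure the inner and outer extent of the same pair of anchors.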
-
-
-
-
+
2.2.3.5 GInteractions overlap methods
“Overlaps” for genomic interactions could be computed in different contexts:
+## [1] "## pairs format v1.0" "#sorted: chr1-pos1-chr2-pos2" "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2 frag1 frag2" "#chromsize: I 230218" "#chromsize: II 813184" "#chromsize: III 316620" "#chromsize: IV 1531933" "#chromsize: V 576874" "#chromsize: VI 270161" "#chromsize: VII 1090940" "#chromsize: VIII 562643" "#chromsize: IX 439888" "#chromsize: X 745751" "#chromsize: XI 666816" "#chromsize: XII 1078177" "#chromsize: XIII 924431" "#chromsize: XIV 784333" "#chromsize: XV 1091291" "#chromsize: XVI 948066" "#chromsize: Mito 85779" "NS500150:527:HHGYNBGXF:3:21611:19085:3986\tII\t105\tII\t48548\t+\t-\t1358\t1681" "NS500150:527:HHGYNBGXF:4:13604:19734:2406\tII\t113\tII\t45003\t-\t+\t1358\t1658" "NS500150:527:HHGYNBGXF:2:11108:25178:11036\tII\t119\tII\t687251\t-\t+\t1358\t5550" "NS500150:527:HHGYNBGXF:1:22301:8468:1586\tII\t160\tII\t26124\t+\t-\t1358\t1510" "NS500150:527:HHGYNBGXF:4:23606:24037:2076\tII\t169\tII\t39052\t+\t+\t1358\t1613"
2.3.2 ContactFile fundamentals
@@ -1667,7 +1599,7 @@
# ----- This creates a connection to a `.(m)cool` file (path stored in `coolf`)
CoolFile(coolf)
## CoolFile object
-## .mcool file: /github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752
+## .mcool file: /github/home/.cache/R/ExperimentHub/1a92248c093f_7752
## resolution: 1000
## pairs file:
## metadata(0):
@@ -1675,7 +1607,7 @@
# ----- This creates a connection to a `.hic` file (path stored in `hicf`)
HicFile(hicf)
## HicFile object
-## .hic file: /github/home/.cache/R/ExperimentHub/1a9a270f71fe_7836
+## .hic file: /github/home/.cache/R/ExperimentHub/1a92259b7f1f_7836
## resolution: 1000
## pairs file:
## metadata(0):
@@ -1684,8 +1616,8 @@
# ----- This creates a connection to a pairs file
PairsFile(pairsf)
## PairsFile object
-## resource: /github/home/.cache/R/ExperimentHub/1a9a1c034d7_7753
+## resource: /github/home/.cache/R/ExperimentHub/1a92835ced9_7753
import also works for other types of ContactFile (HicFile, HicproFile, PairsFile), e.g.
For HicFile and HicproFile, import seamlessly returns a HiCExperiment as well:
@@ -1928,7 +1825,7 @@
hic
## `HiCExperiment` object with 13,681,280 contacts over 12,165 regions
## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a270f71fe_7836"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92259b7f1f_7836"
## focus: "whole genome"
## resolutions(5): 1000 2000 4000 8000 16000
## active resolution: 1000
@@ -1947,43 +1844,28 @@
pairs <- import(pf)
pairs
## GInteractions object with 471364 interactions and 3 metadata columns:
-## seqnames1 ranges1 seqnames2 ranges2 | frag1 frag2
-## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric>
-## [1] II 105 --- II 48548 | 1358 1681
-## [2] II 113 --- II 45003 | 1358 1658
-## [3] II 119 --- II 687251 | 1358 5550
-## [4] II 160 --- II 26124 | 1358 1510
-## [5] II 169 --- II 39052 | 1358 1613
-## ... ... ... ... ... ... . ... ...
-## [471360] II 808605 --- II 809683 | 6316 6320
-## [471361] II 808609 --- II 809917 | 6316 6324
-## [471362] II 808617 --- II 809506 | 6316 6319
-## [471363] II 809447 --- II 809685 | 6319 6321
-## [471364] II 809472 --- II 809675 | 6319 6320
-## distance
-## <integer>
-## [1] 48443
-## [2] 44890
-## [3] 687132
-## [4] 25964
-## [5] 38883
-## ... ...
-## [471360] 1078
-## [471361] 1308
-## [471362] 889
-## [471363] 238
-## [471364] 203
+## seqnames1 ranges1 seqnames2 ranges2 | frag1 frag2 distance
+## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <integer>
+## [1] II 105 --- II 48548 | 1358 1681 48443
+## [2] II 113 --- II 45003 | 1358 1658 44890
+## [3] II 119 --- II 687251 | 1358 5550 687132
+## [4] II 160 --- II 26124 | 1358 1510 25964
+## [5] II 169 --- II 39052 | 1358 1613 38883
+## ... ... ... ... ... ... . ... ... ...
+## [471360] II 808605 --- II 809683 | 6316 6320 1078
+## [471361] II 808609 --- II 809917 | 6316 6324 1308
+## [471362] II 808617 --- II 809506 | 6316 6319 889
+## [471363] II 809447 --- II 809685 | 6319 6321 238
+## [471364] II 809472 --- II 809675 | 6319 6320 203
## -------
## regions: 549331 ranges and 0 metadata columns
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
-
-
2.4.1.2 Customizing the import
To reduce the import to only parse the data that is relevant to the study, two arguments can be passed to import, along with a ContactFile.
-
+
@@ -2009,102 +1891,61 @@
regions(hic) # ---- `regions()` works on `HiCExperiment` the same way as on `GInteractions`
## GRanges object with 407 ranges and 4 metadata columns:
-## seqnames ranges strand | bin_id weight chr
-## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>
-## II_1_2000 II 1-2000 * | 116 NaN II
-## II_2001_4000 II 2001-4000 * | 117 NaN II
-## II_4001_6000 II 4001-6000 * | 118 NaN II
-## II_6001_8000 II 6001-8000 * | 119 NaN II
-## II_8001_10000 II 8001-10000 * | 120 0.0461112 II
-## ... ... ... ... . ... ... ...
-## II_804001_806000 II 804001-806000 * | 518 0.0493107 II
-## II_806001_808000 II 806001-808000 * | 519 0.0611355 II
-## II_808001_810000 II 808001-810000 * | 520 NaN II
-## II_810001_812000 II 810001-812000 * | 521 NaN II
-## II_812001_813184 II 812001-813184 * | 522 NaN II
-## center
-## <integer>
-## II_1_2000 1000
-## II_2001_4000 3000
-## II_4001_6000 5000
-## II_6001_8000 7000
-## II_8001_10000 9000
-## ... ...
-## II_804001_806000 805000
-## II_806001_808000 807000
-## II_808001_810000 809000
-## II_810001_812000 811000
-## II_812001_813184 812592
+## seqnames ranges strand | bin_id weight chr center
+## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>
+## II_1_2000 II 1-2000 * | 116 NaN II 1000
+## II_2001_4000 II 2001-4000 * | 117 NaN II 3000
+## II_4001_6000 II 4001-6000 * | 118 NaN II 5000
+## II_6001_8000 II 6001-8000 * | 119 NaN II 7000
+## II_8001_10000 II 8001-10000 * | 120 0.0461112 II 9000
+## ... ... ... ... . ... ... ... ...
+## II_804001_806000 II 804001-806000 * | 518 0.0493107 II 805000
+## II_806001_808000 II 806001-808000 * | 519 0.0611355 II 807000
+## II_808001_810000 II 808001-810000 * | 520 NaN II 809000
+## II_810001_812000 II 810001-812000 * | 521 NaN II 811000
+## II_812001_813184 II 812001-813184 * | 522 NaN II 812592
## -------
## seqinfo: 16 sequences from an unspecified genome
table(seqnames(regions(hic)))
##
-## I II III IV V VI VII VIII IX X XI XII XIII XIV XV
-## 0 407 0 0 0 0 0 0 0 0 0 0 0 0 0
-## XVI
-## 0
+## I II III IV V VI VII VIII IX X XI XII XIII XIV XV XVI
+## 0 407 0 0 0 0 0 0 0 0 0 0 0 0 0 0
anchors(hic) # ---- `anchors()` works on `HiCExperiment` the same way as on `GInteractions`
## $first
## GRanges object with 34063 ranges and 4 metadata columns:
-## seqnames ranges strand | bin_id weight chr
-## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>
-## [1] II 1-2000 * | 116 NaN II
-## [2] II 1-2000 * | 116 NaN II
-## [3] II 1-2000 * | 116 NaN II
-## [4] II 1-2000 * | 116 NaN II
-## [5] II 1-2000 * | 116 NaN II
-## ... ... ... ... . ... ... ...
-## [34059] II 804001-806000 * | 518 0.0493107 II
-## [34060] II 806001-808000 * | 519 0.0611355 II
-## [34061] II 806001-808000 * | 519 0.0611355 II
-## [34062] II 806001-808000 * | 519 0.0611355 II
-## [34063] II 808001-810000 * | 520 NaN II
-## center
-## <integer>
-## [1] 1000
-## [2] 1000
-## [3] 1000
-## [4] 1000
-## [5] 1000
-## ... ...
-## [34059] 805000
-## [34060] 807000
-## [34061] 807000
-## [34062] 807000
-## [34063] 809000
+## seqnames ranges strand | bin_id weight chr center
+## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>
+## [1] II 1-2000 * | 116 NaN II 1000
+## [2] II 1-2000 * | 116 NaN II 1000
+## [3] II 1-2000 * | 116 NaN II 1000
+## [4] II 1-2000 * | 116 NaN II 1000
+## [5] II 1-2000 * | 116 NaN II 1000
+## ... ... ... ... . ... ... ... ...
+## [34059] II 804001-806000 * | 518 0.0493107 II 805000
+## [34060] II 806001-808000 * | 519 0.0611355 II 807000
+## [34061] II 806001-808000 * | 519 0.0611355 II 807000
+## [34062] II 806001-808000 * | 519 0.0611355 II 807000
+## [34063] II 808001-810000 * | 520 NaN II 809000
## -------
## seqinfo: 16 sequences from an unspecified genome
##
## $second
## GRanges object with 34063 ranges and 4 metadata columns:
-## seqnames ranges strand | bin_id weight chr
-## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>
-## [1] II 1-2000 * | 116 NaN II
-## [2] II 4001-6000 * | 118 NaN II
-## [3] II 6001-8000 * | 119 NaN II
-## [4] II 8001-10000 * | 120 0.0461112 II
-## [5] II 10001-12000 * | 121 0.0334807 II
-## ... ... ... ... . ... ... ...
-## [34059] II 810001-812000 * | 521 NaN II
-## [34060] II 806001-808000 * | 519 0.0611355 II
-## [34061] II 808001-810000 * | 520 NaN II
-## [34062] II 810001-812000 * | 521 NaN II
-## [34063] II 808001-810000 * | 520 NaN II
-## center
-## <integer>
-## [1] 1000
-## [2] 5000
-## [3] 7000
-## [4] 9000
-## [5] 11000
-## ... ...
-## [34059] 811000
-## [34060] 807000
-## [34061] 809000
-## [34062] 811000
-## [34063] 809000
+## seqnames ranges strand | bin_id weight chr center
+## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>
+## [1] II 1-2000 * | 116 NaN II 1000
+## [2] II 4001-6000 * | 118 NaN II 5000
+## [3] II 6001-8000 * | 119 NaN II 7000
+## [4] II 8001-10000 * | 120 0.0461112 II 9000
+## [5] II 10001-12000 * | 121 0.0334807 II 11000
+## ... ... ... ... . ... ... ... ...
+## [34059] II 810001-812000 * | 521 NaN II 811000
+## [34060] II 806001-808000 * | 519 0.0611355 II 807000
+## [34061] II 808001-810000 * | 520 NaN II 809000
+## [34062] II 810001-812000 * | 521 NaN II 811000
+## [34063] II 808001-810000 * | 520 NaN II 809000
## -------
## seqinfo: 16 sequences from an unspecified genome
@@ -2116,32 +1957,19 @@
regions(hic)
## GRanges object with 21 ranges and 4 metadata columns:
-## seqnames ranges strand | bin_id weight chr
-## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>
-## II_39001_40000 II 39001-40000 * | 270 0.0220798 II
-## II_40001_41000 II 40001-41000 * | 271 0.0246775 II
-## II_41001_42000 II 41001-42000 * | 272 0.0269232 II
-## II_42001_43000 II 42001-43000 * | 273 0.0341849 II
-## II_43001_44000 II 43001-44000 * | 274 0.0265386 II
-## ... ... ... ... . ... ... ...
-## II_55001_56000 II 55001-56000 * | 286 0.0213532 II
-## II_56001_57000 II 56001-57000 * | 287 0.0569839 II
-## II_57001_58000 II 57001-58000 * | 288 0.0338612 II
-## II_58001_59000 II 58001-59000 * | 289 0.0294531 II
-## II_59001_60000 II 59001-60000 * | 290 0.0306662 II
-## center
-## <integer>
-## II_39001_40000 39500
-## II_40001_41000 40500
-## II_41001_42000 41500
-## II_42001_43000 42500
-## II_43001_44000 43500
-## ... ...
-## II_55001_56000 55500
-## II_56001_57000 56500
-## II_57001_58000 57500
-## II_58001_59000 58500
-## II_59001_60000 59500
+## seqnames ranges strand | bin_id weight chr center
+## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>
+## II_39001_40000 II 39001-40000 * | 270 0.0220798 II 39500
+## II_40001_41000 II 40001-41000 * | 271 0.0246775 II 40500
+## II_41001_42000 II 41001-42000 * | 272 0.0269232 II 41500
+## II_42001_43000 II 42001-43000 * | 273 0.0341849 II 42500
+## II_43001_44000 II 43001-44000 * | 274 0.0265386 II 43500
+## ... ... ... ... . ... ... ... ...
+## II_55001_56000 II 55001-56000 * | 286 0.0213532 II 55500
+## II_56001_57000 II 56001-57000 * | 287 0.0569839 II 56500
+## II_57001_58000 II 57001-58000 * | 288 0.0338612 II 57500
+## II_58001_59000 II 58001-59000 * | 289 0.0294531 II 58500
+## II_59001_60000 II 59001-60000 * | 290 0.0306662 II 59500
## -------
## seqinfo: 16 sequences from an unspecified genome
@@ -2190,95 +2018,56 @@
regions(hic2)
## GRanges object with 477 ranges and 4 metadata columns:
-## seqnames ranges strand | bin_id weight
-## <Rle> <IRanges> <Rle> | <numeric> <numeric>
-## II_1_4000 II 1-4000 * | 58 NaN
-## II_4001_8000 II 4001-8000 * | 59 NaN
-## II_8001_12000 II 8001-12000 * | 60 0.0274474
-## II_12001_16000 II 12001-16000 * | 61 0.0342116
-## II_16001_20000 II 16001-20000 * | 62 0.0195128
-## ... ... ... ... . ... ...
-## XV_1072001_1076000 XV 1072001-1076000 * | 2783 0.041763
-## XV_1076001_1080000 XV 1076001-1080000 * | 2784 NaN
-## XV_1080001_1084000 XV 1080001-1084000 * | 2785 NaN
-## XV_1084001_1088000 XV 1084001-1088000 * | 2786 NaN
-## XV_1088001_1091291 XV 1088001-1091291 * | 2787 NaN
-## chr center
-## <Rle> <integer>
-## II_1_4000 II 2000
-## II_4001_8000 II 6000
-## II_8001_12000 II 10000
-## II_12001_16000 II 14000
-## II_16001_20000 II 18000
-## ... ... ...
-## XV_1072001_1076000 XV 1074000
-## XV_1076001_1080000 XV 1078000
-## XV_1080001_1084000 XV 1082000
-## XV_1084001_1088000 XV 1086000
-## XV_1088001_1091291 XV 1089646
+## seqnames ranges strand | bin_id weight chr center
+## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>
+## II_1_4000 II 1-4000 * | 58 NaN II 2000
+## II_4001_8000 II 4001-8000 * | 59 NaN II 6000
+## II_8001_12000 II 8001-12000 * | 60 0.0274474 II 10000
+## II_12001_16000 II 12001-16000 * | 61 0.0342116 II 14000
+## II_16001_20000 II 16001-20000 * | 62 0.0195128 II 18000
+## ... ... ... ... . ... ... ... ...
+## XV_1072001_1076000 XV 1072001-1076000 * | 2783 0.041763 XV 1074000
+## XV_1076001_1080000 XV 1076001-1080000 * | 2784 NaN XV 1078000
+## XV_1080001_1084000 XV 1080001-1084000 * | 2785 NaN XV 1082000
+## XV_1084001_1088000 XV 1084001-1088000 * | 2786 NaN XV 1086000
+## XV_1088001_1091291 XV 1088001-1091291 * | 2787 NaN XV 1089646
## -------
## seqinfo: 16 sequences from an unspecified genome
anchors(hic2)
## $first
## GRanges object with 18032 ranges and 4 metadata columns:
-## seqnames ranges strand | bin_id weight chr
-## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>
-## [1] II 1-4000 * | 58 NaN II
-## [2] II 1-4000 * | 58 NaN II
-## [3] II 1-4000 * | 58 NaN II
-## [4] II 1-4000 * | 58 NaN II
-## [5] II 1-4000 * | 58 NaN II
-## ... ... ... ... . ... ... ...
-## [18028] II 808001-812000 * | 260 NaN II
-## [18029] II 808001-812000 * | 260 NaN II
-## [18030] II 808001-812000 * | 260 NaN II
-## [18031] II 808001-812000 * | 260 NaN II
-## [18032] II 808001-812000 * | 260 NaN II
-## center
-## <integer>
-## [1] 2000
-## [2] 2000
-## [3] 2000
-## [4] 2000
-## [5] 2000
-## ... ...
-## [18028] 810000
-## [18029] 810000
-## [18030] 810000
-## [18031] 810000
-## [18032] 810000
+## seqnames ranges strand | bin_id weight chr center
+## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>
+## [1] II 1-4000 * | 58 NaN II 2000
+## [2] II 1-4000 * | 58 NaN II 2000
+## [3] II 1-4000 * | 58 NaN II 2000
+## [4] II 1-4000 * | 58 NaN II 2000
+## [5] II 1-4000 * | 58 NaN II 2000
+## ... ... ... ... . ... ... ... ...
+## [18028] II 808001-812000 * | 260 NaN II 810000
+## [18029] II 808001-812000 * | 260 NaN II 810000
+## [18030] II 808001-812000 * | 260 NaN II 810000
+## [18031] II 808001-812000 * | 260 NaN II 810000
+## [18032] II 808001-812000 * | 260 NaN II 810000
## -------
## seqinfo: 16 sequences from an unspecified genome
##
## $second
## GRanges object with 18032 ranges and 4 metadata columns:
-## seqnames ranges strand | bin_id weight chr
-## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>
-## [1] XV 48001-52000 * | 2527 0.0185354 XV
-## [2] XV 348001-352000 * | 2602 0.0233750 XV
-## [3] XV 468001-472000 * | 2632 0.0153615 XV
-## [4] XV 472001-476000 * | 2633 0.0189624 XV
-## [5] XV 584001-588000 * | 2661 0.0167715 XV
-## ... ... ... ... . ... ... ...
-## [18028] XV 980001-984000 * | 2760 0.0187827 XV
-## [18029] XV 984001-988000 * | 2761 0.0250094 XV
-## [18030] XV 992001-996000 * | 2763 0.0185599 XV
-## [18031] XV 1004001-1008000 * | 2766 0.0196942 XV
-## [18032] XV 1064001-1068000 * | 2781 0.0208220 XV
-## center
-## <integer>
-## [1] 50000
-## [2] 350000
-## [3] 470000
-## [4] 474000
-## [5] 586000
-## ... ...
-## [18028] 982000
-## [18029] 986000
-## [18030] 994000
-## [18031] 1006000
-## [18032] 1066000
+## seqnames ranges strand | bin_id weight chr center
+## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>
+## [1] XV 48001-52000 * | 2527 0.0185354 XV 50000
+## [2] XV 348001-352000 * | 2602 0.0233750 XV 350000
+## [3] XV 468001-472000 * | 2632 0.0153615 XV 470000
+## [4] XV 472001-476000 * | 2633 0.0189624 XV 474000
+## [5] XV 584001-588000 * | 2661 0.0167715 XV 586000
+## ... ... ... ... . ... ... ... ...
+## [18028] XV 980001-984000 * | 2760 0.0187827 XV 982000
+## [18029] XV 984001-988000 * | 2761 0.0250094 XV 986000
+## [18030] XV 992001-996000 * | 2763 0.0185599 XV 994000
+## [18031] XV 1004001-1008000 * | 2766 0.0196942 XV 1006000
+## [18032] XV 1064001-1068000 * | 2781 0.0208220 XV 1066000
## -------
## seqinfo: 16 sequences from an unspecified genome
@@ -2290,32 +2079,19 @@
regions(hic3)
## GRanges object with 32 ranges and 4 metadata columns:
-## seqnames ranges strand | bin_id weight chr
-## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>
-## III_8001_10000 III 8001-10000 * | 527 NaN III
-## III_10001_12000 III 10001-12000 * | 528 NaN III
-## III_12001_14000 III 12001-14000 * | 529 NaN III
-## III_14001_16000 III 14001-16000 * | 530 0.0356351 III
-## III_16001_18000 III 16001-18000 * | 531 0.0230693 III
-## ... ... ... ... . ... ... ...
-## XV_30001_32000 XV 30001-32000 * | 5039 0.0482465 XV
-## XV_32001_34000 XV 32001-34000 * | 5040 0.0241580 XV
-## XV_34001_36000 XV 34001-36000 * | 5041 0.0273166 XV
-## XV_36001_38000 XV 36001-38000 * | 5042 0.0542235 XV
-## XV_38001_40000 XV 38001-40000 * | 5043 0.0206849 XV
-## center
-## <integer>
-## III_8001_10000 9000
-## III_10001_12000 11000
-## III_12001_14000 13000
-## III_14001_16000 15000
-## III_16001_18000 17000
-## ... ...
-## XV_30001_32000 31000
-## XV_32001_34000 33000
-## XV_34001_36000 35000
-## XV_36001_38000 37000
-## XV_38001_40000 39000
+## seqnames ranges strand | bin_id weight chr center
+## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>
+## III_8001_10000 III 8001-10000 * | 527 NaN III 9000
+## III_10001_12000 III 10001-12000 * | 528 NaN III 11000
+## III_12001_14000 III 12001-14000 * | 529 NaN III 13000
+## III_14001_16000 III 14001-16000 * | 530 0.0356351 III 15000
+## III_16001_18000 III 16001-18000 * | 531 0.0230693 III 17000
+## ... ... ... ... . ... ... ... ...
+## XV_30001_32000 XV 30001-32000 * | 5039 0.0482465 XV 31000
+## XV_32001_34000 XV 32001-34000 * | 5040 0.0241580 XV 33000
+## XV_34001_36000 XV 34001-36000 * | 5041 0.0273166 XV 35000
+## XV_36001_38000 XV 36001-38000 * | 5042 0.0542235 XV 37000
+## XV_38001_40000 XV 38001-40000 * | 5043 0.0206849 XV 39000
## -------
## seqinfo: 16 sequences from an unspecified genome
@@ -2370,14 +2146,14 @@
yeast_hic
## `HiCExperiment` object with 8,757,906 contacts over 763 regions
## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752"
## focus: "whole genome"
## resolutions(5): 1000 2000 4000 8000 16000
## active resolution: 16000
## interactions: 267709
## scores(2): count balanced
## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16)
-## pairsFile: /github/home/.cache/R/ExperimentHub/1a9a1c034d7_7753
+## pairsFile: /github/home/.cache/R/ExperimentHub/1a92835ced9_7753
## metadata(3): ID org date
@@ -2386,141 +2162,77 @@
interactions(yeast_hic)
## GInteractions object with 267709 interactions and 4 metadata columns:
-## seqnames1 ranges1 seqnames2 ranges2 | bin_id1
-## <Rle> <IRanges> <Rle> <IRanges> | <numeric>
-## [1] I 1-16000 --- I 1-16000 | 0
-## [2] I 1-16000 --- I 16001-32000 | 0
-## [3] I 1-16000 --- I 32001-48000 | 0
-## [4] I 1-16000 --- I 48001-64000 | 0
-## [5] I 1-16000 --- I 64001-80000 | 0
-## ... ... ... ... ... ... . ...
-## [267705] XVI 896001-912000 --- XVI 912001-928000 | 759
-## [267706] XVI 896001-912000 --- XVI 928001-944000 | 759
-## [267707] XVI 912001-928000 --- XVI 912001-928000 | 760
-## [267708] XVI 912001-928000 --- XVI 928001-944000 | 760
-## [267709] XVI 928001-944000 --- XVI 928001-944000 | 761
-## bin_id2 count balanced
-## <numeric> <numeric> <numeric>
-## [1] 0 2836 1.0943959
-## [2] 1 2212 0.9592069
-## [3] 2 1183 0.4385242
-## [4] 3 831 0.2231192
-## [5] 4 310 0.0821255
-## ... ... ... ...
-## [267705] 760 3565 1.236371
-## [267706] 761 1359 0.385016
-## [267707] 760 3534 2.103988
-## [267708] 761 3055 1.485794
-## [267709] 761 4308 1.711565
+## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced
+## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric>
+## [1] I 1-16000 --- I 1-16000 | 0 0 2836 1.0943959
+## [2] I 1-16000 --- I 16001-32000 | 0 1 2212 0.9592069
+## [3] I 1-16000 --- I 32001-48000 | 0 2 1183 0.4385242
+## [4] I 1-16000 --- I 48001-64000 | 0 3 831 0.2231192
+## [5] I 1-16000 --- I 64001-80000 | 0 4 310 0.0821255
+## ... ... ... ... ... ... . ... ... ... ...
+## [267705] XVI 896001-912000 --- XVI 912001-928000 | 759 760 3565 1.236371
+## [267706] XVI 896001-912000 --- XVI 928001-944000 | 759 761 1359 0.385016
+## [267707] XVI 912001-928000 --- XVI 912001-928000 | 760 760 3534 2.103988
+## [267708] XVI 912001-928000 --- XVI 928001-944000 | 760 761 3055 1.485794
+## [267709] XVI 928001-944000 --- XVI 928001-944000 | 761 761 4308 1.711565
## -------
## regions: 763 ranges and 4 metadata columns
## seqinfo: 16 sequences from an unspecified genome
-
-
-
-
-
-
-Note
-
-
-
Because genomic interactions are actually stored as GInteractions, regions and anchors work on HiCExperiment objects just as they work with GInteractions!
-
-
regions(yeast_hic)
## GRanges object with 763 ranges and 4 metadata columns:
-## seqnames ranges strand | bin_id weight
-## <Rle> <IRanges> <Rle> | <numeric> <numeric>
-## I_1_16000 I 1-16000 * | 0 0.0196442
-## I_16001_32000 I 16001-32000 * | 1 0.0220746
-## I_32001_48000 I 32001-48000 * | 2 0.0188701
-## I_48001_64000 I 48001-64000 * | 3 0.0136679
-## I_64001_80000 I 64001-80000 * | 4 0.0134860
-## ... ... ... ... . ... ...
-## XVI_880001_896000 XVI 880001-896000 * | 758 0.00910873
-## XVI_896001_912000 XVI 896001-912000 * | 759 0.01421350
-## XVI_912001_928000 XVI 912001-928000 * | 760 0.02439992
-## XVI_928001_944000 XVI 928001-944000 * | 761 0.01993237
-## XVI_944001_948066 XVI 944001-948066 * | 762 NaN
-## chr center
-## <Rle> <integer>
-## I_1_16000 I 8000
-## I_16001_32000 I 24000
-## I_32001_48000 I 40000
-## I_48001_64000 I 56000
-## I_64001_80000 I 72000
-## ... ... ...
-## XVI_880001_896000 XVI 888000
-## XVI_896001_912000 XVI 904000
-## XVI_912001_928000 XVI 920000
-## XVI_928001_944000 XVI 936000
-## XVI_944001_948066 XVI 946033
+## seqnames ranges strand | bin_id weight chr center
+## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>
+## I_1_16000 I 1-16000 * | 0 0.0196442 I 8000
+## I_16001_32000 I 16001-32000 * | 1 0.0220746 I 24000
+## I_32001_48000 I 32001-48000 * | 2 0.0188701 I 40000
+## I_48001_64000 I 48001-64000 * | 3 0.0136679 I 56000
+## I_64001_80000 I 64001-80000 * | 4 0.0134860 I 72000
+## ... ... ... ... . ... ... ... ...
+## XVI_880001_896000 XVI 880001-896000 * | 758 0.00910873 XVI 888000
+## XVI_896001_912000 XVI 896001-912000 * | 759 0.01421350 XVI 904000
+## XVI_912001_928000 XVI 912001-928000 * | 760 0.02439992 XVI 920000
+## XVI_928001_944000 XVI 928001-944000 * | 761 0.01993237 XVI 936000
+## XVI_944001_948066 XVI 944001-948066 * | 762 NaN XVI 946033
## -------
## seqinfo: 16 sequences from an unspecified genome
anchors(yeast_hic)
## $first
## GRanges object with 267709 ranges and 4 metadata columns:
-## seqnames ranges strand | bin_id weight chr
-## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>
-## [1] I 1-16000 * | 0 0.0196442 I
-## [2] I 1-16000 * | 0 0.0196442 I
-## [3] I 1-16000 * | 0 0.0196442 I
-## [4] I 1-16000 * | 0 0.0196442 I
-## [5] I 1-16000 * | 0 0.0196442 I
-## ... ... ... ... . ... ... ...
-## [267705] XVI 896001-912000 * | 759 0.0142135 XVI
-## [267706] XVI 896001-912000 * | 759 0.0142135 XVI
-## [267707] XVI 912001-928000 * | 760 0.0243999 XVI
-## [267708] XVI 912001-928000 * | 760 0.0243999 XVI
-## [267709] XVI 928001-944000 * | 761 0.0199324 XVI
-## center
-## <integer>
-## [1] 8000
-## [2] 8000
-## [3] 8000
-## [4] 8000
-## [5] 8000
-## ... ...
-## [267705] 904000
-## [267706] 904000
-## [267707] 920000
-## [267708] 920000
-## [267709] 936000
+## seqnames ranges strand | bin_id weight chr center
+## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>
+## [1] I 1-16000 * | 0 0.0196442 I 8000
+## [2] I 1-16000 * | 0 0.0196442 I 8000
+## [3] I 1-16000 * | 0 0.0196442 I 8000
+## [4] I 1-16000 * | 0 0.0196442 I 8000
+## [5] I 1-16000 * | 0 0.0196442 I 8000
+## ... ... ... ... . ... ... ... ...
+## [267705] XVI 896001-912000 * | 759 0.0142135 XVI 904000
+## [267706] XVI 896001-912000 * | 759 0.0142135 XVI 904000
+## [267707] XVI 912001-928000 * | 760 0.0243999 XVI 920000
+## [267708] XVI 912001-928000 * | 760 0.0243999 XVI 920000
+## [267709] XVI 928001-944000 * | 761 0.0199324 XVI 936000
## -------
## seqinfo: 16 sequences from an unspecified genome
##
## $second
## GRanges object with 267709 ranges and 4 metadata columns:
-## seqnames ranges strand | bin_id weight chr
-## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>
-## [1] I 1-16000 * | 0 0.0196442 I
-## [2] I 16001-32000 * | 1 0.0220746 I
-## [3] I 32001-48000 * | 2 0.0188701 I
-## [4] I 48001-64000 * | 3 0.0136679 I
-## [5] I 64001-80000 * | 4 0.0134860 I
-## ... ... ... ... . ... ... ...
-## [267705] XVI 912001-928000 * | 760 0.0243999 XVI
-## [267706] XVI 928001-944000 * | 761 0.0199324 XVI
-## [267707] XVI 912001-928000 * | 760 0.0243999 XVI
-## [267708] XVI 928001-944000 * | 761 0.0199324 XVI
-## [267709] XVI 928001-944000 * | 761 0.0199324 XVI
-## center
-## <integer>
-## [1] 8000
-## [2] 24000
-## [3] 40000
-## [4] 56000
-## [5] 72000
-## ... ...
-## [267705] 920000
-## [267706] 936000
-## [267707] 920000
-## [267708] 936000
-## [267709] 936000
+## seqnames ranges strand | bin_id weight chr center
+## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>
+## [1] I 1-16000 * | 0 0.0196442 I 8000
+## [2] I 16001-32000 * | 1 0.0220746 I 24000
+## [3] I 32001-48000 * | 2 0.0188701 I 40000
+## [4] I 48001-64000 * | 3 0.0136679 I 56000
+## [5] I 64001-80000 * | 4 0.0134860 I 72000
+## ... ... ... ... . ... ... ... ...
+## [267705] XVI 912001-928000 * | 760 0.0243999 XVI 920000
+## [267706] XVI 928001-944000 * | 761 0.0199324 XVI 936000
+## [267707] XVI 912001-928000 * | 760 0.0243999 XVI 920000
+## [267708] XVI 928001-944000 * | 761 0.0199324 XVI 936000
+## [267709] XVI 928001-944000 * | 761 0.0199324 XVI 936000
## -------
## seqinfo: 16 sequences from an unspecified genome
Calling interactions(hic) returns a GInteractions with scores already stored in extra columns. This short-hand allows one to dynamically check scores directly from the interactions output.
interactions(yeast_hic)
## GInteractions object with 267709 interactions and 4 metadata columns:
-## seqnames1 ranges1 seqnames2 ranges2 | bin_id1
-## <Rle> <IRanges> <Rle> <IRanges> | <numeric>
-## [1] I 1-16000 --- I 1-16000 | 0
-## [2] I 1-16000 --- I 16001-32000 | 0
-## [3] I 1-16000 --- I 32001-48000 | 0
-## [4] I 1-16000 --- I 48001-64000 | 0
-## [5] I 1-16000 --- I 64001-80000 | 0
-## ... ... ... ... ... ... . ...
-## [267705] XVI 896001-912000 --- XVI 912001-928000 | 759
-## [267706] XVI 896001-912000 --- XVI 928001-944000 | 759
-## [267707] XVI 912001-928000 --- XVI 912001-928000 | 760
-## [267708] XVI 912001-928000 --- XVI 928001-944000 | 760
-## [267709] XVI 928001-944000 --- XVI 928001-944000 | 761
-## bin_id2 count balanced
-## <numeric> <numeric> <numeric>
-## [1] 0 2836 1.0943959
-## [2] 1 2212 0.9592069
-## [3] 2 1183 0.4385242
-## [4] 3 831 0.2231192
-## [5] 4 310 0.0821255
-## ... ... ... ...
-## [267705] 760 3565 1.236371
-## [267706] 761 1359 0.385016
-## [267707] 760 3534 2.103988
-## [267708] 761 3055 1.485794
-## [267709] 761 4308 1.711565
+## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced
+## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric>
+## [1] I 1-16000 --- I 1-16000 | 0 0 2836 1.0943959
+## [2] I 1-16000 --- I 16001-32000 | 0 1 2212 0.9592069
+## [3] I 1-16000 --- I 32001-48000 | 0 2 1183 0.4385242
+## [4] I 1-16000 --- I 48001-64000 | 0 3 831 0.2231192
+## [5] I 1-16000 --- I 64001-80000 | 0 4 310 0.0821255
+## ... ... ... ... ... ... . ... ... ... ...
+## [267705] XVI 896001-912000 --- XVI 912001-928000 | 759 760 3565 1.236371
+## [267706] XVI 896001-912000 --- XVI 928001-944000 | 759 761 1359 0.385016
+## [267707] XVI 912001-928000 --- XVI 912001-928000 | 760 760 3534 2.103988
+## [267708] XVI 912001-928000 --- XVI 928001-944000 | 760 761 3055 1.485794
+## [267709] XVI 928001-944000 --- XVI 928001-944000 | 761 761 4308 1.711565
## -------
## regions: 763 ranges and 4 metadata columns
## seqinfo: 16 sequences from an unspecified genome
@@ -2650,8 +2339,6 @@
In Hi-C studies, “topological features” refer to genomic structures identified (usually from a Hi-C map, but not necessarily). For instance, one may want to study known structural loops anchored at CTCF sites, or interactions around or over centromeres, or simply specific genomic “viewpoints”.
@@ -2694,82 +2381,36 @@
pairsFile(yeast_hic)
## EH7703
-## "/github/home/.cache/R/ExperimentHub/1a9a1c034d7_7753"
+## "/github/home/.cache/R/ExperimentHub/1a92835ced9_7753"
readLines(pairsFile(yeast_hic), 25)
-## [1] "## pairs format v1.0"
-## [2] "#sorted: chr1-pos1-chr2-pos2"
-## [3] "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2 frag1 frag2"
-## [4] "#chromsize: I 230218"
-## [5] "#chromsize: II 813184"
-## [6] "#chromsize: III 316620"
-## [7] "#chromsize: IV 1531933"
-## [8] "#chromsize: V 576874"
-## [9] "#chromsize: VI 270161"
-## [10] "#chromsize: VII 1090940"
-## [11] "#chromsize: VIII 562643"
-## [12] "#chromsize: IX 439888"
-## [13] "#chromsize: X 745751"
-## [14] "#chromsize: XI 666816"
-## [15] "#chromsize: XII 1078177"
-## [16] "#chromsize: XIII 924431"
-## [17] "#chromsize: XIV 784333"
-## [18] "#chromsize: XV 1091291"
-## [19] "#chromsize: XVI 948066"
-## [20] "#chromsize: Mito 85779"
-## [21] "NS500150:527:HHGYNBGXF:3:21611:19085:3986\tII\t105\tII\t48548\t+\t-\t1358\t1681"
-## [22] "NS500150:527:HHGYNBGXF:4:13604:19734:2406\tII\t113\tII\t45003\t-\t+\t1358\t1658"
-## [23] "NS500150:527:HHGYNBGXF:2:11108:25178:11036\tII\t119\tII\t687251\t-\t+\t1358\t5550"
-## [24] "NS500150:527:HHGYNBGXF:1:22301:8468:1586\tII\t160\tII\t26124\t+\t-\t1358\t1510"
-## [25] "NS500150:527:HHGYNBGXF:4:23606:24037:2076\tII\t169\tII\t39052\t+\t+\t1358\t1613"
+## [1] "## pairs format v1.0" "#sorted: chr1-pos1-chr2-pos2" "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2 frag1 frag2" "#chromsize: I 230218" "#chromsize: II 813184" "#chromsize: III 316620" "#chromsize: IV 1531933" "#chromsize: V 576874" "#chromsize: VI 270161" "#chromsize: VII 1090940" "#chromsize: VIII 562643" "#chromsize: IX 439888" "#chromsize: X 745751" "#chromsize: XI 666816" "#chromsize: XII 1078177" "#chromsize: XIII 924431" "#chromsize: XIV 784333" "#chromsize: XV 1091291" "#chromsize: XVI 948066" "#chromsize: Mito 85779" "NS500150:527:HHGYNBGXF:3:21611:19085:3986\tII\t105\tII\t48548\t+\t-\t1358\t1681" "NS500150:527:HHGYNBGXF:4:13604:19734:2406\tII\t113\tII\t45003\t-\t+\t1358\t1658" "NS500150:527:HHGYNBGXF:2:11108:25178:11036\tII\t119\tII\t687251\t-\t+\t1358\t5550" "NS500150:527:HHGYNBGXF:1:22301:8468:1586\tII\t160\tII\t26124\t+\t-\t1358\t1510" "NS500150:527:HHGYNBGXF:4:23606:24037:2076\tII\t169\tII\t39052\t+\t+\t1358\t1613"
-
-
-
-
-
-
-Importing a PairsFile
-
-
-
+
+2.4.2.6 Importing a PairsFile
+
The .pairs file linked to a HiCExperiment object can itself be imported into a GInteractions object:
import(pairsFile(yeast_hic), format = 'pairs')
## GInteractions object with 471364 interactions and 3 metadata columns:
-## seqnames1 ranges1 seqnames2 ranges2 | frag1 frag2
-## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric>
-## [1] II 105 --- II 48548 | 1358 1681
-## [2] II 113 --- II 45003 | 1358 1658
-## [3] II 119 --- II 687251 | 1358 5550
-## [4] II 160 --- II 26124 | 1358 1510
-## [5] II 169 --- II 39052 | 1358 1613
-## ... ... ... ... ... ... . ... ...
-## [471360] II 808605 --- II 809683 | 6316 6320
-## [471361] II 808609 --- II 809917 | 6316 6324
-## [471362] II 808617 --- II 809506 | 6316 6319
-## [471363] II 809447 --- II 809685 | 6319 6321
-## [471364] II 809472 --- II 809675 | 6319 6320
-## distance
-## <integer>
-## [1] 48443
-## [2] 44890
-## [3] 687132
-## [4] 25964
-## [5] 38883
-## ... ...
-## [471360] 1078
-## [471361] 1308
-## [471362] 889
-## [471363] 238
-## [471364] 203
+## seqnames1 ranges1 seqnames2 ranges2 | frag1 frag2 distance
+## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <integer>
+## [1] II 105 --- II 48548 | 1358 1681 48443
+## [2] II 113 --- II 45003 | 1358 1658 44890
+## [3] II 119 --- II 687251 | 1358 5550 687132
+## [4] II 160 --- II 26124 | 1358 1510 25964
+## [5] II 169 --- II 39052 | 1358 1613 38883
+## ... ... ... ... ... ... . ... ... ...
+## [471360] II 808605 --- II 809683 | 6316 6320 1078
+## [471361] II 808609 --- II 809917 | 6316 6324 1308
+## [471362] II 808617 --- II 809506 | 6316 6319 889
+## [471363] II 809447 --- II 809685 | 6319 6321 238
+## [471364] II 809472 --- II 809675 | 6319 6320 203
+## -------
+## regions: 549331 ranges and 0 metadata columns
+## seqinfo: 1 sequence from an unspecified genome; no seqlengths
Note that these GInteractions are not binned, contrary to interactions extracted from a HiCExperiment. Anchors of the interactions listed in the GInteractions imported from a disk-stored .pairs file are all of width 1.
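To make the layout of a .pairs file concrete, here is a minimal base-R sketch that parses a few of the lines shown in the readLines() output above. It is only an illustration of the file format, not the way import() is implemented; the object names (pairs_lines, records) are hypothetical. For these records, the distance column reported by import() is consistent with pos2 - pos1.

```r
# A handful of lines copied from the readLines() output shown earlier
pairs_lines <- c(
  "## pairs format v1.0",
  "#sorted: chr1-pos1-chr2-pos2",
  "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2 frag1 frag2",
  "#chromsize: II 813184",
  "NS500150:527:HHGYNBGXF:3:21611:19085:3986\tII\t105\tII\t48548\t+\t-\t1358\t1681",
  "NS500150:527:HHGYNBGXF:4:13604:19734:2406\tII\t113\tII\t45003\t-\t+\t1358\t1658",
  "NS500150:527:HHGYNBGXF:2:11108:25178:11036\tII\t119\tII\t687251\t-\t+\t1358\t5550"
)

# Header lines start with '#'; the '#columns:' line names the fields
is_header <- startsWith(pairs_lines, "#")
col_line  <- grep("^#columns:", pairs_lines, value = TRUE)
col_names <- strsplit(sub("^#columns:\\s*", "", col_line), "\\s+")[[1]]

# Remaining lines are tab-separated records, one read pair per line
records <- read.delim(
  text = pairs_lines[!is_header],
  header = FALSE,
  col.names = col_names,
  colClasses = c("character", "character", "integer", "character",
                 "integer", "character", "character", "integer", "integer")
)

# Genomic distance only makes sense for pairs on the same chromosome
cis <- records$chr1 == records$chr2
records$distance <- ifelse(cis, records$pos2 - records$pos1, NA_integer_)

print(records[, c("chr1", "pos1", "chr2", "pos2", "frag1", "frag2", "distance")])
```

Each record describes a single read pair at base-pair positions, which is why the anchors of the imported GInteractions have a width of 1.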
-
-
2.5 Visual summary of the HiCExperiment data structure
The HiCExperiment data structure provided by the HiCExperiment package inherits methods from core GInteractions and BiocFile classes to provide a flexible representation of Hi-C data in R. It allows random access-based queries to seamlessly import parts or all the data contained in disk-stored Hi-C contact matrices in a variety of formats.
This is the landing page of the “Orchestrating Hi-C analysis with Bioconductor” book. The primary aim of this book is to introduce the R user to Hi-C analysis. This book starts with key concepts important for the analysis of chromatin conformation capture and then presents Bioconductor tools that can be leveraged to process, analyze, explore and visualize Hi-C data.
-
Authors: Jacques Serizay [aut, cre] Version: 1.1.0 Modified: 2023-04-14 Compiled: 2023-10-19 Environment: R version 4.3.1 (2023-06-16), Bioconductor 3.18 License: MIT + file LICENSE Copyright: J. Serizay
+
Authors: Jacques Serizay [aut, cre] Version: 1.1.0 Modified: 2023-04-14 Compiled: 2023-10-30 Environment: R version 4.3.1 (2023-06-16), Bioconductor 3.18 License: MIT + file LICENSE Copyright: J. Serizay
Table of contents
This book is divided into three parts:
Part I: Introduction to Hi-C analysis
@@ -442,7 +442,7 @@
Orchestrating Hi-C analysis with Bioconductor
Session info
sessioninfo::session_info()
-## ─ Session info ────────────────────────────────────────────────────────────
+## ─ Session info ────────────────────────────────────────────────────────────
+## setting value
+## version R version 4.3.1 (2023-06-16)
+## os Ubuntu 22.04.3 LTS
@@ -452,26 +452,26 @@
PairsFile connections can be imported directly into a GInteractions object with import():
+
If needed, PairsFile connections can be imported directly into a GInteractions object with import().
import(pf)
## GInteractions object with 471364 interactions and 3 metadata columns:
-## seqnames1 ranges1 seqnames2 ranges2 | frag1 frag2
-## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric>
-## [1] II 105 --- II 48548 | 1358 1681
-## [2] II 113 --- II 45003 | 1358 1658
-## [3] II 119 --- II 687251 | 1358 5550
-## [4] II 160 --- II 26124 | 1358 1510
-## [5] II 169 --- II 39052 | 1358 1613
-## ... ... ... ... ... ... . ... ...
-## [471360] II 808605 --- II 809683 | 6316 6320
-## [471361] II 808609 --- II 809917 | 6316 6324
-## [471362] II 808617 --- II 809506 | 6316 6319
-## [471363] II 809447 --- II 809685 | 6319 6321
-## [471364] II 809472 --- II 809675 | 6319 6320
-## distance
-## <integer>
-## [1] 48443
-## [2] 44890
-## [3] 687132
-## [4] 25964
-## [5] 38883
-## ... ...
-## [471360] 1078
-## [471361] 1308
-## [471362] 889
-## [471363] 238
-## [471364] 203
+## seqnames1 ranges1 seqnames2 ranges2 | frag1 frag2 distance
+## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <integer>
+## [1] II 105 --- II 48548 | 1358 1681 48443
+## [2] II 113 --- II 45003 | 1358 1658 44890
+## [3] II 119 --- II 687251 | 1358 5550 687132
+## [4] II 160 --- II 26124 | 1358 1510 25964
+## [5] II 169 --- II 39052 | 1358 1613 38883
+## ... ... ... ... ... ... . ... ... ...
+## [471360] II 808605 --- II 809683 | 6316 6320 1078
+## [471361] II 808609 --- II 809917 | 6316 6324 1308
+## [471362] II 808617 --- II 809506 | 6316 6319 889
+## [471363] II 809447 --- II 809685 | 6319 6321 238
+## [471364] II 809472 --- II 809675 | 6319 6320 203
+## -------
+## regions: 549331 ranges and 0 metadata columns
+## seqinfo: 1 sequence from an unspecified genome; no seqlengths
-
-
We can compute a P(s) per chromosome from this .pairs file using the distanceLaw function.
library(HiContacts)
ps <- distanceLaw(pf, by_chr = TRUE)
-## Importing pairs file /github/home/.cache/R/ExperimentHub/1a9a1c034d7_7753 in memory. This may take a while...
+## Importing pairs file /github/home/.cache/R/ExperimentHub/1a92835ced9_7753 in memory. This may take a while...ps## # A tibble: 115 × 6## chr binned_distance p norm_p norm_p_unity slope
@@ -465,28 +440,9 @@
## 6 II 23 0.0000870 0.0000290 62.1 1.53## # ℹ 109 more rows
-
-
-
-
-
-
-Note
-
-
-
-
Because this is a toy dataset, contacts are only provided for chromosome II.
eco1_ps<-distanceLaw(eco1_pf, by_chr =TRUE)
-## Importing pairs file /github/home/.cache/R/ExperimentHub/21fb251da216_7755 in memory. This may take a while...
+
+
eco1_ps<-distanceLaw(eco1_pf, by_chr =TRUE)
+## Importing pairs file /github/home/.cache/R/ExperimentHub/21f275852cbd_7755 in memory. This may take a while...eco1_ps## # A tibble: 115 × 6## chr binned_distance p norm_p norm_p_unity slope
@@ -527,8 +483,8 @@
## # ℹ 109 more rows
A little data wrangling makes it possible to plot the distance laws of two different samples in the same plot.
plotPsSlope(merged_ps, aes(x =binned_distance, y =slope, color =sample, linetype =chr))+
+
plotPsSlope(merged_ps, aes(x = binned_distance, y = slope, color = sample, linetype = chr)) +
  scale_color_manual(values = c('#c6c6c6', '#ca0000'))
## Warning: Removed 135 rows containing missing values (`geom_line()`).
-
+
@@ -555,44 +511,31 @@
6.1.3 P(s) from HiCExperiment objects
Alternatively, distance laws can be computed from binned matrices directly by providing HiCExperiment objects. For deeply sequenced datasets, this can be significantly faster than when using original .pairs files, but the smoothness of the resulting curves will be greatly impacted, notably at short distances.
ps_from_hic <- distanceLaw(hic, by_chr = TRUE)
## pairsFile not specified. The P(s) curve will be an approximation.
plotPs(ps_from_hic, aes(x = binned_distance, y = norm_p))
## Warning: Removed 9 rows containing missing values (`geom_line()`).
-
+
-
plotPsSlope(ps_from_hic, aes(x =binned_distance, y =slope))
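As a rough illustration of what a distance law summarises, the base-R sketch below bins a few contact distances (taken from the distance column of the imported .pairs file shown earlier) into log-spaced bins and derives a contact probability and a log-log slope. The bin edges, the normalisation by bin width and the slope formula are assumptions made for this example; they are not taken from the distanceLaw() implementation, and only the column names mirror its output.

```r
# Ten distances copied from the 'distance' column of the imported pairs
distances <- c(48443, 44890, 687132, 25964, 38883, 1078, 1308, 889, 238, 203)

# Assumed log-spaced bin edges: 100 bp, 1 kb, 10 kb, 100 kb, 1 Mb
breaks <- 10^(2:6)
bin <- cut(distances, breaks = breaks)
counts <- as.vector(table(bin))

ps_sketch <- data.frame(
  binned_distance = breaks[-length(breaks)],       # lower edge of each bin
  p = counts / sum(counts),                        # fraction of contacts per bin
  norm_p = counts / sum(counts) / diff(breaks)     # fraction per bp of bin width
)

# Slope of the curve in log-log space between consecutive bins
ps_sketch$slope <- c(
  NA,
  diff(log10(ps_sketch$norm_p)) / diff(log10(ps_sketch$binned_distance))
)

print(ps_sketch)
```

With only ten contacts the resulting curve is of course meaningless; the point is only to show how binned distances, probabilities and slopes relate to one another.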
+
The ratio between cis interactions and trans interactions is often used to assess the overall quality of a Hi-C dataset. It can be computed per chromosome using the cisTransRatio() function.
-
-
-
-
-
-
-Tip!
-
-
-
-
You will need to provide a genome-wide HiCExperiment to estimate cis/trans ratios!
The ratio between cis interactions and trans interactions is often used to assess the overall quality of a Hi-C dataset. It can be computed per chromosome using the cisTransRatio() function. You will need to provide a genome-wide HiCExperiment to estimate cis/trans ratios!
Cis/trans contact ratios will greatly vary depending on the cell cycle phase the sample is in! For instance, chromosomes during the mitosis phase of the cell cycle have very few trans contacts, due to their structural organization and individualization.
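To clarify what a per-chromosome cis/trans ratio measures, here is a minimal base-R sketch on a made-up contact table. The data, the object names and the output columns are all hypothetical, and this is not how cisTransRatio() is implemented: for each chromosome, a contact counts as cis when both anchors lie on it, and as trans when exactly one anchor does.

```r
# Hypothetical contact counts between pairs of chromosomes
contacts <- data.frame(
  chr1  = c("I", "I",  "I",   "II", "II"),
  chr2  = c("I", "II", "III", "II", "III"),
  count = c(80,  10,   10,    90,   5)
)

chroms <- sort(unique(c(contacts$chr1, contacts$chr2)))

ratios <- do.call(rbind, lapply(chroms, function(chr) {
  on1 <- contacts$chr1 == chr
  on2 <- contacts$chr2 == chr
  cis   <- sum(contacts$count[on1 & on2])      # both anchors on this chromosome
  trans <- sum(contacts$count[xor(on1, on2)])  # exactly one anchor on it
  data.frame(chr = chr, cis = cis, trans = trans,
             cis_pct = cis / (cis + trans))
}))

print(ratios)
```

This also shows why a genome-wide object is required: if contacts were only available for a single chromosome, every contact would be cis and the ratio would carry no information.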
-
-
6.3 Virtual 4C profiles
The interaction profile of a genomic locus of interest with its surrounding environment, or with the rest of the genome, is frequently generated. In some cases, this can help in identifying and/or comparing regulatory or structural interactions.
For instance, we can compute the genome-wide virtual 4C profile of interactions anchored at the centromere of chromosome II (located at ~238 kb).
Scalograms were introduced in Lioy et al. (2018) to investigate distance-dependent contact frequencies for individual genomic bins along chromosomes.
To generate a scalogram, one needs to provide a HiCExperiment object with a valid associated pairsFile.
pairsFile(hic) <- pairsf
scalo <- scalogram(hic)
-## Importing pairs file /github/home/.cache/R/ExperimentHub/1a9a1c034d7_7753 in memory. This may take a while...
+## Importing pairs file /github/home/.cache/R/ExperimentHub/1a92835ced9_7753 in memory. This may take a while...plotScalogram(scalo|>filter(chr=='II'), ylim =c(1e3, 1e5))
-
+
Several scalograms can be plotted together to compare distance-dependent contact frequencies along a given chromosome in different samples.
## loading from cache
pairsFile(eco1_hic) <- eco1_pairsf
eco1_scalo <- scalogram(eco1_hic)
-## Importing pairs file /github/home/.cache/R/ExperimentHub/21fb251da216_7755 in memory. This may take a while...
+## Importing pairs file /github/home/.cache/R/ExperimentHub/21f275852cbd_7755 in memory. This may take a while...merged_scalo<-rbind(scalo|>mutate(sample ='WT'), eco1_scalo|>mutate(sample ='eco1')
@@ -732,7 +650,7 @@
res## `HiCExperiment` object with 471,364 contacts over 802 regions ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000## active resolution: 1000
@@ -660,45 +649,19 @@
interactions(res)## GInteractions object with 74360 interactions and 9 metadata columns:
-## seqnames1 ranges1 seqnames2 ranges2 | bin_id1
-## <Rle> <IRanges> <Rle> <IRanges> | <numeric>
-## [1] II 1-1000 --- II 1001-2000 | 231
-## [2] II 1-1000 --- II 5001-6000 | 231
-## [3] II 1-1000 --- II 6001-7000 | 231
-## [4] II 1-1000 --- II 8001-9000 | 231
-## [5] II 1-1000 --- II 9001-10000 | 231
-## ... ... ... ... ... ... . ...
-## [74356] II 807001-808000 --- II 809001-810000 | 1038
-## [74357] II 807001-808000 --- II 810001-811000 | 1038
-## [74358] II 808001-809000 --- II 808001-809000 | 1039
-## [74359] II 808001-809000 --- II 809001-810000 | 1039
-## [74360] II 809001-810000 --- II 809001-810000 | 1040
-## bin_id2 count balanced probability predicted pvalue
-## <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
-## [1] 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03
-## [2] 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05
-## [3] 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03
-## [4] 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04
-## [5] 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06
-## ... ... ... ... ... ... ...
-## [74356] 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11
-## [74357] 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02
-## [74358] 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02
-## [74359] 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13
-## [74360] 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03
-## qvalue logFoldChange
-## <numeric> <numeric>
-## [1] 0.063385760 8.08079
-## [2] 0.001926954 7.23674
-## [3] 0.150288341 6.70775
-## [4] 0.009806734 5.97810
-## [5] 0.000173165 6.43158
-## ... ... ...
-## [74356] 1.07966e-09 5.45977
-## [74357] 3.38098e-01 5.39837
-## [74358] 5.49519e-01 4.60031
-## [74359] 5.77259e-12 7.18423
-## [74360] 2.79707e-02 5.15344
+## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange
+## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
+## [1] II 1-1000 --- II 1001-2000 | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079
+## [2] II 1-1000 --- II 5001-6000 | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674
+## [3] II 1-1000 --- II 6001-7000 | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775
+## [4] II 1-1000 --- II 8001-9000 | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810
+## [5] II 1-1000 --- II 9001-10000 | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158
+## ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ...
+## [74356] II 807001-808000 --- II 809001-810000 | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977
+## [74357] II 807001-808000 --- II 810001-811000 | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837
+## [74358] II 808001-809000 --- II 808001-809000 | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031
+## [74359] II 808001-809000 --- II 809001-810000 | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423
+## [74360] II 809001-810000 --- II 809001-810000 | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344
+## -------
+## regions: 802 ranges and 4 metadata columns
+## seqinfo: 16 sequences from an unspecified genome
@@ -707,7 +670,7 @@
References
Session info
-
## ─ Session info ────────────────────────────────────────────────────────────
+
## ─ Session info ────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## setting value
## version R version 4.3.1 (2023-06-16)
## os Ubuntu 22.04.3 LTS
@@ -717,23 +680,23 @@
hic## `HiCExperiment` object with 471,364 contacts over 407 regions ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000## active resolution: 2000
@@ -399,13 +399,13 @@
5.1.1 Balancing a raw interaction count map
Hi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc. As a result, coverage is generally not homogeneous throughout the genome, which leads to artifacts in un-normalized count matrices.
-
To correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function.
+
To correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element to the scores list (the interactions themselves are unmodified).
normalized_hic<-normalize(hic)normalized_hic## `HiCExperiment` object with 471,364 contacts over 407 regions ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000## active resolution: 2000
@@ -415,19 +415,6 @@
## pairsFile: N/A ## metadata(0):
-
-
-
-
-
-
-Note
-
-
-
-
The only change done to the HiCExperiment object by the normalize function is the addition of a single extra ICE in scores list. The interactions themselves are unmodified.
-
-
It is possible to plot the different scores of the resulting object to visualize the newly computed scores. In this example, ICE scores should be nearly identical to balanced scores, which were originally imported from the disk-stored contact matrix.
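To make the idea of matrix balancing more concrete, here is a minimal base R sketch of iterative correction on a toy matrix. It is an illustration only: the matrix values, the `ice_balance` function name and its `n_iter` and `tol` parameters are hypothetical, and the sketch does not claim to reproduce what the normalize function does internally (real implementations typically also filter low-coverage bins).

```r
# Toy symmetric raw count matrix (hypothetical values, not taken from the book's data)
raw <- matrix(c(
    50, 20, 10,  5,
    20, 80, 30, 10,
    10, 30, 60, 25,
     5, 10, 25, 40
), nrow = 4, byrow = TRUE)

# Hypothetical helper: repeatedly divide each cell by the relative coverage
# of its row bin and of its column bin until all bins have equal coverage
ice_balance <- function(m, n_iter = 200, tol = 1e-10) {
    bias <- rep(1, nrow(m))
    for (i in seq_len(n_iter)) {
        cov <- rowSums(m)
        cov <- cov / mean(cov)          # relative coverage of each bin
        m <- m / outer(cov, cov)        # correct rows and columns at once
        bias <- bias * cov              # accumulate the per-bin bias
        if (max(abs(cov - 1)) < tol) break
    }
    list(balanced = m, bias = bias)
}

res <- ice_balance(raw)

# After balancing, every bin has (nearly) the same total coverage
rowSums(res$balanced)
```

In this sketch, the raw matrix can be recovered as `res$balanced * outer(res$bias, res$bias)`, which mirrors the point made above: balancing adds a new set of scores without altering the underlying interactions.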
@@ -449,13 +436,13 @@
5.1.2 Computing observed/expected (O/E) map
The most prominent feature of a balanced Hi-C matrix is the strong main diagonal. This main diagonal is observed because interactions between immediately adjacent genomic loci are more likely to happen than interactions spanning longer genomic distances. This “expected” behavior is due to the polymer nature of the chromosomes being studied, and can be locally estimated using the distance-dependent interaction frequency (a.k.a. the “distance law”, or P(s)). It can be used to compute an expected matrix of interactions.
When it is desirable to “mask” this polymer behavior to emphasize topological structures formed by chromosomes, one can divide a given balanced matrix by its expected matrix, i.e. calculate the observed/expected (O/E) map. This is sometimes called “detrending”, as it effectively removes the average polymer behavior from the balanced matrix.
-
The detrend function performs this operation on a given HiCExperiment object.
+
The detrend function performs this operation on a given HiCExperiment object. It adds two extra elements to the scores list, expected and detrended (the interactions themselves are unmodified).
detrended_hic<-detrend(hic)detrended_hic## `HiCExperiment` object with 471,364 contacts over 407 regions ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000## active resolution: 2000
@@ -465,24 +452,6 @@
## pairsFile: N/A ## metadata(0):
-
-
-
-
-
-
-Note
-
-
-
-
The only change done to the HiCExperiment object by the detrend function is the addition of two extra scores:
-
-
expected
-
detrended
-
-
The interactions themselves are unmodified.
-
-
Topological features will be visually more prominent in the O/E detrended Hi-C map.
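The observed/expected computation described above can be sketched in base R on a toy balanced matrix. This is a simplified illustration under stated assumptions: the matrix values are hypothetical, the expected score at a given distance is taken as the plain mean of the corresponding diagonal, and the result is a simple ratio. The detrend function may estimate the expected scores differently or apply an additional transformation.

```r
# Toy balanced matrix (hypothetical values), one row/column per genomic bin
balanced <- matrix(c(
    1.00, 0.50, 0.20, 0.10,
    0.50, 1.00, 0.40, 0.20,
    0.20, 0.40, 1.00, 0.60,
    0.10, 0.20, 0.60, 1.00
), nrow = 4, byrow = TRUE)

# Genomic distance (in number of bins) between the two anchors of each cell
d <- abs(row(balanced) - col(balanced))

# "Distance law": average score observed at each genomic distance
expected_by_dist <- tapply(balanced, d, mean)

# Expected matrix: each cell gets the average score of its diagonal
expected <- matrix(expected_by_dist[as.character(d)], nrow = nrow(balanced))

# Observed/expected (O/E) ratio
detrended <- balanced / expected
detrended
```

Cells with a ratio above 1 interact more than expected at that distance, and cells below 1 interact less, regardless of how close they are to the main diagonal.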
@@ -528,7 +497,7 @@
autocorr_hic## `HiCExperiment` object with 471,364 contacts over 407 regions ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000## active resolution: 2000
@@ -600,7 +569,7 @@
hic2## `HiCExperiment` object with 168,785 contacts over 150 regions ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" ## focus: "II:400,000-700,000" ## resolutions(5): 1000 2000 4000 8000 16000## active resolution: 2000
diff --git a/matrix-centric_files/figure-html/unnamed-chunk-10-1.png b/matrix-centric_files/figure-html/unnamed-chunk-10-1.png
index 7039b43..e1c7347 100644
Binary files a/matrix-centric_files/figure-html/unnamed-chunk-10-1.png and b/matrix-centric_files/figure-html/unnamed-chunk-10-1.png differ
diff --git a/matrix-centric_files/figure-html/unnamed-chunk-13-1.png b/matrix-centric_files/figure-html/unnamed-chunk-13-1.png
index 8671b5c..101686a 100644
Binary files a/matrix-centric_files/figure-html/unnamed-chunk-13-1.png and b/matrix-centric_files/figure-html/unnamed-chunk-13-1.png differ
diff --git a/matrix-centric_files/figure-html/unnamed-chunk-16-1.png b/matrix-centric_files/figure-html/unnamed-chunk-16-1.png
index b4a86b1..2973d81 100644
Binary files a/matrix-centric_files/figure-html/unnamed-chunk-16-1.png and b/matrix-centric_files/figure-html/unnamed-chunk-16-1.png differ
diff --git a/parsing.html b/parsing.html
index 9a659c6..e4b307e 100644
--- a/parsing.html
+++ b/parsing.html
@@ -350,7 +350,7 @@
hic## `HiCExperiment` object with 10,801 contacts over 11 regions ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" ## focus: "II:10,000-50,000" ## resolutions(5): 1000 2000 4000 8000 16000## active resolution: 4000
@@ -366,32 +366,19 @@
interactions(hic)## GInteractions object with 45 interactions and 4 metadata columns:
-## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2
-## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric>
-## [1] II 12001-16000 --- II 12001-16000 | 61 61
-## [2] II 12001-16000 --- II 16001-20000 | 61 62
-## [3] II 12001-16000 --- II 20001-24000 | 61 63
-## [4] II 12001-16000 --- II 24001-28000 | 61 64
-## [5] II 12001-16000 --- II 28001-32000 | 61 65
-## ... ... ... ... ... ... . ... ...
-## [41] II 36001-40000 --- II 40001-44000 | 67 68
-## [42] II 36001-40000 --- II 44001-48000 | 67 69
-## [43] II 40001-44000 --- II 40001-44000 | 68 68
-## [44] II 40001-44000 --- II 44001-48000 | 68 69
-## [45] II 44001-48000 --- II 44001-48000 | 69 69
-## count balanced
-## <numeric> <numeric>
-## [1] 213 0.249303
-## [2] 673 0.449271
-## [3] 325 0.210001
-## [4] 137 0.125732
-## [5] 77 0.106917
-## ... ... ...
-## [41] 941 0.358860
-## [42] 275 0.114972
-## [43] 675 0.253868
-## [44] 497 0.204920
-## [45] 295 0.133344
+## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced
+## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric>
+## [1] II 12001-16000 --- II 12001-16000 | 61 61 213 0.249303
+## [2] II 12001-16000 --- II 16001-20000 | 61 62 673 0.449271
+## [3] II 12001-16000 --- II 20001-24000 | 61 63 325 0.210001
+## [4] II 12001-16000 --- II 24001-28000 | 61 64 137 0.125732
+## [5] II 12001-16000 --- II 28001-32000 | 61 65 77 0.106917
+## ... ... ... ... ... ... . ... ... ... ...
+## [41] II 36001-40000 --- II 40001-44000 | 67 68 941 0.358860
+## [42] II 36001-40000 --- II 44001-48000 | 67 69 275 0.114972
+## [43] II 40001-44000 --- II 40001-44000 | 68 68 675 0.253868
+## [44] II 40001-44000 --- II 44001-48000 | 68 69 497 0.204920
+## [45] II 44001-48000 --- II 44001-48000 | 69 69 295 0.133344## -------## regions: 11 ranges and 4 metadata columns## seqinfo: 16 sequences from an unspecified genome
refocus(hic, 'III')
## `HiCExperiment` object with 151,990 contacts over 159 regions
## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752"
## focus: "III"
## resolutions(5): 1000 2000 4000 8000 16000
## active resolution: 2000
@@ -647,83 +634,45 @@
telomere<-GRanges("II:700001-813184")subsetByOverlaps(hic, telomere)|>interactions()## GInteractions object with 1540 interactions and 4 metadata columns:
-## seqnames1 ranges1 seqnames2 ranges2 | bin_id1
-## <Rle> <IRanges> <Rle> <IRanges> | <numeric>
-## [1] II 700001-702000 --- II 700001-702000 | 466
-## [2] II 700001-702000 --- II 702001-704000 | 466
-## [3] II 700001-702000 --- II 704001-706000 | 466
-## [4] II 700001-702000 --- II 706001-708000 | 466
-## [5] II 700001-702000 --- II 708001-710000 | 466
-## ... ... ... ... ... ... . ...
-## [1536] II 804001-806000 --- II 810001-812000 | 518
-## [1537] II 806001-808000 --- II 806001-808000 | 519
-## [1538] II 806001-808000 --- II 808001-810000 | 519
-## [1539] II 806001-808000 --- II 810001-812000 | 519
-## [1540] II 808001-810000 --- II 808001-810000 | 520
-## bin_id2 count balanced
-## <numeric> <numeric> <numeric>
-## [1] 466 30 0.0283618
-## [2] 467 145 0.0709380
-## [3] 468 124 0.0704979
-## [4] 469 59 0.0510221
-## [5] 470 59 0.0384004
-## ... ... ... ...
-## [1536] 521 1 NaN
-## [1537] 519 15 0.0560633
-## [1538] 520 25 NaN
-## [1539] 521 1 NaN
-## [1540] 520 10 NaN
+## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced
+## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric>
+## [1] II 700001-702000 --- II 700001-702000 | 466 466 30 0.0283618
+## [2] II 700001-702000 --- II 702001-704000 | 466 467 145 0.0709380
+## [3] II 700001-702000 --- II 704001-706000 | 466 468 124 0.0704979
+## [4] II 700001-702000 --- II 706001-708000 | 466 469 59 0.0510221
+## [5] II 700001-702000 --- II 708001-710000 | 466 470 59 0.0384004
+## ... ... ... ... ... ... . ... ... ... ...
+## [1536] II 804001-806000 --- II 810001-812000 | 518 521 1 NaN
+## [1537] II 806001-808000 --- II 806001-808000 | 519 519 15 0.0560633
+## [1538] II 806001-808000 --- II 808001-810000 | 519 520 25 NaN
+## [1539] II 806001-808000 --- II 810001-812000 | 519 521 1 NaN
+## [1540] II 808001-810000 --- II 808001-810000 | 520 520 10 NaN## -------## regions: 57 ranges and 4 metadata columns## seqinfo: 16 sequences from an unspecified genome
-
-
-
-
-
-
-type argument
-
-
-
By default, subsetByOverlaps(hic, telomere) will only recover interactions constrained within telomere, i.e. interactions for which both ends are in telomere.
Alternatively, type = "any" can be specified to get all interactions with at least one of their anchors within telomere.
subsetByOverlaps(hic, telomere, type ="any")|>interactions()## GInteractions object with 6041 interactions and 4 metadata columns:
-## seqnames1 ranges1 seqnames2 ranges2 | bin_id1
-## <Rle> <IRanges> <Rle> <IRanges> | <numeric>
-## [1] II 300001-302000 --- II 702001-704000 | 266
-## [2] II 300001-302000 --- II 704001-706000 | 266
-## [3] II 300001-302000 --- II 768001-770000 | 266
-## [4] II 300001-302000 --- II 784001-786000 | 266
-## [5] II 302001-304000 --- II 740001-742000 | 267
-## ... ... ... ... ... ... . ...
-## [6037] II 804001-806000 --- II 810001-812000 | 518
-## [6038] II 806001-808000 --- II 806001-808000 | 519
-## [6039] II 806001-808000 --- II 808001-810000 | 519
-## [6040] II 806001-808000 --- II 810001-812000 | 519
-## [6041] II 808001-810000 --- II 808001-810000 | 520
-## bin_id2 count balanced
-## <numeric> <numeric> <numeric>
-## [1] 467 1 0.000590999
-## [2] 468 1 0.000686799
-## [3] 500 1 0.000728215
-## [4] 508 1 0.000923092
-## [5] 486 1 0.000382222
-## ... ... ... ...
-## [6037] 521 1 NaN
-## [6038] 519 15 0.0560633
-## [6039] 520 25 NaN
-## [6040] 521 1 NaN
-## [6041] 520 10 NaN
+## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced
+## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric>
+## [1] II 300001-302000 --- II 702001-704000 | 266 467 1 0.000590999
+## [2] II 300001-302000 --- II 704001-706000 | 266 468 1 0.000686799
+## [3] II 300001-302000 --- II 768001-770000 | 266 500 1 0.000728215
+## [4] II 300001-302000 --- II 784001-786000 | 266 508 1 0.000923092
+## [5] II 302001-304000 --- II 740001-742000 | 267 486 1 0.000382222
+## ... ... ... ... ... ... . ... ... ... ...
+## [6037] II 804001-806000 --- II 810001-812000 | 518 521 1 NaN
+## [6038] II 806001-808000 --- II 806001-808000 | 519 519 15 0.0560633
+## [6039] II 806001-808000 --- II 808001-810000 | 519 520 25 NaN
+## [6040] II 806001-808000 --- II 810001-812000 | 519 521 1 NaN
+## [6041] II 808001-810000 --- II 808001-810000 | 520 520 10 NaN## -------## regions: 257 ranges and 4 metadata columns## seqinfo: 16 sequences from an unspecified genome
-
-
3.1.2.2<HiCExperiment>["..."]
@@ -739,7 +688,7 @@
# c("II", "III", "IV") --> import contacts within and between several chromosomes
-
+
@@ -748,7 +697,7 @@
-
+
Subsetting to a specific on-diagonal genomic location using standard UCSC coordinates query:
@@ -757,7 +706,7 @@
hic["II:800001-813184"]## `HiCExperiment` object with 1,040 contacts over 6 regions ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" ## focus: "II:800,001-813,184" ## resolutions(5): 1000 2000 4000 8000 16000## active resolution: 2000
@@ -774,7 +723,7 @@
hic["II:300001-320000|II:800001-813184"]## `HiCExperiment` object with 3 contacts over 6 regions ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" ## focus: "II:300001-320000|II:800001-813184" ## resolutions(5): 1000 2000 4000 8000 16000## active resolution: 2000
@@ -791,7 +740,7 @@
hic["II"]## `HiCExperiment` object with 306,212 contacts over 257 regions ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000## active resolution: 2000
@@ -808,7 +757,7 @@
hic["II|IV"]## `HiCExperiment` object with 0 contacts over 0 regions ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" ## focus: "II:1-813184|IV:1-1531933" ## resolutions(5): 1000 2000 4000 8000 16000## active resolution: 2000
@@ -825,7 +774,7 @@
hic["II:300001-320000|IV:1-100000"]## `HiCExperiment` object with 0 contacts over 0 regions ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" ## focus: "II:300001-320000|IV:1-100000" ## resolutions(5): 1000 2000 4000 8000 16000## active resolution: 2000
@@ -842,7 +791,7 @@
hic[c('II', 'III', 'IV')]## `HiCExperiment` object with 306,212 contacts over 257 regions ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" ## focus: "II, III, IV" ## resolutions(5): 1000 2000 4000 8000 16000## active resolution: 2000
@@ -852,19 +801,7 @@
## pairsFile: N/A ## metadata(0):
-
-
-
-
-
-
-
-
-
-Note
-
-
-
+
Some notes:
This last example (subsetting for a vector of several chromosomes) is the only scenario in which [-based in-memory subsetting of pre-imported data is required, as such subsetting is not possible with focus from disk-stored data.
All the other [ subsetting scenarios illustrated above can be achieved more efficiently using the focus argument when importing data into a HiCExperiment object.
@@ -872,6 +809,7 @@
+
3.1.3 Zooming on a HiCExperiment
@@ -880,7 +818,7 @@
hic## `HiCExperiment` object with 306,212 contacts over 257 regions ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" ## focus: "II:300,001-813,184" ## resolutions(5): 1000 2000 4000 8000 16000## active resolution: 2000
@@ -893,7 +831,7 @@
zoom(hic, 4000)## `HiCExperiment` object with 306,212 contacts over 129 regions ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" ## focus: "II:300,001-813,184" ## resolutions(5): 1000 2000 4000 8000 16000## active resolution: 4000
@@ -906,7 +844,7 @@
zoom(hic, 1000)## `HiCExperiment` object with 306,212 contacts over 514 regions ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" ## focus: "II:300,001-813,184" ## resolutions(5): 1000 2000 4000 8000 16000## active resolution: 1000
@@ -1076,7 +1014,7 @@
hic## `HiCExperiment` object with 306,212 contacts over 257 regions ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" ## focus: "II:300,001-813,184" ## resolutions(5): 1000 2000 4000 8000 16000## active resolution: 2000
@@ -1086,16 +1024,6 @@
## pairsFile: N/A ## metadata(0):
-
-
-
-
-
-
-Note
-
-
-
All these objects can be used in *Overlap methods, as they all extend the GRanges class of objects.
# ---- This counts the number of times `CTCF` anchors are being used in the
@@ -1106,8 +1034,6 @@
Capture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts.
1.1.3 Sequencing
-
Hi-C libraries are traditionally sequenced with short-read technology, and are by essence paired-end libraries. For this reason, the end result of the experimental side of the Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C.
-
-
-
-
-
-
-What is a fastq file?
-
-
-
+
Hi-C libraries are traditionally sequenced with short-read technology, and are by essence paired-end libraries. For this reason, the end result of the experimental side of the Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C.
Fastq files are plain text files (usually compressed, with the .gz extension). They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz.
Here is the first read listed in sample_R1.fq.gz file:
@@ -371,30 +361,18 @@
+
@@@FFFFFFHHHHIJJIJJHIIEH
-
These two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality following a nebulous cypher.
-
-
+
These two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality following a nebulous cypher.
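The per-base quality line can be decoded with a few lines of base R. The sketch below assumes the common Phred+33 encoding (each character's ASCII code minus 33 gives the quality score Q, and the error probability is 10^(-Q/10)); this encoding is an assumption, as the file shown here does not state which one was used. The `qual` string is copied from the quality line shown above.

```r
# Quality string copied from the read shown above
qual <- "@@@FFFFFFHHHHIJJIJJHIIEH"

# Assumption: Phred+33 encoding (ASCII code minus 33 gives the quality score)
phred <- utf8ToInt(qual) - 33L

# Probability that each base call is wrong
error_prob <- 10^(-phred / 10)

# One row per sequenced base
data.frame(
    symbol = strsplit(qual, "")[[1]],
    phred = phred,
    error_prob = signif(error_prob, 2)
)
```

Under this assumption, the leading `@` characters correspond to a quality score of 31, i.e. less than one chance in a thousand of a wrong base call.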
1.2 Hi-C file formats
Two important output files are typically generated during Hi-C data pre-processing:
-
A “pairs” file
-
A binned “contact matrix” file
+
A “pairs” file;
+
A binned “contact matrix” file
We will now describe the structure of these different types of files. Directly jump to the next chapter if you want to know more about importing data from a contact matrix or a pairs file in R.
1.2.1 Pairs files
A “pairs” file (optionally, but generally filtered and sorted) is the direct output of processing Hi-C fastq files. It stores information about putative proximity contacts identified by digestion/religation, in a lossless, human-readable, indexable format: the .pairs format.
-
-
-
-
-
-
-What is a .pairs file?
-
-
-
A .pairs file is organized in a header followed by a body:
More information about the conventions related to this text file is provided by the 4DN consortium, which originally formalized the specifications of this file format.
The *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for.
diff --git a/search.json b/search.json
index 62899b1..4b4ea8c 100644
--- a/search.json
+++ b/search.json
@@ -4,7 +4,7 @@
"href": "index.html",
"title": "Orchestrating Hi-C analysis with Bioconductor",
"section": "",
- "text": "Welcome\nThis is the landing page of the “Orchestrating Hi-C analysis with Bioconductor” book. The primary aim of this book is to introduce the R user to Hi-C analysis. This book starts with key concepts important for the analysis of chromatin conformation capture and then presents Bioconductor tools that can be leveraged to process, analyze, explore and visualize Hi-C data.\nAuthors: Jacques Serizay [aut, cre]Version: 1.1.0Modified: 2023-04-14Compiled: 2023-10-19Environment: R version 4.3.1 (2023-06-16), Bioconductor 3.18License: MIT + file LICENSECopyright: J. Serizay\nThis book is divided in three parts:\nPart I: Introduction to Hi-C analysis\nPart II: In-depth Hi-C analysis\nPart III: Hi-C analysis workflows"
+ "text": "Welcome\nThis is the landing page of the “Orchestrating Hi-C analysis with Bioconductor” book. The primary aim of this book is to introduce the R user to Hi-C analysis. This book starts with key concepts important for the analysis of chromatin conformation capture and then presents Bioconductor tools that can be leveraged to process, analyze, explore and visualize Hi-C data.\nAuthors: Jacques Serizay [aut, cre]Version: 1.1.0Modified: 2023-04-14Compiled: 2023-10-30Environment: R version 4.3.1 (2023-06-16), Bioconductor 3.18License: MIT + file LICENSECopyright: J. Serizay\nThis book is divided in three parts:\nPart I: Introduction to Hi-C analysis\nPart II: In-depth Hi-C analysis\nPart III: Hi-C analysis workflows"
},
{
"objectID": "index.html#general-audience",
@@ -39,7 +39,7 @@
"href": "index.html#session-info",
"title": "Orchestrating Hi-C analysis with Bioconductor",
"section": "Session info",
- "text": "Session info\n\nsessioninfo::session_info()\n## ─ Session info ────────────────────────────────────────────────────────────\n## setting value\n## version R version 4.3.1 (2023-06-16)\n## os Ubuntu 22.04.3 LTS\n## system x86_64, linux-gnu\n## ui X11\n## language (EN)\n## collate en_US.UTF-8\n## ctype en_US.UTF-8\n## tz Etc/UTC\n## date 2023-10-19\n## pandoc 3.1.1 @ /usr/local/bin/ (via rmarkdown)\n## \n## ─ Packages ────────────────────────────────────────────────────────────────\n## package * version date (UTC) lib source\n## abind 1.4-5 2016-07-21 [1] CRAN (R 4.3.1)\n## AnnotationDbi 1.63.2 2023-07-02 [1] Bioconductor\n## AnnotationHub * 3.9.2 2023-08-24 [1] Bioconductor\n## basilisk 1.13.4 2023-10-04 [1] Bioconductor\n## basilisk.utils 1.13.3 2023-09-04 [1] Bioconductor\n## beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.3.1)\n## Biobase 2.61.0 2023-04-25 [1] Bioconductor\n## BiocFileCache * 2.9.1 2023-07-12 [1] Bioconductor\n## BiocGenerics * 0.47.0 2023-04-25 [1] Bioconductor\n## BiocIO 1.11.0 2023-04-25 [1] Bioconductor\n## BiocManager 1.30.22 2023-08-08 [1] CRAN (R 4.3.1)\n## BiocParallel 1.35.4 2023-08-17 [1] Bioconductor\n## BiocStyle 2.29.2 2023-09-14 [1] Bioconductor\n## BiocVersion 3.18.0 2023-04-25 [1] Bioconductor\n## Biostrings 2.69.2 2023-07-02 [1] Bioconductor\n## bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.1)\n## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1)\n## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1)\n## blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1)\n## bookdown 0.36 2023-10-16 [1] CRAN (R 4.3.1)\n## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1)\n## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1)\n## CodeDepends 0.6.5 2018-07-17 [1] CRAN (R 4.3.1)\n## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1)\n## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1)\n## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1)\n## curl 5.1.0 2023-10-02 [1] CRAN (R 4.3.1)\n## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1)\n## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1)\n## dbplyr * 2.3.4 2023-09-26 [1] 
CRAN (R 4.3.1)\n## DelayedArray 0.27.10 2023-07-28 [1] Bioconductor\n## digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1)\n## dir.expiry 1.9.0 2023-04-25 [1] Bioconductor\n## DNAZooData * 1.1.0 2023-04-27 [1] Bioconductor\n## dplyr 1.1.3 2023-09-03 [1] CRAN (R 4.3.1)\n## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.1)\n## evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1)\n## ExperimentHub * 2.9.1 2023-07-12 [1] Bioconductor\n## fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1)\n## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1)\n## filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1)\n## fourDNData * 1.1.0 2023-04-27 [1] Bioconductor\n## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1)\n## GenomeInfoDb 1.37.6 2023-10-02 [1] Bioconductor\n## GenomeInfoDbData 1.2.11 2023-10-19 [1] Bioconductor\n## GenomicRanges 1.53.2 2023-10-08 [1] Bioconductor\n## ggbeeswarm 0.7.2 2023-04-29 [1] CRAN (R 4.3.1)\n## ggplot2 3.4.4 2023-10-12 [1] CRAN (R 4.3.1)\n## ggrastr 1.0.2 2023-06-01 [1] CRAN (R 4.3.1)\n## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1)\n## graph 1.79.4 2023-10-09 [1] Bioconductor\n## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1)\n## HiCExperiment * 1.1.2 2023-09-04 [1] Bioconductor\n## HiContacts * 1.3.2 2023-09-04 [1] Bioconductor\n## HiContactsData * 1.3.0 2023-04-27 [1] Bioconductor\n## HiCool * 1.1.0 2023-05-19 [1] Bioconductor\n## hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1)\n## htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1)\n## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1)\n## httpuv 1.6.11 2023-05-11 [1] CRAN (R 4.3.1)\n## httr 1.4.7 2023-08-15 [1] CRAN (R 4.3.1)\n## InteractionSet 1.29.1 2023-06-14 [1] Bioconductor\n## interactiveDisplayBase 1.39.0 2023-04-25 [1] Bioconductor\n## IRanges 2.35.3 2023-10-12 [1] Bioconductor\n## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1)\n## KEGGREST 1.41.4 2023-09-25 [1] Bioconductor\n## knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1)\n## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1)\n## lattice 0.21-9 2023-10-01 [1] CRAN (R 4.3.1)\n## lazyeval 0.2.2 2019-03-15 [1] CRAN (R 
4.3.1)\n## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.1)\n## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.1)\n## Matrix 1.6-1.1 2023-09-18 [1] CRAN (R 4.3.1)\n## MatrixGenerics 1.13.1 2023-07-25 [1] Bioconductor\n## matrixStats 1.0.0 2023-06-02 [1] CRAN (R 4.3.1)\n## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1)\n## mime 0.12 2021-09-28 [1] CRAN (R 4.3.1)\n## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.1)\n## OHCA * 1.1.0 2023-10-19 [1] local\n## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1)\n## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1)\n## plotly 4.10.2 2023-06-03 [1] CRAN (R 4.3.1)\n## png 0.1-8 2022-11-29 [1] CRAN (R 4.3.1)\n## promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1)\n## purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.1)\n## R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.1)\n## rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.3.1)\n## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1)\n## RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1)\n## readr 2.1.4 2023-02-10 [1] CRAN (R 4.3.1)\n## rebook 1.11.1 2023-05-25 [1] Bioconductor\n## reticulate 1.34.0 2023-10-12 [1] CRAN (R 4.3.1)\n## rhdf5 2.45.1 2023-07-10 [1] Bioconductor\n## rhdf5filters 1.13.5 2023-07-19 [1] Bioconductor\n## Rhdf5lib 1.23.2 2023-09-10 [1] Bioconductor\n## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.1)\n## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1)\n## rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1)\n## rmdformats 1.0.4 2022-05-17 [1] CRAN (R 4.3.1)\n## RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1)\n## RSQLite 2.3.1 2023-04-03 [1] CRAN (R 4.3.1)\n## rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1)\n## S4Arrays 1.1.6 2023-08-30 [1] Bioconductor\n## S4Vectors 0.39.3 2023-10-11 [1] Bioconductor\n## scales 1.2.1 2022-08-20 [1] CRAN (R 4.3.1)\n## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.1)\n## shiny 1.7.5.1 2023-10-14 [1] CRAN (R 4.3.1)\n## SparseArray 1.1.12 2023-08-31 [1] Bioconductor\n## strawr 0.0.91 2023-03-29 [1] CRAN (R 4.3.1)\n## stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.1)\n## stringr 1.5.0 2022-12-02 [1] CRAN (R 4.3.1)\n## 
SummarizedExperiment 1.31.1 2023-05-01 [1] Bioconductor\n## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.1)\n## tidyr 1.3.0 2023-01-24 [1] CRAN (R 4.3.1)\n## tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.1)\n## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1)\n## utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.1)\n## vctrs 0.6.4 2023-10-12 [1] CRAN (R 4.3.1)\n## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.3.1)\n## viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.3.1)\n## vroom 1.6.4 2023-10-02 [1] CRAN (R 4.3.1)\n## xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1)\n## XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1)\n## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1)\n## XVector 0.41.1 2023-05-03 [1] Bioconductor\n## yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.1)\n## zlibbioc 1.47.0 2023-04-25 [1] Bioconductor\n## \n## [1] /usr/local/lib/R/site-library\n## [2] /usr/local/lib/R/library\n## \n## ───────────────────────────────────────────────────────────────────────────"
+ "text": "Session info\n\nsessioninfo::session_info()\n## ─ Session info ───────────────────────────────────────────────────────────\n## setting value\n## version R version 4.3.1 (2023-06-16)\n## os Ubuntu 22.04.3 LTS\n## system x86_64, linux-gnu\n## ui X11\n## language (EN)\n## collate en_US.UTF-8\n## ctype en_US.UTF-8\n## tz Etc/UTC\n## date 2023-10-30\n## pandoc 3.1.1 @ /usr/local/bin/ (via rmarkdown)\n## \n## ─ Packages ─
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────\n## package * version date (UTC) lib source\n## abind 1.4-5 2016-07-21 [1] CRAN (R 4.3.1)\n## AnnotationDbi 1.64.0 2023-10-24 [1] Bioconductor\n## AnnotationHub * 3.10.0 2023-10-24 [1] Bioconductor\n## basilisk 1.14.0 2023-10-24 [1] Bioconductor\n## basilisk.utils 1.14.0 2023-10-24 [1] Bioconductor\n## beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.3.1)\n## Biobase 2.62.0 2023-10-24 [1] Bioconductor\n## BiocFileCache * 2.10.1 2023-10-26 [1] Bioconductor\n## BiocGenerics * 0.48.0 2023-10-24 [1] Bioconductor\n## BiocIO 1.12.0 2023-10-24 [1] Bioconductor\n## BiocManager 1.30.22 2023-08-08 [1] CRAN (R 4.3.1)\n## BiocParallel 1.36.0 2023-10-24 [1] Bioconductor\n## BiocStyle 2.30.0 2023-10-24 [1] Bioconductor\n## BiocVersion 3.18.0 2023-04-25 [1] Bioconductor\n## Biostrings 2.70.1 2023-10-25 [1] Bioconductor\n## bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.1)\n## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1)\n## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1)\n## blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1)\n## bookdown 0.36 2023-10-16 [1] CRAN (R 4.3.1)\n## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1)\n## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1)\n## CodeDepends 0.6.5 2018-07-17 [1] CRAN (R 4.3.1)\n## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1)\n## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1)\n## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1)\n## curl 5.1.0 2023-10-02 [1] CRAN (R 4.3.1)\n## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1)\n## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1)\n## dbplyr * 2.4.0 2023-10-26 [1] CRAN (R 4.3.1)\n## DelayedArray 0.28.0 2023-10-24 [1] Bioconductor\n## digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1)\n## dir.expiry 1.10.0 2023-10-24 [1] 
Bioconductor\n## DNAZooData * 1.2.0 2023-10-26 [1] Bioconductor\n## dplyr 1.1.3 2023-09-03 [1] CRAN (R 4.3.1)\n## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.1)\n## evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1)\n## ExperimentHub * 2.10.0 2023-10-24 [1] Bioconductor\n## fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1)\n## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1)\n## filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1)\n## fourDNData * 1.2.0 2023-10-26 [1] Bioconductor\n## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1)\n## GenomeInfoDb 1.38.0 2023-10-24 [1] Bioconductor\n## GenomeInfoDbData 1.2.11 2023-10-30 [1] Bioconductor\n## GenomicRanges 1.54.0 2023-10-24 [1] Bioconductor\n## ggbeeswarm 0.7.2 2023-04-29 [1] CRAN (R 4.3.1)\n## ggplot2 3.4.4 2023-10-12 [1] CRAN (R 4.3.1)\n## ggrastr 1.0.2 2023-06-01 [1] CRAN (R 4.3.1)\n## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1)\n## graph 1.80.0 2023-10-24 [1] Bioconductor\n## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1)\n## HiCExperiment * 1.2.0 2023-10-24 [1] Bioconductor\n## HiContacts * 1.4.0 2023-10-24 [1] Bioconductor\n## HiContactsData * 1.4.0 2023-10-26 [1] Bioconductor\n## HiCool * 1.2.0 2023-10-24 [1] Bioconductor\n## hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1)\n## htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1)\n## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1)\n## httpuv 1.6.12 2023-10-23 [1] CRAN (R 4.3.1)\n## httr 1.4.7 2023-08-15 [1] CRAN (R 4.3.1)\n## InteractionSet 1.30.0 2023-10-24 [1] Bioconductor\n## interactiveDisplayBase 1.40.0 2023-10-24 [1] Bioconductor\n## IRanges 2.36.0 2023-10-24 [1] Bioconductor\n## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1)\n## KEGGREST 1.42.0 2023-10-24 [1] Bioconductor\n## knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1)\n## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1)\n## lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.1)\n## lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.3.1)\n## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.1)\n## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.1)\n## Matrix 1.6-1.1 2023-09-18 [1] CRAN (R 4.3.1)\n## 
MatrixGenerics 1.14.0 2023-10-24 [1] Bioconductor\n## matrixStats 1.0.0 2023-06-02 [1] CRAN (R 4.3.1)\n## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1)\n## mime 0.12 2021-09-28 [1] CRAN (R 4.3.1)\n## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.1)\n## OHCA * 1.1.0 2023-10-30 [1] local\n## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1)\n## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1)\n## plotly 4.10.3 2023-10-21 [1] CRAN (R 4.3.1)\n## png 0.1-8 2022-11-29 [1] CRAN (R 4.3.1)\n## promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1)\n## purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.1)\n## R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.1)\n## rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.3.1)\n## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1)\n## RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1)\n## readr 2.1.4 2023-02-10 [1] CRAN (R 4.3.1)\n## rebook 1.12.0 2023-10-24 [1] Bioconductor\n## reticulate 1.34.0 2023-10-12 [1] CRAN (R 4.3.1)\n## rhdf5 2.46.0 2023-10-24 [1] Bioconductor\n## rhdf5filters 1.14.0 2023-10-24 [1] Bioconductor\n## Rhdf5lib 1.24.0 2023-10-24 [1] Bioconductor\n## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.1)\n## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1)\n## rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1)\n## rmdformats 1.0.4 2022-05-17 [1] CRAN (R 4.3.1)\n## RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1)\n## RSQLite 2.3.2 2023-10-28 [1] CRAN (R 4.3.1)\n## rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1)\n## S4Arrays 1.2.0 2023-10-24 [1] Bioconductor\n## S4Vectors 0.40.1 2023-10-26 [1] Bioconductor\n## scales 1.2.1 2022-08-20 [1] CRAN (R 4.3.1)\n## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.1)\n## shiny 1.7.5.1 2023-10-14 [1] CRAN (R 4.3.1)\n## SparseArray 1.2.0 2023-10-24 [1] Bioconductor\n## strawr 0.0.91 2023-03-29 [1] CRAN (R 4.3.1)\n## stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.1)\n## stringr 1.5.0 2022-12-02 [1] CRAN (R 4.3.1)\n## SummarizedExperiment 1.32.0 2023-10-24 [1] Bioconductor\n## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.1)\n## tidyr 1.3.0 2023-01-24 [1] CRAN (R 4.3.1)\n## tidyselect 1.2.0 
2022-10-10 [1] CRAN (R 4.3.1)\n## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1)\n## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1)\n## vctrs 0.6.4 2023-10-12 [1] CRAN (R 4.3.1)\n## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.3.1)\n## viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.3.1)\n## vroom 1.6.4 2023-10-02 [1] CRAN (R 4.3.1)\n## xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1)\n## XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1)\n## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1)\n## XVector 0.42.0 2023-10-24 [1] Bioconductor\n## yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.1)\n## zlibbioc 1.48.0 2023-10-24 [1] Bioconductor\n## \n## [1] /usr/local/lib/R/site-library\n## [2] /usr/local/lib/R/library\n## \n## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────"
},
{
"objectID": "preamble.html",
@@ -60,21 +60,21 @@
"href": "principles.html#experimental-considerations",
"title": "\n1 Hi-C pre-processing steps\n",
"section": "\n1.1 Experimental considerations",
- "text": "1.1 Experimental considerations\n\n1.1.1 Experimental approach\nThe Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)).\nIn Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library.\n\n\n1.1.2 C variants\nA number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below).\n\nCapture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts.\n\n1.1.3 Sequencing\nHi-C libraries are traditionally sequenced with short-read technology, and are by essence paired-end libraries. For this reason, the end result of the experimental side of the Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C.\n\n\n\n\n\n\nWhat is a fastq file?\n\n\n\nFastq files are plain text files (usually compressed, with the .gz extension). 
They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz.\nHere is the first read listed in sample_R1.fq.gz file:\n\n\nsample_R1.fq.gz\n\n@SRR5399542.1.1 DH1DQQN1:393:H9GEWADXX:1:1101:1187:2211 length=24\nCAACTTCAATACCAGCAGCAGCAA\n+\nCCCFFFFFHHHHHJJJJJIJJJJJ\n\nAnd here is the first read listed in sample_R2.fq.gz file:\n\n\nsample_R2.fq.gz\n\n@SRR5399542.1.1 DH1DQQN1:393:H9GEWADXX:1:1101:1187:2211 length=24\nGCTGTTGTTGTTGTTGTATTTGCA\n+\n@@@FFFFFFHHHHIJJIJJHIIEH\n\nThese two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality following a nebulous cypher."
+ "text": "1.1 Experimental considerations\n\n1.1.1 Experimental approach\nThe Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)).\nIn Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library.\n\n\n1.1.2 C variants\nA number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below).\n\nCapture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts.\n\n1.1.3 Sequencing\nHi-C libraries are traditionally sequenced with short-read technology, and are by essence paired-end libraries. For this reason, the end result of the experimental side of the Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C.\nFastq files are plain text files (usually compressed, with the .gz extension). 
They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz.\nHere is the first read listed in the sample_R1.fq.gz file:\n\n\nsample_R1.fq.gz\n\n@SRR5399542.1.1 DH1DQQN1:393:H9GEWADXX:1:1101:1187:2211 length=24\nCAACTTCAATACCAGCAGCAGCAA\n+\nCCCFFFFFHHHHHJJJJJIJJJJJ\n\nAnd here is the first read listed in the sample_R2.fq.gz file:\n\n\nsample_R2.fq.gz\n\n@SRR5399542.1.1 DH1DQQN1:393:H9GEWADXX:1:1101:1187:2211 length=24\nGCTGTTGTTGTTGTTGTATTTGCA\n+\n@@@FFFFFFHHHHIJJIJJHIIEH\n\nThese two reads are the first listed in their respective files. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line encodes the per-base sequencing quality, with one ASCII character per base (Phred quality scores)."
},
{
"objectID": "principles.html#hi-c-file-formats",
"href": "principles.html#hi-c-file-formats",
"title": "\n1 Hi-C pre-processing steps\n",
"section": "\n1.2 Hi-C file formats",
- "text": "1.2 Hi-C file formats\nTwo important output files are typically generated during Hi-C data pre-processing:\n\nA “pairs” file\nA binned “contact matrix” file\n\nWe will now describe the structure of these different types of files. Directly jump to the next chapter if you want to know more about importing data from a contact matrix or a pairs file in R.\n\n1.2.1 Pairs files\nA “pairs” file (optionally, but generally filtered and sorted) is the direct output of processing Hi-C fastq files. It stores information about putative proximity contacts identified by digestion/religation, in the lossless, human-readable, indexable format: the .pairs format.\n\n\n\n\n\n\nWhat is a .pairs file?\n\n\n\nA .pairs file is organized in a header followed by a body:\n\n\nheader: starts with #\n\nRequired entries\n\nFirst line: ## pairs format v1.0\n\n\n#columns: column contents and ordering (e.g. #columns: readID chr1 pos1 chr2 pos2 strand1 strand2 <column_name> <column_name> ...)\n\n#chromsize: chromosome names and their size in bp, one chromosome per line, in the same order that defines ordering between mates (e.g. #chromsize: chr1 230218). Chromosome order is actually defined by this header, not by the order of pairs listed in the body!\n\n\nOptional entries with reserved header keys (sorted, shape, command, genome_assembly)\n\n\n#sorted: to indicate the sorting mechanism (e.g. #sorted: chr1-chr2-pos1-pos2, #sorted: chr1-pos1, #sorted: none)\n\n#shape: to specify whether the matrix is stored as upper triangle or lower triangle (#shape: upper triangle, #shape: lower triangle)\n\n#command: to specify any command, e.g. the command used to generate the pairs file (#command: bam2pairs mysample.bam mysample)\n\n#genome_assembly: to specify the genome assembly (e.g. 
#genome_assembly: hg38)\n\n\n\n\n\nbody: tab-separated columns\n\n7 reserved (4 of them required) columns:\n\nreadID, chr1, pos1, chr2, pos2, strand1, strand2\nColumns 2-5 (chr1, pos1, chr2, pos2) are required and cannot have missing values\nFor column 1, 6 & 7: missing values are annotated with a single-character dummy (.)\n\n\n2 extra reserved, optional column names:\n\n\nfrag1, frag2: restriction enzyme fragment index used by Juicer\n\n\n\nAny number of optional columns can be added\n\n\n\n\n\nsample.pairs\n\n## pairs format v1.0\n#sorted: chr1-chr2-pos1-pos2\n#shape: upper triangle\n#genome_assembly: hg38\n#chromsize: chr1 249250621\n#chromsize: chr2 243199373\n#chromsize: chr3 198022430\n...\n#columns: readID chr1 pos1 chr2 pos2 strand1 strand2\nEAS139:136:FC706VJ:2:2104:23462:197393 chr1 10000 chr1 20000 + +\nEAS139:136:FC706VJ:2:8762:23765:128766 chr1 50000 chr1 70000 + +\nEAS139:136:FC706VJ:2:2342:15343:9863 chr1 60000 chr2 10000 + +\nEAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + -\n\n\n\nMore information about the conventions related to this text file are provided by the 4DN consortium, which originally formalized the specifications of this file format.\n\n1.2.2 Binned contact matrix files\n\n1.2.2.1 Binning pairs into a matrix\nThe action of “binning” a .pairs file into a contact matrix consists in (1) discretizing a genome reference into genomic bins, (2) attributing bins for each pair’s extremity and (3) computing the interaction frequency between any pair of genomic bins, i.e. the “contact matrix”.\nFor instance, here is a dummy .pairs file with a total of 5 pairs:\n\n\ndummy.pairs\n\n## pairs format v1.0\n#sorted: chr1-chr2-pos1-pos2\n#columns: readID chr1 pos1 chr2 pos2 strand1 strand2\n#chromsize: chr1 389\n. chr1 162 chr1 172 . . \n. chr1 180 chr1 192 . . \n. chr1 183 chr1 254 . .\n. chr1 221 chr1 273 . . \n. chr1 254 chr1 298 . . \n\nNote that this genome reference is made of a single chromosome (chr1), very short (length of 389). 
By binning this chromosome in 100bp-wide bins (100 bp is the resolution), one would optain the following four bins:\n\n\nbins.bed\n\n<chr> <pos> <bin>\nchr1 1 100\nchr1 101 200\nchr1 201 300\nchr1 301 389\n\nEach pair extremity can be changed to an integer indicating the position of the bin it falls in, e.g. for the left-hand extremity of the pairs file printed hereinabove (bin1):\n<chr1> <pos1> -> <bin1>\nchr1 162 -> 2\nchr1 180 -> 2\nchr1 183 -> 2\nchr1 221 -> 3\nchr1 254 -> 3\nSimilarly for the right-hand extremity of the pairs file (bin2):\n<chr2> <pos2> -> <bin2>\nchr1 172 -> chr1 2\nchr1 192 -> chr1 2\nchr1 254 -> chr1 3\nchr1 273 -> chr1 3\nchr1 298 -> chr1 3\nBy pasting side-to-side the left-hand and right-hand extremities of each pair, the .pairs file can be turned into something like:\n<bin1> <bin2>\n2 2\n2 2\n2 3\n3 3\n3 3\nAnd if we now count the number of each <bin1> <bin2> combinaison, adding a third <count> column, we end up with a count.matrix text file:\n\n\ncount.matrix\n\n<bin1> <bin2> <count>\n2 2 2\n2 3 1\n3 3 2\n\nThis count.matrix file lists a total of 5 pairs, and in which bin each extremity of each pair is contained. Thus, a count matrix is a lossy file format, as it “rounds up” the position of each pair’s extremity to the genomic bin containing it.\nThis “i-j-x” 3-column format, in which i-j relate to a pair of “coordinates” indices (or a pair of genomic bin indices) in a matrix, and x relates to a score associated with the pair of indices, is generally called a “COO sparse matrix”.\nIn this context, the regions.bed acts as a secondary “dictionary” describing the nature of i and j indices, i.e. the location of genomic bins.\n\n1.2.2.2 Plain-text matrices: HiC-Pro style\nThe HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above.\nTogether, these two files can describe the interaction frequency between any pair of genomic loci. 
They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files.\n.(m)cool and .hic file formats are two standards addressing these limitations.\n\n1.2.2.3 .(m)cool matrices\nThe .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called:\n\n\nbins: containing the same information than the regions.bed file;\n\npixels: containing the same information than the count.matrix (each “pixel” is a pair of 2 bins and has one or several associated scores);\n\nchroms: summarizing the order and length of the chromosomes present in a Hi-C contact matrix;\n\nindexes: allowing random access, i.e. parsing of only a subset of the data without having to read through the entire set of data.\n\n\nA single .pairs file binned at different resolutions can also be saved into a single, multi-resolution .mcool file. .mcool essentially consists of nested .cool files.\nImportantly, as an HDF5-based format, .cool files are binarized, indexed and highly-compressed. This has two major benefits:\n\nSmaller disk storage footprint\n\nRapid subsetting of the data through random access\n\n\nMoreover, parsing .cool files is possible using HDF standard APIs.\n\n1.2.2.4 .hic matrices\nThe .hic format is another type of binarized, indexed and highly-compressed file (Durand et al. (2016)). It can store virtually the same information than a .cool file. However, parsing .hic files is not as straightforward as .cool files, as it does not rely on a generic file standard. 
Still, the straw library has been implemented in several computing languages to facilitate parsing of .hic files (Durand et al. (2016))."
+ "text": "1.2 Hi-C file formats\nTwo important output files are typically generated during Hi-C data pre-processing:\n\nA “pairs” file;\nA binned “contact matrix” file\n\nWe will now describe the structure of these different types of files. Directly jump to the next chapter if you want to know more about importing data from a contact matrix or a pairs file in R.\n\n1.2.1 Pairs files\nA “pairs” file (optionally, but generally filtered and sorted) is the direct output of processing Hi-C fastq files. It stores information about putative proximity contacts identified by digestion/religation, in the lossless, human-readable, indexable format: the .pairs format.\nA .pairs file is organized in a header followed by a body:\n\n\nheader: starts with #\n\nRequired entries\n\nFirst line: ## pairs format v1.0\n\n\n#columns: column contents and ordering (e.g. #columns: readID chr1 pos1 chr2 pos2 strand1 strand2 <column_name> <column_name> ...)\n\n#chromsize: chromosome names and their size in bp, one chromosome per line, in the same order that defines ordering between mates (e.g. #chromsize: chr1 230218). Chromosome order is actually defined by this header, not by the order of pairs listed in the body!\n\n\nOptional entries with reserved header keys (sorted, shape, command, genome_assembly)\n\n\n#sorted: to indicate the sorting mechanism (e.g. #sorted: chr1-chr2-pos1-pos2, #sorted: chr1-pos1, #sorted: none)\n\n#shape: to specify whether the matrix is stored as upper triangle or lower triangle (#shape: upper triangle, #shape: lower triangle)\n\n#command: to specify any command, e.g. the command used to generate the pairs file (#command: bam2pairs mysample.bam mysample)\n\n#genome_assembly: to specify the genome assembly (e.g. 
#genome_assembly: hg38)\n\n\n\n\n\nbody: tab-separated columns\n\n7 reserved (4 of them required) columns:\n\nreadID, chr1, pos1, chr2, pos2, strand1, strand2\nColumns 2-5 (chr1, pos1, chr2, pos2) are required and cannot have missing values\nFor column 1, 6 & 7: missing values are annotated with a single-character dummy (.)\n\n\n2 extra reserved, optional column names:\n\n\nfrag1, frag2: restriction enzyme fragment index used by Juicer\n\n\n\nAny number of optional columns can be added\n\n\n\n\n\nsample.pairs\n\n## pairs format v1.0\n#sorted: chr1-chr2-pos1-pos2\n#shape: upper triangle\n#genome_assembly: hg38\n#chromsize: chr1 249250621\n#chromsize: chr2 243199373\n#chromsize: chr3 198022430\n...\n#columns: readID chr1 pos1 chr2 pos2 strand1 strand2\nEAS139:136:FC706VJ:2:2104:23462:197393 chr1 10000 chr1 20000 + +\nEAS139:136:FC706VJ:2:8762:23765:128766 chr1 50000 chr1 70000 + +\nEAS139:136:FC706VJ:2:2342:15343:9863 chr1 60000 chr2 10000 + +\nEAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + -\n\nMore information about the conventions related to this text file are provided by the 4DN consortium, which originally formalized the specifications of this file format.\n\n1.2.2 Binned contact matrix files\n\n1.2.2.1 Binning pairs into a matrix\nThe action of “binning” a .pairs file into a contact matrix consists in (1) discretizing a genome reference into genomic bins, (2) attributing bins for each pair’s extremity and (3) computing the interaction frequency between any pair of genomic bins, i.e. the “contact matrix”.\nFor instance, here is a dummy .pairs file with a total of 5 pairs:\n\n\ndummy.pairs\n\n## pairs format v1.0\n#sorted: chr1-chr2-pos1-pos2\n#columns: readID chr1 pos1 chr2 pos2 strand1 strand2\n#chromsize: chr1 389\n. chr1 162 chr1 172 . . \n. chr1 180 chr1 192 . . \n. chr1 183 chr1 254 . .\n. chr1 221 chr1 273 . . \n. chr1 254 chr1 298 . . \n\nNote that this genome reference is made of a single chromosome (chr1), very short (length of 389). 
By binning this chromosome in 100bp-wide bins (100 bp is the resolution), one would obtain the following four bins:\n\n\nbins.bed\n\n<chr> <start> <end>\nchr1 1 100\nchr1 101 200\nchr1 201 300\nchr1 301 389\n\nEach pair extremity can be changed to an integer indicating the position of the bin it falls in, e.g. for the left-hand extremity of the pairs file printed above (bin1):\n<chr1> <pos1> -> <bin1>\nchr1 162 -> 2\nchr1 180 -> 2\nchr1 183 -> 2\nchr1 221 -> 3\nchr1 254 -> 3\nSimilarly for the right-hand extremity of the pairs file (bin2):\n<chr2> <pos2> -> <bin2>\nchr1 172 -> 2\nchr1 192 -> 2\nchr1 254 -> 3\nchr1 273 -> 3\nchr1 298 -> 3\nBy pasting side-to-side the left-hand and right-hand extremities of each pair, the .pairs file can be turned into something like:\n<bin1> <bin2>\n2 2\n2 2\n2 3\n3 3\n3 3\nAnd if we now count the number of occurrences of each <bin1> <bin2> combination, adding a third <count> column, we end up with a count.matrix text file:\n\n\ncount.matrix\n\n<bin1> <bin2> <count>\n2 2 2\n2 3 1\n3 3 2\n\nThis count.matrix file accounts for all 5 pairs and records the bin in which each extremity of each pair falls. Thus, a count matrix is a lossy file format, as it “rounds up” the position of each pair’s extremity to the genomic bin containing it.\nThis “i-j-x” 3-column format, in which i-j relate to a pair of “coordinates” indices (or a pair of genomic bin indices) in a matrix, and x relates to a score associated with the pair of indices, is generally called a “COO sparse matrix”.\nIn this context, the regions.bed file (called bins.bed in the example above) acts as a secondary “dictionary” describing the nature of i and j indices, i.e. the location of genomic bins.\n\n1.2.2.2 Plain-text matrices: HiC-Pro style\nThe HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above.\nTogether, these two files can describe the interaction frequency between any pair of genomic loci. 
They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies is stored in separate files. Moreover, because they are non-binarized, these files often end up using a large amount of disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files.\n.(m)cool and .hic file formats are two standards addressing these limitations.\n\n1.2.2.3 .(m)cool matrices\nThe .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called:\n\n\nbins: containing the same information as the regions.bed file;\n\npixels: containing the same information as the count.matrix file (each “pixel” is a pair of 2 bins and has one or several associated scores);\n\nchroms: summarizing the order and length of the chromosomes present in a Hi-C contact matrix;\n\nindexes: allowing random access, i.e. parsing of only a subset of the data without having to read through the entire set of data.\n\n\nA single .pairs file binned at different resolutions can also be saved into a single, multi-resolution .mcool file. .mcool essentially consists of nested .cool files.\nImportantly, as an HDF5-based format, .cool files are binarized, indexed and highly compressed. This has two major benefits:\n\nSmaller disk storage footprint\n\nRapid subsetting of the data through random access\n\n\nMoreover, parsing .cool files is possible using standard HDF5 APIs.\n\n1.2.2.4 .hic matrices\nThe .hic format is another type of binarized, indexed and highly compressed file (Durand et al. (2016)). It can store virtually the same information as a .cool file. However, parsing .hic files is not as straightforward as parsing .cool files, as the format does not rely on a generic file standard. 
Still, the straw library has been implemented in several computing languages to facilitate parsing of .hic files (Durand et al. (2016))."
},
{
"objectID": "principles.html#pre-processing-hi-c-data",
"href": "principles.html#pre-processing-hi-c-data",
"title": "\n1 Hi-C pre-processing steps\n",
"section": "\n1.3 Pre-processing Hi-C data",
- "text": "1.3 Pre-processing Hi-C data\n\n1.3.1 Processing workflow\nFundamentally, the main steps performed to pre-process Hi-C are:\n\nSeparate read mapping\nPairs parsing\nPairs sorting\nPairs filtering\nPairs binning into a contact matrix\nNormalization of contact matrix and multi-resolution matrix generation\n\n\nIn practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al. (2023)):\n\n## Note these fields have to be replaced by appropriate variables: \n## <index>\n## <input.R1.fq.gz>\n## <input.R2.fq.gz>\n## <chromsizes.txt>\n## <prefix>\nbwa mem2 -SP5M <index> <input.R1.fq.gz> <input.R2.fq.gz> \\\n | pairtools parse -c <chromsizes.txt> \\\n | pairtools sort \\\n | pairtools dedup \\\n | cooler cload pairs -c1 2 -p1 3 -c2 4 -p2 5 <chromsizes.txt>:10000 - <prefix>.cool\ncooler zoomify --balance --nproc 32 --resolutions 5000N --out <prefix>.mcool <prefix>.cool\n\nSeveral pipelines have been developed to facilitate Hi-C data pre-processing. A few of them stand out from the crowd:\n\n\nnf-distiller: a combination of an aligner + pairtools + cooler\n\n\nHiC-pro (Servant et al. (2015))\n\nJuicer (Durand et al. (2016))\n\n\n\n\n\n\n\nNote\n\n\n\nFor larger genomes (> 1Gb) with more than few tens of M of reads per fastq (e.g. > 100M), we recommend pre-processing data on an HPC cluster. Aligners, pairs processing and matrix binning can greatly benefit from parallelization over multiple CPUs (Open2C et al. (2023))).\nTo scale up data pre-processing, we recommend to rely on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler.\n\n\n\n1.3.2 hicstuff: lightweight Hi-C pipeline\nhicstuff is an integrated workflow to process Hi-C data. Some advantages compared to solutions mentioned above are its simplicity, flexibility and lightweight. 
For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command.\nhicstuff provides both a command-line interface (CLI) and a python API to process fastq reads into a binned contact matrix. A processing pipeline can be launched using the standard command pipeline as follows:\n\n## Note these fields have to be replaced by appropriate variables: \n## <hicstuff-options>\n## <genome.fa>\n## <input.R1.fq.gz>\n## <input.R2.fq.gz>\nhicstuff pipeline \\\n <hicstuff-options> \\\n --genome <genome.fa> \\\n <input.R1.fq.gz> \\\n <input.R2.fq.gz> \n\nhicstuff documentation website is available here: https://hicstuff.readthedocs.io/ to read more about available options and internal processing steps.\n\n1.3.3 HiCool: hicstuff within R\nhicstuff is available as a standalone (conda install -c bioconda hicstuff it!). It is also shipped in an R package: HiCool. Thus, HiCool can process fastq files directly within an R console.\n\n1.3.3.1 Executing HiCool\nTo demonstrate this, we first fetch example .fastq files:\n\nlibrary(HiContactsData)\nr1 <- HiContactsData(sample = 'yeast_wt', format = 'fastq_R1')\nr2 <- HiContactsData(sample = 'yeast_wt', format = 'fastq_R2')\n\nWe then load the HiCool library and execute the main HiCool function.\n\nlibrary(HiCool)\n## Loading required package: HiCExperiment\n## Consider using the `HiContacts` package to perform advanced genomic operations \n## on `HiCExperiment` objects.\n## \n## Read \"Orchestrating Hi-C analysis with Bioconductor\" online book to learn more:\n## https://js2264.github.io/OHCA/\nHiCool(\n r1, \n r2, \n restriction = 'DpnII,HinfI', \n resolutions = c(4000, 8000, 16000), \n genome = 'R64-1-1', \n output = './HiCool/'\n)\n## HiCool :: Fetching bowtie genome index files from AWS iGenomes S3 bucket...\n## HiCool :: Recovering bowtie2 genome index from AWS iGenomes...\n## + /github/home/.cache/R/basilisk/1.13.4/0/bin/conda 'create' 
'--yes' '--prefix' '/github/home/.cache/R/basilisk/1.13.4/HiCool/1.1.0/env' 'python=3.7.12' '--quiet' '-c' 'conda-forge' '-c' 'bioconda'\n## + /github/home/.cache/R/basilisk/1.13.4/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.13.4/HiCool/1.1.0/env' 'python=3.7.12'\n## + /github/home/.cache/R/basilisk/1.13.4/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.13.4/HiCool/1.1.0/env' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.7.12' 'python=3.7.12' 'bowtie2=2.5.0' 'samtools=1.16.1' 'hicstuff=3.1.5' 'chromosight=1.6.3' 'cooler=0.9.1'\n## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpIWmk55/WL4DIE]...\n## HiCool :: Mapping fastq files...\n## HiCool :: Removing unwanted chromosomes...\n## HiCool :: Parsing pairs into .cool file...\n## HiCool :: Generating multi-resolution .mcool file...\n## HiCool :: Balancing .mcool file...\n## HiCool :: Tidying up everything for you...\n## HiCool :: .fastq to .mcool processing done!\n## HiCool :: Check ./HiCool/folder to find the generated files\n## HiCool :: Generating HiCool report. This might take a while.\n## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/14976d56f7a_7833^mapped-R64-1-1^WL4DIE.html\n## HiCool :: All processing successfully achieved. 
Congrats!\n## CoolFile object\n## .mcool file: ./HiCool//matrices/14976d56f7a_7833^mapped-R64-1-1^WL4DIE.mcool \n## resolution: 4000 \n## pairs file: ./HiCool//pairs/14976d56f7a_7833^mapped-R64-1-1^WL4DIE.pairs \n## metadata(3): log args stats\n\n\n\n\n\n\n\nHiCool arguments\n\n\n\nSeveral arguments can be passed to HiCool and some are worth mentioning them:\n- restriction: (default: \"DpnII,HinfI\")\n- resolutions: (default: NULL, automatically inferring resolutions based on genome size)\n- iterative: (default: TRUE)\n- filter: (default: TRUE)\n- balancing_args: (default: \" --cis-only --min-nnz 3 --mad-max 7 \")\n- threads: (default: 1L)\n\n\nOther HiCool arguments can be listed by checking HiCool documentation in R: ?HiCool::HiCool.\n\n1.3.3.2 HiCool outputs\nWe can check the generated output files placed in the HiCool/ directory.\n\nfs::dir_tree('HiCool/')\n## HiCool/\n## ├── 14976d56f7a_7833^mapped-R64-1-1^WL4DIE.html\n## ├── logs\n## │ └── 14976d56f7a_7833^mapped-R64-1-1^WL4DIE.log\n## ├── matrices\n## │ └── 14976d56f7a_7833^mapped-R64-1-1^WL4DIE.mcool\n## ├── pairs\n## │ └── 14976d56f7a_7833^mapped-R64-1-1^WL4DIE.pairs\n## └── plots\n## ├── 14976d56f7a_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf\n## └── 14976d56f7a_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf\n\n\nThe *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for.\n\nThe *.html file is a report summarizing pairs numbers, filtering, etc…\nThe *.log file contains all output and error messages, as well as the full list of commands that have been executed to pre-process the input dataset.\nThe *.pdf graphic files provide a visual representation of the distribution of informative/non-informative pairs.\n\n\n\n\n\n\n\nTip\n\n\n\nAll the files generated by a single HiCool pipeline execution contain the same 6-letter unique hash to make sure they are not overwritten if re-executing the same command."
+ "text": "1.3 Pre-processing Hi-C data\n\n1.3.1 Processing workflow\nFundamentally, the main steps performed to pre-process Hi-C data are:\n\nSeparate read mapping\nPairs parsing\nPairs sorting\nPairs filtering\nPairs binning into a contact matrix\nNormalization of contact matrix and multi-resolution matrix generation\n\n\nIn practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al. (2023)):\n\n## Note these fields have to be replaced by appropriate variables: \n## <index>\n## <input.R1.fq.gz>\n## <input.R2.fq.gz>\n## <chromsizes.txt>\n## <prefix>\nbwa-mem2 mem -SP5M <index> <input.R1.fq.gz> <input.R2.fq.gz> \\\n | pairtools parse -c <chromsizes.txt> \\\n | pairtools sort \\\n | pairtools dedup \\\n | cooler cload pairs -c1 2 -p1 3 -c2 4 -p2 5 <chromsizes.txt>:10000 - <prefix>.cool\ncooler zoomify --balance --nproc 32 --resolutions 5000N --out <prefix>.mcool <prefix>.cool\n\nSeveral pipelines have been developed to facilitate Hi-C data pre-processing. A few of them stand out from the crowd:\n\n\nnf-distiller: a combination of an aligner + pairtools + cooler\n\n\nHiC-Pro (Servant et al. (2015))\n\nJuicer (Durand et al. (2016))\n\n\n\n\n\n\n\nNote\n\n\n\nFor larger genomes (> 1Gb) with more than a few tens of millions of reads per fastq file (e.g. > 100M), we recommend pre-processing data on an HPC cluster. Aligners, pairs processing and matrix binning can greatly benefit from parallelization over multiple CPUs (Open2C et al. (2023)).\nTo scale up data pre-processing, we recommend relying on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler.\n\n\n\n1.3.2 hicstuff: lightweight Hi-C pipeline\nhicstuff is an integrated workflow to process Hi-C data. Some advantages compared to the solutions mentioned above are its simplicity, flexibility and light footprint. 
For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command.\nhicstuff provides both a command-line interface (CLI) and a Python API to process fastq reads into a binned contact matrix. A processing pipeline can be launched using the pipeline command as follows:\n\n## Note these fields have to be replaced by appropriate variables: \n## <hicstuff-options>\n## <genome.fa>\n## <input.R1.fq.gz>\n## <input.R2.fq.gz>\nhicstuff pipeline \\\n <hicstuff-options> \\\n --genome <genome.fa> \\\n <input.R1.fq.gz> \\\n <input.R2.fq.gz> \n\nThe hicstuff documentation website (https://hicstuff.readthedocs.io/) describes the available options and internal processing steps in more detail.\n\n1.3.3 HiCool: hicstuff within R\nhicstuff is available as a standalone tool (install it with conda install -c bioconda hicstuff). It is also shipped in an R package: HiCool. Thus, HiCool can process fastq files directly within an R console.\n\n1.3.3.1 Executing HiCool\nTo demonstrate this, we first fetch example .fastq files:\n\nlibrary(HiContactsData)\nr1 <- HiContactsData(sample = 'yeast_wt', format = 'fastq_R1')\nr2 <- HiContactsData(sample = 'yeast_wt', format = 'fastq_R2')\n\nWe then load the HiCool package and execute the main HiCool function.\n\nlibrary(HiCool)\n## Loading required package: HiCExperiment\n## Consider using the `HiContacts` package to perform advanced genomic operations \n## on `HiCExperiment` objects.\n## \n## Read \"Orchestrating Hi-C analysis with Bioconductor\" online book to learn more:\n## https://js2264.github.io/OHCA/\nHiCool(\n r1, \n r2, \n restriction = 'DpnII,HinfI', \n resolutions = c(4000, 8000, 16000), \n genome = 'R64-1-1', \n output = './HiCool/'\n)\n## HiCool :: Fetching bowtie genome index files from AWS iGenomes S3 bucket...\n## HiCool :: Recovering bowtie2 genome index from AWS iGenomes...\n## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'create' 
'--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' '--quiet' '-c' 'conda-forge' '-c' 'bioconda'\n## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12'\n## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.7.12' 'python=3.7.12' 'bowtie2=2.5.0' 'samtools=1.16.1' 'hicstuff=3.1.5' 'chromosight=1.6.3' 'cooler=0.9.1'\n## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpiR9EkC/WL4DIE]...\n## HiCool :: Mapping fastq files...\n## HiCool :: Removing unwanted chromosomes...\n## HiCool :: Parsing pairs into .cool file...\n## HiCool :: Generating multi-resolution .mcool file...\n## HiCool :: Balancing .mcool file...\n## HiCool :: Tidying up everything for you...\n## HiCool :: .fastq to .mcool processing done!\n## HiCool :: Check ./HiCool/folder to find the generated files\n## HiCool :: Generating HiCool report. This might take a while.\n## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/148213ddba0_7833^mapped-R64-1-1^WL4DIE.html\n## HiCool :: All processing successfully achieved. 
Congrats!\n## CoolFile object\n## .mcool file: ./HiCool//matrices/148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool \n## resolution: 4000 \n## pairs file: ./HiCool//pairs/148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs \n## metadata(3): log args stats\n\n\n1.3.3.2 HiCool arguments\nSeveral arguments can be passed to HiCool; some worth mentioning are:\n\n\nrestriction: (default: \"DpnII,HinfI\")\n\n\nresolutions: (default: NULL, automatically inferring resolutions based on genome size)\n\n\niterative: (default: TRUE)\n\n\nfilter: (default: TRUE)\n\n\nbalancing_args: (default: \" --cis-only --min-nnz 3 --mad-max 7 \")\n\n\nthreads: (default: 1L)\n\nOther HiCool arguments can be listed by checking the HiCool documentation in R: ?HiCool::HiCool.\n\n1.3.3.3 HiCool outputs\nWe can check the generated output files placed in the HiCool/ directory.\n\nfs::dir_tree('HiCool/')\n## HiCool/\n## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.html\n## ├── logs\n## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.log\n## ├── matrices\n## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool\n## ├── pairs\n## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs\n## └── plots\n## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf\n## └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf\n\n\nThe *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for.\n\nThe *.html file is a report summarizing pair counts, filtering, etc.\nThe *.log file contains all output and error messages, as well as the full list of commands that have been executed to pre-process the input dataset.\nThe *.pdf graphic files provide a visual representation of the distribution of informative/non-informative pairs.\n\n\n\n\n\n\n\nTip\n\n\n\nAll the files generated by a single HiCool pipeline execution contain the same 6-character unique hash, so that they are not overwritten if the same command is re-executed."
},
{
"objectID": "principles.html#exploratory-data-analysis-of-processed-hi-c-files",
@@ -95,28 +95,28 @@
"href": "data-representation.html#granges-class",
"title": "\n2 Hi-C data structures in R\n",
"section": "\n2.1 GRanges class",
- "text": "2.1 GRanges class\nGRanges is a shorthand for GenomicRanges, a core class in Bioconductor. This class is primarily used to describe genomic ranges of any nature, e.g. sets of promoters, SNPs, chromatin loop anchors, ….\nThe data structure has been published in the seminal 2015 publication by the Bioconductor team (Huber et al. (2015)).\n\n2.1.1 GRanges fundamentals\nThe easiest way to generate a GRanges object is to coerce it from a vector of genomic coordinates in the UCSC format (e.g. \"chr2:2004-4853\"):\n\nlibrary(GenomicRanges)\ngr <- GRanges(c(\n \"chr2:2004-7853:+\", \n \"chr4:4482-9873:-\", \n \"chr5:1943-4203:+\", \n \"chr5:4103-5004:+\" \n))\ngr\n## GRanges object with 4 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr2 2004-7853 +\n## [2] chr4 4482-9873 -\n## [3] chr5 1943-4203 +\n## [4] chr5 4103-5004 +\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\nA single GRanges object can contain one or several “ranges”, or genomic intervals. To navigate between these ranges, GRanges can be subset using the standard R single bracket notation [:\n\ngr[1]\n## GRanges object with 1 range and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr2 2004-7853 +\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\ngr[1:3]\n## GRanges object with 3 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr2 2004-7853 +\n## [2] chr4 4482-9873 -\n## [3] chr5 1943-4203 +\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\nGenomicRanges objects aim to provide a natural description of genomic intervals (ranges) and are incredibly versatile. They extend the data.frame object and have four required pieces of information:\n\n\nseqnames (i.e. 
chromosome names) (accessible with seqnames())\n\nstart (accessible with start())\n\nend (accessible with end())\n\nstrand (accessible with strand())\n\n\nseqnames(gr)\n## factor-Rle of length 4 with 3 runs\n## Lengths: 1 1 2\n## Values : chr2 chr4 chr5\n## Levels(3): chr2 chr4 chr5\n\nstart(gr)\n## [1] 2004 4482 1943 4103\n\nend(gr)\n## [1] 7853 9873 4203 5004\n\nstrand(gr)\n## factor-Rle of length 4 with 3 runs\n## Lengths: 1 1 2\n## Values : + - +\n## Levels(3): + - *\n\nHere is a graphical representation of a GRanges object, taken from Bioconductor course material:\n\nWe will now delve into the detailed structure and operability of GRanges objects.\n\n2.1.2 GRanges metadata\nAn important aspect of GRanges objects is that each entry (range) can have extra optional metadata. This metadata is stored in a rectangular DataFrame. Each column can contain a different type of information, e.g. a numerical vector, a factor, a list, …\nOne can directly access this DataFrame using the mcols() function, and individual columns of metadata using the $ notation:\n\nmcols(gr)\n## DataFrame with 4 rows and 0 columns\nmcols(gr)$GC <- c(0.45, 0.43, 0.44, 0.42)\nmcols(gr)$annotation <- factor(c(NA, 'promoter', 'enhancer', 'centromere'))\nmcols(gr)$extended.info <- c(\n list(c(NA)), \n list(c(date = 2023, source = 'manual')), \n list(c(date = 2021, source = 'manual')), \n list(c(date = 2019, source = 'homology'))\n)\nmcols(gr)\n## DataFrame with 4 rows and 3 columns\n## GC annotation extended.info\n## <numeric> <factor> <list>\n## 1 0.45 NA NA\n## 2 0.43 promoter 2023,manual\n## 3 0.44 enhancer 2021,manual\n## 4 0.42 centromere 2019,homology\n\nWhen metadata columns are defined for a GRanges object, they are pasted next to the minimal 4 required GRanges fields, separated by a | character.\n\ngr\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2004-7853 
+ | 0.45 NA <NA>\n## [2] chr4 4482-9873 - | 0.43 promoter 2023,manual\n## [3] chr5 1943-4203 + | 0.44 enhancer 2021,manual\n## [4] chr5 4103-5004 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n\n2.1.3 Genomic arithmetics on individual GRanges objects\nA GRanges object primarily describes a set of genomic ranges (it is in the name!). Useful genomic-oriented methods have been implemented to investigate individual GRanges object from a genomic perspective.\n\n2.1.3.1 Intra-range methods\nStandard genomic arithmetics are possible with GRanges, e.g. shifting ranges, resizing, trimming, … These methods are referred to as “intra-range” methods as they work “one-region-at-a-time”.\n\n\n\n\n\n\nNote\n\n\n\n\nEach range of the input GRanges object is modified independently from the other ranges in the following code chunks.\nIntra-range operations are endomorphisms: they all take GRanges inputs and always return GRanges objects.\n\n\n\n\nShifting each genomic range in a GRanges object by a certain number of bases:\n\n\ngr\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2004-7853 + | 0.45 NA <NA>\n## [2] chr4 4482-9873 - | 0.43 promoter 2023,manual\n## [3] chr5 1943-4203 + | 0.44 enhancer 2021,manual\n## [4] chr5 4103-5004 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Shift all genomic ranges towards the \"right\" (downstream in `+` strand), by 1000bp:\nshift(gr, 1000)\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 3004-8853 + | 0.45 NA <NA>\n## [2] chr4 5482-10873 - | 0.43 promoter 2023,manual\n## [3] chr5 2943-5203 + | 0.44 enhancer 2021,manual\n## [4] chr5 5103-6004 
+ | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Shift all genomic ranges towards the \"left\" (upstream in `+` strand), by 1000bp:\nshift(gr, -1000)\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 1004-6853 + | 0.45 NA <NA>\n## [2] chr4 3482-8873 - | 0.43 promoter 2023,manual\n## [3] chr5 943-3203 + | 0.44 enhancer 2021,manual\n## [4] chr5 3103-4004 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n\nNarrowing each genomic range in a GRanges object by a certain number of bases:\n\n\ngr\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2004-7853 + | 0.45 NA <NA>\n## [2] chr4 4482-9873 - | 0.43 promoter 2023,manual\n## [3] chr5 1943-4203 + | 0.44 enhancer 2021,manual\n## [4] chr5 4103-5004 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Extract 21st-40th subrange for each range in `gr`:\nnarrow(gr, start = 21, end = 40)\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2024-2043 + | 0.45 NA <NA>\n## [2] chr4 4502-4521 - | 0.43 promoter 2023,manual\n## [3] chr5 1963-1982 + | 0.44 enhancer 2021,manual\n## [4] chr5 4123-4142 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\nwidth(narrow(gr, start = 21, end = 40))\n## [1] 20 20 20 20\n\n\nResizing each genomic range in a GRanges object to a certain number of bases:\n\n\ngr\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand 
| GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2004-7853 + | 0.45 NA <NA>\n## [2] chr4 4482-9873 - | 0.43 promoter 2023,manual\n## [3] chr5 1943-4203 + | 0.44 enhancer 2021,manual\n## [4] chr5 4103-5004 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Resize `gr` entries to 100, fixed at the start of each range:\nresize(gr, 100, fix = \"start\")\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2004-2103 + | 0.45 NA <NA>\n## [2] chr4 9774-9873 - | 0.43 promoter 2023,manual\n## [3] chr5 1943-2042 + | 0.44 enhancer 2021,manual\n## [4] chr5 4103-4202 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Resize `gr` entries to 100, fixed at the start of each range, disregarding strand information:\nresize(gr, 100, fix = \"start\", ignore.strand = TRUE)\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2004-2103 + | 0.45 NA <NA>\n## [2] chr4 4482-4581 - | 0.43 promoter 2023,manual\n## [3] chr5 1943-2042 + | 0.44 enhancer 2021,manual\n## [4] chr5 4103-4202 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Resize `gr` entries to 1 bp, fixed at the center of each range:\nresize(gr, 1, fix = \"center\")\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 4928 + | 0.45 NA <NA>\n## [2] chr4 7177 - | 0.43 promoter 2023,manual\n## [3] chr5 3073 + | 0.44 enhancer 2021,manual\n## [4] chr5 4553 + | 0.42 centromere 2019,homology\n## 
-------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n\nExtracting flanking coordinates for each entry in gr:\n\n\ngr\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2004-7853 + | 0.45 NA <NA>\n## [2] chr4 4482-9873 - | 0.43 promoter 2023,manual\n## [3] chr5 1943-4203 + | 0.44 enhancer 2021,manual\n## [4] chr5 4103-5004 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Extract 100bp UPSTREAM of each genomic range, according to range strandness:\nflank(gr, 100, start = TRUE)\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 1904-2003 + | 0.45 NA <NA>\n## [2] chr4 9874-9973 - | 0.43 promoter 2023,manual\n## [3] chr5 1843-1942 + | 0.44 enhancer 2021,manual\n## [4] chr5 4003-4102 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Extract 1bp DOWNSTREAM of each genomic range, according to range strandness:\nflank(gr, 1, start = FALSE)\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 7854 + | 0.45 NA <NA>\n## [2] chr4 4481 - | 0.43 promoter 2023,manual\n## [3] chr5 4204 + | 0.44 enhancer 2021,manual\n## [4] chr5 5005 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\nNote how here again, strand information is crucial and correctly leveraged to extract “upstream” or “downstream” flanking regions in agreement with genomic range orientation.\n\nSeveral arithmetics operators can also directly work with GRanges:\n\n\ngr\n## GRanges object with 4 ranges and 
3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2004-7853 + | 0.45 NA <NA>\n## [2] chr4 4482-9873 - | 0.43 promoter 2023,manual\n## [3] chr5 1943-4203 + | 0.44 enhancer 2021,manual\n## [4] chr5 4103-5004 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\ngr + 100 # ----- Extend each side of the `GRanges` by a given number of bases\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 1904-7953 + | 0.45 NA <NA>\n## [2] chr4 4382-9973 - | 0.43 promoter 2023,manual\n## [3] chr5 1843-4303 + | 0.44 enhancer 2021,manual\n## [4] chr5 4003-5104 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\ngr - 200 # ----- Shrink each side of the `GRanges` by a given number of bases \n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2204-7653 + | 0.45 NA <NA>\n## [2] chr4 4682-9673 - | 0.43 promoter 2023,manual\n## [3] chr5 2143-4003 + | 0.44 enhancer 2021,manual\n## [4] chr5 4303-4804 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\ngr * 1000 # ----- Zoom in by a given factor (effectively decreasing the `GRanges` width by the same factor)\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 4926-4930 + | 0.45 NA <NA>\n## [2] chr4 7175-7179 - | 0.43 promoter 2023,manual\n## [3] chr5 3072-3073 + | 0.44 enhancer 2021,manual\n## [4] chr5 4554-4553 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from 
an unspecified genome; no seqlengths\n\n\n\n\n\n\n\nGoing further\n\n\n\nTo fully grasp how to operate GRanges objects, we highly recommend reading the detailed documentation for this class by typing ?GenomicRanges and ?GenomicRanges::`intra-range-methods`.\n\n\n\n2.1.3.2 Inter-range methods\nCompared to “intra-range” methods described above, inter-range methods involve comparisons between ranges in a single GRanges object.\n\n\n\n\n\n\nNote\n\n\n\nCompared to previous section, the result of each function described below depends on the entire set of ranges in the input GRanges object.\n\n\n\nComputing the “inverse” genomic ranges, i.e. ranges in-between the input ranges:\n\n\ngaps(gr)\n## GRanges object with 3 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr2 1-2003 +\n## [2] chr4 1-4481 -\n## [3] chr5 1-1942 +\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n\nFor each entry in a GRanges, finding the index of the preceding/following/nearest genomic range:\n\n\nprecede(gr)\n## [1] NA NA NA NA\n\nfollow(gr)\n## [1] NA NA NA NA\n\nnearest(gr)\n## [1] NA NA 4 3\n\n\nComputing a coverage over a genome, optionally indicated a “score” column from metadata:\n\n\ncoverage(gr, weight = 'GC')\n## RleList of length 3\n## $chr2\n## numeric-Rle of length 7853 with 2 runs\n## Lengths: 2003 5850\n## Values : 0.00 0.45\n## \n## $chr4\n## numeric-Rle of length 9873 with 2 runs\n## Lengths: 4481 5392\n## Values : 0.00 0.43\n## \n## $chr5\n## numeric-Rle of length 5004 with 4 runs\n## Lengths: 1942 2160 101 801\n## Values : 0.00 0.44 0.86 0.42\n\n\n\n\n\n\n\nGoing further\n\n\n\nTo fully grasp how to operate GRanges objects, we highly recommend reading the detailed documentation for this class by typing ?GenomicRanges::`inter-range-methods`.\n\n\n\n2.1.4 Comparing multiple GRanges objects\nGenomic analysis typically requires intersection of two sets of genomic ranges, e.g. 
to find which ranges from one set overlap with those from another set.\nIn the next examples, we will use two GRanges:\n\n\npeaks represents dummy 8 ChIP-seq peaks\n\n\npeaks <- GRanges(c(\n 'chr1:320-418',\n 'chr1:512-567',\n 'chr1:843-892',\n 'chr1:1221-1317', \n 'chr1:1329-1372', \n 'chr1:1852-1909', \n 'chr1:2489-2532', \n 'chr1:2746-2790'\n))\npeaks\n## GRanges object with 8 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr1 320-418 *\n## [2] chr1 512-567 *\n## [3] chr1 843-892 *\n## [4] chr1 1221-1317 *\n## [5] chr1 1329-1372 *\n## [6] chr1 1852-1909 *\n## [7] chr1 2489-2532 *\n## [8] chr1 2746-2790 *\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\n\nTSSs represents dummy 3 gene promoters (± 10bp around the TSS)\n\n\ngenes <- GRanges(c(\n 'chr1:358-1292:+',\n 'chr1:1324-2343:+', \n 'chr1:2732-2751:+'\n))\nTSSs <- resize(genes, width = 1, fix = 'start') + 10\nTSSs\n## GRanges object with 3 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr1 348-368 +\n## [2] chr1 1314-1334 +\n## [3] chr1 2722-2742 +\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\nLet’s see how they overlap by plotting them:\n\nlibrary(ggplot2)\npeaks$type <- 'peaks'\nTSSs$type <- 'TSSs'\nggplot() + \n ggbio::geom_rect(c(peaks, TSSs), aes(fill = type), facets = type~.) 
+ \n ggbio::theme_alignment() + \n coord_fixed(ratio = 300)\n## Registered S3 method overwritten by 'GGally':\n## method from \n## +.gg ggplot2\n## Warning: The `facets` argument of `facet_grid()` is deprecated as of ggplot2 2.2.0.\n## ℹ Please use the `rows` argument instead.\n## ℹ The deprecated feature was likely used in the ggbio package.\n## Please report the issue at <https://github.com/lawremi/ggbio/issues>.\n## Scale for y is already present.\n## Adding another scale for y, which will replace the existing scale.\n\n\n\n\n\n\n\n\n2.1.4.1 Finding overlaps between two GRanges sets\n\nFinding overlaps between a query and a subject\n\nIn our case, we want to identify which ChIP-seq peaks overlap with a TSS: the query is the set of peaks and the subject is the set of TSSs.\nfindOverlaps returns a Hits object listing which query ranges overlap with which subject ranges.\n\nov <- findOverlaps(query = peaks, subject = TSSs)\nov\n## Hits object with 3 hits and 0 metadata columns:\n## queryHits subjectHits\n## <integer> <integer>\n## [1] 1 1\n## [2] 4 2\n## [3] 5 2\n## -------\n## queryLength: 8 / subjectLength: 3\n\nThe Hits output clearly describes what overlaps with what:\n\nThe query (peak) #1 overlaps with subject (TSS) #1\n\nThe query (peak) #5 overlaps with subject (TSS) #2\n\n\n\n\n\n\n\n\nNote\n\n\n\nBecause no other query index or subject index is listed in the ov output, none of the remaining ranges from query overlap with ranges from subject.\n\n\n\nSubsetting by overlaps between a query and a subject\n\nTo directly subset ranges from query overlapping with ranges from a subject (e.g. 
to only keep peaks overlapping a TSS), we can use the subsetByOverlaps function.\n\nsubsetByOverlaps(peaks, TSSs)\n## GRanges object with 3 ranges and 1 metadata column:\n## seqnames ranges strand | type\n## <Rle> <IRanges> <Rle> | <character>\n## [1] chr1 320-418 * | peaks\n## [2] chr1 1221-1317 * | peaks\n## [3] chr1 1329-1372 * | peaks\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\n\n\n\n\n\nNote\n\n\n\nThe output of subsetByOverlaps is a subset of the original GRanges object provided as a query, with retained ranges being unmodified.\n\n\n\nCounting overlaps between a query and a subject\n\nFinally, the countOverlaps is used to count, for each range in a query, how many ranges in the subject it overlaps with.\n\ncountOverlaps(query = peaks, subject = TSSs)\n## [1] 1 0 0 1 1 0 0 0\n\n\n\n\n\n\n\nNote\n\n\n\nNote that which GRanges goes in query or subject is crucial! Counting for each peak, the number of TSSs it overlaps with is very different from for each TSS, how many peaks it overlaps with.\nIn our case example, it would also be informative to count how many peaks overlap with each TSS, so we’d need to swap query and subject:\n\ncountOverlaps(query = TSSs, subject = peaks)\n## [1] 1 2 0\n\nWe can add these counts to the original query object:\n\nTSSs$n_peaks <- countOverlaps(query = TSSs, subject = peaks)\nTSSs\n## GRanges object with 3 ranges and 2 metadata columns:\n## seqnames ranges strand | type n_peaks\n## <Rle> <IRanges> <Rle> | <character> <integer>\n## [1] chr1 348-368 + | TSSs 1\n## [2] chr1 1314-1334 + | TSSs 2\n## [3] chr1 2722-2742 + | TSSs 0\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\n\n\n\n%over%, %within%, %outside% : handy operators\n\nHandy operators exist that return logical vectors (same length as the query). 
They essentially are short-hands for specific findOverlaps() cases.\n<query> %over% <subject>:\n\npeaks %over% TSSs\n## [1] TRUE FALSE FALSE TRUE TRUE FALSE FALSE FALSE\n\npeaks[peaks %over% TSSs] # ----- Equivalent to `subsetByOverlaps(peaks, TSSs)`\n## GRanges object with 3 ranges and 1 metadata column:\n## seqnames ranges strand | type\n## <Rle> <IRanges> <Rle> | <character>\n## [1] chr1 320-418 * | peaks\n## [2] chr1 1221-1317 * | peaks\n## [3] chr1 1329-1372 * | peaks\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n<query> %within% <subject>:\n\npeaks %within% TSSs\n## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE\n\nTSSs %within% peaks\n## [1] TRUE FALSE FALSE\n\n<query> %outside% <subject>:\n\npeaks %outside% TSSs\n## [1] FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE\n\n\n\n\n\n\n\nGoing further\n\n\n\nTo fully grasp how to find overlaps between GRanges objects, we highly recommend reading the detailed documentation by typing ?IRanges::`findOverlaps-methods`.\n\n\n\n2.1.4.2 Find nearest range from a subject for each range in a query\n*Overlaps methods are not always enough to match a query to a subject. 
For instance, some peaks in the query might be very near to some TSSs in the subject, but not quite overlapping.\n\npeaks[8]\n## GRanges object with 1 range and 1 metadata column:\n## seqnames ranges strand | type\n## <Rle> <IRanges> <Rle> | <character>\n## [1] chr1 2746-2790 * | peaks\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\nTSSs[3]\n## GRanges object with 1 range and 2 metadata columns:\n## seqnames ranges strand | type n_peaks\n## <Rle> <IRanges> <Rle> | <character> <integer>\n## [1] chr1 2722-2742 + | TSSs 0\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\nnearest()\n\nRather than finding the overlapping range in a subject for each range in a query, we can find the nearest range.\nFor each range in the query, this returns the index of the range in the subject to which the query is the nearest.\n\nnearest(peaks, TSSs)\n## [1] 1 1 2 2 2 2 3 3\n\nTSSs[nearest(peaks, TSSs)]\n## GRanges object with 8 ranges and 2 metadata columns:\n## seqnames ranges strand | type n_peaks\n## <Rle> <IRanges> <Rle> | <character> <integer>\n## [1] chr1 348-368 + | TSSs 1\n## [2] chr1 348-368 + | TSSs 1\n## [3] chr1 1314-1334 + | TSSs 2\n## [4] chr1 1314-1334 + | TSSs 2\n## [5] chr1 1314-1334 + | TSSs 2\n## [6] chr1 1314-1334 + | TSSs 2\n## [7] chr1 2722-2742 + | TSSs 0\n## [8] chr1 2722-2742 + | TSSs 0\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\ndistance()\n\nAlternatively, one can simply ask to calculate the distanceToNearest between ranges in a query and ranges in a subject.\n\ndistanceToNearest(peaks, TSSs)\n## Hits object with 8 hits and 1 metadata column:\n## queryHits subjectHits | distance\n## <integer> <integer> | <integer>\n## [1] 1 1 | 0\n## [2] 2 1 | 143\n## [3] 3 2 | 421\n## [4] 4 2 | 0\n## [5] 5 2 | 0\n## [6] 6 2 | 517\n## [7] 7 3 | 189\n## [8] 8 3 | 3\n## -------\n## queryLength: 8 / subjectLength: 3\n\npeaks$distance_to_nearest_TSS <- 
mcols(distanceToNearest(peaks, TSSs))$distance\n\nNote how close from a TSS the 8th peak was. It could be worth considering this as an overlap!"
+ "text": "2.1 GRanges class\nGRanges is a shorthand for GenomicRanges, a core class in Bioconductor. This class is primarily used to describe genomic ranges of any nature, e.g. sets of promoters, SNPs, chromatin loop anchors, etc.\nThe data structure was described in the seminal 2015 publication by the Bioconductor team (Huber et al. (2015)).\n\n2.1.1 GRanges fundamentals\nThe easiest way to generate a GRanges object is to coerce it from a vector of genomic coordinates in the UCSC format (e.g. \"chr2:2004-4853\"):\n\nlibrary(GenomicRanges)\ngr <- GRanges(c(\n \"chr2:2004-7853:+\", \n \"chr4:4482-9873:-\", \n \"chr5:1943-4203:+\", \n \"chr5:4103-5004:+\" \n))\ngr\n## GRanges object with 4 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr2 2004-7853 +\n## [2] chr4 4482-9873 -\n## [3] chr5 1943-4203 +\n## [4] chr5 4103-5004 +\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\nA single GRanges object can contain one or several “ranges”, or genomic intervals. To navigate between these ranges, GRanges can be subset using the standard R single bracket notation [:\n\ngr[1]\n## GRanges object with 1 range and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr2 2004-7853 +\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\ngr[1:3]\n## GRanges object with 3 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr2 2004-7853 +\n## [2] chr4 4482-9873 -\n## [3] chr5 1943-4203 +\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\nGenomicRanges objects aim to provide a natural description of genomic intervals (ranges) and are incredibly versatile. They are organized much like a data.frame and have four required pieces of information:\n\n\nseqnames (i.e. 
chromosome names) (accessible with seqnames())\n\nstart (accessible with start())\n\nend (accessible with end())\n\nstrand (accessible with strand())\n\n\nseqnames(gr)\n## factor-Rle of length 4 with 3 runs\n## Lengths: 1 1 2\n## Values : chr2 chr4 chr5\n## Levels(3): chr2 chr4 chr5\n\nstart(gr)\n## [1] 2004 4482 1943 4103\n\nend(gr)\n## [1] 7853 9873 4203 5004\n\nstrand(gr)\n## factor-Rle of length 4 with 3 runs\n## Lengths: 1 1 2\n## Values : + - +\n## Levels(3): + - *\n\nHere is a graphical representation of a GRanges object, taken from Bioconductor course material:\n\nWe will now delve into the detailed structure and operability of GRanges objects.\n\n2.1.2 GRanges metadata\nAn important aspect of GRanges objects is that each entry (range) can have extra optional metadata. This metadata is stored in a rectangular DataFrame. Each column can contain a different type of information, e.g. a numerical vector, a factor, a list, …\nOne can directly access this DataFrame using the mcols() function, and individual columns of metadata using the $ notation:\n\nmcols(gr)\n## DataFrame with 4 rows and 0 columns\nmcols(gr)$GC <- c(0.45, 0.43, 0.44, 0.42)\nmcols(gr)$annotation <- factor(c(NA, 'promoter', 'enhancer', 'centromere'))\nmcols(gr)$extended.info <- c(\n list(c(NA)), \n list(c(date = 2023, source = 'manual')), \n list(c(date = 2021, source = 'manual')), \n list(c(date = 2019, source = 'homology'))\n)\nmcols(gr)\n## DataFrame with 4 rows and 3 columns\n## GC annotation extended.info\n## <numeric> <factor> <list>\n## 1 0.45 NA NA\n## 2 0.43 promoter 2023,manual\n## 3 0.44 enhancer 2021,manual\n## 4 0.42 centromere 2019,homology\n\nWhen metadata columns are defined for a GRanges object, they are pasted next to the minimal 4 required GRanges fields, separated by a | character.\n\ngr\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2004-7853 
+ | 0.45 NA <NA>\n## [2] chr4 4482-9873 - | 0.43 promoter 2023,manual\n## [3] chr5 1943-4203 + | 0.44 enhancer 2021,manual\n## [4] chr5 4103-5004 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n\n2.1.3 Genomic arithmetics on individual GRanges objects\nA GRanges object primarily describes a set of genomic ranges (it is in the name!). Useful genomic-oriented methods have been implemented to investigate individual GRanges object from a genomic perspective.\n\n2.1.3.1 Intra-range methods\nStandard genomic arithmetics are possible with GRanges, e.g. shifting ranges, resizing, trimming, … These methods are referred to as “intra-range” methods as they work “one-region-at-a-time”.\n\n\n\n\n\n\nNote\n\n\n\n\nEach range of the input GRanges object is modified independently from the other ranges in the following code chunks.\nIntra-range operations are endomorphisms: they all take GRanges inputs and always return GRanges objects.\n\n\n\n\nShifting each genomic range in a GRanges object by a certain number of bases:\n\n\ngr\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2004-7853 + | 0.45 NA <NA>\n## [2] chr4 4482-9873 - | 0.43 promoter 2023,manual\n## [3] chr5 1943-4203 + | 0.44 enhancer 2021,manual\n## [4] chr5 4103-5004 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Shift all genomic ranges towards the \"right\" (downstream in `+` strand), by 1000bp:\nshift(gr, 1000)\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 3004-8853 + | 0.45 NA <NA>\n## [2] chr4 5482-10873 - | 0.43 promoter 2023,manual\n## [3] chr5 2943-5203 + | 0.44 enhancer 2021,manual\n## [4] chr5 5103-6004 
+ | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Shift all genomic ranges towards the \"left\" (upstream in `+` strand), by 1000bp:\nshift(gr, -1000)\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 1004-6853 + | 0.45 NA <NA>\n## [2] chr4 3482-8873 - | 0.43 promoter 2023,manual\n## [3] chr5 943-3203 + | 0.44 enhancer 2021,manual\n## [4] chr5 3103-4004 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n\nNarrowing each genomic range in a GRanges object by a certain number of bases:\n\n\ngr\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2004-7853 + | 0.45 NA <NA>\n## [2] chr4 4482-9873 - | 0.43 promoter 2023,manual\n## [3] chr5 1943-4203 + | 0.44 enhancer 2021,manual\n## [4] chr5 4103-5004 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Extract 21st-40th subrange for each range in `gr`:\nnarrow(gr, start = 21, end = 40)\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2024-2043 + | 0.45 NA <NA>\n## [2] chr4 4502-4521 - | 0.43 promoter 2023,manual\n## [3] chr5 1963-1982 + | 0.44 enhancer 2021,manual\n## [4] chr5 4123-4142 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\nwidth(narrow(gr, start = 21, end = 40))\n## [1] 20 20 20 20\n\n\nResizing each genomic range in a GRanges object to a certain number of bases:\n\n\ngr\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand 
| GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2004-7853 + | 0.45 NA <NA>\n## [2] chr4 4482-9873 - | 0.43 promoter 2023,manual\n## [3] chr5 1943-4203 + | 0.44 enhancer 2021,manual\n## [4] chr5 4103-5004 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Resize `gr` entries to 100, fixed at the start of each range:\nresize(gr, 100, fix = \"start\")\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2004-2103 + | 0.45 NA <NA>\n## [2] chr4 9774-9873 - | 0.43 promoter 2023,manual\n## [3] chr5 1943-2042 + | 0.44 enhancer 2021,manual\n## [4] chr5 4103-4202 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Resize `gr` entries to 100, fixed at the start of each range, disregarding strand information:\nresize(gr, 100, fix = \"start\", ignore.strand = TRUE)\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2004-2103 + | 0.45 NA <NA>\n## [2] chr4 4482-4581 - | 0.43 promoter 2023,manual\n## [3] chr5 1943-2042 + | 0.44 enhancer 2021,manual\n## [4] chr5 4103-4202 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Resize `gr` entries to 1 bp, fixed at the center of each range:\nresize(gr, 1, fix = \"center\")\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 4928 + | 0.45 NA <NA>\n## [2] chr4 7177 - | 0.43 promoter 2023,manual\n## [3] chr5 3073 + | 0.44 enhancer 2021,manual\n## [4] chr5 4553 + | 0.42 centromere 2019,homology\n## 
-------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n\nExtracting flanking coordinates for each entry in gr:\n\n\ngr\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2004-7853 + | 0.45 NA <NA>\n## [2] chr4 4482-9873 - | 0.43 promoter 2023,manual\n## [3] chr5 1943-4203 + | 0.44 enhancer 2021,manual\n## [4] chr5 4103-5004 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Extract 100bp UPSTREAM of each genomic range, according to range strandness:\nflank(gr, 100, start = TRUE)\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 1904-2003 + | 0.45 NA <NA>\n## [2] chr4 9874-9973 - | 0.43 promoter 2023,manual\n## [3] chr5 1843-1942 + | 0.44 enhancer 2021,manual\n## [4] chr5 4003-4102 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n# ----- Extract 1bp DOWNSTREAM of each genomic range, according to range strandness:\nflank(gr, 1, start = FALSE)\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 7854 + | 0.45 NA <NA>\n## [2] chr4 4481 - | 0.43 promoter 2023,manual\n## [3] chr5 4204 + | 0.44 enhancer 2021,manual\n## [4] chr5 5005 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\nNote how here again, strand information is crucial and correctly leveraged to extract “upstream” or “downstream” flanking regions in agreement with genomic range orientation.\n\nSeveral arithmetics operators can also directly work with GRanges:\n\n\ngr\n## GRanges object with 4 ranges and 
3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2004-7853 + | 0.45 NA <NA>\n## [2] chr4 4482-9873 - | 0.43 promoter 2023,manual\n## [3] chr5 1943-4203 + | 0.44 enhancer 2021,manual\n## [4] chr5 4103-5004 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\ngr + 100 # ----- Extend each side of the `GRanges` by a given number of bases\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 1904-7953 + | 0.45 NA <NA>\n## [2] chr4 4382-9973 - | 0.43 promoter 2023,manual\n## [3] chr5 1843-4303 + | 0.44 enhancer 2021,manual\n## [4] chr5 4003-5104 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\ngr - 200 # ----- Shrink each side of the `GRanges` by a given number of bases \n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 2204-7653 + | 0.45 NA <NA>\n## [2] chr4 4682-9673 - | 0.43 promoter 2023,manual\n## [3] chr5 2143-4003 + | 0.44 enhancer 2021,manual\n## [4] chr5 4303-4804 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\ngr * 1000 # ----- Zoom in by a given factor (effectively decreasing the `GRanges` width by the same factor)\n## GRanges object with 4 ranges and 3 metadata columns:\n## seqnames ranges strand | GC annotation extended.info\n## <Rle> <IRanges> <Rle> | <numeric> <factor> <list>\n## [1] chr2 4926-4930 + | 0.45 NA <NA>\n## [2] chr4 7175-7179 - | 0.43 promoter 2023,manual\n## [3] chr5 3072-3073 + | 0.44 enhancer 2021,manual\n## [4] chr5 4554-4553 + | 0.42 centromere 2019,homology\n## -------\n## seqinfo: 3 sequences from 
an unspecified genome; no seqlengths\n\n\n\n\n\n\n\nGoing further\n\n\n\nTo fully grasp how to operate GRanges objects, we highly recommend reading the detailed documentation for this class by typing ?GenomicRanges and ?GenomicRanges::`intra-range-methods`.\n\n\n\n2.1.3.2 Inter-range methods\nCompared to the “intra-range” methods described above, inter-range methods involve comparisons between ranges in a single GRanges object.\n\n\n\n\n\n\nNote\n\n\n\nCompared to the previous section, the result of each function described below depends on the entire set of ranges in the input GRanges object.\n\n\n\nComputing the “inverse” genomic ranges, i.e. ranges in-between the input ranges:\n\n\ngaps(gr)\n## GRanges object with 3 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr2 1-2003 +\n## [2] chr4 1-4481 -\n## [3] chr5 1-1942 +\n## -------\n## seqinfo: 3 sequences from an unspecified genome; no seqlengths\n\n\nFor each entry in a GRanges, finding the index of the preceding/following/nearest genomic range:\n\n\nprecede(gr)\n## [1] NA NA NA NA\n\nfollow(gr)\n## [1] NA NA NA NA\n\nnearest(gr)\n## [1] NA NA 4 3\n\n\nComputing a coverage over a genome, optionally weighted by a “score” column from the metadata:\n\n\ncoverage(gr, weight = 'GC')\n## RleList of length 3\n## $chr2\n## numeric-Rle of length 7853 with 2 runs\n## Lengths: 2003 5850\n## Values : 0.00 0.45\n## \n## $chr4\n## numeric-Rle of length 9873 with 2 runs\n## Lengths: 4481 5392\n## Values : 0.00 0.43\n## \n## $chr5\n## numeric-Rle of length 5004 with 4 runs\n## Lengths: 1942 2160 101 801\n## Values : 0.00 0.44 0.86 0.42\n\n\n\n\n\n\n\nGoing further\n\n\n\nTo fully grasp how to operate GRanges objects, we highly recommend reading the detailed documentation for this class by typing ?GenomicRanges::`inter-range-methods`.\n\n\n\n2.1.4 Comparing multiple GRanges objects\nGenomic analysis typically requires intersecting two sets of genomic ranges, e.g. 
to find which ranges from one set overlap with those from another set.\nIn the next examples, we will use two GRanges:\n\n\npeaks represents 8 dummy ChIP-seq peaks\n\n\npeaks <- GRanges(c(\n 'chr1:320-418',\n 'chr1:512-567',\n 'chr1:843-892',\n 'chr1:1221-1317', \n 'chr1:1329-1372', \n 'chr1:1852-1909', \n 'chr1:2489-2532', \n 'chr1:2746-2790'\n))\npeaks\n## GRanges object with 8 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr1 320-418 *\n## [2] chr1 512-567 *\n## [3] chr1 843-892 *\n## [4] chr1 1221-1317 *\n## [5] chr1 1329-1372 *\n## [6] chr1 1852-1909 *\n## [7] chr1 2489-2532 *\n## [8] chr1 2746-2790 *\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\n\nTSSs represents 3 dummy gene promoters (± 10 bp around each TSS)\n\n\ngenes <- GRanges(c(\n 'chr1:358-1292:+',\n 'chr1:1324-2343:+', \n 'chr1:2732-2751:+'\n))\nTSSs <- resize(genes, width = 1, fix = 'start') + 10\nTSSs\n## GRanges object with 3 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr1 348-368 +\n## [2] chr1 1314-1334 +\n## [3] chr1 2722-2742 +\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\nLet’s see how they overlap by plotting them:\n\nlibrary(ggplot2)\npeaks$type <- 'peaks'\nTSSs$type <- 'TSSs'\nggplot() + \n ggbio::geom_rect(c(peaks, TSSs), aes(fill = type), facets = type~.) 
+ \n ggbio::theme_alignment() + \n coord_fixed(ratio = 300)\n## Registered S3 method overwritten by 'GGally':\n## method from \n## +.gg ggplot2\n## Warning: The `facets` argument of `facet_grid()` is deprecated as of ggplot2 2.2.0.\n## ℹ Please use the `rows` argument instead.\n## ℹ The deprecated feature was likely used in the ggbio package.\n## Please report the issue at <https://github.com/lawremi/ggbio/issues>.\n## Scale for y is already present.\n## Adding another scale for y, which will replace the existing scale.\n\n\n\n\n\n\n\n\n2.1.4.1 Finding overlaps between two GRanges sets\n\nFinding overlaps between a query and a subject\n\nIn our case, we want to identify which ChIP-seq peaks overlap with a TSS: the query is the set of peaks and the subject is the set of TSSs.\nfindOverlaps returns a Hits object listing which query ranges overlap with which subject ranges.\n\nov <- findOverlaps(query = peaks, subject = TSSs)\nov\n## Hits object with 3 hits and 0 metadata columns:\n## queryHits subjectHits\n## <integer> <integer>\n## [1] 1 1\n## [2] 4 2\n## [3] 5 2\n## -------\n## queryLength: 8 / subjectLength: 3\n\nThe Hits output clearly describes what overlaps with what:\n\nThe query (peak) #1 overlaps with subject (TSS) #1\n\nThe query (peak) #5 overlaps with subject (TSS) #2\n\n\n\n\n\n\n\n\nNote\n\n\n\nBecause no other query index or subject index is listed in the ov output, none of the remaining ranges from query overlap with ranges from subject.\n\n\n\nSubsetting by overlaps between a query and a subject\n\nTo directly subset ranges from query overlapping with ranges from a subject (e.g. to only keep peaks overlapping a TSS), we can use the subsetByOverlaps function. 
The output of subsetByOverlaps is a subset of the original GRanges object provided as a query, with retained ranges being unmodified.\n\nsubsetByOverlaps(peaks, TSSs)\n## GRanges object with 3 ranges and 1 metadata column:\n## seqnames ranges strand | type\n## <Rle> <IRanges> <Rle> | <character>\n## [1] chr1 320-418 * | peaks\n## [2] chr1 1221-1317 * | peaks\n## [3] chr1 1329-1372 * | peaks\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\nCounting overlaps between a query and a subject\n\nFinally, the countOverlaps function counts, for each range in a query, how many ranges in the subject it overlaps with.\n\ncountOverlaps(query = peaks, subject = TSSs)\n## [1] 1 0 0 1 1 0 0 0\n\n\n\n\n\n\n\nNote\n\n\n\nNote that which GRanges goes in query or subject is crucial! Counting, for each peak, the number of TSSs it overlaps with is very different from counting, for each TSS, the number of peaks it overlaps with.\nIn our example, it would also be informative to count how many peaks overlap with each TSS, so we’d need to swap query and subject:\n\ncountOverlaps(query = TSSs, subject = peaks)\n## [1] 1 2 0\n\nWe can add these counts to the original query object:\n\nTSSs$n_peaks <- countOverlaps(query = TSSs, subject = peaks)\nTSSs\n## GRanges object with 3 ranges and 2 metadata columns:\n## seqnames ranges strand | type n_peaks\n## <Rle> <IRanges> <Rle> | <character> <integer>\n## [1] chr1 348-368 + | TSSs 1\n## [2] chr1 1314-1334 + | TSSs 2\n## [3] chr1 2722-2742 + | TSSs 0\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\n\n\n\n%over%, %within%, %outside% : handy operators\n\nHandy operators exist that return logical vectors (same length as the query). 
They essentially are short-hands for specific findOverlaps() cases.\n<query> %over% <subject>:\n\npeaks %over% TSSs\n## [1] TRUE FALSE FALSE TRUE TRUE FALSE FALSE FALSE\n\npeaks[peaks %over% TSSs] # ----- Equivalent to `subsetByOverlaps(peaks, TSSs)`\n## GRanges object with 3 ranges and 1 metadata column:\n## seqnames ranges strand | type\n## <Rle> <IRanges> <Rle> | <character>\n## [1] chr1 320-418 * | peaks\n## [2] chr1 1221-1317 * | peaks\n## [3] chr1 1329-1372 * | peaks\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n<query> %within% <subject>:\n\npeaks %within% TSSs\n## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE\n\nTSSs %within% peaks\n## [1] TRUE FALSE FALSE\n\n<query> %outside% <subject>:\n\npeaks %outside% TSSs\n## [1] FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE\n\n\n\n\n\n\n\nGoing further\n\n\n\nTo fully grasp how to find overlaps between GRanges objects, we highly recommend reading the detailed documentation by typing ?IRanges::`findOverlaps-methods`.\n\n\n\n2.1.4.2 Find nearest range from a subject for each range in a query\n*Overlaps methods are not always enough to match a query to a subject. 
For instance, some peaks in the query might be very near to some TSSs in the subject, but not quite overlapping.\n\npeaks[8]\n## GRanges object with 1 range and 1 metadata column:\n## seqnames ranges strand | type\n## <Rle> <IRanges> <Rle> | <character>\n## [1] chr1 2746-2790 * | peaks\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\nTSSs[3]\n## GRanges object with 1 range and 2 metadata columns:\n## seqnames ranges strand | type n_peaks\n## <Rle> <IRanges> <Rle> | <character> <integer>\n## [1] chr1 2722-2742 + | TSSs 0\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\nnearest()\n\nRather than finding the overlapping range in a subject for each range in a query, we can find the nearest range.\nFor each range in the query, this returns the index of the range in the subject to which the query is the nearest.\n\nnearest(peaks, TSSs)\n## [1] 1 1 2 2 2 2 3 3\n\nTSSs[nearest(peaks, TSSs)]\n## GRanges object with 8 ranges and 2 metadata columns:\n## seqnames ranges strand | type n_peaks\n## <Rle> <IRanges> <Rle> | <character> <integer>\n## [1] chr1 348-368 + | TSSs 1\n## [2] chr1 348-368 + | TSSs 1\n## [3] chr1 1314-1334 + | TSSs 2\n## [4] chr1 1314-1334 + | TSSs 2\n## [5] chr1 1314-1334 + | TSSs 2\n## [6] chr1 1314-1334 + | TSSs 2\n## [7] chr1 2722-2742 + | TSSs 0\n## [8] chr1 2722-2742 + | TSSs 0\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\ndistance()\n\nAlternatively, one can simply ask to calculate the distanceToNearest between ranges in a query and ranges in a subject.\n\ndistanceToNearest(peaks, TSSs)\n## Hits object with 8 hits and 1 metadata column:\n## queryHits subjectHits | distance\n## <integer> <integer> | <integer>\n## [1] 1 1 | 0\n## [2] 2 1 | 143\n## [3] 3 2 | 421\n## [4] 4 2 | 0\n## [5] 5 2 | 0\n## [6] 6 2 | 517\n## [7] 7 3 | 189\n## [8] 8 3 | 3\n## -------\n## queryLength: 8 / subjectLength: 3\n\npeaks$distance_to_nearest_TSS <- 
mcols(distanceToNearest(peaks, TSSs))$distance\n\nNote how close to a TSS the 8th peak is (only 3 bp away). It could be worth considering this as an overlap!"
},
{
"objectID": "data-representation.html#ginteractions-class",
"href": "data-representation.html#ginteractions-class",
"title": "\n2 Hi-C data structures in R\n",
"section": "\n2.2 GInteractions class",
- "text": "2.2 GInteractions class\nGRanges describe genomic ranges and hence are of general use to study 1D genome organization. To study chromatin interactions, we need a way to link pairs of GRanges. This is exactly what the GInteractions class does. This data structure is defined in the InteractionSet package and has been published in the 2016 paper by Lun et al. (Lun et al. (2016)).\n\n\n2.2.1 Building a GInteractions object from scratch\nLet’s first define two parallel GRanges objects (i.e. two GRanges of the same length). Each GRanges will contain 5 ranges.\n\ngr_first <- GRanges(c(\n 'chr1:1-100', \n 'chr1:1001-2000', \n 'chr1:5001-6000', \n 'chr1:8001-9000', \n 'chr1:7001-8000' \n))\ngr_second <- GRanges(c(\n 'chr1:1-100', \n 'chr1:3001-4000', \n 'chr1:8001-9000', \n 'chr1:7001-8000', \n 'chr2:13000-14000' \n))\n\nBecause these two GRanges objects are of the same length (5), one can “bind” them together by using the GInteractions function. This effectively associates each entry from one GRanges with the entry at the same position in the other GRanges object.\n\nlibrary(InteractionSet)\ngi <- GInteractions(gr_first, gr_second)\ngi\n## GInteractions object with 5 interactions and 0 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2\n## <Rle> <IRanges> <Rle> <IRanges>\n## [1] chr1 1-100 --- chr1 1-100\n## [2] chr1 1001-2000 --- chr1 3001-4000\n## [3] chr1 5001-6000 --- chr1 8001-9000\n## [4] chr1 8001-9000 --- chr1 7001-8000\n## [5] chr1 7001-8000 --- chr2 13000-14000\n## -------\n## regions: 7 ranges and 0 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nThe way GInteractions objects are printed in an R console mimics that of GRanges, but pairs two “ends” (a.k.a. 
anchors) of an interaction together, each end being represented as a separate GRanges range.\n\n\n\n\n\n\nNotes\n\n\n\n\nNote that it is possible to have interactions joining two identical anchors.\n\n\ngi[1]\n## GInteractions object with 1 interaction and 0 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2\n## <Rle> <IRanges> <Rle> <IRanges>\n## [1] chr1 1-100 --- chr1 1-100\n## -------\n## regions: 7 ranges and 0 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\nIt is also technically possible (though not advised) to have interactions for which the “first” end is located after the “second” end along the chromosome.\n\n\ngi[4]\n## GInteractions object with 1 interaction and 0 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2\n## <Rle> <IRanges> <Rle> <IRanges>\n## [1] chr1 8001-9000 --- chr1 7001-8000\n## -------\n## regions: 7 ranges and 0 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\nFinally, it is possible to define inter-chromosomal interactions (a.k.a. trans interactions).\n\n\ngi[5]\n## GInteractions object with 1 interaction and 0 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2\n## <Rle> <IRanges> <Rle> <IRanges>\n## [1] chr1 7001-8000 --- chr2 13000-14000\n## -------\n## regions: 7 ranges and 0 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\n\n\n2.2.2 GInteractions specific slots\nCompared to GRanges, extra slots are available for GInteractions objects, e.g. anchors and regions.\n\n2.2.2.1 Anchors\n“Anchors” of a single genomic interaction refer to the two ends of this interaction. These anchors can be extracted from a GInteractions object using the anchors() function. 
This outputs a list of two GRanges, the first corresponding to the “left” end of interactions (when printed to the console) and the second corresponding to the “right” end of interactions (when printed to the console).\n\n# ----- This extracts the two sets of anchors (\"first\" and \"second\") from a GInteractions object\nanchors(gi)\n## $first\n## GRanges object with 5 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr1 1-100 *\n## [2] chr1 1001-2000 *\n## [3] chr1 5001-6000 *\n## [4] chr1 8001-9000 *\n## [5] chr1 7001-8000 *\n## -------\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n## \n## $second\n## GRanges object with 5 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr1 1-100 *\n## [2] chr1 3001-4000 *\n## [3] chr1 8001-9000 *\n## [4] chr1 7001-8000 *\n## [5] chr2 13000-14000 *\n## -------\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n# ----- We can query for the \"first\" or \"second\" set of anchors directly\nanchors(gi, \"first\")\n## GRanges object with 5 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr1 1-100 *\n## [2] chr1 1001-2000 *\n## [3] chr1 5001-6000 *\n## [4] chr1 8001-9000 *\n## [5] chr1 7001-8000 *\n## -------\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nanchors(gi, \"second\")\n## GRanges object with 5 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr1 1-100 *\n## [2] chr1 3001-4000 *\n## [3] chr1 8001-9000 *\n## [4] chr1 7001-8000 *\n## [5] chr2 13000-14000 *\n## -------\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\n2.2.2.2 Regions\n“Regions” of a set of interactions refer to the universe of unique anchors represented in a set of interactions. 
Therefore, the length of the regions can only be equal to or strictly lower than twice the length of anchors.\nThe regions function returns the regions associated with a GInteractions object, stored as a GRanges object.\n\nregions(gi)\n## GRanges object with 7 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr1 1-100 *\n## [2] chr1 1001-2000 *\n## [3] chr1 3001-4000 *\n## [4] chr1 5001-6000 *\n## [5] chr1 7001-8000 *\n## [6] chr1 8001-9000 *\n## [7] chr2 13000-14000 *\n## -------\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nlength(regions(gi))\n## [1] 7\n\nlength(anchors(gi, \"first\"))\n## [1] 5\n\n\n2.2.3 GInteractions methods\nGInteractions behave as an extension of GRanges. For this reason, many methods that work with GRanges will work seamlessly with GInteractions.\n\n2.2.3.1 Metadata\nOne can add metadata columns directly to a GInteractions object.\n\nmcols(gi)\n## DataFrame with 5 rows and 0 columns\nmcols(gi) <- data.frame(\n idx = seq(1, length(gi)),\n type = c(\"cis\", \"cis\", \"cis\", \"trans\", \"cis\")\n)\ngi\n## GInteractions object with 5 interactions and 2 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | idx type\n## <Rle> <IRanges> <Rle> <IRanges> | <integer> <character>\n## [1] chr1 1-100 --- chr1 1-100 | 1 cis\n## [2] chr1 1001-2000 --- chr1 3001-4000 | 2 cis\n## [3] chr1 5001-6000 --- chr1 8001-9000 | 3 cis\n## [4] chr1 8001-9000 --- chr1 7001-8000 | 4 trans\n## [5] chr1 7001-8000 --- chr2 13000-14000 | 5 cis\n## -------\n## regions: 7 ranges and 0 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\ngi$type\n## [1] \"cis\" \"cis\" \"cis\" \"trans\" \"cis\"\n\nImportantly, metadata columns can also be directly added to regions of a GInteractions object, since these regions are a GRanges object themselves!\n\nregions(gi)\n## GRanges object with 7 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## 
[1] chr1 1-100 *\n## [2] chr1 1001-2000 *\n## [3] chr1 3001-4000 *\n## [4] chr1 5001-6000 *\n## [5] chr1 7001-8000 *\n## [6] chr1 8001-9000 *\n## [7] chr2 13000-14000 *\n## -------\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\nregions(gi)$binID <- seq_along(regions(gi))\nregions(gi)$type <- c(\"P\", \"P\", \"P\", \"E\", \"E\", \"P\", \"P\")\nregions(gi)\n## GRanges object with 7 ranges and 2 metadata columns:\n## seqnames ranges strand | binID type\n## <Rle> <IRanges> <Rle> | <integer> <character>\n## [1] chr1 1-100 * | 1 P\n## [2] chr1 1001-2000 * | 2 P\n## [3] chr1 3001-4000 * | 3 P\n## [4] chr1 5001-6000 * | 4 E\n## [5] chr1 7001-8000 * | 5 E\n## [6] chr1 8001-9000 * | 6 P\n## [7] chr2 13000-14000 * | 7 P\n## -------\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\n2.2.3.2 Sorting GInteractions\n\nThe sort function works seamlessly with GInteractions objects. It sorts the interactions using a similar approach to that performed by pairtools sort ... 
for disk-stored .pairs files, sorting on the “first” anchor first, then for interactions with the same “first” anchors, sorting on the “second” anchor.\n\ngi\n## GInteractions object with 5 interactions and 2 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | idx type\n## <Rle> <IRanges> <Rle> <IRanges> | <integer> <character>\n## [1] chr1 1-100 --- chr1 1-100 | 1 cis\n## [2] chr1 1001-2000 --- chr1 3001-4000 | 2 cis\n## [3] chr1 5001-6000 --- chr1 8001-9000 | 3 cis\n## [4] chr1 8001-9000 --- chr1 7001-8000 | 4 trans\n## [5] chr1 7001-8000 --- chr2 13000-14000 | 5 cis\n## -------\n## regions: 7 ranges and 2 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nsort(gi)\n## GInteractions object with 5 interactions and 2 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | idx type\n## <Rle> <IRanges> <Rle> <IRanges> | <integer> <character>\n## [1] chr1 1-100 --- chr1 1-100 | 1 cis\n## [2] chr1 1001-2000 --- chr1 3001-4000 | 2 cis\n## [3] chr1 5001-6000 --- chr1 8001-9000 | 3 cis\n## [4] chr1 7001-8000 --- chr2 13000-14000 | 5 cis\n## [5] chr1 8001-9000 --- chr1 7001-8000 | 4 trans\n## -------\n## regions: 7 ranges and 2 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\n2.2.3.3 Swapping GInteractions anchors\nFor an individual interaction contained in a GInteractions object, the “first” and “second” anchors themselves can be sorted as well. This is called “pairs swapping”, and it is performed similarly to pairtools flip ... for disk-stored .pairs files. 
This ensures that interactions, when represented as a contact matrix, generate an upper-triangular matrix.\n\ngi\n## GInteractions object with 5 interactions and 2 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | idx type\n## <Rle> <IRanges> <Rle> <IRanges> | <integer> <character>\n## [1] chr1 1-100 --- chr1 1-100 | 1 cis\n## [2] chr1 1001-2000 --- chr1 3001-4000 | 2 cis\n## [3] chr1 5001-6000 --- chr1 8001-9000 | 3 cis\n## [4] chr1 8001-9000 --- chr1 7001-8000 | 4 trans\n## [5] chr1 7001-8000 --- chr2 13000-14000 | 5 cis\n## -------\n## regions: 7 ranges and 2 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nswapAnchors(gi)\n## GInteractions object with 5 interactions and 2 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | idx type\n## <Rle> <IRanges> <Rle> <IRanges> | <integer> <character>\n## [1] chr1 1-100 --- chr1 1-100 | 1 cis\n## [2] chr1 1001-2000 --- chr1 3001-4000 | 2 cis\n## [3] chr1 5001-6000 --- chr1 8001-9000 | 3 cis\n## [4] chr1 7001-8000 --- chr1 8001-9000 | 4 trans\n## [5] chr1 7001-8000 --- chr2 13000-14000 | 5 cis\n## -------\n## regions: 7 ranges and 2 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\n\n\n\n\n\nNote\n\n\n\n“Sorting” and “swapping” a GInteractions object are two entirely different actions:\n\n“sorting” reorganizes all rows (interactions);\n“swapping” anchors reorganizes “first” and “second” anchors for each interaction independently.\n\n\n\n\n2.2.3.4 GInteractions distance method\n“Distance”, when applied to genomic interactions, typically refers to the genomic distance between the two anchors of a single interaction. 
For GInteractions, this is computed using the pairdist function.\n\ngi\n## GInteractions object with 5 interactions and 2 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | idx type\n## <Rle> <IRanges> <Rle> <IRanges> | <integer> <character>\n## [1] chr1 1-100 --- chr1 1-100 | 1 cis\n## [2] chr1 1001-2000 --- chr1 3001-4000 | 2 cis\n## [3] chr1 5001-6000 --- chr1 8001-9000 | 3 cis\n## [4] chr1 8001-9000 --- chr1 7001-8000 | 4 trans\n## [5] chr1 7001-8000 --- chr2 13000-14000 | 5 cis\n## -------\n## regions: 7 ranges and 2 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\npairdist(gi)\n## [1] 0 2000 3000 1000 NA\n\nNote that for “trans” inter-chromosomal interactions, i.e. interactions with anchors on different chromosomes, the notion of genomic distance is meaningless and for this reason, pairdist returns a NA value.\n\n\n\n\n\n\nAdvanced pairdist arguments\n\n\n\nThe type argument can be tweaked to specify which type of “distance” should be computed:\n\n\nmid: The distance between the midpoints of the two regions (rounded down to the nearest integer) is returned (Default).\n\ngap: The length of the gap between the closest points of the two regions is computed - negative lengths are returned for overlapping regions, indicating the length of the overlap.\n\nspan: The distance between the furthermost points of the two regions is computed.\n\ndiag: The difference between the anchor indices is returned. 
This corresponds to a diagonal on the interaction space when bins are used in the ‘regions’ slot of ‘x’.\n\n\n\n\n2.2.3.5 GInteractions overlap methods\n“Overlaps” for genomic interactions could be computed in different contexts:\n\nCase 1: Overlap between any of the two anchors of an interaction with a genomic range\nCase 2: Overlap between anchors of an interaction with anchors of another interaction\nCase 3: Spanning of the interaction “across” a genomic range\n\n\nCase 1: Overlap between any of the two anchors of an interaction with a genomic range\n\nThis is the default behavior of findOverlaps when providing a GInteractions object as query and a GRanges as a subject.\n\ngr <- GRanges(c(\"chr1:7501-7600\", \"chr1:8501-8600\"))\nfindOverlaps(query = gi, subject = gr)\n## Hits object with 4 hits and 0 metadata columns:\n## queryHits subjectHits\n## <integer> <integer>\n## [1] 3 2\n## [2] 4 1\n## [3] 4 2\n## [4] 5 1\n## -------\n## queryLength: 5 / subjectLength: 2\n\ncountOverlaps(gi, gr)\n## [1] 0 0 1 2 1\n\nsubsetByOverlaps(gi, gr)\n## GInteractions object with 3 interactions and 2 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | idx type\n## <Rle> <IRanges> <Rle> <IRanges> | <integer> <character>\n## [1] chr1 5001-6000 --- chr1 8001-9000 | 3 cis\n## [2] chr1 8001-9000 --- chr1 7001-8000 | 4 trans\n## [3] chr1 7001-8000 --- chr2 13000-14000 | 5 cis\n## -------\n## regions: 7 ranges and 2 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nHere again, the order matters!\n\ncountOverlaps(gr, gi)\n## [1] 2 2\n\nAnd again, the %over% operator can be used here:\n\ngi %over% gr\n## [1] FALSE FALSE TRUE TRUE TRUE\n\ngi[gi %over% gr] # ----- Equivalent to `subsetByOverlaps(gi, gr)`\n## GInteractions object with 3 interactions and 2 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | idx type\n## <Rle> <IRanges> <Rle> <IRanges> | <integer> <character>\n## [1] chr1 5001-6000 --- chr1 8001-9000 | 3 cis\n## [2] chr1 
8001-9000 --- chr1 7001-8000 | 4 trans\n## [3] chr1 7001-8000 --- chr2 13000-14000 | 5 cis\n## -------\n## regions: 7 ranges and 2 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\nCase 2: Overlap between anchors of an interaction with anchors of another interaction\n\nThis slightly different scenario involves overlapping two sets of interactions, to see whether any interaction in Set-1 has its two anchors overlapping anchors from an interaction in Set-2.\n\ngi2 <- GInteractions(\n GRanges(\"chr1:1081-1090\"), \n GRanges(\"chr1:3401-3501\")\n)\ngi %over% gi2\n## [1] FALSE TRUE FALSE FALSE FALSE\n\nNote that, with this method, both anchors of an interaction from the query have to overlap the pair of anchors of a single interaction from the subject!\n\ngi3 <- GInteractions(\n GRanges(\"chr1:1-1000\"), \n GRanges(\"chr1:3401-3501\")\n)\ngi %over% gi3\n## [1] FALSE FALSE FALSE FALSE FALSE\n\n\nCase 3: Spanning of the interaction “across” a genomic range\n\nThis requires a bit of wrangling, to mimic an overlap between two GRanges objects:\n\ngi <- swapAnchors(gi) # ----- Make sure anchors are correctly sorted\ngi <- sort(gi) # ----- Make sure interactions are correctly sorted\ngi <- gi[!is.na(pairdist(gi))] # ----- Remove inter-chromosomal interactions\nspanning_gi <- GRanges(\n seqnames = seqnames(anchors(gi)[[1]]), \n ranges = IRanges(\n start(anchors(gi)[[1]]), \n end(anchors(gi)[[2]])\n )\n)\nspanning_gi \n## GRanges object with 4 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr1 1-100 *\n## [2] chr1 1001-4000 *\n## [3] chr1 5001-9000 *\n## [4] chr1 7001-9000 *\n## -------\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nspanning_gi %over% gr\n## [1] FALSE FALSE TRUE TRUE\n\n\n\n\n\n\n\nGoing further\n\n\n\nA detailed manual of overlap methods available for GInteractions objects can be read by typing ?`Interaction-overlaps` in R."
+ "text": "2.2 GInteractions class\nGRanges describe genomic ranges and hence are of general use to study 1D genome organization. To study chromatin interactions, we need a way to link pairs of GRanges. This is exactly what the GInteractions class does. This data structure is defined in the InteractionSet package and was described by Lun et al. (2016).\n\n\n2.2.1 Building a GInteractions object from scratch\nLet’s first define two parallel GRanges objects (i.e. two GRanges of the same length). Each GRanges will contain 5 ranges.\n\ngr_first <- GRanges(c(\n 'chr1:1-100', \n 'chr1:1001-2000', \n 'chr1:5001-6000', \n 'chr1:8001-9000', \n 'chr1:7001-8000' \n))\ngr_second <- GRanges(c(\n 'chr1:1-100', \n 'chr1:3001-4000', \n 'chr1:8001-9000', \n 'chr1:7001-8000', \n 'chr2:13000-14000' \n))\n\nBecause these two GRanges objects are of the same length (5), one can “bind” them together by using the GInteractions function. This effectively associates each entry from one GRanges with the aligned entry in the other GRanges object.\n\nlibrary(InteractionSet)\ngi <- GInteractions(gr_first, gr_second)\ngi\n## GInteractions object with 5 interactions and 0 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2\n## <Rle> <IRanges> <Rle> <IRanges>\n## [1] chr1 1-100 --- chr1 1-100\n## [2] chr1 1001-2000 --- chr1 3001-4000\n## [3] chr1 5001-6000 --- chr1 8001-9000\n## [4] chr1 8001-9000 --- chr1 7001-8000\n## [5] chr1 7001-8000 --- chr2 13000-14000\n## -------\n## regions: 7 ranges and 0 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nThe way GInteractions objects are printed in an R console mimics that of GRanges, but pairs two “ends” (a.k.a. 
anchors) of an interaction together, each end being represented as a separate GRanges range.\n\nNote that it is possible to have interactions joining two identical anchors.\n\n\ngi[1]\n## GInteractions object with 1 interaction and 0 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2\n## <Rle> <IRanges> <Rle> <IRanges>\n## [1] chr1 1-100 --- chr1 1-100\n## -------\n## regions: 7 ranges and 0 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\nIt is also technically possible (though not advised) to have interactions for which the “first” end is located after the “second” end along the chromosome.\n\n\ngi[4]\n## GInteractions object with 1 interaction and 0 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2\n## <Rle> <IRanges> <Rle> <IRanges>\n## [1] chr1 8001-9000 --- chr1 7001-8000\n## -------\n## regions: 7 ranges and 0 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\nFinally, it is possible to define inter-chromosomal interactions (a.k.a. trans interactions).\n\n\ngi[5]\n## GInteractions object with 1 interaction and 0 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2\n## <Rle> <IRanges> <Rle> <IRanges>\n## [1] chr1 7001-8000 --- chr2 13000-14000\n## -------\n## regions: 7 ranges and 0 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\n2.2.2 GInteractions specific slots\nCompared to GRanges, extra slots are available for GInteractions objects, e.g. anchors and regions.\n\n2.2.2.1 Anchors\n“Anchors” of a single genomic interaction refer to the two ends of this interaction. These anchors can be extracted from a GInteractions object using the anchors() function. 
This outputs a list of two GRanges, the first corresponding to the “left” end of interactions (when printed to the console) and the second corresponding to the “right” end of interactions (when printed to the console).\n\n# ----- This extracts the two sets of anchors (\"first\" and \"second\") from a GInteractions object\nanchors(gi)\n## $first\n## GRanges object with 5 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr1 1-100 *\n## [2] chr1 1001-2000 *\n## [3] chr1 5001-6000 *\n## [4] chr1 8001-9000 *\n## [5] chr1 7001-8000 *\n## -------\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n## \n## $second\n## GRanges object with 5 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr1 1-100 *\n## [2] chr1 3001-4000 *\n## [3] chr1 8001-9000 *\n## [4] chr1 7001-8000 *\n## [5] chr2 13000-14000 *\n## -------\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n# ----- We can query for the \"first\" or \"second\" set of anchors directly\nanchors(gi, \"first\")\n## GRanges object with 5 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr1 1-100 *\n## [2] chr1 1001-2000 *\n## [3] chr1 5001-6000 *\n## [4] chr1 8001-9000 *\n## [5] chr1 7001-8000 *\n## -------\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nanchors(gi, \"second\")\n## GRanges object with 5 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr1 1-100 *\n## [2] chr1 3001-4000 *\n## [3] chr1 8001-9000 *\n## [4] chr1 7001-8000 *\n## [5] chr2 13000-14000 *\n## -------\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\n2.2.2.2 Regions\n“Regions” of a set of interactions refer to the universe of unique anchors represented in a set of interactions. 
Therefore, the length of the regions can only be equal to or strictly lower than twice the length of anchors.\nThe regions function returns the regions associated with a GInteractions object, stored as a GRanges object.\n\nregions(gi)\n## GRanges object with 7 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr1 1-100 *\n## [2] chr1 1001-2000 *\n## [3] chr1 3001-4000 *\n## [4] chr1 5001-6000 *\n## [5] chr1 7001-8000 *\n## [6] chr1 8001-9000 *\n## [7] chr2 13000-14000 *\n## -------\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nlength(regions(gi))\n## [1] 7\n\nlength(anchors(gi, \"first\"))\n## [1] 5\n\n\n2.2.3 GInteractions methods\nGInteractions behave as an extension of GRanges. For this reason, many methods that work with GRanges will work seamlessly with GInteractions.\n\n2.2.3.1 Metadata\nOne can add metadata columns directly to a GInteractions object.\n\nmcols(gi)\n## DataFrame with 5 rows and 0 columns\nmcols(gi) <- data.frame(\n idx = seq(1, length(gi)),\n type = c(\"cis\", \"cis\", \"cis\", \"trans\", \"cis\")\n)\ngi\n## GInteractions object with 5 interactions and 2 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | idx type\n## <Rle> <IRanges> <Rle> <IRanges> | <integer> <character>\n## [1] chr1 1-100 --- chr1 1-100 | 1 cis\n## [2] chr1 1001-2000 --- chr1 3001-4000 | 2 cis\n## [3] chr1 5001-6000 --- chr1 8001-9000 | 3 cis\n## [4] chr1 8001-9000 --- chr1 7001-8000 | 4 trans\n## [5] chr1 7001-8000 --- chr2 13000-14000 | 5 cis\n## -------\n## regions: 7 ranges and 0 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\ngi$type\n## [1] \"cis\" \"cis\" \"cis\" \"trans\" \"cis\"\n\nImportantly, metadata columns can also be directly added to regions of a GInteractions object, since these regions are a GRanges object themselves!\n\nregions(gi)\n## GRanges object with 7 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## 
[1] chr1 1-100 *\n## [2] chr1 1001-2000 *\n## [3] chr1 3001-4000 *\n## [4] chr1 5001-6000 *\n## [5] chr1 7001-8000 *\n## [6] chr1 8001-9000 *\n## [7] chr2 13000-14000 *\n## -------\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\nregions(gi)$binID <- seq_along(regions(gi))\nregions(gi)$type <- c(\"P\", \"P\", \"P\", \"E\", \"E\", \"P\", \"P\")\nregions(gi)\n## GRanges object with 7 ranges and 2 metadata columns:\n## seqnames ranges strand | binID type\n## <Rle> <IRanges> <Rle> | <integer> <character>\n## [1] chr1 1-100 * | 1 P\n## [2] chr1 1001-2000 * | 2 P\n## [3] chr1 3001-4000 * | 3 P\n## [4] chr1 5001-6000 * | 4 E\n## [5] chr1 7001-8000 * | 5 E\n## [6] chr1 8001-9000 * | 6 P\n## [7] chr2 13000-14000 * | 7 P\n## -------\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\n2.2.3.2 Sorting GInteractions\n\nThe sort function works seamlessly with GInteractions objects. It sorts the interactions using a similar approach to that performed by pairtools sort ... 
for disk-stored .pairs files, sorting on the “first” anchor first, then for interactions with the same “first” anchors, sorting on the “second” anchor.\n\ngi\n## GInteractions object with 5 interactions and 2 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | idx type\n## <Rle> <IRanges> <Rle> <IRanges> | <integer> <character>\n## [1] chr1 1-100 --- chr1 1-100 | 1 cis\n## [2] chr1 1001-2000 --- chr1 3001-4000 | 2 cis\n## [3] chr1 5001-6000 --- chr1 8001-9000 | 3 cis\n## [4] chr1 8001-9000 --- chr1 7001-8000 | 4 trans\n## [5] chr1 7001-8000 --- chr2 13000-14000 | 5 cis\n## -------\n## regions: 7 ranges and 2 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nsort(gi)\n## GInteractions object with 5 interactions and 2 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | idx type\n## <Rle> <IRanges> <Rle> <IRanges> | <integer> <character>\n## [1] chr1 1-100 --- chr1 1-100 | 1 cis\n## [2] chr1 1001-2000 --- chr1 3001-4000 | 2 cis\n## [3] chr1 5001-6000 --- chr1 8001-9000 | 3 cis\n## [4] chr1 7001-8000 --- chr2 13000-14000 | 5 cis\n## [5] chr1 8001-9000 --- chr1 7001-8000 | 4 trans\n## -------\n## regions: 7 ranges and 2 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\n2.2.3.3 Swapping GInteractions anchors\nFor an individual interaction contained in a GInteractions object, the “first” and “second” anchors themselves can be sorted as well. This is called “pairs swapping”, and it is performed similarly to pairtools flip ... for disk-stored .pairs files. 
This ensures that interactions, when represented as a contact matrix, generate an upper-triangular matrix.\n\ngi\n## GInteractions object with 5 interactions and 2 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | idx type\n## <Rle> <IRanges> <Rle> <IRanges> | <integer> <character>\n## [1] chr1 1-100 --- chr1 1-100 | 1 cis\n## [2] chr1 1001-2000 --- chr1 3001-4000 | 2 cis\n## [3] chr1 5001-6000 --- chr1 8001-9000 | 3 cis\n## [4] chr1 8001-9000 --- chr1 7001-8000 | 4 trans\n## [5] chr1 7001-8000 --- chr2 13000-14000 | 5 cis\n## -------\n## regions: 7 ranges and 2 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nswapAnchors(gi)\n## GInteractions object with 5 interactions and 2 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | idx type\n## <Rle> <IRanges> <Rle> <IRanges> | <integer> <character>\n## [1] chr1 1-100 --- chr1 1-100 | 1 cis\n## [2] chr1 1001-2000 --- chr1 3001-4000 | 2 cis\n## [3] chr1 5001-6000 --- chr1 8001-9000 | 3 cis\n## [4] chr1 7001-8000 --- chr1 8001-9000 | 4 trans\n## [5] chr1 7001-8000 --- chr2 13000-14000 | 5 cis\n## -------\n## regions: 7 ranges and 2 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\n\n\n\n\n\nNote\n\n\n\n“Sorting” and “swapping” a GInteractions object are two entirely different actions:\n\n“sorting” reorganizes all rows (interactions);\n“swapping” anchors reorganizes “first” and “second” anchors for each interaction independently.\n\n\n\n\n2.2.3.4 GInteractions distance method\n“Distance”, when applied to genomic interactions, typically refers to the genomic distance between the two anchors of a single interaction. 
For GInteractions, this is computed using the pairdist function.\n\ngi\n## GInteractions object with 5 interactions and 2 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | idx type\n## <Rle> <IRanges> <Rle> <IRanges> | <integer> <character>\n## [1] chr1 1-100 --- chr1 1-100 | 1 cis\n## [2] chr1 1001-2000 --- chr1 3001-4000 | 2 cis\n## [3] chr1 5001-6000 --- chr1 8001-9000 | 3 cis\n## [4] chr1 8001-9000 --- chr1 7001-8000 | 4 trans\n## [5] chr1 7001-8000 --- chr2 13000-14000 | 5 cis\n## -------\n## regions: 7 ranges and 2 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\npairdist(gi)\n## [1] 0 2000 3000 1000 NA\n\nNote that for “trans” inter-chromosomal interactions, i.e. interactions with anchors on different chromosomes, the notion of genomic distance is meaningless and for this reason, pairdist returns a NA value.\nThe type argument of the pairdist() function can be tweaked to specify which type of “distance” should be computed:\n\n\nmid: The distance between the midpoints of the two regions (rounded down to the nearest integer) is returned (Default).\n\ngap: The length of the gap between the closest points of the two regions is computed - negative lengths are returned for overlapping regions, indicating the length of the overlap.\n\nspan: The distance between the furthermost points of the two regions is computed.\n\ndiag: The difference between the anchor indices is returned. 
This corresponds to a diagonal on the interaction space when bins are used in the ‘regions’ slot of ‘x’.\n\n2.2.3.5 GInteractions overlap methods\n“Overlaps” for genomic interactions could be computed in different contexts:\n\nCase 1: Overlap between any of the two anchors of an interaction with a genomic range\nCase 2: Overlap between anchors of an interaction with anchors of another interaction\nCase 3: Spanning of the interaction “across” a genomic range\n\n\nCase 1: Overlap between any of the two anchors of an interaction with a genomic range\n\nThis is the default behavior of findOverlaps when providing a GInteractions object as query and a GRanges as a subject.\n\ngr <- GRanges(c(\"chr1:7501-7600\", \"chr1:8501-8600\"))\nfindOverlaps(query = gi, subject = gr)\n## Hits object with 4 hits and 0 metadata columns:\n## queryHits subjectHits\n## <integer> <integer>\n## [1] 3 2\n## [2] 4 1\n## [3] 4 2\n## [4] 5 1\n## -------\n## queryLength: 5 / subjectLength: 2\n\ncountOverlaps(gi, gr)\n## [1] 0 0 1 2 1\n\nsubsetByOverlaps(gi, gr)\n## GInteractions object with 3 interactions and 2 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | idx type\n## <Rle> <IRanges> <Rle> <IRanges> | <integer> <character>\n## [1] chr1 5001-6000 --- chr1 8001-9000 | 3 cis\n## [2] chr1 8001-9000 --- chr1 7001-8000 | 4 trans\n## [3] chr1 7001-8000 --- chr2 13000-14000 | 5 cis\n## -------\n## regions: 7 ranges and 2 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nHere again, the order matters!\n\ncountOverlaps(gr, gi)\n## [1] 2 2\n\nAnd again, the %over% operator can be used here:\n\ngi %over% gr\n## [1] FALSE FALSE TRUE TRUE TRUE\n\ngi[gi %over% gr] # ----- Equivalent to `subsetByOverlaps(gi, gr)`\n## GInteractions object with 3 interactions and 2 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | idx type\n## <Rle> <IRanges> <Rle> <IRanges> | <integer> <character>\n## [1] chr1 5001-6000 --- chr1 8001-9000 | 3 cis\n## [2] chr1 
8001-9000 --- chr1 7001-8000 | 4 trans\n## [3] chr1 7001-8000 --- chr2 13000-14000 | 5 cis\n## -------\n## regions: 7 ranges and 2 metadata columns\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\n\nCase 2: Overlap between anchors of an interaction with anchors of another interaction\n\nThis slightly different scenario involves overlapping two sets of interactions, to see whether any interaction in Set-1 has its two anchors overlapping anchors from an interaction in Set-2.\n\ngi2 <- GInteractions(\n GRanges(\"chr1:1081-1090\"), \n GRanges(\"chr1:3401-3501\")\n)\ngi %over% gi2\n## [1] FALSE TRUE FALSE FALSE FALSE\n\nNote that, with this method, both anchors of an interaction from the query have to overlap the pair of anchors of a single interaction from the subject!\n\ngi3 <- GInteractions(\n GRanges(\"chr1:1-1000\"), \n GRanges(\"chr1:3401-3501\")\n)\ngi %over% gi3\n## [1] FALSE FALSE FALSE FALSE FALSE\n\n\nCase 3: Spanning of the interaction “across” a genomic range\n\nThis requires a bit of wrangling, to mimic an overlap between two GRanges objects:\n\ngi <- swapAnchors(gi) # ----- Make sure anchors are correctly sorted\ngi <- sort(gi) # ----- Make sure interactions are correctly sorted\ngi <- gi[!is.na(pairdist(gi))] # ----- Remove inter-chromosomal interactions\nspanning_gi <- GRanges(\n seqnames = seqnames(anchors(gi)[[1]]), \n ranges = IRanges(\n start(anchors(gi)[[1]]), \n end(anchors(gi)[[2]])\n )\n)\nspanning_gi \n## GRanges object with 4 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] chr1 1-100 *\n## [2] chr1 1001-4000 *\n## [3] chr1 5001-9000 *\n## [4] chr1 7001-9000 *\n## -------\n## seqinfo: 2 sequences from an unspecified genome; no seqlengths\n\nspanning_gi %over% gr\n## [1] FALSE FALSE TRUE TRUE\n\n\n\n\n\n\n\nGoing further\n\n\n\nA detailed manual of overlap methods available for GInteractions objects can be read by typing ?`Interaction-overlaps` in R."
},
{
"objectID": "data-representation.html#contactfile-class",
"href": "data-representation.html#contactfile-class",
"title": "\n2 Hi-C data structures in R\n",
"section": "\n2.3 ContactFile class",
- "text": "2.3 ContactFile class\nHi-C contacts can be stored in four different formats (see previous chapter):\n\nAs a .(m)cool matrix (multi-scores, multi-resolution, indexed)\nAs a .hic matrix (multi-scores, multi-resolution, indexed)\nAs a HiC-Pro-derived matrix (single-score, single-resolution, non-indexed)\nUnbinned, Hi-C contacts can be stored in .pairs files\n\n\n2.3.1 Accessing example Hi-C files\nExample contact files can be downloaded using the HiContactsData function.\n\nlibrary(HiContactsData)\ncoolf <- HiContactsData('yeast_wt', 'mcool')\n\nThis fetches files from the cloud, downloads them locally and returns the path of the local file.\n\ncoolf\n## EH7702 \n## \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\"\n\nSimilarly, example files are available for other file formats:\n\nhicf <- HiContactsData('yeast_wt', 'hic')\nhicpromatrixf <- HiContactsData('yeast_wt', 'hicpro_matrix')\nhicproregionsf <- HiContactsData('yeast_wt', 'hicpro_bed')\npairsf <- HiContactsData('yeast_wt', 'pairs.gz')\n\nWe can even check the content of some of these files to make sure they contain what we expect:\n\n# ---- HiC-Pro generates a tab-separated `regions.bed` file\nreadLines(hicproregionsf, 25)\n## [1] \"I\\t0\\t1000\" \"I\\t1000\\t2000\" \"I\\t2000\\t3000\" \"I\\t3000\\t4000\" \n## [5] \"I\\t4000\\t5000\" \"I\\t5000\\t6000\" \"I\\t6000\\t7000\" \"I\\t7000\\t8000\" \n## [9] \"I\\t8000\\t9000\" \"I\\t9000\\t10000\" \"I\\t10000\\t11000\" \"I\\t11000\\t12000\"\n## [13] \"I\\t12000\\t13000\" \"I\\t13000\\t14000\" \"I\\t14000\\t15000\" \"I\\t15000\\t16000\"\n## [17] \"I\\t16000\\t17000\" \"I\\t17000\\t18000\" \"I\\t18000\\t19000\" \"I\\t19000\\t20000\"\n## [21] \"I\\t20000\\t21000\" \"I\\t21000\\t22000\" \"I\\t22000\\t23000\" \"I\\t23000\\t24000\"\n## [25] \"I\\t24000\\t25000\"\n\n# ---- Pairs are also tab-separated \nreadLines(pairsf, 25)\n## [1] \"## pairs format v1.0\" \n## [2] \"#sorted: chr1-pos1-chr2-pos2\" \n## [3] \"#columns: readID chr1 pos1 chr2 pos2 strand1 
strand2 frag1 frag2\" \n## [4] \"#chromsize: I 230218\" \n## [5] \"#chromsize: II 813184\" \n## [6] \"#chromsize: III 316620\" \n## [7] \"#chromsize: IV 1531933\" \n## [8] \"#chromsize: V 576874\" \n## [9] \"#chromsize: VI 270161\" \n## [10] \"#chromsize: VII 1090940\" \n## [11] \"#chromsize: VIII 562643\" \n## [12] \"#chromsize: IX 439888\" \n## [13] \"#chromsize: X 745751\" \n## [14] \"#chromsize: XI 666816\" \n## [15] \"#chromsize: XII 1078177\" \n## [16] \"#chromsize: XIII 924431\" \n## [17] \"#chromsize: XIV 784333\" \n## [18] \"#chromsize: XV 1091291\" \n## [19] \"#chromsize: XVI 948066\" \n## [20] \"#chromsize: Mito 85779\" \n## [21] \"NS500150:527:HHGYNBGXF:3:21611:19085:3986\\tII\\t105\\tII\\t48548\\t+\\t-\\t1358\\t1681\" \n## [22] \"NS500150:527:HHGYNBGXF:4:13604:19734:2406\\tII\\t113\\tII\\t45003\\t-\\t+\\t1358\\t1658\" \n## [23] \"NS500150:527:HHGYNBGXF:2:11108:25178:11036\\tII\\t119\\tII\\t687251\\t-\\t+\\t1358\\t5550\"\n## [24] \"NS500150:527:HHGYNBGXF:1:22301:8468:1586\\tII\\t160\\tII\\t26124\\t+\\t-\\t1358\\t1510\" \n## [25] \"NS500150:527:HHGYNBGXF:4:23606:24037:2076\\tII\\t169\\tII\\t39052\\t+\\t+\\t1358\\t1613\"\n\n\n2.3.2 ContactFile fundamentals\nA ContactFile object establishes a connection with a disk-stored Hi-C file (e.g. a .cool file, or a .pairs file, …). 
ContactFile classes are defined in the HiCExperiment package.\nContactFiles come in four different flavors:\n\n\nCoolFile: connection to a .(m)cool file\n\nHicFile: connection to a .hic file\n\nHicproFile: connection to output files generated by HiC-Pro\n\nPairsFile: connection to a .pairs file\n\nTo create each flavor of ContactFile, one can use the corresponding function:\n\nlibrary(HiCExperiment)\n\n# ----- This creates a connection to a `.(m)cool` file (path stored in `coolf`)\nCoolFile(coolf)\n## CoolFile object\n## .mcool file: /github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752 \n## resolution: 1000 \n## pairs file: \n## metadata(0):\n\n# ----- This creates a connection to a `.hic` file (path stored in `hicf`)\nHicFile(hicf)\n## HicFile object\n## .hic file: /github/home/.cache/R/ExperimentHub/1a9a270f71fe_7836 \n## resolution: 1000 \n## pairs file: \n## metadata(0):\n\n# ----- This creates a connection to output files from HiC-Pro\nHicproFile(hicpromatrixf, hicproregionsf)\n## HicproFile object\n## HiC-Pro files:\n## $ matrix: /github/home/.cache/R/ExperimentHub/1a9a6531ab2c_7837 \n## $ regions: /github/home/.cache/R/ExperimentHub/1a9a3c1fca84_7838 \n## resolution: 1000 \n## pairs file: \n## metadata(0):\n\n# ----- This creates a connection to a pairs file\nPairsFile(pairsf)\n## PairsFile object\n## resource: /github/home/.cache/R/ExperimentHub/1a9a1c034d7_7753\n\n\n2.3.3 ContactFile slots\nSeveral “slots” (i.e. 
pieces of information) are attached to a ContactFile object:\n\nThe path to the disk-stored contact matrix;\nThe active resolution (by default, the finest resolution available in a multi-resolution contact matrix);\nOptionally, the path to a matching pairs file (see below);\nSome metadata.\n\nSlots of a CoolFile object can be accessed as follows:\n\ncf <- CoolFile(coolf)\ncf\n## CoolFile object\n## .mcool file: /github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752 \n## resolution: 1000 \n## pairs file: \n## metadata(0):\n\nresolution(cf)\n## [1] 1000\n\npairsFile(cf)\n## NULL\n\nmetadata(cf)\n## list()\n\n\n\n\n\n\n\nImportant!\n\n\n\nContactFile objects are only connections to a disk-stored Hi-C file. Although metadata is available, they do not contain actual data!\n\n\n\n2.3.4 ContactFile methods\nTwo useful methods are available for ContactFiles:\n\n\navailableResolutions checks which resolutions are available in a ContactFile.\n\n\navailableResolutions(cf)\n## resolutions(5): 1000 2000 4000 8000 16000\n## \n\n\n\navailableChromosomes checks which chromosomes are available in a ContactFile, along with their lengths.\n\n\navailableChromosomes(cf)\n## Seqinfo object with 16 sequences from an unspecified genome:\n## seqnames seqlengths isCircular genome\n## I 230218 <NA> <NA>\n## II 813184 <NA> <NA>\n## III 316620 <NA> <NA>\n## IV 1531933 <NA> <NA>\n## V 576874 <NA> <NA>\n## ... ... ... ...\n## XII 1078177 <NA> <NA>\n## XIII 924431 <NA> <NA>\n## XIV 784333 <NA> <NA>\n## XV 1091291 <NA> <NA>\n## XVI 948066 <NA> <NA>"
+ "text": "2.3 ContactFile class\nHi-C contacts can be stored in four different formats (see previous chapter):\n\nAs a .(m)cool matrix (multi-scores, multi-resolution, indexed)\nAs a .hic matrix (multi-scores, multi-resolution, indexed)\nAs a HiC-Pro-derived matrix (single-score, single-resolution, non-indexed)\nUnbinned, Hi-C contacts can be stored in .pairs files\n\n\n2.3.1 Accessing example Hi-C files\nExample contact files can be downloaded using the HiContactsData function.\n\nlibrary(HiContactsData)\ncoolf <- HiContactsData('yeast_wt', 'mcool')\n\nThis fetches files from the cloud, downloads them locally and returns the path of the local file.\n\ncoolf\n## EH7702 \n## \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\"\n\nSimilarly, example files are available for other file formats:\n\nhicf <- HiContactsData('yeast_wt', 'hic')\nhicpromatrixf <- HiContactsData('yeast_wt', 'hicpro_matrix')\nhicproregionsf <- HiContactsData('yeast_wt', 'hicpro_bed')\npairsf <- HiContactsData('yeast_wt', 'pairs.gz')\n\nWe can even check the content of some of these files to make sure they contain what we expect:\n\n# ---- HiC-Pro generates a tab-separated `regions.bed` file\nreadLines(hicproregionsf, 25)\n## [1] \"I\\t0\\t1000\" \"I\\t1000\\t2000\" \"I\\t2000\\t3000\" \"I\\t3000\\t4000\" \"I\\t4000\\t5000\" \"I\\t5000\\t6000\" \"I\\t6000\\t7000\" \"I\\t7000\\t8000\" \"I\\t8000\\t9000\" \"I\\t9000\\t10000\" \"I\\t10000\\t11000\" \"I\\t11000\\t12000\" \"I\\t12000\\t13000\" \"I\\t13000\\t14000\" \"I\\t14000\\t15000\" \"I\\t15000\\t16000\" \"I\\t16000\\t17000\" \"I\\t17000\\t18000\" \"I\\t18000\\t19000\" \"I\\t19000\\t20000\" \"I\\t20000\\t21000\" \"I\\t21000\\t22000\" \"I\\t22000\\t23000\" \"I\\t23000\\t24000\" \"I\\t24000\\t25000\"\n\n# ---- Pairs are also tab-separated \nreadLines(pairsf, 25)\n## [1] \"## pairs format v1.0\" \"#sorted: chr1-pos1-chr2-pos2\" \"#columns: readID chr1 pos1 chr2 pos2 strand1 strand2 frag1 frag2\" \"#chromsize: I 230218\" \"#chromsize: II 
813184\" \"#chromsize: III 316620\" \"#chromsize: IV 1531933\" \"#chromsize: V 576874\" \"#chromsize: VI 270161\" \"#chromsize: VII 1090940\" \"#chromsize: VIII 562643\" \"#chromsize: IX 439888\" \"#chromsize: X 745751\" \"#chromsize: XI 666816\" \"#chromsize: XII 1078177\" \"#chromsize: XIII 924431\" \"#chromsize: XIV 784333\" \"#chromsize: XV 1091291\" \"#chromsize: XVI 948066\" \"#chromsize: Mito 85779\" \"NS500150:527:HHGYNBGXF:3:21611:19085:3986\\tII\\t105\\tII\\t48548\\t+\\t-\\t1358\\t1681\" \"NS500150:527:HHGYNBGXF:4:13604:19734:2406\\tII\\t113\\tII\\t45003\\t-\\t+\\t1358\\t1658\" \"NS500150:527:HHGYNBGXF:2:11108:25178:11036\\tII\\t119\\tII\\t687251\\t-\\t+\\t1358\\t5550\" \"NS500150:527:HHGYNBGXF:1:22301:8468:1586\\tII\\t160\\tII\\t26124\\t+\\t-\\t1358\\t1510\" \"NS500150:527:HHGYNBGXF:4:23606:24037:2076\\tII\\t169\\tII\\t39052\\t+\\t+\\t1358\\t1613\"\n\n\n2.3.2 ContactFile fundamentals\nA ContactFile object establishes a connection with a disk-stored Hi-C file (e.g. a .cool file, or a .pairs file, …). 
ContactFile classes are defined in the HiCExperiment package.\nContactFiles come in four different flavors:\n\n\nCoolFile: connection to a .(m)cool file\n\nHicFile: connection to a .hic file\n\nHicproFile: connection to output files generated by HiC-Pro\n\nPairsFile: connection to a .pairs file\n\nTo create each flavor of ContactFile, one can use the corresponding function:\n\nlibrary(HiCExperiment)\n\n# ----- This creates a connection to a `.(m)cool` file (path stored in `coolf`)\nCoolFile(coolf)\n## CoolFile object\n## .mcool file: /github/home/.cache/R/ExperimentHub/1a92248c093f_7752 \n## resolution: 1000 \n## pairs file: \n## metadata(0):\n\n# ----- This creates a connection to a `.hic` file (path stored in `hicf`)\nHicFile(hicf)\n## HicFile object\n## .hic file: /github/home/.cache/R/ExperimentHub/1a92259b7f1f_7836 \n## resolution: 1000 \n## pairs file: \n## metadata(0):\n\n# ----- This creates a connection to output files from HiC-Pro\nHicproFile(hicpromatrixf, hicproregionsf)\n## HicproFile object\n## HiC-Pro files:\n## $ matrix: /github/home/.cache/R/ExperimentHub/1a925372027_7837 \n## $ regions: /github/home/.cache/R/ExperimentHub/1a92600d50bf_7838 \n## resolution: 1000 \n## pairs file: \n## metadata(0):\n\n# ----- This creates a connection to a pairs file\nPairsFile(pairsf)\n## PairsFile object\n## resource: /github/home/.cache/R/ExperimentHub/1a92835ced9_7753\n\n\n2.3.3 ContactFile slots\nSeveral “slots” (i.e. 
pieces of information) are attached to a ContactFile object:\n\nThe path to the disk-stored contact matrix;\nThe active resolution (by default, the finest resolution available in a multi-resolution contact matrix);\nOptionally, the path to a matching pairs file (see below);\nSome metadata.\n\nSlots of a CoolFile object can be accessed as follows:\n\ncf <- CoolFile(coolf)\ncf\n## CoolFile object\n## .mcool file: /github/home/.cache/R/ExperimentHub/1a92248c093f_7752 \n## resolution: 1000 \n## pairs file: \n## metadata(0):\n\nresolution(cf)\n## [1] 1000\n\npairsFile(cf)\n## NULL\n\nmetadata(cf)\n## list()\n\n\n\n\n\n\n\nImportant!\n\n\n\nContactFile objects are only connections to a disk-stored Hi-C file. Although metadata is available, they do not contain the actual contact data!\n\n\n\n2.3.4 ContactFile methods\nTwo useful methods are available for ContactFiles:\n\n\navailableResolutions checks which resolutions are available in a ContactFile.\n\n\navailableResolutions(cf)\n## resolutions(5): 1000 2000 4000 8000 16000\n## \n\n\n\navailableChromosomes checks which chromosomes are available in a ContactFile, along with their lengths.\n\n\navailableChromosomes(cf)\n## Seqinfo object with 16 sequences from an unspecified genome:\n## seqnames seqlengths isCircular genome\n## I 230218 <NA> <NA>\n## II 813184 <NA> <NA>\n## III 316620 <NA> <NA>\n## IV 1531933 <NA> <NA>\n## V 576874 <NA> <NA>\n## ... ... ... ...\n## XII 1078177 <NA> <NA>\n## XIII 924431 <NA> <NA>\n## XIV 784333 <NA> <NA>\n## XV 1091291 <NA> <NA>\n## XVI 948066 <NA> <NA>"
},
{
"objectID": "data-representation.html#hicexperiment-class",
"href": "data-representation.html#hicexperiment-class",
"title": "\n2 Hi-C data structures in R\n",
"section": "\n2.4 HiCExperiment class",
- "text": "2.4 HiCExperiment class\nBased on the previous sections, we have different Bioconductor classes relevant for Hi-C:\n\n\nGInteractions which can be used to represent genomic interactions in R\n\nContactFiles which can be used to establish a connection with disk-stored Hi-C files\n\nHiCExperiment objects are created when parsing a ContactFile in R. The HiCExperiment class reads a ContactFile in memory and store genomic interactions as GInteractions. The HiCExperiment class is, quite obviously, defined in the HiCExperiment package.\n\n2.4.1 Creating a HiCExperiment object\n\n2.4.1.1 Importing a ContactFile\n\nIn practice, to create a HiCExperiment object from a ContactFile, one can use the import method.\n\n\n\n\n\n\nCaution\n\n\n\n\nCreating a HiCExperiment object means importing data from a Hi-C matrix (e.g. from a ContactFile) in memory in R.\n\nCreating a HiCExperiment object from large disk-stored contact matrices can potentially take a long time.\n\n\n\n\ncf <- CoolFile(coolf)\nhic <- import(cf)\nhic\n## `HiCExperiment` object with 8,757,906 contacts over 12,079 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"whole genome\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 1000 \n## interactions: 2945692 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\nPrinting a HiCExperiment to the console will not reveal the actual data stored in the object (it would most likely crash your R session!). Instead, it gives a summary of the data stored in the object:\n\nThe fileName, i.e. the path to the disk-stored data file\nThe focus, i.e. 
the genomic location for which data has been imported (in the example above, \"whole genome\" implies that all the data has been imported in R)\n\nresolutions available in the disk-stored data file (this will be identical to availableResolutions(cf))\n\nactive resolution indicates at which resolution the data is currently imported\n\ninteractions refers to the actual GInteractions imported in R and “hidden” (for now!) in the HiCExperiment object\n\nscores refer to different interaction frequency estimates. These can be raw counts, balanced (if the contact matrix has been previously normalized), or whatever score the end-user want to attribute to each interaction (e.g. ratio of counts between two Hi-C maps, …)\n\ntopologicalFeatures is a list of GRanges or GInteractions objects to describe important topological features.\n\npairsFile is a pointer to an optional disk-stored .pairs file from which the contact matrix has been created. This is often useful to estimate some Hi-C metrics.\n\nmetadata is a list to further describe the experiment.\n\n\n\n\n\n\n\nHiCExperiment slots\n\n\n\nThese pieces of information are called slots. They can be directly accessed using getter functions, bearing the same name than the slot.\n\nfileName(hic)\n## [1] \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\"\n\nfocus(hic)\n## NULL\n\nresolutions(hic)\n## [1] 1000 2000 4000 8000 16000\n\nresolution(hic)\n## [1] 1000\n\ninteractions(hic)\n## GInteractions object with 2945692 interactions and 4 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | bin_id1\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric>\n## [1] I 1-1000 --- I 1-1000 | 0\n## [2] I 1-1000 --- I 1001-2000 | 0\n## [3] I 1-1000 --- I 2001-3000 | 0\n## [4] I 1-1000 --- I 3001-4000 | 0\n## [5] I 1-1000 --- I 4001-5000 | 0\n## ... ... ... ... ... ... . 
...\n## [2945688] XVI 940001-941000 --- XVI 942001-943000 | 12070\n## [2945689] XVI 940001-941000 --- XVI 943001-944000 | 12070\n## [2945690] XVI 941001-942000 --- XVI 941001-942000 | 12071\n## [2945691] XVI 941001-942000 --- XVI 942001-943000 | 12071\n## [2945692] XVI 941001-942000 --- XVI 943001-944000 | 12071\n## bin_id2 count balanced\n## <numeric> <numeric> <numeric>\n## [1] 0 15 0.0663491\n## [2] 1 21 0.1273505\n## [3] 2 21 0.0738691\n## [4] 3 38 0.0827051\n## [5] 4 17 0.0591984\n## ... ... ... ...\n## [2945688] 12072 11 0.0575550\n## [2945689] 12073 1 NaN\n## [2945690] 12071 74 0.0504615\n## [2945691] 12072 39 0.1624599\n## [2945692] 12073 1 NaN\n## -------\n## regions: 12079 ranges and 4 metadata columns\n## seqinfo: 16 sequences from an unspecified genome\n\nscores(hic)\n## List of length 2\n## names(2): count balanced\n\ntopologicalFeatures(hic)\n## List of length 4\n## names(4): compartments borders loops viewpoints\n\npairsFile(hic)\n## NULL\n\nmetadata(hic)\n## list()\n\n\n\n\n\n\n\n\n\nNotes\n\n\n\nimport also works for other types of ContactFile (HicFile, HicproFile, PairsFile), e.g. \n\nFor HicFile and HicproFile, import seamlessly returns a HiCExperiment as well:\n\n\nhf <- HicFile(hicf)\nhic <- import(hf)\nhic\n## `HiCExperiment` object with 13,681,280 contacts over 12,165 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a270f71fe_7836\" \n## focus: \"whole genome\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 1000 \n## interactions: 2965693 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nFor PairsFile, the returned object is a representation of Hi-C “pairs” in R, i.e. 
GInteractions\n\n\n\npf <- PairsFile(pairsf)\npairs <- import(pf)\npairs\n## GInteractions object with 471364 interactions and 3 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | frag1 frag2\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric>\n## [1] II 105 --- II 48548 | 1358 1681\n## [2] II 113 --- II 45003 | 1358 1658\n## [3] II 119 --- II 687251 | 1358 5550\n## [4] II 160 --- II 26124 | 1358 1510\n## [5] II 169 --- II 39052 | 1358 1613\n## ... ... ... ... ... ... . ... ...\n## [471360] II 808605 --- II 809683 | 6316 6320\n## [471361] II 808609 --- II 809917 | 6316 6324\n## [471362] II 808617 --- II 809506 | 6316 6319\n## [471363] II 809447 --- II 809685 | 6319 6321\n## [471364] II 809472 --- II 809675 | 6319 6320\n## distance\n## <integer>\n## [1] 48443\n## [2] 44890\n## [3] 687132\n## [4] 25964\n## [5] 38883\n## ... ...\n## [471360] 1078\n## [471361] 1308\n## [471362] 889\n## [471363] 238\n## [471364] 203\n## -------\n## regions: 549331 ranges and 0 metadata columns\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\n\n\n2.4.1.2 Customizing the import\n\nTo reduce the import to only parse the data that is relevant to the study, two arguments can be passed to import, along with a ContactFile.\n\n\n\n\n\n\nKey import arguments:\n\n\n\n\n\nfocus: This can be used to only parse data for a specific genomic location.\n\nresolution: This can be used to choose which resolution to parse the contact matrix at (this is ignored if the ContactFile is not multi-resolution, e.g. 
.cool or HiC-Pro generated matrices)\n\n\n\n\nImport interactions within a single chromosome:\n\n\nhic <- import(cf, focus = 'II', resolution = 2000)\n\nregions(hic) # ---- `regions()` work on `HiCExperiment` the same way than on `GInteractions`\n## GRanges object with 407 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>\n## II_1_2000 II 1-2000 * | 116 NaN II\n## II_2001_4000 II 2001-4000 * | 117 NaN II\n## II_4001_6000 II 4001-6000 * | 118 NaN II\n## II_6001_8000 II 6001-8000 * | 119 NaN II\n## II_8001_10000 II 8001-10000 * | 120 0.0461112 II\n## ... ... ... ... . ... ... ...\n## II_804001_806000 II 804001-806000 * | 518 0.0493107 II\n## II_806001_808000 II 806001-808000 * | 519 0.0611355 II\n## II_808001_810000 II 808001-810000 * | 520 NaN II\n## II_810001_812000 II 810001-812000 * | 521 NaN II\n## II_812001_813184 II 812001-813184 * | 522 NaN II\n## center\n## <integer>\n## II_1_2000 1000\n## II_2001_4000 3000\n## II_4001_6000 5000\n## II_6001_8000 7000\n## II_8001_10000 9000\n## ... ...\n## II_804001_806000 805000\n## II_806001_808000 807000\n## II_808001_810000 809000\n## II_810001_812000 811000\n## II_812001_813184 812592\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\ntable(seqnames(regions(hic)))\n## \n## I II III IV V VI VII VIII IX X XI XII XIII XIV XV \n## 0 407 0 0 0 0 0 0 0 0 0 0 0 0 0 \n## XVI \n## 0\n\nanchors(hic) # ---- `anchors()` work on `HiCExperiment` the same way than on `GInteractions`\n## $first\n## GRanges object with 34063 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>\n## [1] II 1-2000 * | 116 NaN II\n## [2] II 1-2000 * | 116 NaN II\n## [3] II 1-2000 * | 116 NaN II\n## [4] II 1-2000 * | 116 NaN II\n## [5] II 1-2000 * | 116 NaN II\n## ... ... ... ... . ... ... 
...\n## [34059] II 804001-806000 * | 518 0.0493107 II\n## [34060] II 806001-808000 * | 519 0.0611355 II\n## [34061] II 806001-808000 * | 519 0.0611355 II\n## [34062] II 806001-808000 * | 519 0.0611355 II\n## [34063] II 808001-810000 * | 520 NaN II\n## center\n## <integer>\n## [1] 1000\n## [2] 1000\n## [3] 1000\n## [4] 1000\n## [5] 1000\n## ... ...\n## [34059] 805000\n## [34060] 807000\n## [34061] 807000\n## [34062] 807000\n## [34063] 809000\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n## \n## $second\n## GRanges object with 34063 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>\n## [1] II 1-2000 * | 116 NaN II\n## [2] II 4001-6000 * | 118 NaN II\n## [3] II 6001-8000 * | 119 NaN II\n## [4] II 8001-10000 * | 120 0.0461112 II\n## [5] II 10001-12000 * | 121 0.0334807 II\n## ... ... ... ... . ... ... ...\n## [34059] II 810001-812000 * | 521 NaN II\n## [34060] II 806001-808000 * | 519 0.0611355 II\n## [34061] II 808001-810000 * | 520 NaN II\n## [34062] II 810001-812000 * | 521 NaN II\n## [34063] II 808001-810000 * | 520 NaN II\n## center\n## <integer>\n## [1] 1000\n## [2] 5000\n## [3] 7000\n## [4] 9000\n## [5] 11000\n## ... ...\n## [34059] 811000\n## [34060] 807000\n## [34061] 809000\n## [34062] 811000\n## [34063] 809000\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\n\nImport interactions within a segment of a chromosome:\n\n\nhic <- import(cf, focus = 'II:40000-60000', resolution = 1000)\n\nregions(hic) \n## GRanges object with 21 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>\n## II_39001_40000 II 39001-40000 * | 270 0.0220798 II\n## II_40001_41000 II 40001-41000 * | 271 0.0246775 II\n## II_41001_42000 II 41001-42000 * | 272 0.0269232 II\n## II_42001_43000 II 42001-43000 * | 273 0.0341849 II\n## II_43001_44000 II 43001-44000 * | 274 0.0265386 II\n## ... ... ... 
... . ... ... ...\n## II_55001_56000 II 55001-56000 * | 286 0.0213532 II\n## II_56001_57000 II 56001-57000 * | 287 0.0569839 II\n## II_57001_58000 II 57001-58000 * | 288 0.0338612 II\n## II_58001_59000 II 58001-59000 * | 289 0.0294531 II\n## II_59001_60000 II 59001-60000 * | 290 0.0306662 II\n## center\n## <integer>\n## II_39001_40000 39500\n## II_40001_41000 40500\n## II_41001_42000 41500\n## II_42001_43000 42500\n## II_43001_44000 43500\n## ... ...\n## II_55001_56000 55500\n## II_56001_57000 56500\n## II_57001_58000 57500\n## II_58001_59000 58500\n## II_59001_60000 59500\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\nanchors(hic)\n## $first\n## GRanges object with 210 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## [1] II 40001-41000 * | 271 0.0246775 II 40500\n## [2] II 40001-41000 * | 271 0.0246775 II 40500\n## [3] II 40001-41000 * | 271 0.0246775 II 40500\n## [4] II 40001-41000 * | 271 0.0246775 II 40500\n## [5] II 40001-41000 * | 271 0.0246775 II 40500\n## ... ... ... ... . ... ... ... ...\n## [206] II 57001-58000 * | 288 0.0338612 II 57500\n## [207] II 57001-58000 * | 288 0.0338612 II 57500\n## [208] II 58001-59000 * | 289 0.0294531 II 58500\n## [209] II 58001-59000 * | 289 0.0294531 II 58500\n## [210] II 59001-60000 * | 290 0.0306662 II 59500\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n## \n## $second\n## GRanges object with 210 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## [1] II 40001-41000 * | 271 0.0246775 II 40500\n## [2] II 41001-42000 * | 272 0.0269232 II 41500\n## [3] II 42001-43000 * | 273 0.0341849 II 42500\n## [4] II 43001-44000 * | 274 0.0265386 II 43500\n## [5] II 44001-45000 * | 275 0.0488968 II 44500\n## ... ... ... ... . ... ... ... 
...\n## [206] II 58001-59000 * | 289 0.0294531 II 58500\n## [207] II 59001-60000 * | 290 0.0306662 II 59500\n## [208] II 58001-59000 * | 289 0.0294531 II 58500\n## [209] II 59001-60000 * | 290 0.0306662 II 59500\n## [210] II 59001-60000 * | 290 0.0306662 II 59500\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\n\nImport interactions between two chromosomes:\n\n\nhic2 <- import(cf, focus = 'II|XV', resolution = 4000)\n\nregions(hic2)\n## GRanges object with 477 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight\n## <Rle> <IRanges> <Rle> | <numeric> <numeric>\n## II_1_4000 II 1-4000 * | 58 NaN\n## II_4001_8000 II 4001-8000 * | 59 NaN\n## II_8001_12000 II 8001-12000 * | 60 0.0274474\n## II_12001_16000 II 12001-16000 * | 61 0.0342116\n## II_16001_20000 II 16001-20000 * | 62 0.0195128\n## ... ... ... ... . ... ...\n## XV_1072001_1076000 XV 1072001-1076000 * | 2783 0.041763\n## XV_1076001_1080000 XV 1076001-1080000 * | 2784 NaN\n## XV_1080001_1084000 XV 1080001-1084000 * | 2785 NaN\n## XV_1084001_1088000 XV 1084001-1088000 * | 2786 NaN\n## XV_1088001_1091291 XV 1088001-1091291 * | 2787 NaN\n## chr center\n## <Rle> <integer>\n## II_1_4000 II 2000\n## II_4001_8000 II 6000\n## II_8001_12000 II 10000\n## II_12001_16000 II 14000\n## II_16001_20000 II 18000\n## ... ... ...\n## XV_1072001_1076000 XV 1074000\n## XV_1076001_1080000 XV 1078000\n## XV_1080001_1084000 XV 1082000\n## XV_1084001_1088000 XV 1086000\n## XV_1088001_1091291 XV 1089646\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\nanchors(hic2)\n## $first\n## GRanges object with 18032 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>\n## [1] II 1-4000 * | 58 NaN II\n## [2] II 1-4000 * | 58 NaN II\n## [3] II 1-4000 * | 58 NaN II\n## [4] II 1-4000 * | 58 NaN II\n## [5] II 1-4000 * | 58 NaN II\n## ... ... ... ... . ... ... 
...\n## [18028] II 808001-812000 * | 260 NaN II\n## [18029] II 808001-812000 * | 260 NaN II\n## [18030] II 808001-812000 * | 260 NaN II\n## [18031] II 808001-812000 * | 260 NaN II\n## [18032] II 808001-812000 * | 260 NaN II\n## center\n## <integer>\n## [1] 2000\n## [2] 2000\n## [3] 2000\n## [4] 2000\n## [5] 2000\n## ... ...\n## [18028] 810000\n## [18029] 810000\n## [18030] 810000\n## [18031] 810000\n## [18032] 810000\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n## \n## $second\n## GRanges object with 18032 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>\n## [1] XV 48001-52000 * | 2527 0.0185354 XV\n## [2] XV 348001-352000 * | 2602 0.0233750 XV\n## [3] XV 468001-472000 * | 2632 0.0153615 XV\n## [4] XV 472001-476000 * | 2633 0.0189624 XV\n## [5] XV 584001-588000 * | 2661 0.0167715 XV\n## ... ... ... ... . ... ... ...\n## [18028] XV 980001-984000 * | 2760 0.0187827 XV\n## [18029] XV 984001-988000 * | 2761 0.0250094 XV\n## [18030] XV 992001-996000 * | 2763 0.0185599 XV\n## [18031] XV 1004001-1008000 * | 2766 0.0196942 XV\n## [18032] XV 1064001-1068000 * | 2781 0.0208220 XV\n## center\n## <integer>\n## [1] 50000\n## [2] 350000\n## [3] 470000\n## [4] 474000\n## [5] 586000\n## ... 
...\n## [18028] 982000\n## [18029] 986000\n## [18030] 994000\n## [18031] 1006000\n## [18032] 1066000\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\n\nImport interactions between segments of two chromosomes:\n\n\nhic3 <- import(cf, focus = 'III:10000-40000|XV:10000-40000', resolution = 2000)\n\nregions(hic3)\n## GRanges object with 32 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>\n## III_8001_10000 III 8001-10000 * | 527 NaN III\n## III_10001_12000 III 10001-12000 * | 528 NaN III\n## III_12001_14000 III 12001-14000 * | 529 NaN III\n## III_14001_16000 III 14001-16000 * | 530 0.0356351 III\n## III_16001_18000 III 16001-18000 * | 531 0.0230693 III\n## ... ... ... ... . ... ... ...\n## XV_30001_32000 XV 30001-32000 * | 5039 0.0482465 XV\n## XV_32001_34000 XV 32001-34000 * | 5040 0.0241580 XV\n## XV_34001_36000 XV 34001-36000 * | 5041 0.0273166 XV\n## XV_36001_38000 XV 36001-38000 * | 5042 0.0542235 XV\n## XV_38001_40000 XV 38001-40000 * | 5043 0.0206849 XV\n## center\n## <integer>\n## III_8001_10000 9000\n## III_10001_12000 11000\n## III_12001_14000 13000\n## III_14001_16000 15000\n## III_16001_18000 17000\n## ... 
...\n## XV_30001_32000 31000\n## XV_32001_34000 33000\n## XV_34001_36000 35000\n## XV_36001_38000 37000\n## XV_38001_40000 39000\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\nanchors(hic3)\n## $first\n## GRanges object with 11 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## [1] III 14001-16000 * | 530 0.0356351 III 15000\n## [2] III 16001-18000 * | 531 0.0230693 III 17000\n## [3] III 16001-18000 * | 531 0.0230693 III 17000\n## [4] III 20001-22000 * | 533 0.0343250 III 21000\n## [5] III 22001-24000 * | 534 0.0258604 III 23000\n## [6] III 24001-26000 * | 535 0.0290757 III 25000\n## [7] III 28001-30000 * | 537 0.0290713 III 29000\n## [8] III 30001-32000 * | 538 0.0266373 III 31000\n## [9] III 32001-34000 * | 539 0.0201137 III 33000\n## [10] III 32001-34000 * | 539 0.0201137 III 33000\n## [11] III 36001-38000 * | 541 0.0220603 III 37000\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n## \n## $second\n## GRanges object with 11 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## [1] XV 16001-18000 * | 5032 0.0187250 XV 17000\n## [2] XV 16001-18000 * | 5032 0.0187250 XV 17000\n## [3] XV 20001-22000 * | 5034 0.0247973 XV 21000\n## [4] XV 14001-16000 * | 5031 0.0379727 XV 15000\n## [5] XV 10001-12000 * | 5029 0.0296913 XV 11000\n## [6] XV 32001-34000 * | 5040 0.0241580 XV 33000\n## [7] XV 16001-18000 * | 5032 0.0187250 XV 17000\n## [8] XV 38001-40000 * | 5043 0.0206849 XV 39000\n## [9] XV 22001-24000 * | 5035 0.0613856 XV 23000\n## [10] XV 30001-32000 * | 5039 0.0482465 XV 31000\n## [11] XV 10001-12000 * | 5029 0.0296913 XV 11000\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\n\n2.4.2 Interacting with HiCExperiment data\n\nAn HiCExperiment object allows parsing of a disk-stored contact matrix.\nAn HiCExperiment 
object operates by wrapping together (1) a ContactFile (i.e. a connection to a disk-stored data file) and (2) a GInteractions generated by parsing the data file.\n\nWe will use the yeast_hic HiCExperiment object to demonstrate how to parse information from a HiCExperiment object.\n\nyeast_hic <- contacts_yeast()\n\n\nyeast_hic\n## `HiCExperiment` object with 8,757,906 contacts over 763 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"whole genome\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 16000 \n## interactions: 267709 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16) \n## pairsFile: /github/home/.cache/R/ExperimentHub/1a9a1c034d7_7753 \n## metadata(3): ID org date\n\n\n2.4.2.1 Interactions\nThe imported genomic interactions can be directly exposed using the interactions function and are returned as a GInteractions object.\n\ninteractions(yeast_hic)\n## GInteractions object with 267709 interactions and 4 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | bin_id1\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric>\n## [1] I 1-16000 --- I 1-16000 | 0\n## [2] I 1-16000 --- I 16001-32000 | 0\n## [3] I 1-16000 --- I 32001-48000 | 0\n## [4] I 1-16000 --- I 48001-64000 | 0\n## [5] I 1-16000 --- I 64001-80000 | 0\n## ... ... ... ... ... ... . ...\n## [267705] XVI 896001-912000 --- XVI 912001-928000 | 759\n## [267706] XVI 896001-912000 --- XVI 928001-944000 | 759\n## [267707] XVI 912001-928000 --- XVI 912001-928000 | 760\n## [267708] XVI 912001-928000 --- XVI 928001-944000 | 760\n## [267709] XVI 928001-944000 --- XVI 928001-944000 | 761\n## bin_id2 count balanced\n## <numeric> <numeric> <numeric>\n## [1] 0 2836 1.0943959\n## [2] 1 2212 0.9592069\n## [3] 2 1183 0.4385242\n## [4] 3 831 0.2231192\n## [5] 4 310 0.0821255\n## ... ... ... 
...\n## [267705] 760 3565 1.236371\n## [267706] 761 1359 0.385016\n## [267707] 760 3534 2.103988\n## [267708] 761 3055 1.485794\n## [267709] 761 4308 1.711565\n## -------\n## regions: 763 ranges and 4 metadata columns\n## seqinfo: 16 sequences from an unspecified genome\n\n\n\n\n\n\n\nNote\n\n\n\nBecause genomic interactions are actually stored as GInteractions, regions and anchors work on HiCExperiment objects just as they work with GInteractions!\n\n\n\nregions(yeast_hic)\n## GRanges object with 763 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight\n## <Rle> <IRanges> <Rle> | <numeric> <numeric>\n## I_1_16000 I 1-16000 * | 0 0.0196442\n## I_16001_32000 I 16001-32000 * | 1 0.0220746\n## I_32001_48000 I 32001-48000 * | 2 0.0188701\n## I_48001_64000 I 48001-64000 * | 3 0.0136679\n## I_64001_80000 I 64001-80000 * | 4 0.0134860\n## ... ... ... ... . ... ...\n## XVI_880001_896000 XVI 880001-896000 * | 758 0.00910873\n## XVI_896001_912000 XVI 896001-912000 * | 759 0.01421350\n## XVI_912001_928000 XVI 912001-928000 * | 760 0.02439992\n## XVI_928001_944000 XVI 928001-944000 * | 761 0.01993237\n## XVI_944001_948066 XVI 944001-948066 * | 762 NaN\n## chr center\n## <Rle> <integer>\n## I_1_16000 I 8000\n## I_16001_32000 I 24000\n## I_32001_48000 I 40000\n## I_48001_64000 I 56000\n## I_64001_80000 I 72000\n## ... ... ...\n## XVI_880001_896000 XVI 888000\n## XVI_896001_912000 XVI 904000\n## XVI_912001_928000 XVI 920000\n## XVI_928001_944000 XVI 936000\n## XVI_944001_948066 XVI 946033\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\nanchors(yeast_hic)\n## $first\n## GRanges object with 267709 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>\n## [1] I 1-16000 * | 0 0.0196442 I\n## [2] I 1-16000 * | 0 0.0196442 I\n## [3] I 1-16000 * | 0 0.0196442 I\n## [4] I 1-16000 * | 0 0.0196442 I\n## [5] I 1-16000 * | 0 0.0196442 I\n## ... ... ... ... . ... ... 
...\n## [267705] XVI 896001-912000 * | 759 0.0142135 XVI\n## [267706] XVI 896001-912000 * | 759 0.0142135 XVI\n## [267707] XVI 912001-928000 * | 760 0.0243999 XVI\n## [267708] XVI 912001-928000 * | 760 0.0243999 XVI\n## [267709] XVI 928001-944000 * | 761 0.0199324 XVI\n## center\n## <integer>\n## [1] 8000\n## [2] 8000\n## [3] 8000\n## [4] 8000\n## [5] 8000\n## ... ...\n## [267705] 904000\n## [267706] 904000\n## [267707] 920000\n## [267708] 920000\n## [267709] 936000\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n## \n## $second\n## GRanges object with 267709 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle>\n## [1] I 1-16000 * | 0 0.0196442 I\n## [2] I 16001-32000 * | 1 0.0220746 I\n## [3] I 32001-48000 * | 2 0.0188701 I\n## [4] I 48001-64000 * | 3 0.0136679 I\n## [5] I 64001-80000 * | 4 0.0134860 I\n## ... ... ... ... . ... ... ...\n## [267705] XVI 912001-928000 * | 760 0.0243999 XVI\n## [267706] XVI 928001-944000 * | 761 0.0199324 XVI\n## [267707] XVI 912001-928000 * | 760 0.0243999 XVI\n## [267708] XVI 928001-944000 * | 761 0.0199324 XVI\n## [267709] XVI 928001-944000 * | 761 0.0199324 XVI\n## center\n## <integer>\n## [1] 8000\n## [2] 24000\n## [3] 40000\n## [4] 56000\n## [5] 72000\n## ... ...\n## [267705] 920000\n## [267706] 936000\n## [267707] 920000\n## [267708] 936000\n## [267709] 936000\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\n\n2.4.2.2 Bins and seqinfo\nAdditional useful information can be recovered from a HiCExperiment object. This includes:\n\nThe seqinfo of the HiCExperiment:\n\n\nseqinfo(yeast_hic)\n## Seqinfo object with 16 sequences from an unspecified genome:\n## seqnames seqlengths isCircular genome\n## I 230218 <NA> <NA>\n## II 813184 <NA> <NA>\n## III 316620 <NA> <NA>\n## IV 1531933 <NA> <NA>\n## V 576874 <NA> <NA>\n## ... ... ... 
...\n## XII 1078177 <NA> <NA>\n## XIII 924431 <NA> <NA>\n## XIV 784333 <NA> <NA>\n## XV 1091291 <NA> <NA>\n## XVI 948066 <NA> <NA>\n\nThis lists the different chromosomes available to parse along with their length.\n\nThe bins of the HiCExperiment:\n\n\nbins(yeast_hic)\n## GRanges object with 763 ranges and 2 metadata columns:\n## seqnames ranges strand | bin_id weight\n## <Rle> <IRanges> <Rle> | <numeric> <numeric>\n## I_1_16000 I 1-16000 * | 0 0.0196442\n## I_16001_32000 I 16001-32000 * | 1 0.0220746\n## I_32001_48000 I 32001-48000 * | 2 0.0188701\n## I_48001_64000 I 48001-64000 * | 3 0.0136679\n## I_64001_80000 I 64001-80000 * | 4 0.0134860\n## ... ... ... ... . ... ...\n## XVI_880001_896000 XVI 880001-896000 * | 758 0.00910873\n## XVI_896001_912000 XVI 896001-912000 * | 759 0.01421350\n## XVI_912001_928000 XVI 912001-928000 * | 760 0.02439992\n## XVI_928001_944000 XVI 928001-944000 * | 761 0.01993237\n## XVI_944001_948066 XVI 944001-948066 * | 762 NaN\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\n\n\n\n\n\n\nDifference between bins and regions\n\n\n\nbins are not equivalent to regions of an HiCExperiment.\n\n\nbins refer to all the possible regions of a HiCExperiment. For instance, for a HiCExperiment with a total genome size of 1,000,000 and a resolution of 2000, bins will always return a GRanges object with 500 ranges.\n\nregions, on the opposite, refer to the union of anchors of all the interactions imported in a HiCExperiment object.\n\nThus, all the regions will necessarily be a subset of the HiCExperiment bins, or equal to bins if no focus has been specified when importing a ContactFile.\n\n\n\n2.4.2.3 Scores\nOf course, what the end-user would be looking for is the frequency for each genomic interaction. Such frequency scores are available using the scores function. 
scores returns a list with a number of different types of scores.\n\nhead(scores(yeast_hic))\n## List of length 2\n## names(2): count balanced\n\nhead(scores(yeast_hic, \"count\"))\n## [1] 2836 2212 1183 831 310 159\n\nhead(scores(yeast_hic, \"balanced\"))\n## [1] 1.09439586 0.95920688 0.43852417 0.22311917 0.08212549 0.03345221\n\n\n\n\n\n\n\nTip\n\n\n\nCalling interactions(hic) returns a GInteractions with scores already stored in extra columns. This short-hand allows one to dynamically check scores directly from the interactions output.\n\ninteractions(yeast_hic)\n## GInteractions object with 267709 interactions and 4 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | bin_id1\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric>\n## [1] I 1-16000 --- I 1-16000 | 0\n## [2] I 1-16000 --- I 16001-32000 | 0\n## [3] I 1-16000 --- I 32001-48000 | 0\n## [4] I 1-16000 --- I 48001-64000 | 0\n## [5] I 1-16000 --- I 64001-80000 | 0\n## ... ... ... ... ... ... . ...\n## [267705] XVI 896001-912000 --- XVI 912001-928000 | 759\n## [267706] XVI 896001-912000 --- XVI 928001-944000 | 759\n## [267707] XVI 912001-928000 --- XVI 912001-928000 | 760\n## [267708] XVI 912001-928000 --- XVI 928001-944000 | 760\n## [267709] XVI 928001-944000 --- XVI 928001-944000 | 761\n## bin_id2 count balanced\n## <numeric> <numeric> <numeric>\n## [1] 0 2836 1.0943959\n## [2] 1 2212 0.9592069\n## [3] 2 1183 0.4385242\n## [4] 3 831 0.2231192\n## [5] 4 310 0.0821255\n## ... ... ... ...\n## [267705] 760 3565 1.236371\n## [267706] 761 1359 0.385016\n## [267707] 760 3534 2.103988\n## [267708] 761 3055 1.485794\n## [267709] 761 4308 1.711565\n## -------\n## regions: 763 ranges and 4 metadata columns\n## seqinfo: 16 sequences from an unspecified genome\n\nhead(interactions(yeast_hic)$count)\n## [1] 2836 2212 1183 831 310 159\n\n\n\n\n2.4.2.4 topologicalFeatures\nIn Hi-C studies, “topological features” refer to genomic structures identified (usually from a Hi-C map, but not necessarily). 
For instance, one may want to study known structural loops anchored at CTCF sites, or interactions around or over centromeres, or simply specific genomic “viewpoints”.\nHiCExperiment objects can store topologicalFeatures to facilitate this analysis. By default, four empty topologicalFeatures are stored in a list:\n\ncompartments\nborders\nloops\nviewpoints\n\nAdditional topologicalFeatures can be added to this list (read next chapter for more detail).\n\ntopologicalFeatures(yeast_hic)\n## List of length 5\n## names(5): compartments borders loops viewpoints centromeres\n\ntopologicalFeatures(yeast_hic, 'centromeres')\n## GRanges object with 16 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] I 151583-151641 +\n## [2] II 238361-238419 +\n## [3] III 114322-114380 +\n## [4] IV 449879-449937 +\n## [5] V 152522-152580 +\n## ... ... ... ...\n## [12] XII 151366-151424 +\n## [13] XIII 268222-268280 +\n## [14] XIV 628588-628646 +\n## [15] XV 326897-326955 +\n## [16] XVI 556255-556313 +\n## -------\n## seqinfo: 17 sequences (1 circular) from R64-1-1 genome\n\n\n2.4.2.5 pairsFile\nAs a contact matrix is typically obtained from binning a .pairs file, it is often the case that the matching .pairs file is available to then end-user. A PairsFile can thus be created and associated to the corresponding HiCExperiment object. This allows more accurate estimation of contact distribution, e.g. 
when calculating distance-dependent genomic interaction frequency.\n\npairsFile(yeast_hic) <- pairsf\n\npairsFile(yeast_hic)\n## EH7703 \n## \"/github/home/.cache/R/ExperimentHub/1a9a1c034d7_7753\"\n\nreadLines(pairsFile(yeast_hic), 25)\n## [1] \"## pairs format v1.0\" \n## [2] \"#sorted: chr1-pos1-chr2-pos2\" \n## [3] \"#columns: readID chr1 pos1 chr2 pos2 strand1 strand2 frag1 frag2\" \n## [4] \"#chromsize: I 230218\" \n## [5] \"#chromsize: II 813184\" \n## [6] \"#chromsize: III 316620\" \n## [7] \"#chromsize: IV 1531933\" \n## [8] \"#chromsize: V 576874\" \n## [9] \"#chromsize: VI 270161\" \n## [10] \"#chromsize: VII 1090940\" \n## [11] \"#chromsize: VIII 562643\" \n## [12] \"#chromsize: IX 439888\" \n## [13] \"#chromsize: X 745751\" \n## [14] \"#chromsize: XI 666816\" \n## [15] \"#chromsize: XII 1078177\" \n## [16] \"#chromsize: XIII 924431\" \n## [17] \"#chromsize: XIV 784333\" \n## [18] \"#chromsize: XV 1091291\" \n## [19] \"#chromsize: XVI 948066\" \n## [20] \"#chromsize: Mito 85779\" \n## [21] \"NS500150:527:HHGYNBGXF:3:21611:19085:3986\\tII\\t105\\tII\\t48548\\t+\\t-\\t1358\\t1681\" \n## [22] \"NS500150:527:HHGYNBGXF:4:13604:19734:2406\\tII\\t113\\tII\\t45003\\t-\\t+\\t1358\\t1658\" \n## [23] \"NS500150:527:HHGYNBGXF:2:11108:25178:11036\\tII\\t119\\tII\\t687251\\t-\\t+\\t1358\\t5550\"\n## [24] \"NS500150:527:HHGYNBGXF:1:22301:8468:1586\\tII\\t160\\tII\\t26124\\t+\\t-\\t1358\\t1510\" \n## [25] \"NS500150:527:HHGYNBGXF:4:23606:24037:2076\\tII\\t169\\tII\\t39052\\t+\\t+\\t1358\\t1613\"\n\n\n\n\n\n\n\nImporting a PairsFile\n\n\n\nThe .pairs file linked to a HiCExperiment object can itself be imported in a GInteractions object:\n\nimport(pairsFile(yeast_hic), format = 'pairs')\n## GInteractions object with 471364 interactions and 3 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | frag1 frag2\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric>\n## [1] II 105 --- II 48548 | 1358 1681\n## [2] II 113 --- II 45003 | 1358 1658\n## [3] II 119 --- II 
687251 | 1358 5550\n## [4] II 160 --- II 26124 | 1358 1510\n## [5] II 169 --- II 39052 | 1358 1613\n## ... ... ... ... ... ... . ... ...\n## [471360] II 808605 --- II 809683 | 6316 6320\n## [471361] II 808609 --- II 809917 | 6316 6324\n## [471362] II 808617 --- II 809506 | 6316 6319\n## [471363] II 809447 --- II 809685 | 6319 6321\n## [471364] II 809472 --- II 809675 | 6319 6320\n## distance\n## <integer>\n## [1] 48443\n## [2] 44890\n## [3] 687132\n## [4] 25964\n## [5] 38883\n## ... ...\n## [471360] 1078\n## [471361] 1308\n## [471362] 889\n## [471363] 238\n## [471364] 203\n## -------\n## regions: 549331 ranges and 0 metadata columns\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\nNote that these GInteractions are not binned, contrary to interactions extracted from a HiCExperiment. Anchors of the interactions listed in the GInteractions imported from a disk-stored .pairs file are all of width 1."
+ "text": "2.4 HiCExperiment class\nBased on the previous sections, we have different Bioconductor classes relevant for Hi-C:\n\n\nGInteractions which can be used to represent genomic interactions in R\n\nContactFiles which can be used to establish a connection with disk-stored Hi-C files\n\nHiCExperiment objects are created when parsing a ContactFile in R. The HiCExperiment class reads a ContactFile into memory and stores genomic interactions as GInteractions. The HiCExperiment class is, quite obviously, defined in the HiCExperiment package.\n\n2.4.1 Creating a HiCExperiment object\n\n2.4.1.1 Importing a ContactFile\n\nIn practice, to create a HiCExperiment object from a ContactFile, one can use the import method.\n\n\n\n\n\n\nCaution\n\n\n\n\nCreating a HiCExperiment object means importing data from a Hi-C matrix (e.g. from a ContactFile) into memory in R.\n\nCreating a HiCExperiment object from large disk-stored contact matrices can potentially take a long time.\n\n\n\n\ncf <- CoolFile(coolf)\nhic <- import(cf)\nhic\n## `HiCExperiment` object with 8,757,906 contacts over 12,079 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"whole genome\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 1000 \n## interactions: 2945692 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\nPrinting a HiCExperiment to the console will not reveal the actual data stored in the object (it would most likely crash your R session!). Instead, it gives a summary of the data stored in the object:\n\nThe fileName, i.e. the path to the disk-stored data file\nThe focus, i.e. 
the genomic location for which data has been imported (in the example above, \"whole genome\" implies that all the data has been imported in R)\n\nresolutions available in the disk-stored data file (this will be identical to availableResolutions(cf))\n\nactive resolution indicates at which resolution the data is currently imported\n\ninteractions refers to the actual GInteractions imported in R and “hidden” (for now!) in the HiCExperiment object\n\nscores refer to different interaction frequency estimates. These can be raw counts, balanced (if the contact matrix has been previously normalized), or whatever score the end-user wants to attribute to each interaction (e.g. ratio of counts between two Hi-C maps, …)\n\ntopologicalFeatures is a list of GRanges or GInteractions objects that describe important topological features.\n\npairsFile is a pointer to an optional disk-stored .pairs file from which the contact matrix has been created. This is often useful to estimate some Hi-C metrics.\n\nmetadata is a list to further describe the experiment.\n\nThese pieces of information are called slots. They can be directly accessed using getter functions that bear the same name as the slot.\n\nfileName(hic)\n## [1] \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\"\n\nfocus(hic)\n## NULL\n\nresolutions(hic)\n## [1] 1000 2000 4000 8000 16000\n\nresolution(hic)\n## [1] 1000\n\ninteractions(hic)\n## GInteractions object with 2945692 interactions and 4 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric>\n## [1] I 1-1000 --- I 1-1000 | 0 0 15 0.0663491\n## [2] I 1-1000 --- I 1001-2000 | 0 1 21 0.1273505\n## [3] I 1-1000 --- I 2001-3000 | 0 2 21 0.0738691\n## [4] I 1-1000 --- I 3001-4000 | 0 3 38 0.0827051\n## [5] I 1-1000 --- I 4001-5000 | 0 4 17 0.0591984\n## ... ... ... ... ... ... . ... ... ... 
...\n## [2945688] XVI 940001-941000 --- XVI 942001-943000 | 12070 12072 11 0.0575550\n## [2945689] XVI 940001-941000 --- XVI 943001-944000 | 12070 12073 1 NaN\n## [2945690] XVI 941001-942000 --- XVI 941001-942000 | 12071 12071 74 0.0504615\n## [2945691] XVI 941001-942000 --- XVI 942001-943000 | 12071 12072 39 0.1624599\n## [2945692] XVI 941001-942000 --- XVI 943001-944000 | 12071 12073 1 NaN\n## -------\n## regions: 12079 ranges and 4 metadata columns\n## seqinfo: 16 sequences from an unspecified genome\n\nscores(hic)\n## List of length 2\n## names(2): count balanced\n\ntopologicalFeatures(hic)\n## List of length 4\n## names(4): compartments borders loops viewpoints\n\npairsFile(hic)\n## NULL\n\nmetadata(hic)\n## list()\n\nimport also works for other types of ContactFile (HicFile, HicproFile, PairsFile).\n\nFor HicFile and HicproFile, import seamlessly returns a HiCExperiment as well:\n\n\nhf <- HicFile(hicf)\nhic <- import(hf)\nhic\n## `HiCExperiment` object with 13,681,280 contacts over 12,165 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92259b7f1f_7836\" \n## focus: \"whole genome\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 1000 \n## interactions: 2965693 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nFor PairsFile, the returned object is a representation of Hi-C “pairs” in R, i.e. a GInteractions object:\n\n\n\npf <- PairsFile(pairsf)\npairs <- import(pf)\npairs\n## GInteractions object with 471364 interactions and 3 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | frag1 frag2 distance\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <integer>\n## [1] II 105 --- II 48548 | 1358 1681 48443\n## [2] II 113 --- II 45003 | 1358 1658 44890\n## [3] II 119 --- II 687251 | 1358 5550 687132\n## [4] II 160 --- II 26124 | 1358 1510 25964\n## [5] II 169 --- II 39052 | 1358 1613 38883\n## ... ... ... 
... ... ... . ... ... ...\n## [471360] II 808605 --- II 809683 | 6316 6320 1078\n## [471361] II 808609 --- II 809917 | 6316 6324 1308\n## [471362] II 808617 --- II 809506 | 6316 6319 889\n## [471363] II 809447 --- II 809685 | 6319 6321 238\n## [471364] II 809472 --- II 809675 | 6319 6320 203\n## -------\n## regions: 549331 ranges and 0 metadata columns\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\n2.4.1.2 Customizing the import\n\nTo restrict the import to the data that is relevant to the study, two arguments can be passed to import, along with a ContactFile.\n\n\n\n\n\n\nKey import arguments:\n\n\n\n\n\nfocus: This can be used to only parse data for a specific genomic location.\n\nresolution: This can be used to choose the resolution at which the contact matrix is parsed (this is ignored if the ContactFile is not multi-resolution, e.g. .cool or HiC-Pro generated matrices)\n\n\n\n\nImport interactions within a single chromosome:\n\n\nhic <- import(cf, focus = 'II', resolution = 2000)\n\nregions(hic) # ---- `regions()` works on `HiCExperiment` the same way as on `GInteractions`\n## GRanges object with 407 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## II_1_2000 II 1-2000 * | 116 NaN II 1000\n## II_2001_4000 II 2001-4000 * | 117 NaN II 3000\n## II_4001_6000 II 4001-6000 * | 118 NaN II 5000\n## II_6001_8000 II 6001-8000 * | 119 NaN II 7000\n## II_8001_10000 II 8001-10000 * | 120 0.0461112 II 9000\n## ... ... ... ... . ... ... ... 
...\n## II_804001_806000 II 804001-806000 * | 518 0.0493107 II 805000\n## II_806001_808000 II 806001-808000 * | 519 0.0611355 II 807000\n## II_808001_810000 II 808001-810000 * | 520 NaN II 809000\n## II_810001_812000 II 810001-812000 * | 521 NaN II 811000\n## II_812001_813184 II 812001-813184 * | 522 NaN II 812592\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\ntable(seqnames(regions(hic)))\n## \n## I II III IV V VI VII VIII IX X XI XII XIII XIV XV XVI \n## 0 407 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n\nanchors(hic) # ---- `anchors()` works on `HiCExperiment` the same way as on `GInteractions`\n## $first\n## GRanges object with 34063 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## [1] II 1-2000 * | 116 NaN II 1000\n## [2] II 1-2000 * | 116 NaN II 1000\n## [3] II 1-2000 * | 116 NaN II 1000\n## [4] II 1-2000 * | 116 NaN II 1000\n## [5] II 1-2000 * | 116 NaN II 1000\n## ... ... ... ... . ... ... ... ...\n## [34059] II 804001-806000 * | 518 0.0493107 II 805000\n## [34060] II 806001-808000 * | 519 0.0611355 II 807000\n## [34061] II 806001-808000 * | 519 0.0611355 II 807000\n## [34062] II 806001-808000 * | 519 0.0611355 II 807000\n## [34063] II 808001-810000 * | 520 NaN II 809000\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n## \n## $second\n## GRanges object with 34063 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## [1] II 1-2000 * | 116 NaN II 1000\n## [2] II 4001-6000 * | 118 NaN II 5000\n## [3] II 6001-8000 * | 119 NaN II 7000\n## [4] II 8001-10000 * | 120 0.0461112 II 9000\n## [5] II 10001-12000 * | 121 0.0334807 II 11000\n## ... ... ... ... . ... ... ... 
...\n## [34059] II 810001-812000 * | 521 NaN II 811000\n## [34060] II 806001-808000 * | 519 0.0611355 II 807000\n## [34061] II 808001-810000 * | 520 NaN II 809000\n## [34062] II 810001-812000 * | 521 NaN II 811000\n## [34063] II 808001-810000 * | 520 NaN II 809000\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\n\nImport interactions within a segment of a chromosome:\n\n\nhic <- import(cf, focus = 'II:40000-60000', resolution = 1000)\n\nregions(hic) \n## GRanges object with 21 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## II_39001_40000 II 39001-40000 * | 270 0.0220798 II 39500\n## II_40001_41000 II 40001-41000 * | 271 0.0246775 II 40500\n## II_41001_42000 II 41001-42000 * | 272 0.0269232 II 41500\n## II_42001_43000 II 42001-43000 * | 273 0.0341849 II 42500\n## II_43001_44000 II 43001-44000 * | 274 0.0265386 II 43500\n## ... ... ... ... . ... ... ... ...\n## II_55001_56000 II 55001-56000 * | 286 0.0213532 II 55500\n## II_56001_57000 II 56001-57000 * | 287 0.0569839 II 56500\n## II_57001_58000 II 57001-58000 * | 288 0.0338612 II 57500\n## II_58001_59000 II 58001-59000 * | 289 0.0294531 II 58500\n## II_59001_60000 II 59001-60000 * | 290 0.0306662 II 59500\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\nanchors(hic)\n## $first\n## GRanges object with 210 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## [1] II 40001-41000 * | 271 0.0246775 II 40500\n## [2] II 40001-41000 * | 271 0.0246775 II 40500\n## [3] II 40001-41000 * | 271 0.0246775 II 40500\n## [4] II 40001-41000 * | 271 0.0246775 II 40500\n## [5] II 40001-41000 * | 271 0.0246775 II 40500\n## ... ... ... ... . ... ... ... 
...\n## [206] II 57001-58000 * | 288 0.0338612 II 57500\n## [207] II 57001-58000 * | 288 0.0338612 II 57500\n## [208] II 58001-59000 * | 289 0.0294531 II 58500\n## [209] II 58001-59000 * | 289 0.0294531 II 58500\n## [210] II 59001-60000 * | 290 0.0306662 II 59500\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n## \n## $second\n## GRanges object with 210 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## [1] II 40001-41000 * | 271 0.0246775 II 40500\n## [2] II 41001-42000 * | 272 0.0269232 II 41500\n## [3] II 42001-43000 * | 273 0.0341849 II 42500\n## [4] II 43001-44000 * | 274 0.0265386 II 43500\n## [5] II 44001-45000 * | 275 0.0488968 II 44500\n## ... ... ... ... . ... ... ... ...\n## [206] II 58001-59000 * | 289 0.0294531 II 58500\n## [207] II 59001-60000 * | 290 0.0306662 II 59500\n## [208] II 58001-59000 * | 289 0.0294531 II 58500\n## [209] II 59001-60000 * | 290 0.0306662 II 59500\n## [210] II 59001-60000 * | 290 0.0306662 II 59500\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\n\nImport interactions between two chromosomes:\n\n\nhic2 <- import(cf, focus = 'II|XV', resolution = 4000)\n\nregions(hic2)\n## GRanges object with 477 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## II_1_4000 II 1-4000 * | 58 NaN II 2000\n## II_4001_8000 II 4001-8000 * | 59 NaN II 6000\n## II_8001_12000 II 8001-12000 * | 60 0.0274474 II 10000\n## II_12001_16000 II 12001-16000 * | 61 0.0342116 II 14000\n## II_16001_20000 II 16001-20000 * | 62 0.0195128 II 18000\n## ... ... ... ... . ... ... ... 
...\n## XV_1072001_1076000 XV 1072001-1076000 * | 2783 0.041763 XV 1074000\n## XV_1076001_1080000 XV 1076001-1080000 * | 2784 NaN XV 1078000\n## XV_1080001_1084000 XV 1080001-1084000 * | 2785 NaN XV 1082000\n## XV_1084001_1088000 XV 1084001-1088000 * | 2786 NaN XV 1086000\n## XV_1088001_1091291 XV 1088001-1091291 * | 2787 NaN XV 1089646\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\nanchors(hic2)\n## $first\n## GRanges object with 18032 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## [1] II 1-4000 * | 58 NaN II 2000\n## [2] II 1-4000 * | 58 NaN II 2000\n## [3] II 1-4000 * | 58 NaN II 2000\n## [4] II 1-4000 * | 58 NaN II 2000\n## [5] II 1-4000 * | 58 NaN II 2000\n## ... ... ... ... . ... ... ... ...\n## [18028] II 808001-812000 * | 260 NaN II 810000\n## [18029] II 808001-812000 * | 260 NaN II 810000\n## [18030] II 808001-812000 * | 260 NaN II 810000\n## [18031] II 808001-812000 * | 260 NaN II 810000\n## [18032] II 808001-812000 * | 260 NaN II 810000\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n## \n## $second\n## GRanges object with 18032 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## [1] XV 48001-52000 * | 2527 0.0185354 XV 50000\n## [2] XV 348001-352000 * | 2602 0.0233750 XV 350000\n## [3] XV 468001-472000 * | 2632 0.0153615 XV 470000\n## [4] XV 472001-476000 * | 2633 0.0189624 XV 474000\n## [5] XV 584001-588000 * | 2661 0.0167715 XV 586000\n## ... ... ... ... . ... ... ... 
...\n## [18028] XV 980001-984000 * | 2760 0.0187827 XV 982000\n## [18029] XV 984001-988000 * | 2761 0.0250094 XV 986000\n## [18030] XV 992001-996000 * | 2763 0.0185599 XV 994000\n## [18031] XV 1004001-1008000 * | 2766 0.0196942 XV 1006000\n## [18032] XV 1064001-1068000 * | 2781 0.0208220 XV 1066000\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\n\nImport interactions between segments of two chromosomes:\n\n\nhic3 <- import(cf, focus = 'III:10000-40000|XV:10000-40000', resolution = 2000)\n\nregions(hic3)\n## GRanges object with 32 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## III_8001_10000 III 8001-10000 * | 527 NaN III 9000\n## III_10001_12000 III 10001-12000 * | 528 NaN III 11000\n## III_12001_14000 III 12001-14000 * | 529 NaN III 13000\n## III_14001_16000 III 14001-16000 * | 530 0.0356351 III 15000\n## III_16001_18000 III 16001-18000 * | 531 0.0230693 III 17000\n## ... ... ... ... . ... ... ... 
...\n## XV_30001_32000 XV 30001-32000 * | 5039 0.0482465 XV 31000\n## XV_32001_34000 XV 32001-34000 * | 5040 0.0241580 XV 33000\n## XV_34001_36000 XV 34001-36000 * | 5041 0.0273166 XV 35000\n## XV_36001_38000 XV 36001-38000 * | 5042 0.0542235 XV 37000\n## XV_38001_40000 XV 38001-40000 * | 5043 0.0206849 XV 39000\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\nanchors(hic3)\n## $first\n## GRanges object with 11 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## [1] III 14001-16000 * | 530 0.0356351 III 15000\n## [2] III 16001-18000 * | 531 0.0230693 III 17000\n## [3] III 16001-18000 * | 531 0.0230693 III 17000\n## [4] III 20001-22000 * | 533 0.0343250 III 21000\n## [5] III 22001-24000 * | 534 0.0258604 III 23000\n## [6] III 24001-26000 * | 535 0.0290757 III 25000\n## [7] III 28001-30000 * | 537 0.0290713 III 29000\n## [8] III 30001-32000 * | 538 0.0266373 III 31000\n## [9] III 32001-34000 * | 539 0.0201137 III 33000\n## [10] III 32001-34000 * | 539 0.0201137 III 33000\n## [11] III 36001-38000 * | 541 0.0220603 III 37000\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n## \n## $second\n## GRanges object with 11 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## [1] XV 16001-18000 * | 5032 0.0187250 XV 17000\n## [2] XV 16001-18000 * | 5032 0.0187250 XV 17000\n## [3] XV 20001-22000 * | 5034 0.0247973 XV 21000\n## [4] XV 14001-16000 * | 5031 0.0379727 XV 15000\n## [5] XV 10001-12000 * | 5029 0.0296913 XV 11000\n## [6] XV 32001-34000 * | 5040 0.0241580 XV 33000\n## [7] XV 16001-18000 * | 5032 0.0187250 XV 17000\n## [8] XV 38001-40000 * | 5043 0.0206849 XV 39000\n## [9] XV 22001-24000 * | 5035 0.0613856 XV 23000\n## [10] XV 30001-32000 * | 5039 0.0482465 XV 31000\n## [11] XV 10001-12000 * | 5029 0.0296913 XV 11000\n## -------\n## 
seqinfo: 16 sequences from an unspecified genome\n\n\n2.4.2 Interacting with HiCExperiment data\n\nA HiCExperiment object allows parsing of a disk-stored contact matrix.\nA HiCExperiment object operates by wrapping together (1) a ContactFile (i.e. a connection to a disk-stored data file) and (2) a GInteractions generated by parsing the data file.\n\nWe will use the yeast_hic HiCExperiment object to demonstrate how to parse information from a HiCExperiment object.\n\nyeast_hic <- contacts_yeast()\n\n\nyeast_hic\n## `HiCExperiment` object with 8,757,906 contacts over 763 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"whole genome\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 16000 \n## interactions: 267709 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16) \n## pairsFile: /github/home/.cache/R/ExperimentHub/1a92835ced9_7753 \n## metadata(3): ID org date\n\n\n2.4.2.1 Interactions\nThe imported genomic interactions can be directly exposed using the interactions function and are returned as a GInteractions object.\n\ninteractions(yeast_hic)\n## GInteractions object with 267709 interactions and 4 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric>\n## [1] I 1-16000 --- I 1-16000 | 0 0 2836 1.0943959\n## [2] I 1-16000 --- I 16001-32000 | 0 1 2212 0.9592069\n## [3] I 1-16000 --- I 32001-48000 | 0 2 1183 0.4385242\n## [4] I 1-16000 --- I 48001-64000 | 0 3 831 0.2231192\n## [5] I 1-16000 --- I 64001-80000 | 0 4 310 0.0821255\n## ... ... ... ... ... ... . ... ... ... 
...\n## [267705] XVI 896001-912000 --- XVI 912001-928000 | 759 760 3565 1.236371\n## [267706] XVI 896001-912000 --- XVI 928001-944000 | 759 761 1359 0.385016\n## [267707] XVI 912001-928000 --- XVI 912001-928000 | 760 760 3534 2.103988\n## [267708] XVI 912001-928000 --- XVI 928001-944000 | 760 761 3055 1.485794\n## [267709] XVI 928001-944000 --- XVI 928001-944000 | 761 761 4308 1.711565\n## -------\n## regions: 763 ranges and 4 metadata columns\n## seqinfo: 16 sequences from an unspecified genome\n\nBecause genomic interactions are actually stored as GInteractions, regions and anchors work on HiCExperiment objects just as they work with GInteractions!\n\nregions(yeast_hic)\n## GRanges object with 763 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## I_1_16000 I 1-16000 * | 0 0.0196442 I 8000\n## I_16001_32000 I 16001-32000 * | 1 0.0220746 I 24000\n## I_32001_48000 I 32001-48000 * | 2 0.0188701 I 40000\n## I_48001_64000 I 48001-64000 * | 3 0.0136679 I 56000\n## I_64001_80000 I 64001-80000 * | 4 0.0134860 I 72000\n## ... ... ... ... . ... ... ... 
...\n## XVI_880001_896000 XVI 880001-896000 * | 758 0.00910873 XVI 888000\n## XVI_896001_912000 XVI 896001-912000 * | 759 0.01421350 XVI 904000\n## XVI_912001_928000 XVI 912001-928000 * | 760 0.02439992 XVI 920000\n## XVI_928001_944000 XVI 928001-944000 * | 761 0.01993237 XVI 936000\n## XVI_944001_948066 XVI 944001-948066 * | 762 NaN XVI 946033\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\nanchors(yeast_hic)\n## $first\n## GRanges object with 267709 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## [1] I 1-16000 * | 0 0.0196442 I 8000\n## [2] I 1-16000 * | 0 0.0196442 I 8000\n## [3] I 1-16000 * | 0 0.0196442 I 8000\n## [4] I 1-16000 * | 0 0.0196442 I 8000\n## [5] I 1-16000 * | 0 0.0196442 I 8000\n## ... ... ... ... . ... ... ... ...\n## [267705] XVI 896001-912000 * | 759 0.0142135 XVI 904000\n## [267706] XVI 896001-912000 * | 759 0.0142135 XVI 904000\n## [267707] XVI 912001-928000 * | 760 0.0243999 XVI 920000\n## [267708] XVI 912001-928000 * | 760 0.0243999 XVI 920000\n## [267709] XVI 928001-944000 * | 761 0.0199324 XVI 936000\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n## \n## $second\n## GRanges object with 267709 ranges and 4 metadata columns:\n## seqnames ranges strand | bin_id weight chr center\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer>\n## [1] I 1-16000 * | 0 0.0196442 I 8000\n## [2] I 16001-32000 * | 1 0.0220746 I 24000\n## [3] I 32001-48000 * | 2 0.0188701 I 40000\n## [4] I 48001-64000 * | 3 0.0136679 I 56000\n## [5] I 64001-80000 * | 4 0.0134860 I 72000\n## ... ... ... ... . ... ... ... 
...\n## [267705] XVI 912001-928000 * | 760 0.0243999 XVI 920000\n## [267706] XVI 928001-944000 * | 761 0.0199324 XVI 936000\n## [267707] XVI 912001-928000 * | 760 0.0243999 XVI 920000\n## [267708] XVI 928001-944000 * | 761 0.0199324 XVI 936000\n## [267709] XVI 928001-944000 * | 761 0.0199324 XVI 936000\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\n\n2.4.2.2 Bins and seqinfo\nAdditional useful information can be recovered from a HiCExperiment object. This includes:\n\nThe seqinfo of the HiCExperiment:\n\n\nseqinfo(yeast_hic)\n## Seqinfo object with 16 sequences from an unspecified genome:\n## seqnames seqlengths isCircular genome\n## I 230218 <NA> <NA>\n## II 813184 <NA> <NA>\n## III 316620 <NA> <NA>\n## IV 1531933 <NA> <NA>\n## V 576874 <NA> <NA>\n## ... ... ... ...\n## XII 1078177 <NA> <NA>\n## XIII 924431 <NA> <NA>\n## XIV 784333 <NA> <NA>\n## XV 1091291 <NA> <NA>\n## XVI 948066 <NA> <NA>\n\nThis lists the different chromosomes available to parse along with their length.\n\nThe bins of the HiCExperiment:\n\n\nbins(yeast_hic)\n## GRanges object with 763 ranges and 2 metadata columns:\n## seqnames ranges strand | bin_id weight\n## <Rle> <IRanges> <Rle> | <numeric> <numeric>\n## I_1_16000 I 1-16000 * | 0 0.0196442\n## I_16001_32000 I 16001-32000 * | 1 0.0220746\n## I_32001_48000 I 32001-48000 * | 2 0.0188701\n## I_48001_64000 I 48001-64000 * | 3 0.0136679\n## I_64001_80000 I 64001-80000 * | 4 0.0134860\n## ... ... ... ... . ... 
...\n## XVI_880001_896000 XVI 880001-896000 * | 758 0.00910873\n## XVI_896001_912000 XVI 896001-912000 * | 759 0.01421350\n## XVI_912001_928000 XVI 912001-928000 * | 760 0.02439992\n## XVI_928001_944000 XVI 928001-944000 * | 761 0.01993237\n## XVI_944001_948066 XVI 944001-948066 * | 762 NaN\n## -------\n## seqinfo: 16 sequences from an unspecified genome\n\n\n\n\n\n\n\nDifference between bins and regions\n\n\n\nbins are not equivalent to regions of a HiCExperiment.\n\n\nbins refer to all the possible regions of a HiCExperiment. For instance, for a HiCExperiment with a total genome size of 1,000,000 and a resolution of 2000, bins will always return a GRanges object with 500 ranges.\n\nregions, in contrast, refer to the union of anchors of all the interactions imported in a HiCExperiment object.\n\nThus, all the regions will necessarily be a subset of the HiCExperiment bins, or equal to bins if no focus has been specified when importing a ContactFile.\n\n\n\n2.4.2.3 Scores\nOf course, what the end-user is usually looking for is the frequency of each genomic interaction. Such frequency scores are available using the scores function. scores returns a list with a number of different types of scores.\n\nhead(scores(yeast_hic))\n## List of length 2\n## names(2): count balanced\n\nhead(scores(yeast_hic, \"count\"))\n## [1] 2836 2212 1183 831 310 159\n\nhead(scores(yeast_hic, \"balanced\"))\n## [1] 1.09439586 0.95920688 0.43852417 0.22311917 0.08212549 0.03345221\n\nCalling interactions(hic) returns a GInteractions with scores already stored in extra columns. 
This short-hand allows one to dynamically check scores directly from the interactions output.\n\ninteractions(yeast_hic)\n## GInteractions object with 267709 interactions and 4 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric>\n## [1] I 1-16000 --- I 1-16000 | 0 0 2836 1.0943959\n## [2] I 1-16000 --- I 16001-32000 | 0 1 2212 0.9592069\n## [3] I 1-16000 --- I 32001-48000 | 0 2 1183 0.4385242\n## [4] I 1-16000 --- I 48001-64000 | 0 3 831 0.2231192\n## [5] I 1-16000 --- I 64001-80000 | 0 4 310 0.0821255\n## ... ... ... ... ... ... . ... ... ... ...\n## [267705] XVI 896001-912000 --- XVI 912001-928000 | 759 760 3565 1.236371\n## [267706] XVI 896001-912000 --- XVI 928001-944000 | 759 761 1359 0.385016\n## [267707] XVI 912001-928000 --- XVI 912001-928000 | 760 760 3534 2.103988\n## [267708] XVI 912001-928000 --- XVI 928001-944000 | 760 761 3055 1.485794\n## [267709] XVI 928001-944000 --- XVI 928001-944000 | 761 761 4308 1.711565\n## -------\n## regions: 763 ranges and 4 metadata columns\n## seqinfo: 16 sequences from an unspecified genome\n\nhead(interactions(yeast_hic)$count)\n## [1] 2836 2212 1183 831 310 159\n\n\n2.4.2.4 topologicalFeatures\nIn Hi-C studies, “topological features” refer to genomic structures identified (usually from a Hi-C map, but not necessarily). For instance, one may want to study known structural loops anchored at CTCF sites, or interactions around or over centromeres, or simply specific genomic “viewpoints”.\nHiCExperiment objects can store topologicalFeatures to facilitate this analysis. 
By default, four empty topologicalFeatures are stored in a list:\n\ncompartments\nborders\nloops\nviewpoints\n\nAdditional topologicalFeatures can be added to this list (see the next chapter for more detail).\n\ntopologicalFeatures(yeast_hic)\n## List of length 5\n## names(5): compartments borders loops viewpoints centromeres\n\ntopologicalFeatures(yeast_hic, 'centromeres')\n## GRanges object with 16 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] I 151583-151641 +\n## [2] II 238361-238419 +\n## [3] III 114322-114380 +\n## [4] IV 449879-449937 +\n## [5] V 152522-152580 +\n## ... ... ... ...\n## [12] XII 151366-151424 +\n## [13] XIII 268222-268280 +\n## [14] XIV 628588-628646 +\n## [15] XV 326897-326955 +\n## [16] XVI 556255-556313 +\n## -------\n## seqinfo: 17 sequences (1 circular) from R64-1-1 genome\n\n\n2.4.2.5 pairsFile\nAs a contact matrix is typically obtained from binning a .pairs file, it is often the case that the matching .pairs file is available to the end-user. A PairsFile can thus be created and associated with the corresponding HiCExperiment object. This allows more accurate estimation of contact distribution, e.g. 
when calculating distance-dependent genomic interaction frequency.\n\npairsFile(yeast_hic) <- pairsf\n\npairsFile(yeast_hic)\n## EH7703 \n## \"/github/home/.cache/R/ExperimentHub/1a92835ced9_7753\"\n\nreadLines(pairsFile(yeast_hic), 25)\n## [1] \"## pairs format v1.0\" \"#sorted: chr1-pos1-chr2-pos2\" \"#columns: readID chr1 pos1 chr2 pos2 strand1 strand2 frag1 frag2\" \"#chromsize: I 230218\" \"#chromsize: II 813184\" \"#chromsize: III 316620\" \"#chromsize: IV 1531933\" \"#chromsize: V 576874\" \"#chromsize: VI 270161\" \"#chromsize: VII 1090940\" \"#chromsize: VIII 562643\" \"#chromsize: IX 439888\" \"#chromsize: X 745751\" \"#chromsize: XI 666816\" \"#chromsize: XII 1078177\" \"#chromsize: XIII 924431\" \"#chromsize: XIV 784333\" \"#chromsize: XV 1091291\" \"#chromsize: XVI 948066\" \"#chromsize: Mito 85779\" \"NS500150:527:HHGYNBGXF:3:21611:19085:3986\\tII\\t105\\tII\\t48548\\t+\\t-\\t1358\\t1681\" \"NS500150:527:HHGYNBGXF:4:13604:19734:2406\\tII\\t113\\tII\\t45003\\t-\\t+\\t1358\\t1658\" \"NS500150:527:HHGYNBGXF:2:11108:25178:11036\\tII\\t119\\tII\\t687251\\t-\\t+\\t1358\\t5550\" \"NS500150:527:HHGYNBGXF:1:22301:8468:1586\\tII\\t160\\tII\\t26124\\t+\\t-\\t1358\\t1510\" \"NS500150:527:HHGYNBGXF:4:23606:24037:2076\\tII\\t169\\tII\\t39052\\t+\\t+\\t1358\\t1613\"\n\n\n2.4.2.6 Importing a PairsFile\n\nThe .pairs file linked to a HiCExperiment object can itself be imported in a GInteractions object:\n\nimport(pairsFile(yeast_hic), format = 'pairs')\n## GInteractions object with 471364 interactions and 3 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | frag1 frag2 distance\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <integer>\n## [1] II 105 --- II 48548 | 1358 1681 48443\n## [2] II 113 --- II 45003 | 1358 1658 44890\n## [3] II 119 --- II 687251 | 1358 5550 687132\n## [4] II 160 --- II 26124 | 1358 1510 25964\n## [5] II 169 --- II 39052 | 1358 1613 38883\n## ... ... ... ... ... ... . ... ... 
...\n## [471360] II 808605 --- II 809683 | 6316 6320 1078\n## [471361] II 808609 --- II 809917 | 6316 6324 1308\n## [471362] II 808617 --- II 809506 | 6316 6319 889\n## [471363] II 809447 --- II 809685 | 6319 6321 238\n## [471364] II 809472 --- II 809675 | 6319 6320 203\n## -------\n## regions: 549331 ranges and 0 metadata columns\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\nNote that these GInteractions are not binned, contrary to interactions extracted from a HiCExperiment. Anchors of the interactions listed in the GInteractions imported from a disk-stored .pairs file are all of width 1."
},
{
"objectID": "data-representation.html#visual-summary-of-the-hicexperiment-data-structure",
@@ -130,21 +130,21 @@
"href": "parsing.html#subsetting-a-contact-matrix",
"title": "\n3 Manipulating Hi-C data in R\n",
"section": "\n3.1 Subsetting a contact matrix",
- "text": "3.1 Subsetting a contact matrix\nTwo entirely different approaches are possible to subset of a Hi-C contact matrix:\n\nSubsetting before importing: leveraging random access to a disk-stored contact matrix to only import interactions overlapping with a genomic locus of interest.\nSubsetting after importing: parsing the entire contact matrix in memory, and subsequently subset interactions overlapping with a genomic locus of interest.\n\n\n\n3.1.1 Subsetting before import: with focus\n\nSpecifying a focus when importing a dataset in R (i.e. \"Subset first, then parse\") is generally the recommended approach to import Hi-C data in R.\nThe focus argument can be set when importing a ContactFile in R, as follows:\n\nimport(cf, focus = \"...\")\n\nThis ensures that only the needed data is parsed in R, reducing memory load and accelerating the import. Thus, this should be the preferred way of parsing HiCExperiment data, as disk-stored contact matrices allow efficient random access to indexed data.\nfocus can be any of the following string types:\n\n# \"II\" --> import contacts over an entire chromosome\n# \"II:300001-800000\" --> import on-diagonal contacts within a chromosome\n# \"II:300001-400000|II:600001-700000\" --> import off-diagonal contacts within a chromosome\n# \"II|III\" --> import contacts between two chromosomes\n# \"II:300001-800000|V:1-500000\" --> import contacts between segments of two chromosomes\n\n\n\n\n\n\n\nMore examples for import with focus argument 👇\n\n\n\n\n\n\nSubsetting to a specific on-diagonal genomic location using standard UCSC coordinates query:\n\n\nimport(cf, focus = 'II:300001-800000', resolution = 2000)\n## `HiCExperiment` object with 301,018 contacts over 250 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II:300,001-800,000\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 17974 \n## scores(2): count balanced \n## 
topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting to a specific off-diagonal genomic location using pairs of coordinates query:\n\n\nimport(cf, focus = 'II:300001-400000|II:600001-700000', resolution = 2000)\n## `HiCExperiment` object with 402 contacts over 100 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II:300001-400000|II:600001-700000\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 357 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting interactions to retain those constrained within a single chromosome:\n\n\nimport(cf, focus = 'II', resolution = 2000)\n## `HiCExperiment` object with 471,364 contacts over 407 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 34063 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting interactions to retain those between two chromosomes:\n\n\nimport(cf, focus = 'II|III', resolution = 2000)\n## `HiCExperiment` object with 9,092 contacts over 566 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II|III\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 7438 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting interactions to retain those between parts of two chromosomes:\n\n\nimport(cf, focus = 'II:300001-800000|V:1-500000', resolution = 2000)\n## `HiCExperiment` object with 
7,147 contacts over 500 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II:300001-800000|V:1-500000\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 6523 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\n\n\n\n\n3.1.2 Subsetting after import\nIt may sometimes be desirable to import a full dataset from disk first, and only then perform in-memory subsetting of the HiCExperiment object (i.e. \"Parse first, then subset\"). This is for example necessary when the end user aims to investigate subsets of interactions across a large number of different areas of a contact matrix.\nSeveral strategies are possible to allow subsetting of imported data, either with subsetByOverlaps or [.\n\n3.1.2.1 subsetByOverlaps(<HiCExperiment>, <GRanges>)\n\nsubsetByOverlaps can take a HiCExperiment as a query and a GRanges as a query. In this case, the GRanges is used to extract a subset of a HiCExperiment constrained within a specific genomic location.\n\ntelomere <- GRanges(\"II:700001-813184\")\nsubsetByOverlaps(hic, telomere) |> interactions()\n## GInteractions object with 1540 interactions and 4 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | bin_id1\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric>\n## [1] II 700001-702000 --- II 700001-702000 | 466\n## [2] II 700001-702000 --- II 702001-704000 | 466\n## [3] II 700001-702000 --- II 704001-706000 | 466\n## [4] II 700001-702000 --- II 706001-708000 | 466\n## [5] II 700001-702000 --- II 708001-710000 | 466\n## ... ... ... ... ... ... . 
...\n## [1536] II 804001-806000 --- II 810001-812000 | 518\n## [1537] II 806001-808000 --- II 806001-808000 | 519\n## [1538] II 806001-808000 --- II 808001-810000 | 519\n## [1539] II 806001-808000 --- II 810001-812000 | 519\n## [1540] II 808001-810000 --- II 808001-810000 | 520\n## bin_id2 count balanced\n## <numeric> <numeric> <numeric>\n## [1] 466 30 0.0283618\n## [2] 467 145 0.0709380\n## [3] 468 124 0.0704979\n## [4] 469 59 0.0510221\n## [5] 470 59 0.0384004\n## ... ... ... ...\n## [1536] 521 1 NaN\n## [1537] 519 15 0.0560633\n## [1538] 520 25 NaN\n## [1539] 521 1 NaN\n## [1540] 520 10 NaN\n## -------\n## regions: 57 ranges and 4 metadata columns\n## seqinfo: 16 sequences from an unspecified genome\n\n\n\n\n\n\n\ntype argument\n\n\n\nBy default, subsetByOverlaps(hic, telomere) will only recover interactions constrained within telomere, i.e. interactions for which both ends are in telomere.\nAlternatively, type = \"any\" can be specified to get all interactions with at least one of their anchors within telomere.\n\nsubsetByOverlaps(hic, telomere, type = \"any\") |> interactions()\n## GInteractions object with 6041 interactions and 4 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | bin_id1\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric>\n## [1] II 300001-302000 --- II 702001-704000 | 266\n## [2] II 300001-302000 --- II 704001-706000 | 266\n## [3] II 300001-302000 --- II 768001-770000 | 266\n## [4] II 300001-302000 --- II 784001-786000 | 266\n## [5] II 302001-304000 --- II 740001-742000 | 267\n## ... ... ... ... ... ... . 
...\n## [6037] II 804001-806000 --- II 810001-812000 | 518\n## [6038] II 806001-808000 --- II 806001-808000 | 519\n## [6039] II 806001-808000 --- II 808001-810000 | 519\n## [6040] II 806001-808000 --- II 810001-812000 | 519\n## [6041] II 808001-810000 --- II 808001-810000 | 520\n## bin_id2 count balanced\n## <numeric> <numeric> <numeric>\n## [1] 467 1 0.000590999\n## [2] 468 1 0.000686799\n## [3] 500 1 0.000728215\n## [4] 508 1 0.000923092\n## [5] 486 1 0.000382222\n## ... ... ... ...\n## [6037] 521 1 NaN\n## [6038] 519 15 0.0560633\n## [6039] 520 25 NaN\n## [6040] 521 1 NaN\n## [6041] 520 10 NaN\n## -------\n## regions: 257 ranges and 4 metadata columns\n## seqinfo: 16 sequences from an unspecified genome\n\n\n\n\n3.1.2.2 <HiCExperiment>[\"...\"]\n\nThe square bracket operator [ allows for more advanced textual queries, similarly to focus arguments that can be used when importing contact matrices in memory.\nThis ensures that only the needed data is parsed in R, reducing memory load and accelerating the import. 
Thus, this should be the preferred way of parsing HiCExperiment data, as disk-stored contact matrices allow efficient random access to indexed data.\nThe following string types can be used to subset a HiCExperiment object with the [ notation:\n\n# \"II\" --> import contacts over an entire chromosome\n# \"II:300001-800000\" --> import on-diagonal contacts within a chromosome\n# \"II:300001-400000|II:600001-700000\" --> import off-diagonal contacts within a chromosome\n# \"II|III\" --> import contacts between two chromosomes\n# \"II:300001-800000|V:1-500000\" --> import contacts between segments of two chromosomes\n# c(\"II\", \"III\", \"IV\") --> import contacts within and between several chromosomes\n\n\n\n\n\n\n\nMore examples for subsetting with [ 👇\n\n\n\n\n\n\nSubsetting to a specific on-diagonal genomic location using standard UCSC coordinates query:\n\n\nhic[\"II:800001-813184\"]\n## `HiCExperiment` object with 1,040 contacts over 6 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II:800,001-813,184\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 19 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting to a specific off-diagonal genomic location using pairs of coordinates query:\n\n\nhic[\"II:300001-320000|II:800001-813184\"]\n## `HiCExperiment` object with 3 contacts over 6 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II:300001-320000|II:800001-813184\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 3 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting interactions to retain those constrained within a single 
chromosome:\n\n\nhic[\"II\"]\n## `HiCExperiment` object with 306,212 contacts over 257 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 18513 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting interactions to retain those between two chromosomes:\n\n\nhic[\"II|IV\"]\n## `HiCExperiment` object with 0 contacts over 0 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II:1-813184|IV:1-1531933\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 0 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting interactions to retain those between segments of two chromosomes:\n\n\nhic[\"II:300001-320000|IV:1-100000\"]\n## `HiCExperiment` object with 0 contacts over 0 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II:300001-320000|IV:1-100000\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 0 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting interactions to retain those constrained within several chromosomes:\n\n\nhic[c('II', 'III', 'IV')]\n## `HiCExperiment` object with 306,212 contacts over 257 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II, III, IV\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 18513 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) 
loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\n\n\n\n\n\n\n\n\nNote\n\n\n\n\nThis last example (subsetting for a vector of several chromosomes) is the only scenario for which [-based in-memory subsetting of pre-imported data is the only way to go, as such subsetting is not possible with focus from disk-stored data.\nAll the other [ subsetting scenarii illustrated above can be achieved more efficiently using the focus argument when importing data into a HiCExperiment object.\nHowever, keep in mind that subsetting preserves extra data, e.g. added scores, topologicalFeatures, metadata or pairsFile, whereas this information is lost using focus with import.\n\n\n\n\n3.1.3 Zooming on a HiCExperiment\n\n“Zooming” refers to dynamically changing the resolution of a HiCExperiment. By zooming a HiCExperiment, one can refine or coarsen the contact matrix. This operation takes aContactFile and focus from an existing HiCExperiment input and re-generates a new HiCExperiment with updated resolution, interactions and scores. 
Note that zoom will preserve existing metadata, topologicalFeatures and pairsFile information.\n\nhic\n## `HiCExperiment` object with 306,212 contacts over 257 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II:300,001-813,184\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 18513 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\nzoom(hic, 4000)\n## `HiCExperiment` object with 306,212 contacts over 129 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II:300,001-813,184\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 4000 \n## interactions: 6800 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\nzoom(hic, 1000)\n## `HiCExperiment` object with 306,212 contacts over 514 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II:300,001-813,184\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 1000 \n## interactions: 44363 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\n\n\n\n\n\nNote\n\n\n\nThe sum of raw counts do not change after zooming, however the number of individual interactions and regions changes.\n\nlength(hic)\n## [1] 18513\nlength(zoom(hic, 1000))\n## [1] 44363\nlength(zoom(hic, 4000))\n## [1] 6800\nsum(scores(hic, \"count\"))\n## [1] 306212\nsum(scores(zoom(hic, 1000), \"count\"))\n## [1] 306212\nsum(scores(zoom(hic, 4000), \"count\"))\n## [1] 306212\n\n\n\n\n\n\n\n\n\nImportant\n\n\n\n\n\nzoom does not change the focus! 
It only affects the resolution (and consequently, the interactions).\n\nzoom will only work for multi-resolution contact matrices, e.g. .mcool or .hic."
+ "text": "3.1 Subsetting a contact matrix\nTwo entirely different approaches are possible to subset a Hi-C contact matrix:\n\nSubsetting before importing: leveraging random access to a disk-stored contact matrix to only import interactions overlapping with a genomic locus of interest.\nSubsetting after importing: parsing the entire contact matrix in memory, and subsequently subsetting interactions overlapping with a genomic locus of interest.\n\n\n\n3.1.1 Subsetting before import: with focus\n\nSpecifying a focus when importing a dataset in R (i.e. \"Subset first, then parse\") is generally the recommended approach to import Hi-C data in R.\nThe focus argument can be set when importing a ContactFile in R, as follows:\n\nimport(cf, focus = \"...\")\n\nThis ensures that only the needed data is parsed in R, reducing memory load and accelerating the import. Thus, this should be the preferred way of parsing HiCExperiment data, as disk-stored contact matrices allow efficient random access to indexed data.\nfocus can be any of the following string types:\n\n# \"II\" --> import contacts over an entire chromosome\n# \"II:300001-800000\" --> import on-diagonal contacts within a chromosome\n# \"II:300001-400000|II:600001-700000\" --> import off-diagonal contacts within a chromosome\n# \"II|III\" --> import contacts between two chromosomes\n# \"II:300001-800000|V:1-500000\" --> import contacts between segments of two chromosomes\n\n\n\n\n\n\n\nMore examples for import with focus argument 👇\n\n\n\n\n\n\nSubsetting to a specific on-diagonal genomic location using standard UCSC coordinates query:\n\n\nimport(cf, focus = 'II:300001-800000', resolution = 2000)\n## `HiCExperiment` object with 301,018 contacts over 250 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II:300,001-800,000\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 17974 \n## scores(2): count balanced \n## 
topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting to a specific off-diagonal genomic location using pairs of coordinates query:\n\n\nimport(cf, focus = 'II:300001-400000|II:600001-700000', resolution = 2000)\n## `HiCExperiment` object with 402 contacts over 100 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II:300001-400000|II:600001-700000\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 357 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting interactions to retain those constrained within a single chromosome:\n\n\nimport(cf, focus = 'II', resolution = 2000)\n## `HiCExperiment` object with 471,364 contacts over 407 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 34063 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting interactions to retain those between two chromosomes:\n\n\nimport(cf, focus = 'II|III', resolution = 2000)\n## `HiCExperiment` object with 9,092 contacts over 566 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II|III\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 7438 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting interactions to retain those between parts of two chromosomes:\n\n\nimport(cf, focus = 'II:300001-800000|V:1-500000', resolution = 2000)\n## `HiCExperiment` object with 
7,147 contacts over 500 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II:300001-800000|V:1-500000\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 6523 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\n\n\n\n\n3.1.2 Subsetting after import\nIt may sometimes be desirable to import a full dataset from disk first, and only then perform in-memory subsetting of the HiCExperiment object (i.e. \"Parse first, then subset\"). This is for example necessary when the end user aims to investigate subsets of interactions across a large number of different areas of a contact matrix.\nSeveral strategies are possible to allow subsetting of imported data, either with subsetByOverlaps or [.\n\n3.1.2.1 subsetByOverlaps(<HiCExperiment>, <GRanges>)\n\nsubsetByOverlaps can take a HiCExperiment as a query and a GRanges as a subject. In this case, the GRanges is used to extract a subset of a HiCExperiment constrained within a specific genomic location.\n\ntelomere <- GRanges(\"II:700001-813184\")\nsubsetByOverlaps(hic, telomere) |> interactions()\n## GInteractions object with 1540 interactions and 4 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric>\n## [1] II 700001-702000 --- II 700001-702000 | 466 466 30 0.0283618\n## [2] II 700001-702000 --- II 702001-704000 | 466 467 145 0.0709380\n## [3] II 700001-702000 --- II 704001-706000 | 466 468 124 0.0704979\n## [4] II 700001-702000 --- II 706001-708000 | 466 469 59 0.0510221\n## [5] II 700001-702000 --- II 708001-710000 | 466 470 59 0.0384004\n## ... ... ... ... ... ... . ... ... ... 
...\n## [1536] II 804001-806000 --- II 810001-812000 | 518 521 1 NaN\n## [1537] II 806001-808000 --- II 806001-808000 | 519 519 15 0.0560633\n## [1538] II 806001-808000 --- II 808001-810000 | 519 520 25 NaN\n## [1539] II 806001-808000 --- II 810001-812000 | 519 521 1 NaN\n## [1540] II 808001-810000 --- II 808001-810000 | 520 520 10 NaN\n## -------\n## regions: 57 ranges and 4 metadata columns\n## seqinfo: 16 sequences from an unspecified genome\n\nBy default, subsetByOverlaps(hic, telomere) will only recover interactions constrained within telomere, i.e. interactions for which both ends are in telomere.\nAlternatively, type = \"any\" can be specified to get all interactions with at least one of their anchors within telomere.\n\nsubsetByOverlaps(hic, telomere, type = \"any\") |> interactions()\n## GInteractions object with 6041 interactions and 4 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric>\n## [1] II 300001-302000 --- II 702001-704000 | 266 467 1 0.000590999\n## [2] II 300001-302000 --- II 704001-706000 | 266 468 1 0.000686799\n## [3] II 300001-302000 --- II 768001-770000 | 266 500 1 0.000728215\n## [4] II 300001-302000 --- II 784001-786000 | 266 508 1 0.000923092\n## [5] II 302001-304000 --- II 740001-742000 | 267 486 1 0.000382222\n## ... ... ... ... ... ... . ... ... ... 
...\n## [6037] II 804001-806000 --- II 810001-812000 | 518 521 1 NaN\n## [6038] II 806001-808000 --- II 806001-808000 | 519 519 15 0.0560633\n## [6039] II 806001-808000 --- II 808001-810000 | 519 520 25 NaN\n## [6040] II 806001-808000 --- II 810001-812000 | 519 521 1 NaN\n## [6041] II 808001-810000 --- II 808001-810000 | 520 520 10 NaN\n## -------\n## regions: 257 ranges and 4 metadata columns\n## seqinfo: 16 sequences from an unspecified genome\n\n\n3.1.2.2 <HiCExperiment>[\"...\"]\n\nThe square bracket operator [ allows for more advanced textual queries, similarly to focus arguments that can be used when importing contact matrices in memory.\nThis ensures that only the needed data is parsed in R, reducing memory load and accelerating the import. Thus, this should be the preferred way of parsing HiCExperiment data, as disk-stored contact matrices allow efficient random access to indexed data.\nThe following string types can be used to subset a HiCExperiment object with the [ notation:\n\n# \"II\" --> import contacts over an entire chromosome\n# \"II:300001-800000\" --> import on-diagonal contacts within a chromosome\n# \"II:300001-400000|II:600001-700000\" --> import off-diagonal contacts within a chromosome\n# \"II|III\" --> import contacts between two chromosomes\n# \"II:300001-800000|V:1-500000\" --> import contacts between segments of two chromosomes\n# c(\"II\", \"III\", \"IV\") --> import contacts within and between several chromosomes\n\n\n\n\n\n\n\nMore examples for subsetting with [ 👇\n\n\n\n\n\n\nSubsetting to a specific on-diagonal genomic location using standard UCSC coordinates query:\n\n\nhic[\"II:800001-813184\"]\n## `HiCExperiment` object with 1,040 contacts over 6 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II:800,001-813,184\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 19 \n## scores(2): count balanced \n## topologicalFeatures: 
compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting to a specific off-diagonal genomic location using pairs of coordinates query:\n\n\nhic[\"II:300001-320000|II:800001-813184\"]\n## `HiCExperiment` object with 3 contacts over 6 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II:300001-320000|II:800001-813184\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 3 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting interactions to retain those constrained within a single chromosome:\n\n\nhic[\"II\"]\n## `HiCExperiment` object with 306,212 contacts over 257 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 18513 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting interactions to retain those between two chromosomes:\n\n\nhic[\"II|IV\"]\n## `HiCExperiment` object with 0 contacts over 0 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II:1-813184|IV:1-1531933\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 0 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting interactions to retain those between segments of two chromosomes:\n\n\nhic[\"II:300001-320000|IV:1-100000\"]\n## `HiCExperiment` object with 0 contacts over 0 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: 
\"II:300001-320000|IV:1-100000\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 0 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\nSubsetting interactions to retain those constrained within several chromosomes:\n\n\nhic[c('II', 'III', 'IV')]\n## `HiCExperiment` object with 306,212 contacts over 257 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II, III, IV\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 18513 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\nSome notes:\n\nThis last example (subsetting for a vector of several chromosomes) is the only scenario for which [-based in-memory subsetting of pre-imported data is required, as such subsetting is not possible with focus from disk-stored data.\nAll the other [ subsetting scenarios illustrated above can be achieved more efficiently using the focus argument when importing data into a HiCExperiment object.\nHowever, keep in mind that subsetting preserves extra data, e.g. added scores, topologicalFeatures, metadata or pairsFile, whereas this information is lost using focus with import.\n\n\n\n\n\n3.1.3 Zooming on a HiCExperiment\n\n“Zooming” refers to dynamically changing the resolution of a HiCExperiment. By zooming a HiCExperiment, one can refine or coarsen the contact matrix. This operation takes a ContactFile and focus from an existing HiCExperiment input and re-generates a new HiCExperiment with updated resolution, interactions and scores. 
Note that zoom will preserve existing metadata, topologicalFeatures and pairsFile information.\n\nhic\n## `HiCExperiment` object with 306,212 contacts over 257 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II:300,001-813,184\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 18513 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\nzoom(hic, 4000)\n## `HiCExperiment` object with 306,212 contacts over 129 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II:300,001-813,184\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 4000 \n## interactions: 6800 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\nzoom(hic, 1000)\n## `HiCExperiment` object with 306,212 contacts over 514 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II:300,001-813,184\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 1000 \n## interactions: 44363 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\n\n\n\n\n\n\nNote\n\n\n\nThe sum of raw counts does not change after zooming; however, the number of individual interactions and regions changes.\n\nlength(hic)\n## [1] 18513\nlength(zoom(hic, 1000))\n## [1] 44363\nlength(zoom(hic, 4000))\n## [1] 6800\nsum(scores(hic, \"count\"))\n## [1] 306212\nsum(scores(zoom(hic, 1000), \"count\"))\n## [1] 306212\nsum(scores(zoom(hic, 4000), \"count\"))\n## [1] 306212\n\n\n\n\n\n\n\n\n\nImportant\n\n\n\n\n\nzoom does not change the focus! 
It only affects the resolution (and consequently, the interactions).\n\nzoom will only work for multi-resolution contact matrices, e.g. .mcool or .hic."
},
{
"objectID": "parsing.html#updating-an-hicexperiment-object",
"href": "parsing.html#updating-an-hicexperiment-object",
"title": "\n3 Manipulating Hi-C data in R\n",
"section": "\n3.2 Updating an HiCExperiment object",
- "text": "3.2 Updating an HiCExperiment object\n\n\n\n\n\n\nTL;DR: Which HiCExperiment slots are mutable (✅) / immutable (⛔️)?\n\n\n\n\n\nfileName(hic): ⛔️ (obtained from disk-stored file)\n\nfocus(hic): 🤔 (see subsetting section)\n\nresolutions(hic): ⛔️ (obtained from disk-stored file)\n\nresolution(hic): 🤔 (see zooming section)\n\ninteractions(hic): ⛔️ (obtained from disk-stored file)\n\nscores(hic): ✅\n\ntopologicalFeatures(hic): ✅\n\npairsFile(hic): ✅\n\nmetadata(hic): ✅\n\n\n\n\n3.2.1 Immutable slots\nAn HiCExperiment object acts as an interface exposing disk-stored data. This implies that the fileName slot itself is immutable (i.e. cannot be changed). This should be obvious, as a HiCExperiment has to be associated with a disk-stored contact matrix to properly function (except in some advanced cases developed in next chapters).\nFor this reason, methods to manually modify interactions and resolutions slots are also not exposed in the HiCExperiment package.\nA corollary of this is that the associated regions and anchors of an HiCExperiment should not be modified by hand either, since they are directly linked to interactions.\n\n3.2.2 Mutable slots\nThat being said, HiCExperiment objects are flexible and can be partially modified in memory without having to change/overwrite the original, disk-stored contact matrix.\nSeveral slots can be modified in memory: slots, topologicalFeatures, pairsFile and metadata.\n\n3.2.2.1 scores\n\nWe have seen in the previous chapter that scores are stored in a list and are available using the scores function.\n\nscores(hic)\n## List of length 2\n## names(2): count balanced\n\nhead(scores(hic, \"count\"))\n## [1] 7 92 75 61 38 43\n\nhead(scores(hic, \"balanced\"))\n## [1] 0.009657438 0.076622340 0.054101992 0.042940512 0.040905212 0.029293930\n\nExtra scores can be added to this list, e.g. to describe the “expected” interaction frequency for each interaction stored in the HiCExperiment object). 
This can be achieved using the scores()<- function.\n\nscores(hic, \"random\") <- runif(length(hic))\n\nscores(hic)\n## List of length 3\n## names(3): count balanced random\n\nhead(scores(hic, \"random\"))\n## [1] 0.080750138 0.834333037 0.600760886 0.157208442 0.007399441 0.466393497\n\n\n3.2.2.2 topologicalFeatures\n\nThe end-user can create additional topologicalFeatures or modify the existing ones using the topologicalFeatures()<- function.\n\ntopologicalFeatures(hic, 'CTCF') <- GRanges(c(\n \"II:340-352\", \n \"II:3520-3532\", \n \"II:7980-7992\", \n \"II:9240-9252\" \n))\ntopologicalFeatures(hic, 'CTCF')\n## GRanges object with 4 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] II 340-352 *\n## [2] II 3520-3532 *\n## [3] II 7980-7992 *\n## [4] II 9240-9252 *\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\ntopologicalFeatures(hic, 'loops') <- GInteractions(\n topologicalFeatures(hic, 'CTCF')[rep(1:3, each = 3)],\n topologicalFeatures(hic, 'CTCF')[rep(1:3, 3)]\n)\ntopologicalFeatures(hic, 'loops')\n## GInteractions object with 9 interactions and 0 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2\n## <Rle> <IRanges> <Rle> <IRanges>\n## [1] II 340-352 --- II 340-352\n## [2] II 340-352 --- II 3520-3532\n## [3] II 340-352 --- II 7980-7992\n## [4] II 3520-3532 --- II 340-352\n## [5] II 3520-3532 --- II 3520-3532\n## [6] II 3520-3532 --- II 7980-7992\n## [7] II 7980-7992 --- II 340-352\n## [8] II 7980-7992 --- II 3520-3532\n## [9] II 7980-7992 --- II 7980-7992\n## -------\n## regions: 3 ranges and 0 metadata columns\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\nhic\n## `HiCExperiment` object with 306,212 contacts over 257 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II:300,001-813,184\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 18513 \n## scores(3): 
count balanced random \n## topologicalFeatures: compartments(0) borders(0) loops(9) viewpoints(0) CTCF(4) \n## pairsFile: N/A \n## metadata(0):\n\n\n\n\n\n\n\nNote\n\n\n\nAll these objects can be used in *Overlap methods, as they all extend the GRanges class of objects.\n\n# ---- This counts the number of times `CTCF` anchors are being used in the \n# `loops` `GInteractions` object\ncountOverlaps(\n query = topologicalFeatures(hic, 'CTCF'), \n subject = topologicalFeatures(hic, 'loops')\n)\n## [1] 5 5 5 0\n\n\n\n\n3.2.2.3 pairsFile\n\nIf pairsFile is not specified when importing the ContactFile into a HiCExperiment object, one can add it later.\n\npairsf <- HiContactsData('yeast_wt', 'pairs.gz')\n\n\npairsFile(hic) <- pairsf\nhic\n## `HiCExperiment` object with 306,212 contacts over 257 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II:300,001-813,184\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 18513 \n## scores(3): count balanced random \n## topologicalFeatures: compartments(0) borders(0) loops(9) viewpoints(0) CTCF(4) \n## pairsFile: /github/home/.cache/R/ExperimentHub/1a9a1c034d7_7753 \n## metadata(0):\n\n\n3.2.2.4 metadata\n\nMetadata associated with a HiCExperiment can be updated at any point.\n\nmetadata(hic) <- list(\n info = \"HiCExperiment created from an example .mcool file from `HiContactsData`\", \n date = date()\n)\nmetadata(hic)\n## $info\n## [1] \"HiCExperiment created from an example .mcool file from `HiContactsData`\"\n## \n## $date\n## [1] \"Thu Oct 19 10:04:41 2023\""
+ "text": "3.2 Updating an HiCExperiment object\n\n\n\n\n\n\nTL;DR: Which HiCExperiment slots are mutable (✅) / immutable (⛔️)?\n\n\n\n\n\nfileName(hic): ⛔️ (obtained from disk-stored file)\n\nfocus(hic): 🤔 (see subsetting section)\n\nresolutions(hic): ⛔️ (obtained from disk-stored file)\n\nresolution(hic): 🤔 (see zooming section)\n\ninteractions(hic): ⛔️ (obtained from disk-stored file)\n\nscores(hic): ✅\n\ntopologicalFeatures(hic): ✅\n\npairsFile(hic): ✅\n\nmetadata(hic): ✅\n\n\n\n\n3.2.1 Immutable slots\nAn HiCExperiment object acts as an interface exposing disk-stored data. This implies that the fileName slot itself is immutable (i.e. cannot be changed). This should be obvious, as a HiCExperiment has to be associated with a disk-stored contact matrix to properly function (except in some advanced cases developed in the next chapters).\nFor this reason, methods to manually modify interactions and resolutions slots are also not exposed in the HiCExperiment package.\nA corollary of this is that the associated regions and anchors of an HiCExperiment should not be modified by hand either, since they are directly linked to interactions.\n\n3.2.2 Mutable slots\nThat being said, HiCExperiment objects are flexible and can be partially modified in memory without having to change/overwrite the original, disk-stored contact matrix.\nSeveral slots can be modified in memory: scores, topologicalFeatures, pairsFile and metadata.\n\n3.2.2.1 scores\n\nWe have seen in the previous chapter that scores are stored in a list and are available using the scores function.\n\nscores(hic)\n## List of length 2\n## names(2): count balanced\n\nhead(scores(hic, \"count\"))\n## [1] 7 92 75 61 38 43\n\nhead(scores(hic, \"balanced\"))\n## [1] 0.009657438 0.076622340 0.054101992 0.042940512 0.040905212 0.029293930\n\nExtra scores can be added to this list (e.g. to describe the “expected” interaction frequency for each interaction stored in the HiCExperiment object). 
This can be achieved using the scores()<- function.\n\nscores(hic, \"random\") <- runif(length(hic))\n\nscores(hic)\n## List of length 3\n## names(3): count balanced random\n\nhead(scores(hic, \"random\"))\n## [1] 0.080750138 0.834333037 0.600760886 0.157208442 0.007399441 0.466393497\n\n\n3.2.2.2 topologicalFeatures\n\nThe end-user can create additional topologicalFeatures or modify the existing ones using the topologicalFeatures()<- function.\n\ntopologicalFeatures(hic, 'CTCF') <- GRanges(c(\n \"II:340-352\", \n \"II:3520-3532\", \n \"II:7980-7992\", \n \"II:9240-9252\" \n))\ntopologicalFeatures(hic, 'CTCF')\n## GRanges object with 4 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] II 340-352 *\n## [2] II 3520-3532 *\n## [3] II 7980-7992 *\n## [4] II 9240-9252 *\n## -------\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\ntopologicalFeatures(hic, 'loops') <- GInteractions(\n topologicalFeatures(hic, 'CTCF')[rep(1:3, each = 3)],\n topologicalFeatures(hic, 'CTCF')[rep(1:3, 3)]\n)\ntopologicalFeatures(hic, 'loops')\n## GInteractions object with 9 interactions and 0 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2\n## <Rle> <IRanges> <Rle> <IRanges>\n## [1] II 340-352 --- II 340-352\n## [2] II 340-352 --- II 3520-3532\n## [3] II 340-352 --- II 7980-7992\n## [4] II 3520-3532 --- II 340-352\n## [5] II 3520-3532 --- II 3520-3532\n## [6] II 3520-3532 --- II 7980-7992\n## [7] II 7980-7992 --- II 340-352\n## [8] II 7980-7992 --- II 3520-3532\n## [9] II 7980-7992 --- II 7980-7992\n## -------\n## regions: 3 ranges and 0 metadata columns\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\nhic\n## `HiCExperiment` object with 306,212 contacts over 257 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II:300,001-813,184\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 18513 \n## scores(3): 
count balanced random \n## topologicalFeatures: compartments(0) borders(0) loops(9) viewpoints(0) CTCF(4) \n## pairsFile: N/A \n## metadata(0):\n\nAll these objects can be used in *Overlap methods, as they all extend the GRanges class of objects.\n\n# ---- This counts the number of times `CTCF` anchors are being used in the \n# `loops` `GInteractions` object\ncountOverlaps(\n query = topologicalFeatures(hic, 'CTCF'), \n subject = topologicalFeatures(hic, 'loops')\n)\n## [1] 5 5 5 0\n\n\n3.2.2.3 pairsFile\n\nIf pairsFile is not specified when importing the ContactFile into a HiCExperiment object, one can add it later.\n\npairsf <- HiContactsData('yeast_wt', 'pairs.gz')\n\n\npairsFile(hic) <- pairsf\nhic\n## `HiCExperiment` object with 306,212 contacts over 257 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II:300,001-813,184\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 18513 \n## scores(3): count balanced random \n## topologicalFeatures: compartments(0) borders(0) loops(9) viewpoints(0) CTCF(4) \n## pairsFile: /github/home/.cache/R/ExperimentHub/1a92835ced9_7753 \n## metadata(0):\n\n\n3.2.2.4 metadata\n\nMetadata associated with a HiCExperiment can be updated at any point.\n\nmetadata(hic) <- list(\n info = \"HiCExperiment created from an example .mcool file from `HiContactsData`\", \n date = date()\n)\nmetadata(hic)\n## $info\n## [1] \"HiCExperiment created from an example .mcool file from `HiContactsData`\"\n## \n## $date\n## [1] \"Mon Oct 30 10:36:54 2023\""
},
{
"objectID": "parsing.html#coercing-hicexperiment-objects",
"href": "parsing.html#coercing-hicexperiment-objects",
"title": "\n3 Manipulating Hi-C data in R\n",
"section": "\n3.3 Coercing HiCExperiment objects",
- "text": "3.3 Coercing HiCExperiment objects\nConvenient coercing functions exist to transform data stored as a HiCExperiment into another class.\n\n\nas.matrix(): allows to coerce the HiCExperiment into a sparse or dense matrix (using the sparse logical argument, TRUE by default) and choosing specific scores of interest (using the use.scores argument, \"balanced\" by default).\n\n\n# ----- `as.matrix` coerces a `HiCExperiment` into a `sparseMatrix` by default \nas.matrix(hic) |> class()\n## [1] \"dgTMatrix\"\n## attr(,\"package\")\n## [1] \"Matrix\"\n\nas.matrix(hic) |> dim()\n## [1] 257 257\n\n# ----- One can specify which scores should be used when coercing into a matrix\nas.matrix(hic, use.scores = \"balanced\")[1:5, 1:5]\n## 5 x 5 sparse Matrix of class \"dgTMatrix\"\n## \n## [1,] 0.009657438 0.07662234 0.05410199 0.04294051 0.04090521\n## [2,] 0.076622340 0.05128277 0.09841564 0.06926737 0.05263611\n## [3,] 0.054101992 0.09841564 0.05657589 0.08723160 0.07316890\n## [4,] 0.042940512 0.06926737 0.08723160 0.03699543 0.08403496\n## [5,] 0.040905212 0.05263611 0.07316890 0.08403496 0.04787415\n\nas.matrix(hic, use.scores = \"count\")[1:5, 1:5]\n## 5 x 5 sparse Matrix of class \"dgTMatrix\"\n## \n## [1,] 7 92 75 61 38\n## [2,] 92 102 226 163 81\n## [3,] 75 226 150 237 130\n## [4,] 61 163 237 103 153\n## [5,] 38 81 130 153 57\n\n# ----- If **expressly required**, one can coerce a HiCExperiment into a dense matrix\nas.matrix(hic, use.scores = \"count\", sparse = FALSE)[1:5, 1:5]\n## [,1] [,2] [,3] [,4] [,5]\n## [1,] 7 92 75 61 38\n## [2,] 92 102 226 163 81\n## [3,] 75 226 150 237 130\n## [4,] 61 163 237 103 153\n## [5,] 38 81 130 153 57\n\n\n\nas.data.frame(): simply coercing interactions into a rectangular data frame\n\n\nas.data.frame(hic) |> head()\n## seqnames1 start1 end1 width1 strand1 bin_id1 weight1 center1\n## 1 II 300001 302000 2000 * 266 0.03714342 301000\n## 2 II 300001 302000 2000 * 266 0.03714342 301000\n## 3 II 300001 302000 2000 * 266 0.03714342 
301000\n## 4 II 300001 302000 2000 * 266 0.03714342 301000\n## 5 II 300001 302000 2000 * 266 0.03714342 301000\n## 6 II 300001 302000 2000 * 266 0.03714342 301000\n## seqnames2 start2 end2 width2 strand2 bin_id2 weight2 center2 count\n## 1 II 300001 302000 2000 * 266 0.03714342 301000 7\n## 2 II 302001 304000 2000 * 267 0.02242258 303000 92\n## 3 II 304001 306000 2000 * 268 0.01942093 305000 75\n## 4 II 306001 308000 2000 * 269 0.01895202 307000 61\n## 5 II 308001 310000 2000 * 270 0.02898098 309000 38\n## 6 II 310001 312000 2000 * 271 0.01834118 311000 43\n## balanced random\n## 1 0.009657438 0.080750138\n## 2 0.076622340 0.834333037\n## 3 0.054101992 0.600760886\n## 4 0.042940512 0.157208442\n## 5 0.040905212 0.007399441\n## 6 0.029293930 0.466393497\n\n\n\n\n\n\n\nWarning\n\n\n\nThese coercing methods only operate on interactions and scores, and discard all other information, e.g. regarding genomic regions, available resolutions, associated metadata, pairsFile or topologicalFeatures."
+ "text": "3.3 Coercing HiCExperiment objects\nConvenient coercing functions exist to transform data stored as a HiCExperiment into another class.\n\n\nas.matrix(): allows to coerce the HiCExperiment into a sparse or dense matrix (using the sparse logical argument, TRUE by default) and choosing specific scores of interest (using the use.scores argument, \"balanced\" by default).\n\n\n# ----- `as.matrix` coerces a `HiCExperiment` into a `sparseMatrix` by default \nas.matrix(hic) |> class()\n## [1] \"dgTMatrix\"\n## attr(,\"package\")\n## [1] \"Matrix\"\n\nas.matrix(hic) |> dim()\n## [1] 257 257\n\n# ----- One can specify which scores should be used when coercing into a matrix\nas.matrix(hic, use.scores = \"balanced\")[1:5, 1:5]\n## 5 x 5 sparse Matrix of class \"dgTMatrix\"\n## \n## [1,] 0.009657438 0.07662234 0.05410199 0.04294051 0.04090521\n## [2,] 0.076622340 0.05128277 0.09841564 0.06926737 0.05263611\n## [3,] 0.054101992 0.09841564 0.05657589 0.08723160 0.07316890\n## [4,] 0.042940512 0.06926737 0.08723160 0.03699543 0.08403496\n## [5,] 0.040905212 0.05263611 0.07316890 0.08403496 0.04787415\n\nas.matrix(hic, use.scores = \"count\")[1:5, 1:5]\n## 5 x 5 sparse Matrix of class \"dgTMatrix\"\n## \n## [1,] 7 92 75 61 38\n## [2,] 92 102 226 163 81\n## [3,] 75 226 150 237 130\n## [4,] 61 163 237 103 153\n## [5,] 38 81 130 153 57\n\n# ----- If **expressly required**, one can coerce a HiCExperiment into a dense matrix\nas.matrix(hic, use.scores = \"count\", sparse = FALSE)[1:5, 1:5]\n## [,1] [,2] [,3] [,4] [,5]\n## [1,] 7 92 75 61 38\n## [2,] 92 102 226 163 81\n## [3,] 75 226 150 237 130\n## [4,] 61 163 237 103 153\n## [5,] 38 81 130 153 57\n\n\n\nas.data.frame(): simply coercing interactions into a rectangular data frame\n\n\nas.data.frame(hic) |> head()\n## seqnames1 start1 end1 width1 strand1 bin_id1 weight1 center1 seqnames2 start2 end2 width2 strand2 bin_id2 weight2 center2 count balanced random\n## 1 II 300001 302000 2000 * 266 0.03714342 301000 II 300001 302000 
2000 * 266 0.03714342 301000 7 0.009657438 0.080750138\n## 2 II 300001 302000 2000 * 266 0.03714342 301000 II 302001 304000 2000 * 267 0.02242258 303000 92 0.076622340 0.834333037\n## 3 II 300001 302000 2000 * 266 0.03714342 301000 II 304001 306000 2000 * 268 0.01942093 305000 75 0.054101992 0.600760886\n## 4 II 300001 302000 2000 * 266 0.03714342 301000 II 306001 308000 2000 * 269 0.01895202 307000 61 0.042940512 0.157208442\n## 5 II 300001 302000 2000 * 266 0.03714342 301000 II 308001 310000 2000 * 270 0.02898098 309000 38 0.040905212 0.007399441\n## 6 II 300001 302000 2000 * 266 0.03714342 301000 II 310001 312000 2000 * 271 0.01834118 311000 43 0.029293930 0.466393497\n\n\n\n\n\n\n\nWarning\n\n\n\nThese coercing methods only operate on interactions and scores, and discard all other information, e.g. regarding genomic regions, available resolutions, associated metadata, pairsFile or topologicalFeatures."
},
{
"objectID": "visualization.html",
@@ -165,14 +165,14 @@
"href": "visualization.html#hi-c-maps-customization-options",
"title": "\n4 Hi-C data visualization\n",
"section": "\n4.2 Hi-C maps customization options",
- "text": "4.2 Hi-C maps customization options\nA number of customization options are available for the plotMatrix function. The next subsections focus on how to:\n\nPick the scores of interest to represent in a Hi-C heatmap;\nChange the numeric scale and boundaries;\nChange the color map;\nExtra customization options\n\n\n4.2.1 Choosing scores\nBy default, plotMatrix will attempt to plot balanced (coverage normalized) Hi-C matrices. However, extra scores may be associated with interactions in a HiCExperiment object (more on this in the next chapter)\nFor instance, we can plot the count scores, which are un-normalized raw contact counts directly obtained when binning a .pairs file:\n\nplotMatrix(hic, use.scores = 'count')\n\n\n\n\n\n\n\n\n4.2.2 Choosing scale\nThe color scale is automatically adjusted to range from the minimum to the maximum scores of the HiCExperiment being plotted. This can be adjusted using the limits argument.\n\nplotMatrix(hic, limits = c(-3.5, -1))\n\n\n\n\n\n\n\n\n4.2.3 Choosing color map\n?HiContacts::palettes returns a list of available color maps to use with plotMatrix. Any custom color map can also be used by manually specifying a vector of colors.\n\n# ----- `afmhotr` color map is shipped in the `HiContacts` package\nafmhotrColors() \n## [1] \"#ffffff\" \"#f8f5c3\" \"#f4ee8d\" \"#f6be35\" \"#ee7d32\" \"#c44228\" \"#821d19\"\n## [8] \"#381211\" \"#050606\"\nplotMatrix(\n hic, \n use.scores = 'balanced',\n limits = c(-4, -1),\n cmap = afmhotrColors()\n)"
+ "text": "4.2 Hi-C maps customization options\nA number of customization options are available for the plotMatrix function. The next subsections focus on how to:\n\nPick the scores of interest to represent in a Hi-C heatmap;\nChange the numeric scale and boundaries;\nChange the color map;\nExtra customization options\n\n\n4.2.1 Choosing scores\nBy default, plotMatrix will attempt to plot balanced (coverage normalized) Hi-C matrices. However, extra scores may be associated with interactions in a HiCExperiment object (more on this in the next chapter)\nFor instance, we can plot the count scores, which are un-normalized raw contact counts directly obtained when binning a .pairs file:\n\nplotMatrix(hic, use.scores = 'count')\n\n\n\n\n\n\n\n\n4.2.2 Choosing scale\nThe color scale is automatically adjusted to range from the minimum to the maximum scores of the HiCExperiment being plotted. This can be adjusted using the limits argument.\n\nplotMatrix(hic, limits = c(-3.5, -1))\n\n\n\n\n\n\n\n\n4.2.3 Choosing color map\n?HiContacts::palettes returns a list of available color maps to use with plotMatrix. Any custom color map can also be used by manually specifying a vector of colors.\n\n# ----- `afmhotr` color map is shipped in the `HiContacts` package\nafmhotrColors() \n## [1] \"#ffffff\" \"#f8f5c3\" \"#f4ee8d\" \"#f6be35\" \"#ee7d32\" \"#c44228\" \"#821d19\" \"#381211\" \"#050606\"\nplotMatrix(\n hic, \n use.scores = 'balanced',\n limits = c(-4, -1),\n cmap = afmhotrColors()\n)"
},
{
"objectID": "visualization.html#advanced-visualization",
"href": "visualization.html#advanced-visualization",
"title": "\n4 Hi-C data visualization\n",
"section": "\n4.3 Advanced visualization",
- "text": "4.3 Advanced visualization\n\n4.3.1 Overlaying topological features\nTopological features (e.g. chromatin loops, domain borders, A/B compartments, e.g. …) are often displayed over a Hi-C heatmap.\nTo illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix which we imported interactions from.\n\nlibrary(rtracklayer)\nlibrary(InteractionSet)\nloops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> \n import() |> \n makeGInteractionsFromGRangesPairs()\nloops\n## GInteractions object with 162 interactions and 0 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2\n## <Rle> <IRanges> <Rle> <IRanges>\n## [1] I 3001-4000 --- I 29001-30000\n## [2] I 29001-30000 --- I 50001-51000\n## [3] I 95001-96000 --- I 128001-129000\n## [4] I 133001-134000 --- I 157001-158000\n## [5] II 8001-9000 --- II 46001-47000\n## ... ... ... ... ... ...\n## [158] XVI 773001-774000 --- XVI 803001-804000\n## [159] XVI 834001-835000 --- XVI 859001-860000\n## [160] XVI 860001-861000 --- XVI 884001-885000\n## [161] XVI 901001-902000 --- XVI 940001-941000\n## [162] XVI 917001-918000 --- XVI 939001-940000\n## -------\n## regions: 316 ranges and 0 metadata columns\n## seqinfo: 16 sequences from an unspecified genome; no seqlengths\n\nSimilarly, borders have also been mapped with chromosight. We can also import them in R.\n\nborders <- system.file('extdata', 'S288C-borders.bed', package = 'HiCExperiment') |> \n import()\nborders\n## GRanges object with 814 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] I 73001-74000 *\n## [2] I 108001-109000 *\n## [3] I 181001-182000 *\n## [4] II 90001-91000 *\n## [5] II 119001-120000 *\n## ... ... ... 
...\n## [810] XVI 777001-778000 *\n## [811] XVI 796001-797000 *\n## [812] XVI 811001-812000 *\n## [813] XVI 890001-891000 *\n## [814] XVI 933001-934000 *\n## -------\n## seqinfo: 16 sequences from an unspecified genome; no seqlengths\n\nChromatin loops are stored in GInteractions while borders are GRanges. The former will be displayed as off-diagonal circles and the later as on-diagonal diamonds on the Hi-C heatmap.\n\nplotMatrix(hic, loops = loops, borders = borders)\n\n\n\n\n\n\n\n\n4.3.2 Aggregated Hi-C maps\nFinally, Hi-C map “snippets” (i.e. extracts) are often aggregated together to show an average signal. This analysis is sometimes referred to as APA (Aggregated Plot Analysis).\nAggregated Hi-C maps can be computed over a collection of targets using the aggregate function. These targets can be GRanges (to extract on-diagonal snippets) or GInteractions (to extract off-diagonal snippets). The flankingBins specifies how many matrix bins should be extracted on each side of the targets of interest.\nHere, we compute the aggregated Hi-C snippets of ± 15kb around each chromatin loop listed in loops.\n\nhic <- zoom(hic, 1000)\naggr_loops <- aggregate(hic, targets = loops, flankingBins = 15)\n## Going through preflight checklist...\n## Parsing the entire contact matrice as a sparse matrix...\n## Modeling distance decay...\n## Filtering for contacts within provided targets...\naggr_loops\n## `AggrHiCExperiment` object over 148 targets \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: 148 targets \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 1000 \n## interactions: 961 \n## scores(4): count balanced expected detrended \n## slices(4): count balanced expected detrended \n## topologicalFeatures: targets(148) compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16) \n## pairsFile: N/A \n## metadata(0):\n\naggregate generates a AggrHiCExperiment object, a flavor of HiCExperiment class of 
objects.\n\n\nAggrHiCExperiment objects have an extra slices slot. This stores a list of arrays, one per scores. Each array is of 3 dimensions, x and y representing the heatmap axes, and z representing the index of the target.\n\nAggrHiCExperiment objects also have a mandatory topologicalFeatures element named targets, storing the genomic loci provided in aggregate.\n\n\nslices(aggr_loops)\n## List of length 4\n## names(4): count balanced expected detrended\ndim(slices(aggr_loops, 'count'))\n## [1] 31 31 148\ntopologicalFeatures(aggr_loops, 'targets')\n## Pairs object with 148 pairs and 0 metadata columns:\n## first second\n## <GRanges> <GRanges>\n## [1] I:14501-44500 I:35501-65500\n## [2] I:80501-110500 I:113501-143500\n## [3] I:118501-148500 I:142501-172500\n## [4] II:33501-63500 II:63501-93500\n## [5] II:134501-164500 II:159501-189500\n## ... ... ...\n## [144] XVI:586501-616500 XVI:606501-636500\n## [145] XVI:733501-763500 XVI:754501-784500\n## [146] XVI:758501-788500 XVI:788501-818500\n## [147] XVI:819501-849500 XVI:844501-874500\n## [148] XVI:845501-875500 XVI:869501-899500\n\nThe resulting AggrHiCExperiment can be plotted using the same plotMatrix function with the arguments described above.\n\nplotMatrix(\n aggr_loops, \n use.scores = 'detrended', \n scale = 'linear', \n limits = c(-1, 1), \n cmap = bgrColors()\n)"
+ "text": "4.3 Advanced visualization\n\n4.3.1 Overlaying topological features\nTopological features (e.g. chromatin loops, domain borders, A/B compartments, …) are often displayed over a Hi-C heatmap.\nTo illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix from which we imported interactions.\n\nlibrary(rtracklayer)\nlibrary(InteractionSet)\nloops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> \n import() |> \n makeGInteractionsFromGRangesPairs()\nloops\n## GInteractions object with 162 interactions and 0 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2\n## <Rle> <IRanges> <Rle> <IRanges>\n## [1] I 3001-4000 --- I 29001-30000\n## [2] I 29001-30000 --- I 50001-51000\n## [3] I 95001-96000 --- I 128001-129000\n## [4] I 133001-134000 --- I 157001-158000\n## [5] II 8001-9000 --- II 46001-47000\n## ... ... ... ... ... ...\n## [158] XVI 773001-774000 --- XVI 803001-804000\n## [159] XVI 834001-835000 --- XVI 859001-860000\n## [160] XVI 860001-861000 --- XVI 884001-885000\n## [161] XVI 901001-902000 --- XVI 940001-941000\n## [162] XVI 917001-918000 --- XVI 939001-940000\n## -------\n## regions: 316 ranges and 0 metadata columns\n## seqinfo: 16 sequences from an unspecified genome; no seqlengths\n\nSimilarly, borders have also been mapped with chromosight. We can also import them in R.\n\nborders <- system.file('extdata', 'S288C-borders.bed', package = 'HiCExperiment') |> \n import()\nborders\n## GRanges object with 814 ranges and 0 metadata columns:\n## seqnames ranges strand\n## <Rle> <IRanges> <Rle>\n## [1] I 73001-74000 *\n## [2] I 108001-109000 *\n## [3] I 181001-182000 *\n## [4] II 90001-91000 *\n## [5] II 119001-120000 *\n## ... ... ... 
...\n## [810] XVI 777001-778000 *\n## [811] XVI 796001-797000 *\n## [812] XVI 811001-812000 *\n## [813] XVI 890001-891000 *\n## [814] XVI 933001-934000 *\n## -------\n## seqinfo: 16 sequences from an unspecified genome; no seqlengths\n\nChromatin loops are stored in GInteractions while borders are GRanges. The former will be displayed as off-diagonal circles and the latter as on-diagonal diamonds on the Hi-C heatmap.\n\nplotMatrix(hic, loops = loops, borders = borders)\n\n\n\n\n\n\n\n\n4.3.2 Aggregated Hi-C maps\nFinally, Hi-C map “snippets” (i.e. extracts) are often aggregated together to show an average signal. This analysis is sometimes referred to as APA (Aggregate Peak Analysis).\nAggregated Hi-C maps can be computed over a collection of targets using the aggregate function. These targets can be GRanges (to extract on-diagonal snippets) or GInteractions (to extract off-diagonal snippets). The flankingBins argument specifies how many matrix bins should be extracted on each side of the targets of interest.\nHere, we compute the aggregated Hi-C snippets of ± 15kb around each chromatin loop listed in loops.\n\nhic <- zoom(hic, 1000)\naggr_loops <- aggregate(hic, targets = loops, flankingBins = 15)\n## Going through preflight checklist...\n## Parsing the entire contact matrice as a sparse matrix...\n## Modeling distance decay...\n## Filtering for contacts within provided targets...\naggr_loops\n## `AggrHiCExperiment` object over 148 targets \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: 148 targets \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 1000 \n## interactions: 961 \n## scores(4): count balanced expected detrended \n## slices(4): count balanced expected detrended \n## topologicalFeatures: targets(148) compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16) \n## pairsFile: N/A \n## metadata(0):\n\naggregate generates an AggrHiCExperiment object, a flavor of the HiCExperiment class of 
objects.\n\n\nAggrHiCExperiment objects have an extra slices slot. This stores a list of arrays, one per score. Each array has 3 dimensions, x and y representing the heatmap axes, and z representing the index of the target.\n\nAggrHiCExperiment objects also have a mandatory topologicalFeatures element named targets, storing the genomic loci provided in aggregate.\n\n\nslices(aggr_loops)\n## List of length 4\n## names(4): count balanced expected detrended\ndim(slices(aggr_loops, 'count'))\n## [1] 31 31 148\ntopologicalFeatures(aggr_loops, 'targets')\n## Pairs object with 148 pairs and 0 metadata columns:\n## first second\n## <GRanges> <GRanges>\n## [1] I:14501-44500 I:35501-65500\n## [2] I:80501-110500 I:113501-143500\n## [3] I:118501-148500 I:142501-172500\n## [4] II:33501-63500 II:63501-93500\n## [5] II:134501-164500 II:159501-189500\n## ... ... ...\n## [144] XVI:586501-616500 XVI:606501-636500\n## [145] XVI:733501-763500 XVI:754501-784500\n## [146] XVI:758501-788500 XVI:788501-818500\n## [147] XVI:819501-849500 XVI:844501-874500\n## [148] XVI:845501-875500 XVI:869501-899500\n\nThe resulting AggrHiCExperiment can be plotted using the same plotMatrix function with the arguments described above.\n\nplotMatrix(\n aggr_loops, \n use.scores = 'detrended', \n scale = 'linear', \n limits = c(-1, 1), \n cmap = bgrColors()\n)"
},
{
"objectID": "matrix-centric.html",
@@ -186,7 +186,7 @@
"href": "matrix-centric.html#operations-in-an-individual-matrix",
"title": "\n5 Matrix-centric analysis\n",
"section": "\n5.1 Operations in an individual matrix",
- "text": "5.1 Operations in an individual matrix\n\n5.1.1 Balancing a raw interaction count map\nHi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc.. Overall, it generally ends up not homogenous throughout the entire genome and this leads to artifacts in un-normalized count matrices.\nTo correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function.\n\nnormalized_hic <- normalize(hic)\nnormalized_hic\n## `HiCExperiment` object with 471,364 contacts over 407 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 34063 \n## scores(3): count balanced ICE \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16) \n## pairsFile: N/A \n## metadata(0):\n\n\n\n\n\n\n\nNote\n\n\n\nThe only change done to the HiCExperiment object by the normalize function is the addition of a single extra ICE in scores list. The interactions themselves are unmodified.\n\n\nIt is possible to plot the different scores of the resulting object to visualize the newly computed scores. 
In this example, ICE scores should be nearly identical to balanced scores, which were originally imported from the disk-stored contact matrix.\n\n\npatchwork::wrap_plots(\n plotMatrix(normalized_hic, use.scores = 'count', caption = FALSE),\n plotMatrix(normalized_hic, use.scores = 'balanced', caption = FALSE),\n plotMatrix(normalized_hic, use.scores = 'ICE', caption = FALSE), \n nrow = 1\n)\n\n\n\n\n\n\n\n\n\n5.1.2 Computing observed/expected (O/E) map\nThe most prominent feature of a balanced Hi-C matrix is the strong main diagonal. This main diagonal is observed because interactions between immediate adjacent genomic loci are more prone to happen than interactions spanning longer genomic distances. This “expected” behavior is due to the polymer nature of the chromosomes being studied, and can be locally estimated using the distance-dependent interaction frequency (a.k.a. the “distance law”, or P(s)). It can be used to compute an expected matrix on interactions.\nWhen it is desirable to “mask” this polymer behavior to emphasize topological structures formed by chromosomes, one can divide a given balanced matrix by its expected matrix, i.e. calculate the observed/expected (O/E) map. 
This is sometimes called “detrending”, as it effectively removes the average polymer behavior from the balanced matrix.\nThe detrend function performs this operation on a given HiCExperiment object.\n\ndetrended_hic <- detrend(hic)\ndetrended_hic\n## `HiCExperiment` object with 471,364 contacts over 407 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 34063 \n## scores(4): count balanced expected detrended \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16) \n## pairsFile: N/A \n## metadata(0):\n\n\n\n\n\n\n\nNote\n\n\n\nThe only change done to the HiCExperiment object by the detrend function is the addition of two extra scores:\n\nexpected\ndetrended\n\nThe interactions themselves are unmodified.\n\n\nTopological features will be visually more prominent in the O/E detrended Hi-C map.\n\n\npatchwork::wrap_plots(\n plotMatrix(detrended_hic, use.scores = 'balanced', scale = 'log10', limits = c(-3.5, -1.2), caption = FALSE),\n plotMatrix(detrended_hic, use.scores = 'expected', scale = 'log10', limits = c(-3.5, -1.2), caption = FALSE),\n plotMatrix(detrended_hic, use.scores = 'detrended', scale = 'linear', limits = c(-1, 1), cmap = bwrColors(), caption = FALSE), \n nrow = 1\n)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nScale for detrended scores\n\n\n\n\n\nexpected scores are in linear scale and ± in the same amplitude than balanced scores;\n\ndetrended scores are in log2 scale, in general approximately centered around 0. When plotting detrended scores, scale = linear should be set to prevent the default log10 scaling.\n\n\n\n\n5.1.3 Computing autocorrelated map\nCorrelation matrices are often calculated from balanced Hi-C matrices. 
For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)).\nThe autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome.\n\nautocorr_hic <- autocorrelate(hic)\n## \nautocorr_hic\n## `HiCExperiment` object with 471,364 contacts over 407 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 34063 \n## scores(5): count balanced expected detrended autocorrelated \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16) \n## pairsFile: N/A \n## metadata(0):\n\nSince these metrics represent correlation scores, they range between -1 and 1. Two loci with an autocorrelated score close to -1 have anti-correlated interaction profiles, while two loci with a autocorrelated score close to 1 are likely to interact with shared targets.\n\nsummary(scores(autocorr_hic, 'autocorrelated'))\n## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's \n## -0.4156 0.0025 0.0504 0.0645 0.1036 1.0000 564\n\nCorrelated and anti-correlated loci will be visually represented in the autocorrelated Hi-C map in red and blue pixels, respectively.\n\n\n\n\n\n\nNote\n\n\n\nHere we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated in two compartments but rather follows a Rabl conformation (Duan et al. (2010)). 
An example of autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated in A/B compartments) is shown in Chapter 10.\n\n\n\nplotMatrix(\n autocorr_hic, \n use.scores = 'autocorrelated', \n scale = 'linear', \n limits = c(-0.4, 0.4), \n cmap = bgrColors()\n)\n\n\n\n\n\n\n\n\n\n\n\n\n\nScale for autocorrelated scores\n\n\n\n\n\nautocorrelated scores are in linear scale, in general approximately centered around 0. When plotting autocorrelated scores, scale = linear should be set to prevent the default log10 scaling.\n\nlimits should be manually set to c(-x, x) (0 < x <= 1) to ensure that the color range is effectively centered on 0.\n\n\n\n\n5.1.4 Despeckling (smoothing out) a contact map\nShallow-sequenced Hi-C libraries or matrices binned with an overly small bin size sometimes produce “grainy” Hi-C maps with noisy backgrounds. A grainy map may also be obtained when dividing two matrices, e.g. when computing the O/E ratio with detrend. This is particularly true for sparser long-range interactions. To overcome such limitations, HiCExperiment objects can be “despeckled” to smooth out focal speckles.\n\nhic2 <- detrend(hic['II:400000-700000'])\nhic2 <- despeckle(hic2, use.scores = 'detrended', focal.size = 2)\nhic2\n## `HiCExperiment` object with 168,785 contacts over 150 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II:400,000-700,000\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 11325 \n## scores(5): count balanced expected detrended detrended.despeckled \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16) \n## pairsFile: N/A \n## metadata(0):\n\nThe added <use.scores>.despeckled scores correspond to scores averaged using a window, whose width is provided with the focal.size argument. 
This results in a smoother Hi-C heatmap, effectively removing the “speckles” observed at longer range.\n\n\nlibrary(InteractionSet)\nloops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> \n import() |> \n makeGInteractionsFromGRangesPairs()\nborders <- system.file('extdata', 'S288C-borders.bed', package = 'HiCExperiment') |> \n import()\npatchwork::wrap_plots(\n plotMatrix(hic2, caption = FALSE),\n plotMatrix(hic2, use.scores = 'detrended', scale = 'linear', limits = c(-1, 1), caption = FALSE),\n plotMatrix(\n hic2, \n use.scores = 'detrended.despeckled', \n scale = 'linear', \n limits = c(-1, 1), \n caption = FALSE, \n loops = loops, \n borders = borders\n ),\n nrow = 1\n)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nScale for despeckled scores\n\n\n\ndespeckled scores are in the same scale than the scores they were computed from."
+ "text": "5.1 Operations in an individual matrix\n\n5.1.1 Balancing a raw interaction count map\nHi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc. As a result, coverage is generally not homogeneous across the genome, which leads to artifacts in un-normalized count matrices.\nTo correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element to the scores list (while the interactions themselves are unmodified).\n\nnormalized_hic <- normalize(hic)\nnormalized_hic\n## `HiCExperiment` object with 471,364 contacts over 407 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 34063 \n## scores(3): count balanced ICE \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16) \n## pairsFile: N/A \n## metadata(0):\n\nIt is possible to plot the different scores of the resulting object to visualize the newly computed scores. 
In this example, ICE scores should be nearly identical to balanced scores, which were originally imported from the disk-stored contact matrix.\n\n\npatchwork::wrap_plots(\n plotMatrix(normalized_hic, use.scores = 'count', caption = FALSE),\n plotMatrix(normalized_hic, use.scores = 'balanced', caption = FALSE),\n plotMatrix(normalized_hic, use.scores = 'ICE', caption = FALSE), \n nrow = 1\n)\n\n\n\n\n\n\n\n\n\n5.1.2 Computing observed/expected (O/E) map\nThe most prominent feature of a balanced Hi-C matrix is the strong main diagonal. This main diagonal is observed because interactions between immediately adjacent genomic loci are more likely to occur than interactions spanning longer genomic distances. This “expected” behavior is due to the polymer nature of the chromosomes being studied, and can be locally estimated using the distance-dependent interaction frequency (a.k.a. the “distance law”, or P(s)). It can be used to compute an expected matrix of interactions.\nWhen it is desirable to “mask” this polymer behavior to emphasize topological structures formed by chromosomes, one can divide a given balanced matrix by its expected matrix, i.e. calculate the observed/expected (O/E) map. This is sometimes called “detrending”, as it effectively removes the average polymer behavior from the balanced matrix.\nThe detrend function performs this operation on a given HiCExperiment object. 
It adds two extra elements to the scores list, expected and detrended (while the interactions themselves are unmodified).\n\ndetrended_hic <- detrend(hic)\ndetrended_hic\n## `HiCExperiment` object with 471,364 contacts over 407 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 34063 \n## scores(4): count balanced expected detrended \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16) \n## pairsFile: N/A \n## metadata(0):\n\nTopological features will be visually more prominent in the O/E detrended Hi-C map.\n\n\npatchwork::wrap_plots(\n plotMatrix(detrended_hic, use.scores = 'balanced', scale = 'log10', limits = c(-3.5, -1.2), caption = FALSE),\n plotMatrix(detrended_hic, use.scores = 'expected', scale = 'log10', limits = c(-3.5, -1.2), caption = FALSE),\n plotMatrix(detrended_hic, use.scores = 'detrended', scale = 'linear', limits = c(-1, 1), cmap = bwrColors(), caption = FALSE), \n nrow = 1\n)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nScale for detrended scores\n\n\n\n\n\nexpected scores are in linear scale, with roughly the same amplitude as balanced scores;\n\ndetrended scores are in log2 scale, in general approximately centered around 0. When plotting detrended scores, scale = 'linear' should be set to prevent the default log10 scaling.\n\n\n\n\n5.1.3 Computing autocorrelated map\nCorrelation matrices are often calculated from balanced Hi-C matrices. For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)).\nThe autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. 
For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome.\n\nautocorr_hic <- autocorrelate(hic)\n## \nautocorr_hic\n## `HiCExperiment` object with 471,364 contacts over 407 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 34063 \n## scores(5): count balanced expected detrended autocorrelated \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16) \n## pairsFile: N/A \n## metadata(0):\n\nSince these metrics represent correlation scores, they range between -1 and 1. Two loci with an autocorrelated score close to -1 have anti-correlated interaction profiles, while two loci with an autocorrelated score close to 1 are likely to interact with shared targets.\n\nsummary(scores(autocorr_hic, 'autocorrelated'))\n## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's \n## -0.4156 0.0025 0.0504 0.0645 0.1036 1.0000 564\n\nCorrelated and anti-correlated loci will be visually represented in the autocorrelated Hi-C map in red and blue pixels, respectively.\n\n\n\n\n\n\nNote\n\n\n\nHere we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated into two compartments but rather follows a Rabl conformation (Duan et al. (2010)). 
An example of an autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated into A/B compartments) is shown in Chapter 10.\n\n\n\nplotMatrix(\n autocorr_hic, \n use.scores = 'autocorrelated', \n scale = 'linear', \n limits = c(-0.4, 0.4), \n cmap = bgrColors()\n)\n\n\n\n\n\n\n\n\n\n\n\n\n\nScale for autocorrelated scores\n\n\n\n\n\nautocorrelated scores are in linear scale, in general approximately centered around 0. When plotting autocorrelated scores, scale = 'linear' should be set to prevent the default log10 scaling.\n\nlimits should be manually set to c(-x, x) (0 < x <= 1) to ensure that the color range is effectively centered on 0.\n\n\n\n\n5.1.4 Despeckling (smoothing out) a contact map\nShallow-sequenced Hi-C libraries or matrices binned with an overly small bin size sometimes produce “grainy” Hi-C maps with noisy backgrounds. A grainy map may also be obtained when dividing two matrices, e.g. when computing the O/E ratio with detrend. This is particularly true for sparser long-range interactions. To overcome such limitations, HiCExperiment objects can be “despeckled” to smooth out focal speckles.\n\nhic2 <- detrend(hic['II:400000-700000'])\nhic2 <- despeckle(hic2, use.scores = 'detrended', focal.size = 2)\nhic2\n## `HiCExperiment` object with 168,785 contacts over 150 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II:400,000-700,000\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 2000 \n## interactions: 11325 \n## scores(5): count balanced expected detrended detrended.despeckled \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16) \n## pairsFile: N/A \n## metadata(0):\n\nThe added <use.scores>.despeckled scores correspond to scores averaged using a window, whose width is provided with the focal.size argument. 
This results in a smoother Hi-C heatmap, effectively removing the “speckles” observed at longer range.\n\n\nlibrary(InteractionSet)\nloops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> \n import() |> \n makeGInteractionsFromGRangesPairs()\nborders <- system.file('extdata', 'S288C-borders.bed', package = 'HiCExperiment') |> \n import()\npatchwork::wrap_plots(\n plotMatrix(hic2, caption = FALSE),\n plotMatrix(hic2, use.scores = 'detrended', scale = 'linear', limits = c(-1, 1), caption = FALSE),\n plotMatrix(\n hic2, \n use.scores = 'detrended.despeckled', \n scale = 'linear', \n limits = c(-1, 1), \n caption = FALSE, \n loops = loops, \n borders = borders\n ),\n nrow = 1\n)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nScale for despeckled scores\n\n\n\ndespeckled scores are on the same scale as the scores they were computed from."
},
{
"objectID": "matrix-centric.html#operations-between-multiple-matrices",
@@ -207,28 +207,28 @@
"href": "interactions-centric.html#distance-laws",
"title": "\n6 Interactions-centric analysis\n",
"section": "\n6.1 Distance law(s)",
- "text": "6.1 Distance law(s)\n\n6.1.1 P(s) from a single .pairs file\nDistance laws are generally computed directly from .pairs files. This is because the .pairs files are at 1-bp resolution whereas the contact matrices (for example from .cool files) are binned at a minimum resolution.\nAn example .pairs file can be fetched from the ExperimentHub database using the HiContactsData package.\n\nlibrary(HiCExperiment)\nlibrary(HiContactsData)\npairsf <- HiContactsData('yeast_wt', 'pairs.gz')\npf <- PairsFile(pairsf)\n\n\npf\n## PairsFile object\n## resource: /github/home/.cache/R/ExperimentHub/1a9a1c034d7_7753\n\n\n\n\n\n\n\nReminder!\n\n\n\nPairsFile connections can be imported directly into a GInteractions object with import():\n\nimport(pf)\n## GInteractions object with 471364 interactions and 3 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | frag1 frag2\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric>\n## [1] II 105 --- II 48548 | 1358 1681\n## [2] II 113 --- II 45003 | 1358 1658\n## [3] II 119 --- II 687251 | 1358 5550\n## [4] II 160 --- II 26124 | 1358 1510\n## [5] II 169 --- II 39052 | 1358 1613\n## ... ... ... ... ... ... . ... ...\n## [471360] II 808605 --- II 809683 | 6316 6320\n## [471361] II 808609 --- II 809917 | 6316 6324\n## [471362] II 808617 --- II 809506 | 6316 6319\n## [471363] II 809447 --- II 809685 | 6319 6321\n## [471364] II 809472 --- II 809675 | 6319 6320\n## distance\n## <integer>\n## [1] 48443\n## [2] 44890\n## [3] 687132\n## [4] 25964\n## [5] 38883\n## ... ...\n## [471360] 1078\n## [471361] 1308\n## [471362] 889\n## [471363] 238\n## [471364] 203\n## -------\n## regions: 549331 ranges and 0 metadata columns\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\n\n\nWe can compute a P(s) per chromosome from this .pairs file using the distanceLaw function.\n\nlibrary(HiContacts)\nps <- distanceLaw(pf, by_chr = TRUE) \n## Importing pairs file /github/home/.cache/R/ExperimentHub/1a9a1c034d7_7753 in memory. 
This may take a while...\nps\n## # A tibble: 115 × 6\n## chr binned_distance p norm_p norm_p_unity slope\n## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n## 1 II 14 0.00000212 0.00000106 2.27 0 \n## 2 II 16 0.0000170 0.0000170 36.4 1.56\n## 3 II 17 0.0000361 0.0000180 38.6 1.55\n## 4 II 19 0.0000424 0.0000212 45.5 1.55\n## 5 II 21 0.0000467 0.0000233 50.0 1.54\n## 6 II 23 0.0000870 0.0000290 62.1 1.53\n## # ℹ 109 more rows\n\n\n\n\n\n\n\nNote\n\n\n\nBecause this is a toy dataset, contacts are only provided for the chromosome II.\n\ntable(ps$chr)\n## \n## II \n## 115\n\n\n\nThe plotPs() and plotPsSlope() functions are convenient ggplot2-based functions with pre-configured settings optimized for P(s) visualization.\n\nlibrary(ggplot2)\nplotPs(ps, aes(x = binned_distance, y = norm_p, color = chr))\n## Warning: Removed 67 rows containing missing values (`geom_line()`).\n\n\n\n\n\n\nplotPsSlope(ps, aes(x = binned_distance, y = slope, color = chr))\n## Warning: Removed 67 rows containing missing values (`geom_line()`).\n\n\n\n\n\n\n\n\n6.1.2 P(s) for multiple .pairs files\nLet’s first import a second example dataset. We’ll import pairs identified in a eco1 yeast mutant.\n\neco1_pairsf <- HiContactsData('yeast_eco1', 'pairs.gz')\neco1_pf <- PairsFile(eco1_pairsf)\n\n\neco1_ps <- distanceLaw(eco1_pf, by_chr = TRUE) \n## Importing pairs file /github/home/.cache/R/ExperimentHub/21fb251da216_7755 in memory. 
This may take a while...\neco1_ps\n## # A tibble: 115 × 6\n## chr binned_distance p norm_p norm_p_unity slope\n## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n## 1 II 14 0.00000201 0.00000100 0.660 0 \n## 2 II 16 0.0000221 0.0000221 14.5 1.46\n## 3 II 17 0.0000492 0.0000246 16.2 1.46\n## 4 II 19 0.0000412 0.0000206 13.5 1.45\n## 5 II 21 0.0000653 0.0000326 21.5 1.45\n## 6 II 23 0.0000803 0.0000268 17.6 1.44\n## # ℹ 109 more rows\n\nA little data wrangling can help plotting the distance laws for 2 different samples in the same plot.\n\nlibrary(dplyr)\nmerged_ps <- rbind(\n ps |> mutate(sample = 'WT'), \n eco1_ps |> mutate(sample = 'eco1')\n)\nplotPs(merged_ps, aes(x = binned_distance, y = norm_p, color = sample, linetype = chr)) + \n scale_color_manual(values = c('#c6c6c6', '#ca0000'))\n## Warning: Removed 134 rows containing missing values (`geom_line()`).\n\n\n\n\n\n\nplotPsSlope(merged_ps, aes(x = binned_distance, y = slope, color = sample, linetype = chr)) + \n scale_color_manual(values = c('#c6c6c6', '#ca0000'))\n## Warning: Removed 135 rows containing missing values (`geom_line()`).\n\n\n\n\n\n\n\n\n6.1.3 P(s) from HiCExperiment objects\nAlternatively, distance laws can be computed from binned matrices directly by providing HiCExperiment objects. For deeply sequenced datasets, this can be significantly faster than when using original .pairs files, but the smoothness of the resulting curves will be greatly impacted, notably at short distances.\n\nps_from_hic <- distanceLaw(hic, by_chr = TRUE) \n## pairsFile not specified. The P(s) curve will be an approximation.\nplotPs(ps_from_hic, aes(x = binned_distance, y = norm_p))\n## Warning: Removed 9 rows containing missing values (`geom_line()`).\n\n\n\n\n\n\nplotPsSlope(ps_from_hic, aes(x = binned_distance, y = slope))\n## Warning: Removed 8 rows containing missing values (`geom_line()`)."
+ "text": "6.1 Distance law(s)\n\n6.1.1 P(s) from a single .pairs file\nDistance laws are generally computed directly from .pairs files. This is because the .pairs files are at 1-bp resolution whereas the contact matrices (for example from .cool files) are binned at a minimum resolution.\nAn example .pairs file can be fetched from the ExperimentHub database using the HiContactsData package.\n\nlibrary(HiCExperiment)\nlibrary(HiContactsData)\npairsf <- HiContactsData('yeast_wt', 'pairs.gz')\npf <- PairsFile(pairsf)\n\n\npf\n## PairsFile object\n## resource: /github/home/.cache/R/ExperimentHub/1a92835ced9_7753\n\nIf needed, PairsFile connections can be imported directly into a GInteractions object with import().\n\nimport(pf)\n## GInteractions object with 471364 interactions and 3 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | frag1 frag2 distance\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <integer>\n## [1] II 105 --- II 48548 | 1358 1681 48443\n## [2] II 113 --- II 45003 | 1358 1658 44890\n## [3] II 119 --- II 687251 | 1358 5550 687132\n## [4] II 160 --- II 26124 | 1358 1510 25964\n## [5] II 169 --- II 39052 | 1358 1613 38883\n## ... ... ... ... ... ... . ... ... ...\n## [471360] II 808605 --- II 809683 | 6316 6320 1078\n## [471361] II 808609 --- II 809917 | 6316 6324 1308\n## [471362] II 808617 --- II 809506 | 6316 6319 889\n## [471363] II 809447 --- II 809685 | 6319 6321 238\n## [471364] II 809472 --- II 809675 | 6319 6320 203\n## -------\n## regions: 549331 ranges and 0 metadata columns\n## seqinfo: 1 sequence from an unspecified genome; no seqlengths\n\nWe can compute a P(s) per chromosome from this .pairs file using the distanceLaw function.\n\nlibrary(HiContacts)\nps <- distanceLaw(pf, by_chr = TRUE) \n## Importing pairs file /github/home/.cache/R/ExperimentHub/1a92835ced9_7753 in memory. 
This may take a while...\nps\n## # A tibble: 115 × 6\n## chr binned_distance p norm_p norm_p_unity slope\n## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n## 1 II 14 0.00000212 0.00000106 2.27 0 \n## 2 II 16 0.0000170 0.0000170 36.4 1.56\n## 3 II 17 0.0000361 0.0000180 38.6 1.55\n## 4 II 19 0.0000424 0.0000212 45.5 1.55\n## 5 II 21 0.0000467 0.0000233 50.0 1.54\n## 6 II 23 0.0000870 0.0000290 62.1 1.53\n## # ℹ 109 more rows\n\nThe plotPs() and plotPsSlope() functions are convenient ggplot2-based functions with pre-configured settings optimized for P(s) visualization.\n\nlibrary(ggplot2)\nplotPs(ps, aes(x = binned_distance, y = norm_p, color = chr))\n## Warning: Removed 67 rows containing missing values (`geom_line()`).\n\n\n\n\n\n\nplotPsSlope(ps, aes(x = binned_distance, y = slope, color = chr))\n## Warning: Removed 67 rows containing missing values (`geom_line()`).\n\n\n\n\n\n\n\n\n6.1.2 P(s) for multiple .pairs files\nLet’s first import a second example dataset. We’ll import pairs identified in an eco1 yeast mutant.\n\neco1_pairsf <- HiContactsData('yeast_eco1', 'pairs.gz')\neco1_pf <- PairsFile(eco1_pairsf)\n\n\neco1_ps <- distanceLaw(eco1_pf, by_chr = TRUE) \n## Importing pairs file /github/home/.cache/R/ExperimentHub/21f275852cbd_7755 in memory. 
This may take a while...\neco1_ps\n## # A tibble: 115 × 6\n## chr binned_distance p norm_p norm_p_unity slope\n## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n## 1 II 14 0.00000201 0.00000100 0.660 0 \n## 2 II 16 0.0000221 0.0000221 14.5 1.46\n## 3 II 17 0.0000492 0.0000246 16.2 1.46\n## 4 II 19 0.0000412 0.0000206 13.5 1.45\n## 5 II 21 0.0000653 0.0000326 21.5 1.45\n## 6 II 23 0.0000803 0.0000268 17.6 1.44\n## # ℹ 109 more rows\n\nA little data wrangling makes it possible to plot the distance laws of two different samples in the same plot.\n\nlibrary(dplyr)\nmerged_ps <- rbind(\n ps |> mutate(sample = 'WT'), \n eco1_ps |> mutate(sample = 'eco1')\n)\nplotPs(merged_ps, aes(x = binned_distance, y = norm_p, color = sample, linetype = chr)) + \n scale_color_manual(values = c('#c6c6c6', '#ca0000'))\n## Warning: Removed 134 rows containing missing values (`geom_line()`).\n\n\n\n\n\n\nplotPsSlope(merged_ps, aes(x = binned_distance, y = slope, color = sample, linetype = chr)) + \n scale_color_manual(values = c('#c6c6c6', '#ca0000'))\n## Warning: Removed 135 rows containing missing values (`geom_line()`).\n\n\n\n\n\n\n\n\n6.1.3 P(s) from HiCExperiment objects\nAlternatively, distance laws can be computed from binned matrices directly by providing HiCExperiment objects. For deeply sequenced datasets, this can be significantly faster than using the original .pairs files, but the resulting curves will be noticeably less smooth, notably at short distances.\n\nps_from_hic <- distanceLaw(hic, by_chr = TRUE) \n## pairsFile not specified. The P(s) curve will be an approximation.\nplotPs(ps_from_hic, aes(x = binned_distance, y = norm_p))\n## Warning: Removed 9 rows containing missing values (`geom_line()`).\n\n\n\n\n\n\nplotPsSlope(ps_from_hic, aes(x = binned_distance, y = slope))\n## Warning: Removed 8 rows containing missing values (`geom_line()`)."
},
{
"objectID": "interactions-centric.html#cistrans-ratios",
"href": "interactions-centric.html#cistrans-ratios",
"title": "\n6 Interactions-centric analysis\n",
"section": "\n6.2 Cis/trans ratios",
- "text": "6.2 Cis/trans ratios\nThe ratio between cis interactions and trans interactions is often used to assess the overall quality of a Hi-C dataset. It can be computed per chromosome using the cisTransRatio() function.\n\n\n\n\n\n\nTip!\n\n\n\nYou will need to provide a genome-wide HiCExperiment to estimate cis/trans ratios!\n\n\n\nfull_hic <- import(cf, resolution = 2000)\nct <- cisTransRatio(full_hic) \nct\n## # A tibble: 16 × 6\n## # Groups: chr [16]\n## chr cis trans n_total cis_pct trans_pct\n## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>\n## 1 I 186326 96738 283064 0.658 0.342\n## 2 II 942728 273966 1216694 0.775 0.225\n## 3 III 303980 127087 431067 0.705 0.295\n## 4 IV 1858062 418218 2276280 0.816 0.184\n## 5 V 607090 220873 827963 0.733 0.267\n## 6 VI 280282 127771 408053 0.687 0.313\n## # ℹ 10 more rows\n\nIt can be plotted using ggplot2-based visualization functions.\n\nggplot(ct, aes(x = chr, y = cis_pct)) + \n geom_col(position = position_stack()) + \n theme_bw() + \n guides(x=guide_axis(angle = 90)) + \n scale_y_continuous(labels = scales::percent) + \n labs(x = 'Chromosomes', y = '% of cis contacts')\n\n\n\n\n\n\n\n\n\n\n\n\n\nWatch out\n\n\n\nCis/trans contact ratios will greatly vary depending on the cell cycle phase the sample is in! For instance, chromosomes during the mitosis phase of the cell cycle have very little trans contacts, due to their structural organization and individualization."
+ "text": "6.2 Cis/trans ratios\nThe ratio between cis interactions and trans interactions is often used to assess the overall quality of a Hi-C dataset. It can be computed per chromosome using the cisTransRatio() function. You will need to provide a genome-wide HiCExperiment to estimate cis/trans ratios!\n\nfull_hic <- import(cf, resolution = 2000)\nct <- cisTransRatio(full_hic) \nct\n## # A tibble: 16 × 6\n## # Groups: chr [16]\n## chr cis trans n_total cis_pct trans_pct\n## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>\n## 1 I 186326 96738 283064 0.658 0.342\n## 2 II 942728 273966 1216694 0.775 0.225\n## 3 III 303980 127087 431067 0.705 0.295\n## 4 IV 1858062 418218 2276280 0.816 0.184\n## 5 V 607090 220873 827963 0.733 0.267\n## 6 VI 280282 127771 408053 0.687 0.313\n## # ℹ 10 more rows\n\nIt can be plotted using ggplot2-based visualization functions.\n\nggplot(ct, aes(x = chr, y = cis_pct)) + \n geom_col(position = position_stack()) + \n theme_bw() + \n guides(x=guide_axis(angle = 90)) + \n scale_y_continuous(labels = scales::percent) + \n labs(x = 'Chromosomes', y = '% of cis contacts')\n\n\n\n\n\n\n\nCis/trans contact ratios vary greatly depending on the cell cycle phase the sample is in! For instance, chromosomes during the mitosis phase of the cell cycle have very few trans contacts, due to their structural organization and individualization."
},
{
"objectID": "interactions-centric.html#virtual-4c-profiles",
"href": "interactions-centric.html#virtual-4c-profiles",
"title": "\n6 Interactions-centric analysis\n",
"section": "\n6.3 Virtual 4C profiles",
- "text": "6.3 Virtual 4C profiles\nInteraction profile of a genomic locus of interest with its surrounding environment or the rest of the genome is frequently generated. In some cases, this can help in identifying and/or comparing regulatory or structural interactions.\nFor instance, we can compute the genome-wide virtual 4C profile of interactions anchored at the centromere in chromosome II (located at ~ 238kb).\n\nlibrary(GenomicRanges)\nv4C <- virtual4C(full_hic, viewpoint = GRanges(\"II:230001-240000\"))\nv4C\n## GRanges object with 6045 ranges and 4 metadata columns:\n## seqnames ranges strand | score viewpoint\n## <Rle> <IRanges> <Rle> | <numeric> <character>\n## [1] I 1-2000 * | 0.00000000 II:230001-240000\n## [2] I 2001-4000 * | 0.00000000 II:230001-240000\n## [3] I 4001-6000 * | 0.00129049 II:230001-240000\n## [4] I 6001-8000 * | 0.00000000 II:230001-240000\n## [5] I 8001-10000 * | 0.00000000 II:230001-240000\n## ... ... ... ... . ... ...\n## [6041] XVI 940001-942000 * | 0.000775721 II:230001-240000\n## [6042] XVI 942001-944000 * | 0.000000000 II:230001-240000\n## [6043] XVI 944001-946000 * | 0.000000000 II:230001-240000\n## [6044] XVI 946001-948000 * | 0.000000000 II:230001-240000\n## [6045] XVI 948001-948066 * | 0.000000000 II:230001-240000\n## center in_viewpoint\n## <numeric> <logical>\n## [1] 1000.5 FALSE\n## [2] 3000.5 FALSE\n## [3] 5000.5 FALSE\n## [4] 7000.5 FALSE\n## [5] 9000.5 FALSE\n## ... ... 
...\n## [6041] 941000 FALSE\n## [6042] 943000 FALSE\n## [6043] 945000 FALSE\n## [6044] 947000 FALSE\n## [6045] 948034 FALSE\n## -------\n## seqinfo: 16 sequences from an unspecified genome; no seqlengths\n\nggplot2 can be used to visualize the 4C-like profile over multiple chromosomes.\n\n\ndf <- as_tibble(v4C)\nggplot(df, aes(x = center, y = score)) + \n geom_area(position = \"identity\", alpha = 0.5) + \n theme_bw() + \n labs(x = \"Position\", y = \"Contacts with viewpoint\") +\n scale_x_continuous(labels = scales::unit_format(unit = \"M\", scale = 1e-06)) + \n facet_wrap(~seqnames, scales = 'free_y')\n\n\n\n\n\n\n\n\nThis clearly highlights trans interactions of the chromosome II centromere with the centromeres from other chromosomes."
+ "text": "6.3 Virtual 4C profiles\nInteraction profile of a genomic locus of interest with its surrounding environment or the rest of the genome is frequently generated. In some cases, this can help in identifying and/or comparing regulatory or structural interactions.\nFor instance, we can compute the genome-wide virtual 4C profile of interactions anchored at the centromere in chromosome II (located at ~ 238kb).\n\nlibrary(GenomicRanges)\nv4C <- virtual4C(full_hic, viewpoint = GRanges(\"II:230001-240000\"))\nv4C\n## GRanges object with 6045 ranges and 4 metadata columns:\n## seqnames ranges strand | score viewpoint center in_viewpoint\n## <Rle> <IRanges> <Rle> | <numeric> <character> <numeric> <logical>\n## [1] I 1-2000 * | 0.00000000 II:230001-240000 1000.5 FALSE\n## [2] I 2001-4000 * | 0.00000000 II:230001-240000 3000.5 FALSE\n## [3] I 4001-6000 * | 0.00129049 II:230001-240000 5000.5 FALSE\n## [4] I 6001-8000 * | 0.00000000 II:230001-240000 7000.5 FALSE\n## [5] I 8001-10000 * | 0.00000000 II:230001-240000 9000.5 FALSE\n## ... ... ... ... . ... ... ... 
...\n## [6041] XVI 940001-942000 * | 0.000775721 II:230001-240000 941000 FALSE\n## [6042] XVI 942001-944000 * | 0.000000000 II:230001-240000 943000 FALSE\n## [6043] XVI 944001-946000 * | 0.000000000 II:230001-240000 945000 FALSE\n## [6044] XVI 946001-948000 * | 0.000000000 II:230001-240000 947000 FALSE\n## [6045] XVI 948001-948066 * | 0.000000000 II:230001-240000 948034 FALSE\n## -------\n## seqinfo: 16 sequences from an unspecified genome; no seqlengths\n\nggplot2 can be used to visualize the 4C-like profile over multiple chromosomes.\n\n\ndf <- as_tibble(v4C)\nggplot(df, aes(x = center, y = score)) + \n geom_area(position = \"identity\", alpha = 0.5) + \n theme_bw() + \n labs(x = \"Position\", y = \"Contacts with viewpoint\") +\n scale_x_continuous(labels = scales::unit_format(unit = \"M\", scale = 1e-06)) + \n facet_wrap(~seqnames, scales = 'free_y')\n\n\n\n\n\n\n\n\nThis clearly highlights trans interactions of the chromosome II centromere with the centromeres from other chromosomes."
},
{
"objectID": "interactions-centric.html#scalograms",
"href": "interactions-centric.html#scalograms",
"title": "\n6 Interactions-centric analysis\n",
"section": "\n6.4 Scalograms",
- "text": "6.4 Scalograms\nScalograms were introduced in Lioy et al. (2018) to investigate distance-dependent contact frequencies for individual genomic bins along chromosomes.\nTo generate a scalogram, one needs to provide a HiCExperiment object with a valid associated pairsFile.\n\npairsFile(hic) <- pairsf\nscalo <- scalogram(hic) \n## Importing pairs file /github/home/.cache/R/ExperimentHub/1a9a1c034d7_7753 in memory. This may take a while...\nplotScalogram(scalo |> filter(chr == 'II'), ylim = c(1e3, 1e5))\n\n\n\n\n\n\n\nSeveral scalograms can be plotted together to compare distance-dependent contact frequencies along a given chromosome in different samples.\n\n\neco1_hic <- import(\n CoolFile(HiContactsData('yeast_eco1', 'mcool')), \n focus = 'II', \n resolution = 2000\n)\n## see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n## loading from cache\neco1_pairsf <- HiContactsData('yeast_eco1', 'pairs.gz')\n## see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n## loading from cache\npairsFile(eco1_hic) <- eco1_pairsf\neco1_scalo <- scalogram(eco1_hic) \n## Importing pairs file /github/home/.cache/R/ExperimentHub/21fb251da216_7755 in memory. This may take a while...\nmerged_scalo <- rbind(\n scalo |> mutate(sample = 'WT'), \n eco1_scalo |> mutate(sample = 'eco1')\n)\nplotScalogram(merged_scalo |> filter(chr == 'II'), ylim = c(1e3, 1e5)) + \n facet_grid(~sample)\n\n\n\n\n\n\n\n\nThis example points out the overall longer interactions within the long arm of the chromosome II in an eco1 mutant."
+ "text": "6.4 Scalograms\nScalograms were introduced in Lioy et al. (2018) to investigate distance-dependent contact frequencies for individual genomic bins along chromosomes.\nTo generate a scalogram, one needs to provide a HiCExperiment object with a valid associated pairsFile.\n\npairsFile(hic) <- pairsf\nscalo <- scalogram(hic) \n## Importing pairs file /github/home/.cache/R/ExperimentHub/1a92835ced9_7753 in memory. This may take a while...\nplotScalogram(scalo |> filter(chr == 'II'), ylim = c(1e3, 1e5))\n\n\n\n\n\n\n\nSeveral scalograms can be plotted together to compare distance-dependent contact frequencies along a given chromosome in different samples.\n\n\neco1_hic <- import(\n CoolFile(HiContactsData('yeast_eco1', 'mcool')), \n focus = 'II', \n resolution = 2000\n)\n## see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n## loading from cache\neco1_pairsf <- HiContactsData('yeast_eco1', 'pairs.gz')\n## see ?HiContactsData and browseVignettes('HiContactsData') for documentation\n## loading from cache\npairsFile(eco1_hic) <- eco1_pairsf\neco1_scalo <- scalogram(eco1_hic) \n## Importing pairs file /github/home/.cache/R/ExperimentHub/21f275852cbd_7755 in memory. This may take a while...\nmerged_scalo <- rbind(\n scalo |> mutate(sample = 'WT'), \n eco1_scalo |> mutate(sample = 'eco1')\n)\nplotScalogram(merged_scalo |> filter(chr == 'II'), ylim = c(1e3, 1e5)) + \n facet_grid(~sample)\n\n\n\n\n\n\n\n\nThis example points out the overall longer interactions within the long arm of chromosome II in an eco1 mutant."
},
{
"objectID": "topological-features.html",
@@ -242,21 +242,21 @@
"href": "topological-features.html#chromosome-compartments",
"title": "\n7 Finding topological features in Hi-C\n",
"section": "\n7.1 Chromosome compartments",
- "text": "7.1 Chromosome compartments\nChromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartment).\n\n7.1.1 Importing Hi-C data\nTo investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp.\n\nlibrary(HiCExperiment)\nlibrary(OHCA)\ncf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool')\nmicroC <- import(cf, resolution = 250000)\nmicroC\n## `HiCExperiment` object with 10,086,710 contacts over 334 regions \n## -------\n## fileName: \"/usr/local/lib/R/site-library/OHCA/extdata/chr17.mcool\" \n## focus: \"whole genome\" \n## resolutions(3): 5000 100000 250000\n## active resolution: 250000 \n## interactions: 52755 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\nseqinfo(microC)\n## Seqinfo object with 1 sequence from an unspecified genome:\n## seqnames seqlengths isCircular genome\n## chr17 83257441 NA <NA>\n\n\n7.1.2 Annotating A/B compartments\nThe consensus approach to annotate A/B compartments is to compute the eigenvectors of a Hi-C contact matrix and identify the eigenvector representing the chromosome-wide bi-partite segmentation of the genome.\nThe getCompartments() function performs several internal operations to achieve this:\n\nObtains cis interactions per chromosome\nComputes O/E contact matrix scores\nComputes 3 first eigenvectors of this Hi-C contact matrix\nNormalizes eigenvectors\nPicks the eigenvector that has the greatest absolute correlation with a phasing track (e.g. 
a GC% track automatically computed from a genome reference sequence, or a gene density track)\nSigns this eigenvector so that positive values represent the A compartment\n\n\nphasing_track <- BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38\nmicroC_compts <- getCompartments(microC, genome = phasing_track)\n## Going through preflight checklist...\n## Parsing intra-chromosomal contacts for each chromosome...\n## Computing eigenvectors for each chromosome...\n\nmicroC_compts\n## `HiCExperiment` object with 10,086,710 contacts over 334 regions \n## -------\n## fileName: \"/usr/local/lib/R/site-library/OHCA/extdata/chr17.mcool\" \n## focus: \"whole genome\" \n## resolutions(3): 5000 100000 250000\n## active resolution: 250000 \n## interactions: 52755 \n## scores(2): count balanced \n## topologicalFeatures: compartments(41) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(1): eigens\n\n\n\n\n\n\n\nNote\n\n\n\ngetCompartments() is an endomorphism: it returns the original object, enriched with two new pieces of information:\n\nA compartments topologicalFeatures:\n\n\ntopologicalFeatures(microC_compts, \"compartments\")\n## GRanges object with 41 ranges and 1 metadata column:\n## seqnames ranges strand | compartment\n## <Rle> <IRanges> <Rle> | <character>\n## [1] chr17 250001-3000000 * | A\n## [2] chr17 3000001-3500000 * | B\n## [3] chr17 3500001-5500000 * | A\n## [4] chr17 5500001-6500000 * | B\n## [5] chr17 6500001-8500000 * | A\n## ... ... ... ... . 
...\n## [37] chr17 72750001-73250000 * | A\n## [38] chr17 73250001-74750000 * | B\n## [39] chr17 74750001-79250000 * | A\n## [40] chr17 79250001-79750000 * | B\n## [41] chr17 79750001-83250000 * | A\n## -------\n## seqinfo: 1 sequence from an unspecified genome\n\n\nThe calculated eigenvectors stored in metadata:\n\n\nmetadata(microC_compts)$eigens\n## GRanges object with 334 ranges and 9 metadata columns:\n## seqnames ranges strand |\n## <Rle> <IRanges> <Rle> |\n## chr17.chr17_1_250000 chr17 1-250000 * |\n## chr17.chr17_250001_500000 chr17 250001-500000 * |\n## chr17.chr17_500001_750000 chr17 500001-750000 * |\n## chr17.chr17_750001_1000000 chr17 750001-1000000 * |\n## chr17.chr17_1000001_1250000 chr17 1000001-1250000 * |\n## ... ... ... ... .\n## chr17.chr17_82250001_82500000 chr17 82250001-82500000 * |\n## chr17.chr17_82500001_82750000 chr17 82500001-82750000 * |\n## chr17.chr17_82750001_83000000 chr17 82750001-83000000 * |\n## chr17.chr17_83000001_83250000 chr17 83000001-83250000 * |\n## chr17.chr17_83250001_83257441 chr17 83250001-83257441 * |\n## bin_id weight chr center\n## <numeric> <numeric> <Rle> <integer>\n## chr17.chr17_1_250000 0 NaN chr17 125000\n## chr17.chr17_250001_500000 1 0.00626903 chr17 375000\n## chr17.chr17_500001_750000 2 0.00567190 chr17 625000\n## chr17.chr17_750001_1000000 3 0.00528588 chr17 875000\n## chr17.chr17_1000001_1250000 4 0.00464628 chr17 1125000\n## ... ... ... ... 
...\n## chr17.chr17_82250001_82500000 329 0.00463044 chr17 82375000\n## chr17.chr17_82500001_82750000 330 0.00486910 chr17 82625000\n## chr17.chr17_82750001_83000000 331 0.00561269 chr17 82875000\n## chr17.chr17_83000001_83250000 332 0.00546433 chr17 83125000\n## chr17.chr17_83250001_83257441 333 NaN chr17 83253721\n## E1 E2 E3 phasing\n## <numeric> <numeric> <numeric> <numeric>\n## chr17.chr17_1_250000 0.000000 0.000000 0.000000 0.383084\n## chr17.chr17_250001_500000 0.450991 0.653287 0.615300 0.433972\n## chr17.chr17_500001_750000 0.716784 0.707461 0.845033 0.465556\n## chr17.chr17_750001_1000000 0.904423 0.414952 0.864288 0.503592\n## chr17.chr17_1000001_1250000 0.913023 0.266287 0.759016 0.547712\n## ... ... ... ... ...\n## chr17.chr17_82250001_82500000 1.147060 0.239112 1.133498 0.550872\n## chr17.chr17_82500001_82750000 1.106937 0.419647 1.169464 0.513212\n## chr17.chr17_82750001_83000000 0.818990 0.591955 0.850340 0.522432\n## chr17.chr17_83000001_83250000 0.874038 0.503175 0.847926 0.528448\n## chr17.chr17_83250001_83257441 0.000000 0.000000 0.000000 0.000000\n## eigen\n## <numeric>\n## chr17.chr17_1_250000 0.000000\n## chr17.chr17_250001_500000 0.450991\n## chr17.chr17_500001_750000 0.716784\n## chr17.chr17_750001_1000000 0.904423\n## chr17.chr17_1000001_1250000 0.913023\n## ... 
...\n## chr17.chr17_82250001_82500000 1.147060\n## chr17.chr17_82500001_82750000 1.106937\n## chr17.chr17_82750001_83000000 0.818990\n## chr17.chr17_83000001_83250000 0.874038\n## chr17.chr17_83250001_83257441 0.000000\n## -------\n## seqinfo: 1 sequence from an unspecified genome\n\n\n\n\n7.1.3 Exporting compartment tracks\nTo save the eigenvector (as a bigwig file) and the compartments(as a gff file), the export function can be used:\n\nlibrary(GenomicRanges)\nlibrary(rtracklayer)\ncoverage(metadata(microC_compts)$eigens, weight = 'eigen') |> export('microC_eigen.bw')\ntopologicalFeatures(microC_compts, \"compartments\") |> export('microC_compartments.gff3')\n\n\n7.1.4 Visualizing compartment tracks\nCompartment tracks should be visualized in a dedicated genome browser, with the phasing track loaded as well, to ensure they are phased accordingly.\nThat being said, it is possible to visualize a genome track in R besides the matching Hi-C contact matrix.\n\nlibrary(ggplot2)\nlibrary(patchwork)\nmicroC <- autocorrelate(microC)\n## \np1 <- plotMatrix(microC, use.scores = 'autocorrelated', scale = 'linear', limits = c(-1, 1), caption = FALSE)\neigen <- coverage(metadata(microC_compts)$eigens, weight = 'eigen')[[1]]\neigen_df <- tibble(pos = cumsum(runLength(eigen)), eigen = runValue(eigen))\np2 <- ggplot(eigen_df, aes(x = pos, y = eigen)) + \n geom_area() + \n theme_void() + \n coord_cartesian(expand = FALSE) + \n labs(x = \"Genomic position\", y = \"Eigenvector value\")\nwrap_plots(p1, p2, ncol = 1, heights = c(10, 1))\n\n\n\n\n\n\n\nHere, we clearly note the concordance between the Hi-C correlation matrix, highlighting correlated interactions between pairs of genomic segments, and the eigenvector representing chromosome segmentation into 2 compartments: A (for positive values) and B (for negative values).\n\n7.1.5 Saddle plots\nSaddle plots are typically used to measure the observed vs. 
expected interaction scores within or between genomic loci belonging to A and B compartments.\nNon-overlapping genomic windows are grouped in nbins quantiles (typically between 10 and 50 quantiles) according to their A/B compartment eigenvector value, from lowest eigenvector values (i.e. strongest B compartments) to highest eigenvector values (i.e. strongest A compartments). The average observed vs. expected interaction scores are then computed for pairwise eigenvector quantiles and plotted in a 2D heatmap.\n\nlibrary(BiocParallel)\nplotSaddle(microC_compts, nbins = 25, BPPARAM = SerialParam(progressbar = FALSE))\n\n\n\n\n\n\n\nHere, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments.\n\n\n\n\n\n\nNote\n\n\n\nOnly chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot."
+ "text": "7.1 Chromosome compartments\nChromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartments).\n\n7.1.1 Importing Hi-C data\nTo investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp.\n\nlibrary(HiCExperiment)\nlibrary(OHCA)\ncf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool')\nmicroC <- import(cf, resolution = 250000)\nmicroC\n## `HiCExperiment` object with 10,086,710 contacts over 334 regions \n## -------\n## fileName: \"/usr/local/lib/R/site-library/OHCA/extdata/chr17.mcool\" \n## focus: \"whole genome\" \n## resolutions(3): 5000 100000 250000\n## active resolution: 250000 \n## interactions: 52755 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\nseqinfo(microC)\n## Seqinfo object with 1 sequence from an unspecified genome:\n## seqnames seqlengths isCircular genome\n## chr17 83257441 NA <NA>\n\n\n7.1.2 Annotating A/B compartments\nThe consensus approach to annotate A/B compartments is to compute the eigenvectors of a Hi-C contact matrix and identify the eigenvector representing the chromosome-wide bi-partite segmentation of the genome.\nThe getCompartments() function performs several internal operations to achieve this:\n\nObtains cis interactions per chromosome\nComputes O/E contact matrix scores\nComputes the first 3 eigenvectors of this Hi-C contact matrix\nNormalizes eigenvectors\nPicks the eigenvector that has the greatest absolute correlation with a phasing track (e.g. 
a GC% track automatically computed from a genome reference sequence, or a gene density track)\nSigns this eigenvector so that positive values represent the A compartment\n\n\nphasing_track <- BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38\nmicroC_compts <- getCompartments(microC, genome = phasing_track)\n## Going through preflight checklist...\n## Parsing intra-chromosomal contacts for each chromosome...\n## Computing eigenvectors for each chromosome...\n\nmicroC_compts\n## `HiCExperiment` object with 10,086,710 contacts over 334 regions \n## -------\n## fileName: \"/usr/local/lib/R/site-library/OHCA/extdata/chr17.mcool\" \n## focus: \"whole genome\" \n## resolutions(3): 5000 100000 250000\n## active resolution: 250000 \n## interactions: 52755 \n## scores(2): count balanced \n## topologicalFeatures: compartments(41) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(1): eigens\n\ngetCompartments() is an endomorphism: it returns the original object, enriched with two new pieces of information:\n\nA compartments topologicalFeatures:\n\n\ntopologicalFeatures(microC_compts, \"compartments\")\n## GRanges object with 41 ranges and 1 metadata column:\n## seqnames ranges strand | compartment\n## <Rle> <IRanges> <Rle> | <character>\n## [1] chr17 250001-3000000 * | A\n## [2] chr17 3000001-3500000 * | B\n## [3] chr17 3500001-5500000 * | A\n## [4] chr17 5500001-6500000 * | B\n## [5] chr17 6500001-8500000 * | A\n## ... ... ... ... . 
...\n## [37] chr17 72750001-73250000 * | A\n## [38] chr17 73250001-74750000 * | B\n## [39] chr17 74750001-79250000 * | A\n## [40] chr17 79250001-79750000 * | B\n## [41] chr17 79750001-83250000 * | A\n## -------\n## seqinfo: 1 sequence from an unspecified genome\n\n\nThe calculated eigenvectors stored in metadata:\n\n\nmetadata(microC_compts)$eigens\n## GRanges object with 334 ranges and 9 metadata columns:\n## seqnames ranges strand | bin_id weight chr center E1 E2 E3 phasing eigen\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer> <numeric> <numeric> <numeric> <numeric> <numeric>\n## chr17.chr17_1_250000 chr17 1-250000 * | 0 NaN chr17 125000 0.000000 0.000000 0.000000 0.383084 0.000000\n## chr17.chr17_250001_500000 chr17 250001-500000 * | 1 0.00626903 chr17 375000 0.450991 0.653287 0.615300 0.433972 0.450991\n## chr17.chr17_500001_750000 chr17 500001-750000 * | 2 0.00567190 chr17 625000 0.716784 0.707461 0.845033 0.465556 0.716784\n## chr17.chr17_750001_1000000 chr17 750001-1000000 * | 3 0.00528588 chr17 875000 0.904423 0.414952 0.864288 0.503592 0.904423\n## chr17.chr17_1000001_1250000 chr17 1000001-1250000 * | 4 0.00464628 chr17 1125000 0.913023 0.266287 0.759016 0.547712 0.913023\n## ... ... ... ... . ... ... ... ... ... ... ... ... 
...\n## chr17.chr17_82250001_82500000 chr17 82250001-82500000 * | 329 0.00463044 chr17 82375000 1.147060 0.239112 1.133498 0.550872 1.147060\n## chr17.chr17_82500001_82750000 chr17 82500001-82750000 * | 330 0.00486910 chr17 82625000 1.106937 0.419647 1.169464 0.513212 1.106937\n## chr17.chr17_82750001_83000000 chr17 82750001-83000000 * | 331 0.00561269 chr17 82875000 0.818990 0.591955 0.850340 0.522432 0.818990\n## chr17.chr17_83000001_83250000 chr17 83000001-83250000 * | 332 0.00546433 chr17 83125000 0.874038 0.503175 0.847926 0.528448 0.874038\n## chr17.chr17_83250001_83257441 chr17 83250001-83257441 * | 333 NaN chr17 83253721 0.000000 0.000000 0.000000 0.000000 0.000000\n## -------\n## seqinfo: 1 sequence from an unspecified genome\n\n\n7.1.3 Exporting compartment tracks\nTo save the eigenvector (as a bigwig file) and the compartments (as a gff file), the export function can be used:\n\nlibrary(GenomicRanges)\nlibrary(rtracklayer)\ncoverage(metadata(microC_compts)$eigens, weight = 'eigen') |> export('microC_eigen.bw')\ntopologicalFeatures(microC_compts, \"compartments\") |> export('microC_compartments.gff3')\n\n\n7.1.4 Visualizing compartment tracks\nCompartment tracks should be visualized in a dedicated genome browser, with the phasing track loaded as well, to ensure they are phased accordingly.\nThat being said, it is possible to visualize a genome track in R besides the matching Hi-C contact matrix.\n\nlibrary(ggplot2)\nlibrary(patchwork)\nmicroC <- autocorrelate(microC)\n## \np1 <- plotMatrix(microC, use.scores = 'autocorrelated', scale = 'linear', limits = c(-1, 1), caption = FALSE)\neigen <- coverage(metadata(microC_compts)$eigens, weight = 'eigen')[[1]]\neigen_df <- tibble(pos = cumsum(runLength(eigen)), eigen = runValue(eigen))\np2 <- ggplot(eigen_df, aes(x = pos, y = eigen)) + \n geom_area() + \n theme_void() + \n coord_cartesian(expand = FALSE) + \n labs(x = \"Genomic position\", y = \"Eigenvector value\")\nwrap_plots(p1, p2, ncol = 1, heights = c(10, 
1))\n\n\n\n\n\n\n\nHere, we clearly note the concordance between the Hi-C correlation matrix, highlighting correlated interactions between pairs of genomic segments, and the eigenvector representing chromosome segmentation into 2 compartments: A (for positive values) and B (for negative values).\n\n7.1.5 Saddle plots\nSaddle plots are typically used to measure the observed vs. expected interaction scores within or between genomic loci belonging to A and B compartments.\nNon-overlapping genomic windows are grouped in nbins quantiles (typically between 10 and 50 quantiles) according to their A/B compartment eigenvector value, from lowest eigenvector values (i.e. strongest B compartments) to highest eigenvector values (i.e. strongest A compartments). The average observed vs. expected interaction scores are then computed for pairwise eigenvector quantiles and plotted in a 2D heatmap.\n\nlibrary(BiocParallel)\nplotSaddle(microC_compts, nbins = 25, BPPARAM = SerialParam(progressbar = FALSE))\n\n\n\n\n\n\n\nHere, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments. Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot."
},
{
"objectID": "topological-features.html#topological-domains",
"href": "topological-features.html#topological-domains",
"title": "\n7 Finding topological features in Hi-C\n",
"section": "\n7.2 Topological domains",
- "text": "7.2 Topological domains\nTopological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.b. roughly ≤ 1Mb in mammal genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries.\n\n\n\n\nThey are generally conserved across cell types and species (Schmitt et al. (2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)).\n\n7.2.1 Computing diamond insulation score\nSeveral approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare.\nHiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution.\n\n# - Compute insulation score\nbpparam <- SerialParam(progressbar = FALSE)\nhic <- zoom(microC, 5000) |> \n refocus('chr17:60000001-83257441') |>\n getDiamondInsulation(window_size = 100000, BPPARAM = bpparam) |> \n getBorders()\n## Going through preflight checklist...\n## Scan each window and compute diamond insulation score...\n## Annotating diamond score prominence for each window...\n\nhic\n## `HiCExperiment` object with 2,156,222 contacts over 4,652 regions \n## -------\n## fileName: \"/usr/local/lib/R/site-library/OHCA/extdata/chr17.mcool\" \n## focus: \"chr17:60,000,001-83,257,441\" \n## resolutions(3): 5000 100000 250000\n## active resolution: 5000 \n## interactions: 2156044 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(21) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(1): insulation\n\n\n\n\n\n\n\nNote\n\n\n\nThe getDiamondInsulation function can be parallelized over 
multiple threads by specifying the Bioconductor generic BPPARAM argument.\n\n\n\n\n\n\n\n\nNote\n\n\n\ngetDiamondInsulation() is an endomorphism: it returns the original object, enriched with two new pieces of information:\n\nA borders topologicalFeatures:\n\n\ntopologicalFeatures(hic, \"borders\")\n## GRanges object with 21 ranges and 1 metadata column:\n## seqnames ranges strand | score\n## <Rle> <IRanges> <Rle> | <numeric>\n## strong chr17 60105001-60110000 * | 0.574760\n## weak chr17 60210001-60215000 * | 0.414425\n## weak chr17 61415001-61420000 * | 0.346668\n## strong chr17 61500001-61505000 * | 0.544336\n## weak chr17 62930001-62935000 * | 0.399794\n## ... ... ... ... . ...\n## weak chr17 78395001-78400000 * | 0.235613\n## weak chr17 79065001-79070000 * | 0.236535\n## weak chr17 80155001-80160000 * | 0.284855\n## weak chr17 81735001-81740000 * | 0.497478\n## strong chr17 81840001-81845000 * | 1.395949\n## -------\n## seqinfo: 1 sequence from an unspecified genome\n\n\nThe calculated insulation scores stored in metadata:\n\n\nmetadata(hic)$insulation\n## GRanges object with 4611 ranges and 8 metadata columns:\n## seqnames ranges strand | bin_id\n## <Rle> <IRanges> <Rle> | <numeric>\n## chr17_60100001_60105000 chr17 60100001-60105000 * | 12020\n## chr17_60105001_60110000 chr17 60105001-60110000 * | 12021\n## chr17_60110001_60115000 chr17 60110001-60115000 * | 12022\n## chr17_60115001_60120000 chr17 60115001-60120000 * | 12023\n## chr17_60120001_60125000 chr17 60120001-60125000 * | 12024\n## ... ... ... ... . 
...\n## chr17_83130001_83135000 chr17 83130001-83135000 * | 16626\n## chr17_83135001_83140000 chr17 83135001-83140000 * | 16627\n## chr17_83140001_83145000 chr17 83140001-83145000 * | 16628\n## chr17_83145001_83150000 chr17 83145001-83150000 * | 16629\n## chr17_83150001_83155000 chr17 83150001-83155000 * | 16630\n## weight chr center score insulation\n## <numeric> <Rle> <integer> <numeric> <numeric>\n## chr17_60100001_60105000 0.0406489 chr17 60102500 0.188061 -0.750142\n## chr17_60105001_60110000 0.0255539 chr17 60107500 0.180860 -0.806466\n## chr17_60110001_60115000 NaN chr17 60112500 0.196579 -0.686232\n## chr17_60115001_60120000 NaN chr17 60117500 0.216039 -0.550046\n## chr17_60120001_60125000 NaN chr17 60122500 0.230035 -0.459489\n## ... ... ... ... ... ...\n## chr17_83130001_83135000 0.0314684 chr17 83132500 0.262191 -0.270723\n## chr17_83135001_83140000 0.0307197 chr17 83137500 0.240779 -0.393632\n## chr17_83140001_83145000 0.0322810 chr17 83142500 0.219113 -0.529664\n## chr17_83145001_83150000 0.0280840 chr17 83147500 0.199645 -0.663900\n## chr17_83150001_83155000 0.0272775 chr17 83152500 0.180434 -0.809873\n## min prominence\n## <logical> <numeric>\n## chr17_60100001_60105000 FALSE NA\n## chr17_60105001_60110000 TRUE 0.57476\n## chr17_60110001_60115000 FALSE NA\n## chr17_60115001_60120000 FALSE NA\n## chr17_60120001_60125000 FALSE NA\n## ... ... 
...\n## chr17_83130001_83135000 FALSE NA\n## chr17_83135001_83140000 FALSE NA\n## chr17_83140001_83145000 FALSE NA\n## chr17_83145001_83150000 FALSE NA\n## chr17_83150001_83155000 FALSE NA\n## -------\n## seqinfo: 1 sequence from an unspecified genome\n\n\n\n\n7.2.2 Exporting insulation scores tracks\nTo save the diamond insulation scores (as a bigwig file) and the borders (as a bed file), the export function can be used:\n\ncoverage(metadata(hic)$insulation, weight = 'insulation') |> export('microC_insulation.bw')\ntopologicalFeatures(hic, \"borders\") |> export('microC_borders.bed')\n\n\n7.2.3 Visualizing chromatin domains\nInsulation tracks should be visualized in a dedicated genome browser.\nThat being said, it is possible to visualize a genome track in R besides the matching Hi-C contact matrix.\n\nhic <- zoom(hic, 100000)\np1 <- plotMatrix(\n hic, \n use.scores = 'balanced', \n limits = c(-3.5, -1),\n borders = topologicalFeatures(hic, \"borders\"),\n caption = FALSE\n)\ninsulation <- coverage(metadata(hic)$insulation, weight = 'insulation')[[1]]\ninsulation_df <- tibble(pos = cumsum(runLength(insulation)), insulation = runValue(insulation))\np2 <- ggplot(insulation_df, aes(x = pos, y = insulation)) + \n geom_area() + \n theme_void() + \n coord_cartesian(expand = FALSE) + \n labs(x = \"Genomic position\", y = \"Diamond insulation score\")\nwrap_plots(p1, p2, ncol = 1, heights = c(10, 1))\n\n\n\n\n\n\n\nLocal minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds."
+ "text": "7.2 Topological domains\nTopological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.g. roughly ≤ 1Mb in mammalian genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries.\n\n\n\n\nThey are generally conserved across cell types and species (Schmitt et al. (2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)).\n\n7.2.1 Computing diamond insulation score\nSeveral approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare.\nHiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution.\n\n# - Compute insulation score\nbpparam <- SerialParam(progressbar = FALSE)\nhic <- zoom(microC, 5000) |> \n refocus('chr17:60000001-83257441') |>\n getDiamondInsulation(window_size = 100000, BPPARAM = bpparam) |> \n getBorders()\n## Going through preflight checklist...\n## Scan each window and compute diamond insulation score...\n## Annotating diamond score prominence for each window...\n\nhic\n## `HiCExperiment` object with 2,156,222 contacts over 4,652 regions \n## -------\n## fileName: \"/usr/local/lib/R/site-library/OHCA/extdata/chr17.mcool\" \n## focus: \"chr17:60,000,001-83,257,441\" \n## resolutions(3): 5000 100000 250000\n## active resolution: 5000 \n## interactions: 2156044 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(21) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(1): insulation\n\ngetDiamondInsulation() is an endomorphism: it returns the original object, enriched 
with two new pieces of information:\n\nA borders topologicalFeatures:\n\n\ntopologicalFeatures(hic, \"borders\")\n## GRanges object with 21 ranges and 1 metadata column:\n## seqnames ranges strand | score\n## <Rle> <IRanges> <Rle> | <numeric>\n## strong chr17 60105001-60110000 * | 0.574760\n## weak chr17 60210001-60215000 * | 0.414425\n## weak chr17 61415001-61420000 * | 0.346668\n## strong chr17 61500001-61505000 * | 0.544336\n## weak chr17 62930001-62935000 * | 0.399794\n## ... ... ... ... . ...\n## weak chr17 78395001-78400000 * | 0.235613\n## weak chr17 79065001-79070000 * | 0.236535\n## weak chr17 80155001-80160000 * | 0.284855\n## weak chr17 81735001-81740000 * | 0.497478\n## strong chr17 81840001-81845000 * | 1.395949\n## -------\n## seqinfo: 1 sequence from an unspecified genome\n\n\nThe calculated insulation scores stored in metadata:\n\n\nmetadata(hic)$insulation\n## GRanges object with 4611 ranges and 8 metadata columns:\n## seqnames ranges strand | bin_id weight chr center score insulation min prominence\n## <Rle> <IRanges> <Rle> | <numeric> <numeric> <Rle> <integer> <numeric> <numeric> <logical> <numeric>\n## chr17_60100001_60105000 chr17 60100001-60105000 * | 12020 0.0406489 chr17 60102500 0.188061 -0.750142 FALSE NA\n## chr17_60105001_60110000 chr17 60105001-60110000 * | 12021 0.0255539 chr17 60107500 0.180860 -0.806466 TRUE 0.57476\n## chr17_60110001_60115000 chr17 60110001-60115000 * | 12022 NaN chr17 60112500 0.196579 -0.686232 FALSE NA\n## chr17_60115001_60120000 chr17 60115001-60120000 * | 12023 NaN chr17 60117500 0.216039 -0.550046 FALSE NA\n## chr17_60120001_60125000 chr17 60120001-60125000 * | 12024 NaN chr17 60122500 0.230035 -0.459489 FALSE NA\n## ... ... ... ... . ... ... ... ... ... ... ... 
...\n## chr17_83130001_83135000 chr17 83130001-83135000 * | 16626 0.0314684 chr17 83132500 0.262191 -0.270723 FALSE NA\n## chr17_83135001_83140000 chr17 83135001-83140000 * | 16627 0.0307197 chr17 83137500 0.240779 -0.393632 FALSE NA\n## chr17_83140001_83145000 chr17 83140001-83145000 * | 16628 0.0322810 chr17 83142500 0.219113 -0.529664 FALSE NA\n## chr17_83145001_83150000 chr17 83145001-83150000 * | 16629 0.0280840 chr17 83147500 0.199645 -0.663900 FALSE NA\n## chr17_83150001_83155000 chr17 83150001-83155000 * | 16630 0.0272775 chr17 83152500 0.180434 -0.809873 FALSE NA\n## -------\n## seqinfo: 1 sequence from an unspecified genome\n\n\n\n\n\n\n\nNote\n\n\n\nThe getDiamondInsulation function can be parallelized over multiple threads by specifying the Bioconductor generic BPPARAM argument.\n\n\n\n7.2.2 Exporting insulation scores tracks\nTo save the diamond insulation scores (as a bigwig file) and the borders (as a bed file), the export function can be used:\n\ncoverage(metadata(hic)$insulation, weight = 'insulation') |> export('microC_insulation.bw')\ntopologicalFeatures(hic, \"borders\") |> export('microC_borders.bed')\n\n\n7.2.3 Visualizing chromatin domains\nInsulation tracks should be visualized in a dedicated genome browser.\nThat being said, it is possible to visualize a genome track in R besides the matching Hi-C contact matrix.\n\nhic <- zoom(hic, 100000)\np1 <- plotMatrix(\n hic, \n use.scores = 'balanced', \n limits = c(-3.5, -1),\n borders = topologicalFeatures(hic, \"borders\"),\n caption = FALSE\n)\ninsulation <- coverage(metadata(hic)$insulation, weight = 'insulation')[[1]]\ninsulation_df <- tibble(pos = cumsum(runLength(insulation)), insulation = runValue(insulation))\np2 <- ggplot(insulation_df, aes(x = pos, y = insulation)) + \n geom_area() + \n theme_void() + \n coord_cartesian(expand = FALSE) + \n labs(x = \"Genomic position\", y = \"Diamond insulation score\")\nwrap_plots(p1, p2, ncol = 1, heights = c(10, 1))\n\n\n\n\n\n\n\nLocal minima in the 
diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds."
},
{
"objectID": "topological-features.html#chromatin-loops",
"href": "topological-features.html#chromatin-loops",
"title": "\n7 Finding topological features in Hi-C\n",
"section": "\n7.3 Chromatin loops",
- "text": "7.3 Chromatin loops\n\n7.3.1 chromosight\n\nChromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of two genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone python package and is made available in R through the HiCool-managed conda environment with the getLoops() function.\n\n7.3.1.1 Identifying loops\n\nhic <- HiCool::getLoops(microC, resolution = 5000)\n\nhic\n## `HiCExperiment` object with 917,156 contacts over 100 regions\n## -------\n## fileName: \"/home/rsg/.cache/R/fourDNData/4d434d8538a0_4DNFI9FVHJZQ.mcool\"\n## focus: \"chr17:63,000,001-63,500,000\"\n## resolutions(13): 1000 2000 ... 5000000 10000000\n## active resolution: 5000\n## interactions: 5047\n## scores(2): count balanced\n## topologicalFeatures: compartments(0) borders(0) loops(66411) viewpoints(0)\n## pairsFile: N/A\n## metadata(1): chromosight_args\n\n\n\n\n\n\n\nNote\n\n\n\ngetLoops() is an endomorphism: it returns the original object, enriched with two new pieces of information:\n\nA loops topologicalFeatures:\n\n\ntopologicalFeatures(hic, \"loops\")\n## GInteractions object with 66411 interactions and 5 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 score ## pvalue qvalue\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> ## <numeric> <numeric>\n## [1] chr1 775001-780000 --- chr1 850001-855000 | 155 170 0.334586 2.## 15995e-05 2.162e-05\n## [2] chr1 775001-780000 --- chr1 865001-870000 | 155 173 0.403336 1.## 62900e-07 1.669e-07\n## [3] chr1 865001-870000 --- chr1 890001-895000 | 173 178 0.337344 1.## 91400e-07 1.957e-07\n## [4] chr1 910001-915000 --- chr1 955001-960000 | 182 191 0.639725 0.## 00000e+00 0.000e+00\n## [5] chr1 910001-915000 --- chr1 1055001-1060000 | 182 211 0.521699 0.## 00000e+00 0.000e+00\n## 
... ... ... ... ... ... . ... ... ... ## ... ...\n## [66407] chrY 19570001-19575000 --- chrY 19720001-19725000 | 610133 610163 0.315529 3.## 30e-08 3.55e-08\n## [66408] chrY 19705001-19710000 --- chrY 19730001-19735000 | 610160 610165 0.708753 0.## 00e+00 0.00e+00\n## [66409] chrY 19765001-19770000 --- chrY 19800001-19805000 | 610172 610179 0.373635 1.## 10e-09 1.40e-09\n## [66410] chrY 20555001-20560000 --- chrY 20645001-20650000 | 610330 610348 0.603308 0.## 00e+00 0.00e+00\n## [66411] chrY 21015001-21020000 --- chrY 21055001-21060000 | 610422 610430 0.394614 9.## 12e-08 9.45e-08\n## -------\n## regions: 84171 ranges and 0 metadata columns\n## seqinfo: 24 sequences from an unspecified genome; no seqlengths\n\n\nThe arguments used by chromosight, stored in metadata:\n\n\nmetadata(hic)$chromosight_args\n## $`--pattern`\n## [1] \"loops\"\n## \n## $`--dump`\n## [1] \"/data/.cache/R//RtmpSaRwiZ\"\n## \n## $`--inter`\n## [1] FALSE\n## \n## $`--iterations`\n## [1] \"auto\"\n## \n## $`--kernel-config`\n## NULL\n## \n## $`--perc-zero`\n## [1] \"auto\"\n## \n## $`--perc-undetected`\n## [1] \"auto\"\n## \n## $`--tsvd`\n## [1] FALSE\n## \n## $`--win-fmt`\n## [1] \"json\"\n## \n## $`--win-size`\n## [1] \"auto\"\n## \n## $`--no-plotting`\n## [1] TRUE\n## \n## $`--smooth-trend`\n## [1] FALSE\n## \n## $`--norm`\n## [1] \"auto\"\n## \n## $`<contact_map>`\n## [1] \"/home/rsg/.cache/R/fourDNData/4d434d8538a0_4DNFI9FVHJZQ.mcool::/resolutions/5000\"\n## \n## $`--max-dist`\n## [1] \"auto\"\n## \n## $`--min-dist`\n## [1] \"auto\"\n## \n## $`--min-separation`\n## [1] \"auto\"\n## \n## $`--n-mads`\n## [1] 5\n## \n## $`<prefix>`\n## [1] \"chromosight/chromo\"\n## \n## $`--pearson`\n## [1] \"auto\"\n## \n## $`--subsample`\n## [1] \"no\"\n## \n## $`--threads`\n## [1] 1\n\n\n\n\n7.3.1.2 Exporting chromatin loops\n\nloops <- topologicalFeatures(hic, \"loops\")\nloops <- loops[loops$score >= 0.4 & loops$qvalue <= 1e-6]\nGenomicInteractions::export.bedpe(loops, 'loops.bedpe')\n\n\n7.3.1.3 
Visualizing chromatin loops\n\n\n\n\n\n\nChromosight users\n\n\n\nIf you are using chromosight directly from the terminal (i.e. outside R), you can import the annotated loops in R as follows:\n\ndf <- readr::read_tsv(\"...\")\nloops <- InteractionSet::GInteractions(\n anchor1 = GenomicRanges::GRanges(\n df$chrom1, IRanges::IRanges(df$start1+1, df$end1)\n ),\n anchor2 = GenomicRanges::GRanges(\n df$chrom2, IRanges::IRanges(df$start2+1, df$end2)\n ),\n bin_id1 = df$bin1, \n bin_id2 = df$bin2, \n score = df$score, \n pvalue = df$pvalue, \n qvalue = df$qvalue\n)\n\n\n\n\nplotMatrix(\n refocus(hic, 'chr17:62500001-63500000') |> zoom(5000), \n loops = loops,\n limits = c(-4, -1.2),\n caption = FALSE\n)\n\n\n\n7.3.2 Other R packages\nA number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications."
+ "text": "7.3 Chromatin loops\n\n7.3.1 chromosight\n\nChromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of two genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone python package and is made available in R through the HiCool-managed conda environment with the getLoops() function.\n\n7.3.1.1 Identifying loops\n\nhic <- HiCool::getLoops(microC, resolution = 5000)\n\nhic\n## `HiCExperiment` object with 917,156 contacts over 100 regions\n## -------\n## fileName: \"/home/rsg/.cache/R/fourDNData/4d434d8538a0_4DNFI9FVHJZQ.mcool\"\n## focus: \"chr17:63,000,001-63,500,000\"\n## resolutions(13): 1000 2000 ... 5000000 10000000\n## active resolution: 5000\n## interactions: 5047\n## scores(2): count balanced\n## topologicalFeatures: compartments(0) borders(0) loops(66411) viewpoints(0)\n## pairsFile: N/A\n## metadata(1): chromosight_args\n\ngetLoops() is an endomorphism: it returns the original object, enriched with two new pieces of information:\n\nA loops topologicalFeatures:\n\n\ntopologicalFeatures(hic, \"loops\")\n## GInteractions object with 66411 interactions and 5 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 score ## pvalue qvalue\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> ## <numeric> <numeric>\n## [1] chr1 775001-780000 --- chr1 850001-855000 | 155 170 0.334586 2.## 15995e-05 2.162e-05\n## [2] chr1 775001-780000 --- chr1 865001-870000 | 155 173 0.403336 1.## 62900e-07 1.669e-07\n## [3] chr1 865001-870000 --- chr1 890001-895000 | 173 178 0.337344 1.## 91400e-07 1.957e-07\n## [4] chr1 910001-915000 --- chr1 955001-960000 | 182 191 0.639725 0.## 00000e+00 0.000e+00\n## [5] chr1 910001-915000 --- chr1 1055001-1060000 | 182 211 0.521699 0.## 00000e+00 0.000e+00\n## ... ... ... ... ... ... 
. ... ... ... ## ... ...\n## [66407] chrY 19570001-19575000 --- chrY 19720001-19725000 | 610133 610163 0.315529 3.## 30e-08 3.55e-08\n## [66408] chrY 19705001-19710000 --- chrY 19730001-19735000 | 610160 610165 0.708753 0.## 00e+00 0.00e+00\n## [66409] chrY 19765001-19770000 --- chrY 19800001-19805000 | 610172 610179 0.373635 1.## 10e-09 1.40e-09\n## [66410] chrY 20555001-20560000 --- chrY 20645001-20650000 | 610330 610348 0.603308 0.## 00e+00 0.00e+00\n## [66411] chrY 21015001-21020000 --- chrY 21055001-21060000 | 610422 610430 0.394614 9.## 12e-08 9.45e-08\n## -------\n## regions: 84171 ranges and 0 metadata columns\n## seqinfo: 24 sequences from an unspecified genome; no seqlengths\n\n\nThe arguments used by chromosight, stored in metadata:\n\n\nmetadata(hic)$chromosight_args\n## $`--pattern`\n## [1] \"loops\"\n## \n## $`--dump`\n## [1] \"/data/.cache/R//RtmpSaRwiZ\"\n## \n## $`--inter`\n## [1] FALSE\n## \n## $`--iterations`\n## [1] \"auto\"\n## \n## $`--kernel-config`\n## NULL\n## \n## $`--perc-zero`\n## [1] \"auto\"\n## \n## $`--perc-undetected`\n## [1] \"auto\"\n## \n## $`--tsvd`\n## [1] FALSE\n## \n## $`--win-fmt`\n## [1] \"json\"\n## \n## $`--win-size`\n## [1] \"auto\"\n## \n## $`--no-plotting`\n## [1] TRUE\n## \n## $`--smooth-trend`\n## [1] FALSE\n## \n## $`--norm`\n## [1] \"auto\"\n## \n## $`<contact_map>`\n## [1] \"/home/rsg/.cache/R/fourDNData/4d434d8538a0_4DNFI9FVHJZQ.mcool::/resolutions/5000\"\n## \n## $`--max-dist`\n## [1] \"auto\"\n## \n## $`--min-dist`\n## [1] \"auto\"\n## \n## $`--min-separation`\n## [1] \"auto\"\n## \n## $`--n-mads`\n## [1] 5\n## \n## $`<prefix>`\n## [1] \"chromosight/chromo\"\n## \n## $`--pearson`\n## [1] \"auto\"\n## \n## $`--subsample`\n## [1] \"no\"\n## \n## $`--threads`\n## [1] 1\n\n\n7.3.1.2 Importing loops from files\nIf you are using chromosight directly from the terminal (i.e. 
outside R), you can import the annotated loops in R as follows:\n\ndf <- readr::read_tsv(\"...\") ## Here put your loops file\nloops <- InteractionSet::GInteractions(\n anchor1 = GenomicRanges::GRanges(\n df$chrom1, IRanges::IRanges(df$start1+1, df$end1)\n ),\n anchor2 = GenomicRanges::GRanges(\n df$chrom2, IRanges::IRanges(df$start2+1, df$end2)\n ),\n bin_id1 = df$bin1, \n bin_id2 = df$bin2, \n score = df$score, \n pvalue = df$pvalue, \n qvalue = df$qvalue\n)\n\n\n7.3.1.3 Exporting chromatin loops\n\nloops <- topologicalFeatures(hic, \"loops\")\nloops <- loops[loops$score >= 0.4 & loops$qvalue <= 1e-6]\nGenomicInteractions::export.bedpe(loops, 'loops.bedpe')\n\n\n7.3.1.4 Visualizing chromatin loops\n\nplotMatrix(\n refocus(hic, 'chr17:62500001-63500000') |> zoom(5000), \n loops = loops,\n limits = c(-4, -1.2),\n caption = FALSE\n)\n\n\n\n7.3.2 Other R packages\nA number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications."
},
{
"objectID": "disseminating.html",
@@ -291,7 +291,7 @@
"href": "interoperability.html#hicrep",
"title": "\n9 Interoperability: using HiCExperiment with other R packages\n",
"section": "\n9.1 HiCrep",
- "text": "9.1 HiCrep\nhicrep is a popular package to compute stratum-adjusted correlations between Hi-C datasets (Yang et al. (2017)). “Stratum” refers to the distance from the main diagonal: with increase distance from the main diagonal, interactions of the DNA polymer are bound to decrease. hicrep computes a “per-stratum” correlation score and computes a weighted average correlation for entire chromosomes.\n\n\n\n\n\n\nInstalling hicrep\n\n\n\nhicrep package has been available from Bioconductor for many years but has been withdrawn from their repositories at some point. You can always install hicrep directly from its GitHub repository as follows:\n\nremotes::install_github('TaoYang-dev/hicrep')\n\n\n\nIn order to use hicrep, we first need to create two HiCExperiment objects.\n\nlibrary(InteractionSet)\nlibrary(HiCExperiment)\nlibrary(HiContactsData)\n\n# ---- This downloads example `.mcool` and `.pairs` files and caches them locally \ncoolf_wt <- HiContactsData('yeast_wt', 'mcool')\ncoolf_eco1 <- HiContactsData('yeast_eco1', 'mcool')\n\n\nhic_wt <- import(coolf_wt, format = 'cool')\nhic_eco1 <- import(coolf_eco1, format = 'cool')\n\nWe can now run the main get.scc function from hicrep. The documentation for this function is available from the console by typing ?hicrep::get.scc. More information is also available from the GitHub page. 
It informs the end user that the input for this function should be two intra-chromosomal Hi-C raw count matrices in square (optionally sparse) format.\n\nhic_wt\n## `HiCExperiment` object with 8,757,906 contacts over 12,079 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"whole genome\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 1000 \n## interactions: 2945692 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\nas.matrix(hic_wt[\"IV\"], use.scores = 'count')[1:10, 1:10]\n## 10 x 10 sparse Matrix of class \"dgTMatrix\"\n## \n## [1,] . 1 . . 1 . . . . .\n## [2,] 1 . . . . . . . . .\n## [3,] . . . . . . . . . .\n## [4,] . . . . . . . . . .\n## [5,] 1 . . . . . . . 1 .\n## [6,] . . . . . . . . . .\n## [7,] . . . . . . . . . .\n## [8,] . . . . . . . . 1 .\n## [9,] . . . . 1 . . 1 . .\n## [10,] . . . . . . . . . .\n\nlibrary(hicrep)\nscc <- get.scc(\n as.matrix(hic_wt[\"IV\"], use.scores = 'count'), \n as.matrix(hic_eco1[\"IV\"], use.scores = 'count'), \n resol = 1000, h = 25, lbr = 5000, ubr = 50000\n)\nscc\n## $corr\n## [1] 0.9412784 0.9410680 0.9408082 0.9404796 0.9404544 0.9402584 0.9400710\n## [8] 0.9398965 0.9397935 0.9397027 0.9396112 0.9393001 0.9393180 0.9390608\n## [15] 0.9391645 0.9394670 0.9395147 0.9396798 0.9397547 0.9398291 0.9401371\n## [22] 0.9402369 0.9402251 0.9404188 0.9404327 0.9403101 0.9402634 0.9401683\n## [29] 0.9401746 0.9394978 0.9391277 0.9381969 0.9371561 0.9357012 0.9342620\n## [36] 0.9324366 0.9302835 0.9277556 0.9247008 0.9208466 0.9166648 0.9120206\n## [43] 0.9060828 0.9002430 0.8931754 0.8847777\n## \n## $wei\n## [1] 123.2500 123.1667 123.0833 123.0000 122.9167 122.8333 122.7500 122.6667\n## [9] 122.5833 122.5000 122.4167 122.3333 122.2500 122.1667 122.0833 122.0000\n## [17] 121.9167 121.8333 121.7500 121.6667 121.5833 121.5000 121.4167 121.3333\n## [25] 
121.2500 121.1667 121.0833 121.0000 120.9167 120.8333 120.7500 120.6667\n## [33] 120.5833 120.5000 120.4167 120.3333 120.2500 120.1667 120.0833 120.0000\n## [41] 119.9167 119.8333 119.7500 119.6667 119.5833 119.5000\n## \n## $scc\n## [,1]\n## [1,] 0.9334303\n## \n## $std\n## [1] 0.001994845\n\nscc$scc\n## [,1]\n## [1,] 0.9334303"
+ "text": "9.1 HiCrep\nhicrep is a popular package to compute stratum-adjusted correlations between Hi-C datasets (Yang et al. (2017)). “Stratum” refers to the distance from the main diagonal: with increasing distance from the main diagonal, interactions of the DNA polymer are bound to decrease. hicrep computes a “per-stratum” correlation score, then a weighted average correlation for entire chromosomes.\n\n\n\n\n\n\nInstalling hicrep\n\n\n\nThe hicrep package was available from Bioconductor for many years but has since been withdrawn from its repositories. You can still install hicrep directly from its GitHub repository as follows:\n\nremotes::install_github('TaoYang-dev/hicrep')\n\n\n\nIn order to use hicrep, we first need to create two HiCExperiment objects.\n\nlibrary(InteractionSet)\nlibrary(HiCExperiment)\nlibrary(HiContactsData)\n\n# ---- This downloads example `.mcool` and `.pairs` files and caches them locally \ncoolf_wt <- HiContactsData('yeast_wt', 'mcool')\ncoolf_eco1 <- HiContactsData('yeast_eco1', 'mcool')\n\n\nhic_wt <- import(coolf_wt, format = 'cool')\nhic_eco1 <- import(coolf_eco1, format = 'cool')\n\nWe can now run the main get.scc function from hicrep. The documentation for this function is available from the console by typing ?hicrep::get.scc. More information is also available from the GitHub page. 
It informs the end user that the input for this function should be two intra-chromosomal Hi-C raw count matrices in square (optionally sparse) format.\n\nhic_wt\n## `HiCExperiment` object with 8,757,906 contacts over 12,079 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"whole genome\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 1000 \n## interactions: 2945692 \n## scores(2): count balanced \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) \n## pairsFile: N/A \n## metadata(0):\n\nas.matrix(hic_wt[\"IV\"], use.scores = 'count')[1:10, 1:10]\n## 10 x 10 sparse Matrix of class \"dgTMatrix\"\n## \n## [1,] . 1 . . 1 . . . . .\n## [2,] 1 . . . . . . . . .\n## [3,] . . . . . . . . . .\n## [4,] . . . . . . . . . .\n## [5,] 1 . . . . . . . 1 .\n## [6,] . . . . . . . . . .\n## [7,] . . . . . . . . . .\n## [8,] . . . . . . . . 1 .\n## [9,] . . . . 1 . . 1 . .\n## [10,] . . . . . . . . . .\n\nlibrary(hicrep)\nscc <- get.scc(\n as.matrix(hic_wt[\"IV\"], use.scores = 'count'), \n as.matrix(hic_eco1[\"IV\"], use.scores = 'count'), \n resol = 1000, h = 25, lbr = 5000, ubr = 50000\n)\nscc\n## $corr\n## [1] 0.9412784 0.9410680 0.9408082 0.9404796 0.9404544 0.9402584 0.9400710 0.9398965 0.9397935 0.9397027 0.9396112 0.9393001 0.9393180 0.9390608 0.9391645 0.9394670 0.9395147 0.9396798 0.9397547 0.9398291 0.9401371 0.9402369 0.9402251 0.9404188 0.9404327 0.9403101 0.9402634 0.9401683 0.9401746 0.9394978 0.9391277 0.9381969 0.9371561 0.9357012 0.9342620 0.9324366 0.9302835 0.9277556 0.9247008 0.9208466 0.9166648 0.9120206 0.9060828 0.9002430 0.8931754 0.8847777\n## \n## $wei\n## [1] 123.2500 123.1667 123.0833 123.0000 122.9167 122.8333 122.7500 122.6667 122.5833 122.5000 122.4167 122.3333 122.2500 122.1667 122.0833 122.0000 121.9167 121.8333 121.7500 121.6667 121.5833 121.5000 121.4167 121.3333 121.2500 121.1667 121.0833 121.0000 120.9167 120.8333 120.7500 120.6667 120.5833 
120.5000 120.4167 120.3333 120.2500 120.1667 120.0833 120.0000 119.9167 119.8333 119.7500 119.6667 119.5833 119.5000\n## \n## $scc\n## [,1]\n## [1,] 0.9334303\n## \n## $std\n## [1] 0.001994845\n\nscc$scc\n## [,1]\n## [1,] 0.9334303"
},
{
"objectID": "interoperability.html#multihiccompare",
@@ -312,7 +312,7 @@
"href": "interoperability.html#gothic",
"title": "\n9 Interoperability: using HiCExperiment with other R packages\n",
"section": "\n9.4 GOTHiC",
- "text": "9.4 GOTHiC\nGOTHiC relies on a cumulative binomial test to detect interactions between distal genomic loci that have significantly more reads than expected by chance in Hi-C experiments (Mifsud et al. (2017)).\n\n\n\n\n\n\nUsing the GOTHiC function\n\n\n\nUnfortunately, the main GOTHiC function require two .bam files as input. These files are often deleted due to their larger size, while the filtered pairs file itself is retained.\nMoreover, the internal nuts and bolts of the main GOTHiC function perform several operations that are not required in modern workflows:\n\n\nFiltering pairs from same restriction fragment; this step is now usually taken care of automatically, e.g. with HiCool Hi-C processing package.\n\nFiltering short-range pairs; the GOTHiC package hard-codes a 10kb lower threshold for minimum pair distance. More advanced optimized filtering approaches have been implemented since then, to circumvent the need for such hard-coded threshold.\n\nBinning pairs; this step is also already taken care of, when working with Hi-C matrices in modern formats, e.g. 
with .(m)cool files.\n\n\n\nBased on these facts, we can simplify the binomial test function provided by GOTHiC so that it can directly used binned interactions imported as a HiCExperiment object in R.\n\nShow the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) {\n\n if (length(trans(x)) != 0) stop(\"Only `cis` interactions can be used here.\")\n ints <- interactions(x) |>\n as.data.frame() |> \n select(seqnames1, start1, seqnames2, start2, count) |>\n dplyr::rename(chr1 = seqnames1, locus1 = start1, chr2 = seqnames2, locus2 = start2, frequencies = count) |>\n mutate(locus1 = locus1 - 1, locus2 = locus2 - 1) |>\n mutate(int1 = paste0(chr1, '_', locus1), int2 = paste0(chr2, '_', locus2))\n \n numberOfReadPairs <- sum(ints$frequencies)\n all_bins <- unique(c(unique(ints$int1), unique(ints$int2)))\n all_bins <- sort(all_bins)\n upperhalfBinNumber <- (length(all_bins)^2 - length(all_bins))/2\n\n cov <- ints |> \n group_by(int1) |> \n tally(frequencies) |> \n full_join(ints |> \n group_by(int2) |> \n tally(frequencies), \n by = c('int1' = 'int2')\n ) |> \n rowwise() |> \n mutate(coverage = sum(n.x, n.y, na.rm = TRUE)) |> \n ungroup() |>\n mutate(relative_coverage = coverage/sum(coverage))\n \n results <- mutate(ints,\n cov1 = left_join(ints, select(cov, int1, relative_coverage), by = c('int1' = 'int1'))$relative_coverage, \n cov2 = left_join(ints, select(cov, int1, relative_coverage), by = c('int2' = 'int1'))$relative_coverage,\n probability = cov1 * cov2 * 2 * 1/(1 - sum(cov$relative_coverage^2)),\n predicted = probability * numberOfReadPairs\n ) |> \n rowwise() |>\n mutate(\n pvalue = binom.test(\n frequencies, \n numberOfReadPairs, \n probability,\n alternative = \"greater\"\n )$p.value\n ) |> \n ungroup() |> \n mutate(\n logFoldChange = log2(frequencies / predicted), \n qvalue = stats::p.adjust(pvalue, method = \"BH\", n = upperhalfBinNumber)\n )\n\n scores(x, \"probability\") <- results$probability\n scores(x, \"predicted\") <- results$predicted\n 
scores(x, \"pvalue\") <- results$pvalue\n scores(x, \"qvalue\") <- results$qvalue\n scores(x, \"logFoldChange\") <- results$logFoldChange\n\n return(x)\n\n} \n\n\n\nres <- GOTHiC_binomial(hic[\"II\"])\nres\n## `HiCExperiment` object with 471,364 contacts over 802 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a9a4dc30249_7752\" \n## focus: \"II\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 1000 \n## interactions: 74360 \n## scores(7): count balanced probability predicted pvalue qvalue logFoldChange \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) domain(52) \n## pairsFile: N/A \n## metadata(0):\n\ninteractions(res)\n## GInteractions object with 74360 interactions and 9 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | bin_id1\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric>\n## [1] II 1-1000 --- II 1001-2000 | 231\n## [2] II 1-1000 --- II 5001-6000 | 231\n## [3] II 1-1000 --- II 6001-7000 | 231\n## [4] II 1-1000 --- II 8001-9000 | 231\n## [5] II 1-1000 --- II 9001-10000 | 231\n## ... ... ... ... ... ... . ...\n## [74356] II 807001-808000 --- II 809001-810000 | 1038\n## [74357] II 807001-808000 --- II 810001-811000 | 1038\n## [74358] II 808001-809000 --- II 808001-809000 | 1039\n## [74359] II 808001-809000 --- II 809001-810000 | 1039\n## [74360] II 809001-810000 --- II 809001-810000 | 1040\n## bin_id2 count balanced probability predicted pvalue\n## <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>\n## [1] 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03\n## [2] 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05\n## [3] 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03\n## [4] 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04\n## [5] 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06\n## ... ... ... ... ... ... 
...\n## [74356] 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11\n## [74357] 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02\n## [74358] 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02\n## [74359] 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13\n## [74360] 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03\n## qvalue logFoldChange\n## <numeric> <numeric>\n## [1] 0.063385760 8.08079\n## [2] 0.001926954 7.23674\n## [3] 0.150288341 6.70775\n## [4] 0.009806734 5.97810\n## [5] 0.000173165 6.43158\n## ... ... ...\n## [74356] 1.07966e-09 5.45977\n## [74357] 3.38098e-01 5.39837\n## [74358] 5.49519e-01 4.60031\n## [74359] 5.77259e-12 7.18423\n## [74360] 2.79707e-02 5.15344\n## -------\n## regions: 802 ranges and 4 metadata columns\n## seqinfo: 16 sequences from an unspecified genome"
+ "text": "9.4 GOTHiC\nGOTHiC relies on a cumulative binomial test to detect interactions between distal genomic loci that have significantly more reads than expected by chance in Hi-C experiments (Mifsud et al. (2017)).\n\n\n\n\n\n\nUsing the GOTHiC function\n\n\n\nUnfortunately, the main GOTHiC function requires two .bam files as input. These files are often deleted due to their large size, while the filtered pairs file itself is retained.\nMoreover, the internal nuts and bolts of the main GOTHiC function perform several operations that are not required in modern workflows:\n\n\nFiltering pairs from the same restriction fragment; this step is now usually taken care of automatically, e.g. with the HiCool Hi-C processing package.\n\nFiltering short-range pairs; the GOTHiC package hard-codes a 10kb lower threshold for minimum pair distance. More advanced optimized filtering approaches have been implemented since then, to circumvent the need for such a hard-coded threshold.\n\nBinning pairs; this step is also already taken care of, when working with Hi-C matrices in modern formats, e.g. 
with .(m)cool files.\n\n\n\nBased on these facts, we can simplify the binomial test function provided by GOTHiC so that it can directly use binned interactions imported as a HiCExperiment object in R.\n\nShow the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) {\n\n if (length(trans(x)) != 0) stop(\"Only `cis` interactions can be used here.\")\n ints <- interactions(x) |>\n as.data.frame() |> \n select(seqnames1, start1, seqnames2, start2, count) |>\n dplyr::rename(chr1 = seqnames1, locus1 = start1, chr2 = seqnames2, locus2 = start2, frequencies = count) |>\n mutate(locus1 = locus1 - 1, locus2 = locus2 - 1) |>\n mutate(int1 = paste0(chr1, '_', locus1), int2 = paste0(chr2, '_', locus2))\n \n numberOfReadPairs <- sum(ints$frequencies)\n all_bins <- unique(c(unique(ints$int1), unique(ints$int2)))\n all_bins <- sort(all_bins)\n upperhalfBinNumber <- (length(all_bins)^2 - length(all_bins))/2\n\n cov <- ints |> \n group_by(int1) |> \n tally(frequencies) |> \n full_join(ints |> \n group_by(int2) |> \n tally(frequencies), \n by = c('int1' = 'int2')\n ) |> \n rowwise() |> \n mutate(coverage = sum(n.x, n.y, na.rm = TRUE)) |> \n ungroup() |>\n mutate(relative_coverage = coverage/sum(coverage))\n \n results <- mutate(ints,\n cov1 = left_join(ints, select(cov, int1, relative_coverage), by = c('int1' = 'int1'))$relative_coverage, \n cov2 = left_join(ints, select(cov, int1, relative_coverage), by = c('int2' = 'int1'))$relative_coverage,\n probability = cov1 * cov2 * 2 * 1/(1 - sum(cov$relative_coverage^2)),\n predicted = probability * numberOfReadPairs\n ) |> \n rowwise() |>\n mutate(\n pvalue = binom.test(\n frequencies, \n numberOfReadPairs, \n probability,\n alternative = \"greater\"\n )$p.value\n ) |> \n ungroup() |> \n mutate(\n logFoldChange = log2(frequencies / predicted), \n qvalue = stats::p.adjust(pvalue, method = \"BH\", n = upperhalfBinNumber)\n )\n\n scores(x, \"probability\") <- results$probability\n scores(x, \"predicted\") <- results$predicted\n 
scores(x, \"pvalue\") <- results$pvalue\n scores(x, \"qvalue\") <- results$qvalue\n scores(x, \"logFoldChange\") <- results$logFoldChange\n\n return(x)\n\n} \n\n\n\nres <- GOTHiC_binomial(hic[\"II\"])\nres\n## `HiCExperiment` object with 471,364 contacts over 802 regions \n## -------\n## fileName: \"/github/home/.cache/R/ExperimentHub/1a92248c093f_7752\" \n## focus: \"II\" \n## resolutions(5): 1000 2000 4000 8000 16000\n## active resolution: 1000 \n## interactions: 74360 \n## scores(7): count balanced probability predicted pvalue qvalue logFoldChange \n## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) domain(52) \n## pairsFile: N/A \n## metadata(0):\n\ninteractions(res)\n## GInteractions object with 74360 interactions and 9 metadata columns:\n## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange\n## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>\n## [1] II 1-1000 --- II 1001-2000 | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079\n## [2] II 1-1000 --- II 5001-6000 | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674\n## [3] II 1-1000 --- II 6001-7000 | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775\n## [4] II 1-1000 --- II 8001-9000 | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810\n## [5] II 1-1000 --- II 9001-10000 | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158\n## ... ... ... ... ... ... . ... ... ... ... ... ... ... ... 
...\n## [74356] II 807001-808000 --- II 809001-810000 | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977\n## [74357] II 807001-808000 --- II 810001-811000 | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837\n## [74358] II 808001-809000 --- II 808001-809000 | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031\n## [74359] II 808001-809000 --- II 809001-810000 | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423\n## [74360] II 809001-810000 --- II 809001-810000 | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344\n## -------\n## regions: 802 ranges and 4 metadata columns\n## seqinfo: 16 sequences from an unspecified genome"
},
{
"objectID": "interoperability.html#references",
@@ -326,7 +326,7 @@
"href": "interoperability.html#session-info",
"title": "\n9 Interoperability: using HiCExperiment with other R packages\n",
"section": "Session info",
- "text": "Session info\n\n## ─ Session info ────────────────────────────────────────────────────────────\n## setting value\n## version R version 4.3.1 (2023-06-16)\n## os Ubuntu 22.04.3 LTS\n## system x86_64, linux-gnu\n## ui X11\n## language (EN)\n## collate en_US.UTF-8\n## ctype en_US.UTF-8\n## tz Etc/UTC\n## date 2023-10-19\n## pandoc 3.1.1 @ /usr/local/bin/ (via rmarkdown)\n## \n## ─ Packages ────────────────────────────────────────────────────────────────\n## package * version date (UTC) lib source\n## abind 1.4-5 2016-07-21 [1] CRAN (R 4.3.1)\n## aggregation 1.0.1 2018-01-25 [1] CRAN (R 4.3.1)\n## AnnotationDbi 1.63.2 2023-07-02 [1] Bioconductor\n## AnnotationHub * 3.9.2 2023-08-24 [1] Bioconductor\n## Biobase * 2.61.0 2023-04-25 [1] Bioconductor\n## BiocFileCache * 2.9.1 2023-07-12 [1] Bioconductor\n## BiocGenerics * 0.47.0 2023-04-25 [1] Bioconductor\n## BiocIO 1.11.0 2023-04-25 [1] Bioconductor\n## BiocManager 1.30.22 2023-08-08 [1] CRAN (R 4.3.1)\n## BiocParallel 1.35.4 2023-08-17 [1] Bioconductor\n## BiocVersion 3.18.0 2023-04-25 [1] Bioconductor\n## Biostrings 2.69.2 2023-07-02 [1] Bioconductor\n## bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.1)\n## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1)\n## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1)\n## blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1)\n## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1)\n## calibrate 1.7.7 2020-06-19 [1] CRAN (R 4.3.1)\n## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1)\n## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1)\n## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1)\n## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1)\n## curl 5.1.0 2023-10-02 [1] CRAN (R 4.3.1)\n## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1)\n## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1)\n## dbplyr * 2.3.4 2023-09-26 [1] CRAN (R 4.3.1)\n## DelayedArray 0.27.10 2023-07-28 [1] Bioconductor\n## digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1)\n## dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.1)\n## edgeR 3.99.3 2023-10-16 [1] Bioconductor\n## ellipsis 0.3.2 
2021-04-29 [1] CRAN (R 4.3.1)\n## evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1)\n## ExperimentHub * 2.9.1 2023-07-12 [1] Bioconductor\n## fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1)\n## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1)\n## filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1)\n## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1)\n## GenomeInfoDb * 1.37.6 2023-10-02 [1] Bioconductor\n## GenomeInfoDbData 1.2.11 2023-10-19 [1] Bioconductor\n## GenomicAlignments 1.37.0 2023-04-25 [1] Bioconductor\n## GenomicRanges * 1.53.2 2023-10-08 [1] Bioconductor\n## ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1)\n## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1)\n## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.1)\n## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1)\n## gtools 3.9.4 2022-11-27 [1] CRAN (R 4.3.1)\n## HiCcompare 1.23.1 2023-06-04 [1] Bioconductor\n## HiCExperiment * 1.1.2 2023-09-04 [1] Bioconductor\n## HiContactsData * 1.3.0 2023-04-27 [1] Bioconductor\n## hicrep * 1.12.2 2023-10-19 [1] Github (TaoYang-dev/hicrep@e485dfa)\n## htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1)\n## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1)\n## httpuv 1.6.11 2023-05-11 [1] CRAN (R 4.3.1)\n## httr 1.4.7 2023-08-15 [1] CRAN (R 4.3.1)\n## InteractionSet * 1.29.1 2023-06-14 [1] Bioconductor\n## interactiveDisplayBase 1.39.0 2023-04-25 [1] Bioconductor\n## IRanges * 2.35.3 2023-10-12 [1] Bioconductor\n## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1)\n## KEGGREST 1.41.4 2023-09-25 [1] Bioconductor\n## KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.1)\n## knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1)\n## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1)\n## lattice 0.21-9 2023-10-01 [1] CRAN (R 4.3.1)\n## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.1)\n## limma 3.57.10 2023-10-11 [1] Bioconductor\n## locfit 1.5-9.8 2023-06-11 [1] CRAN (R 4.3.1)\n## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.1)\n## MASS 7.3-60 2023-05-04 [2] CRAN (R 4.3.1)\n## Matrix 1.6-1.1 2023-09-18 [1] CRAN (R 4.3.1)\n## MatrixGenerics * 1.13.1 2023-07-25 [1] 
Bioconductor\n## matrixStats * 1.0.0 2023-06-02 [1] CRAN (R 4.3.1)\n## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1)\n## mgcv 1.9-0 2023-07-11 [1] CRAN (R 4.3.1)\n## mime 0.12 2021-09-28 [1] CRAN (R 4.3.1)\n## multiHiCcompare * 1.19.1 2023-07-02 [1] Bioconductor\n## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.1)\n## nlme 3.1-163 2023-08-09 [1] CRAN (R 4.3.1)\n## pbapply 1.7-2 2023-06-27 [1] CRAN (R 4.3.1)\n## pheatmap 1.0.12 2019-01-04 [1] CRAN (R 4.3.1)\n## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1)\n## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1)\n## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.1)\n## png 0.1-8 2022-11-29 [1] CRAN (R 4.3.1)\n## promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1)\n## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.1)\n## qqman 0.1.9 2023-08-23 [1] CRAN (R 4.3.1)\n## R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.1)\n## rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.3.1)\n## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.1)\n## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1)\n## RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1)\n## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.1)\n## restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.3.1)\n## rhdf5 2.45.1 2023-07-10 [1] Bioconductor\n## rhdf5filters 1.13.5 2023-07-19 [1] Bioconductor\n## Rhdf5lib 1.23.2 2023-09-10 [1] Bioconductor\n## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.1)\n## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1)\n## rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1)\n## Rsamtools 2.17.0 2023-04-25 [1] Bioconductor\n## RSQLite 2.3.1 2023-04-03 [1] CRAN (R 4.3.1)\n## rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1)\n## rtracklayer 1.61.1 2023-08-15 [1] Bioconductor\n## S4Arrays 1.1.6 2023-08-30 [1] Bioconductor\n## S4Vectors * 0.39.3 2023-10-11 [1] Bioconductor\n## scales 1.2.1 2022-08-20 [1] CRAN (R 4.3.1)\n## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.1)\n## shiny 1.7.5.1 2023-10-14 [1] CRAN (R 4.3.1)\n## SparseArray 1.1.12 2023-08-31 [1] Bioconductor\n## statmod 1.5.0 2023-01-06 [1] CRAN (R 4.3.1)\n## strawr 0.0.91 2023-03-29 [1] CRAN (R 
4.3.1)\n## stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.1)\n## stringr 1.5.0 2022-12-02 [1] CRAN (R 4.3.1)\n## SummarizedExperiment * 1.31.1 2023-05-01 [1] Bioconductor\n## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.1)\n## tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.1)\n## tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.1)\n## TopDom * 0.10.1 2021-05-06 [1] CRAN (R 4.3.1)\n## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1)\n## utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.1)\n## vctrs 0.6.4 2023-10-12 [1] CRAN (R 4.3.1)\n## vroom 1.6.4 2023-10-02 [1] CRAN (R 4.3.1)\n## withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1)\n## xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1)\n## XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1)\n## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1)\n## XVector 0.41.1 2023-05-03 [1] Bioconductor\n## yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.1)\n## zlibbioc 1.47.0 2023-04-25 [1] Bioconductor\n## \n## [1] /usr/local/lib/R/site-library\n## [2] /usr/local/lib/R/library\n## \n## ───────────────────────────────────────────────────────────────────────────"
+ "text": "Session info\n\n## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────────────────────────\n## setting value\n## version R version 4.3.1 (2023-06-16)\n## os Ubuntu 22.04.3 LTS\n## system x86_64, linux-gnu\n## ui X11\n## language (EN)\n## collate en_US.UTF-8\n## ctype en_US.UTF-8\n## tz Etc/UTC\n## date 2023-10-30\n## pandoc 3.1.1 @ /usr/local/bin/ (via rmarkdown)\n## \n## ─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────\n## package * version date (UTC) lib source\n## abind 1.4-5 2016-07-21 [1] CRAN (R 4.3.1)\n## aggregation 1.0.1 2018-01-25 [1] CRAN (R 4.3.1)\n## AnnotationDbi 1.64.0 2023-10-24 [1] Bioconductor\n## AnnotationHub * 3.10.0 2023-10-24 [1] Bioconductor\n## Biobase * 2.62.0 2023-10-24 [1] Bioconductor\n## BiocFileCache * 2.10.1 2023-10-26 [1] Bioconductor\n## BiocGenerics * 0.48.0 2023-10-24 [1] Bioconductor\n## BiocIO 1.12.0 2023-10-24 [1] Bioconductor\n## BiocManager 1.30.22 2023-08-08 [1] CRAN (R 4.3.1)\n## BiocParallel 1.36.0 2023-10-24 [1] Bioconductor\n## BiocVersion 3.18.0 2023-04-25 [1] Bioconductor\n## Biostrings 2.70.1 2023-10-25 [1] Bioconductor\n## bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.1)\n## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1)\n## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1)\n## blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1)\n## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1)\n## calibrate 1.7.7 2020-06-19 [1] CRAN (R 4.3.1)\n## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1)\n## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1)\n## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1)\n## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1)\n## curl 5.1.0 2023-10-02 [1] CRAN (R 4.3.1)\n## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1)\n## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1)\n## dbplyr * 2.4.0 2023-10-26 [1] CRAN (R 4.3.1)\n## DelayedArray 0.28.0 2023-10-24 [1] Bioconductor\n## digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1)\n## dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.1)\n## edgeR 4.0.0 2023-10-24 [1] Bioconductor\n## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.1)\n## evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1)\n## ExperimentHub * 2.10.0 2023-10-24 [1] Bioconductor\n## fansi 1.0.5 
2023-10-08 [1] CRAN (R 4.3.1)\n## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1)\n## filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1)\n## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1)\n## GenomeInfoDb * 1.38.0 2023-10-24 [1] Bioconductor\n## GenomeInfoDbData 1.2.11 2023-10-30 [1] Bioconductor\n## GenomicAlignments 1.38.0 2023-10-24 [1] Bioconductor\n## GenomicRanges * 1.54.0 2023-10-24 [1] Bioconductor\n## ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1)\n## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1)\n## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.1)\n## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1)\n## gtools 3.9.4 2022-11-27 [1] CRAN (R 4.3.1)\n## HiCcompare 1.24.0 2023-10-24 [1] Bioconductor\n## HiCExperiment * 1.2.0 2023-10-24 [1] Bioconductor\n## HiContactsData * 1.4.0 2023-10-26 [1] Bioconductor\n## hicrep * 1.12.2 2023-10-30 [1] Github (TaoYang-dev/hicrep@e485dfa)\n## htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1)\n## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1)\n## httpuv 1.6.12 2023-10-23 [1] CRAN (R 4.3.1)\n## httr 1.4.7 2023-08-15 [1] CRAN (R 4.3.1)\n## InteractionSet * 1.30.0 2023-10-24 [1] Bioconductor\n## interactiveDisplayBase 1.40.0 2023-10-24 [1] Bioconductor\n## IRanges * 2.36.0 2023-10-24 [1] Bioconductor\n## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1)\n## KEGGREST 1.42.0 2023-10-24 [1] Bioconductor\n## KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.1)\n## knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1)\n## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1)\n## lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.1)\n## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.1)\n## limma 3.58.0 2023-10-24 [1] Bioconductor\n## locfit 1.5-9.8 2023-06-11 [1] CRAN (R 4.3.1)\n## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.1)\n## MASS 7.3-60 2023-05-04 [2] CRAN (R 4.3.1)\n## Matrix 1.6-1.1 2023-09-18 [1] CRAN (R 4.3.1)\n## MatrixGenerics * 1.14.0 2023-10-24 [1] Bioconductor\n## matrixStats * 1.0.0 2023-06-02 [1] CRAN (R 4.3.1)\n## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1)\n## mgcv 1.9-0 2023-07-11 [1] CRAN 
(R 4.3.1)\n## mime 0.12 2021-09-28 [1] CRAN (R 4.3.1)\n## multiHiCcompare * 1.20.0 2023-10-24 [1] Bioconductor\n## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.1)\n## nlme 3.1-163 2023-08-09 [1] CRAN (R 4.3.1)\n## pbapply 1.7-2 2023-06-27 [1] CRAN (R 4.3.1)\n## pheatmap 1.0.12 2019-01-04 [1] CRAN (R 4.3.1)\n## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1)\n## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1)\n## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.1)\n## png 0.1-8 2022-11-29 [1] CRAN (R 4.3.1)\n## promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1)\n## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.1)\n## qqman 0.1.9 2023-08-23 [1] CRAN (R 4.3.1)\n## R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.1)\n## rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.3.1)\n## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.1)\n## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1)\n## RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1)\n## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.1)\n## restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.3.1)\n## rhdf5 2.46.0 2023-10-24 [1] Bioconductor\n## rhdf5filters 1.14.0 2023-10-24 [1] Bioconductor\n## Rhdf5lib 1.24.0 2023-10-24 [1] Bioconductor\n## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.1)\n## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1)\n## rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1)\n## Rsamtools 2.18.0 2023-10-24 [1] Bioconductor\n## RSQLite 2.3.2 2023-10-28 [1] CRAN (R 4.3.1)\n## rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1)\n## rtracklayer 1.62.0 2023-10-24 [1] Bioconductor\n## S4Arrays 1.2.0 2023-10-24 [1] Bioconductor\n## S4Vectors * 0.40.1 2023-10-26 [1] Bioconductor\n## scales 1.2.1 2022-08-20 [1] CRAN (R 4.3.1)\n## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.1)\n## shiny 1.7.5.1 2023-10-14 [1] CRAN (R 4.3.1)\n## SparseArray 1.2.0 2023-10-24 [1] Bioconductor\n## statmod 1.5.0 2023-01-06 [1] CRAN (R 4.3.1)\n## strawr 0.0.91 2023-03-29 [1] CRAN (R 4.3.1)\n## stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.1)\n## stringr 1.5.0 2022-12-02 [1] CRAN (R 4.3.1)\n## SummarizedExperiment * 1.32.0 2023-10-24 [1] 
Bioconductor\n## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.1)\n## tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.1)\n## tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.1)\n## TopDom * 0.10.1 2021-05-06 [1] CRAN (R 4.3.1)\n## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1)\n## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1)\n## vctrs 0.6.4 2023-10-12 [1] CRAN (R 4.3.1)\n## vroom 1.6.4 2023-10-02 [1] CRAN (R 4.3.1)\n## withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1)\n## xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1)\n## XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1)\n## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1)\n## XVector 0.42.0 2023-10-24 [1] Bioconductor\n## yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.1)\n## zlibbioc 1.48.0 2023-10-24 [1] Bioconductor\n## \n## [1] /usr/local/lib/R/site-library\n## [2] /usr/local/lib/R/library\n## \n## ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────"
},
{
"objectID": "workflow-yeast.html#recovering-data-from-sra",
diff --git a/sitemap.xml b/sitemap.xml
index 36e09c8..6cb5d2e 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,58 +2,58 @@
https://js2264.github.io/OHCA/index.html
- 2023-10-19T10:18:04.176Z
+ 2023-10-30T10:47:54.287Z
https://js2264.github.io/OHCA/preamble.html
- 2023-10-19T10:18:04.184Z
+ 2023-10-30T10:47:54.295Z
https://js2264.github.io/OHCA/principles.html
- 2023-10-19T10:18:04.224Z
+ 2023-10-30T10:47:54.323Z
https://js2264.github.io/OHCA/data-representation.html
- 2023-10-19T10:18:04.396Z
+ 2023-10-30T10:47:54.447Z
https://js2264.github.io/OHCA/parsing.html
- 2023-10-19T10:18:04.460Z
+ 2023-10-30T10:47:54.491Z
https://js2264.github.io/OHCA/visualization.html
- 2023-10-19T10:18:04.488Z
+ 2023-10-30T10:47:54.511Z
https://js2264.github.io/OHCA/matrix-centric.html
- 2023-10-19T10:18:04.564Z
+ 2023-10-30T10:47:54.543Z
https://js2264.github.io/OHCA/interactions-centric.html
- 2023-10-19T10:18:04.624Z
+ 2023-10-30T10:47:54.579Z
https://js2264.github.io/OHCA/topological-features.html
- 2023-10-19T10:18:04.676Z
+ 2023-10-30T10:47:54.619Z
https://js2264.github.io/OHCA/disseminating.html
- 2023-10-19T10:18:04.700Z
+ 2023-10-30T10:47:54.635Z
https://js2264.github.io/OHCA/interoperability.html
- 2023-10-19T10:18:04.760Z
+ 2023-10-30T10:47:54.699Z
https://js2264.github.io/OHCA/workflow-yeast.html
- 2023-10-19T10:18:04.848Z
+ 2023-10-30T10:47:54.791Z
https://js2264.github.io/OHCA/workflow-chicken.html
- 2023-10-19T10:18:04.884Z
+ 2023-10-30T10:47:54.815Z
https://js2264.github.io/OHCA/workflow-centros.html
- 2023-10-19T10:18:04.924Z
+ 2023-10-30T10:47:54.839Z
diff --git a/topological-features.html b/topological-features.html
index 5132e5d..4399a31 100644
--- a/topological-features.html
+++ b/topological-features.html
@@ -392,16 +392,6 @@
## pairsFile: N/A ## metadata(1): eigens
-
-
-
-
-
-
-Note
-
-
-
getCompartments() is an endomorphism: it returns the original object, enriched with two new pieces of information:
To save the eigenvector (as a bigwig file) and the compartments (as a gff file), the export function can be used:
@@ -537,20 +486,7 @@
-
Here, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments.
-
-
-
-
-
-
-Note
-
-
-
-
Only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot.
-
-
+
Here, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments. Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot.
7.2 Topological domains
Topological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.g. roughly ≤ 1 Mb in mammalian genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries.
@@ -587,29 +523,6 @@
## pairsFile: N/A ## metadata(1): insulation
-
-
-
-
-
-
-Note
-
-
-
-
The getDiamondInsulation function can be parallelized over multiple threads by specifying the Bioconductor generic BPPARAM argument.
-
-
-
-
-
-
-
-
-Note
-
-
-
getDiamondInsulation() is an endomorphism: it returns the original object, enriched with two new pieces of information:
df <- readr::read_tsv("...") ## Here put your loops file
loops <- InteractionSet::GInteractions(
    anchor1 = GenomicRanges::GRanges(df$chrom1, IRanges::IRanges(df$start1+1, df$end1)
@@ -886,8 +755,15 @@