-
Notifications
You must be signed in to change notification settings - Fork 8
/
README.Rmd
94 lines (67 loc) · 3.37 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
---
output:
github_document: default
html_document:
self_contained: no
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
```
# adjclust: Adjacency-constrained clustering
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/adjclust)](https://cran.r-project.org/package=adjclust)
[![R build status](https://github.com/pneuvial/adjclust/workflows/R-CMD-check/badge.svg)](https://github.com/pneuvial/adjclust/actions)
[![Coverage Status](https://img.shields.io/codecov/c/github/pneuvial/adjclust/develop.svg)](https://app.codecov.io/github/pneuvial/adjclust/branch/develop)
`adjclust` is a package that provides methods to perform adjacency-constrained hierarchical agglomerative clustering. Adjacency-constrained hierarchical agglomerative clustering is hierarchical agglomerative clustering (HAC) in which each observation is associated to a position, and the clustering is constrained so as only adjacent clusters are merged. It is useful in bioinformatics (e.g. Genome Wide Association Studies or Hi-C data analysis).
`adjclust` provides three user level functions: `adjClust`, `snpClust` and `hicClust`, which are briefly explained below.
## Installation
You can install adjclust from github with:
```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("pneuvial/adjclust")
```
## `adjClust`
`adjClust` performs adjacency-constrained HAC for standard and sparse, similarity and dissimilarity matrices and `dist` objects. `Matrix::dgCMatrix` and `Matrix::dsCMatrix` are the supported sparse matrix classes. Let's look at a basic example
```{r adjClust}
library("adjclust")
sim <- matrix(c(1.0, 0.5, 0.2, 0.1,
0.5, 1.0, 0.1, 0.2,
0.2, 0.1, 1.0, 0.6,
0.1, 0.2 ,0.6 ,1.0), nrow=4)
h <- 3
fit <- adjClust(sim, "similarity", h)
plot(fit)
```
The result is of class `chac`. It can be plotted as a dendrogram (as shown above). Successive merge and heights of clustering can be obtained by `fit$merge` and `fit$height` respectively.
## `snpClust`
`snpClust` performs adjacency-constrained HAC for specific application of Genome Wide Association Studies (GWAS). A minimal example is given below. See [GWAS Vignette](vignettes/snpClust.Rmd) for details.
```{r snpClust}
library("snpStats")
data("ld.example", package = "snpStats")
geno <- ceph.1mb[, -316] ## drop one SNP leading to one missing LD value
h <- 100
ld.ceph <- ld(geno, stats = "R.squared", depth = h)
image(ld.ceph, lwd = 0)
fit <- snpClust(geno, stats = "R.squared", h = h)
plot(fit)
sel_clust <- select(fit, "bs")
plotSim(as.matrix(ld.ceph), clustering = sel_clust, dendro = fit)
```
## `hicClust`
`hicClust` performs adjacency-constrained HAC for specific application of Hi-C data analysis. A minimal example is given below. See [Hi-C Vignette](vignettes/hicClust.Rmd) for details.
```{r HiTC, message=FALSE}
library("HiTC")
```
```{r hicClust}
load(system.file("extdata", "hic_imr90_40_XX.rda", package = "adjclust"))
binned <- binningC(hic_imr90_40_XX, binsize = 5e5)
mapC(binned)
fitB <- hicClust(binned)
plot(fitB)
plotSim(intdata(binned), dendro = fitB) # default: log scale for colors
```
## Credits
Version 0.4.0 of this package was completed by Shubham Chaturvedi as a part of the [Google Summer of Code 2017](https://summerofcode.withgoogle.com/projects/#4961904920363008) program.