Merge pull request #32 from davidavdav/master
Inclusion of Hclust into Clustering.jl
johnmyleswhite committed Jun 29, 2015
2 parents 3c6b280 + 1595732 commit 7e6ae49
Showing 7 changed files with 504 additions and 3 deletions.
56 changes: 56 additions & 0 deletions doc/source/hclust.md
@@ -0,0 +1,56 @@
# HClust

Hierarchical clustering for Julia, similar to R's `hclust()`.

Status
======

The package is currently a work in progress. Clustering involves a lot of bookkeeping, and it is easy to make an error. I've tested the results for medium-sized problems (between 250 and 5000 elements) for the following methods:

| method      | validated at matrix size | time | validated |
|-------------|--------------------------|------|-----------|
| `:single`   | 5000                     | 1.3  | OK        |
| `:complete` | 2500                     | 4.5  | OK        |
| `:average`  | 2500                     | 4.5  | OK        |

Usage
=====

```julia
d = rand(1000,1000)
d += d' ## make sure distance matrix d is symmetric (this is optional)
h = hclust(d, :single)
```

hclust()
------

```julia
hclust(distance::Matrix, method::Symbol)
```

Performs hierarchical clustering on the distance matrix `distance` (which is forced to be symmetric), using one of three methods:
- `:single`: the distance between two clusters is the minimum distance between any pair of members, one from each cluster
- `:average`: the distance between two clusters is the mean distance over all such pairs of members
- `:complete`: the distance between two clusters is the maximum distance between any pair of members, one from each cluster
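
To make the three definitions concrete, here is a minimal sketch that computes the distance between two fixed clusters under each method. The helper `linkage_distance` and the 4×4 matrix are made up for illustration; they are not part of the package, and `hclust()` itself may compute these quantities differently.

```julia
using Statistics  # provides mean() in Julia 1.x

# Hypothetical helper (not part of the package): distance between clusters
# `a` and `b`, given as vectors of point indices into distance matrix `d`.
function linkage_distance(d::AbstractMatrix, a::Vector{Int}, b::Vector{Int}, method::Symbol)
    pairwise = d[a, b]   # distances for every pair (member of a, member of b)
    if method == :single
        return minimum(pairwise)
    elseif method == :complete
        return maximum(pairwise)
    elseif method == :average
        return mean(pairwise)
    else
        throw(ArgumentError("unsupported method: $method"))
    end
end

# Small symmetric distance matrix, invented for this example
d = [0.0 1.0 4.0 5.0;
     1.0 0.0 2.0 6.0;
     4.0 2.0 0.0 3.0;
     5.0 6.0 3.0 0.0]

a, b = [1, 2], [3, 4]
single   = linkage_distance(d, a, b, :single)    # smallest of 4, 5, 2, 6
complete = linkage_distance(d, a, b, :complete)  # largest of 4, 5, 2, 6
average  = linkage_distance(d, a, b, :average)   # mean of 4, 5, 2, 6
```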

The output of `hclust()` is an object of type `Hclust` with the following fields:

- `merge`: the clusters merged, in order. Leaves are indicated by negative numbers.
- `height`: the distance at which each merge takes place.
- `order`: a preferred grouping for drawing a dendrogram. Not implemented, always `[1:n]`.
- `labels`: labels of the clusters. Not implemented, always `[1:n]`.
- `method`: the name of the clustering method.
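
As an illustration of how a `merge` matrix can be read, the sketch below lists the leaves contained in the cluster formed at each step. It assumes R's `hclust()` convention, in which a positive entry `j` refers to the cluster formed at row `j`; the text above only states that leaves are negative, so treat the positive-entry rule as an assumption. The matrix is hand-written, not output from `hclust()`, and `members_per_step` is a hypothetical helper.

```julia
# Hand-written merge matrix for 4 points (not produced by hclust())
merges = [-1 -2;   # step 1: leaves 1 and 2 are merged
          -3 -4;   # step 2: leaves 3 and 4 are merged
           1  2]   # step 3: clusters from steps 1 and 2 are merged (assumed convention)

# Hypothetical helper: leaves contained in the cluster formed at each step
function members_per_step(merges::AbstractMatrix{Int})
    members = Vector{Vector{Int}}()
    for i in 1:size(merges, 1)
        leaves = Int[]
        for node in merges[i, :]
            if node < 0
                push!(leaves, -node)             # negative entry: a single leaf
            else
                append!(leaves, members[node])   # positive entry: an earlier merge
            end
        end
        push!(members, sort(leaves))
    end
    return members
end

steps = members_per_step(merges)
```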

cutree()
--------

```julia
cutree(cl::Hclust; h, k)
```

Cuts the cluster tree either at height `h` or so that `k` clusters remain.

The output is a vector of cluster indices. The `n`th element of this vector is the index of the cluster that the `n`th data point belongs to.
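
The idea behind cutting a tree can be sketched as follows. This is not the package's `cutree()` implementation: `cut_sketch` is a hypothetical function that works on a hand-written merge matrix and height vector, assumes R's convention that a positive merge entry `j` refers to the cluster formed at row `j`, and assumes the heights are non-decreasing. If both `k` and `h` are given, this sketch uses `k`.

```julia
# Hypothetical sketch of cutting a merge tree; not the package's cutree().
function cut_sketch(merges::AbstractMatrix{Int}, heights::Vector{Float64};
                    h=nothing, k=nothing)
    n = size(merges, 1) + 1                  # number of data points
    if k !== nothing
        1 <= k <= n || throw(ArgumentError("k must be between 1 and $n"))
        nsteps = n - k                       # each merge removes one cluster
    elseif h !== nothing
        nsteps = count(x -> x <= h, heights) # merges at or below the cut height
    else
        throw(ArgumentError("specify either h or k"))
    end

    label = collect(1:n)                     # every point starts in its own cluster
    rep = zeros(Int, nsteps)                 # one representative leaf per merge step
    for i in 1:nsteps
        a, b = merges[i, 1], merges[i, 2]
        la = a < 0 ? -a : rep[a]             # a leaf inside the first cluster
        lb = b < 0 ? -b : rep[b]             # a leaf inside the second cluster
        from, to = label[lb], label[la]
        label[label .== from] .= to          # relabel the second cluster
        rep[i] = la
    end

    # Renumber labels as 1, 2, ... in order of first appearance
    ids = unique(label)
    return [findfirst(isequal(l), ids) for l in label]
end

merges  = [-1 -2; -3 -4; 1 2]                # hand-written example, 4 points
heights = [1.0, 3.0, 4.0]

by_k = cut_sketch(merges, heights; k=2)
by_h = cut_sketch(merges, heights; h=2.0)
```

With this example, `k=2` applies the first two merges and leaves points 1 and 2 in one cluster and points 3 and 4 in another, while `h=2.0` applies only the first merge and leaves points 3 and 4 as singletons.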


7 changes: 6 additions & 1 deletion src/Clustering.jl
@@ -42,7 +42,10 @@ module Clustering
silhouettes,

# varinfo
varinfo
varinfo,

# hclust
Hclust, hclust, cutree


## source files
@@ -58,5 +61,7 @@ module Clustering
include("silhouette.jl")
include("varinfo.jl")

include("hclust.jl")

include("deprecate.jl")
end