GitHub - yu-iskw/bisecting-kmeans: An implementation of Bisecting KMeans Clustering which is a kind of Hierarchical Clustering algorithm on Spark

Bisecting K-Means Clustering

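This is a prototype implementation of Bisecting K-Means Clustering on Spark. Bisecting K-Means combines K-Means with hierarchical clustering: it starts from a single cluster and repeatedly splits one cluster in two with K-Means, which produces a cluster tree.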
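To make the idea concrete, here is a minimal plain-Scala sketch of the bisecting strategy. It is not this repository's Spark implementation. It assumes that the cluster with the largest sum of squared errors is split next and that each split is an ordinary 2-means. The names `BisectingSketch`, `bisect`, and `bisectingKMeans` are hypothetical, and `Point` is Scala's collection `Vector`, not MLlib's.

```scala
import scala.util.Random

object BisectingSketch {
  // Scala's immutable collection Vector, used here as a plain point type.
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Component-wise mean of a non-empty collection of points.
  def mean(points: Seq[Point]): Point =
    points.transpose.map(col => col.sum / points.size).toVector

  // Sum of squared distances from each point to the cluster mean.
  def sse(points: Seq[Point]): Double = {
    val center = mean(points)
    points.map(p => sqDist(p, center)).sum
  }

  // Split one cluster in two with plain 2-means, seeded by two distinct points.
  def bisect(points: Seq[Point], maxIterations: Int, rng: Random): (Seq[Point], Seq[Point]) = {
    val seeds = rng.shuffle(points.distinct).take(2)
    require(seeds.size == 2, "need at least two distinct points to split")
    // Assign every point to the nearer of the two centers (ties go left).
    def assign(c1: Point, c2: Point): (Seq[Point], Seq[Point]) =
      points.partition(p => sqDist(p, c1) <= sqDist(p, c2))
    (1 to maxIterations).foldLeft(assign(seeds(0), seeds(1))) {
      case ((left, right), _) => assign(mean(left), mean(right))
    }
  }

  // Repeatedly split the splittable cluster with the largest SSE until k clusters exist.
  def bisectingKMeans(data: Seq[Point], k: Int, maxIterations: Int = 20, seed: Long = 1L): List[Seq[Point]] = {
    val rng = new Random(seed)
    @annotation.tailrec
    def loop(clusters: List[Seq[Point]]): List[Seq[Point]] = {
      val splittable = clusters.filter(_.distinct.size >= 2)
      if (clusters.size >= k || splittable.isEmpty) clusters
      else {
        val target = splittable.reduceLeft((a, b) => if (sse(a) >= sse(b)) a else b)
        val (left, right) = bisect(target, maxIterations, rng)
        loop(left :: right :: clusters.filterNot(_ eq target))
      }
    }
    loop(List(data))
  }
}
```

With input like the example in the Scala API section (five distinct points, twenty copies each) and `k = 5`, this sketch ends with five clusters of twenty identical points each, because identical points always land on the same side of a split.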

Scala API

These are the Scala APIs for Bisecting K-Means Clustering. BisectingKMeans is the class that trains a BisectingKMeansModel. You can train a model with the BisectingKMeans.train method, or by configuring an instance and calling run on it as in the example below. The class has a few parameters:

setK: the number of clusters you want
setMaxIterations: the number of iterations at each step
setSeed: random seed

import org.apache.spark.mllib.bisectingkmeans.{BisectingKMeans, BisectingKMeansModel}
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Prepare the input data
val localData = (1 to 100).toSeq.map { i =>
  val label = i % 5
  val vector = Vectors.dense(label, label, label)
  (label, vector)
}
val data = sc.parallelize(localData.map(_._2))

// Create an object for this algorithm
val algo = new BisectingKMeans()
  .setK(5)
  .setMaxIterations(20)
  .setSeed(1)

// Train a model
val model = algo.run(data)

// Get the trained centers
val centers: Array[Vector] = model.getCenters

// Compute the Within Set Sum of Squared Errors (WSSSE)
val cost: Double = model.WSSSE(data)

// Convert the cluster tree into an adjacency list
val list: Array[(Int, Int, Double)] = model.toAdjacencyList

// Convert the cluster tree into a linkage matrix
val matrix: Array[(Int, Int, Double, Int)] = model.toLinkageMatrix
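As a rough illustration of what the WSSSE value measures, the sketch below computes it in plain Scala under the usual definition: for every point, take the squared distance to its nearest center, then sum. Whether `model.WSSSE` follows exactly this definition is an assumption. `WssseSketch` and `wssse` are hypothetical names, and `Point` is Scala's collection `Vector`, not MLlib's.

```scala
object WssseSketch {
  // Scala's immutable collection Vector, used here as a plain point type.
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Sum over all points of the squared distance to the nearest center.
  // Assumes `centers` is non-empty.
  def wssse(points: Seq[Point], centers: Seq[Point]): Double =
    points.map { p =>
      centers.map(c => sqDist(p, c)).reduce((a, b) => math.min(a, b))
    }.sum
}
```

For example, two one-dimensional points at 0 and 2 with a single center at 1 give 1 + 1 = 2. If the centers coincide with every distinct input point, as would be ideal for the data above with five clusters, the value is 0.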

Reference

M. Steinbach, G. Karypis, and V. Kumar. "A comparison of document clustering techniques". Workshop on Text Mining, KDD, 2000.

