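This is a prototype implementation of Bisecting K-Means Clustering on Spark. Bisecting K-Means combines K-Means with hierarchical clustering: it builds a cluster hierarchy top-down, starting from a single cluster that contains all points and repeatedly splitting clusters in two with K-Means (k = 2).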
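In general terms (the prototype's exact splitting rules are not spelled out here), bisecting K-Means scores a cluster C with centroid μ_C by its sum of squared errors (SSE), and each step replaces one leaf cluster with the two parts found by 2-means:

```latex
\begin{aligned}
\mu_C &= \frac{1}{|C|} \sum_{x \in C} x, \qquad
\mathrm{SSE}(C) = \sum_{x \in C} \lVert x - \mu_C \rVert^2 \\
C &\to (C_1, C_2), \quad C_1 \cup C_2 = C, \quad C_1 \cap C_2 = \varnothing,
\quad \text{where 2-means seeks to minimize } \mathrm{SSE}(C_1) + \mathrm{SSE}(C_2)
\end{aligned}
```

Each split turns one leaf into two, so reaching k clusters from the single root cluster takes k - 1 splits, and the resulting binary tree has 2k - 1 nodes.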
This section describes the Scala API of Bisecting K-Means Clustering.
BisectingKMeans is the class that trains a BisectingKMeansModel.
You can train a model with BisectingKMeans.train.
The class takes the following parameters:
: the number of clusters you want
setMaxIterations: the number of iterations at each step
setSeed: the random seed
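The roles of these parameters can be illustrated with a local, single-machine sketch in plain Scala (no Spark). It is not the prototype's implementation: the object BisectingSketch and its arguments numClusters, maxIterations and seed are hypothetical names that only mirror the parameters listed above. Two further choices are assumptions of this sketch: the leaf with the largest SSE is split next, and each 2-means split is seeded with one random point plus the point farthest from it.

```scala
import scala.util.Random

// Hypothetical local sketch of bisecting k-means; not the prototype's BisectingKMeans class.
object BisectingSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum

  // Coordinate-wise mean of a non-empty set of points.
  def mean(points: Seq[Point]): Point = {
    val dim = points.head.length
    Array.tabulate(dim)(d => points.map(_(d)).sum / points.size)
  }

  // Sum of squared distances from each point to the cluster centroid.
  def sse(points: Seq[Point]): Double = {
    val c = mean(points)
    points.map(p => sqDist(p, c)).sum
  }

  // One bisection step: Lloyd's 2-means, running at most maxIterations assignment rounds.
  def bisect(points: Seq[Point], maxIterations: Int, rng: Random): (Seq[Point], Seq[Point]) = {
    // Assumed seeding: a random point, then the point farthest from it.
    var c0 = points(rng.nextInt(points.size))
    var c1 = points.maxBy(p => sqDist(p, c0))
    var (left, right) = points.partition(p => sqDist(p, c0) <= sqDist(p, c1))
    var iter = 1
    while (iter < maxIterations && left.nonEmpty && right.nonEmpty) {
      // Move each center to the mean of its side, then reassign every point.
      c0 = mean(left)
      c1 = mean(right)
      val (l, r) = points.partition(p => sqDist(p, c0) <= sqDist(p, c1))
      left = l
      right = r
      iter += 1
    }
    (left, right)
  }

  // Split leaves until numClusters leaves exist or no leaf can be split further.
  def bisectingKMeans(data: Seq[Point], numClusters: Int, maxIterations: Int, seed: Long): Seq[Seq[Point]] = {
    val rng = new Random(seed)
    var clusters: Vector[Seq[Point]] = Vector(data)
    var done = false
    while (clusters.size < numClusters && !done) {
      // Only clusters with at least two distinct points can be split.
      val splittable = clusters.indices.filter(i => clusters(i).map(_.toVector).distinct.size > 1)
      if (splittable.isEmpty) done = true
      else {
        // Assumed heuristic: split the leaf with the largest SSE.
        val target = splittable.maxBy(i => sse(clusters(i)))
        val (a, b) = bisect(clusters(target), maxIterations, rng)
        clusters = clusters.patch(target, Seq(a, b), 1)
      }
    }
    clusters
  }
}
```

Run on data shaped like the example below (100 points (l, l, l) with l = i % 5) and numClusters = 5, this sketch yields five clusters of 20 identical points each, because identical points always land on the same side of a split.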
import org.apache.spark.mllib.bisectingkmeans.{BisectingKMeans, BisectingKMeansModel}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
// Prepare the input data
val localData = (1 to 100).map { i =>
  val label = i % 5
  val vector = Vectors.dense(label, label, label)
  (label, vector)
}
val data = sc.parallelize(

// Create an instance of the algorithm
val algo = new BisectingKMeans()

// Train a model
val model =

// Get the trained cluster centers
val centers: Array[Vector] = model.getCenters

// Compute the Within Set Sum of Squared Errors (WSSSE)
val cost: Double = model.WSSSE(data)

// Convert the cluster tree into an adjacency list
val list: Array[(Int, Int, Double)] = model.toAdjacencyList

// Convert the cluster tree into a linkage matrix
val matrix: Array[(Int, Int, Double, Int)] = model.toLinkageMatrix
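To make the WSSSE cost concrete, here is a plain-Scala sketch that is independent of Spark and of the prototype. It assumes each point is charged to its nearest center, as MLlib's KMeansModel.computeCost does; the prototype's model.WSSSE may differ in detail. The object name WssseSketch is hypothetical.

```scala
// Hypothetical plain-Scala WSSSE sketch; not the prototype's model.WSSSE.
object WssseSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Sum over all points of the squared distance to the closest center.
  def wssse(points: Seq[Point], centers: Seq[Point]): Double =
    points.map(p => centers.map(c => sqDist(p, c)).min).sum

  // Same shape as the example input: 100 points (label, label, label), label = i % 5.
  val exampleData: Seq[Point] = (1 to 100).map { i =>
    val label = (i % 5).toDouble
    Array(label, label, label)
  }
}
```

With one center per label, (0, 0, 0) through (4, 4, 4), the sketch returns 0.0. With the single center (2, 2, 2) it returns 600.0: each label appears 20 times and each point contributes 3 * (label - 2)^2, so the total is 20 * 3 * (4 + 1 + 0 + 1 + 4) = 600.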
