The Dynamic Means algorithm is a k-means-like algorithm for clustering large or temporally evolving datasets. It operates batch-sequentially, i.e. it processes "windows" of data rather than processing an entire dataset at once. This allows it to capture clusters that change over time (via (a) motion, (b) creation, and (c) deletion), or to cluster large datasets by considering small chunks of data at a time. Spectral Dynamic Means is an extension to Dynamic Means that captures cluster motion, creation and deletion in general similarity graphs -- this allows it to capture more general data types and cluster shapes, at the expense of increased computational cost. Kernelized Dynamic Means is another extension which also operates on similarity graphs, but does not require the computationally expensive step of computing eigenvectors that the spectral code performs. Efficient C++ implementations of all algorithms are provided in this repository.
-
Clone this repository:
git clone https://github.com/trevorcampbell/dynamic-means.git
-
Navigate to the directory and run the install script:
sudo ./install
-
In your code, include one or more of the following headers,
#include <dynmeans/dynmeans.hpp> #include <dynmeans/specdynmeans.hpp> #include <dynmeans/kerndynmeans.hpp>
depending on whether you want to use regular Dynamic Means, Spectral Dynamic Means, or Kernelized Dynamic Means. Make sure Gurobi is installed (free for academic use) if you want to use Spectral/Kernel Dynamic Means. If you can't get access to Gurobi, feel free to modify the
SpecDynMeans::getOldNewMatching
function insrc/specdynmeans_impl.hpp
and theKernDynMeans::getMinWtMatching
function insrc/kerndynmeans_impl.hpp
to use a different LP solver. -
Create a DynMeans, KernDynMeans, and/or SpecDynMeans object:
double lambda = .05; double T_Q = 6.8; double K_tau = 1.01; double Q = lambda/T_Q; double tau = (T_Q*(K_tau - 1.0)+1.0)/(T_Q-1.0); DynMeans<Eigen::Vector2d> dynm(lambda, Q, tau); SpecDynMeans<YourAffType> sdynm(lambda, Q, tau); KernDynMeans<YourAffType> kdynm(lambda, Q, tau);
where Eigen::Vector2d is the vector data type that Dynamic Means is going to cluster. Note that other types can be used in place of Eigen::Vector2d, but they must implement vector addition, and scalar multiplcation/division. For Spectral/Kernel Dynamic Means, YourAffType is a wrapper you must write to abstract the computation of node->node and node->cluster affinities. To find out which functions YourAffType must implement, see the example in examples/mainkdm.cpp. See the Dynamic Means paper for a description of the values
lambda, T_Q, K_tau, Q, tau
. -
To cluster the first window of data with Dynamic Means, just call the
DynMeans::cluster
functionvector<Eigen::Vector2d> dataWindow1; ... int nRestarts = 10; vector<Eigen::Vector2d> learnedParams1; vector<int> learnedLabels1; double obj1, tTaken1; dynm.cluster(dataWindow1, nRestarts, learnedLabels1, learnedParams1, obj1, tTaken1);
where
obj1
is the clustering cost output,tTaken1
is the clustering time output,learnedLabels1
is the data labels output, andlearnedParams1
is the cluster parameters output.nRestarts
is the number of random label assignment orders Dynamic Means will try.To cluster the first window of data with Spectral/Kernel Dynamic Means, just call the
SpecDynMeans::cluster
/KernDynMeans::cluster
functionvector<Eigen::Vector2d> dataWindow1; int nRestarts = 10; int nCoarsest = 20; int nClusMax = 5; ... YourAffinityWrapper aff; aff.yourDataUpdateFunction(dataWindow1); vector<int> learnedLabels1, prmlbls1; vector<double> gammas; double obj1, tTaken1; //for spectral dynamic means, call sdynm.cluster sdynm.cluster(affinities, nRestarts, nClusMax, SpecDynMeans::EigenSolverType::REDSVD learnedLabels1, obj1, gammas1, prmlbls1, tTaken1); //for kernel dynamic means, call kdynm.cluster kdynm.cluster(affinities, nRestarts, nCoarsest, learnedLabels1, obj1, gammas1, prmlbls1, tTaken1); //when done, make sure to update the affinities wrapper with the new clustering affinities.yourUpdateFunction(learnedLabels, gammas, prmlbls);
where
obj1
is the clustering cost output,tTaken1
is the clustering time output,learnedLabels1
is the data labels output,gammas1
is the old cluster weights output, andprmlbls1
is the old parameter labels output.nRestarts
is the number of random orthogonal matrix initializations Spectral Dynamic Means will try, andnClusMax
is (intuitively) the maximum number of new clusters expected in each timestep (mathematically, it is the rank approximation to use when doing eigendecompositions).EigenSolverType::REDSVD
tells the algorithm to use an approximate eigendecomposition (adapted from redsvd). To reduce sensitivity to initialization, Kernel Dynamic Means iteratively coarsifies the graph until it hasnCoarsest
nodes, applies a spectral clustering to that coarse graph, and then iteratively refines the graph and applies k-means-like labelling updates. PicknCoarsest
to be a number that is small enough such that computing the eigenvectors of annCoarsest
bynCoarsest
matrix is fast enough for your application. -
To cluster another window of data, just call
DynMeans::cluster
/SpecDynMeans::cluster
/KernDynMeans::Cluster
againvector<Eigen::Vector2d> dataWindow2; ... vector<Eigen::Vector2d> learnedParams2; vector<int> learnedLabels2; double obj2, tTaken2; dynm.cluster(dataWindow2, nRestarts, learnedLabels2, learnedParams2, obj2, tTaken2);
and make sure to use
YourAffinityWrapper::yourUpdateFunction
afterwards to compute the new set of "old parameter nodes" to prepare Spectral/Kernel Dynamic Means for the next clustering step -
Repeat step 6 as many times as required (e.g., split a dataset of 1,000,000 datapoints into chunks of 1,000 and cluster each sequentially)
To run the example, first make sure liblpsolve is installed (required for label accuracy computations):
sudo apt-get install liblpsolve55-dev
make sure liblpsolve55.so (not liblpsolve55.a) is located in your /usr/lib/ folder when the installation is complete, and if it is not, move it to that location. If you compile against the static library (.a), you will get linker errors about undefined references to dlclose, dlopen, colamd_*, etc.
Navigate to the examples
folder to compile and run the example for Dynamic Means:
make config=release DynMeansExample
./DynMeansExample
For Spectral Dynamic Means, run
make config=release SpecDynMeansExample
./SpecDynMeansExample
For Kernel Dynamic Means, run
make config=release KernDynMeansExample
./KernDynMeansExample
If you want to change how the example compiles, a premake Makefile generation script is included.
If you use Dynamic Means/Spectral Dynamic Means/Kernel Dynamic Means for a paper or project, please use the following BibTeX entry for citation:
@inproceedings{Campbell13_NIPS, Author = {Trevor Campbell and Miao Liu and Brian Kulis and Jonathan P.~How and Lawrence Carin}, Title = {Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture}, Year = {2013}, Booktitle = {Advances in Neural Information Processing Systems 26}}