This is code for the differentially private clustering algorithm in the paper "Differentially Private Clustering in High-Dimensional Euclidean Spaces".
The code is written in Matlab. It is tested and Matlab 2017 but should also run on some earlier versions like 2016.
To get started, run demo.m. It will run the code on a simple synthetic dataset and on the MNIST data (the MNIST data will need to be downloaded and preprocessed first).
The main function is in clustering.m.
[ z_centers,clusters,u_centers,c_candidates,L_loss ] = clustering( x_data,n,d,k,epsilon,delta )
Input parameters:
- x_data: d*n data matrix
- n: integer, number of data points
- d: integer, dimension of each data point
- k: integer, number of desired clusters
- epsilon: positive real number, privacy parameter
- delta: real number in (0,1), failure probability. Not used in this implementation.
Output:
- z_centers: d*k matrix, set of k centers in R^d.
- clusters: k*1 cell, each containing a few integers. Set of points in each clusters.
- u_centers: p*k matrix, set of k centers in projected space R^p.
- c_candidates: matrix with p rows, set of candidate centers in R^p generated by the algorithm.
- L_loss: clustering loss by this set of centers.
Global Variables that need to be set before calling the main function:
- range: l_2 radius of the data space.
- side_length: more than 2*l_infinity norm of data space.
Note:
- Outputting z_centers, u_centers, c_candidates will all preserve epsilon differential privacy. However, outputting clusters, L_loss does NOT preserve privacy. They are included only for testing purposes.
- To use our private clustering algorithm, we need to know and set the global variables a priori.
For technical details and full experimental results, see the paper.
@inproceedings{pmlr-v70-balcan17a,
title = {Differentially Private Clustering in High-Dimensional {E}uclidean Spaces},
author = {Maria-Florina Balcan and Travis Dick and Yingyu Liang and Wenlong Mou and Hongyang Zhang},
booktitle = {Proceedings of the 34th International Conference on Machine Learning},
year = {2017}
}