Skip to content

Differentially Private Clustering in High-Dimensional Euclidean Spaces

License

Notifications You must be signed in to change notification settings

mouwenlong/dp-clustering-icml17

Repository files navigation

Differentially Private Clustering in High-Dimensional Euclidean Spaces

This is code for the differentially private clustering algorithm in the paper "Differentially Private Clustering in High-Dimensional Euclidean Spaces".

The code is written in Matlab. It is tested and Matlab 2017 but should also run on some earlier versions like 2016.

Get started

To get started, run demo.m. It will run the code on a simple synthetic dataset and on the MNIST data (the MNIST data will need to be downloaded and preprocessed first).

Usage

The main function is in clustering.m.

[ z_centers,clusters,u_centers,c_candidates,L_loss ] = clustering( x_data,n,d,k,epsilon,delta )

Input parameters:

  • x_data: d*n data matrix
  • n: integer, number of data points
  • d: integer, dimension of each data point
  • k: integer, number of desired clusters
  • epsilon: positive real number, privacy parameter
  • delta: real number in (0,1), failure probability. Not used in this implementation.

Output:

  • z_centers: d*k matrix, set of k centers in R^d.
  • clusters: k*1 cell, each containing a few integers. Set of points in each clusters.
  • u_centers: p*k matrix, set of k centers in projected space R^p.
  • c_candidates: matrix with p rows, set of candidate centers in R^p generated by the algorithm.
  • L_loss: clustering loss by this set of centers.

Global Variables that need to be set before calling the main function:

  • range: l_2 radius of the data space.
  • side_length: more than 2*l_infinity norm of data space.

Note:

  • Outputting z_centers, u_centers, c_candidates will all preserve epsilon differential privacy. However, outputting clusters, L_loss does NOT preserve privacy. They are included only for testing purposes.
  • To use our private clustering algorithm, we need to know and set the global variables a priori.

References

For technical details and full experimental results, see the paper.

@inproceedings{pmlr-v70-balcan17a,
  title = 	 {Differentially Private Clustering in High-Dimensional {E}uclidean Spaces},
  author = 	 {Maria-Florina Balcan and Travis Dick and Yingyu Liang and Wenlong Mou and Hongyang Zhang},
  booktitle = 	 {Proceedings of the 34th International Conference on Machine Learning},
  year = 	 {2017}
}

About

Differentially Private Clustering in High-Dimensional Euclidean Spaces

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages