This repository implements an efficient method for federated (K-means) clustering and its corresponding unlearning procedure, introduced in our paper:
- [ICLR 2023] Machine Unlearning of Federated Clusters
The Celltype, Gaussian, Postures, and Covtype datasets can be downloaded from the Google Drive folder provided by the authors of DC-Kmeans. FEMNIST can be downloaded from the Leaf Project. TCGA and TMI may contain sensitive biological data and can be downloaded after logging into the respective databases (TCGA, TMI). We can provide the data processing pipelines upon reasonable request by email.
We also provide a utility function, generate_data, in utils.py to generate the per-client data for the federated setting, where data_input is the raw global feature matrix. Please refer to the function for more details. One example, the Celltype dataset after data generation, is included in this repository.
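To give a rough idea of what distributing a global feature matrix across clients can look like, here is a minimal sketch. It is not the code in utils.py: the helper split_rows, its parameters, and the label-sorted sharding used for the non-iid case are all assumptions made for illustration, and generate_data may work differently. The sketch uses only the Python standard library.

```python
import random


def split_rows(labels, num_clients, split="iid", seed=0):
    """Return one list of row indices per client.

    Hypothetical helper for illustration only; it is not part of this
    repository and does not mirror the signature of generate_data.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)

    if split == "iid":
        # Deal the shuffled rows out round-robin, so every client
        # receives a roughly uniform sample of the global data.
        return [indices[c::num_clients] for c in range(num_clients)]

    if split == "non-iid":
        # Assumed scheme: sort rows by label (the sort is stable, so the
        # shuffled order is kept within each label), then cut the sorted
        # list into contiguous chunks. Each client then sees few labels.
        indices.sort(key=lambda i: labels[i])
        base, extra = divmod(len(indices), num_clients)
        parts, start = [], 0
        for c in range(num_clients):
            size = base + (1 if c < extra else 0)
            parts.append(indices[start:start + size])
            start += size
        return parts

    raise ValueError(f"unknown split: {split!r}")


# Toy stand-in for the raw global feature matrix: 40 rows, 4 labels.
data_input = [[float(i), float(i % 4)] for i in range(40)]
labels = [i % 4 for i in range(40)]

client_rows = split_rows(labels, num_clients=4, split="non-iid")
client_data = [[data_input[i] for i in rows] for rows in client_rows]
```

In this toy setup each of the 4 clients receives 10 rows. The real datasets have no such guarantee of equal label counts, so a chunk may straddle two or more labels.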
Two other methods, DC-Kmeans and K-FED, are also implemented in this repository for comparison.
To run the methods on the example dataset, use the following command:
python mufc_main.py --num_clusters=4 --num_clients=100 --data_path=celltype_processed.pkl --num_removes=10 \
--k_prime=4 --split=non-iid --compare_kfed --compare_dc --client_kpp_only --verbose --update_centralized_loss
Alternatively, run the shell script:
chmod +x run.sh
./run.sh
Please contact Chao Pan ([email protected]), Jin Sima ([email protected]), or Saurav Prakash ([email protected]) if you have any questions.
If you find our code or work useful, please consider citing our paper:
@inproceedings{
pan2023machine,
title={Machine Unlearning of Federated Clusters},
author={Chao Pan and Jin Sima and Saurav Prakash and Vishal Rana and Olgica Milenkovic},
booktitle={International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=VzwfoFyYDga}
}