Provably Accurate Federated Clustering with Unlearning Mechanism

This repository implements an efficient method for federated (K-means) clustering and its corresponding unlearning procedure, introduced in our paper:

[ICLR 2023] Machine Unlearning of Federated Clusters

Datasets

The Celltype, Gaussian, Postures, and Covtype datasets can be downloaded from the Google Drive folder provided by the authors of DC-Kmeans. FEMNIST can be downloaded from the Leaf Project. TCGA and TMI may contain sensitive biological data and can be downloaded only after logging into the respective databases (TCGA, TMI). We can provide the data processing pipelines upon reasonable request by email.

We also provide a utility function, generate_data in utils.py, that generates the per-client data for the federated setting; its data_input argument is the raw global feature matrix. Please refer to the function for more details. One example, the Celltype dataset after data generation, is included in this repository.
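To make the idea of distributing a global feature matrix across clients concrete, here is a minimal sketch. It is not the repository's generate_data: the function name split_among_clients, its arguments, and the way the non-iid case is emulated (sorting rows along a random direction) are all assumptions made for illustration only.

```python
import numpy as np


def split_among_clients(data_input, num_clients, split="iid", seed=0):
    """Partition the rows of a global feature matrix into per-client arrays.

    Hypothetical helper for illustration; NOT the generate_data function
    from utils.py.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = data_input.shape

    if split == "iid":
        # Shuffle rows so every client sees a similar mix of points.
        order = rng.permutation(n_samples)
    elif split == "non-iid":
        # Assumption: emulate heterogeneity by sorting rows along a random
        # direction, so each client receives a narrow slice of the data.
        direction = rng.standard_normal(n_features)
        order = np.argsort(data_input @ direction)
    else:
        raise ValueError(f"unknown split: {split!r}")

    # Contiguous chunks of the ordering become the clients' local datasets.
    return [data_input[idx] for idx in np.array_split(order, num_clients)]


if __name__ == "__main__":
    X = np.random.default_rng(1).standard_normal((1000, 8))
    clients = split_among_clients(X, num_clients=100, split="non-iid")
    print(len(clients), clients[0].shape)
```

Every row is assigned to exactly one client, so the union of the client datasets is the original matrix. For the actual splitting logic used in the experiments, consult generate_data itself.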

Usage

Two other methods, DC-Kmeans and K-FED, are also implemented in this repository for comparison.

To run the methods on the example dataset, use the following command:

python mufc_main.py --num_clusters=4 --num_clients=100 --data_path=celltype_processed.pkl --num_removes=10 \
                    --k_prime=4  --split=non-iid  --compare_kfed --compare_dc --client_kpp_only --verbose --update_centralized_loss

or simply run the shell script:

chmod +x run.sh
./run.sh
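Before running the commands above on your own data, it can help to check what a processed pickle file contains. The sketch below makes no assumption about the internal layout of celltype_processed.pkl; the helper name describe_pickle is hypothetical, and it only reports the top-level type plus keys, length, or shape when those are available. Only unpickle files you trust, since pickle.load can execute arbitrary code.

```python
import pickle


def describe_pickle(path):
    """Return a small summary of the top-level object stored in a pickle file.

    Hypothetical inspection helper; it does not assume any particular
    structure for the stored object.
    """
    with open(path, "rb") as f:
        obj = pickle.load(f)  # only use on trusted files

    summary = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        summary["keys"] = list(obj.keys())
    elif isinstance(obj, (list, tuple)):
        summary["length"] = len(obj)
    if hasattr(obj, "shape"):
        # Covers array-like objects such as NumPy arrays.
        summary["shape"] = tuple(obj.shape)
    return summary


if __name__ == "__main__":
    print(describe_pickle("celltype_processed.pkl"))
```

If the file stores NumPy arrays, NumPy must be installed for the load to succeed.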

Contact

Please contact Chao Pan ([email protected]), Jin Sima ([email protected]), or Saurav Prakash ([email protected]) if you have any questions.

Citation

If you find our code or work useful, please consider citing our paper:

@inproceedings{
pan2023machine,
title={Machine Unlearning of Federated Clusters},
author={Chao Pan and Jin Sima and Saurav Prakash and Vishal Rana and Olgica Milenkovic},
booktitle={International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=VzwfoFyYDga}
}
