This repository contains the code to run the experiments presented in the paper: A Fusion-Denoising Attack on InstaHide with Data Augmentation [1].
Running the .py files one by one should be enough to recover the private images from the encrypted images generated by InstaHide [3] with data augmentation.
Files 1-4 are the preprocessing steps: files 1-2 generate the training datasets, and files 3-4 train the attack models (i.e., the Comparative Network and the Fusion-Denoising Network). Files 5-6 contain the attack algorithms, which take two input files: encryption.npy, containing all encrypted images, and label.npy, containing the corresponding labels of the encrypted images. The file formats are consistent with the InstaHide Challenge dataset.
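Before running files 5-6, it can help to confirm that the two input files load and agree in length. The following is a minimal sketch, not part of the repository: the helper name load_attack_inputs is hypothetical, and the only assumption made about the data is that the first axis of each array indexes the encrypted images.

```python
import numpy as np


def load_attack_inputs(encryption_path="encryption.npy", label_path="label.npy"):
    """Load the encrypted images and their labels and check that they line up.

    Hypothetical helper for illustration only. It assumes that the first axis
    of both arrays indexes the encrypted images; no other shape is assumed.
    """
    encrypted = np.load(encryption_path)
    labels = np.load(label_path)

    # Each encrypted image needs exactly one label entry.
    if encrypted.shape[0] != labels.shape[0]:
        raise ValueError(
            f"{encrypted.shape[0]} encrypted images but {labels.shape[0]} labels"
        )
    return encrypted, labels
```

Calling load_attack_inputs() from the directory that holds the two files returns both arrays, and printing their .shape and .dtype attributes is a quick way to compare them against the InstaHide Challenge files.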
We provide some toy datasets in ToyData/ to help run files 1-6 smoothly; the images recovered from the toy datasets are expected to be meaningless. The four datasets used in the paper (CIFAR-10, CIFAR-100, STL10 and CelebFaces) are all public and are not provided in this repository because the data files are relatively large. Interested readers may consider downloading these public datasets and training the attack models from scratch.
Install the Python library requirements in a virtual environment using:
pip install -r requirements.txt
If we forgot something, please email the first author.
If you use our results or this codebase in your research, then please cite this paper:
@inproceedings{luo2022fusion,
title={A Fusion-Denoising Attack on InstaHide with Data Augmentation},
author={Luo, Xinjian and Xiao, Xiaokui and Wu, Yuncheng and Liu, Juncheng and Ooi, Beng Chin},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={36},
number={2},
pages={1899--1907},
publisher={{AAAI} Press},
year={2022}
}
The clustering step on the encrypted images is inspired by [2]. Readers can also refer to the repository of [2] for a different implementation of the clustering algorithm.
[1] A Fusion-Denoising Attack on InstaHide with Data Augmentation, Xinjian Luo, Xiaokui Xiao, Yuncheng Wu, Juncheng Liu, Beng Chin Ooi, AAAI 2022.
[2] Is Private Learning Possible with Instance Encoding?, Nicholas Carlini, Samuel Deng, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Shuang Song, Abhradeep Thakurta, Florian Tramer, SP 2021.
[3] InstaHide: Instance-hiding Schemes for Private Distributed Learning, Yangsibo Huang, Zhao Song, Kai Li, Sanjeev Arora, ICML 2020.