The unprecedented success of deep neural networks in many applications has made these networks a prime target for adversarial exploitation. In this paper, we introduce a benchmark technique for detecting backdoor attacks (aka Trojan attacks) on deep convolutional neural networks (CNNs). We introduce the concept of Universal Litmus Patterns (ULPs), which enable one to reveal backdoor attacks by feeding these universal patterns to the network and analyzing the output (i.e., classifying the network as ‘clean’ or ‘corrupted’). This detection is fast because it requires only a few forward passes through a CNN. We demonstrate the effectiveness of ULPs for detecting backdoor attacks on thousands of networks with different architectures trained on four benchmark datasets, namely the German Traffic Sign Recognition Benchmark (GTSRB), MNIST, CIFAR10, and Tiny-ImageNet.
https://arxiv.org/abs/1906.10842
The code was tested using pytorch 1.4.0, python 3.7.
To generate the poisoned data to be used in the experiments run
python generate_poison.py
This script adds the triggers from ./Data/Masks to the images to generate poisoned data. Please generate one set of images for each poisoned model you want to train. We use poisoned models for 10 triggers for training ULPs and poisoned models for the other 10 to test. This ensures that the train and test poisoned models use a different set of triggers.
We have made the poisoned data generated for our paper available along with the models.
We use a modified VGG architecture for our experiments. To train clean models use
python train_clean_model.py <partition-num> <logfile>
To train poisoned models use
python train_poisoned_model.py <partition-num> <logfile>
For training ULPs: Train 500 clean models and 500 poisoned models. For evaluating ULPs: Train 100 clean models and 100 poisoned models.
Currently each partition trains 100 models. Modify this according to your needs if you have multiple GPUs to train in parallel.
To save time, you can also use our trained models available here:
- extract clean_models_trainval.zip and save in ./clean_models/trainval
- extract poisoned_models_trainval.zip and save in ./poisoned_models/trainval
- extract clean_models_test.zip and save in ./clean_models/test
- extract poisoned_models_test.zip and save in ./poisoned_models/test
Once the models are generated, run
python train_ULP.py <num_ULPs> <logfile>
Provide appropriate number of ULPs. We run experiments for 1, 5 and 10 patterns. This will save the results, i.e ULPs and our classifier in ./results
To evaluate ULPs run
python evaluate_ULP.py
To evaluate Noise patterns run
python evaluate_noise.py
python plot_ROC_curves.py
Download data from the Tiny ImageNet Visual Recognition Challenge Please replace all occurrences of with the appropriate path.
The organization of Tiny ImageNet differs from standard ImageNet. This scripts cleans the data.
python data_cleaning.py
To generate the poisoned data to be used in the experiments run
python convert_data.py
python generate_poison.py
The first script converts the images into a numpy array and stores them in ./data for faster generation of poisons. The second script adds the triggers from ./triggers to the images to generate poisoned data. Please generate one set of images for each poisoned model you want to train. We use poisoned models for Triggers 01-10 for training ULPs and poisoned models for Triggers 11-20 to test. This ensures that the train and test poisoned models use a different set of triggers.
We have made the poisoned data generated for our paper available along with the models.
We use a modified Resnet architecture for our experiments. To train clean models use
python train_clean_model.py <partition-num> <logfile>
To train poisoned models use
python train_poisoned_model.py <partition-num> <logfile>
For training ULPs: Train 1000 clean models and 1000 poisoned models on triggers 01-10. For testing ULPs: Train 100 clean models and 100 poisoned models on triggers 11-20.
Currently each partition trains 50 models. Modify this according to your needs if you have multiple GPUs to train in parallel.
To save time, you can also use our trained models available here:
- extract Clean models train and save in ./clean_models/train
- extract Poisoned models train and save in ./poisoned_models/Triggers_01_10
- extract Clean models val and save in ./clean_models/val
- extract Poisoned models val and save in ./poisoned_models/Triggers_11_20
Once the models are generated, run
python train_ULP.py <num_ULPs> <logfile>
Provide appropriate number of ULPs. We run experiments for 1, 5 and 10 patterns. This will save the results, i.e ULPs and our classifier in ./results
To evaluate ULPs run
python evaluate_ULP.py
To evaluate Noise patterns run
python evaluate_noise.py
python plot_ROC_curves.py
If you find our paper, code or models useful, please cite us using
@inproceedings{kolouri2020universal,
title={Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs},
author={Kolouri, Soheil and Saha, Aniruddha and Pirsiavash, Hamed and Hoffmann, Heiko},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={301--310},
year={2020}
}
This work was performed under the following financial assistance award: 60NANB18D279 from U.S. Department of Commerce, National Institute of Standards and Technology, funding from SAP SE, and also NSF grant 1845216.
Please create an issue on the Github Repo directly or contact [email protected] for any questions about the code.