This repository contains the source code for "UNICORN: A Unified Backdoor Trigger Inversion Framework" (ICLR 2023).
The figure above visualizes the inverted triggers alongside the ground-truth triggers. This work formally defines and analyzes triggers injected in different spaces, along with the trigger inversion problem. It then proposes a unified framework for inverting backdoor triggers, based on this formalization and on the inner behaviors of backdoored models identified in the analysis.
See requirements.txt for the required packages.
- CIFAR-10 is downloaded automatically.
- For the ImageNet subset, please download the dataset via the link provided in https://github.com/yuezunli/ISSBA, then modify lines 74-75 of dataloader.py to point to the directory where you stored it.
Backdoored models can be generated by using the code in the following links:
- [BadNets] https://github.com/verazuo/badnets-pytorch
- [Blend] https://github.com/THUYimingLi/BackdoorBox/blob/main/core/attacks/Blended.py
- [WaNet] https://github.com/VinAIResearch/Warping-based_Backdoor_Attack-release
- [Filter] https://github.com/trojai
- [SIG] https://github.com/bboylyg/NAD
- [BppAttack] https://github.com/RU-System-Software-and-Security/BppAttack
[TODO] Provide the pretrained backdoored models.
python unicorn.py \
--dataset cifar10 --epoch 1000 --arch resnet18 \
--model_path </path_to_pth_file> \
--data_fraction 0.01 \
--bs 256 \
--all2one_target <target_label> \
--ssim_loss_bound 0.15
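
If the target label of a backdoored model is not known in advance, one option is to repeat the command above once per candidate label. The following is a minimal sketch of that idea, not part of this repository: the helper name `build_unicorn_command` and the checkpoint path are hypothetical, and scanning all 10 CIFAR-10 classes is an assumed workflow. The flags and default values are copied from the command above.

```python
import shlex


def build_unicorn_command(model_path, target_label, dataset="cifar10",
                          arch="resnet18", epoch=1000, data_fraction=0.01,
                          bs=256, ssim_loss_bound=0.15):
    """Return the argument list for one unicorn.py run.

    Hypothetical helper: it only assembles the flags shown in the
    command above and does not inspect unicorn.py itself.
    """
    return [
        "python", "unicorn.py",
        "--dataset", dataset,
        "--epoch", str(epoch),
        "--arch", arch,
        "--model_path", str(model_path),
        "--data_fraction", str(data_fraction),
        "--bs", str(bs),
        "--all2one_target", str(target_label),
        "--ssim_loss_bound", str(ssim_loss_bound),
    ]


if __name__ == "__main__":
    # Hypothetical checkpoint path; replace it with your own .pth file.
    model_path = "checkpoints/backdoored_resnet18.pth"
    # CIFAR-10 has 10 classes; trying each one as the target is an assumption.
    for label in range(10):
        cmd = build_unicorn_command(model_path, label)
        # Print the shell-quoted command so it can be reviewed before running.
        print(shlex.join(cmd))
        # To launch it instead: import subprocess; subprocess.run(cmd, check=True)
```

The script only prints the commands, so they can be checked before any run is started. With `--epoch 1000`, each run may take a while, so launching the labels one at a time is probably the safer default.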