Code for "NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models"
See requirements.txt
Use gradient-descent to identify target anchors.
cd autoprompt
python -m autoprompt.label_search --train data/yelp/create_anchor.tsv --template '[CLS] {sentence} Is it a positive semantic of this sentence? [T] [T] [T] [P]. [SEP]' --label-map '{"0": 0, "1": 1}' --iters 50 --model-name 'bert-base-uncased'
Generate a backdoor PLM on Yelp with bert-base-uncased.
CUDA_VISIBLE_DEVICES=0 python main.py --method prompt_learn \
--usage output/pre-train --trigger cf --pattern_id 4 --data_dir data/yelp \
--model_type bert --task_name yelp-polarity --model_name_or_path bert-base-uncased \
--do_train --train_examples 45000 --with_poison --poison_train_examples 5000 \
--pet_per_gpu_train_batch_size 8 --pet_num_train_epochs 4 --trigger_positions middle --learning_rate 5e-5
Retrain the backdoored PLM on SST-2 for downtream evaluation.
CUDA_VISIBLE_DEVICES=0 python eval.py
The code is modified based on PET(https://github.com/timoschick/pet) and Autoprompt(https://github.com/ucinlp/autoprompt).