This repository contains the source code for the research work described in our EMNLP 2021 paper:

**A Strong Baseline for Query Efficient Attacks in a Black Box Setting**

The attack jointly leverages the attention mechanism and locality-sensitive hashing (LSH) for word ranking. It is implemented in the Textattack framework to ensure consistent comparison with other attack methods.
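For intuition, the sketch below illustrates the two ingredients on toy data. It is a minimal illustration, not the repository's implementation, and every name in it is hypothetical: attention weights give a query-free word ranking, and random-hyperplane LSH buckets similar candidate substitutes so that only one representative per bucket needs to be queried.

```python
import numpy as np

def lsh_signatures(vectors, planes):
    """Random-hyperplane LSH: vectors on the same side of every plane
    share a bit signature, i.e. fall into the same bucket."""
    return [tuple(row) for row in (vectors @ planes.T > 0)]

# Ingredient 1: rank words by attention weight (costs no target-model queries).
words = ["the", "movie", "was", "terrible"]
attention = np.array([0.05, 0.20, 0.05, 0.70])           # toy attention weights
attack_order = [words[i] for i in np.argsort(attention)[::-1]]
print("attack order:", attack_order)

# Ingredient 2: bucket candidate substitutes with LSH and query only one
# representative per bucket instead of every near-duplicate candidate.
rng = np.random.default_rng(0)
candidates = ["awful", "dreadful", "horrible", "great"]
cand_vecs = rng.standard_normal((len(candidates), 50))   # stand-in embeddings
planes = rng.standard_normal((8, 50))
representatives = {}
for word, sig in zip(candidates, lsh_signatures(cand_vecs, planes)):
    representatives.setdefault(sig, word)                # first word per bucket
print("candidates to query:", list(representatives.values()))
```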
- Clone the repository using the `--recursive` flag to set up the Textattack submodule:

  ```bash
  git clone --recursive https://github.com/RishabhMaheshwary/query-attack.git
  ```
- Make sure `git lfs` is installed on your system. If it is not installed, refer to this.
- Run the commands below to download the pre-trained attention models:

  ```bash
  git lfs install
  git lfs pull
  ```
- It is recommended to create a new conda environment in which to install all dependencies:

  ```bash
  cd Textattack
  pip install -e .
  pip install allennlp==2.1.0 allennlp-models==2.1.0
  pip install tensorflow
  pip install numpy==1.18.5
  ```
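  For example, the environment can be created as follows (a minimal sketch; the environment name and Python version are assumptions):

  ```bash
  conda create -n query-attack python=3.7 -y   # name and version are assumptions
  conda activate query-attack
  ```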
- To attack the BERT model trained on IMDB using the WordNet search space, use the following command:

  ```bash
  textattack attack \
      --recipe lsh-with-attention-wordnet \
      --model bert-base-uncased-imdb \
      --num-examples 500 \
      --log-to-csv outputs/ \
      --attention-model attention_models/yelp/han_model_yelp
  ```
Note: The attention model specified should be trained on a dataset different from that of the target model, because in the black-box setting we do not have access to the target model's training data.
- To attack the LSTM model trained on Yelp using the WordNet search space, use the following command:

  ```bash
  textattack attack \
      --recipe lsh-with-attention-wordnet \
      --model lstm-yelp \
      --num-examples 500 \
      --log-to-csv outputs/ \
      --attention-model attention_models/imdb/han_model_imdb
  ```
- To evaluate the BERT model trained on MNLI using the HowNet search space, use the following command:

  ```bash
  textattack attack \
      --recipe lsh-with-attention-hownet \
      --model bert-base-uncased-mnli \
      --num-examples 500 \
      --log-to-csv outputs/ \
      --attention-model mnli
  ```
The tables below show which arguments to pass to the `--model` and `--recipe` flags of the `textattack` command to attack the BERT and LSTM models on the IMDB, Yelp and MNLI datasets across the different search spaces.
| Dataset | BERT (`--model`) | LSTM (`--model`) |
|---------|------------------|------------------|
| IMDB | `bert-base-uncased-imdb` | `lstm-imdb` |
| Yelp | `bert-base-uncased-yelp` | `lstm-yelp` |
| MNLI | `bert-base-uncased-mnli` | – |

| Search space | `--recipe` |
|--------------|------------|
| WordNet | `lsh-with-attention-wordnet` |
| HowNet | `lsh-with-attention-hownet` |
To run the baselines reported in the paper, refer to the main Textattack repository.
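For example, TextFooler, one of the baselines the paper compares against, ships with Textattack as the built-in `textfooler` recipe, so it can be run with flags mirroring the commands above:

```bash
textattack attack \
    --recipe textfooler \
    --model bert-base-uncased-imdb \
    --num-examples 500 \
    --log-to-csv outputs/
```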
To train the attention models from scratch:

- Install the additional dependencies:

  ```bash
  pip install gensim==3.8.3 torch==1.7.1+cu101
  ```
- The datasets used to train the attention models can be found here.
- Unzip the datasets and specify the path of the dataset in the `create_input_files.py` file, as sketched below.
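  The exact variable names inside `create_input_files.py` may differ across checkouts; the following is only a hypothetical sketch of the kind of edit intended:

  ```python
  # Hypothetical variable names -- adapt to what create_input_files.py exposes.
  csv_folder = "/path/to/unzipped/dataset"   # location of the unzipped CSV files
  output_folder = "attention_models/imdb"    # where the preprocessed inputs go
  ```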
- The model can then be trained and evaluated using the commands below:

  ```bash
  python create_input_files.py
  python train.py
  python eval.py
  ```
The implementation for training the attention models is borrowed from here.
- For the NLI task, the attention weights are computed using the pre-trained decomposable attention model from the AllenNLP API.
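  As a hedged sketch of loading that predictor with the pinned `allennlp-models` version (the archive URL and example sentences are assumptions, not taken from this repository):

  ```python
  from allennlp.predictors.predictor import Predictor
  import allennlp_models.pair_classification  # noqa: F401 -- registers the NLI models

  # The archive URL below is an assumption; substitute whichever decomposable
  # attention archive AllenNLP hosts for your allennlp-models version.
  predictor = Predictor.from_path(
      "https://storage.googleapis.com/allennlp-public-models/decomposable-attention-elmo-2020.04.09.tar.gz"
  )
  result = predictor.predict(
      premise="A man is playing a guitar on stage.",
      hypothesis="A man is performing music.",
  )
  print(result["label_probs"])  # entailment / contradiction / neutral probabilities
  ```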