Code for the paper: Unmasking the Mask -- Evaluating Social Biases in Masked Language Models. If you use any part of this work, please cite the following paper:
@InProceedings{Kaneko:AUL:2022,
author={Masahiro Kaneko and Danushka Bollegala},
title={Unmasking the Mask -- Evaluating Social Biases in Masked Language Models},
booktitle = {Proceedings of the 36th AAAI Conference on Artificial Intelligence},
year = {2022},
month = {February},
address = {Vancouver, BC, Canada}
}
You can install all required packages with the following command:
pip install -r requirements.txt
You can download the CrowS-Pairs (CP) and StereoSet (SS) datasets and preprocess them with the following commands:
mkdir -p data
wget -O data/cp.csv https://raw.githubusercontent.com/nyu-mll/crows-pairs/master/data/crows_pairs_anonymized.csv
wget -O data/ss.json https://raw.githubusercontent.com/moinnadeem/StereoSet/master/data/dev.json
python -u preprocess.py --input crows_pairs --output data/paralled_cp.json
python -u preprocess.py --input stereoset --output data/paralled_ss.json
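As a quick sanity check after the steps above, the following sketch reports any expected data file that is missing or empty. The helper name check_data_files is hypothetical and not part of this repository; the file names are taken from the commands above.

```shell
# Minimal sketch (not part of the repository): verify that the downloaded
# and preprocessed files exist and are non-empty.
# check_data_files is a hypothetical helper; the file list mirrors the
# paths used in the download and preprocessing commands.
check_data_files() {
    dir="${1:-data}"
    status=0
    for name in cp.csv ss.json paralled_cp.json paralled_ss.json; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$dir/$name" ]; then
            echo "missing or empty: $dir/$name"
            status=1
        fi
    done
    return "$status"
}
# Usage: check_data_files data && echo "all data files present"
```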
You can evaluate MLMs (BERT, RoBERTa and ALBERT) with AULA, AUL, the CP score (CPS) and the intrasentence SS score (SSS) on the CP and SS datasets with the following command. Each bracketed list shows the accepted values for that option; pick one. You can also specify the path to a pre-trained MLM using --model.
python evaluate.py --data [cp, ss] --output /Your/output/path --model [bert, roberta, albert] --method [aula, aul, cps, sss]
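To run every combination of dataset, model and method, a dry-run loop along the following lines may help. This is a sketch under assumptions: the function name print_eval_commands, the results directory and the output file naming are hypothetical, and it assumes that --output accepts a file path and that every method is valid for both datasets.

```shell
# Minimal sketch (not part of the repository): print one evaluate.py command
# per (data, model, method) combination without running anything.
# ASSUMPTIONS: "results" and the <data>_<model>_<method> naming are
# hypothetical, and --output is assumed to take a file path.
print_eval_commands() {
    out_dir="${1:-results}"
    for data in cp ss; do
        for model in bert roberta albert; do
            for method in aula aul cps sss; do
                echo "python evaluate.py --data $data --output $out_dir/${data}_${model}_${method} --model $model --method $method"
            done
        done
    done
}
# Usage: inspect the commands first, then execute them:
#   print_eval_commands results
#   mkdir -p results && print_eval_commands results | sh
```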
See the LICENSE file for license terms.