This repository contains the code for our paper, "Putting Words in BERT's Mouth: Navigating Contextualized Vector Spaces with Pseudowords" (EMNLP 2021).
The dataset can be found here. It is divided into three portions, as described in our paper.
To get the pseudoword vectors, run get_pseudowords.py on the data (queries) we provide here, or on data in the same format.
Please cite our paper if you find the resources in this repository useful.
@inproceedings{karidi2021putting,
title = "Putting Words in BERT's Mouth: Navigating Contextualized Vector Spaces with Pseudowords",
author = "Taelin Karidi and Yichu Zhou and Nathan Schneider and Omri Abend and Vivek Srikumar",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
month = oct,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/2109.11491",
}