Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding
This repository is for the paper
Morris Alper, Michael Fiman, & Hadar Averbuch-Elor (2023). Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). (arXiv link)
For more information, see our project page at https://isbertblind.github.io.
This repository can be used for the following:
- Apply Stroop Probing to a given sentence with different options
- Compare MLM and Stroop Probing for different types of models on various tasks
Build the stroop-probing package locally by running the following commands:
# clone into this repo
git clone https://github.com/TAU-VAILab/isbertblind.git
# install the Stroop Probing package locally
pip install -e ./isbertblind
An example of how to use Stroop Probing:
from probing import CLIPStroopProbe

COLORS = ['red', 'orange', 'yellow', 'green', 'blue', 'black', 'white', 'grey', 'brown']
SENTENCE = 'A MASK colored banana'

clip_sp = CLIPStroopProbe('openai/clip-vit-base-patch32')
scores = clip_sp.score_from_options(SENTENCE, COLORS, as_dict=True)
# scores = {'red': 0.87503874, 'orange': 0.8977335, 'yellow': 0.94582725, 'green': 0.8791876, 'blue': 0.8688055, 'black': 0.8739696, 'white': 0.8991788, 'grey': 0.880877, 'brown': 0.89924145}
print(f"{SENTENCE.replace('MASK', max(scores, key=scores.get))}")
# A yellow colored banana
The following models are supported:
- CLIP: CLIPStroopProbe
  - uses huggingface checkpoints supporting CLIPModel.from_pretrained()
- FLAVA: FLAVAStroopProbe
  - uses huggingface checkpoints supporting FlavaModel.from_pretrained()
- TEXT: TextStroopProbe
  - uses huggingface checkpoints supporting AutoModel.from_pretrained() which use a pooler output layer
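As a minimal sketch of probing a text-only model (this assumes TextStroopProbe exposes the same score_from_options interface as CLIPStroopProbe above; bert-base-uncased is one checkpoint with the required pooler output layer):

from probing import TextStroopProbe

# assumption: TextStroopProbe shares CLIPStroopProbe's score_from_options interface
bert_sp = TextStroopProbe('bert-base-uncased')  # BERT checkpoint with a pooler layer
scores = bert_sp.score_from_options('A MASK colored banana',
                                    ['red', 'yellow', 'green'], as_dict=True)
print(max(scores, key=scores.get))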
To run the tasks described below, clone this repository and install the required packages:
# clone into this repo
git clone https://github.com/TAU-VAILab/isbertblind.git
# install required packages for using this repo
pip install -r requirements.txt
This repository currently supports two types of tasks: association tasks and cloze tasks.
Association tasks test the association between a given set of objects and a given set of words. For example, they can be used for color or shape association prediction.
Use the config file to define experiment setup parameters, the set of prompts to test, and the list of models to use. An example task config can be found in the ./configs/shapes.json file.
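As a purely hypothetical sketch of what such a config might contain (every key name below is an assumption; consult ./configs/shapes.json for the actual schema):

{
  "_comment": "hypothetical sketch; all key names here are assumptions",
  "output_folder": "outputs/shapes",
  "prompts": ["A MASK shaped banana"],
  "models": ["openai/clip-vit-base-patch32"]
}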
This sort of task requires a CSV file with the following columns: ["word","gt","options"]. For an example dataset for this type of task, see the datasets/shape_association.csv file.
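For illustration only, such a file might look like the sketch below (these rows are made up, and the exact serialization of the options field is an assumption; see datasets/shape_association.csv for the real format):

word,gt,options
egg,oval,"oval,square,cube"
dice,cube,"oval,square,cube"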
The second type of task solves cloze (fill-in-the-blank) problems. For example, this kind of task appears in the Children’s Book Test (CBT) cloze dataset.
Use the config file to define experiment setup parameters, the set of words to use as PAD options, and the list of models to use. An example task config can be found in the ./configs/cbt_v_sample.json file.
This sort of task requires a CSV file with the following columns: ["sentence","gt","options"]. For an example dataset for this type of task, see the datasets/cbt_v_sample.csv file.
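Again for illustration only (this row is made up, and the placeholder convention in the sentence field is an assumption based on the MASK usage shown earlier; see datasets/cbt_v_sample.csv for the real format):

sentence,gt,options
"She ate a ripe MASK for breakfast.",banana,"banana,chair,cloud,river"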
Note that since the Children’s Book Test (CBT) dataset is not ours, we only include a small sample of its examples in this repository.
To run the experiment defined by a config file, run the following command:
python run_on_dataset.py path_to_config.json
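For example, to run the shape association experiment with the example config mentioned above:

python run_on_dataset.py configs/shapes.json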
Per-model and per-prompt outputs, along with a summary of the experiment, will be written to the output folder defined in the config file.
The ShapeIt dataset of shape associations introduced by our paper is available on Kaggle. The other datasets used for the various VLU and NLU tasks in our paper are publicly available and can be accessed at their respective project pages.
If you find this code or our data helpful in your research or work, please cite the following paper:
@InProceedings{alper2023:is-bert-blind,
author = {Morris Alper and Michael Fiman and Hadar Averbuch-Elor},
title = {Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2023}
}