This is the repository for the ACL 2022 paper "Probing as Quantifying Inductive Bias".
The following instructions are enough to set up the code and to download and process all the data.

- (Optional) Create the environment with `conda env create -f environment.yml` and activate it with `conda activate bayesian-probing`.
- Run `pip install -r requirements.txt`.
- Run `make install`.
- Install the appropriate `pytorch_scatter` build for your CUDA version.
- Run `python -m spacy download en_core_web_sm` to set up spaCy.
- Download and process all the required data with `make data`. If you are running OS X, you might be asked to install some additional tools.

If you encounter any problems in the final step, verify that you have activated the environment and installed all dependencies.
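Putting the steps together, a typical first-time setup might look like the following. The `torch-scatter` wheel index below is only an example (torch 1.10 + CUDA 11.3); substitute the one matching your PyTorch and CUDA versions.

```sh
# Example end-to-end setup; adjust the torch-scatter wheel URL to your CUDA version.
conda env create -f environment.yml
conda activate bayesian-probing
pip install -r requirements.txt
make install
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.10.0+cu113.html
python -m spacy download en_core_web_sm
make data
```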
The data for this paper comes from a few different sources, but the Makefile should pull everything together for you if you have set everything up correctly. If it does not, please open an issue.
Below are some details on how the data for each sub-experiment is generated.
For token-level tasks, we use the procedure from Intrinsic Probing, except that we run it on the UD 2.5 treebanks (as in Pareto Probing). You can obtain this data with `make data_token`.
For the arc tasks, we use the Pareto probing data as-is. You can download everything and process it with `make data_arc`.
For the sentence-level tasks, we use the MultiNLI dataset and some SuperGLUE tasks. In general, we obtain representations for every token and then average them to obtain a sentence-level representation. That said, there are some task-specific variations. You can download and prepare this data with `make data_sentence`.
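For intuition, here is a minimal sketch of that pooling step, assuming the Hugging Face `transformers` API. The model name and the lack of padding-mask handling are simplifications for illustration, not this repository's exact pipeline:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative only: the repo's pipeline differs in details (padding masks,
# task-specific tweaks); this just shows mean pooling over token vectors.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    token_reps = model(**inputs).last_hidden_state  # (1, seq_len, hidden)

# Average the token representations to get one sentence-level vector.
sentence_rep = token_reps.mean(dim=1)  # (1, hidden)
```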
You should be able to replicate our experiments with the following command:
```sh
python -u run.py --task-type ${TASK} --language ${LANGUAGE} --attribute ${PROP} --contextualizer ${REP} --seed 2 --gpu \
    --trainer-num-epochs 500 --step_size 1e-2 --trainer-batch-size 512 marglik --depths 0 1 2 --widths 100 --posterior-structure ${STRUCT}
```
where:
- `TASK`: This is one of `token` (for all the morphosyntactic token-level tasks), `arc` (for the arc-level dependency task), `sentence` (the NLI task), `boolq`, `cb`, `copa`, or `rte`.
- `PROP`: This depends on the setting of `TASK`. In short:
  - If `TASK` is `token`, then this can be one of `Case`, `Number`, `Tense`, or `POS`.
  - If `TASK` is `arc`, then this must be set to `dep`.
  - If `TASK` is `sentence`, then this must be set to `nli`.
  - If `TASK` is anything else, then this must be set to the same value as `TASK`.
- `LANGUAGE`: This is the language to probe. Again, this is task-specific, since not all tasks are available for all languages:
  - If `TASK` is set to `token` or `arc`, then this can be one of `eng`, `tur`, `ara`, `mar`, `zho`, or `deu`.
  - If `TASK` is set to anything else, then this must be set to `eng`.
- `REP`: This is the representation whose inductive bias is being measured. This is also task-specific:
  - If `TASK` is set to `token` or `arc`, then this can be one of `random_context` (fully random), `random_word` (per-word random), `fasttext` (language-specific fastText), or `bert` (multilingual BERT).
  - If `TASK` is set to anything else, then this can be one of `random_context` (fully random), `fasttext`, `bert` (English BERT), `roberta`, `xlnet`, `albert`, or `t5`.
- `STRUCT`: This controls how the Laplace approximation is built, and must be set to either `kron` or `diag` (see the sketch below for what these structures mean).
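The `kron`/`diag` distinction also exists in the standalone `laplace-torch` library, which implements the same Kronecker-factored and diagonal posterior structures. Purely for intuition, here is a minimal, self-contained sketch with a toy probe; the model, data, and exact calls are illustrative assumptions, not this repository's internals:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from laplace import Laplace  # pip install laplace-torch

# Toy stand-in for a probe: random features and binary labels.
X, y = torch.randn(256, 16), torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(X, y), batch_size=64)
probe = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

# 'kron' keeps a Kronecker-factored covariance per layer; 'diag' keeps only
# the diagonal. Both yield a (log) marginal-likelihood estimate.
for structure in ("kron", "diag"):
    la = Laplace(probe, "classification",
                 subset_of_weights="all", hessian_structure=structure)
    la.fit(loader)
    print(structure, la.log_marginal_likelihood().item())
```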
Some other handy flags are `--quiet` (to suppress most logging) and `--output-file` (to specify where experiment data should be saved).
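For example, to probe `Number` in English with multilingual BERT and a Kronecker-factored posterior (one valid combination of the values listed above):

```sh
python -u run.py --task-type token --language eng --attribute Number --contextualizer bert --seed 2 --gpu \
    --trainer-num-epochs 500 --step_size 1e-2 --trainer-batch-size 512 marglik --depths 0 1 2 --widths 100 --posterior-structure kron
```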
If you use this code, please cite the paper:

```bibtex
@inproceedings{immer-etal-2022-probing,
    title = "Probing as Quantifying Inductive Bias",
    author = "Immer, Alexander  and
      Torroba Hennigen, Lucas  and
      Fortuin, Vincent  and
      Cotterell, Ryan",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.129",
    pages = "1839--1851",
}
```