This repository contains scripts to download and preprocess the General Language Analysis Datasets (GLAD) benchmark and codes for the task-agnostic SpanRel model, a multi-faceted NLP toolkit that can cover many different tasks.
Generalizing Natural Language Analysis through Span-relation Representations (ACL2020)
# install jsonnet from https://github.com/google/jsonnet
conda create -n spanrel python=3.6
conda activate spanrel
./setup.sh
8 datasets consisting of annotations of 10 tasks are included in this repository.
Dataset | Task | Task code | Dir |
---|---|---|---|
Wet Lab Protocols | NER | wlp | data/wlp |
RE | wlp | data/wlp | |
CoNLL-2003 | NER | ner | data/semeval_2014/ |
SemEval-2010 Task 8 | RE | rc | data/semeval_2010_task8/ |
OntoNotes 5.0 | Coref. | coref | data/conll_coref_2012/ |
SRL | srl | data/conll_srl_2012/ | |
POS | pos_conll | data/conll_pos_2012/ | |
Dep. | dp_conll | data/conll_dep_2012/ | |
Consti. | consti_conll | data/conll_consti_2012/ | |
Penn Treebank | POS | pos | data/ptb_pos/ |
Dep. | dp | data/ptb/ | |
Consti. | consti | data/ptb_consti/ | |
OIE2016 | OpenIE | oie | data/openie/ |
MPQA 3.0 | ORL | orl | data/mpqa/ |
SemEval-2014 Task 4 | ABSA | semeval14_st2 | data/semeval_2014/ |
Follow the instructions in run.sh
in each dataset directory to download and preprocess the datasets into BRAT format.
Run BERT-based models, where $emb
can be bert-base-uncased, bert-large-uncased, and $task
is one of the "task code" shown in the table.
./run_by_config_bert.sh $task $emb $output
Run GloVe/ELMo-based models, where $emb
can be glove
or elmo
, and $task
is one of the "task code" shown in the table.
./run_by_config.sh $task $emb $output
- Put the data in
data/kairos
withtrain
,dev
, andtest
sub-directory, each containing multiple.ann
and.txt
files. - Build vocabulary:
./run_by_config_bert.sh kairos bert-base-uncased output/kairos_vocab
. - Modify the
vocab
variable in thekairos
section ofrun_by_config_bert.sh
tooutput/kairos_vocab/vocabulary
. - Train and evaluate:
./run_by_config_bert.sh kairos bert-base-uncased output/kairos_log
.