# e2e-entity-typing

End-to-end entity typing using PyTorch and BERT.

## Setting up

First, place the BERT model folder into `bert/`. To run the end-to-end model, navigate to the `bert` folder in a terminal and start BERT-as-a-Service with:

```
bert-serving-start -model_dir cased_L-12_H-768_A-12 -num_worker=1 -max_seq_len=100 -pooling_strategy=NONE
```

To run the mention-level model instead, start BERT-as-a-Service with:

```
bert-serving-start -model_dir cased_L-12_H-768_A-12 -num_worker=1 -max_seq_len=10
```
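Once the service is running, you can sanity-check the connection from Python. This is a minimal sketch, assuming the companion `bert-serving-client` package is installed alongside the server:

```python
# Minimal sanity check for the BERT-as-a-Service server started above.
# Assumes the bert-serving-client package is installed.
from bert_serving.client import BertClient

bc = BertClient()  # connects to localhost:5555 by default
vecs = bc.encode(["Barack Obama was born in Hawaii ."])

# With -pooling_strategy=NONE the server returns one vector per token,
# so the result has shape (num_sentences, max_seq_len, 768).
print(vecs.shape)
```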

Then, open `config.json` and modify it to suit your experiments. The main values to change are:

- `"dataset"`: either `"bbn_modified"`, `"figer_50k"`, or `"ontonotes_modified"`
- `"model"`: the name of your model (for your reference)
- `"task"`: either `"end_to_end"` or `"mention_level"`

For the end-to-end model, the embedding model can also be changed:

- `"embedding_model"`: either `"bert"`, `"glove"`, or `"word2vec"`

Please note that if you want to use word2vec or GloVe, you'll need to download the pretrained model files and place them under `word2vec/` and `glove/` respectively. The filenames are glove.6B.300d.txt (from here) for GloVe and enwiki_20180420_300d.txt (from here) for word2vec.
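As a concrete illustration, the options above could be combined into a `config.json` along these lines (a sketch only: the model name is invented, and the real file may contain additional keys not covered in this README):

```python
# Writes an illustrative config.json with the fields discussed above.
# The "model" value is a made-up name, and the repository's actual
# config.json may contain additional keys beyond these four.
import json

config = {
    "dataset": "bbn_modified",     # or "figer_50k" / "ontonotes_modified"
    "model": "e2e_bert_baseline",  # free-form name, for your own reference
    "task": "end_to_end",          # or "mention_level"
    "embedding_model": "bert",     # or "glove" / "word2vec" (end-to-end only)
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```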

(The datasets are named "bbn_modified", "ontonotes_modified" and "figer_50k" because I took small samples from the test sets of the BBN and OntoNotes datasets and transformed them into new datasets. The FIGER dataset is the first 50k of the "Wiki" dataset available in the AFET paper.)

## Running the model

First, build the data loaders using:

```
python build_data.py
```

The code will transform the relevant dataset in the `/data/datasets` directory into a numerical format. Then, to train the model:

```
python train.py
```

The model is evaluated on the dev set during training, and on the test set once training is complete.