
AFET: Automatic Fine-Grained Entity Typing (EMNLP'16)


AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding

Source code and data for EMNLP'16 paper AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding.

Given a text corpus with entity mentions detected and heuristically labeled by distant supervision, this code trains a rank-based loss over the distantly supervised labels and predicts the fine-grained entity types for each test entity mention. For example, check out AFET's output on WSJ news articles.
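As a rough illustration of the training objective (a sketch in the spirit of partial-label embedding, not the authors' actual implementation; all names below are hypothetical), a margin-based ranking loss pushes the score of every candidate (noisy positive) type above every other type:

```python
# Sketch only: a hinge-style ranking loss over mention/type embeddings.
# Candidate ("positive") types come from distant supervision; the loss
# penalizes any negative type that scores within `margin` of a positive.

def score(mention_vec, type_vec):
    """Dot-product score between a mention and a type embedding."""
    return sum(m * t for m, t in zip(mention_vec, type_vec))

def rank_loss(mention_vec, type_embs, pos_ids, margin=1.0):
    """Hinge loss summed over (positive, negative) type pairs."""
    scores = [score(mention_vec, t) for t in type_embs]
    negatives = [i for i in range(len(type_embs)) if i not in set(pos_ids)]
    return sum(max(0.0, margin - scores[p] + scores[n])
               for p in pos_ids for n in negatives)
```

Minimizing this loss moves each mention embedding toward its candidate types and away from the rest; AFET additionally exploits the type hierarchy and the partial-label structure of the noisy candidate set.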

An end-to-end tool (corpus to typed entities) is under development. Please keep track of our updates.

Performance

Performance of fine-grained entity type classification on the Wiki (Ling & Weld, 2012) dataset.

Method                                     Accuracy  Macro-F1  Micro-F1
HYENA (Yosef et al., 2012)                 0.288     0.528     0.506
FIGER (Ling & Weld, 2012)                  0.474     0.692     0.655
FIGER + All Filter (Gillick et al., 2014)  0.453     0.648     0.582
HNM (Dong et al., 2015)                    0.237     0.409     0.417
WSABIE (Yogatama et al., 2015)             0.480     0.679     0.657
AFET (Ren et al., 2016)                    0.533     0.693     0.664
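For reference, the three metrics follow the standard definitions for multi-label entity typing from Ling & Weld (2012): strict accuracy requires an exact type-set match, macro-F1 averages precision/recall per mention, and micro-F1 aggregates type counts over all mentions. A minimal sketch (the example data is illustrative, not from the paper):

```python
# Sketch of the strict accuracy / macro-F1 / micro-F1 metrics used above.
# gold and pred are parallel lists of type sets, one entry per mention.

def strict_accuracy(gold, pred):
    """Fraction of mentions whose predicted type set matches gold exactly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def macro_f1(gold, pred):
    """Average per-mention precision and recall, then combine into F1."""
    prec = sum(len(g & p) / len(p) for g, p in zip(gold, pred)) / len(gold)
    rec = sum(len(g & p) / len(g) for g, p in zip(gold, pred)) / len(gold)
    return 2 * prec * rec / (prec + rec)

def micro_f1(gold, pred):
    """Aggregate correct/predicted/gold type counts before computing F1."""
    inter = sum(len(g & p) for g, p in zip(gold, pred))
    prec = inter / sum(len(p) for p in pred)
    rec = inter / sum(len(g) for g in gold)
    return 2 * prec * rec / (prec + rec)

# Illustrative two-mention example (hypothetical types):
gold = [{"/person"}, {"/organization", "/organization/company"}]
pred = [{"/person"}, {"/organization"}]
```

On this toy example, strict accuracy is 0.5 (the second mention misses a subtype), while micro-F1 is 0.8.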

System Output

The output on the BBN dataset can be found here. Each line is a sentence from the BBN test data, with entity mentions and their fine-grained entity types identified.

Dependency

  • python 2.7, g++
  • Python library dependencies
$ pip install pexpect unidecode six requests protobuf
$ cd DataProcessor/
$ git clone git@github.com:stanfordnlp/stanza.git
$ cd stanza
$ pip install -e .
$ wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
$ unzip stanford-corenlp-full-2016-10-31.zip
$ rm stanford-corenlp-full-2016-10-31.zip

Data

We pre-processed three public datasets (train/test sets) into our JSON format. We ran Stanford NER on the training sets to detect entity mentions, and performed distant supervision using DBpedia Spotlight to assign type labels:

  • Wiki (Ling & Weld, 2012): 1.5M sentences sampled from 780k Wikipedia articles; 434 news sentences are manually annotated for evaluation. 113 entity types are organized into a 2-level hierarchy. (download JSON)
  • OntoNotes (Weischedel et al., 2011): 13k news articles, 77 of which are manually labeled for evaluation. 89 entity types are organized into a 3-level hierarchy. (download JSON)
  • BBN (Weischedel et al., 2005): 2,311 WSJ articles manually annotated using 93 types in a 2-level hierarchy. (download JSON)
  • Type hierarchies for each dataset are included.
  • Please put the data files in the corresponding subdirectories under AFET/Data/.
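The per-sentence JSON can be consumed roughly as follows. This is a sketch assuming the common FIGER-style JSON-lines layout (one sentence per line, with a "tokens" list and a "mentions" list of {start, end, labels} spans); the actual field names in the AFET files may differ.

```python
# Sketch: iterate over (mention surface string, type labels) pairs from a
# JSON-lines corpus file. Field names ("tokens", "mentions", "start",
# "end", "labels") are assumptions based on the FIGER-style format.
import json

def iter_mentions(path):
    with open(path) as f:
        for line in f:
            sent = json.loads(line)
            for m in sent.get("mentions", []):
                surface = " ".join(sent["tokens"][m["start"]:m["end"]])
                yield surface, m["labels"]
```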

Makefile

$ cd AFET/Model; make

Default Run

Run AFET for fine-grained entity typing on the BBN dataset:

$ java -mx4g -cp "DataProcessor/stanford-corenlp-full-2016-10-31/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer
$ ./run.sh  

Parameters - run.sh

Dataset to run on.

Data="BBN"
  • Concrete parameters for running each dataset can be found in the README in the corresponding data folder under AFET/Data/.

Evaluation

Evaluate prediction results (from the classifier trained on de-noised data) on the test data:

python Evaluation/emb_prediction.py $Data pl_warp bipartite maximum cosine 0.25
python Evaluation/evaluation.py $Data pl_warp bipartite
  • python Evaluation/evaluation.py -DATA(BBN/ontonotes/FIGER) -METHOD(hple/...) -EMB_MODE(hete_feature)
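The prediction step scores candidate types by cosine similarity between mention and type embeddings. The sketch below shows one plausible reading of the script's "maximum cosine 0.25" arguments (an assumption, not taken from the AFET code): keep every type whose cosine score falls within 0.25 of the best-scoring type.

```python
# Sketch only: threshold-relative-to-maximum type prediction by cosine
# similarity. The exact selection rule in emb_prediction.py may differ.
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def predict_types(mention_vec, type_embs, type_names, threshold=0.25):
    """Return all types scoring within `threshold` of the best type."""
    scores = [cosine(mention_vec, t) for t in type_embs]
    best = max(scores)
    return [n for n, s in zip(type_names, scores) if s >= best - threshold]
```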

Publication

Please cite the following paper if you find the code and datasets helpful:

@inproceedings{Ren2016AFETAF,
  title={AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding},
  author={Xiang Ren and Wenqi He and Meng Qu and Lifu Huang and Heng Ji and Jiawei Han},
  booktitle={EMNLP},
  year={2016}
}
