chineseEL-datasets

This is the revised NLPCC dataset for Chinese entity linking (re-annotated to Chinese wikipedia dump on 12-01-2017).

The original datasets can be retrieved from http://tcci.ccf.org.cn/conference/2013/pages/page04_tdata.html (NLPCC 2013); http://tcci.ccf.org.cn/conference/2014/pages/page04_tdata.html (NLPCC 2014). After downloading the files, there ought to be the following files:

(NLPCC 2013)

query.txt.withNIL: evaluation answer with annotation to the local knowledge base

query_0525_ALL.txt: evaluation samples without answer

query_0501.txt: training data

KB_BD_nomerge_nolink.xml: local knowledge base

(NLPCC 2014)

EL_annotation.txt: evaluation answer with annotation to the local knowledge base

weiboAutoTag_6.txt: evaluation samples without answer

sample_query_2014.txt: training data

PKBase_zhwiki_1_small.xml: local knowledge base

As is pointed out in our paper, the local knowledge bases used in the datasets are partial and out-dated, thereby might restrain the development of EL techniques. We re-annotated the in-KB parts to the Chinese wikipedia. Note that not all the mentions need to be re-annotated since we identify each entity by its name instead of id or other things. Chances are that the true entity name for some mentions share the same name with the corresponding entities in Chinese wikipedia. Thereby, we merely offers the modification files here, which includes mentions whose true entity name has changed during the re-annotating process.

The modification files may conform to the following form:

We have found some errors in the revised datasets. They will be uploaded as soon as possible.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
2013kb_modify.txt		2013kb_modify.txt
2014kb_modify.txt		2014kb_modify.txt
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

chineseEL-datasets

About

Releases

Packages

DexterZeng/chineseEL-datasets

Folders and files

Latest commit

History

Repository files navigation

chineseEL-datasets

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages