This wiki documents the development process for my master's thesis, entitled "Entity and relation extraction from web content".

First, the HAREM dataset was used to perform named-entity recognition (NER) with the available tools, namely Stanford CoreNLP (Stanford NER), NLTK, OpenNLP, and spaCy.
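
As a rough illustration of how these tools are applied (this is not the thesis code), the sketch below tags one Portuguese sentence with a trained spaCy model; the model path is a placeholder assumption:

```python
# Minimal sketch: running NER over one sentence with a trained spaCy model.
# "./harem_model" is a hypothetical path to a model trained on HAREM.
import spacy

nlp = spacy.load("./harem_model")
doc = nlp("José Saramago nasceu em Azinhaga, em Portugal.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. "José Saramago PESSOA"
```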

All programs were intended to be run on the HAREM dataset at four different levels (a sketch of how labels map to levels follows the list):

  • Categories: use only categories
  • Types: use only types
  • Subtypes: use only subtypes
  • Filtered: use filtered categories
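
A minimal sketch of what label each level could use, assuming a HAREM annotation carries a category, type, and subtype, and that "filtered" keeps a subset of the categories; the field names and the FILTERED set are illustrative assumptions, not the thesis code:

```python
# Illustrative only: pick the evaluation label for a HAREM annotation
# at a given level. FILTERED is an assumed subset of HAREM categories.
FILTERED = {"PESSOA", "LOCAL", "ORGANIZACAO", "TEMPO", "VALOR"}

def label_for_level(category, etype=None, subtype=None, level="categories"):
    if level == "categories":
        return category
    if level == "types":
        return etype
    if level == "subtypes":
        return subtype
    if level == "filtered":
        # Keep the category only if it belongs to the filtered subset.
        return category if category in FILTERED else None
    raise ValueError(f"unknown level: {level}")

print(label_for_level("PESSOA", "INDIVIDUAL", level="types"))  # INDIVIDUAL
```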

Results

Considering only the categories level, the results, ordered by F-measure, were:

  • Stanford CoreNLP: 56.10%
  • OpenNLP: 53.63%
  • spaCy: 46.81%
  • NLTK: 30.97%

Results for categories:

Tool              Precision  Recall  F-measure
Stanford CoreNLP  58.84%     53.60%  56.10%
OpenNLP           55.43%     51.94%  53.63%
spaCy             51.21%     43.10%  46.81%
NLTK              30.58%     31.38%  30.97%
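
The F-measure column is the harmonic mean of precision and recall; a quick check in Python against the Stanford CoreNLP row:

```python
# F-measure (F1) as the harmonic mean of precision and recall,
# checked against the Stanford CoreNLP row above.
def f_measure(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(58.84, 53.60), 2))  # 56.1
```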

F-measure for all levels:

Tool              Categories  Types   Subtypes  Filtered
Stanford CoreNLP  56.10%      -       -         61.10%
OpenNLP           53.63%      48.53%  50.74%    57.44%
spaCy             46.81%      44.04%  37.86%    49.22%
NLTK              30.97%      28.82%  21.91%    32.12%

Performance

Average training time:

Tool              Categories          Types               Subtypes            Filtered            All
Stanford CoreNLP  11m40s              -                   -                   5m09s               11h13m
OpenNLP           22s                 52s                 44s                 16s                 1h30m
spaCy             3m17s               5m19s               5m20s               2m55s               11h14m
NLTK              2s + 1m56s + 5m55s  2s + 5m23s + 5m54s  2s + 4m25s + 5m52s  2s + 1m12s + 5m58s  24h30m

Notes: the All column is the total training time over every fold and repetition, combined across all levels. Stanford CoreNLP ran only at the categories and filtered levels. NLTK ran three different algorithms at each level (the three summed terms in each cell), hence its high All value.
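
Since each NLTK cell sums the training times of its three algorithms, a small helper (hypothetical, not from the thesis) can convert such a cell to total seconds:

```python
# Hypothetical helper: total seconds for a cell such as NLTK's
# "2s + 1m56s + 5m55s", where the terms are per-algorithm training times.
import re

def to_seconds(cell):
    total = 0
    for term in cell.split("+"):
        m = re.fullmatch(r"\s*(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?\s*", term)
        h, mn, s = (int(g) if g else 0 for g in m.groups())
        total += h * 3600 + mn * 60 + s
    return total

print(to_seconds("2s + 1m56s + 5m55s"))  # 473
```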
