Results overview

André Pires edited this page Jun 18, 2017 · 6 revisions

Results for HAREM

Taking into account only the categories, the results, ordered by F-measure, were:

  • Stanford CoreNLP: 56.10%
  • OpenNLP: 53.63%
  • SpaCy: 46.81%
  • NLTK: 30.97%

Results for categories:

| Tool | Precision | Recall | F-measure |
|---|---|---|---|
| Stanford CoreNLP | 58.84% | 53.60% | 56.10% |
| OpenNLP | 55.43% | 51.94% | 53.63% |
| SpaCy | 51.21% | 43.10% | 46.81% |
| NLTK | 30.58% | 31.38% | 30.97% |
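As a quick sanity check on the table above, the F-measure column can be recomputed from precision and recall. The sketch below assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the page does not state explicitly. The function name `f_measure` is only illustrative. Values recomputed this way can differ from the published ones in the last digit, since the published figures may have been averaged over folds before rounding.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) copied from the HAREM categories table.
harem_categories = {
    "Stanford CoreNLP": (58.84, 53.60),
    "OpenNLP": (55.43, 51.94),
    "SpaCy": (51.21, 43.10),
    "NLTK": (30.58, 31.38),
}

for tool, (precision, recall) in harem_categories.items():
    print(f"{tool}: {f_measure(precision, recall):.2f}%")
```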

F-measure for all levels:

| Tool | Categories | Types | Subtypes | Filtered |
|---|---|---|---|---|
| Stanford CoreNLP | 56.10% | - | - | 61.10% |
| OpenNLP | 53.63% | 48.53% | 50.74% | 57.44% |
| SpaCy | 46.81% | 44.04% | 37.86% | 49.22% |
| NLTK | 30.97% | 28.82% | 21.91% | 32.12% |

Performance

Average training time:

| Tool | Categories | Types | Subtypes | Filtered | All |
|---|---|---|---|---|---|
| Stanford CoreNLP | 11m40s | - | - | 5m09s | 11h13m |
| OpenNLP | 22s | 52s | 44s | 16s | 1h30m |
| SpaCy | 3m17s | 5m19s | 5m20s | 2m55s | 11h14m |
| NLTK | 2s + 1m56s + 5m55s | 2s + 5m23s + 5m54s | 2s + 4m25s + 5m52s | 2s + 1m12s + 5m58s | 24h30m |

Notes: The All column is the total training time over every fold and repeat, combined across all levels. Stanford CoreNLP was only run for the categories and filtered levels. NLTK ran three different algorithms for each level, hence its high value in the All column.
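To compare the NLTK cells with the other tools, the summed durations can be converted to seconds. The sketch below assumes that each `+`-separated term in an NLTK cell is the training time of one of its three algorithms; the page does not say which term belongs to which algorithm. The helper names `to_seconds` and `cell_seconds` are hypothetical.

```python
import re

# Matches durations written like "2s", "1m56s" or "11h13m".
_DURATION = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")


def to_seconds(term):
    """Convert a single duration such as '1m56s' to seconds."""
    match = _DURATION.fullmatch(term.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {term!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def cell_seconds(cell):
    """Sum a table cell such as '2s + 1m56s + 5m55s' into seconds."""
    return sum(to_seconds(term) for term in cell.split("+"))


# NLTK, categories level: total over the three summed terms.
print(cell_seconds("2s + 1m56s + 5m55s"), "seconds")
```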

Hyperparameter study results

| Tool | Default F-measure | Best configuration | Best F-measure |
|---|---|---|---|
| Stanford CoreNLP | 54.14% | tolerance=1e-3 | 54.31% |
| OpenNLP | 50.90% | cutoff=4 | 52.38% |
| OpenNLP | 50.90% | iterations=170 | 51.52% |
| SpaCy | 54.70% | iterations=110 | 46.60% |
| NLTK DT | 26.14% | entropy_cutoff=0.08 | 26.63% |
| NLTK DT | 26.14% | support_cutoff=16 | 26.18% |
| NLTK ME | 1.11% | min_lldelta=0, iterations=100 | 35.24% |

Results for SIGARRA News Corpus

Repeated holdout

| Tool | Precision | Recall | F-measure |
|---|---|---|---|
| Stanford CoreNLP | 90.26% | 83.31% | 86.64% |
| OpenNLP | 87.87% | 78.98% | 83.19% |
| SpaCy | 83.29% | 77.52% | 80.30% |
| NLTK | 60.50% | 69.53% | 64.70% |

Repeated 10-fold cross validation

| Tool | Precision | Recall | F-measure |
|---|---|---|---|
| Stanford CoreNLP | 89.80% | 84.10% | 86.86% |
| OpenNLP | 88.03% | 79.85% | 83.74% |
| SpaCy | 83.95% | 78.76% | 81.27% |
| NLTK | 56.03% | 70.32% | 62.37% |
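For readers unfamiliar with the evaluation scheme, repeated 10-fold cross validation can be sketched as follows. This is a minimal illustration of the general technique, not the code used to produce the results above: the function name `repeated_k_fold`, the number of repeats, and the seed are all assumptions, since the page does not give them.

```python
import random


def repeated_k_fold(n_items, k=10, repeats=3, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) tuples.

    Each repeat reshuffles the item indices and splits them into k folds;
    every fold is used once as the test set, with the rest as training data.
    """
    rng = random.Random(seed)
    for repeat in range(repeats):
        indices = list(range(n_items))
        rng.shuffle(indices)
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [indices[i::k] for i in range(k)]
        for fold_id, test in enumerate(folds):
            train = [idx for j, fold in enumerate(folds) if j != fold_id
                     for idx in fold]
            yield repeat, fold_id, train, test


# Example: 100 documents (a made-up corpus size), 10 folds, 3 repeats.
for repeat, fold_id, train, test in repeated_k_fold(100):
    pass  # train a model on `train`, evaluate it on `test`
```

The per-tool scores would then be averaged over all folds and repeats, which is one reason an averaged F-measure need not equal the harmonic mean of the averaged precision and recall.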