Results overview
André Pires edited this page Jun 18, 2017 · 6 revisions
Considering only the categories level, the results, ordered by F-measure, were:
- Stanford CoreNLP: 56.10%
- OpenNLP: 53.63%
- SpaCy: 46.81%
- NLTK: 30.97%
Results for categories:
Tool | Precision | Recall | F-measure |
---|---|---|---|
Stanford CoreNLP | 58.84% | 53.60% | 56.10% |
OpenNLP | 55.43% | 51.94% | 53.63% |
SpaCy | 51.21% | 43.10% | 46.81% |
NLTK | 30.58% | 31.38% | 30.97% |
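The page does not define F-measure explicitly. The values above are consistent with the balanced F-measure (F1), the harmonic mean of precision and recall, and the short Python check below recomputes the table's last column from its first two (all values in percent). The function name is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in %) copied from the "Results for categories" table.
results = {
    "Stanford CoreNLP": (58.84, 53.60),
    "OpenNLP": (55.43, 51.94),
    "SpaCy": (51.21, 43.10),
    "NLTK": (30.58, 31.38),
}

for tool, (p, r) in results.items():
    print(f"{tool}: {f_measure(p, r):.2f}%")
```

Running it prints 56.10%, 53.63%, 46.81% and 30.97%, matching the F-measure column.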
F-measure for all levels:
Tool | Categories | Types | Subtypes | Filtered |
---|---|---|---|---|
Stanford CoreNLP | 56.10% | - | - | 61.10% |
OpenNLP | 53.63% | 48.53% | 50.74% | 57.44% |
SpaCy | 46.81% | 44.04% | 37.86% | 49.22% |
NLTK | 30.97% | 28.82% | 21.91% | 32.12% |
Average training time:
Tool | Categories | Types | Subtypes | Filtered | All |
---|---|---|---|---|---|
Stanford CoreNLP | 11m40s | - | - | 5m09s | 11h13m |
OpenNLP | 22s | 52s | 44s | 16s | 1h30m |
SpaCy | 3m17s | 5m19s | 5m20s | 2m55s | 11h14m |
NLTK | 2s + 1m56s + 5m55s | 2s + 5m23s + 5m54s | 2s + 4m25s + 5m52s | 2s + 1m12s + 5m58s | 24h30m |
Notes: the All column gives the total training time across every fold and repetition, for all levels combined. Stanford CoreNLP was only run at the categories and filtered levels. NLTK trained three different algorithms at each level, so each NLTK cell lists three times, and its All value is correspondingly high.
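Each NLTK cell lists one average training time per algorithm, joined by "+". A minimal sketch for combining such a cell into one figure, assuming durations are always written with h/m/s suffixes as in the table (the helper names are hypothetical). The sum is the combined average time of the three algorithms at one level; it is not comparable to the All column, which totals every fold and repetition.

```python
import re

# A valid duration is one or more <number><unit> parts, e.g. "5m09s", "11h13m", "2s".
_FULL = re.compile(r"(?:\d+[hms])+")
_PART = re.compile(r"(\d+)([hms])")
_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(duration: str) -> int:
    """Convert a duration such as '1m56s' or '11h13m' to seconds."""
    if not _FULL.fullmatch(duration):
        raise ValueError(f"unrecognised duration: {duration!r}")
    return sum(int(n) * _SECONDS[unit] for n, unit in _PART.findall(duration))


def cell_total(cell: str) -> int:
    """Sum a cell that lists one time per algorithm, separated by '+'."""
    return sum(to_seconds(part.strip()) for part in cell.split("+"))


total = cell_total("2s + 1m56s + 5m55s")  # NLTK, categories level
print(f"{total // 60}m{total % 60:02d}s")  # → 7m53s
```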
Hyperparameter tuning (F-measure with default settings versus the best configuration found):
Tool | Default F-measure | Best configuration | Best F-measure |
---|---|---|---|
Stanford CoreNLP | 54.14% | tolerance=1e-3 | 54.31% |
OpenNLP | 50.90% | cutoff=4 | 52.38% |
OpenNLP | 50.90% | iterations=170 | 51.52% |
SpaCy | 54.70% | iterations=110 | 46.60% |
NLTK DT | 26.14% | entropy_cutoff=0.08 | 26.63% |
NLTK DT | 26.14% | support_cutoff=16 | 26.18% |
NLTK ME | 1.11% | min_lldelta=0, iterations=100 | 35.24% |
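Each row above reports the setting that gave the highest F-measure for one hyperparameter (or, for NLTK ME, one pair of settings). The page does not describe the search procedure; a minimal sketch of such a sweep, assuming a hypothetical `evaluate` callback that trains a model with the given value and returns its F-measure, could look like this:

```python
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def best_setting(values: Iterable[T], evaluate: Callable[[T], float]) -> Tuple[T, float]:
    """Score every candidate value and return (best value, its score).

    `evaluate` is a hypothetical callback: it should train a model with the
    given hyperparameter value and return the resulting F-measure.
    Ties are broken in favour of the value tried first.
    """
    scored = [(value, evaluate(value)) for value in values]
    if not scored:
        raise ValueError("no candidate values given")
    return max(scored, key=lambda pair: pair[1])


# Toy scoring function standing in for a real train-and-evaluate run.
print(best_setting(range(0, 7), lambda x: 10 - (x - 3) ** 2))  # → (3, 10)
```

When two settings are varied together, as in the NLTK ME row, `values` could be a list of tuples such as `(min_lldelta, iterations)`.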
Repeated holdout
Tool | Precision | Recall | F-measure |
---|---|---|---|
Stanford CoreNLP | 90.26% | 83.31% | 86.64% |
OpenNLP | 87.87% | 78.98% | 83.19% |
SpaCy | 83.29% | 77.52% | 80.30% |
NLTK | 60.50% | 69.53% | 64.70% |
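In repeated holdout, the data is randomly split into a training set and a test set several times, and the scores from the individual splits are typically averaged. The page does not state the split ratio, the number of repetitions, or the unit being split (for example documents or sentences), so the sketch below leaves all three as parameters; the function name is hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def repeated_holdout(data: Sequence[T], repeats: int, test_fraction: float,
                     seed: int = 0) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs, each from an independent random shuffle."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the splits are reproducible
    n_test = round(len(data) * test_fraction)
    for _ in range(repeats):
        shuffled = list(data)
        rng.shuffle(shuffled)
        # The first n_test items form the test set; the rest are for training.
        yield shuffled[n_test:], shuffled[:n_test]
```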
Repeated 10-fold cross-validation
Tool | Precision | Recall | F-measure |
---|---|---|---|
Stanford CoreNLP | 89.80% | 84.10% | 86.86% |
OpenNLP | 88.03% | 79.85% | 83.74% |
SpaCy | 83.95% | 78.76% | 81.27% |
NLTK | 56.03% | 70.32% | 62.37% |
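In repeated k-fold cross-validation, each repetition reshuffles the data and splits it into k folds; every fold serves once as the test set while the remaining k - 1 folds are used for training. A minimal sketch with k = 10 as in the heading above; the number of repetitions and the splitting unit are not stated on the page, so they are parameters here, and the function name is hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def repeated_k_fold(data: Sequence[T], k: int = 10, repeats: int = 1,
                    seed: int = 0) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs; within each repeat every fold is tested once."""
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and len(data)")
    rng = random.Random(seed)  # seeded so the folds are reproducible
    for _ in range(repeats):
        shuffled = list(data)
        rng.shuffle(shuffled)
        # Deal items round-robin so fold sizes differ by at most one.
        folds = [shuffled[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [x for j, fold in enumerate(folds) if j != i for x in fold]
            yield train, test
```

With `repeats=r`, this yields `r * k` train/test pairs, versus `r` pairs for repeated holdout.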