K=2
normal
CRFClassifier tagged 24878 words in 1380 documents at 7847,95 words per second.
Entity P R F1 TP FP FN
LOC 0,8729 0,2210 0,3527 103 15 363
MISC 0,4615 0,0496 0,0896 6 7 115
ORG 0,7527 0,2208 0,3415 70 23 247
Org 0,0000 0,0000 0,0000 0 0 1
PER 0,7056 0,5743 0,6332 429 179 318
Totals 0,7308 0,3680 0,4895 608 224 1044
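For reference, the P, R and F1 columns in these tables can be recomputed from the TP, FP and FN counts. The sketch below assumes the standard definitions (P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2TP/(2TP+FP+FN)); the function name `prf` is hypothetical and is not part of CRFClassifier. It is checked here against the LOC and Totals rows of the "normal" run.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, F1) from raw counts.

    Assumes the standard definitions; an empty denominator yields 0.0,
    which matches the all-zero "Org" row in the tables.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


# Counts copied from the LOC and Totals rows of the "normal" run.
loc = [round(x, 4) for x in prf(103, 15, 363)]
totals = [round(x, 4) for x in prf(608, 224, 1044)]
print(loc)     # [0.8729, 0.221, 0.3527]
print(totals)  # [0.7308, 0.368, 0.4895]
```

The Totals row is micro-averaged: its TP, FP and FN are the column sums of the per-entity rows (for example 103 + 6 + 70 + 0 + 429 = 608).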
---------------------------------------------------
useWordPairs = true
CRFClassifier tagged 24878 words in 1380 documents at 4439,33 words per second.
Entity P R F1 TP FP FN
LOC 0,8860 0,2167 0,3483 101 13 365
MISC 0,5000 0,0496 0,0902 6 6 115
ORG 0,7647 0,2050 0,3234 65 20 252
Org 0,0000 0,0000 0,0000 0 0 1
PER 0,6995 0,5609 0,6226 419 180 328
Totals 0,7296 0,3577 0,4801 591 219 1061
---------------------------------------------------
normalizeTerms=true
normalizeTimex=true
useNB=true
CRFClassifier tagged 24878 words in 1380 documents at 5244,10 words per second.
Entity P R F1 TP FP FN
LOC 0,8793 0,2189 0,3505 102 14 364
MISC 0,4615 0,0496 0,0896 6 7 115
ORG 0,7640 0,2145 0,3350 68 21 249
Org 0,0000 0,0000 0,0000 0 0 1
PER 0,7044 0,5743 0,6327 429 180 318
Totals 0,7316 0,3662 0,4881 605 222 1047
---------------------------------------------------
useOccurrencePatterns=true
CRFClassifier tagged 24878 words in 1380 documents at 2340,14 words per second.
Entity P R F1 TP FP FN
LOC 0,8824 0,2253 0,3590 105 14 361
MISC 0,4615 0,0496 0,0896 6 7 115
ORG 0,7742 0,2271 0,3512 72 21 245
Org 0,0000 0,0000 0,0000 0 0 1
PER 0,6970 0,5850 0,6361 437 190 310
Totals 0,7277 0,3753 0,4952 620 232 1032
---------------------------------------------------
useOccurrencePatterns=true
normalizeTerms=true
normalizeTimex=true
useNB=true
useWordPairs = true
CRFClassifier tagged 24878 words in 1380 documents at 862,83 words per second.
Entity P R F1 TP FP FN
LOC 0,8803 0,2210 0,3533 103 14 363
MISC 0,5000 0,0496 0,0902 6 6 115
ORG 0,7647 0,2050 0,3234 65 20 252
Org 0,0000 0,0000 0,0000 0 0 1
PER 0,6895 0,5797 0,6298 433 195 314
Totals 0,7209 0,3674 0,4868 607 235 1045
---------------------------------------------------
useTitle=true
CRFClassifier tagged 24878 words in 1380 documents at 5686,40 words per second.
Entity P R F1 TP FP FN
LOC 0,8739 0,2232 0,3556 104 15 362
MISC 0,5000 0,0496 0,0902 6 6 115
ORG 0,7527 0,2208 0,3415 70 23 247
Org 0,0000 0,0000 0,0000 0 0 1
PER 0,7059 0,5783 0,6358 432 180 315
Totals 0,7321 0,3705 0,4920 612 224 1040
---------------------------------------------------
With mergeTags=true
CRFClassifier tagged 24878 words in 1380 documents at 6118,54 words per second.
Entity P R F1 TP FP FN
LOC 0,9161 0,3047 0,4573 142 13 324
MISC 0,3810 0,0661 0,1127 8 13 113
ORG 0,7128 0,2114 0,3260 67 27 250
Org 0,0000 0,0000 0,0000 0 0 1
PER 0,7298 0,6037 0,6608 451 167 296
Totals 0,7523 0,4044 0,5260 668 220 984 (incorrect)
Totals 0,7157 0,3565 0,4760 589 234 1063 (correct)
---------------------------------------------------
useGazettes=true
gazette=./src/firstname.txt
sloppyGazette=true
CRFClassifier tagged 24878 words in 1380 documents at 4858,04 words per second.
Entity P R F1 TP FP FN
LOC 0,8512 0,2210 0,3509 103 18 363
MISC 0,5000 0,0496 0,0902 6 6 115
ORG 0,7528 0,2114 0,3300 67 22 250
Org 0,0000 0,0000 0,0000 0 0 1
PER 0,7072 0,5756 0,6347 430 178 317
Totals 0,7301 0,3668 0,4883 606 224 1046
---------------------------------------------------
useGazettes=true
gazette=./src/firstname.txt
sloppyGazette=true
ProperCase for family name
CRFClassifier tagged 24878 words in 1380 documents at 1152,45 words per second.
Entity P R F1 TP FP FN
MISC 0,5000 0,0661 0,1168 8 8 113
ORG 0,7303 0,2050 0,3202 65 24 252
Org 0,0000 0,0000 0,0000 0 0 1
PER 0,7282 0,6024 0,6593 450 168 297
Totals 0,7452 0,3771 0,5008 623 213 1029
With states of US
CRFClassifier tagged 24878 words in 1380 documents at 4819,45 words per second.
Entity P R F1 TP FP FN
LOC 0,9008 0,2339 0,3714 109 12 357
MISC 0,5000 0,0661 0,1168 8 8 113
ORG 0,7556 0,2145 0,3342 68 22 249
Org 0,0000 0,0000 0,0000 0 0 1
PER 0,7338 0,6051 0,6632 452 164 295
Totals 0,7556 0,3856 0,5106 637 206 1015
With countries
CRFClassifier tagged 24878 words in 1380 documents at 8323,19 words per second.
Entity P R F1 TP FP FN
LOC 0,9097 0,3026 0,4541 141 14 325
MISC 0,5000 0,0661 0,1168 8 8 113
ORG 0,7113 0,2177 0,3333 69 28 248
Org 0,0000 0,0000 0,0000 0 0 1
PER 0,7426 0,6064 0,6676 453 157 294
Totals 0,7642 0,4062 0,5304 671 207 981
Without Miss
CRFClassifier tagged 24878 words in 1380 documents at 4990,57 words per second.
Entity P R F1 TP FP FN
LOC 0,9091 0,3004 0,4516 140 14 326
MISC 0,5333 0,0661 0,1176 8 7 113
ORG 0,7188 0,2177 0,3341 69 27 248
Org 0,0000 0,0000 0,0000 0 0 1
PER 0,7463 0,6064 0,6691 453 154 294
Totals 0,7683 0,4056 0,5309 670 202 982
With cleanGazette + PER + ORG + LOC
CRFClassifier tagged 24878 words in 1380 documents at 4816,65 words per second.
Entity P R F1 TP FP FN
LOC 0,9217 0,3283 0,4842 153 13 313
MISC 0,5000 0,0661 0,1168 8 8 113
ORG 0,7451 0,2397 0,3628 76 26 241
Org 0,0000 0,0000 0,0000 0 0 1
PER 0,7451 0,6064 0,6686 453 155 294
Totals 0,7735 0,4177 0,5425 690 202 962
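
The scores in this log are printed with a decimal comma (for example 0,7735), so they cannot be passed to `float()` directly. A minimal sketch for reading one result row into a record, assuming whitespace-separated columns in the order Entity P R F1 TP FP FN; the helper name `parse_row` is hypothetical.

```python
def parse_row(line):
    """Parse one result row such as 'LOC 0,9217 0,3283 0,4842 153 13 313'.

    Only the first seven whitespace-separated fields are used, so a
    trailing annotation like '(correct)' is ignored.
    """
    entity, p, r, f1, tp, fp, fn = line.split()[:7]

    def to_float(s):
        # The log uses a decimal comma; convert it to a decimal point.
        return float(s.replace(",", "."))

    return {
        "entity": entity,
        "p": to_float(p),
        "r": to_float(r),
        "f1": to_float(f1),
        "tp": int(tp),
        "fp": int(fp),
        "fn": int(fn),
    }


row = parse_row("Totals 0,7735 0,4177 0,5425 690 202 962")
print(row["entity"], row["f1"], row["tp"])
```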