FINAL STATS
#column 1
map = word=0,answer=1,lemma=2,tag=3
#these are the features we'd like to train with
#some are discussed below, the rest can be
#understood by looking at NERFeatureFactory
useClassFeature=true
useWord=true
useNGrams=true
#no ngrams will be included that do not contain either the
#beginning or end of the word
noMidNGrams=true
useDisjunctive=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
#the next 4 deal with word shape features
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
#memory parameters
cacheNGrams=true
maxLeft=1
#qnSize=10
saveFeatureIndexToDisk=true
#useObservedSequencesOnly=true
#improving quality
useLemmas=true
usePrevNextLemmas=true
useTags=true
useGazettes=true
gazette=./src/firstname.txt ./src/companies.txt ./src/location.txt ./src/misc.txt
cleanGazette=true
for K = 2:
CRFClassifier tagged 24880 words in 1381 documents at 7846.11 words per second.
Entity P R F1 TP FP FN
LOC 0.8828 0.4528 0.5986 211 28 255
MISC 0.5294 0.0744 0.1304 9 8 112
ORG 0.6992 0.2925 0.4124 93 40 225
PER 0.7125 0.7001 0.7063 523 211 224
Totals 0.7444 0.5061 0.6025 836 287 816
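The P, R and F1 columns can be recomputed from the TP, FP and FN counts in the same row. A minimal sketch, assuming the standard entity-level definitions (the function name `prf` is made up for illustration):

```python
def prf(tp, fp, fn):
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Totals row of the baseline run above: TP=836, FP=287, FN=816
p, r, f1 = prf(836, 287, 816)
print(f"{p:.4f} {r:.4f} {f1:.4f}")  # 0.7444 0.5061 0.6025
```

The same check works on any per-entity row, e.g. `prf(211, 28, 255)` for LOC.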
===============================
***
with B-PER in the firstnames gazette instead of PER
(gazette=./src/B-PER.txt ./src/companies.txt ./src/location.txt ./src/misc.txt):
Entity P R F1 TP FP FN
LOC 0.8828 0.4528 0.5986 211 28 255
MISC 0.5294 0.0744 0.1304 9 8 112
ORG 0.7194 0.3145 0.4376 100 39 218
PER 0.7155 0.7001 0.7077 523 208 224
Totals 0.7487 0.5103 0.6069 843 283 809
+++
(P: 0.7444 -> 0.7487 R: 0.5061 -> 0.5103 F1: 0.6025 -> 0.6069)
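The contents of firstname.txt and B-PER.txt are not shown here. The sketch below assumes the usual Stanford NER gazette layout of one entry per line, a class label followed by the phrase, and only illustrates how a PER label could be rewritten to B-PER. The function name and the sample entries are hypothetical.

```python
def relabel_gazette(lines, old="PER", new="B-PER"):
    """Replace the leading class label on each gazette line (assumed format: 'CLASS phrase')."""
    out = []
    for line in lines:
        label, sep, phrase = line.rstrip("\n").partition(" ")
        if label == old:
            label = new
        out.append(f"{label}{sep}{phrase}")
    return out

# Made-up entries, not taken from the actual gazette files
sample = ["PER John", "PER Mary Ann", "LOC Paris"]
print(relabel_gazette(sample))  # ['B-PER John', 'B-PER Mary Ann', 'LOC Paris']
```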
===============================
with B/I tagged gazettes for LOC,MISC,ORG,PER:
cleanGazette=true
CRFClassifier tagged 24880 words in 1381 documents at 8173.46 words per second.
Entity P R F1 TP FP FN
LOC 0.8734 0.4442 0.5889 207 30 259
MISC 0.5000 0.0826 0.1418 10 10 111
ORG 0.7120 0.2799 0.4018 89 36 229
PER 0.7196 0.6975 0.7084 521 203 226
Totals 0.7477 0.5006 0.5997 827 279 825
+--
(P: 0.7444 -> 0.7477 R: 0.5061 -> 0.5006 F1: 0.6025 -> 0.5997)
===============================
with B/I tagged gazettes for LOC,MISC,ORG,PER:
sloppyGazette=true
CRFClassifier tagged 24880 words in 1381 documents at 9621.04 words per second.
Entity P R F1 TP FP FN
LOC 0.8771 0.4442 0.5897 207 29 259
MISC 0.5263 0.0826 0.1429 10 9 111
ORG 0.6947 0.2862 0.4053 91 40 227
PER 0.7190 0.6988 0.7088 522 204 225
Totals 0.7464 0.5024 0.6006 830 282 822
+--
(P: 0.7444 -> 0.7464 R: 0.5061 -> 0.5024 F1: 0.6025 -> 0.6006)
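The two runs above differ only in the gazette matching mode. As far as the Stanford NER documentation describes it, cleanGazette fires a gazette feature only when all tokens of an entry match, while sloppyGazette fires when any single token of an entry matches. The following is a simplified illustration of that difference, not the NERFeatureFactory implementation. Both function names and the sample sentence are made up.

```python
def sloppy_matches(tokens, entry):
    """Indices of tokens that equal any single word of the gazette entry."""
    words = set(entry.split())
    return [i for i, tok in enumerate(tokens) if tok in words]

def clean_matches(tokens, entry):
    """Indices of tokens covered by a full, in-order match of the whole entry."""
    words = entry.split()
    n = len(words)
    hits = set()
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == words:
            hits.update(range(start, start + n))
    return sorted(hits)

tokens = "Bank of America opened a branch in America".split()
entry = "Bank of America"
print(sloppy_matches(tokens, entry))  # [0, 1, 2, 7]
print(clean_matches(tokens, entry))   # [0, 1, 2]
```

Under this reading, sloppy matching marks more tokens (here the stand-alone "America"), which is one plausible reason the two modes give slightly different FP and FN counts.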
===============================
with:
useTitle=true
CRFClassifier tagged 24880 words in 1381 documents at 7910.97 words per second.
Entity P R F1 TP FP FN
LOC 0.8828 0.4528 0.5986 211 28 255
MISC 0.4737 0.0744 0.1286 9 10 112
ORG 0.7071 0.3113 0.4323 99 41 219
PER 0.7121 0.6988 0.7054 522 211 225
Totals 0.7436 0.5091 0.6044 841 290 811
-++
(P: 0.7444 -> 0.7436 R: 0.5061 -> 0.5091 F1: 0.6025 -> 0.6044)
===============================
***
with:
useTitle2=true
CRFClassifier tagged 24880 words in 1381 documents at 9635.94 words per second.
Entity P R F1 TP FP FN
LOC 0.8828 0.4528 0.5986 211 28 255
MISC 0.5000 0.0744 0.1295 9 9 112
ORG 0.7143 0.3145 0.4367 100 40 218
PER 0.7141 0.6988 0.7064 522 209 225
Totals 0.7465 0.5097 0.6058 842 286 810
+++
(P: 0.7444 -> 0.7465 R: 0.5061 -> 0.5097 F1: 0.6025 -> 0.6058)
===============================
???
with:
normalizeTerms=true
normalizeTimex=true
useNB=true
CRFClassifier tagged 24880 words in 1381 documents at 9123.58 words per second.
Entity P R F1 TP FP FN
LOC 0.8838 0.4571 0.6025 213 28 253
MISC 0.5294 0.0744 0.1304 9 8 112
ORG 0.6917 0.2893 0.4080 92 41 226
PER 0.7147 0.6975 0.7060 521 208 226
Totals 0.7455 0.5054 0.6025 835 285 817
+-/
(P: 0.7444 -> 0.7455 R: 0.5061 -> 0.5054 F1: 0.6025 -> 0.6025)
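The three-character marker above each comparison line appears to encode, in order, whether P, R and F1 went up (+), down (-) or stayed the same (/) relative to the baseline run. A small sketch under that assumption (the function name `delta_markers` is hypothetical):

```python
def delta_markers(baseline, run, digits=4):
    """Return one of '+', '-', '/' per metric, comparing run against baseline."""
    marks = []
    for b, x in zip(baseline, run):
        b, x = round(b, digits), round(x, digits)
        marks.append("+" if x > b else "-" if x < b else "/")
    return "".join(marks)

baseline = (0.7444, 0.5061, 0.6025)  # Totals P, R, F1 of the first run
print(delta_markers(baseline, (0.7455, 0.5054, 0.6025)))  # +-/
print(delta_markers(baseline, (0.7487, 0.5103, 0.6069)))  # +++
```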
===============================
???
with:
normalizeTerms=true
normalizeTimex=true
CRFClassifier tagged 24880 words in 1381 documents at 9273.20 words per second.
Entity P R F1 TP FP FN
LOC 0.8838 0.4571 0.6025 213 28 253
MISC 0.5294 0.0744 0.1304 9 8 112
ORG 0.6917 0.2893 0.4080 92 41 226
PER 0.7147 0.6975 0.7060 521 208 226
Totals 0.7455 0.5054 0.6025 835 285 817
+-/
(P, R, and F1 are the same as in the previous run.)
===============================
with:
useWordPairs=true
CRFClassifier tagged 24880 words in 1381 documents at 9001.45 words per second.
Entity P R F1 TP FP FN
LOC 0.8819 0.4485 0.5946 209 28 257
MISC 0.5294 0.0744 0.1304 9 8 112
ORG 0.7016 0.2736 0.3937 87 37 231
PER 0.7127 0.6975 0.7050 521 210 226
Totals 0.7448 0.5000 0.5983 826 283 826
+--
(P: 0.7444 -> 0.7448 R: 0.5061 -> 0.5000 F1: 0.6025 -> 0.5983)
===============================
with:
useTypeSeqs3=true
CRFClassifier tagged 24880 words in 1381 documents at 9076.98 words per second.
Entity P R F1 TP FP FN
LOC 0.8809 0.4442 0.5906 207 28 259
MISC 0.5000 0.0744 0.1295 9 9 112
ORG 0.7007 0.3019 0.4220 96 41 222
PER 0.7160 0.6988 0.7073 522 207 225
Totals 0.7453 0.5048 0.6019 834 285 818
+--
(P: 0.7444 -> 0.7453 R: 0.5061 -> 0.5048 F1: 0.6025 -> 0.6019)
===============================
***
with useOccurrencePatterns=true:
CRFClassifier tagged 24880 words in 1381 documents at 9034.13 words per second.
Entity P R F1 TP FP FN
LOC 0.8765 0.4571 0.6008 213 30 253
MISC 0.5000 0.0744 0.1295 9 9 112
ORG 0.7054 0.2862 0.4072 91 38 227
PER 0.7193 0.7068 0.7130 528 206 219
Totals 0.7482 0.5091 0.6059 841 283 811
+++
(P: 0.7444 -> 0.7482 R: 0.5061 -> 0.5091 F1: 0.6025 -> 0.6059)
===============================
with:
gazette=./src/firstname.txt ./src/companies.txt ./src/location.txt ./src/misc.txt
useOccurrencePatterns=true
useTitle2=true
CRFClassifier tagged 24880 words in 1381 documents at 9431.39 words per second.
Entity P R F1 TP FP FN
LOC 0.8802 0.4571 0.6017 213 29 253
MISC 0.5000 0.0744 0.1295 9 9 112
ORG 0.6923 0.2830 0.4018 90 40 228
PER 0.7211 0.7095 0.7152 530 205 217
Totals 0.7484 0.5097 0.6064 842 283 810
+++
(P: 0.7444 -> 0.7484 R: 0.5061 -> 0.5097 F1: 0.6025 -> 0.6064)
===============================
*** ----->>>> [[ BEST F1 RESULT ]] <<<<----
with:
gazette=./src/B-PER.txt ./src/companies.txt ./src/location.txt ./src/misc.txt
useOccurrencePatterns=true
useTitle2=true
CRFClassifier tagged 24880 words in 1381 documents at 9749.22 words per second.
Entity P R F1 TP FP FN
LOC 0.8811 0.4614 0.6056 215 29 251
MISC 0.5294 0.0744 0.1304 9 8 112
ORG 0.7077 0.2893 0.4107 92 38 226
PER 0.7158 0.7082 0.7120 529 210 218
Totals 0.7478 0.5115 0.6075 845 285 807
+++
(P: 0.7444 -> 0.7478 R: 0.5061 -> 0.5115 F1: 0.6025 -> 0.6075)
===============================
with:
gazette=./src/B-PER.txt ./src/companies.txt ./src/location.txt ./src/misc.txt
useOccurrencePatterns=true
useTitle2=true
normalizeTerms=true
normalizeTimex=true
CRFClassifier tagged 24880 words in 1381 documents at 8985.19 words per second.
Entity P R F1 TP FP FN
LOC 0.8807 0.4592 0.6037 214 29 252
MISC 0.5625 0.0744 0.1314 9 7 112
ORG 0.7031 0.2830 0.4036 90 38 228
PER 0.7152 0.7095 0.7124 530 211 217
Totals 0.7473 0.5103 0.6065 843 285 809
+++
(P: 0.7444 -> 0.7473 R: 0.5061 -> 0.5103 F1: 0.6025 -> 0.6065)
===============================
***
with increased memory (-Xmx2048m -Xms512m):
CRFClassifier tagged 24880 words in 1381 documents at 8895.24 words per second.
Entity P R F1 TP FP FN
LOC 0.8828 0.4528 0.5986 211 28 255
MISC 0.5000 0.0744 0.1295 9 9 112
ORG 0.7153 0.3082 0.4308 98 39 220
PER 0.7121 0.6988 0.7054 522 211 225
Totals 0.7453 0.5085 0.6045 840 287 812
+++
(P: 0.7444 -> 0.7453 R: 0.5061 -> 0.5085 F1: 0.6025 -> 0.6045)
===============================
***
with increased memory (-Xmx2048m -Xms512m) and:
useOccurrencePatterns=true
CRFClassifier tagged 24880 words in 1381 documents at 7900.92 words per second.
Entity P R F1 TP FP FN
LOC 0.8776 0.4614 0.6048 215 30 251
MISC 0.5294 0.0744 0.1304 9 8 112
ORG 0.7132 0.2893 0.4116 92 37 226
PER 0.7154 0.7068 0.7111 528 210 219
Totals 0.7476 0.5109 0.6070 844 285 808
+++
(P: 0.7444 -> 0.7476 R: 0.5061 -> 0.5109 F1: 0.6025 -> 0.6070)
===============================
with increased memory (-Xmx4096m -Xms512m) and:
gazette=./src/B-PER.txt ./src/companies.txt ./src/location.txt ./src/misc.txt
useOccurrencePatterns=true
useTitle2=true
CRFClassifier tagged 24880 words in 1381 documents at 9470.88 words per second.
Entity P R F1 TP FP FN
LOC 0.8802 0.4571 0.6017 213 29 253
MISC 0.5294 0.0744 0.1304 9 8 112
ORG 0.7031 0.2830 0.4036 90 38 228
PER 0.7141 0.7055 0.7098 527 211 220
Totals 0.7458 0.5079 0.6042 839 286 813
+++
(P: 0.7444 -> 0.7458 R: 0.5061 -> 0.5079 F1: 0.6025 -> 0.6042)
===============================
with increased memory (-Xmx4096m -Xms512m) and:
gazette=./src/B-PER.txt ./src/companies.txt ./src/location.txt ./src/misc.txt
useOccurrencePatterns=true
useTitle2=true
normalizeTerms=true
normalizeTimex=true
CRFClassifier tagged 24880 words in 1381 documents at 9157.16 words per second.
Entity P R F1 TP FP FN
LOC 0.8802 0.4571 0.6017 213 29 253
MISC 0.5625 0.0744 0.1314 9 7 112
ORG 0.6977 0.2830 0.4027 90 39 228
PER 0.7139 0.7082 0.7110 529 212 218
Totals 0.7456 0.5091 0.6050 841 287 811
+++
(P: 0.7444 -> 0.7456 R: 0.5061 -> 0.5091 F1: 0.6025 -> 0.6050)
===============================