Question about active features #91

kite1988 · 2017-08-30T20:43:47Z

I used crfsuite to train a model for a named entity recognition task. I set feature.minfreq to 0 (no frequency cutoff), but the number of active features (17450) is much smaller than the total number of features (79075). Here is the relevant part of the log:

Number of active features: 17450 (79075)
Number of active attributes: 5310 (64323)
Number of active labels: 21 (21)

Does anyone know how the active features are selected? Also, what is the difference between active features and active attributes? Thanks very much!

chokkan · 2017-08-31T03:35:58Z

After training finishes, CRFsuite removes every feature whose weight is zero. In your case, 79075 features were used during training, but the training algorithm assigned non-zero weights to only 17450 of them. The remaining 79075 - 17450 = 61625 features were therefore removed from the model.

Roughly speaking, state features are pairs of attributes and labels. When a feature is removed from a model, the attribute associated with it may no longer be referred to by any other feature, in which case the attribute can be pruned as well. In your case, 5310 attributes are associated with at least one feature with a non-zero weight; the rest are associated only with zero-weight features. For this reason, CRFsuite removed 64323 - 5310 = 59013 attributes from the model.

I guess you used L1-regularization for training the model. It has a similar effect to setting a frequency cutoff.
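The pruning described above can be illustrated with a toy example. This is only a sketch, not CRFsuite's actual code: the `count_active` helper and the weight table are hypothetical, and transition features (which CRFsuite also counts as features) are left out for brevity.

```python
from typing import Dict, Tuple

# Hypothetical state-feature table: (attribute, label) -> learned weight.
StateWeights = Dict[Tuple[str, str], float]


def count_active(state_weights: StateWeights) -> Tuple[int, int]:
    """Return (active features, active attributes) after dropping zero weights."""
    # A feature stays active only if its weight is non-zero.
    active_features = [key for key, w in state_weights.items() if w != 0.0]
    # An attribute stays active only if some active feature still refers to it.
    active_attributes = {attr for attr, _label in active_features}
    return len(active_features), len(active_attributes)


# Toy weights: 5 features over 3 attributes; made-up values for illustration.
weights: StateWeights = {
    ("w=Paris", "B-LOC"): 1.7,
    ("w=Paris", "O"): 0.0,
    ("w=the", "O"): 0.4,
    ("w=xyz", "B-LOC"): 0.0,
    ("w=xyz", "O"): 0.0,
}

n_features, n_attributes = count_active(weights)
total_attributes = len({attr for attr, _label in weights})
print(f"Number of active features: {n_features} ({len(weights)})")
print(f"Number of active attributes: {n_attributes} ({total_attributes})")
```

Here the attribute `w=xyz` is pruned because both features that refer to it have zero weight, while `w=Paris` survives because one of its two features is still active.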

arvinarvi · 2018-01-13T07:24:21Z

Is tagging done using only the active features? How are the potentials computed for tokens in the evaluation set that do not appear in the model file?
Thanks for your reply.

usptact · 2018-01-13T19:53:04Z

@arvinarvi Features that appear at tagging time but are not in the model get a weight of zero.

arvinarvi · 2018-01-17T09:28:08Z

@usptact Thank you for your reply.

arvinarvi · 2018-01-19T09:41:30Z

I am working on a sequence labeling problem in which I extract the learned potentials from a CRFsuite model file and apply a different inference algorithm to them. I am finding it difficult to generalize the extraction of state-feature potentials from the saved model file for a particular token (in the evaluation set, if it is present in the model file), since only the active features are stored. Can anyone help me figure this out? (If more information is required, I can be more specific about my problem.) Thanks.
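One possible way to rebuild state potentials from only the active features is sketched below. It assumes the non-zero weights have already been loaded into a dictionary keyed by (attribute, label); how they are read from the model file is outside this sketch. The `state_potentials` function, the attribute names, and the weights are all hypothetical. Any pair missing from the dictionary is treated as having weight zero, in line with the earlier reply in this thread.

```python
from typing import Dict, List, Tuple

# Hypothetical inputs: active state features and one item per token,
# where each item maps attribute name -> attribute value.
StateWeights = Dict[Tuple[str, str], float]
Item = Dict[str, float]


def state_potentials(
    sequence: List[Item], labels: List[str], state_weights: StateWeights
) -> List[List[float]]:
    """Build a (tokens x labels) table of state scores in log space."""
    table = []
    for item in sequence:
        row = []
        for label in labels:
            # Pairs pruned from (or never seen by) the model contribute 0.0.
            score = sum(
                state_weights.get((attr, label), 0.0) * value
                for attr, value in item.items()
            )
            row.append(score)
        table.append(row)
    return table


labels = ["B-LOC", "O"]
active = {("w=Paris", "B-LOC"): 1.7, ("w=the", "O"): 0.4}  # made-up weights
sequence = [
    {"w=the": 1.0},
    {"w=Paris": 1.0, "cap": 1.0},  # "cap" is not in the model
    {"w=unseen": 1.0},             # token with no known attribute
]

table = state_potentials(sequence, labels, active)
for row in table:
    print(row)
```

The resulting table holds only the state part of the score; a custom inference algorithm would still need the transition weights, which can be kept in a similar dictionary keyed by (previous label, label).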

marctorsoc · 2018-10-29T12:06:49Z

I don't understand @chokkan's answer. If every feature is an attribute+label pair, then there should be at least as many features as attributes, and in theory many more, since an attribute might appear with different labels. Can someone explain, please?

jwijffels mentioned this issue Oct 29, 2018

meaning of min_freq parameter bnosac/crfsuite#4

Closed
