Add a confidence score to eds.ner.
This is not trivial since an entity is the result of multiple word predictions, following a label scheme.
Ex: in the sentence "the [big cat] likes the [dog]", labelled as O B-CAT L-CAT O O U-DOG
Each word has a probability distribution over the different tags [O, I-CAT, B-CAT, L-CAT, U-CAT, I-DOG, B-DOG, L-DOG, U-DOG].
So we must actually decide what meaning this score would have:
probability that an entity covers the words of the predicted entity
probability that an entity covers the words of the predicted entity and starts/ends there
probability that an entity covers the words of the predicted entity and starts/ends there, with the same label
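To make the three options concrete, here is a minimal sketch that computes each one from per-word tag probabilities. It assumes the words are independent given their tag distributions (so it ignores CRF transitions), and the names `span_scores`, `bilou_tags` and the example probabilities are hypothetical, not part of eds.ner.

```python
from math import prod

LABELS = ["CAT", "DOG"]


def bilou_tags(n_words, label):
    """Tags that an entity of n_words words with this label would receive."""
    if n_words == 1:
        return [f"U-{label}"]
    return [f"B-{label}"] + [f"I-{label}"] * (n_words - 2) + [f"L-{label}"]


def span_scores(word_probs, start, end, label, labels=LABELS):
    """Three candidate confidence scores for the span word_probs[start:end].

    word_probs[i] maps each tag to its probability for word i.
    Assumption: words are treated as independent (no CRF transitions).
    """
    span = word_probs[start:end]

    def exact(lab):
        # Probability that the span carries exactly the tags of an entity `lab`
        expected = bilou_tags(len(span), lab)
        return prod(p.get(tag, 0.0) for p, tag in zip(span, expected))

    return {
        # Option 1: every word of the span is part of some entity (tag != O)
        "covered": prod(1.0 - p.get("O", 0.0) for p in span),
        # Option 2: an entity starts and ends exactly here, whatever its label
        "boundaries": sum(exact(lab) for lab in labels),
        # Option 3: an entity starts and ends exactly here, with the predicted label
        "label": exact(label),
    }


# Made-up distributions for the two words of "[big cat]"
example = [
    {"O": 0.1, "B-CAT": 0.8, "B-DOG": 0.1},
    {"O": 0.2, "L-CAT": 0.7, "L-DOG": 0.1},
]
scores = span_scores(example, 0, 2, "CAT")
```

Under this sketch the three scores are nested: the third is never larger than the second, and the second is never larger than the first, since each option adds a constraint to the previous one.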
Since the word tag sequence is computed via a CRF, we can either use the logits before the CRF, or use CRF marginalization to obtain the post-CRF tag distribution.
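As an illustration of the second option, the sketch below computes per-word tag marginals of a linear-chain CRF with the forward-backward algorithm. It is a generic textbook version, not the eds.ner implementation: the function name `crf_marginals`, the absence of start/end transition scores, and the toy three-tag scheme are all assumptions.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def crf_marginals(emissions, transitions):
    """Return marginals[t][k] = P(tag_t = k | sentence).

    emissions[t][k]: pre-CRF score (logit) of tag k for word t.
    transitions[j][k]: score of moving from tag j to tag k.
    Assumption: no dedicated start/end transition scores.
    """
    T, K = len(emissions), len(emissions[0])
    # Forward pass: alpha[t][k] = log-score of all prefixes ending in tag k
    alpha = [list(emissions[0])]
    for t in range(1, T):
        alpha.append([
            emissions[t][k]
            + logsumexp([alpha[t - 1][j] + transitions[j][k] for j in range(K)])
            for k in range(K)
        ])
    # Backward pass: beta[t][k] = log-score of all suffixes following tag k
    beta = [[0.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            logsumexp([
                transitions[k][j] + emissions[t + 1][j] + beta[t + 1][j]
                for j in range(K)
            ])
            for k in range(K)
        ]
    log_z = logsumexp(alpha[-1])  # log partition function
    return [
        [math.exp(alpha[t][k] + beta[t][k] - log_z) for k in range(K)]
        for t in range(T)
    ]


# Toy tag set [O, B, L] for a two-word sentence
emissions = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.5]]
NO = -1e4  # large penalty standing in for a forbidden transition
constrained = [
    [0.0, 0.0, NO],   # O -> O, O -> B allowed; O -> L forbidden
    [NO, NO, 0.0],    # B must be followed by L
    [0.0, 0.0, NO],   # L -> O, L -> B allowed; L -> L forbidden
]
free = [[0.0] * 3 for _ in range(3)]

post_crf = crf_marginals(emissions, constrained)
pre_crf = crf_marginals(emissions, free)  # equals a per-word softmax
```

With all transition scores set to zero the marginals reduce to a per-word softmax of the logits, which is the first option; with the constrained transitions, the probability of B on the first word matches the probability of L on the second, because only the path (B, L) contributes to both.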
At the end, the score should be available as ent._.prob[ner/ent/label]
Feature type
Following https://huggingface.co/AP-HP/eds-pseudo-public/discussions/1
For reference, here is how it was done in nlstruct https://github.com/percevalw/nlstruct/blob/23bff612369d96d54c352031c6819ab235250fea/nlstruct/models/bitag.py#L178-L201