Feature request: NER prediction confidence score #377

percevalw · 2025-02-13T10:04:58Z

Feature type

Following https://huggingface.co/AP-HP/eds-pseudo-public/discussions/1

Add a confidence score to eds.ner.
This is not trivial since an entity is the result of multiple word predictions, following a label scheme.

Example: in the sentence "the [big cat] likes the [dog]", the words are labelled as O B-CAT L-CAT O O U-DOG.
Each word has a probability distribution over the possible tags [O, I-CAT, B-CAT, L-CAT, U-CAT, I-DOG, B-DOG, L-DOG, U-DOG].
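To make the label scheme concrete, here is a minimal sketch of how the tag set above can be built from the labels and how a tag sequence maps back to entities. The helper names `bilou_tags` and `decode_bilou` are hypothetical and only for illustration; this is not how eds.ner is implemented.

```python
def bilou_tags(labels):
    """Build the tag set for the given labels: O, then I-/B-/L-/U- for each label."""
    return ["O"] + [f"{prefix}-{label}" for label in labels for prefix in "IBLU"]


def decode_bilou(tags):
    """Turn one tag per word into (start, end, label) spans, end exclusive.

    Simplification: the label is read from the closing L- or U- tag only,
    so inconsistent sequences such as B-CAT L-DOG are not rejected.
    """
    spans = []
    start = None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "U":
            spans.append((i, i + 1, label))
            start = None
        elif prefix == "B":
            start = i
        elif prefix == "L" and start is not None:
            spans.append((start, i + 1, label))
            start = None
        elif prefix == "O":
            start = None
    return spans


words = ["the", "big", "cat", "likes", "the", "dog"]
tags = ["O", "B-CAT", "L-CAT", "O", "O", "U-DOG"]
for start, end, label in decode_bilou(tags):
    print(label, words[start:end])
```

The point is that one entity depends on several word-level decisions at once, which is why a single per-entity score has to be defined explicitly.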

So we must actually decide what meaning this score would have:

probability that an entity covers the words of the predicted entity
probability that an entity covers the words of the predicted entity and starts/ends there
probability that an entity covers the words of the predicted entity and starts/ends there, with the same label
...
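The three candidate definitions above could be sketched as follows. This assumes we already have one tag distribution per word and, as a simplification, treats words as independent (a plain product of per-word probabilities). The function `span_scores` and the toy numbers are hypothetical, not taken from nlstruct or EDS-NLP.

```python
from math import prod

TAGS = ["O", "I-CAT", "B-CAT", "L-CAT", "U-CAT", "I-DOG", "B-DOG", "L-DOG", "U-DOG"]


def expected_prefixes(length):
    """Position prefixes a span of `length` words must carry in this scheme."""
    if length == 1:
        return ["U"]
    return ["B"] + ["I"] * (length - 2) + ["L"]


def span_scores(probs, start, end, label):
    """Score the span [start, end) under three increasingly strict definitions.

    probs: one dict per word, mapping tag -> probability.
    Assumption: words are scored independently of each other.
    """
    words = probs[start:end]
    prefixes = expected_prefixes(end - start)
    # 1. every word of the span belongs to some entity (any tag but O)
    covered = prod(1.0 - p["O"] for p in words)
    # 2. same, and the span starts/ends exactly there (any label)
    bounded = prod(
        sum(v for tag, v in p.items() if tag.startswith(prefix + "-"))
        for p, prefix in zip(words, prefixes)
    )
    # 3. same, with the predicted label
    labelled = prod(p[f"{prefix}-{label}"] for p, prefix in zip(words, prefixes))
    return {"covered": covered, "bounded": bounded, "labelled": labelled}


def dist(**nonzero):
    """Toy distribution: all tags at 0 except those given (use _ for - in names)."""
    full = dict.fromkeys(TAGS, 0.0)
    full.update({name.replace("_", "-"): value for name, value in nonzero.items()})
    return full


# Made-up distributions for the two words "big" and "cat"
probs = [
    dist(O=0.1, B_CAT=0.7, B_DOG=0.1, U_CAT=0.1),
    dist(O=0.05, L_CAT=0.8, L_DOG=0.1, U_CAT=0.05),
]
scores = span_scores(probs, 0, 2, "CAT")
print(scores)
```

By construction each definition is at least as strict as the previous one, so the scores can only decrease from `covered` to `bounded` to `labelled`.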

For reference, here is how it was done in nlstruct https://github.com/percevalw/nlstruct/blob/23bff612369d96d54c352031c6819ab235250fea/nlstruct/models/bitag.py#L178-L201

Since the word tag sequence is computed via a CRF, we can either use the logits before the CRF, or use CRF marginalization to obtain the post-CRF distribution over tags.
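To illustrate the difference between the two options, here is a minimal forward-backward sketch for a linear-chain CRF in pure Python. It assumes per-word emission scores (the pre-CRF logits) and a tag-to-tag transition score matrix, and leaves out start/end transition scores for brevity. The names `crf_marginals`, `emissions` and `transitions` are hypothetical and do not reflect the EDS-NLP API.

```python
import math


def logsumexp(values):
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def softmax(scores):
    """Pre-CRF option: normalize the logits of one word on their own."""
    z = logsumexp(scores)
    return [math.exp(s - z) for s in scores]


def crf_marginals(emissions, transitions):
    """Post-CRF option: P(tag at word t = j | sentence) via forward-backward.

    emissions[t][j]: score of tag j at word t.
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n, k = len(emissions), len(emissions[0])
    # Forward pass: log-score of all prefixes ending in tag j at word t
    alpha = [list(emissions[0])]
    for t in range(1, n):
        alpha.append([
            emissions[t][j]
            + logsumexp([alpha[t - 1][i] + transitions[i][j] for i in range(k)])
            for j in range(k)
        ])
    # Backward pass: log-score of all suffixes following tag i at word t
    beta = [[0.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [
            logsumexp([
                transitions[i][j] + emissions[t + 1][j] + beta[t + 1][j]
                for j in range(k)
            ])
            for i in range(k)
        ]
    log_z = logsumexp(alpha[-1])
    return [
        [math.exp(alpha[t][j] + beta[t][j] - log_z) for j in range(k)]
        for t in range(n)
    ]


# Toy setup with tags [O, B, L]; forbidden transitions get a very low score
NO = -1e4
transitions = [
    [0.0, 0.0, NO],   # O -> O, O -> B allowed; O -> L forbidden
    [NO, NO, 0.0],    # B must be followed by L
    [0.0, 0.0, NO],   # L -> O, L -> B allowed; L -> L forbidden
]
emissions = [[1.0, 0.5, -1.0], [0.2, -0.5, 1.5]]

print("pre-CRF :", [softmax(row) for row in emissions])
print("post-CRF:", crf_marginals(emissions, transitions))
```

The pre-CRF softmax ignores the transition constraints, so it can put mass on tags that no valid sequence would contain, whereas the marginals only account for sequences the CRF allows.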

In the end, the score should be available as ent._.prob[ner/ent/label].

LucasDedieu self-assigned this Feb 13, 2025
