Improve evaluation #62

Merged
merged 4 commits into main from improvement/evaluation on Aug 25, 2023
Conversation

marmg
Collaborator

@marmg marmg commented Aug 25, 2023

Status: Ready
Type: Feature
⚠️ Core Change: No
Issue: —

Summary

  • Added create_dataset

Function for creating a DatasetWithEntities from sentences and ground truth.

  • Added token-based evaluation

You can now choose between span-based and token-based evaluation:

span-based
    Span-based evaluation (the default) considers the BIO tag of every token: an entity counts as correct only if all the tokens in its span are recognized with their corresponding BIO tags.

token-based
    Token-based evaluation considers only the B- tag for each token, and correctness is measured at the token level.

In the following example, the SMXM linker extracts "York" as LOC, but not the full span "New New York". Thus the span-based f1 is 0.0, as the span was not fully recognized. On the other hand, the token-based f1 is 0.5, since the precision is 1.0 and the recall is 0.3333 (1 of the 3 tokens in the span correctly extracted).
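To make that arithmetic concrete, here is a small sketch in plain Python (the `token_f1` helper is hypothetical and not part of zshot) that reproduces the token-based score from the counts above: 1 correctly extracted token, 1 predicted token, 3 gold tokens.

```python
# Hypothetical helper (not part of zshot) showing the token-level arithmetic.
def token_f1(true_positives: int, n_predicted: int, n_gold: int) -> float:
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# "York" is the only predicted token and it is correct; the gold span
# "New New York" has 3 tokens -> precision 1.0, recall 1/3, f1 0.5.
print(token_f1(1, 1, 3))
```

The span-based score for the same prediction is 0.0, because seqeval treats the partially matched span as a miss rather than a partial hit.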

Example

import spacy
from zshot import PipelineConfig
from zshot.linker import LinkerSMXM

from zshot.evaluation.metrics.seqeval.seqeval import Seqeval
from zshot.evaluation.dataset.dataset import create_dataset
from zshot.evaluation.zshot_evaluate import evaluate, prettify_evaluate_report
from zshot.utils.data_models import Entity

# Entities the zero-shot linker can choose from
ENTITIES = [
    Entity(name="FAC", description="A facility"),
    Entity(name="LOC", description="A location"),
]
sentences = ["New New York is beautiful"]
gt = [["B-LOC", "I-LOC", "I-LOC", "O", "O"]]  # token-level ground truth (BIO tags)

dataset = create_dataset(gt, sentences, ENTITIES)


# Blank spaCy pipeline with the zshot component and the SMXM linker
nlp = spacy.blank("en")
nlp_config = PipelineConfig(
    linker=LinkerSMXM(),
    entities=ENTITIES
)

nlp.add_pipe("zshot", config=nlp_config, last=True)

# Span-based evaluation (the default)
evaluation = evaluate(nlp, dataset, metric=Seqeval())
tables = prettify_evaluate_report(evaluation, show_full_report=False)
print("F1-Macro span-based:", evaluation['linker']['overall_f1_macro'])

# Token-based evaluation
evaluation = evaluate(nlp, dataset, metric=Seqeval(), mode='token')
tables = prettify_evaluate_report(evaluation, show_full_report=False)
print("F1-Macro token-based:", evaluation['linker']['overall_f1_macro'])

> Map:   0%|          | 0/1 [00:00<?, ? examples/s]
> F1-Macro span-based: 0.0
> Map:   0%|          | 0/1 [00:00<?, ? examples/s]
> F1-Macro token-based: 0.5

@marmg marmg self-assigned this Aug 25, 2023
@marmg marmg force-pushed the improvement/evaluation branch from 5e43400 to 3d993cb on August 25, 2023 11:40
@marmg marmg merged commit 5a6f5a2 into main Aug 25, 2023
@marmg marmg deleted the improvement/evaluation branch August 25, 2023 12:23