Improve evaluation #62

Merged
merged 4 commits into main from improvement/evaluation on Aug 25, 2023
Conversation

marmg
Collaborator

@marmg marmg commented Aug 25, 2023

Status: Ready
Type: Feature
⚠️ Core Change: No
Issue: —

Summary

  • Added create_dataset

Function for creating a DatasetWithEntities from sentences and ground truth.

  • Added token-based evaluation

You can now choose between span-based and token-based evaluation:

span-based
    Span-based evaluation (the default) considers the BIO tag of every token: an entity counts as correct only if all the tokens in its span are recognized with their corresponding BIO tags.

token-based
    Token-based evaluation considers only the B- tag for each token, and correctness is measured at the token level.

In the following example, the SMXM linker extracts "York" as LOC, but not the full span "New New York". Thus the span-based f1 is 0.0, as the span was not fully recognized. On the other hand, the token-based f1 is 0.5, since the precision is 1.0 and the recall is 0.3333 (1 of the 3 tokens in the span correctly extracted).
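To make that arithmetic concrete, here is a small sketch in plain Python (the `token_f1` helper is hypothetical and not part of zshot) that reproduces the token-based score from the counts above: 1 correctly extracted token, 1 predicted token, 3 gold tokens.

```python
# Hypothetical helper (not part of zshot) showing the token-level arithmetic.
def token_f1(true_positives: int, n_predicted: int, n_gold: int) -> float:
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# "York" is the only predicted token and it is correct; the gold span
# "New New York" has 3 tokens -> precision 1.0, recall 1/3, f1 0.5.
print(token_f1(1, 1, 3))
```

The span-based score for the same prediction is 0.0, because seqeval treats the partially matched span as a miss rather than a partial hit.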

Example

import spacy
from zshot import PipelineConfig
from zshot.linker import LinkerSMXM

from zshot.evaluation.metrics.seqeval.seqeval import Seqeval
from zshot.evaluation.dataset.dataset import create_dataset
from zshot.evaluation.zshot_evaluate import evaluate, prettify_evaluate_report
from zshot.utils.data_models import Entity

# Entities the zero-shot linker can choose from
ENTITIES = [
    Entity(name="FAC", description="A facility"),
    Entity(name="LOC", description="A location"),
]
sentences = ["New New York is beautiful"]
gt = [["B-LOC", "I-LOC", "I-LOC", "O", "O"]]  # token-level ground truth (BIO tags)

dataset = create_dataset(gt, sentences, ENTITIES)


# Blank spaCy pipeline with the zshot component and the SMXM linker
nlp = spacy.blank("en")
nlp_config = PipelineConfig(
    linker=LinkerSMXM(),
    entities=ENTITIES
)

nlp.add_pipe("zshot", config=nlp_config, last=True)

# Span-based evaluation (the default)
evaluation = evaluate(nlp, dataset, metric=Seqeval())
tables = prettify_evaluate_report(evaluation, show_full_report=False)
print("F1-Macro span-based:", evaluation['linker']['overall_f1_macro'])

# Token-based evaluation
evaluation = evaluate(nlp, dataset, metric=Seqeval(), mode='token')
tables = prettify_evaluate_report(evaluation, show_full_report=False)
print("F1-Macro token-based:", evaluation['linker']['overall_f1_macro'])

> Map:   0%|          | 0/1 [00:00<?, ? examples/s]
> F1-Macro span-based: 0.0
> Map:   0%|          | 0/1 [00:00<?, ? examples/s]
> F1-Macro token-based: 0.5

@marmg marmg self-assigned this Aug 25, 2023
@marmg marmg force-pushed the improvement/evaluation branch from 5e43400 to 3d993cb on August 25, 2023 11:40
@marmg marmg merged commit 5a6f5a2 into main Aug 25, 2023
@marmg marmg deleted the improvement/evaluation branch August 25, 2023 12:23