This repository has been archived by the owner on Jan 29, 2024. It is now read-only.

Perform Error Analysis of NER model predictions #607

Closed
3 tasks done
FrancescoCasalegno opened this issue Jul 12, 2022 · 8 comments

Comments

@FrancescoCasalegno
Contributor

FrancescoCasalegno commented Jul 12, 2022

Context

Actions

  • Analyse the errors produced by our NER model.
  • Are the discrepancies between y_true and y_pred actual errors, or are they mainly subjective differences (e.g. whether "cell" counts as a CELL_TYPE)?
  • What causes CELL_TYPE and CELL_COMPARTMENT in particular to have such a poor F1-score?
@EmilieDel
Contributor

EmilieDel commented Jul 12, 2022

Results of bluesearch.mining.eval.ner_errors():

(Note that the false negative and false positive results are sets, so a mistake detected more than once appears only once.)

Entity mode

BRAIN_REGION
{'false_neg': {'Cortico - thalamic', 'cortical', 'cortico - cortico', 'cortico - striatal', 'cortico - thalamic', 'retinas'},
 'false_pos': {'cortico', 'dentate gyrus', 'dorsal', 'dorsal telencephalic', 'hippocampal', 'pons', 'pontine', 'striatal', 'thalamic'}}
CELL_COMPARTMENT
{'false_neg': {'axo - somato - dendritic', 'axonal'},
 'false_pos': {'axo', 'axonal guidance', 'dendr', 'dendritic', 'mitochondrial', 'somato'}}
CELL_TYPE
{'false_neg': {'DGGCs', 'GSC', 'PC', 'TWIK-1^−/−', 'cell', 'dentate gyrus granule cells', 'glioma stem cell', 'oligodendrocyte precursor cells'},
 'false_pos': {'- cells', 'AMs', 'Aδ', 'Aδ fibers', 'RGC', 'RGCs', 'astrocytoma', 'brush cells', 'caveolated cells', 'cell', 'cells', 
'fibrillovesicular', 'granule cells', 'multivesicular', 'neural progenitor', 'oligodendrocyte', 'tuft'}}
GENE
{'false_neg': {'ClC-2', 'Wnt'},
 'false_pos': {'- gated sodium channel', 'BDNF', 'EZH2', 'GLI3', 'Kv1.1^mceph', 'NOTCH-1', 'TRAIL', 
'TWIK-1^−/−', 'Tph1', 'Wnt',  'mGluR6', 'voltage - gated ClC-2'}}
ORGANISM
{'false_neg': {'glioma cells', 'Wistar rats', 'mice', 'mouse'},
 'false_pos': {'mouse', 'mice', 'rats', 'rodent', 'human'}}

Token mode

BRAIN_REGION
{'false_neg': {'cortical', 'cortico', '-', 'retinas', 'Cortico'},
 'false_pos': {'dentate', 'dorsal', 'gyrus', 'hippocampal', 'pons', 'pontine', 'telencephalic'}}
CELL_COMPARTMENT
{'false_neg': {'-'}, 
'false_pos': {'guidance', 'dendr', 'mitochondrial'}}
CELL_TYPE
{'false_neg': {'DGGCs', 'GSC', 'PC', 'TWIK-1^−/−', 'cell', 'dentate', 'glioma', 'gyrus', 'precursor', 'stem'},
 'false_pos': {'-', 'AMs', 'Aδ', 'RGC', 'RGCs', 'astrocytoma', 'brush', 'caveolated', 'cell', 'cells', 'fibers', 'fibrillovesicular',
 'multivesicular', 'neural', 'progenitor', 'tuft'}}
GENE
{'false_neg': {'Wnt'},
 'false_pos': {'-', 'BDNF', 'EZH2', 'GLI3', 'Kv1.1^mceph', 'NOTCH-1', 'TRAIL', 'TWIK-1^−/−', 'Tph1', 'Wnt', 'channel',
'gated', 'mGluR6', 'sodium', 'voltage'}}
ORGANISM
{'false_neg': {'mouse', 'mice', 'cells', 'Wistar', 'glioma'},
 'false_pos': {'rodent', 'human', 'mice', 'mouse'}}
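For readers without access to `bluesearch`, the set-based error report above can be approximated as follows. Note that `ner_error_sets` is a hypothetical helper written for illustration, not the actual `bluesearch.mining.eval.ner_errors()` implementation, and it assumes gold and predicted entities are given as (text, label) pairs.

```python
from collections import defaultdict


def ner_error_sets(y_true, y_pred):
    """Collect unique false negatives / false positives per entity type.

    y_true, y_pred: iterables of (entity_text, entity_type) pairs.
    Results are sets, so a repeated mistake is reported only once.
    """
    errors = defaultdict(lambda: {"false_neg": set(), "false_pos": set()})
    true_set, pred_set = set(y_true), set(y_pred)
    # Gold entities the model missed.
    for text, label in true_set - pred_set:
        errors[label]["false_neg"].add(text)
    # Predicted entities absent from the gold annotations.
    for text, label in pred_set - true_set:
        errors[label]["false_pos"].add(text)
    return dict(errors)
```

Entity types with no errors simply do not appear in the returned dictionary.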

@jankrepl
Contributor

Maybe we should write a check to make sure that once an expert annotates a given word as a specific entity type, every occurrence of that word is annotated with exactly the same entity type.

Of course, some words can have multiple meanings (lab mouse vs. Mickey Mouse), but IMO we don't have to worry about that since the context is always very narrow.

@FrancescoCasalegno
Contributor Author

FrancescoCasalegno commented Jul 26, 2022

2022-07-26 Planning

  • Track the current NER annotations (= original annotations from the expert) with DVC.
  • Use a regex to check that occurrences are annotated consistently, i.e. that if "mice" is annotated once as ORGANISM it is always annotated as such. If inconsistencies are found, double-check them by hand and fix the original annotations where needed. This will help reduce the false negatives in our evaluation.
  • Use k-fold out-of-sample predictions followed by manual verification (with the help of Google/Wikipedia) to see where our model predictions differ from the human annotations, and fix the original annotations accordingly. This will help reduce both false positives and false negatives in our evaluation.
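A minimal version of the regex consistency check in the second bullet could look like the sketch below. The function name and the input format (per-document character-span annotations) are assumptions for illustration, not the project's actual tooling.

```python
import re
from collections import defaultdict


def find_inconsistencies(docs):
    """Flag surface forms annotated with an entity type in one place but
    unannotated (None) or annotated differently elsewhere.

    docs: list of (text, spans), where spans = [(start, end, label), ...].
    """
    seen = defaultdict(set)
    # Pass 1: record every label each annotated surface form receives.
    for text, spans in docs:
        for start, end, label in spans:
            seen[text[start:end]].add(label)
    # Pass 2: regex-scan all documents for occurrences of each annotated
    # form that fall outside every annotated span.
    for text, spans in docs:
        covered = [(s, e) for s, e, _ in spans]
        for form in list(seen):
            pattern = r"\b" + re.escape(form) + r"\b"
            for m in re.finditer(pattern, text):
                if not any(s <= m.start() and m.end() <= e for s, e in covered):
                    seen[form].add(None)  # unannotated occurrence
    # Only forms with more than one label (or a missing one) are suspects.
    return {form: labels for form, labels in seen.items() if len(labels) > 1}
```

Each flagged form would then be double-checked by hand, as the plan describes.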

@EmilieDel
Contributor

EmilieDel commented Jul 29, 2022

Experiment

The idea was to take all the entities annotated by GK and build an entity ruler from them (using a lemmatizer so that, e.g., both mouse and mice are detected). Once this entity ruler is constructed, the resulting model is run again on all the annotations, and its predictions are compared against the GK annotations.

What was done?

  • Several models (en_core_web_sm and en_core_sci_lg) were used, meaning different tokenizers and lemmatizers.
  • Conflicts of entity types for the same entity (= lemma) were resolved manually (to avoid any randomness in the results). However, this is not scalable.

Results

The results shown here were created with the en_core_web_sm model instantiated as follows:

nlp = spacy.load("en_core_web_sm", disable=["ner"])
nlp.remove_pipe("lemmatizer")
nlp.add_pipe("lemmatizer", config={"mode": "lookup"}).initialize()
  • 914 distinct patterns were detected (plus 2014 duplicates, i.e. the same lemma appearing several times):
GENE                529
CELL_TYPE           156
BRAIN_REGION        138
CELL_COMPARTMENT     46
ORGANISM             45
  • The comparison between the GK annotations and the entity ruler predictions (on all annotated paragraphs):
                    precision  recall  f1-score  support
BRAIN_REGION             0.74    0.97      0.84      345
CELL_COMPARTMENT         0.58    0.94      0.72      177
CELL_TYPE                0.48    0.86      0.62      677
GENE                     0.87    0.99      0.93     1469
ORGANISM                 0.61    0.98      0.75      279
  • Here are some results: (screenshots omitted)

  • Some false positives also appear with this method: (screenshot omitted)
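The lemma-based entity ruler described above can be sketched with spaCy's EntityRuler. Since en_core_web_sm plus the lookup-lemmatizer data may not be available everywhere, this minimal version uses a blank pipeline and approximates the mouse/mice matching with an `IN` pattern on `LOWER` instead of a `LEMMA` pattern; the two patterns shown are toy stand-ins for the 914 patterns derived from the GK annotations.

```python
import spacy

# Blank English pipeline: tokenizer only, no model download required.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Toy stand-ins for the patterns built from the GK annotations.
patterns = [
    {"label": "ORGANISM", "pattern": [{"LOWER": {"IN": ["mouse", "mice"]}}]},
    {"label": "GENE", "pattern": [{"LOWER": "wnt"}]},
]
ruler.add_patterns(patterns)

doc = nlp("Wnt signalling was studied in mice.")
ents = [(ent.text, ent.label_) for ent in doc.ents]
print(ents)
```

With the full lookup lemmatizer in place (as in the snippet above this comment), the ORGANISM pattern would instead be `[{"LEMMA": "mouse"}]`, covering both surface forms through a single lemma.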

@EmilieDel
Contributor

EmilieDel commented Aug 2, 2022

Here are the results of:

  • Spacy model trained on the annotations from GK
  • Spacy model trained on the entity ruler annotations

The models are evaluated respectively against the annotations from GK and the entity ruler annotations. The train and test splits are kept the same.

(screenshots of the evaluation results omitted)

@FrancescoCasalegno
Contributor Author

@EmilieDel Awesome, so the test score seems to improve significantly!
Just one thing – after #602, didn't we decide to switch to 🤗 Transformers rather than spaCy? Is it possible to see those results?

@FrancescoCasalegno
Contributor Author

FrancescoCasalegno commented Aug 3, 2022

Planning 2022-08-02

  • Find out how to export the corrected annotations in a format compatible with Prodigy's.
  • Save with DVC (remote on GPFS) the original annotations, the annotations after the entity ruler, and the annotations after the entity ruler plus manual correction.
  • Re-train and evaluate the model (k-fold cross-validation) before/after correction. Also inspect which errors are now made (as in Perform Error Analysis of NER model predictions #607 (comment)).
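The k-fold out-of-sample prediction step from the planning above can be sketched generically; `train_fn` and `predict_fn` are placeholders for whatever NER training and inference routines are actually used, not part of the project's API.

```python
import random


def out_of_fold_predictions(docs, labels, train_fn, predict_fn, k=5, seed=0):
    """Give every document a prediction from a model that never saw it.

    train_fn(train_docs, train_labels) -> model
    predict_fn(model, docs) -> list of predictions (one per doc)
    """
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal held-out folds
    preds = [None] * len(docs)
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in idx if i not in held_out]
        model = train_fn([docs[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        for i, p in zip(fold, predict_fn(model, [docs[i] for i in fold])):
            preds[i] = p
    # Positions where preds[i] != labels[i] are the candidates for manual
    # verification (and possible fixes to the original annotations).
    return preds
```

Disagreements between these out-of-fold predictions and the expert labels are exactly the spots the plan proposes to verify by hand.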

@FrancescoCasalegno
Contributor Author

See plot in #608 (comment)
