Issue understanding the outputs of coverage and purity metrics #56

ckobus · 2021-11-23T13:59:11Z

First, thank you for this open-source project!

I have started looking at speaker change detection algorithms and discovered this open-source project.
As I am a newbie in this field, I am still struggling to understand which measure to use to evaluate a speaker change detection module.
The coverage and purity measures are well explained on this page: https://pyannote.github.io/pyannote-metrics/reference.html

I had a look at a previous issue from someone mentioning that he always gets a purity of 100% even though his system is not perfect.
Someone replied that he should rather use DiarizationPurity and DiarizationCoverage for a speaker change detection task, which is the task I want to perform.

I tried them on a toy example:

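from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationPurity, DiarizationCoverage

purity = DiarizationPurity()
coverage = DiarizationCoverage()

# Reference: speaker "a" on [1, 2] and speaker "b" on [3, 5]; nothing on [2, 3]
reference = Annotation()
reference[Segment(1, 2)] = "a"
reference[Segment(3, 5)] = "b"

# Hypothesis: a single segment "A" spanning [1, 5]
hypothesis = Annotation()
hypothesis[Segment(1, 5)] = "A"

# Evaluate both metrics on the toy example
print(purity(reference, hypothesis))
print(coverage(reference, hypothesis))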

I get a purity of 66.66%, whereas I would expect a purity of 50%: for the hypothesis segment A, the most covering reference segment is b with an overlap of 2, so purity = 2/4 = 0.5.

Could you explain where I am wrong in my understanding? And tell me how I should use those metrics?

Thank you in advance for the advice and explanations!

hbredin · 2021-11-24T20:48:55Z

Maintainer

Purity is computed on the temporal support common to reference and hypothesis.
In your case, this means that the interval [2, 3] is excluded from the computation (since it is not covered by the reference).
To obtain the behavior that you expected, you would have to fill the [2, 3] gap in the reference with a fake non_speech segment:

reference[Segment(2, 3)] = "non_speech"
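
To make the arithmetic concrete, here is a minimal plain-Python sketch of purity restricted to the common support. It is not pyannote's implementation: the `overlap` and `purity` helpers are hypothetical names introduced for illustration, and they assume that segments within one annotation do not overlap each other. On the toy example, it gives 2/3 with the gap and 2/4 once the gap is filled.

```python
# Plain-Python sketch of the purity arithmetic; this is NOT pyannote's code.
# The helper names below are hypothetical and exist only for illustration.

def overlap(s, t):
    """Duration of the intersection of two (start, end) segments."""
    return max(0.0, min(s[1], t[1]) - max(s[0], t[0]))


def purity(reference, hypothesis):
    """Both arguments are lists of ((start, end), label) pairs."""
    # Accumulate overlap duration per (hypothesis label, reference label).
    cooccurrence = {}
    for h_seg, h_label in hypothesis:
        for r_seg, r_label in reference:
            row = cooccurrence.setdefault(h_label, {})
            row[r_label] = row.get(r_label, 0.0) + overlap(h_seg, r_seg)
    # Numerator: each hypothesis label is credited with its best reference label.
    best = sum(max(row.values()) for row in cooccurrence.values())
    # Denominator: only time covered by both reference and hypothesis counts.
    common = sum(sum(row.values()) for row in cooccurrence.values())
    return best / common


reference = [((1, 2), "a"), ((3, 5), "b")]
hypothesis = [((1, 5), "A")]

# [2, 3] is not in the reference, so the common support lasts 3: purity = 2 / 3
purity_with_gap = purity(reference, hypothesis)

# Filling the gap extends the common support to 4: purity = 2 / 4
reference_filled = reference + [((2, 3), "non_speech")]
purity_filled = purity(reference_filled, hypothesis)

print(purity_with_gap, purity_filled)
```

The only difference between the two results is the denominator: the best-matching overlap stays at 2 (segment b), while the duration taken into account grows from 3 to 4.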
