Speaker reidentification #1383

Vermeille · 2023-05-24T12:17:24Z

Hello,

The project I am currently working on needs to identify speakers. Having the audio segments labeled "Speaker 0" to "Speaker N" is cool, but labels such as "Yann LeCun", "Yoshua Bengio", etc. are much better.

I am willing to contribute the feature and was thinking of implementing it the following way:

1. Have the user provide a directory of speaker samples (`speakers/Yann_LeCun.wav`, `speakers/Yoshua_Bengio.wav`, etc.).
2. Compute one speaker embedding per file (time average pooling? max pooling? Attention is out of the equation since it needs)
3. Optional: store the embeddings on disk (safetensors).
4. When doing the speaker clustering, assign each reference embedding to a centroid (Hungarian algorithm?) and use the corresponding names as speaker labels.
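A minimal sketch of steps 2 and 4, assuming the frame-level embeddings of each reference file are already available as a NumPy array of shape `(num_frames, dim)` and that the clustering step exposes one centroid per cluster. `pool_frames`, `assign_names` and the `max_distance` cut-off are hypothetical names and values, not pyannote API. The matching uses SciPy's `linear_sum_assignment`, which solves the same one-to-one assignment problem as the Hungarian algorithm, on cosine distances.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def pool_frames(frame_embeddings: np.ndarray, mode: str = "mean") -> np.ndarray:
    """Collapse (num_frames, dim) frame-level embeddings into one L2-normalized vector."""
    if mode == "mean":
        pooled = frame_embeddings.mean(axis=0)  # time average pooling
    elif mode == "max":
        pooled = frame_embeddings.max(axis=0)  # time max pooling
    else:
        raise ValueError(f"unknown pooling mode: {mode}")
    return pooled / np.linalg.norm(pooled)


def assign_names(cluster_centroids, reference_embeddings, reference_names, max_distance=0.7):
    """Map cluster indices to reference names with a one-to-one assignment.

    max_distance is a hypothetical cosine-distance cut-off: clusters that are
    left unmatched, or matched farther than this, are absent from the result
    and keep their anonymous label.
    """
    c = cluster_centroids / np.linalg.norm(cluster_centroids, axis=1, keepdims=True)
    r = reference_embeddings / np.linalg.norm(reference_embeddings, axis=1, keepdims=True)
    cost = 1.0 - c @ r.T  # cosine distance, shape (num_clusters, num_references)
    rows, cols = linear_sum_assignment(cost)  # also handles rectangular matrices
    return {
        int(i): reference_names[j]
        for i, j in zip(rows, cols)
        if cost[i, j] <= max_distance
    }


refs = np.stack([
    pool_frames(np.array([[1.0, 0.0], [0.8, 0.2]])),  # e.g. from speakers/Yann_LeCun.wav
    pool_frames(np.array([[0.0, 1.0]])),              # e.g. from speakers/Yoshua_Bengio.wav
])
centroids = np.array([[0.1, 0.9], [0.9, 0.1], [-1.0, 0.0]])
print(assign_names(centroids, refs, ["Yann_LeCun", "Yoshua_Bengio"]))
# {0: 'Yoshua_Bengio', 1: 'Yann_LeCun'}; cluster 2 stays anonymous
```

The one-to-one assignment guarantees that two clusters never receive the same name, and the distance cut-off keeps speakers without a reference sample anonymous.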

What do you think?

Related to #1310. This could be a starting point for implementing it and would kill two birds with one stone.


github-actions · 2023-05-24T12:17:44Z

We found the following entry in the FAQ which you may find helpful:

Does pyannote support streaming speaker diarization?

Feel free to close this issue if you found an answer in the FAQ. Otherwise, please give us a little time to review.

This is an automated reply, generated by FAQtory

hbredin · 2023-05-25T15:00:05Z

Thanks. This is definitely a recurrent request and would make a nice addition to pyannote.

The related ongoing PR by @flyingleafe would make a good starting point.
It makes the speaker diarization pipeline output one embedding per speaker:

```python
diarization, embedding = pipeline("audio.wav", return_embedding=True)
```

You could maybe start by having a look at this PR and the three of us (@flyingleafe, @Vermeille, and I) can continue the discussion there.

hbredin · 2023-05-25T15:02:17Z

To back up my "recurrent request" claim, here is a poll that I recently posted on Twitter (where speaker tracking ~= speaker reidentification):

https://twitter.com/hbredin/status/1647146368782270464

flyingleafe · 2023-05-29T09:00:51Z

@hbredin By the way, I think I addressed all your comments on the mentioned PR a long time ago. If you have no further comments, maybe we can merge it so that people can start using this apparently much-requested feature?

flyingleafe · 2023-05-29T09:04:29Z

@Vermeille In the meantime, yes, you can use the PR branch: simply compare the embeddings returned by the pipeline with your known embeddings using cosine distance, then relabel the annotation accordingly.
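A minimal sketch of that comparison, assuming the pipeline returns the speaker embeddings as a NumPy array whose row `i` belongs to the `i`-th label of the diarization output, and that the known embeddings are stored in a dict keyed by name. `relabel_mapping` and the `threshold` value are hypothetical. Unmatched speakers are mapped to their own label so the mapping covers every speaker.

```python
import numpy as np


def relabel_mapping(labels, embeddings, known, threshold=0.5):
    """Build {diarization_label: new_label} by nearest cosine distance.

    labels: speaker labels of the diarization output, e.g. ["SPEAKER_00", ...]
    embeddings: array of shape (len(labels), dim), row i belonging to labels[i]
    known: dict {name: reference embedding of shape (dim,)}
    threshold: hypothetical cosine-distance cut-off; farther speakers keep their label
    """
    names = list(known)
    refs = np.stack([known[n] for n in names])
    refs = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    embs = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    distances = 1.0 - embs @ refs.T  # shape (num_speakers, num_known)
    mapping = {}
    for i, label in enumerate(labels):
        j = int(np.argmin(distances[i]))  # closest known speaker
        mapping[label] = names[j] if distances[i, j] <= threshold else label
    return mapping


known = {"Yann_LeCun": np.array([1.0, 0.0]), "Yoshua_Bengio": np.array([0.0, 1.0])}
mapping = relabel_mapping(["SPEAKER_00", "SPEAKER_01"], np.array([[0.0, 2.0], [-1.0, -1.0]]), known)
print(mapping)
```

The resulting dict can then be passed to `pyannote.core.Annotation.rename_labels(mapping=...)`. Note that this nearest-neighbour rule is per speaker, so two diarized speakers can end up with the same name; a one-to-one assignment avoids that.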

stale · 2023-11-25T10:02:41Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Nov 25, 2023

stale bot closed this as completed Dec 26, 2023

razi-tm mentioned this issue Nov 5, 2024 in "Allow the user to extract speaker embeddings along with the diarization #1346" (Merged)
