Feasibility of the implementation of a Speaker Enrollment pipeline. #391
Comments
Can you please define "speaker unrolling"? I am not familiar with this wording. Did you mean "speaker enrollment"?
Hello, yes, I meant speaker enrollment.
I didn't go through the paper, but I think the SpeechTurnClosestAssignment pipeline might get you started.
Enrollment
Basic idea: gather all speaker embeddings for each target speaker and take their average.
pyannote-audio/pyannote/audio/pipeline/speech_turn_assignment.py Lines 94 to 113 in 06f76a2
Recognition
Basic idea: for each test speech turn (or, here, speaker cluster), find the closest target speaker by comparing their average embeddings. You might also want to consider a reject option if even the closest target speaker is too far away.
pyannote-audio/pyannote/audio/pipeline/speech_turn_assignment.py Lines 115 to 144 in 06f76a2
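For readers who want to see the enrollment and recognition ideas in isolation, here is a minimal sketch in plain Python. It is not the pyannote-audio implementation: the function names `enroll`, `cosine_distance` and `recognize`, the toy 2-dimensional embeddings, and the `threshold=0.5` default are all assumptions made for illustration. In practice the embeddings would come from a speaker embedding model and the threshold would need tuning on held-out data.

```python
import math


def enroll(embeddings_by_speaker):
    """Enrollment: average all embeddings of each target speaker.

    `embeddings_by_speaker` maps a speaker label to a list of
    equal-length embedding vectors (lists of floats).
    """
    enrolled = {}
    for speaker, embeddings in embeddings_by_speaker.items():
        dim = len(embeddings[0])
        enrolled[speaker] = [
            sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)
        ]
    return enrolled


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def recognize(test_embedding, enrolled, threshold=0.5):
    """Recognition: return the closest enrolled speaker, or None (reject).

    `threshold` is a hypothetical value; the closest target is rejected
    when its distance to the test embedding exceeds it.
    """
    distances = {
        speaker: cosine_distance(test_embedding, reference)
        for speaker, reference in enrolled.items()
    }
    closest = min(distances, key=distances.get)
    if distances[closest] > threshold:
        return None  # even the closest target speaker is too far away
    return closest


# Toy data: two enrolled speakers with two embeddings each.
enrolled = enroll({
    "alice": [[1.0, 0.0], [0.9, 0.1]],
    "bob": [[0.0, 1.0], [0.1, 0.9]],
})
print(recognize([0.8, 0.2], enrolled))    # close to alice's average
print(recognize([-1.0, -1.0], enrolled))  # far from both targets, rejected
```

The test embedding here could equally be the average embedding of a whole speaker cluster rather than a single speech turn, which matches the cluster-level comparison described above.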
Amazing! Thank you! We will let you know how this goes and if it works on our 'special' data.
Closing this issue as I believe the original question has been answered.
Hey! This is still ongoing. We have the pipeline up and running, but we are having a hard time correctly fine-tuning a speaker embedding model on our smallish dataset 😿
Is your feature request related to a problem? Please describe.
The title is pretty self-explanatory. I'd just like to know how much work would be needed to implement a pipeline for a speaker unrolling task: in your opinion, are all the required building blocks already here? If there isn't too much digging involved, I'd probably be willing to do it myself :)