Feature Request: Implementing Persistent Speaker Embeddings Across Conversations #227
My own solution is the following: patch `OnlineSpeakerClustering` with:

`SpeakerDiarization`:

I run this the following way:

The above writes its output to a Redis queue together with the global speaker embeddings. Note that it's probably suboptimal to save the centroids on every iteration, as they quickly converge to equal values. I would appreciate your feedback!
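The original patch snippets did not survive here, but the persistence idea can be sketched in isolation. The sketch below saves and restores a centroid matrix using NumPy and a JSON file; in the setup described above this would be a push to a Redis queue instead. All function names, the file format, and the embedding dimension are illustrative assumptions, not diart's actual API.

```python
import json
import tempfile
import numpy as np

def save_centroids(path, centroids):
    # Persist the centroid matrix (num_speakers x embedding_dim) as JSON.
    # A plain file keeps the sketch dependency-free; a Redis push would
    # serialize the same payload.
    with open(path, "w") as f:
        json.dump({"centroids": centroids.tolist()}, f)

def load_centroids(path):
    # Restore centroids saved by a previous conversation.
    with open(path) as f:
        return np.array(json.load(f)["centroids"])

# Illustrative usage: 3 speakers with 192-dimensional embeddings (a common
# size for pyannote embedding models; the exact dimension depends on the model).
centroids = np.random.default_rng(0).random((3, 192))
path = tempfile.mktemp(suffix=".json")
save_centroids(path, centroids)
restored = load_centroids(path)
print(np.allclose(centroids, restored))  # → True
```

A second conversation could then call `load_centroids` at startup and seed the clustering block with the restored matrix instead of starting from scratch.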
Hey @DmitriyG228! Thanks for this feature request, your implementation looks very nice! I would only change some minor things. For example, I would prefer not to have a speaker id mapping mechanism in the clustering block. The speaker ids are already numbered according to their centroid if I'm not mistaken (e.g. speaker_0 == centroid 0). However, if we decide to include a mapping structure (I'm willing to be persuaded on this cause I see some advantages), I'd prefer to put it in […].

Apart from that, I also really like the idea of a […]. I would prefer not to add unnecessary dependencies to diart. I would implement the […].

Thank you!
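The "speaker id == centroid index" convention mentioned above can be illustrated with a minimal nearest-centroid lookup. This is a standalone sketch, not diart's actual code; the threshold value and the `-1` convention for unknown speakers are assumptions for illustration.

```python
import numpy as np

def identify_speaker(embedding, centroids, threshold=0.7):
    # The speaker id is simply the row index of the closest centroid,
    # mirroring the "speaker_0 == centroid 0" convention: no separate
    # mapping structure is needed.
    sims = centroids @ embedding / (
        np.linalg.norm(centroids, axis=1) * np.linalg.norm(embedding)
    )
    best = int(np.argmax(sims))
    # Below the similarity threshold, treat the embedding as an unknown speaker.
    return best if sims[best] >= threshold else -1

centroids = np.eye(3)  # toy unit centroids for 3 speakers
print(identify_speaker(np.array([0.9, 0.1, 0.0]), centroids))  # → 0
print(identify_speaker(np.array([0.5, 0.5, 0.5]), centroids))  # → -1
```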
Hey @juanmc2005, thanks for your feedback! Please find the PR.
Feature Description
I propose the addition of a feature to the DIART project that allows for the persistence and reuse of speaker embeddings across multiple conversations. I am willing to contribute to this feature.
Expected Benefit
This would be particularly useful in scenarios where speakers need to be identified over time, across multiple conversations.
Implementation Feasibility
Given the complexity of the speaker embeddings obtained during a conversation, I seek guidance on the technical feasibility of this feature. Specifically, I'm interested in understanding whether the current architecture and design of DIART can support the persistence of speaker embeddings across conversations.
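One way to frame the feasibility question is as a matching problem between the centroids of the current conversation and a persisted set of global centroids. The sketch below maps local speaker ids to persistent global ids by cosine similarity, assigning fresh global ids to unmatched speakers. Every name and the threshold value are hypothetical, introduced only for this sketch.

```python
import numpy as np

def map_local_to_global(local_centroids, global_centroids, threshold=0.6):
    # Map each speaker of the current conversation (a row of local_centroids)
    # to a persistent global speaker id via cosine similarity against the
    # stored global centroids.
    mapping = {}
    next_global = len(global_centroids)
    for i, local in enumerate(local_centroids):
        sims = global_centroids @ local / (
            np.linalg.norm(global_centroids, axis=1) * np.linalg.norm(local) + 1e-9
        )
        j = int(np.argmax(sims))
        if sims[j] >= threshold:
            mapping[i] = j            # re-identified known speaker
        else:
            mapping[i] = next_global  # previously unseen speaker gets a new id
            next_global += 1
    return mapping

# Toy example: local speaker 0 matches global speaker 0; local speaker 1 is new.
global_c = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
local_c = np.array([[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]])
print(map_local_to_global(local_c, global_c))  # → {0: 0, 1: 2}
```

In this framing, persistence reduces to storing the global centroid matrix between sessions and running the mapping step once per conversation, which seems compatible with a streaming architecture.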
Suggested Integration Points
Could you provide insights on which parts of the DIART codebase would be most relevant for integrating this mechanism? Any pointers or suggestions on how to approach this enhancement would be greatly appreciated.
Additional Context
I have reviewed the paper that DIART implements and believe that, although challenging, this feature would be a feasible and valuable addition.
I am eager to contribute to this aspect of the project and align it with DIART's overall goals and design.
Thank you for considering this feature request and for any guidance you can provide.