
feature #2

Open
Abdulrahman392011 opened this issue Oct 30, 2024 · 0 comments

@Abdulrahman392011

Hey guys,

I have been trying to use a local language model by mounting a laptop motherboard inside a custom 3D-printed necklace. The idea is to run Whisper and have it with me at all times. In practice, however, the language model gets confused about who said what: Whisper transcribes everything it hears from everyone without indicating that different people are speaking, so the model processes one big lump of text and loses the thread.

That brings me here. Whisper does provide a time frame for when each sentence was said. I now want a complementary voiceprint (speaker diarization) component that can detect the different voices, trim the audio into separate parts, and label each part with the speaker's name, so the language model can follow the context of the conversation better.
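As a concrete sketch of the idea: once a diarization model (e.g. something like pyannote) produces speaker turns with timestamps, Whisper's per-segment timestamps can be matched against them by overlap. The segment and turn data below are hypothetical placeholders, and `label_segments` is a made-up helper name, not part of Whisper or any diarization library:

```python
# Sketch: attribute Whisper's timestamped segments to speakers by
# matching each segment against diarization turns and picking the
# speaker whose turn overlaps the segment the most.

def overlap(a_start, a_end, b_start, b_end):
    """Length (in seconds) of the overlap between two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def label_segments(segments, turns):
    """segments: [(start, end, text)] from Whisper;
    turns: [(start, end, speaker)] from a diarization model.
    Returns [(speaker, text)], attributing each segment to the
    speaker whose turn covers most of it."""
    labeled = []
    for seg_start, seg_end, text in segments:
        best = max(turns, key=lambda t: overlap(seg_start, seg_end, t[0], t[1]))
        labeled.append((best[2], text))
    return labeled

if __name__ == "__main__":
    # Hypothetical example data.
    segments = [(0.0, 2.5, "Hey, how are you?"), (2.6, 5.0, "Fine, thanks.")]
    turns = [(0.0, 2.5, "SPEAKER_00"), (2.5, 5.0, "SPEAKER_01")]
    for speaker, text in label_segments(segments, turns):
        print(f"{speaker}: {text}")
    # → SPEAKER_00: Hey, how are you?
    # → SPEAKER_01: Fine, thanks.
```

The labeled lines could then be fed to the language model as a dialogue transcript, which should resolve the "one big lump of text" problem.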

I just need you guys to follow along with me as I build on what you built, in case I need help.
