Something feels off with queueing of text results, still #16
Comments
Hi, I think Whisper is responsible for this.
Not too sure about Whisper being fully responsible. Well, it is for the slowness ofc, but e.g. the double responses are suspicious. As for other languages, that doesn't help me, as I want to use it for English.
The "leftovers" can be explained by the small "overlap" between the segments of audio I'm sending to Whisper. It could be dealt with in post-processing, like detecting the "." and not pushing that through.
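A minimal sketch of that kind of post-processing, assuming results arrive as plain strings. The function names `is_leftover` and `filter_results` are hypothetical and not part of the plugin; dropping immediate repeats is an extra assumption on top of the "." check.

```python
import string


def is_leftover(text: str) -> bool:
    """Return True when a result contains no words, e.g. "." or " ... "."""
    stripped = text.strip()
    # An empty string, or one made only of punctuation/whitespace, is a leftover.
    return all(ch in string.punctuation or ch.isspace() for ch in stripped)


def filter_results(results):
    """Yield results worth displaying, skipping leftovers and immediate repeats."""
    previous = None
    for text in results:
        if is_leftover(text):
            continue
        normalized = text.strip().lower()
        if normalized == previous:
            # Assumption: the same text twice in a row comes from the overlap.
            continue
        previous = normalized
        yield text.strip()
```

This only catches exact repeats; a near-miss such as "yellow" after "hello" would still get through and would need some fuzzy comparison.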
In that case, the first thing I'd do would be to allow modifying that 200ms window. And the duration of scanning for voice segments, too. That way you could e.g. decrease the number of Whisper requests, as I personally don't see any need for what appears after the initially scanned sentence when using the plugin.
@Destroy666x you can try the method in #18, where I'm experimenting with "streaming" of the processing with low step sizes. I can add parameters for the overall buffer and the overlap.
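To illustrate how a buffer length and an overlap interact, here is a rough sketch of cutting an audio stream into overlapping windows. It is not the plugin's implementation; `window_bounds`, `buffer_ms`, `overlap_ms` and the 16 kHz default sample rate are all assumptions made for the example.

```python
def window_bounds(total_samples, buffer_ms, overlap_ms, sample_rate=16000):
    """Yield (start, end) sample indices of windows sent for transcription.

    Each window is buffer_ms long and shares overlap_ms with the previous one,
    so the step between window starts is buffer_ms - overlap_ms.
    """
    if not 0 <= overlap_ms < buffer_ms:
        raise ValueError("overlap_ms must be in [0, buffer_ms)")
    buffer_len = sample_rate * buffer_ms // 1000
    step = max(1, sample_rate * (buffer_ms - overlap_ms) // 1000)
    start = 0
    while start < total_samples:
        end = min(start + buffer_len, total_samples)
        yield start, end
        if end == total_samples:
            break  # the last window reached the end of the audio
        start += step


# 3 seconds of audio, 1 s buffer: a 200 ms overlap needs 4 windows, no overlap needs 3.
with_overlap = list(window_bounds(48000, buffer_ms=1000, overlap_ms=200))
without_overlap = list(window_bounds(48000, buffer_ms=1000, overlap_ms=0))
```

A smaller step (larger overlap) means more windows and therefore more transcription requests for the same audio, which is the CPU trade-off to expect when tuning these two values.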
Ok, I'll check tomorrow.
It seems more consistent in terms of the content that's appearing, but it also uses more CPU, ~15% more on average for me. Looking forward to playing with parameters in both modes, maybe that'll result in some "golden middle".
Any news about these? Would love to play around with them now that I see performance improved a bunch, consistently getting < 10% CPU on the small model.
@Destroy666x please check out #66
Thanks, seems fine, I'll check how playing with it goes. I'll close this issue and will make a more accurate one if I find something. BTW, why this choice of min/max values?
I see these issues, with my mediocre CPU/GPU:
When I say "hello", it outputs "hello" correctly after 1 second, but then it sometimes adds something similar, like "yellow", as if it tried to look it up twice. At other times it adds a different phrase you said before that either showed or didn't show earlier. A single dot "." can appear as well. At one point I even got some random "toxic waste" emotes displayed, no clue where that came from, perhaps just an inaccurate result of something I said earlier.
How does this work exactly? Does it scan the audio for X time, then wait for Whisper to answer within a certain timeframe, then output the matched text, if any was reported within the timeframe?