
Something feels off with queueing of text results, still #16

Closed
Destroy666x opened this issue Sep 15, 2023 · 10 comments

Destroy666x (Contributor) commented Sep 15, 2023

I see these issues with my mediocre CPU/GPU:

  1. With the tiny/base model you often get random phrases you haven't said at that moment. E.g. you say "hello", it correctly outputs "hello" after a second, but then it sometimes adds something similar, like "yellow", as if it transcribed the same audio twice. At other times it adds a different phrase you said earlier, whether or not it was already displayed. A single dot "." can appear as well. At one point I even got some random "toxic waste" emotes displayed, no clue where that came from; perhaps just an inaccurate result of something I said earlier.
  2. With bigger models, most of the time nothing gets output at all. I don't think they're worth using anyway, since they need at least a few seconds of processing that eats resources, but I'm not sure why only about 1/10th of the speech is output, even after a while.

How does this work exactly? Does it scan the audio for X time, wait for Whisper to answer within a certain timeframe, and then output the matched text, if any was reported within that timeframe?

klasadoroty commented

Hi, I think Whisper itself is responsible for this.
You're probably thinking of English as you write this; for English it still works.
You'd have to try other languages, and that's where the fun begins.

Destroy666x (Author) commented

I'm not so sure Whisper is fully responsible. It is for the slowness, of course, but e.g. the double responses are suspicious.

As for other languages, that doesn't help me, as I want to use it for English.

royshil (Collaborator) commented Sep 16, 2023

The "leftovers" can be explained by the small overlap between the segments of audio I'm sending to Whisper: effectively, a bit of the end of the last segment is copied to the beginning of the new one.
This is done to avoid cutting words at segment boundaries; the overlap length is 200 ms.
Sometimes those 200 ms contain words, or parts of words, that are interpreted incorrectly.
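The overlap scheme described above can be sketched roughly like this (illustrative Python, not the plugin's actual C++ code; names and the 1-second chunk size are made up for the example):

```python
# Sketch of the overlap idea: each chunk sent to Whisper starts with the
# tail of the previous chunk, so words spanning a boundary aren't cut in half.

SAMPLE_RATE = 16000                              # Whisper expects 16 kHz mono
OVERLAP_MS = 200                                 # overlap length mentioned above
OVERLAP_SAMPLES = SAMPLE_RATE * OVERLAP_MS // 1000

def make_chunks(samples, chunk_samples):
    """Split `samples` into chunks, prepending the last OVERLAP_SAMPLES
    of the previous chunk to each new one."""
    chunks = []
    tail = []
    pos = 0
    while pos < len(samples):
        body = samples[pos:pos + chunk_samples]
        chunks.append(tail + body)
        tail = body[-OVERLAP_SAMPLES:]           # carried into the next chunk
        pos += chunk_samples
    return chunks

# With a 1-second chunk size, every chunk after the first is 1.2 s long,
# and its first 200 ms repeat audio Whisper has already seen.
chunks = make_chunks(list(range(SAMPLE_RATE * 3)), SAMPLE_RATE)
```

That repeated 200 ms slice is exactly where the spurious "yellow after hello" style duplicates can come from: Whisper transcribes the same fragment twice, once per chunk.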

It could be dealt with in post-processing, like detecting a lone "." and not pushing it through.
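A post-processing filter along those lines could look like this (a sketch under my own assumptions, not the plugin's implementation; `should_push` and its duplicate check are hypothetical):

```python
import string

def should_push(text, last_text):
    """Decide whether a Whisper result should be displayed.

    Drops results that are empty, consist only of punctuation (e.g. a lone
    "." produced from the overlap region), or merely repeat the previous
    result (the "transcribed twice" symptom described in this issue)."""
    stripped = text.strip()
    if not stripped:
        return False
    if all(ch in string.punctuation for ch in stripped):
        return False                  # e.g. "." leftover from the overlap
    if stripped.lower() == last_text.strip().lower():
        return False                  # duplicate of the previous segment
    return True
```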

Destroy666x (Author) commented

In that case, the first thing I'd do would be to allow modifying that 200 ms window, and the duration of scanning for voice segments too. That way you could e.g. decrease the number of Whisper requests; personally, when using the plugin, I don't see any need for what appears after the initially scanned sentence.

royshil (Collaborator) commented Sep 16, 2023

@Destroy666x you can try the method in #18, where I'm experimenting with "streaming" the processing with low step sizes.
The overlap there is also reduced to 100 ms.

I can add parameters for the overall buffer and the overlap,
but I experienced some crashes with small buffers, since there's theoretically a lower limit for Whisper. I believe it's 1 second.

Destroy666x (Author) commented

Ok, I'll check tomorrow

Destroy666x (Author) commented

It seems more consistent in terms of the content that appears, but it also uses more CPU, ~15% more on average for me.

Looking forward to playing with the parameters in both modes; maybe that'll result in some "golden middle".

royshil added the stale label Nov 5, 2023
Destroy666x (Author) commented Jan 26, 2024

> i can add parameters for the overall buffer and the overlap

Any news about these? I'd love to play around with them now that performance has improved a bunch; I'm getting a consistent < 10% CPU on the small model.

royshil (Collaborator) commented Jan 26, 2024

@Destroy666x please check out #66

Destroy666x (Author) commented

Thanks, seems fine, I'll check how playing with it goes. I'll close this issue and make a more accurate one if I find something.

BTW, why this choice of min/max values?
