Something feels off with queueing of text results, still #16

Destroy666x · 2023-09-15T18:33:12Z

I see these issues, with my mediocre CPU/GPU:

With the tiny/base models you often get random phrases you haven't said at that moment. E.g. you say "hello", it outputs "hello" correctly after 1 second, but then it sometimes adds something similar like "yellow", as if it tried to look it up twice. At other times it adds a different phrase you said before that either showed or didn't show earlier. A single dot "." can appear as well. At one point I even got some random "toxic waste" emotes displayed, no clue where that came from, perhaps just an inaccurate result of something I said earlier.
With bigger models, nothing gets output most of the time. I don't think they're worth using regardless, since each scan takes at least a few seconds and requires resources, but I'm not sure why only about 1/10th is output, even after a while.

How does this work exactly? Does it scan the audio for X time, then wait for Whisper to answer within a certain timeframe, then output the matched text, if any was reported within that timeframe?

klasadoroty · 2023-09-16T07:28:38Z

Hi, I think Whisper is responsible for this.
You are probably thinking of the English language as you write this.
Here it still works ...
You would have to check in other languages. And this is where the fun comes in.

Destroy666x · 2023-09-16T13:19:27Z

Not too sure about Whisper being fully responsible. Well, it is for the slowness ofc, but e.g. the double responses are suspicious.

As for other languages, that doesn't help me, as I want to use it for English.

royshil · 2023-09-16T21:59:03Z

the "leftovers" can be explained by the small "overlap" between segments of audio i'm sending to Whisper.
effectively copying a bit of the end of the last segment to the beginning of the new one.
this is to avoid "word boundaries", the length is 200ms.
sometimes 200ms can contain words or parts of words that are interpreted incorrectly.
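
A minimal sketch of the overlap idea described above, for illustration only; it is not the plugin's actual code. The function name `segments_with_overlap` and its parameters are hypothetical, and the 16 kHz rate is the sample rate Whisper models expect. Each new segment is prefixed with the tail of the previous one, which is why a word fragment from the end of one segment can show up again in the next result.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio


def segments_with_overlap(
    samples: Sequence[float],
    segment_ms: int,
    overlap_ms: int = 200,
    sample_rate: int = SAMPLE_RATE,
) -> List[List[float]]:
    """Split audio into segments, copying the end of each one onto the next.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    seg_len = segment_ms * sample_rate // 1000
    overlap = overlap_ms * sample_rate // 1000
    if seg_len <= 0 or overlap < 0 or overlap >= seg_len:
        raise ValueError("need 0 <= overlap < segment length")

    segments: List[List[float]] = []
    tail: List[float] = []  # end of the previous segment, re-sent with the next
    for start in range(0, len(samples), seg_len):
        fresh = list(samples[start:start + seg_len])
        segments.append(tail + fresh)
        tail = fresh[-overlap:] if overlap else []
    return segments
```

With a 200ms overlap at 16 kHz, every segment after the first starts with 3200 samples that were already sent once, so anything audible in that window gets transcribed twice.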

it could be dealt with in post-processing, like detecting the "." and not pushing that through
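
As a rough sketch of that kind of post-processing (the helper names `is_leftover` and `filter_results` are made up for this example and are not part of the plugin), results that consist only of punctuation or whitespace could be dropped before they are displayed:

```python
import string
from typing import Iterable, List


def is_leftover(text: str) -> bool:
    """Return True for results that are empty or punctuation-only, e.g. "."."""
    stripped = text.strip()
    return all(ch in string.punctuation or ch.isspace() for ch in stripped)


def filter_results(results: Iterable[str]) -> List[str]:
    """Keep only results that contain at least one non-punctuation character."""
    return [r for r in results if not is_leftover(r)]
```

This only handles the stray "." case. It would not catch a misheard word such as "yellow" after "hello", which would need some comparison against the previous result.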

Destroy666x · 2023-09-16T22:13:48Z

In that case, the first thing I'd do is allow modifying that 200ms window, and the duration of scanning for voice segments too. That way you could e.g. decrease the number of Whisper requests, as I personally don't see any need for what appears after the initially scanned sentence when using the plugin.

royshil · 2023-09-16T22:21:32Z

@Destroy666x you can try the method in #18 where i'm experimenting with "streaming" of the processing with low step sizes
the overlap there is also reduced to 100ms

i can add parameters for the overall buffer and the overlap
but i experienced some crashes with low size buffers, since there's theoretically a low limit for Whisper. I believe it's 1 second
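
One way such parameters could be guarded, sketched under assumptions: the 1 second floor comes from the "I believe it's 1 second" remark above and is not verified, the rule capping the overlap at half the buffer is an arbitrary choice for this example, and `clamp_buffer_settings` is a hypothetical name.

```python
from typing import Tuple

MIN_BUFFER_MS = 1000  # assumption: taken from the unverified "1 second" remark


def clamp_buffer_settings(
    buffer_ms: int,
    overlap_ms: int,
    min_buffer_ms: int = MIN_BUFFER_MS,
) -> Tuple[int, int]:
    """Return (buffer_ms, overlap_ms) adjusted into a safe range."""
    # Never go below the assumed minimum buffer length.
    safe_buffer = max(buffer_ms, min_buffer_ms)
    # Keep the overlap non-negative and at most half the buffer (arbitrary cap).
    safe_overlap = min(max(overlap_ms, 0), safe_buffer // 2)
    return safe_buffer, safe_overlap
```

For example, a requested 500ms buffer with a 100ms overlap would be raised to 1000ms while the overlap stays at 100ms.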

Destroy666x · 2023-09-16T22:29:06Z

Ok, I'll check tomorrow

Destroy666x · 2023-09-18T00:36:08Z

It seems more consistent in terms of content that's appearing, but also uses more CPU, ~15% more on average for me.

Looking forward to playing with the parameters in both modes, maybe that'll result in some "golden middle".

Destroy666x · 2024-01-26T03:24:50Z

> i can add parameters for the overall buffer and the overlap

Any news about these? Would love to play around with them now that I see performance improved a bunch, getting consistent < 10% CPU on small model

royshil · 2024-01-26T20:39:59Z

@Destroy666x please check out #66

Destroy666x · 2024-01-26T21:26:54Z

Thanks, seems fine, I'll check how playing with it goes. I'll close this issue and will make a more accurate one if I find something.

BTW, why this choice of min/max values?

royshil added the stale label Nov 5, 2023

Destroy666x closed this as completed Jan 26, 2024
