Performance improvement #9
Confirming here: even on a clean launch (after a laptop restart), the CPU jumps from ~8% to 40-50% whenever I speak. Then sentences get generated very late and often not very accurately. Using default settings. With bigger models it's way worse in terms of latency and CPU usage, but of course better in accuracy. Intel i7 7820 CPU, Windows 10, OBS 29.1.3
Would it be possible to allow GPU usage instead? In general my GPU has more headroom, as I'm mainly streaming games that rely more on the CPU. I see Whisper can run on a GPU, and this also shows better performance with a GPU, if I understand correctly: https://github.com/MiscellaneousStuff/openai-whisper-cpu#results
Yes, I'm working on acceleration for the Whisper.cpp build and I'll open a pull request as soon as I get it working on my PC. There are several options, but the general goal of GGML is to enable running on CPUs with their inherent acceleration, e.g. SIMD. I'm still unpacking this, but it's important to get it right.
@Destroy666x can you try the build in https://github.com/royshil/obs-localvocal/actions/runs/6142210185#artifacts ? It should be much faster.
For me CPU usage seems still rather high with that, maybe a few % lower on average. |
@Destroy666x so there is improvement! That's a good thing. Were you able to benchmark whisper.cpp separately? I think I will merge this in anyway, since it's an improvement.
Well, I think it is, but I don't quite know how to check it consistently, as it ran under different conditions. Similar, but different, as Windows definitely had different random processes (indexers and whatnot) running. But yeah, the improvement was roughly 35-45% CPU compared to the previously reported 40-50% in OBS. As for separately, do you mean checking Whisper's different options outside of this plugin? I can check that when I have time.
I see there's bench.exe.
Haven't found how to do more runs for a consistent test. According to this, it should work better in OBS with the tiny model at least, as I also had a bigger delay with that one. And, interestingly, after increasing threads from 4 to 8, the small model went up to ~15 seconds 🤔
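For repeated runs, a small wrapper that times the command several times and averages the results can make comparisons more consistent. This is a sketch, not part of the plugin; the `bench.exe` path, model filename, and flags in the commented line are placeholders to adapt to your setup:

```python
import statistics
import subprocess
import sys
import time


def bench(cmd, runs=5):
    """Run `cmd` `runs` times and return (mean, stdev) wall-clock seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)


# Hypothetical invocation; adjust binary path, model, and thread count:
# mean, stdev = bench(["bench.exe", "-m", "ggml-tiny.bin", "-t", "4"])
```

Averaging over several runs (and reporting the standard deviation) helps smooth out noise from background processes like the Windows indexer mentioned above.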
thanks for this research @Destroy666x |
Here are some timings I get consistently:

No BLAS
OpenBLAS
CLBlast

I conclude OpenBLAS brings the most performance on my PC.
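For anyone wanting to reproduce these variants, whisper.cpp selects its BLAS backend at build time via CMake options. The flag names below are an assumption based on whisper.cpp of this era and may have been renamed in newer releases, so check the current README:

```shell
# Build whisper.cpp with OpenBLAS (BLAS-accelerated CPU path)
cmake -B build -DWHISPER_OPENBLAS=ON
cmake --build build --config Release

# Or with CLBlast (OpenCL GPU path)
cmake -B build -DWHISPER_CLBLAST=ON
cmake --build build --config Release
```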
With what model and CPU/GPU, out of curiosity? |
This is with an Intel i7-8700T |
And tiny model I assume? Weird that it doesn't use the "real" GPU. |
Yes, this is the tiny model.
Oh, so there's yet another backend, cuBLAS, just for CUDA; I see that now: ggerganov/whisper.cpp#834. I'll test it on my machine too, assuming compilation is as easy as shown there.
This is the timing for Whisper with CUDA.
It is faster than the rest, but not a huge gain over OpenBLAS. The downside with CUDA is that it's so big there's no hope of shipping it with the plugin. And the compatibility is horrendous: e.g. if I compile against CUDA v12.2 and the client has v11.1, it doesn't work.
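For reference, the CUDA-enabled whisper.cpp build is selected with a CMake flag along these lines (an assumption based on whisper.cpp of this period, where the option was called `WHISPER_CUBLAS`; newer releases may use a different name):

```shell
# Build whisper.cpp against the locally installed CUDA toolkit
cmake -B build -DWHISPER_CUBLAS=ON
cmake --build build --config Release
```

Note that the resulting binary links against the CUDA runtime it was compiled with, which is exactly the version-mismatch problem described above.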
For me it was 1.5x+ faster on an NVIDIA GeForce 1080. Perhaps the executable could optionally be provided through a path setting, since there are so many different options? They're compatible with your code, right? Then additional options, like downloading CUDA and compiling that version, could be described in the documentation.
@Destroy666x OK, I've added CUDA building instructions. As soon as CI clears I'm going to merge, since I'd like to release a new version.
#12 has landed and introduced performance improvements |
Hey, I opened this issue because the plugin is extremely slow and CPU-consuming on Linux, please keep it open!
Hi, I'm testing on Debian 12 with OBS 29.1.3 with the pre-set parameters, my 4-thread CPU grinds and I get a randomly generated sentence with a huge delay.
I've looked at Whisper.cpp but I can't correlate the parameters.
Do you have any recommended settings for fast, resource-efficient transcription?
Thanks a lot!