Performance improvement #9
Confirming here: even on a clean launch (after a laptop restart), the CPU jumps from ~8% to 40-50% whenever I speak. Then sentences get generated very late and often not very accurately. Using default settings. With bigger models it's way worse in terms of latency and CPU usage, but of course better in accuracy. Intel i7 7820 CPU, Windows 10, OBS 29.1.3
Would it be possible to allow GPU usage instead? In general my GPU has more headroom, as I'm mainly streaming games that rely more on the CPU. I see Whisper can run on a GPU, and this also shows better performance with a GPU, if I understand correctly: https://github.com/MiscellaneousStuff/openai-whisper-cpu#results
Yes, I'm working on acceleration for the Whisper.cpp build and I'll open a pull request as soon as I get it working on my PC. There are several options, but the general goal of GGML is to enable running on CPUs with their inherent acceleration, e.g. SIMD. I'm still unpacking this, but it's important to get it right.
@Destroy666x can you try the build in https://github.com/royshil/obs-localvocal/actions/runs/6142210185#artifacts ? It should be much faster.
For me CPU usage seems still rather high with that, maybe a few % lower on average. |
@Destroy666x so there is improvement! That's a good thing. Were you able to benchmark whisper.cpp separately? I think I will merge this in anyway, since it's an improvement.
Well, I think it is, but I don't quite know how to check it consistently, as it ran under different conditions. Similar, but different, as Windows definitely had different random processes (indexers and whatnot) running. But yeah, the improvement was roughly 35-45% CPU compared to the previously reported 40-50% in OBS. As for separately, do you mean checking Whisper's different options outside of this plugin? I can check that when I have time.
I see there's bench.exe.
Haven't found how to do more runs for a consistent test. According to this, it should work better in OBS with the tiny model at least, as I also had a bigger delay with that one. And, interestingly, after increasing threads from 4 to 8, the small model went up to ~15 seconds 🤔
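For repeated runs, a small wrapper that times the command several times and averages the results can make comparisons more consistent. This is a sketch, not part of the plugin; the `bench.exe` path, model filename, and flags in the commented line are placeholders to adapt to your setup:

```python
import statistics
import subprocess
import sys
import time


def bench(cmd, runs=5):
    """Run `cmd` `runs` times and return (mean, stdev) wall-clock seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)


# Hypothetical invocation; adjust binary path, model, and thread count:
# mean, stdev = bench(["bench.exe", "-m", "ggml-tiny.bin", "-t", "4"])
```

Averaging over several runs (and reporting the standard deviation) helps smooth out noise from background processes like the Windows indexer mentioned above.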
thanks for this research @Destroy666x |
Here are some timings I get consistently:

No BLAS
OpenBLAS
CLBlast

I conclude OpenBLAS brings the most performance on my PC.
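For anyone wanting to reproduce these variants, whisper.cpp selects its BLAS backend at build time via CMake options. The flag names below are an assumption based on whisper.cpp of this era and may have been renamed in newer releases, so check the current README:

```shell
# Build whisper.cpp with OpenBLAS (BLAS-accelerated CPU path)
cmake -B build -DWHISPER_OPENBLAS=ON
cmake --build build --config Release

# Or with CLBlast (OpenCL GPU path)
cmake -B build -DWHISPER_CLBLAST=ON
cmake --build build --config Release
```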
With what model and CPU/GPU, out of curiosity? |
This is with an Intel i7-8700T |
And tiny model I assume? Weird that it doesn't use the "real" GPU. |
Yes, this is the tiny model.
Oh, so there's yet another backend, cuBLAS, just for CUDA; I see that now: ggerganov/whisper.cpp#834. I'll test it on my machine too, assuming compilation is as easy as shown there.
This is the timing for Whisper with CUDA.
It is faster than the rest, but not a huge gain over OpenBLAS. The downside with CUDA is that it's so big there's no hope of shipping it with the plugin. And the compatibility is horrendous: e.g. if I compile against CUDA v12.2 and the client has v11.1, it doesn't work.
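For reference, the CUDA-enabled whisper.cpp build is selected with a CMake flag along these lines (an assumption based on whisper.cpp of this period, where the option was called `WHISPER_CUBLAS`; newer releases may use a different name):

```shell
# Build whisper.cpp against the locally installed CUDA toolkit
cmake -B build -DWHISPER_CUBLAS=ON
cmake --build build --config Release
```

Note that the resulting binary links against the CUDA runtime it was compiled with, which is exactly the version-mismatch problem described above.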
For me it was 1.5x+ faster on an NVIDIA GeForce 1080. Perhaps the executable could optionally be provided through a path setting, since there are so many different options? They're compatible with your code, right? Then additional options, like downloading CUDA and compiling that version, could be described in the documentation.
@Destroy666x OK, I've added CUDA building instructions. As soon as CI clears I'm going to merge, since I'd like to release a new version.
#12 has landed and introduced performance improvements |
Hey, I opened this issue because the plugin is extremely slow and CPU-consuming on Linux, please keep it open!
Hi, I'm testing on Debian 12 with OBS 29.1.3 with the pre-set parameters, my 4-thread CPU grinds and I get a randomly generated sentence with a huge delay.
I've looked at Whisper.cpp but I can't correlate the parameters.
Do you have any recommended settings for fast, resource-efficient transcription?
Thanks a lot!