Whisper on webGPU? #100

sandorkonya · 2023-04-25T09:34:10Z

Somewhat related to this thread.

Is it within scope to implement a webGPU accelerated version of Whisper?

Not sure if this helps, but there is a C port for Whisper wirh CPU implementation, and as mentioned in this discussion, the main thing that needs to be offloaded to the GPU is the GGML_OP_MUL_MAT operator.

DK013 · 2023-04-25T11:55:36Z

Is it within scope to implement a webGPU accelerated version of Whisper?

As I understand, it's simply a matter of changing the Execution provider now to JSEP. The C++ port uses GGML format for the model and this repo uses onnx models alongside onnxruntime to run infrence. Both implementations are different. And with the WebGPU support for onnxruntime (check this PR: [js/web] WebGPU backend via JSEP #14579) which was merged today and official release build will come soon enough, I believe we don't have to worry about CUDA or DirectML endpoints, JSEP does the work for us. It's only a matter of updating the onnxruntime dependency and using JSEP for execution provider.

@xenova correct me if I'm wrong.

xenova · 2023-04-25T12:02:47Z

Is it within scope to implement a webGPU accelerated version of Whisper?

As I understand, it's simply a matter of changing the Execution provider now to JSEP. The C++ port uses GGML format for the model and this repo uses onnx models alongside onnxruntime to run infrence. Both implementations are different. And with the WebGPU support for onnxruntime (check this PR: [js/web] WebGPU backend via JSEP #14579) which was merged today and official release build will come soon enough, I believe we don't have to worry about CUDA or DirectML endpoints, JSEP does the work for us. It's only a matter of updating the onnxruntime dependency and using JSEP for execution provider.

@xenova correct me if I'm wrong.

Yep, that's correct! It should be as simple as changing the execution provided to webgpu (vs. wasm)

Hopefully they will make the release soon, but in the meantime, I'll do some testing by building the main branch locally.

sandorkonya · 2023-04-25T13:46:08Z

@DK013 & @xenova thank you for the clarification!

I would like to find a way to utilize the GPUs on edge devices (Android mobile) for inference.

As far i understand (as for now) webGPU works on Windows & iOS (my assumption based on this blog post), so we have to wait until webGPU targets the Android devices too?

Or am I simply wrong and onnxruntime won't be the way for edge devices?

best regards

xenova · 2023-04-25T13:53:08Z

Yes, you are correct. WebGPU would need to be available in your browser, as onnxruntime just uses the api provided by the browser.

That said, you might not have to wait for very long. As stated in the blog post you linked: "This initial release of WebGPU is available on ChromeOS, macOS, and Windows. Support for other platforms is coming later this year." If you'd like to test while you develop (so you can be ready when it releases fully), you can test using Chrome canary. As demoed here, some users have already got webgpu running on their android devices with this browser (which is just an experimental version of chrome)

drcodecamp · 2023-09-28T07:21:15Z

@xenova how we can use gpu power when we use nodejs ?

i try to build a local server with node, all works but very slow on an AMD 5950X , i would like to use my RTX 4070TI
to transcribe but i couldnt find any document that talks about it

Dolidodzik · 2023-11-22T15:49:42Z

@xenova, are there any news? Will we be allowed to use webgpu with transformers.js any time soon?

gabrielgrant · 2023-11-28T18:07:51Z

AFAIU onnx runtime's support for WebGPU is still pretty minimal/experimental, so likely isn't able to run Whisper today

Overview issue is here: microsoft/onnxruntime#15796

There doesn't seem to be much up-to-date detailed documentation about the current status publicly available, but as of May many operators were still yet to be ported: microsoft/onnxruntime#15952

guschmue · 2023-12-07T22:49:03Z

ort-web on webgpu has now good ops coverage and we can run most models that transformers.js supports. whisper is fine, it is part of our test suite.
The reason why we have not been more public about it is that we still have a performance issue with generative decoders that go 1 token at a time (ie whisper decoder, t5-decoder).
We are debugging that, don't know what the cause is but we are sure it is not the shaders.
All encoders and vision models should have good perf gains.
Supported ops can be found here: https://github.com/microsoft/onnxruntime/blob/main/js/web/docs/webgpu-operators.md

gabrielgrant · 2023-12-08T00:04:45Z

thanks for the update @guschmue !

Is there a GH issue for the problem you're describing? Is it this? microsoft/onnxruntime#17373

guschmue · 2023-12-08T01:05:15Z

That issue contains a couple of problems, like missing ops resulted in cross device copies and missing io-bindings resulted in a lot of cross device copies. I think we fix most of those. But this decoder issue has been in this too. Ie the io-bindings should have gained much more than they did.
Nasty issue, lots of gpu cycle available, kernel times look good, little cross device copy yet 2x slower than we want. Top of our list.
I can file a separate issue.

guschmue · 2023-12-08T01:17:18Z

microsoft/onnxruntime#18754

gokaybiz · 2023-12-08T14:43:17Z

What about node.js? Will webGPU/GPU acceleration be available on server/desktop side w/o browser?

tarekziade · 2023-12-17T14:36:05Z

@xenova I am curious to try. Do you have builds with WebGPU ?

I've built onnxruntime with the jsep option but I am not entirely sure what are the spots to change in transformers.js - is it as simple as executionProviders to ort.InferenceSession.create ?

DavidGOrtega · 2024-01-16T10:30:33Z

Additionally another optimization should be done: STFT

nmstoker · 2024-06-09T20:49:20Z

For anyone coming here who didn't see it yet, there is webGPU support now thanks to Xenova's efforts described here

Code in this branch: https://github.com/xenova/whisper-web/tree/experimental-webgpu

guschmue · 2024-06-10T16:21:33Z

What about node.js? Will webGPU/GPU acceleration be available on server/desktop side w/o browser?

There is some experimental code path in dawn that one could use to make onnxruntime work with webgpu on node.js.
But we are not sure if people would use that path since onnxruntime-node already supports cuda and directml which is faster than webgpu.

sandorkonya added the question Further information is requested label Apr 25, 2023

sandorkonya mentioned this issue Apr 25, 2023

Whisper in web-llm with WebGPU? mlc-ai/web-llm#68

Open

sandorkonya mentioned this issue Jun 12, 2023

[Question] A WebGPU-accelerated ONNX inference run-time #119

Closed

jozefchutka mentioned this issue Jul 7, 2023

WebGPU Support #20

Closed

xenova mentioned this issue Jan 27, 2024

🚀🚀🚀 Transformers.js V3 🚀🚀🚀 #545

Merged

13 tasks

xenova closed this as completed in #545 Oct 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Whisper on webGPU? #100

Whisper on webGPU? #100

sandorkonya commented Apr 25, 2023

DK013 commented Apr 25, 2023 •

edited

Loading

xenova commented Apr 25, 2023 •

edited

Loading

sandorkonya commented Apr 25, 2023

xenova commented Apr 25, 2023

drcodecamp commented Sep 28, 2023

Dolidodzik commented Nov 22, 2023

gabrielgrant commented Nov 28, 2023 •

edited

Loading

guschmue commented Dec 7, 2023 •

edited

Loading

gabrielgrant commented Dec 8, 2023

guschmue commented Dec 8, 2023

guschmue commented Dec 8, 2023

gokaybiz commented Dec 8, 2023

tarekziade commented Dec 17, 2023

DavidGOrtega commented Jan 16, 2024

nmstoker commented Jun 9, 2024

guschmue commented Jun 10, 2024

Whisper on webGPU? #100

Whisper on webGPU? #100

Comments

sandorkonya commented Apr 25, 2023

DK013 commented Apr 25, 2023 • edited Loading

xenova commented Apr 25, 2023 • edited Loading

sandorkonya commented Apr 25, 2023

xenova commented Apr 25, 2023

drcodecamp commented Sep 28, 2023

Dolidodzik commented Nov 22, 2023

gabrielgrant commented Nov 28, 2023 • edited Loading

guschmue commented Dec 7, 2023 • edited Loading

gabrielgrant commented Dec 8, 2023

guschmue commented Dec 8, 2023

guschmue commented Dec 8, 2023

gokaybiz commented Dec 8, 2023

tarekziade commented Dec 17, 2023

DavidGOrtega commented Jan 16, 2024

nmstoker commented Jun 9, 2024

guschmue commented Jun 10, 2024

DK013 commented Apr 25, 2023 •

edited

Loading

xenova commented Apr 25, 2023 •

edited

Loading

gabrielgrant commented Nov 28, 2023 •

edited

Loading

guschmue commented Dec 7, 2023 •

edited

Loading