
[Web] generative decoders are slower than they should be #18754

Open

guschmue opened this issue Dec 8, 2023 · 2 comments

guschmue (Contributor) commented Dec 8, 2023

Describe the issue

Running generative decoders (e.g. t5-small, whisper) via webgpu is slower than wasm, even though there are plenty of GPU cycles available (the GPU is only ~15% busy).
We know kernel times look good and cross-device copy looks good.
Even with io-bindings it is still slower than wasm.

To reproduce

https://github.com/guschmue/ort-web-perf/blob/master/ort-t5.html
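
For reference, a stripped-down sketch of the kind of comparison the linked page makes (model path, feeds, and step count are placeholders, not the actual repro code):

```ts
import * as ort from 'onnxruntime-web/webgpu';

// Average one decode step on a given EP. 'decoder.onnx' and the feeds are
// placeholders; real generation would also update the feeds every step.
async function msPerStep(ep: 'webgpu' | 'wasm', feeds: Record<string, ort.Tensor>) {
  const session = await ort.InferenceSession.create('decoder.onnx', {
    executionProviders: [ep],
  });
  const steps = 32;
  const start = performance.now();
  for (let i = 0; i < steps; i++) {
    await session.run(feeds); // one decode step per generated token
  }
  return (performance.now() - start) / steps;
}
```

The reported behavior is that the webgpu number comes out higher than the wasm number despite the GPU sitting mostly idle.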

Urgency

No response

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

main

Execution Provider

'webgpu' (WebGPU)

guschmue added the platform:web and ep:WebGPU labels on Dec 8, 2023
github-actions bot added the model:transformer label on Dec 8, 2023
guschmue self-assigned this on Dec 8, 2023
lxfater commented Dec 14, 2023

What causes this problem?

qjia7 (Contributor) commented Dec 18, 2023

I think it is likely that GPU buffers are not being reused efficiently. On each decoder run, many buffers are allocated dynamically instead of reusing existing ones; I see https://github.com/microsoft/onnxruntime/blob/main/js/web/lib/wasm/jsep/webgpu/gpu-data-manager.ts#L267 being called many times per inference. The current GPU buffer reuse strategy is not friendly to dynamic models: the input shapes change on every run, so the required buffer sizes change, and the previous inference's buffers cannot be reused because the strategy only reuses a buffer whose size matches exactly. We may need to change the reuse strategy to cut down on dynamic allocation and see whether perf improves.
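
One possible relaxation of the exact-size match (a hypothetical sketch, not how ort-web's GpuDataManager currently works) is to bucket requested sizes, e.g. rounding up to the next power of two, so consecutive runs with slightly different shapes land in the same bucket and reuse each other's buffers:

```ts
// Hypothetical bucketed reuse strategy; names and policy are illustrative only.
function bucketSize(byteSize: number): number {
  // Round up to the next power of two (min 16 bytes) so requests whose
  // shapes differ only slightly map to the same bucket.
  return 2 ** Math.ceil(Math.log2(Math.max(byteSize, 16)));
}

class BucketedBufferPool {
  private free = new Map<number, GPUBuffer[]>();
  constructor(private device: GPUDevice) {}

  acquire(byteSize: number): GPUBuffer {
    const size = bucketSize(byteSize);
    const reusable = this.free.get(size)?.pop();
    if (reusable !== undefined) return reusable; // bucket hit: no new allocation
    return this.device.createBuffer({
      size,
      usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST,
    });
  }

  release(buffer: GPUBuffer): void {
    // buffer.size is already the bucketed size, so it keys the free list
    const list = this.free.get(buffer.size) ?? [];
    list.push(buffer);
    this.free.set(buffer.size, list);
  }
}
```

The trade-off is some wasted memory per buffer (up to 2x) in exchange for far fewer createBuffer calls on dynamic-shape models.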
Another issue: I still see data being downloaded from GPU to CPU several times during each inference, even with webgpu + io binding. We need to make sure there are no unnecessary read-backs during inference.
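
For context, the io-binding setup referred to above (a hedged sketch; the model and output names are placeholders) asks the webgpu EP to keep outputs on the GPU, so any per-step downloads that remain would be happening inside the EP rather than in application code:

```ts
import * as ort from 'onnxruntime-web/webgpu';

// Hedged sketch: 'decoder.onnx' and 'present.0.key' are placeholder names.
async function runWithGpuOutputs(feeds: Record<string, ort.Tensor>) {
  const session = await ort.InferenceSession.create('decoder.onnx', {
    executionProviders: ['webgpu'],
    // keep outputs as GPU buffers instead of downloading them to typed arrays
    preferredOutputLocation: 'gpu-buffer',
  });
  const results = await session.run(feeds);
  // A 'gpu-buffer' tensor can be fed straight back into the next run();
  // per-step getData() calls are exactly the read-backs to avoid.
  console.log(results['present.0.key'].location); // expected: 'gpu-buffer'
}
```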
