[Web] generative decoders are slower than they should be #18754
Labels
ep:WebGPU (ort-web webgpu provider)
model:transformer (issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.)
platform:web (issues related to ONNX Runtime web; typically submitted using template)
Describe the issue
Running generative decoders (e.g. t5-small, whisper) via WebGPU is slower than WASM even though plenty of GPU cycles are available (the GPU is only ~15% busy).
Kernel times look good, and cross-device copy looks good.
Even with IO bindings it is still slower than WASM.
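For context, a minimal sketch of the IO-binding-style setup being referred to, assuming the public onnxruntime-web session options (`executionProviders`, `preferredOutputLocation`); this is illustrative, not the exact repro configuration. Keeping decoder outputs as GPU buffers avoids a device-to-host copy on every generation step when past key/value tensors are fed back into the decoder:

```javascript
// Build WebGPU session options that keep outputs on the GPU.
// (Assumption: onnxruntime-web >= 1.17 with the WebGPU EP enabled.)
function makeWebGpuSessionOptions() {
  return {
    executionProviders: ['webgpu'],
    // Keep all outputs as GPU buffers so past_key_values can be bound
    // back as inputs without round-tripping through CPU memory.
    preferredOutputLocation: 'gpu-buffer',
  };
}

// In a browser this would be used roughly like:
//   const session = await ort.InferenceSession.create(
//     't5-small-decoder.onnx', makeWebGpuSessionOptions());
```

Even with this kind of binding in place, the per-step overhead reported in the issue remains, which points at host-side dispatch cost rather than data transfer.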
To reproduce
https://github.com/guschmue/ort-web-perf/blob/master/ort-t5.html
Urgency
No response
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
main
Execution Provider
'webgpu' (WebGPU)