
WebAssembly / Web runtime (both for wasm-simd and WebGPU) #3497

Open
vadimkantorov opened this issue May 2, 2024 · 9 comments
Labels
enhancement Not as big of a feature, but technically not a bug. Should be easy to fix



vadimkantorov commented May 2, 2024

I'm wondering whether ExecuTorch can be compiled for the WebAssembly target. As far as I understand, XNNPACK supports wasm-simd, so at least for CPU it should theoretically be doable? (e.g. to be compared with tflite+tfjs, ort-web, and tvm-wasm, at least on some popular models like MobileNets)

(This would be especially interesting if strong fusion/codegen can be done to produce fused wasm-simd code or fused WebGPU programs, although maybe that is an ask for Inductor.)
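If the core runtime really is plain portable C++ (as later comments in this thread suggest), a cross-compile might look roughly like the following. This is a hypothetical sketch: `emcmake` and `-msimd128` are real Emscripten tooling, but the ExecuTorch-specific steps and flags shown are illustrative assumptions, not verified build options.

```shell
# Hypothetical build sketch (unverified): cross-compile the runtime to
# WebAssembly with Emscripten, enabling the wasm-simd (simd128) proposal.
git clone --recursive https://github.com/pytorch/executorch
cd executorch
emcmake cmake -B cmake-out-wasm \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CXX_FLAGS="-msimd128"
cmake --build cmake-out-wasm -j8
```

The resulting objects would then be linked into a `.wasm` module and driven from JavaScript, which is where a comparison against ort-web, tvm-wasm, etc. would actually happen.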

Contributor

SS-JIA commented May 3, 2024

cc: @mcr229 or @digantdesai regarding running XNNPACK via wasm


SS-JIA commented May 3, 2024

Also cc: @mergennachin

@JacobSzwejbka
Contributor

I've talked with @digantdesai about this before. I think for XNNPACK he mentioned it should just be plug and play. I've been wanting to try out wasm for some time now, just haven't had the bandwidth.

@JacobSzwejbka JacobSzwejbka added the enhancement Not as big of a feature, but technically not a bug. Should be easy to fix label May 9, 2024
@vadimkantorov
Author

I also wonder about the fusion capabilities of ExecuTorch :) Does it allow Inductor-codegen'd fused kernels (e.g. think quant/dequant fused directly into the flash attention kernel, with the positional embedding computation also fused into that kernel)?

Another interesting backend is WebGPU/wgpu: https://github.com/huggingface/ratchet, or even wgpu/WGSL shaders directly could in theory be a compilation target for fused kernels.

But even if ExecuTorch does not support wild codegen/fusions, it would still be good to have it as a baseline, with comparisons against ort-web, tflite+tfjs, tvm-wasm, and ggml compiled to wasm. This should show roughly where all these frameworks stand (especially if compiling is relatively doable).

@vadimkantorov
Author

And given that PyTorch currently does not have its own wasm/WebGPU inference story, having ExecuTorch compiled to wasm-simd might be a nice baseline to have (especially if it's minimalistic and relatively simple to compile).

@kimishpatel
Contributor

I suspect much of the core should be compilable with the Emscripten C++ compiler. Probably not the optimized operators, though, and I'm not too sure about backends/xnnpack.

@vadimkantorov
Author

Maybe the best start would be adding some sort of GitHub Actions CI job that compiles it with Emscripten... (even if no tests using it exist so far)
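A minimal workflow along those lines might look like the following sketch. The action names and versions here (`mymindstorm/setup-emsdk`) and the build commands are assumptions for illustration, not a tested configuration:

```yaml
# Hypothetical CI sketch (unverified): does the tree even compile under
# Emscripten? No runtime tests, just a smoke build.
name: emscripten-wasm-build
on: [push, pull_request]
jobs:
  build-wasm:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive
      - uses: mymindstorm/setup-emsdk@v14
      - name: Configure and build with emcmake
        run: |
          emcmake cmake -B cmake-out-wasm -DCMAKE_BUILD_TYPE=Release
          cmake --build cmake-out-wasm -j2
```

Even a build that is expected to fail would be useful at first, since it would enumerate exactly which operators/backends don't compile under Emscripten yet.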

@digantdesai
Contributor

> not too sure about backends/xnnpack

It should be, given that XNNPACK has a bunch of WASM[SIMD] kernels. I haven't tried it myself, though. IIRC there isn't any CI for that on github/xnnpack either.

@vadimkantorov
Author

vadimkantorov commented Nov 21, 2024

XNNPACK is also known to compile (and maybe even be tested) for wasm/simd, so this should somehow be achievable... I don't know whether any compact backend library/project exists for WebGPU kernels.
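On the loading side, a page (or Node script) would want to feature-detect wasm-simd before choosing between a simd and a scalar build. A sketch of the standard trick (popularized by the wasm-feature-detect library): pass `WebAssembly.validate` a tiny module that is only valid on engines implementing the SIMD proposal. The byte array below is that library's simd probe, reproduced from memory, so treat it as an assumption.

```javascript
// Feature-detect wasm-simd: this module uses v128 instructions, so it
// only validates on engines that implement the SIMD proposal.
const simdTestModule = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 123, 3,
  2, 1, 0, 10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11,
]);
const hasSimd = WebAssembly.validate(simdTestModule);
console.log(hasSimd ? "wasm-simd available" : "falling back to scalar wasm");
```

A runtime shipping both builds would branch on `hasSimd` to pick which `.wasm` artifact to fetch, the same way ort-web selects among its wasm variants.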

6 participants