[Web] WebGPU issues tracking #15796
WebGPU is now supported in the latest version of the official Chrome build (no longer only Canary, and no longer locked behind a flag). That said, I do not know about support for other browsers.
Here are the model files. Here is some example input (for the encoder):
let input = {
attention_mask: new Tensor(
'int64',
new BigInt64Array([1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n, 1n]),
[1, 12]
),
input_ids: new Tensor(
'int64',
new BigInt64Array([13959n, 1566n, 12n, 2379n, 10n, 8774n, 6n, 149n, 33n, 25n, 58n, 1n]),
[1, 12]
)
}
Note: These are the same as I mentioned in the original issue: #15719 (comment)
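For context, a minimal sketch of how such a session might be created and fed these inputs (the bundle loading style and model file name are assumptions, not the exact repro code):

```js
// Minimal sketch, assuming ort.webgpu.min.js is loaded via a <script> tag so the
// global `ort` is available, and that the encoder is served as './encoder_model.onnx'
// (placeholder name). `input` is the feeds object shown above.
const session = await ort.InferenceSession.create('./encoder_model.onnx', {
  executionProviders: ['webgpu'],
});
const outputs = await session.run(input);
console.log(Object.keys(outputs)); // output names depend on the exported model
```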
I got a similar error.
The model (Object.padding) is a very simple reflection padding PyTorch module.
The However, a model with a |
@fs-eire With the latest build (following your gist here), running the following code with the same model and inputs as @xenova:
results in the following error at line
If I provide only
which is the same as before. If I use |
@DK013 The error message looks like the corresponding .wasm file is not being served. Please check the DevTools network tab to see whether any 404 errors occur for *.wasm files.
I think I know the reason: it is because of this line. I don't provide a non-SIMD version of the WebGPU wasm file, because I assume every environment that supports WebGPU also supports WebAssembly fixed-size SIMD, so removing that line should make it work. I can also add a warning message if SIMD is off and WebGPU is requested.
Is there a list of supported WebGPU ops, as well as those planned to be implemented?
@fs-eire Maybe along with the warning, you could ignore the
Let me update the summary.
I will fail the initialization. See this PR: #15924
@fs-eire Are the versions released under the dev tag (e.g., https://www.npmjs.com/package/onnxruntime-web/v/1.16.0-dev.20230508-045c623415) built automatically from the main branch? That would mean I don't have to build the files myself for testing.
They are, but it seems the release pipeline is not working perfectly. We are currently reworking the release pipeline for nightly builds. Until that work is done, you can use this link to download the latest artifacts from our public CI. Hope this saves you some time.
So, I got the imports working (for this build), but I'm getting a lot of errors when running a simple text-classification model:
input: {
attention_mask: Tensor {
type: 'int64',
data: [1n, 1n, 1n],
dims: [1,3],
},
input_ids: Tensor {
type: 'int64',
data: [101n, 3231n, 102n],
dims: [1,3],
},
}
### Description
Because of #15618, the default allocator changed to the device allocator, which will be GPU instead of CPU. In the transpose optimizer we expect to read data from initializers, so a CPU allocator is required there. This change fixes the transpose optimizer on the GPU EP. Fixes the issue referred to in #15869, #15796.
### Description
Fix buffer size when downloading. The buffer size should always be padded to a multiple of 4. Resolves the issue described in #15796.
> ![Image](https://user-images.githubusercontent.com/26504141/239093785-9417dffc-6f00-47b2-956d-402b43bdb0a9.png)
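For reference, the padding rule described here is just rounding the readback size up to the next multiple of 4, since WebGPU buffer copies require 4-byte-aligned sizes. An illustrative helper, not the actual ORT Web code:

```js
// Illustrative only - not the actual ORT Web implementation.
// WebGPU requires copy sizes to be multiples of 4 bytes, so the download
// (GPU -> CPU readback) buffer is sized up to the next multiple of 4.
function paddedDownloadSize(byteLength) {
  return Math.ceil(byteLength / 4) * 4;
}

console.log(paddedDownloadSize(10)); // 12
console.log(paddedDownloadSize(16)); // 16
```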
@mrdomino, to make WebGPU work, you may just need to import ort.webgpu.min.js.
@gyagp thanks for the pointers! Your demo does run, but seems to throw some errors, so it's a little unclear what's being run on GPU vs CPU:
I think the suggestion to "rerun with verbose output on a non-minimal build" to show node assignments requires you to re-build, right? Is that something you're able to do? Thanks again!
Thanks for the link! I'm now exploring further, and the behavior I'm seeing in Firefox (which I just installed) is different from the behavior I was seeing with Chrome on my Android phone. I'm going to focus on Firefox for now, as it's harder to debug on the phone.
First of all, it turns out that the import is not the issue after all: if I just pass
If I just pass
If I pass either
I have thus far been working around this by manually testing
Okay, wow, this is getting complicated. I just checked again with Chrome on Android, and even with my
So to recap:
@gabrielgrant, these errors can be ignored now. If we use webgpu as the EP and some ops are not supported by WebGPU, they automatically fall back to wasm. My script has an ortProfiling task (the full URL is https://webatintel.github.io/ort-toolkit/?tasks=ortProfiling&ep=webgpu&modelName=mobilenetv2-12&modelUrl=hf&enableReadback=true), and it will show you where each op runs (JsExecutionProvider means WebGPU, while CPUExecutionProvider means wasm).
@mrdomino I'm not sure about the exact status of Firefox and Safari, but for Chrome, WebGPU is only officially supported on Windows, macOS and ChromeOS (since M113). Android support is still behind the flag "--enable-unsafe-webgpu". Fortunately, its status is very good now, as Google just sent out an "intent to ship" for Chrome (https://chromestatus.com/feature/5119617865613312) and plans to ship it in M121 (Jan 23 is the release date). So before then, you still need to pass the switch "--enable-unsafe-webgpu" to enable WebGPU on Android with the latest Chrome (better to experiment with Chrome Canary).
@mrdomino BTW, if you're interested in following the WebGPU status on Safari, there are some clues here: gpuweb/gpuweb#4238
I'm actually not that worried about webgpu on Safari — using pure wasm as a fallback is acceptable for my use case, and actually works well enough on Firefox. The things I'm concerned about are basically just:
You only need to import ort.webgpu.min.js. Then if WebGPU is supported, use webgpu as ep; otherwise, change ep to wasm. Pseudo code:
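The pseudo code itself did not survive here; a rough sketch of the approach described (the feature-detection details are an assumption, not gyagp's exact code):

```js
// Sketch: always load ort.webgpu.min.js (e.g. via a <script> tag exposing the
// global `ort`), then pick the EP at runtime based on actual WebGPU availability.
async function createSession(modelUrl) {
  let ep = 'wasm';
  if (navigator.gpu && (await navigator.gpu.requestAdapter())) {
    ep = 'webgpu'; // WebGPU is actually usable in this context
  }
  return ort.InferenceSession.create(modelUrl, { executionProviders: [ep] });
}
```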
If WebGPU is supported, the fallback to wasm is either a limitation (some ops, including dataType variants, are not implemented by WebGPU) or an optimization (ORT has heuristics to use wasm over WebGPU for better performance). We will continue to improve the framework, including the profiling mechanism, so that it's easier to differentiate these two. You're always welcome to report a perf issue when in doubt.
No, that is not the case in Chrome on Android. That works in Chrome on desktop, and in Firefox and Safari on desktop, but not on Android. On Android, no matter what the ep is, importing ort.webgpu.min.js causes a crash. And by point 2, I was referring to the strange error message thrown on Firefox/Safari (
The specific code I am using to decide which backend to use is:
const backend = await (async () => {
if (!navigator.gpu) return 'wasm'
const adapter = await navigator.gpu.requestAdapter()
if (!adapter) return 'wasm'
return 'webgpu'
})()
That backend is then passed in to InferenceSession.create (as the ep). I will try to get a code sandbox up with a minimal example.
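For completeness, a sketch of how the selected backend might then be used (the model URL is a placeholder, not mrdomino's actual code):

```js
// The `backend` computed above ('webgpu' or 'wasm') is passed as the EP list.
const session = await ort.InferenceSession.create('/model.onnx', {
  executionProviders: [backend],
});
```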
Here: https://ort-test.vercel.app/
The only difference between the WASM and WebGPU pages is which file is imported. Both are using "wasm" as the ep. On Chrome Android, WASM says "Everything worked" and WebGPU says "Failed during InferenceSession.create" with the WebGpuBackend error message.
https://bitbucket.org/mrdomino/ort-test/src/main/app/ort/page.tsx
Curiously, Chrome on Android exposes a navigator.gpu even though requestAdapter returns nothing.
Is it possible that somewhere in the code there is a simple check for the presence of navigator.gpu to decide to use WebGPU?
It looks like onnxruntime/js/web/lib/index.ts, lines 18 to 23 (at 24f9c1a)
So there is a difference between the good cases of Firefox/Safari and the bad case of Android Chrome: the former do not have the backend registered, while the latter does. Still, the logic in resolveBackend really looks like it should be handling the error, and from the observed behavior, it is not. So a different code path must be getting taken that is trying to initialize a WebGpuBackend. Ah, and indeed, here we go: onnxruntime/js/web/lib/wasm/jsep/init.ts, lines 133 to 140 (at 24f9c1a)
I think I can probably submit a PR to fix that.
Just testing for the presence of navigator.gpu is not sufficient to establish WebGPU support: in particular, at time of writing, Chrome on Android exposes a navigator.gpu but does not return anything from requestAdapter. Context: microsoft#15796 (comment)
Thanks for the PR, @fs-eire and @guschmue, any comments on this?
Hahaha, I like it. One thought I had is it probably makes sense to also check the adapter at the registerBackend call site, and in that case, maybe it makes sense for hasGpu to be in the env?
FYI, using onnxruntime-web 1.17.0, |
Hi there, I'm getting this error when I set executionProviders=['webgpu'] (I am running on Chrome via https):
When I remove
When I run my code with executionProviders=['wasm'], everything executes perfectly. I wasn't sure if I should create a new issue or just put a comment here.
This looks like you may import from
Thank you for your prompt response. I get
### Description
This PR rewrites the backend resolve logic to support specifying multiple EPs.

#### Backend
The first version of ONNX Runtime Web carried over some existing code from [ONNX.js](https://github.com/microsoft/onnxjs), which includes the "backend" concept. The original "backend" in ONNX.js was designed assuming that only one backend from the user's backend hint list will be used. For example, in ONNX.js, if the user specifies a backend hint of `['webgl', 'wasm']`, ONNX.js will first try to use the WebGL backend; if it loads successfully (the browser supports WebGL), the "webgl" backend is used and "wasm" is ignored; otherwise, "webgl" is ignored and the "wasm" backend is loaded. In short: only one backend is used when initializing a session.

#### Execution Provider
Execution Provider, or EP, is a different concept in ONNX Runtime. One of the differences is that users are allowed to specify multiple EPs, and if one does not support a particular kernel, execution can fall back to another EP. This is a very common case when using a GPU EP in ONNX Runtime.

#### Current Status: Backend vs. EP
Because of the historical reasons mentioned above, the current status is quite confusing. There are **real backends**, which are different implementations in code; there are **backend hints**, which are string names used as hints; and there are **EPs** in the ONNX Runtime sense. Currently there are only 2 **backends** in our code base: the "onnxjs backend" and the "wasm backend". The "onnxjs backend" currently only powers the backend hint "webgl", which goes into the old ONNX.js code path. All other backend hints, including "wasm", "cpu" (alias of wasm), "webgpu" and "webnn", are powered by the "wasm backend". Because ORT Web treats "backend" as an internal concept and wants to align with ONNX Runtime, those backend hint names have become EP names. The following table shows today's status:

| Execution Provider Name (public) / Backend Hint (internal) | Backend | EP in ORT |
| -------- | ------- | ------- |
| "wasm"/"cpu" | WasmBackend | CPU EP |
| "webgl" | OnnxjsBackend | \* technically not an EP |
| "webgpu" | WasmBackend | JSEP |
| "webnn" | WasmBackend | WebNN EP |

#### Problem
While the API allows specifying multiple EPs, the backend resolving only allows one backend. This causes issues when users specify multiple EP names in session options: the backend resolve behavior and the EP registration behavior are inconsistent. Specifically, in this issue (#15796 (comment)): the EP list `['webgpu', 'wasm']` on a browser without WebGPU support resolves to the 'wasm' backend, but the full EP list is passed in session options, so JSEP is still enabled, causing the runtime error.

#### Solution
Since we still need the WebGL backend, we cannot totally remove the backend register/resolve system. In this PR I made the following changes:
- initialize every backend from the EP list, instead of only doing so for the first successful one.
- for the first resolved backend, keep only the EPs using that exact same backend; remove all EPs not using this backend from the session options.
- for every explicitly specified EP, if it is removed, show a warning message in the console.
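For context, the multi-EP usage that this PR makes consistent looks roughly like this (the model path is a placeholder):

```js
// With this change, listing several EPs behaves consistently: if 'webgpu'
// cannot be initialized (e.g. no WebGPU support in the browser), it is removed
// from the session options with a console warning, and 'wasm' is used instead.
const session = await ort.InferenceSession.create('model.onnx', {
  executionProviders: ['webgpu', 'wasm'],
});
```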
Yes, as I explained, if you want to use WebGPU, you need to import ort.webgpu.min.js.
Closing this one, it's gotten a bit stale.
This issue is for tracking WebGPU-related problems. The WebGPU EP has been available since ONNX Runtime Web v1.15.0 as an experimental feature. We are working on improving stability, operator coverage and performance.
For a list of supported/WIP operators, comments or any operator specific issues: #15952
Cannot consume
Q: How to build?
A: Building ort-web with webgpu support from source: please refer to this gist
Q: [Web] An error occurred during model execution: "TypeError: Cannot read properties of undefined (reading 'apply')".
A: #15780 <--- this PR fixed it
Q:
no available backend found. ERR: ...
A: Need to make sure WebGPU is available in the current context. Upgrade to the latest Chrome or Edge (v113), and serve the page from a secure context (https or localhost).
Runtime failures
Q:
Non-zero status code returned while running Transpose node. ....
A: #15819 <--- This PR should fix it
Q: crash in the transpose optimizer for various models (#15869: cannot load model https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/onnx/vae_encoder)
A: issue being investigated - see the PR for detailed info
Kernel coverage or running slow
Q: General investigation tips?
A: A few tools can be used to take a deeper look (don't enable them all together, it will generate too many logs):
- `env.logLevel = 'verbose'; env.debug = true;` - This lets onnxruntime-web output logs helpful for analysing the execution, including which operators run on WebGPU and which fall back to CPU. To improve performance lost to fallback we need to improve the operator coverage; I can help to implement the missing ops.
- `env.webgpu.profilingMode = 'default';` - This outputs quite a lot of logs to the console for each WebGPU shader; by aggregating and analyzing those we can know which shader is slow. Need to launch Chrome/Edge with the flag `--disable-dawn-features=disallow_unsafe_apis`.
- `sessionOptions.enableProfiling = true` when creating the inference session. This shows which operators run on GPU and which fall back to CPU.

Q: Running slow on an image classification model. (logs)
A: `jsepCopyGpuToCpu` occurred 114 times, which indicates frequent CPU <--> GPU data transfer. Adding implementations of the missing operators may improve performance.
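Put together, a sketch of enabling the diagnostics described above (assuming the global `ort` from ort.webgpu.min.js; the model path is a placeholder):

```js
// Sketch: turn on the diagnostics listed above.
ort.env.logLevel = 'verbose';
ort.env.debug = true;                      // logs which ops run on WebGPU vs. fall back to CPU
ort.env.webgpu.profilingMode = 'default';  // per-shader timing logs; launch the browser with
                                           // --disable-dawn-features=disallow_unsafe_apis

const session = await ort.InferenceSession.create('model.onnx', {
  executionProviders: ['webgpu'],
  enableProfiling: true,                   // shows which operator runs on GPU vs. CPU fallback
});
```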