- Inference is defined as an async method, but it blocks. After a couple of days of trying all avenues and looking at sample apps, it appears to be effectively synchronous, in that it consumes the attention of the thread `await session.run` is called on.
- Using Squadron to handle multi-threading didn't work. Now that the JS function in index.html is loading the model and passing it to a worker, it might.
- In any case, this shows exactly how to set up a worker that A) does inference without blocking UI rendering and B) allows Dart code to `await` the result without blocking the UI.
- This process was frustrating and fraught; there's a surprising lack of info and examples around ONNX web. Most seem to consume it via diffusers.js/transformers.js. ONNX web was a separate library from the rest of the ONNX runtime until sometime around late 2022. The examples still use that library, and they use simple enough models that it's hard to catch whether they are blocking without falling back to dev tools.
- It's absolutely crucial when debugging speed locally to make sure you're loading the ONNX version you expect (i.e. wasm AND threaded AND simd). The easiest way to check is network loads in Dev Tools: sort by size and look for the .wasm file to A) be loaded and B) include wasm, simd, and threaded in the filename.
- Two things can prevent that:
  - CORS nonsense with Flutter serving itself in debug mode:
    - See nagadomi/nunif#34.
    - Note that the extension became adware; you should set up its permissions in Chrome so that it isn't run until you click it. Also note that you have to do that each time the Flutter web app in debug mode's port changes.
  - MIME type issues:
    - Even after that, I would see errors in console logs about the MIME type of the .wasm being incorrect and it starting with the wrong bytes. That, again, seems due to local Flutter serving of the web app.
To work around that, you can download the WASM files from the same CDN folder that hosts ort.min.js (see worker.js), and, also in worker.js, remove the `//` in front of `ort.env.wasm.wasmPaths = ""`. That indicates you've placed the WASM files next to index.html, which you should. Note you need just the 4 .wasm files from the CDN, no more.

Some performance review notes:
- `webgpu` as execution provider completely errors out, saying "JS executor not supported in the ONNX version" (1.16.3).
- `webgl` throws "Cannot read properties of null (reading 'irVersion')".
- Tested perf by varying wasm / simd / threaded and thread count on an M2 MacBook Air, 16 GB RAM, Chrome 120.
- Landed on simd with thread count = 1/2 of cores as best performing.
  - First number is MiniLM L6 v2, second is MiniLM L6 v3; average inference time for 200 / 400 words:
  - 4 threads: 526 ms / 2196 ms
  - simd, 4 threads: 86 ms / 214 ms
  - simd, 8 threads: 106 ms / 260 ms
  - simd, 128 threads: 2879 ms / skipped
  - simd, `navigator.hardwareConcurrency` threads (8): 107 ms / 222 ms
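
A minimal worker.js sketch tying the setup notes above together. This is a browser-only configuration sketch, not a tested implementation: the CDN URL pin, the message shape (`{ id, modelBytes }` / `{ id, inputIds, attentionMask }`), and the tensor feed names (`input_ids`, `attention_mask`) are assumptions — check your model's actual input names before relying on them.

```javascript
// worker.js -- hypothetical sketch; message fields and feed names are assumptions.
importScripts("https://cdn.jsdelivr.net/npm/onnxruntime-web@1.16.3/dist/ort.min.js");

// Uncomment if you've placed the 4 .wasm files next to index.html
// instead of loading them from the CDN:
// ort.env.wasm.wasmPaths = "";
ort.env.wasm.simd = true;
// Best-performing setting from the benchmarks above: half the logical cores.
ort.env.wasm.numThreads = Math.max(1, Math.floor(navigator.hardwareConcurrency / 2));

let session;

self.onmessage = async (e) => {
  const msg = e.data;
  try {
    if (msg.modelBytes) {
      // index.html fetches the model and transfers the bytes here once.
      session = await ort.InferenceSession.create(msg.modelBytes);
      self.postMessage({ id: msg.id, result: "loaded" });
      return;
    }
    // int64 feeds require BigInt64Array data.
    const feeds = {
      input_ids: new ort.Tensor("int64", msg.inputIds, [1, msg.inputIds.length]),
      attention_mask: new ort.Tensor("int64", msg.attentionMask, [1, msg.attentionMask.length]),
    };
    // Runs on the worker thread, so the UI thread keeps rendering.
    const output = await session.run(feeds);
    self.postMessage({ id: msg.id, result: output });
  } catch (err) {
    self.postMessage({ id: msg.id, error: String(err) });
  }
};
```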
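
The other half of the pattern is letting the caller `await` a worker result. A sketch of the request/response correlation that makes that possible: each request gets an id, and a pending map resolves the matching promise when the worker posts back. The function name `createInferenceClient` is made up for illustration; the `worker` argument is anything with `postMessage`/`onmessage`, so the pattern also works against a mock outside the browser.

```javascript
// Hypothetical helper: wraps a worker in an awaitable request/response client.
function createInferenceClient(worker) {
  let nextId = 0;
  const pending = new Map(); // id -> { resolve, reject }

  // One handler demultiplexes every reply back to its original caller.
  worker.onmessage = (e) => {
    const { id, result, error } = e.data;
    const entry = pending.get(id);
    if (!entry) return;
    pending.delete(id);
    error ? entry.reject(new Error(error)) : entry.resolve(result);
  };

  // Returns a promise the caller (e.g. Dart via JS interop) can await
  // without blocking the UI thread.
  return function run(input) {
    const id = nextId++;
    return new Promise((resolve, reject) => {
      pending.set(id, { resolve, reject });
      worker.postMessage({ id, input });
    });
  };
}
```

Because each message carries an id, several inferences can be in flight at once and replies can arrive out of order without being misdelivered.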