https://radu-matei.com/blog/tensorflow-inferencing-wasi/
This is a demonstration of running the MobileNet V2 TensorFlow
model in WebAssembly System Interface (WASI) runtimes outside the
browser. The project uses the Sonos Tract crate to build an
inference program in Rust, which is then compiled to Rust's wasm32-wasi
WebAssembly target.
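The guest-side inference follows the Tract MobileNet example that this project starts from. The sketch below is an approximation of that flow running the model locally, before the changes needed to compile it to wasm32-wasi; it assumes the `tract-tensorflow` and `image` crates, and the model and image file names are placeholders rather than this project's actual code:

```rust
use tract_tensorflow::prelude::*;

fn main() -> TractResult<()> {
    // Load the frozen MobileNet V2 graph, declare its input shape
    // (1 x 224 x 224 x 3, NHWC), and optimize it into a runnable plan.
    let model = tract_tensorflow::tensorflow()
        .model_for_path("mobilenet_v2_1.4_224_frozen.pb")?
        .with_input_fact(
            0,
            InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 224, 224, 3)),
        )?
        .into_optimized()?
        .into_runnable()?;

    // Read and resize the input image, scaling pixel values to [0, 1].
    let img = image::open("golden-retriever.jpeg")?.to_rgb8();
    let resized =
        image::imageops::resize(&img, 224, 224, image::imageops::FilterType::Triangle);
    let input: Tensor =
        tract_ndarray::Array4::from_shape_fn((1, 224, 224, 3), |(_, y, x, c)| {
            resized[(x as u32, y as u32)][c] as f32 / 255.0
        })
        .into();

    // Run the model and pick the class with the highest score.
    let result = model.run(tvec!(input.into()))?;
    let scores = result[0].to_array_view::<f32>()?;
    let best = scores
        .iter()
        .cloned()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    println!("best (class index, score): {:?}", best);
    Ok(())
}
```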
This is an experimental project, which has a few goals:
- build a more complex WebAssembly module that does not use any code generation for bindings (such as `wasm-bindgen`).
- run the module in Wasmtime, exemplifying how to write arbitrary data (such as images) into the guest's linear memory using Rust and `wasmtime::Memory` (see the host-side sketch after the notes below).
- build a simple runtime on top of a web server that accepts incoming connections, reads an image URL from the request body, executes the inference on the fetched image, and returns the model's prediction.
- execute the same WebAssembly module in Node's WASI runtime and exemplify writing into the module's linear memory using JavaScript.
- the project uses a pre-trained convolutional neural network - MobileNet V2, with around 6M parameters - used for computer vision tasks such as classification or object detection.
- the project starts from Sonos Tract's example for executing the model locally, and makes the necessary changes to compile it to WebAssembly.
- the project does not cover training neural network models.
- while this approach can be used to run inference with other neural network models, the implementation is specialized for the MobileNet model. Changing the model architecture, as well as its inputs and outputs, would require changes both in the WebAssembly module and in how it is instantiated in Wasmtime.
- because a `wasmtime::Instance` cannot be safely sent between threads, a new instance of the module is created for each request, which adds to the overall latency.
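As a rough illustration of the host side, the sketch below instantiates the module for a single request and copies the image bytes into the guest's linear memory through `wasmtime::Memory`. This is not the project's actual code: it assumes a Wasmtime 1.x-style API (the `wasmtime-wasi` setup varies across versions), and the exported `alloc` and `infer` functions and their signatures are hypothetical stand-ins for the module's real exports:

```rust
use anyhow::Result;
use wasmtime::{Engine, Linker, Module, Store};
use wasmtime_wasi::{WasiCtx, WasiCtxBuilder};

// Called once per request: an Instance is not Send, so each request gets a
// fresh Store and Instance, while the compiled Module is reused.
fn predict(engine: &Engine, module: &Module, image_bytes: &[u8]) -> Result<i32> {
    // Wire up the WASI imports the module needs (wasmtime ~1.x style API).
    let mut linker: Linker<WasiCtx> = Linker::new(engine);
    wasmtime_wasi::add_to_linker(&mut linker, |cx: &mut WasiCtx| cx)?;
    let wasi = WasiCtxBuilder::new().inherit_stdio().build();
    let mut store = Store::new(engine, wasi);
    let instance = linker.instantiate(&mut store, module)?;

    // Ask the guest for a buffer (hypothetical `alloc` export), then copy the
    // image bytes into its linear memory through wasmtime::Memory.
    let alloc = instance.get_typed_func::<i32, i32>(&mut store, "alloc")?;
    let ptr = alloc.call(&mut store, image_bytes.len() as i32)?;
    let memory = instance
        .get_memory(&mut store, "memory")
        .expect("module exports its linear memory");
    memory.write(&mut store, ptr as usize, image_bytes)?;

    // Run inference on the bytes we just wrote (hypothetical `infer` export)
    // and return the predicted class index.
    let infer = instance.get_typed_func::<(i32, i32), i32>(&mut store, "infer")?;
    Ok(infer.call(&mut store, (ptr, image_bytes.len() as i32))?)
}

fn main() -> Result<()> {
    let engine = Engine::default();
    let module = Module::from_file(&engine, "model/optimized-wasi.wasm")?;
    let image = std::fs::read("golden-retriever.jpeg")?;
    println!("predicted class index: {}", predict(&engine, &module, &image)?);
    Ok(())
}
```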
When executing `cargo build`, the following are executed:
- build and optimize a WebAssembly module based on the `wasi-mobilenet-inferencing` crate (see `build.rs`; a sketch follows this list)
- build a server that listens for HTTP requests and gets the model's prediction for the image URL in the request body using Wasmtime.
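A rough sketch of what a `build.rs` along those lines might do is below; the crate layout, output file names, and `wasm-opt` flags are assumptions for illustration, not the project's exact configuration:

```rust
use std::process::Command;

fn main() {
    // Compile the inferencing crate to the WebAssembly/WASI target.
    let status = Command::new("cargo")
        .args(["build", "--release", "--target", "wasm32-wasi"])
        .current_dir("wasi-mobilenet-inferencing")
        .status()
        .expect("failed to run cargo");
    assert!(status.success(), "building the guest module failed");

    // Optimize the resulting module with wasm-opt (from Binaryen).
    let status = Command::new("wasm-opt")
        .args([
            "-O3",
            "wasi-mobilenet-inferencing/target/wasm32-wasi/release/wasi_mobilenet_inferencing.wasm",
            "-o",
            "model/optimized-wasi.wasm",
        ])
        .status()
        .expect("failed to run wasm-opt");
    assert!(status.success(), "wasm-opt failed");
}
```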
Then, running the server:

```
$ cargo run --release
Listening on http://127.0.0.1:3000
module instantiation time: 774.715145ms
inference time: 723.531083ms
```
In another terminal instance (or from an HTTP request builder, such as Postman):
```
$ curl --request GET 'localhost:3000' \
    --header 'Content-Type: text/plain' \
    --data-raw 'https://upload.wikimedia.org/wikipedia/commons/3/33/GoldenRetrieverSnow.jpg'

golden retriever
```
Prerequisites (required in the path):
- `cargo`
- `wasm-opt` (from Binaryen)
The repository contains an already built and optimized module, which can be found in `model/optimized-wasi.wasm`. It can be tested without any compilation (and without any Node dependencies) using a recent Node.js installation:
```
$ node -v
v14.5.0

$ node --experimental-wasi-unstable-preview1 --experimental-wasm-bigint test.js
predicting on file golden-retriever.jpeg
inference time: 953 ms
prediction: golden retriever
predicting on file husky.jpeg
inference time: 625 ms
prediction: Eskimo dog, husky
```