feature: add "named models" #36
cc: @geekbeast, @roee88
This also updates Wasmtime to use the latest version of the wasi-nn spec instead of an older commit.
A global scope would not be virtualizable. The way `--dir` and `--listenfd` work is that they pass (what are conceptually) handles into the running program, associating names with them only as a compatibility layer for existing code and command-line argument-passing schemes. One option would be to have the registry be dynamic and referenced by handle. Another option would be to make the registry a "link-time authority", meaning you can have a function like
I've come to a similar conclusion as @geekbeast and I have discussed this.

And, given that names would not be globally scoped, do we even need

@sunfishcode, I'm not sure I understand exactly what you mean by the two options you bring up at the end. My feeling now, after some thought, is that the best way to provide the name-model mapping is "out-of-band": i.e., forget about
@sunfishcode My understanding is that the registry being proposed here is definitely not global in scope. While there are some common inference models that hosts may choose to expose, they can do so by modifying their initialization of the dynamic registry at the host level. Do you have any objections to a fully dynamic registry whose contents are completely controlled by host policy? That's actually what I implemented in bytecodealliance/wasmtime#6134 (caveat: there is a bug with the lifetime that I'm fixing up at the moment).
@sunfishcode Also, a quick clarifying question that I suspect I know the answer to: if two identical Wasm guest programs are run, any visible shared state across those two programs would be considered to break virtualization, correct?
The driving reason behind named models is that model compilation is expensive. I have seen a 24x inference cost for model compilation in some testing. A simpler solution here may be to expose an `import_compiled_model(...)` that takes a model that is ready to be executed. This may not be compatible with all frameworks; in particular, this is a current open issue in TensorFlow (tensorflow/tensorflow#55520).
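The amortization argument above can be illustrated with a small sketch: compile once on first use, then reuse the compiled artifact across invocations. This is a toy stand-in, not a wasi-nn or backend API; `compile`, `CompiledModel`, and `Cache` are hypothetical names, and real backends (e.g., OpenVINO) perform the expensive compilation internally.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for an expensively compiled model artifact.
struct CompiledModel {
    weights_len: usize,
}

// Stand-in for the expensive compilation step named models aim to amortize.
fn compile(bytes: &[u8]) -> CompiledModel {
    CompiledModel { weights_len: bytes.len() }
}

// A name-keyed cache: compile on first lookup, reuse thereafter.
struct Cache {
    compiled: HashMap<String, CompiledModel>,
    compiles: usize, // counts how often the expensive step actually ran
}

impl Cache {
    fn new() -> Self {
        Cache { compiled: HashMap::new(), compiles: 0 }
    }

    fn get_or_compile(&mut self, name: &str, bytes: &[u8]) -> &CompiledModel {
        if !self.compiled.contains_key(name) {
            self.compiles += 1;
            let model = compile(bytes);
            self.compiled.insert(name.to_string(), model);
        }
        &self.compiled[name]
    }
}

fn main() {
    let mut cache = Cache::new();
    let model_bytes = vec![0u8; 1024];
    // Five "instances" ask for the same named model; it compiles once.
    for _ in 0..5 {
        let m = cache.get_or_compile("mobilenet", &model_bytes);
        assert_eq!(m.weights_len, 1024);
    }
    assert_eq!(cache.compiles, 1);
    println!("compiles = {}", cache.compiles);
}
```

In a FaaS setting, each short-lived instance hitting the cache instead of recompiling is where the reported multiple-fold cost disappears.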
I would think even a locally scoped `load_named` method is useful, as it solves the problem of maintaining state information across calls in the FaaS use case.
This sounds great! We look forward to implementing this in WasmEdge! |
Hello, this is hydai from WasmEdge. I'd like to clarify something about the section you mentioned:
It seems like this would load directly from a host path, rather than utilizing a guest path mapped via a pre-open. For instance, let's consider a model named

I'm curious if it's possible to use a guest path instead. Here's an example:

```shell
# Host
/host_path/NM.model

# Map it with a pre-open
wasm_runtime --dir /guest_path/NM.model:/host_path/NM.model wasi-nn.wasm

# Within the guest environment
call register_named: func(name: string, path: string) -> expected<error>
```

prior to invoking `load_named()`. The reason behind this question is a need for container integration:
Hey @hydai, thanks for the feedback. The idea I was going for with this change is that the model bytes should not even need to be visible to the Wasm guest at all. The issue with the current
Hi @abrown, gotcha! I would like to delve deeper into "compiling model bytes into other forms." Two aspects require clarification:
Another thing: do we need to add
I don't think we want this "precompilation" step to be guest-visible. If it were, it would expose differences between backends that compile and those that don't. And if we hide the compilation like we do, each host implementation is free to optimize when and how they do this step.
I don't think it would fit in the wasi-nn bindings because this compilation I'm talking about is a host-side thing.
@geekbeast and I discussed this (I think in a meeting) as we were thinking through #38. (You might want to look at that PR as well.) I am pretty reticent to add more surface area to the wasi-nn API unless we absolutely need it. More API surface means more to implement, more to maintain, more chances to make a mistake. So we agreed to hold off on deregistering models until a user demands it; I think we're expecting most users to clean up the wasi-nn state by just exiting the Wasm instance. If this does become an issue for someone, though, it seems like a reasonable addition.
Ok, this feature has now been added by #38. |
* wasi-nn: add [named models]

This change adds a way to retrieve preloaded ML models (i.e., "graphs" in wasi-nn terms) from a registry. The wasi-nn specification includes a new function, `load_by_name`, that can be used to access these models more efficiently than before; previously, a user's only option was to read/download/etc. all of the bytes of an ML model and pass them to the `load` function.

[named models]: WebAssembly/wasi-nn#36

In Wasmtime's implementation of wasi-nn, we call the registry that holds the models a `GraphRegistry`. We include a simplistic `InMemoryRegistry` for use in the Wasmtime CLI (more on this later), but the idea is that production use will involve some more complex caching and thus a new implementation of a registry, a `Box<dyn GraphRegistry>`, passed into the wasi-nn context. Note that, because we now must be able to `clone` a graph out of the registry and into the "used graphs" table, the OpenVINO `BackendGraph` is updated to be easier to copy around.

To allow experimentation with this "preload a named model" functionality, this change also adds a new Wasmtime CLI flag: `--graph <encoding>:<host dir>`. Wasmtime CLI users can now preload a model from a directory; the directory `basename` is used as the model name. Loading models from a directory is probably not desired in Wasmtime embeddings, so it is cordoned off into a separate `BackendFromDir` extension trait.

* wasi-nn: add "named model" test

Add a new example crate which loads a model by name and performs image classification. It uses the same MobileNet model as the existing test but a new version of the Rust bindings. The new crate is built and run with the new CLI flag in the `ci/run-wasi-nn-example.sh` script.

prtest:full

* review: rename `--graph` to `--wasi-nn-graph`
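The registry design described in the commit message can be sketched in a few lines of plain Rust. This is a simplified illustration only: the names `GraphRegistry` and `InMemoryRegistry` come from the commit message above, but the actual Wasmtime types, signatures, and graph representation differ.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Toy graph type: wrapping the model bytes in an `Arc` makes cloning cheap,
// mirroring the commit's point that `BackendGraph` was made easier to copy
// so graphs can be cloned out of the registry into a "used graphs" table.
#[derive(Clone)]
struct Graph(Arc<Vec<u8>>);

// The registry abstraction: production embeddings could supply their own
// caching implementation behind a `Box<dyn GraphRegistry>`.
trait GraphRegistry {
    fn get(&self, name: &str) -> Option<Graph>;
}

// A simplistic in-memory registry, as used here by the Wasmtime CLI sketch.
struct InMemoryRegistry(HashMap<String, Graph>);

impl InMemoryRegistry {
    fn new() -> Self {
        InMemoryRegistry(HashMap::new())
    }
    fn register(&mut self, name: &str, bytes: Vec<u8>) {
        self.0.insert(name.to_string(), Graph(Arc::new(bytes)));
    }
}

impl GraphRegistry for InMemoryRegistry {
    fn get(&self, name: &str) -> Option<Graph> {
        // Clone the (cheap) handle out of the registry for the caller.
        self.0.get(name).cloned()
    }
}

fn main() {
    let mut registry = InMemoryRegistry::new();
    registry.register("mobilenet", vec![1, 2, 3]);

    // A host could pass any registry implementation into the wasi-nn context.
    let boxed: Box<dyn GraphRegistry> = Box::new(registry);
    let graph = boxed.get("mobilenet").expect("model was preloaded");
    assert_eq!(graph.0.len(), 3);
    assert!(boxed.get("unknown").is_none());
    println!("ok");
}
```

The trait-object indirection is what lets the CLI's simple in-memory map and a production cache share one interface.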
Currently the wasi-nn API only allows loading ML models from their byte-serialized format (i.e., using `load`). This can be problematic for several reasons. I would like to propose "named models" as a way of solving these issues.

Other WASI proposals, such as wasi-filesystem and wasi-sockets, provide a way of creating pre-instantiation resources that are then available to the Wasm module once instantiated (see, e.g., the `--dir` and `--listenfd` flags on the Wasmtime CLI). If a similar idea were available to wasi-nn, users could specify models before instantiation and these could be shared across instances. This sharing could only happen, however, if the models are "named."

Spec changes
To support this in the specification, one would need the ability to load a model using only a name and (possibly) the ability to load a model from bytes and name it. This way there could be some symmetry between the host and guest functionality. I think this could be supported by adding the following functions:

Obviously the ability to load a "named model" for all instances running in a host is up for debate: perhaps the available scope of that `name` should only be the Wasm instance itself or some host-specified neighborhood. I included the most controversial version, global scope, to see what people think. I also think the host may want to implement some way to limit the resources consumed by `wasi-nn`; this is a host implementation concern, discussed below.

Host engine changes
Though this repository is the spec repository and is primarily concerned with the Wasm-visible API, I think it would be valuable to discuss what changes this might imply for an engine implementing wasi-nn. Here are some suggestions:
The engine might want to limit the resources available to a wasi-nn-using module: this could take the form of limiting the number of models loaded via `load` or `load_named`, limiting the size of the models loaded (somehow), etc. One could imagine a flag like `--nn-max-models` to do something like this. (I would also think it would be great to have a generic way to limit any WASI API, if anyone has thoughts on that.)

The engine would likely want a way to preload some models to avoid `load`-ing them repeatedly in new Wasm instances. One could imagine a flag like `--nn-preload <name>:<encoding>:<path>` to tell the engine both the name of the model and how to load it. All modules instantiated by that engine would have the models available for retrieval with `get_named`.
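A flag value in the proposed `--nn-preload <name>:<encoding>:<path>` shape is straightforward for an engine to parse. The sketch below is hypothetical glue code, not part of any real CLI; the `mobilenet`/`openvino` values are illustrative, and a real implementation would also validate the encoding and path.

```rust
// Parse a `<name>:<encoding>:<path>` preload specification into its parts.
// `splitn(3, ':')` keeps any further colons inside the path component.
fn parse_preload(value: &str) -> Result<(String, String, String), String> {
    let mut parts = value.splitn(3, ':');
    match (parts.next(), parts.next(), parts.next()) {
        (Some(name), Some(encoding), Some(path))
            if !name.is_empty() && !encoding.is_empty() && !path.is_empty() =>
        {
            Ok((name.to_string(), encoding.to_string(), path.to_string()))
        }
        _ => Err(format!("expected <name>:<encoding>:<path>, got `{}`", value)),
    }
}

fn main() {
    let (name, encoding, path) =
        parse_preload("mobilenet:openvino:/models/mobilenet").unwrap();
    assert_eq!(name, "mobilenet");
    assert_eq!(encoding, "openvino");
    assert_eq!(path, "/models/mobilenet");

    // Malformed values are rejected rather than silently misparsed.
    assert!(parse_preload("missing-parts").is_err());
    println!("{} {} {}", name, encoding, path);
}
```

The engine would then load the model at `path` with the backend selected by `encoding` and register it under `name` for later retrieval by guests.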