Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: add "named models" #36

Closed
abrown opened this issue Dec 17, 2022 · 13 comments
Closed

feature: add "named models" #36

abrown opened this issue Dec 17, 2022 · 13 comments

Comments

@abrown
Copy link
Collaborator

abrown commented Dec 17, 2022

Currently the wasi-nn API only allows loading ML models from their byte-serialized format (i.e., using load). This can be problematic for several reasons:

  • models may be large — "gigabytes" large
  • not all backends provide a way to load models from bytes (e.g., TF expects a certain filesystem layout)
  • retrieving and loading a model can be the most expensive part of an inference request — hosts may want to load a model once and reuse it across multiple Wasm module instantiations

I would like to propose "named models" — a way of solving these issues. Other WASI proposals, such as wasi-filesystem and wasi-sockets, provide a way of creating pre-instantiation resources that are then available to the Wasm module once instantiated (see, e.g., the --dir and --listenfd flags on the Wasmtime CLI). If a similar idea were available to wasi-nn, users could specify models before instantiation and these could be shared across instances. This sharing could only happen, however, if the models are "named."

Spec changes

To support this in the specification, one would need the ability to load a model using only a name and (possibly) the ability to a load a model from bytes and name it. This way there could be some symmetry between the host and guest functionality. I think this could be supported by adding the following functions:

// Like the `load` function, but the host would retain a mapping from `name` to the `graph`.
load_named: func(builder: graph-builder-array, encoding: graph-encoding, target: execution-target, name: string) -> expected<graph, error>

// Retrieve the loaded `graph` for the given `name`; this could be pre-loaded prior to instantiation or loaded by `load_named`.
get_named: func(name: string) -> expected<graph, error>

Obviously the ability to load a "named model" for all instances running in a host is up for debate: perhaps the available scope of that name should only be the Wasm instance itself or some host-specified neighborhood. I included the most controversial version, global scope, to see what people think. I also think the host may want to implement some way to limit the resources consumed by wasi-nn; this is a host implementation concern, discussed below.

Host engine changes

Though this repository is the spec repository and is primarily concerned with the Wasm-visible API, I think it would be valuable to discuss what changes this might imply for an engine implementing wasi-nn. Here are some suggestions:

  1. The engine might want to limit the resources available to a wasi-nn-using module: this could take the form of limiting the number of models loaded via load or load_named, limiting the size of the models loaded (somehow), etc. One could imagine a flag like --nn-max-models to do something like this. (I would also think it would be great to have a generic way to limit any WASI API if anyone has thoughts on that).

  2. The engine would likely want a way to preload some models to avoid load-ing them repeatedly in new Wasm instances. One could imagine a flag like --nn-preload <name>:<encoding>:<path> to tell the engine both the name of the model and how to load it. All modules instantiated by that engine would have the models available for retrieval with get_named.

@abrown
Copy link
Collaborator Author

abrown commented Dec 17, 2022

cc: @geekbeast, @roee88

@abrown abrown changed the title Discuss "named models" feature: add "named models" Jan 3, 2023
geekbeast added a commit to geekbeast/wasmtime that referenced this issue Apr 3, 2023
This also updates wasmtime to use the latest version of the wasi-nn spec
instead of older commit.
@sunfishcode
Copy link
Member

A global scope would not be virtualizable.

The way --dir and --listenfd work is that they pass in (what are conceptually) handles into the running program, and associating names with them only as a compatibility layer with existing code and command-line argument passing schemes.

One option would be to have the registry be dynamic, and referenced by handle. Another option would be to make the registry be a "link-time authority", meaning you can have a function like get_named, but the scope of the registry isn't global, but the instance that get_named is conceptually imported from. That way, two instances of wasi-nn could be instantiated separately without them implicitly sharing a scope.

@abrown
Copy link
Collaborator Author

abrown commented Apr 13, 2023

I've come to a similar conclusion as @geekbeast and I have discussed this. The idea that load_named must create a named model that could be retrieved from any other instance with get_named is not a good one. For one, it would be problematic on multi-tenant systems, allowing tenant A to access tenant B's proprietary model, e.g. Instead, the wasi-nn specification should not define the scope of the names used, leaving it up to the host to define a policy for what names are available to what instances. This would correspond to how --dir paths work, e.g.: a host could map a path used by all instances to the same physical location so that the instances access the same file but the WASI file system spec certainly does not require this or even encourage this. The same should be true here — the host decides how to map names to "handles."

And, given that names would not be globally scoped, do we even need load_named? I brought this up in #38 as well: what is the point of load_named if there is no guarantee that other instances can use that model? If a module does the heavy lifting to load a model, why name it just for itself — it might as well just hang on to the model resource instead? I think #38 is written under the assumption of tenant-scoped names (a model loaded by a tenant is available to all the tenant's instances) but I think this still breaks the virtualizability goal @sunfishcode is talking about.

@sunfishcode, I'm not sure I understand exactly what you mean by the two options you bring up at the end. My feeling now after some thought is that the best way to provide the name-model mapping is "out-of-band:" i.e., forget about load_named and have host APIs that allow tenants to provide the pre-loaded, named models. I'm open to discussing it more, though!

@geekbeast
Copy link
Contributor

@sunfishcode My understanding is that the registry being proposed here is definitely not be global in scope. While there are some common inference models that hosts may choose to expose, they can do so by modifying their initialization of the dynamic registry at the host level.

Do you have any objections to a fully dynamic registry who contents are completely controlled by host policy? That's actually what I implemented in bytecodealliance/wastime#6134 (caveat there is a bug with the lifetime that I'm fixing up at the moment).

@geekbeast
Copy link
Contributor

@sunfishcode Also quick clarifying question that I suspect I know the answer to here. If two identical WASM guest programs are run, any visible shared state across those two programs would be considered to break virtualization, correct?

@geekbeast
Copy link
Contributor

geekbeast commented Apr 14, 2023

The driving reason behind named models is that model compilation is expensive. I have seen 24x times inference cost for model compilation in some testing. A simpler solution here maybe to expose an import_compiled_model(...) that takes a model that a model that is ready to be executed. This maybe not be compatible with all frameworks. In particular, this is a current open issue in TensorFlow (tensorflow/tensorflow#55520)

@mingqiusun
Copy link
Collaborator

I would think even a local scoped load_named method is useful, as it solves the problem of maintaining state information across calls in the FaaS use case.

@juntao
Copy link

juntao commented May 16, 2023

This sounds great! We look forward to implementing this in WasmEdge!

@hydai
Copy link

hydai commented May 16, 2023

Hello, this is hydai from WasmEdge.

I'd like to clarify something about the section you mentioned:

The engine might require a mechanism to preload certain models to circumvent repeatedly loading them in new Wasm instances. One could suggest a flag such as --nn-preload :: to instruct the engine about the model name and the loading method. All modules initiated by this engine would have access to these preloaded models via get_named.

It seems like this would load directly from a host path, rather than utilizing a guest path mapped via pre-open. For instance, let's consider a model named NM located at the host path /host_path/NM.model. We could preload this model using wasm_runtime --nn-preload NM:<E>:/host_path/NM.model.

I'm curious if it's possible to use a guest path instead. Here's an example:

# Host
/host_path/NM.model
# Map it with pre-open
wasm_runtime --dir /guest_path/NM.model:/host_path/NM.model wasi-nn.wasm
# Within the guest environment
call register_named: func(name: string, path: string) -> expected<error>
prior to invoking load_named();

The reason behind this question is a need for container integration:

  1. When docker+wasm is used, all files within the container images are mapped into the pre-open system.
  2. Similarly, curn+wasm_handler maps --dir .:. to allow the wasm runtime to handle files within the container images.

@abrown
Copy link
Collaborator Author

abrown commented May 18, 2023

Hey @hydai, thanks for the feedback. The idea I was going for with this change is that the model bytes should not even need to be visible to the Wasm guest at all. The issue with the current load function is not just that the model bytes need to be retrieved but that, for some backends, the host will have to compile the model bytes to some other form. I am trying to make it possible to avoid that overhead by naming the models without dictating how the host assigns those names to models. So I guess I see two ways to answer the question:

  1. I guess for your host you could make up a way to configure names from guest paths and have the host keep track of those; CLI flags would be one way to do that and the wasi-nn spec doesn't say what those need to look like.
  2. The other way is to register names within the guest: in @geekbeast's PR for this proposal I think he calls this "register_model" but I used "load_named" up above. The user then has to pay the overhead of compiling the model, though.

@hydai
Copy link

hydai commented May 18, 2023

Hi @abrown

Gocha!

I would like to delve deeper into 'compiling model bytes into other forms.'

Two aspects require clarification:

  1. Regarding the responsibility of this task, it would be beneficial to understand if the users are expected to undertake the precompilation and subsequently supply the location of the compiled form to WASI-NN. Alternatively, is it envisioned that WASI-NN, upon identifying the requirement by backends, would initiate the compilation of model bytes?
  2. The described behavior indeed resembles 'pre-processing' work, such as 'image2tensor'. Given this similarity, could it be inferred that this function would be incorporated into the WASI-NN toolkit?

Another thing is that do we need to add deregister_model or unload_model functions? So users can add/remove named models according to the different use scenario.

@abrown
Copy link
Collaborator Author

abrown commented May 18, 2023

Regarding the responsibility of this task, it would be beneficial to understand if the users are expected to undertake the precompilation and subsequently supply the location of the compiled form to WASI-NN. Alternatively, is it envisioned that WASI-NN, upon identifying the requirement by backends, would initiate the compilation of model bytes?

I don't think we want this "precompilation" step to be guest-visible. If it were, it would expose differences between backends that compile and those that don't. And if we hide the compilation like we do, each host implementation is free to optimize when and how they do this step.

The described behavior indeed resembles 'pre-processing' work, such as 'image2tensor'. Given this similarity, could it be inferred that this function would be incorporated into the WASI-NN toolkit?

I don't think it would fit in the wasi-nn bindings because this compilation I'm talking about is a host-side thing.

Another thing is that do we need to add deregister_model or unload_model functions? So users can add/remove named models according to the different use scenario.

@geekbeast and I discussed this over (I think via a meeting) as we were thinking through #38. (You might want to look at that PR as well). I am pretty reticent to add more surface area to the wasi-nn API unless we absolutely need it. More API surface means more to implement, more to maintain, more chances to make a mistake. So we agreed to hold off on deregistering models until a user demands it; I think we're expecting most users to clean up the wasi-nn state by just existing the Wasm instance. If this does become an issue for someone, though, it seems like a reasonable addition.

@abrown
Copy link
Collaborator Author

abrown commented Aug 9, 2023

Ok, this feature has now been added by #38.

@abrown abrown closed this as completed Aug 9, 2023
abrown added a commit to abrown/wasmtime that referenced this issue Aug 17, 2023
This change adds a way to retrieve preloaded ML models (i.e., "graphs"
in wasi-nn terms) from a registry. The wasi-nn specification includes a
new function, `load_by_name`, that can be used to access these models
more efficiently than before; previously, a user's only option was to
read/download/etc. all of the bytes of an ML model and pass them to the
`load` function.

[named models]: WebAssembly/wasi-nn#36

In Wasmtime's implementation of wasi-nn, we call the registry that holds
the models a `GraphRegistry`. We include a simplistic `InMemoryRegistry`
for use in the Wasmtime CLI (more on this later) but the idea is that
production use will involve some more complex caching and thus a new
implementation of a registry--a `Box<dyn GraphRegistry>`--passed into
the wasi-nn context. Note that, because we now must be able to `clone` a
graph out of the registry and into the "used graphs" table, the OpenVINO
`BackendGraph` is updated to be easier to copy around.

To allow experimentation with this "preload a named model"
functionality, this change also adds a new Wasmtime CLI flag: `--graph
<encoding>:<host dir>`. Wasmtime CLI users can now preload a model from
a directory; the directory `basename` is used as the model name. Loading
models from a directory is probably not desired in Wasmtime embeddings
so it is cordoned off into a separate `BackendFromDir` extension trait.
abrown added a commit to abrown/wasmtime that referenced this issue Aug 17, 2023
This change adds a way to retrieve preloaded ML models (i.e., "graphs"
in wasi-nn terms) from a registry. The wasi-nn specification includes a
new function, `load_by_name`, that can be used to access these models
more efficiently than before; previously, a user's only option was to
read/download/etc. all of the bytes of an ML model and pass them to the
`load` function.

[named models]: WebAssembly/wasi-nn#36

In Wasmtime's implementation of wasi-nn, we call the registry that holds
the models a `GraphRegistry`. We include a simplistic `InMemoryRegistry`
for use in the Wasmtime CLI (more on this later) but the idea is that
production use will involve some more complex caching and thus a new
implementation of a registry--a `Box<dyn GraphRegistry>`--passed into
the wasi-nn context. Note that, because we now must be able to `clone` a
graph out of the registry and into the "used graphs" table, the OpenVINO
`BackendGraph` is updated to be easier to copy around.

To allow experimentation with this "preload a named model"
functionality, this change also adds a new Wasmtime CLI flag: `--graph
<encoding>:<host dir>`. Wasmtime CLI users can now preload a model from
a directory; the directory `basename` is used as the model name. Loading
models from a directory is probably not desired in Wasmtime embeddings
so it is cordoned off into a separate `BackendFromDir` extension trait.
abrown added a commit to abrown/wasmtime that referenced this issue Aug 17, 2023
This change adds a way to retrieve preloaded ML models (i.e., "graphs"
in wasi-nn terms) from a registry. The wasi-nn specification includes a
new function, `load_by_name`, that can be used to access these models
more efficiently than before; previously, a user's only option was to
read/download/etc. all of the bytes of an ML model and pass them to the
`load` function.

[named models]: WebAssembly/wasi-nn#36

In Wasmtime's implementation of wasi-nn, we call the registry that holds
the models a `GraphRegistry`. We include a simplistic `InMemoryRegistry`
for use in the Wasmtime CLI (more on this later) but the idea is that
production use will involve some more complex caching and thus a new
implementation of a registry--a `Box<dyn GraphRegistry>`--passed into
the wasi-nn context. Note that, because we now must be able to `clone` a
graph out of the registry and into the "used graphs" table, the OpenVINO
`BackendGraph` is updated to be easier to copy around.

To allow experimentation with this "preload a named model"
functionality, this change also adds a new Wasmtime CLI flag: `--graph
<encoding>:<host dir>`. Wasmtime CLI users can now preload a model from
a directory; the directory `basename` is used as the model name. Loading
models from a directory is probably not desired in Wasmtime embeddings
so it is cordoned off into a separate `BackendFromDir` extension trait.
abrown added a commit to abrown/wasmtime that referenced this issue Aug 22, 2023
This change adds a way to retrieve preloaded ML models (i.e., "graphs"
in wasi-nn terms) from a registry. The wasi-nn specification includes a
new function, `load_by_name`, that can be used to access these models
more efficiently than before; previously, a user's only option was to
read/download/etc. all of the bytes of an ML model and pass them to the
`load` function.

[named models]: WebAssembly/wasi-nn#36

In Wasmtime's implementation of wasi-nn, we call the registry that holds
the models a `GraphRegistry`. We include a simplistic `InMemoryRegistry`
for use in the Wasmtime CLI (more on this later) but the idea is that
production use will involve some more complex caching and thus a new
implementation of a registry--a `Box<dyn GraphRegistry>`--passed into
the wasi-nn context. Note that, because we now must be able to `clone` a
graph out of the registry and into the "used graphs" table, the OpenVINO
`BackendGraph` is updated to be easier to copy around.

To allow experimentation with this "preload a named model"
functionality, this change also adds a new Wasmtime CLI flag: `--graph
<encoding>:<host dir>`. Wasmtime CLI users can now preload a model from
a directory; the directory `basename` is used as the model name. Loading
models from a directory is probably not desired in Wasmtime embeddings
so it is cordoned off into a separate `BackendFromDir` extension trait.
github-merge-queue bot pushed a commit to bytecodealliance/wasmtime that referenced this issue Aug 23, 2023
* wasi-nn: add [named models]

This change adds a way to retrieve preloaded ML models (i.e., "graphs"
in wasi-nn terms) from a registry. The wasi-nn specification includes a
new function, `load_by_name`, that can be used to access these models
more efficiently than before; previously, a user's only option was to
read/download/etc. all of the bytes of an ML model and pass them to the
`load` function.

[named models]: WebAssembly/wasi-nn#36

In Wasmtime's implementation of wasi-nn, we call the registry that holds
the models a `GraphRegistry`. We include a simplistic `InMemoryRegistry`
for use in the Wasmtime CLI (more on this later) but the idea is that
production use will involve some more complex caching and thus a new
implementation of a registry--a `Box<dyn GraphRegistry>`--passed into
the wasi-nn context. Note that, because we now must be able to `clone` a
graph out of the registry and into the "used graphs" table, the OpenVINO
`BackendGraph` is updated to be easier to copy around.

To allow experimentation with this "preload a named model"
functionality, this change also adds a new Wasmtime CLI flag: `--graph
<encoding>:<host dir>`. Wasmtime CLI users can now preload a model from
a directory; the directory `basename` is used as the model name. Loading
models from a directory is probably not desired in Wasmtime embeddings
so it is cordoned off into a separate `BackendFromDir` extension trait.

* wasi-nn: add "named model" test

Add a new example crate which loads a model by name and performs image
classification. It uses the same MobileNet model as the existing test
but a new version of the Rust bindings. The new crate is built and run
with the new CLI flag in the `ci/run-wasi-nn-example.sh` script.

prtest:full

* review: rename `--graph` to `--wasi-nn-graph`
eduardomourar pushed a commit to eduardomourar/wasmtime that referenced this issue Sep 6, 2023
* wasi-nn: add [named models]

This change adds a way to retrieve preloaded ML models (i.e., "graphs"
in wasi-nn terms) from a registry. The wasi-nn specification includes a
new function, `load_by_name`, that can be used to access these models
more efficiently than before; previously, a user's only option was to
read/download/etc. all of the bytes of an ML model and pass them to the
`load` function.

[named models]: WebAssembly/wasi-nn#36

In Wasmtime's implementation of wasi-nn, we call the registry that holds
the models a `GraphRegistry`. We include a simplistic `InMemoryRegistry`
for use in the Wasmtime CLI (more on this later) but the idea is that
production use will involve some more complex caching and thus a new
implementation of a registry--a `Box<dyn GraphRegistry>`--passed into
the wasi-nn context. Note that, because we now must be able to `clone` a
graph out of the registry and into the "used graphs" table, the OpenVINO
`BackendGraph` is updated to be easier to copy around.

To allow experimentation with this "preload a named model"
functionality, this change also adds a new Wasmtime CLI flag: `--graph
<encoding>:<host dir>`. Wasmtime CLI users can now preload a model from
a directory; the directory `basename` is used as the model name. Loading
models from a directory is probably not desired in Wasmtime embeddings
so it is cordoned off into a separate `BackendFromDir` extension trait.

* wasi-nn: add "named model" test

Add a new example crate which loads a model by name and performs image
classification. It uses the same MobileNet model as the existing test
but a new version of the Rust bindings. The new crate is built and run
with the new CLI flag in the `ci/run-wasi-nn-example.sh` script.

prtest:full

* review: rename `--graph` to `--wasi-nn-graph`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants