feat: allow to preload ML models when running inference #224
Introduce ML model preloading by leveraging the "Named models" feature from WASI-NN. Workers no longer need to load the model files and send them to the host; instead, the host loads them directly, and workers reference them by name. I added a new example that uses this feature and produces the same result as the rust-wasi-nn example.
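For illustration, here is a minimal sketch of what the guest side looks like when a model is preloaded by the host. It uses the wasi-nn Rust crate's `build_from_cache` helper, which wraps the WASI-NN `load_by_name` call behind the "Named models" feature. The model name `"inference-model"`, encoding, and tensor shapes are placeholders, not taken from the actual example:

```rust
use wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Look up a model the host already preloaded instead of reading the
    // model bytes in the guest and sending them across the boundary.
    // "inference-model" is a placeholder for the configured model name.
    let graph = GraphBuilder::new(GraphEncoding::Openvino, ExecutionTarget::CPU)
        .build_from_cache("inference-model")?;

    let mut ctx = graph.init_execution_context()?;

    // Dummy input tensor; the shape (1x3x224x224) is an assumption and
    // depends entirely on the model being served.
    let input = vec![0f32; 1 * 3 * 224 * 224];
    ctx.set_input(0, TensorType::F32, &[1, 3, 224, 224], &input)?;
    ctx.compute()?;

    // Output buffer size is likewise model-dependent.
    let mut output = vec![0f32; 1001];
    ctx.get_output(0, &mut output)?;
    println!("first logit: {}", output[0]);
    Ok(())
}
```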
I also introduced some new configuration properties for the `wasi_nn` feature. The `preload_models` array allows configuring multiple models, and the `provider` property configures the way `wws` retrieves them. Currently, only the `local` provider is available, but we plan to add more (see the config sketch below).
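For reference, a sketch of what these properties could look like in a worker's TOML configuration. The key names and nesting here are illustrative assumptions, not the authoritative schema:

```toml
# Worker config sketch (key names are illustrative, not the final schema).
name = "rust-wasi-nn-preload"
version = "1"

[features.wasi_nn]
# Each entry preloads one model on the host; the worker can then look it
# up by name through the WASI-NN "Named models" feature.
[[features.wasi_nn.preload_models]]
# How wws retrieves the model files. Only "local" exists for now; more
# providers are planned.
provider = "local"
```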
## Limitations

Even though we are preloading the models on the host, this process still happens at the request level. Based on some comments, the Wasmtime WASI-NN library is not protected against concurrent access, so to avoid multiple workers accessing the same context, I'll keep preloading at the request level for now. In the future, my goal is to load models once at worker initialization.
Closes #215.