Initial Wasm runner implementation #173

Merged: 13 commits into VivekPanyam:main from wasm-runner, Oct 12, 2023

Conversation

@leizaf (Contributor) commented Oct 5, 2023

Still untested, but I might have a very basic proof of concept ready. Currently it requires the user to define and export buffers to transfer tensors to/from wasm. The alternative is to pass pointers and lengths manually, which I think is more complicated and doesn't allow for multiple returns. I used color_eyre temporarily for quick error handling; let me know what error handling system you are using and I can refactor to that. Also, I'm not super sure how to test the runner; I have a half-completed test written.

@leizaf changed the title from "Wasm runner" to "[WIP] Wasm runner" on Oct 5, 2023
@VivekPanyam (Owner) left a comment

This is a good start. Here are a few high level comments:

Currently it requires the user to define and export buffers to transfer tensors to/from wasm. The alternative is to pass pointers and lengths manually, which I think is more complicated, and doesn't allow for multiple returns.

Unfortunately, the current approach is not particularly flexible as many things cannot be dynamic (e.g. number of return tensors, number of inputs, shapes of tensors, etc.).

I'd recommend reading about the WebAssembly Component Model and then taking a look at wit-bindgen.

The component model allows you to define more sophisticated interface types and defines a canonical ABI so that Wasm modules implemented in several languages can communicate in a consistent way. This is somewhat similar to the interface definitions we were talking about in #164.

We generally want to support the same infer interface as the rest of Carton (the input is an arbitrary number of named Tensors and the output is an arbitrary number of named Tensors). This should be possible using WIT.

Can you try to get a prototype working using the component model/WIT? wasmtime has support for WIT so you shouldn't have to change runtimes.
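
For reference, host-side usage of wasmtime's component support looks roughly like the sketch below. This is not code from this PR: the WIT path, the world name (model), and the generated Model/call_infer names are assumptions based on wasmtime's bindgen conventions, and exact signatures vary across wasmtime versions.

use wasmtime::component::{bindgen, Component, Linker};
use wasmtime::{Config, Engine, Store};

// Generate host bindings from a WIT world named `model` (path and world name
// are assumptions for this sketch).
bindgen!({
    world: "model",
    path: "wit",
});

fn main() -> anyhow::Result<()> {
    // The component model has to be enabled explicitly.
    let mut config = Config::new();
    config.wasm_component_model(true);
    let engine = Engine::new(&config)?;

    // Load a Wasm *component* (not a plain core module).
    let component = Component::from_file(&engine, "model.wasm")?;
    let linker = Linker::new(&engine);
    let mut store = Store::new(&engine, ());

    // `Model::instantiate` and `call_infer` follow bindgen's naming
    // conventions; an empty input list is passed just to show the call shape.
    let (model, _instance) = Model::instantiate(&mut store, &component, &linker)?;
    let _outputs = model.call_infer(&mut store, &[])?;
    Ok(())
}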

@leizaf (Contributor, author) commented Oct 5, 2023

@VivekPanyam What do you think of the .wit for tensors I drafted up? And infer could just be:

infer: func(in: list<tuple<string, tensor>>) -> list<tuple<string, tensor>>

or just a list of tensors.

I did consider using the component model initially, but I wasn't sure how developed the tooling around it is yet.

@VivekPanyam (Owner) commented

That infer signature looks good to me and the .wit file looks good too!

One thing to note in the interface is that since you're using list<u8> or list<string>, we actually don't need strides.

In the future, ideally we'd return an address/pointer into Wasm memory for the buffer field (along with strides). That'll help avoid an extra copy in cases where the model's output isn't contiguous/doesn't have "standard" strides.

We could just make buffer a u64 or something and treat it as an offset into Wasm linear memory, but then we'd have to explicitly handle lifetimes. We can explore that as an optimization later and just stick with list for now.
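
For reference, an interface along these lines lowers to host-side Rust types roughly like the following. This is a sketch: the variant and field names mirror the bindings used later in this PR (TensorNumeric, Dtype, buffer/dtype/shape), but the exact generated code may differ.

// Sketch of the host-side Rust types the WIT tensor interface lowers to.

/// Numeric dtypes supported by the interface.
pub enum Dtype {
    Float,
    Double,
    I8,
    I16,
    I32,
    I64,
    U8,
    U16,
    U32,
    U64,
}

/// A numeric tensor: raw bytes plus dtype and shape. Because the buffer is a
/// WIT `list<u8>`, the data is contiguous and no strides are needed.
pub struct TensorNumeric {
    pub buffer: Vec<u8>,
    pub dtype: Dtype,
    pub shape: Vec<u64>,
}

/// A string tensor, kept separate since strings are variable-length.
pub struct TensorString {
    pub buffer: Vec<String>,
    pub shape: Vec<u64>,
}

/// What crosses the infer boundary in either direction.
pub enum Tensor {
    Numeric(TensorNumeric),
    String(TensorString),
}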

@leizaf (Contributor, author) commented Oct 6, 2023

@VivekPanyam What's the best way to copylessly create a Tensor from Vec<T>? Also, I added two methods to TensorStorage; are those alright?

@VivekPanyam (Owner) left a comment

@VivekPanyam What's the best way to copylessly create a Tensor from Vec<T>?

Does the WIT interface require you to get a Vec<T> as output or is there a way to get a slice? That would let you copy out of Wasm memory directly into a new Tensor.

As far as I'm aware, we need to do at least one copy on the output path (to copy out of Wasm linear memory into something else). If we can make it exactly one, that would be ideal (i.e. without an intermediate Vec).

If you don't see a way to do this, don't worry about it and we can optimize later.

(I added one other comment to answer your other question, but I didn't review the whole PR since it looks like it's still in progress)
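
To sketch what "exactly one copy" could look like on the output path: copy straight from the lifted bytes into an already-allocated tensor buffer, with no intermediate Vec. The helper below is hypothetical and purely illustrative; it is not part of this PR.

/// Copy raw bytes (as lifted out of Wasm linear memory) into an
/// already-allocated destination slice in a single pass.
/// Hypothetical helper for illustration only.
fn copy_bytes_into<T: Copy>(dst: &mut [T], bytes: &[u8]) {
    let elem = std::mem::size_of::<T>();
    assert_eq!(bytes.len(), dst.len() * elem, "buffer size mismatch");
    for (i, out) in dst.iter_mut().enumerate() {
        // read_unaligned avoids assuming the byte buffer is aligned for T.
        let src = bytes[i * elem..].as_ptr() as *const T;
        *out = unsafe { src.read_unaligned() };
    }
}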

@leizaf (Contributor, author) commented Oct 7, 2023

Finally got the runner working! Here is a recap of everything:

Summary

This PR adds two sub-modules, carton-runner-wasm and carton-wasm-interface. The former implements the runner and specifies the components a model is required to implement. The latter is basically empty, but I'd like it to contain guest-side implementations and conversions between Candle and Burn tensors. The motivation for this is that working with the raw component types is quite a rough experience.

Test Coverage

Host-side conversions between Carton and component tensors are covered for f32, u32, i32, and string, and passing, so I imagine they are working for all types. There is a somewhat messy test for WASMModelInstance which works for a basic model. The actual runner main.rs isn't covered, but I assume it's working since it's pretty simple.

Todo?

  • Guest side type conversions in carton-wasm-interface
  • Reduce number of copies
  • Return pointer directly from wasm

Comments

Does the WIT interface require you to get a Vec as output or is there a way to get a slice? That would let you copy out of Wasm memory directly into a new Tensor.

Wasmtime will automatically copy the return into host memory via the Lift trait, and vice versa with the Lower trait. Since list translates to Vec, you get a Vec<u8> back. So currently each infer call does two copies per variable: carton -> component -> wasm. As you mentioned, and I found out the hard way, there are some caveats to copyless construction of Vecs, so this might be difficult.

In the future, ideally we'd return an address/pointer into Wasm memory for the buffer field (along with strides). That'll help avoid an extra copy in cases where the model's output isn't contiguous/doesn't have "standard" strides.

For handling the lifetime, I think Rc-ing the previous output and holding it until the next infer call would probably suffice. That, or introducing a callback to free it. The user has to implement the infer function though, so I'm not sure how that would work.
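
Concretely, a guest-side sketch of the keep-the-previous-output-alive idea could look like this (hypothetical, not in this PR; it uses a thread-local slot rather than Rc, which amounts to the same thing in single-threaded Wasm):

use std::cell::RefCell;

thread_local! {
    // Buffers returned from the previous infer call. They stay alive (and
    // their pointers stay valid in linear memory) until the next call.
    static LAST_OUTPUT: RefCell<Option<Vec<Vec<u8>>>> = RefCell::new(None);
}

/// Hypothetical guest-side helper: hand back (pointer, length) pairs for the
/// new outputs while freeing the previous call's outputs.
fn publish_outputs(outputs: Vec<Vec<u8>>) -> Vec<(u32, u32)> {
    LAST_OUTPUT.with(|slot| {
        let ptrs = outputs
            .iter()
            .map(|buf| (buf.as_ptr() as u32, buf.len() as u32))
            .collect();
        // Replacing the slot drops the previous call's outputs here.
        *slot.borrow_mut() = Some(outputs);
        ptrs
    })
}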

What are your thoughts? Is this mergeable (after some clean up) yet?

@VivekPanyam (Owner) left a comment

Finally got the runner working!

This is great. Thanks for spending time on it!

Wasmtime will automatically copy the return into host memory via the Lift trait, vice versa with the Lower trait. Since list translates to vec you would get Vec<u8> back. So currently each infer call does 2 copies per variable: carton -> component -> wasm.

Makes sense. General thoughts that require no action:

Is it possible/straightforward for us to implement Lift for Tensor (and use it easily)? It looks like wasmtime implements Lift for several types that can be built from a WIT list (i.e. it's not a 1:1 mapping from list to Vec). I haven't looked at this in depth, so maybe it doesn't actually give us what we want. It seems like implementing Lift might require messing with wasmtime implementation details, so maybe it's not worth it (definitely not in this PR, at least). We can explore this more if we find that it actually matters for performance in use cases we see.

The TODOs sound reasonable overall.

Is this mergeable yet?

Almost! We need a couple of other things to get to a runner we can release and deploy. Here are a few things to figure out:

  • Two options for backwards compatibility:
    • Have a policy of not maintaining runner compatibility until the first time it shows up on the docs website (and we can mark it as experimental in the runner's readme until that happens). If that makes sense to you, add a README.md to source/carton-runner-wasm that says in bold at the top that the runner is currently experimental and that, while it's experimental, models created with it may not work in the future.
    • The other option is to confirm that the interfaces are something we're reasonably happy with. We can easily change the implementation by publishing new versions of the runner in the nightly builds, but I'd like to make sure we don't foresee immediate breaking changes to the interface (e.g. the .wit file). Not a huge deal if we need to make a breaking change after releasing (because of how Carton does versioning of runners and the models they create), but I'd rather not make a breaking change immediately (because in theory that means we still need to keep the old runner binary available for all platforms into the future). Some of the TODOs above make it seem like we might change the .wit file relatively soon.

I'd recommend marking it as experimental.

Finally, we need to add a binary that builds a release (example) and a complete test (example), and add it to CI.

The latter is basically empty, but I'd like it to contain guest side implementations and conversions between Candle and Burn tensors. The motivation for this is that working with the raw component types is quite a rough experience.

That makes sense. I'd recommend removing it for now since it's an empty crate and then you can add it back when you start implementing it.

I added a few comments inline; most of them are pretty simple fixes. The big changes that need to happen are the release-building binary and the end-to-end test I mentioned above. Nice work!

source/carton-runner-wasm/src/main.rs (outdated review thread, resolved)
        .unwrap();
    }
    RequestData::Seal { tensors } => {
        todo!()

@VivekPanyam (Owner):
If you're marking the runner as experimental, this is fine. Otherwise we want to pass this through to the Wasm code.

        .unwrap();
    }
    RequestData::InferWithHandle { handle, .. } => {
        todo!()

@VivekPanyam (Owner):
Same as above. If you're marking the runner as experimental, this is fine. Otherwise we want to pass this through to the Wasm code.


impl Into<CartonTensor> for TensorNumeric {
    fn into(self) -> CartonTensor {
        match self.dtype {

@VivekPanyam (Owner):
Might be helpful to use the for_each_numeric_carton_type! macro here from the carton-macros crate

    type Error = Report;

    fn try_from(value: CartonTensor) -> Result<Self> {
        Ok(match value {

@VivekPanyam (Owner):
Might be helpful to use the for_each_carton_type! macro

source/carton-runner-wasm/tests/test_model/model.wasm (outdated review thread, resolved)

world model {
    use types.{tensor};
    export infer: func(in: list<tuple<string, tensor>>) -> list<tuple<string, tensor>>;

@VivekPanyam (Owner):
We need to add seal and infer_with_handle. Not necessary in this PR if you're marking the runner as experimental.

source/carton-runner-wasm/Cargo.toml (outdated review thread, resolved)
source/carton-runner-wasm/src/lib.rs (outdated review thread, resolved)
use carton_runner_wasm::WASMModelInstance;

#[test]
fn test_model_instance() {

@VivekPanyam (Owner):
General comment on what this is testing

@leizaf mentioned this pull request Oct 7, 2023
@leizaf (Contributor, author) commented Oct 11, 2023

Finally, we need to add a binary that builds a release (example) and a complete test (example), and add it to CI.

@VivekPanyam Done, and I implemented most of the suggestions you made. What's the best way to make it so the wasm runner is ignored when targeting wasm/wasi?

@VivekPanyam (Owner) left a comment

Nice work!

What's the best way to make it so the wasm runner is ignored when targeting wasm/wasi?

None of the runners are built for wasm/wasi in CI at the moment, so there's nothing to do here.

I added comments on spots where you could use the for_each_carton_type macros, but there's no need to change them in this PR (just for future reference).

I assume the reason you didn't use them in those spots, but did use them in other places, is that you were trying to return a value and it didn't work. If you want to return from within one of those macros, you currently need an explicit return (as in the example in one of my comments). This is a little counterintuitive; I should probably include it in the docstrings for the macros (or we should modify the macro implementations to make this easier).

Comment below if you want to change something; otherwise I'll let CI run and then merge!

Thanks again for working on this!

Comment on lines +38 to +65
match self.dtype {
    Dtype::Float => {
        copy_to_storage(CartonStorage::<f32>::new(self.shape), &self.buffer).into()
    }
    Dtype::Double => {
        copy_to_storage(CartonStorage::<f64>::new(self.shape), &self.buffer).into()
    }
    Dtype::I8 => copy_to_storage(CartonStorage::<i8>::new(self.shape), &self.buffer).into(),
    Dtype::I16 => {
        copy_to_storage(CartonStorage::<i16>::new(self.shape), &self.buffer).into()
    }
    Dtype::I32 => {
        copy_to_storage(CartonStorage::<i32>::new(self.shape), &self.buffer).into()
    }
    Dtype::I64 => {
        copy_to_storage(CartonStorage::<i64>::new(self.shape), &self.buffer).into()
    }
    Dtype::U8 => copy_to_storage(CartonStorage::<u8>::new(self.shape), &self.buffer).into(),
    Dtype::U16 => {
        copy_to_storage(CartonStorage::<u16>::new(self.shape), &self.buffer).into()
    }
    Dtype::U32 => {
        copy_to_storage(CartonStorage::<u32>::new(self.shape), &self.buffer).into()
    }
    Dtype::U64 => {
        copy_to_storage(CartonStorage::<u64>::new(self.shape), &self.buffer).into()
    }
}

@VivekPanyam (Owner):
A more concise option using a macro might be

for_each_numeric_carton_type! {
    match self.dtype {
        $(Dtype::$CartonType => {
            return copy_to_storage(CartonStorage::<$RustType>::new(self.shape), &self.buffer).into()
        })*
    }
}

Comment on lines +84 to +97
Ok(match value {
    CartonTensor::Float(t) => WasmTensor::Numeric(t.into()),
    CartonTensor::Double(t) => WasmTensor::Numeric(t.into()),
    CartonTensor::I8(t) => WasmTensor::Numeric(t.into()),
    CartonTensor::I16(t) => WasmTensor::Numeric(t.into()),
    CartonTensor::I32(t) => WasmTensor::Numeric(t.into()),
    CartonTensor::I64(t) => WasmTensor::Numeric(t.into()),
    CartonTensor::U8(t) => WasmTensor::Numeric(t.into()),
    CartonTensor::U16(t) => WasmTensor::Numeric(t.into()),
    CartonTensor::U32(t) => WasmTensor::Numeric(t.into()),
    CartonTensor::U64(t) => WasmTensor::Numeric(t.into()),
    CartonTensor::String(t) => WasmTensor::String(t.into()),
    CartonTensor::NestedTensor(_) => return Err(eyre!("Nested tensors are not supported")),
})

@VivekPanyam (Owner):
See my note on for_each_numeric_carton_type! above. You should be able to use for_each_carton_type! here

Comment on lines +147 to +198
for_each_numeric_carton_type! {
    $(
        paste::item! {
            #[test]
            fn [< $TypeStr "_tensor_carton_to_wasm" >]() {
                let storage = CartonStorage::<$RustType>::new(vec![3]);
                let carton_tensor = CartonTensor::$CartonType(
                    copy_to_storage(
                        storage,
                        slice_to_bytes(
                            &[1.0 as $RustType, 2.0 as $RustType, 3.0 as $RustType]
                        )
                    )
                );
                let wasm_tensor = WasmTensor::try_from(carton_tensor).unwrap();
                match wasm_tensor {
                    WasmTensor::Numeric(tensor_numeric) => {
                        assert_eq!(
                            tensor_numeric.buffer,
                            slice_to_bytes(&[1.0 as $RustType, 2.0 as $RustType, 3.0 as $RustType])
                        );
                    }
                    _ => {
                        panic!(concat!("Expected WasmTensor::Numeric variant"));
                    }
                }
            }

            #[test]
            fn [< $TypeStr "_tensor_wasm_to_carton" >]() {
                let buffer = slice_to_bytes(&[1.0 as $RustType, 2.0 as $RustType, 3.0 as $RustType]);
                let tensor = WasmTensor::Numeric(TensorNumeric {
                    buffer: buffer.to_vec(),
                    dtype: Dtype::$CartonType,
                    shape: vec![3],
                });
                let carton_tensor: CartonTensor = tensor.into();
                match carton_tensor {
                    CartonTensor::$CartonType(storage) => {
                        assert_eq!(
                            storage.view().as_slice().unwrap(),
                            &[1.0 as $RustType, 2.0 as $RustType, 3.0 as $RustType]
                        );
                    }
                    _ => {
                        panic!(concat!("Expected CartonTensor::", stringify!($CartonType), " variant"));
                    }
                }
            }
        }
    )*
}

@VivekPanyam (Owner):
Nice! I just expected a single test for each direction that tested all the types. It looks like you got separate tests working! Hopefully figuring out how to do it didn't take too much time.

@VivekPanyam (Owner) commented

(Looks like the formatting check failed. You need to run cargo fmt.)

@VivekPanyam (Owner) commented Oct 11, 2023

Once you run cargo fmt and update the PR, can you:

  1. Change the PR title to remove the [WIP] and add a few words (e.g. "Initial implementation of Wasm runner")
  2. Add a comment with what you want the commit message to be for the merged PR (normally, this is automatically set to original PR description, but that probably doesn't make sense in this case)
  3. Mark the PR as "ready for review" (edit: I did this)

Thanks!

@VivekPanyam marked this pull request as ready for review on Oct 11, 2023, 23:27
@leizaf changed the title from "[WIP] Wasm runner" to "Initial Wasm runner implementation" on Oct 11, 2023
@VivekPanyam marked this pull request as draft on Oct 11, 2023, 23:35
@VivekPanyam marked this pull request as ready for review on Oct 11, 2023, 23:35
@leizaf (Contributor, author) commented Oct 11, 2023

Appreciate the comments on using the macro. I couldn't figure it out initially, and I also didn't know you could match with macros like that. I don't want to restart CI, so I can throw those changes into the next PR, or you could refactor them as well. I'll add a more detailed description in a bit.

@leizaf (Contributor, author) commented Oct 12, 2023

Description

This PR adds a WASM runner, which can run WASM models compiled against the interface (subject to change; see #175) defined in ../carton-runner-wasm/wit/lib.wit. The existing implementation is still unoptimized, requiring two copies per tensor moved to/from WASM. An example of compiling a compatible model can be found in carton-runner-wasm/tests/test_model.

Limitations

  • Only the wasm32-unknown-unknown target has been tested and confirmed working.
  • Only infer is supported for now.
  • Packing only supports a single .wasm file and no other artifacts.
  • No WebGPU, and probably not for a while.

Test Coverage

All type conversions from Carton to WASM and vice versa are fully covered. Pack, Load, and Infer are covered in pack.rs.

TODOs

Track in #164

@VivekPanyam merged commit 0aef525 into VivekPanyam:main on Oct 12, 2023
@VivekPanyam (Owner) commented

Merged! Nice work :) 🎉

@leizaf mentioned this pull request Oct 12, 2023
@leizaf deleted the wasm-runner branch on October 12, 2023, 01:25
VivekPanyam added a commit that referenced this pull request Oct 14, 2023
Although #173 included several dependency changes, it did not include an
updated `Cargo.lock` file. This PR updates the lock file and adds a
check to CI to ensure that lock files match manifest changes.

### Test plan

CI