diff --git a/docs/examples-minimal.md b/docs/examples-minimal.md index b6f813552e5f..f4bc58a567ba 100644 --- a/docs/examples-minimal.md +++ b/docs/examples-minimal.md @@ -6,22 +6,22 @@ of Wasmtime and how to best produce a minimal build of Wasmtime. ## Building a minimal CLI -> *Note*: the exact numbers in this section were last updated on 2023-10-18 on a -> macOS aarch64 host. For up-to-date numbers consult the artifacts in the [`dev` -> release of Wasmtime][dev] where the `wasmtime-min` executable represents the -> culmination of these steps. +> *Note*: the exact numbers in this section were last updated on 2024-12-12 on a +> Linux x86\_64 host. For up-to-date numbers consult the artifacts in the [`dev` +> release of Wasmtime][dev] where the `min/lib/libwasmtime.so` binary +> represents the culmination of these steps. [dev]: https://github.com/bytecodealliance/wasmtime/releases/tag/dev Many Wasmtime embeddings go through the `wasmtime` crate as opposed to the -`wasmtime` CLI executable, but to start out let's take a look at minimizing the -command line executable. By default the wasmtime command line executable is +Wasmtime C API `libwasmtime.so`, but to start out let's take a look at +minimizing the dynamic library as a case study. By default the C API is relatively large: ```shell -$ cargo build -$ ls -l ./target/debug/wasmtime --rwxr-xr-x@ 1 root root 140M Oct 18 08:33 target/debug/wasmtime +$ cargo build -p wasmtime-c-api +$ ls -lh ./target/debug/libwasmtime.so +-rwxrwxr-x 2 alex alex 260M Dec 12 07:46 target/debug/libwasmtime.so ``` The easiest size optimization is to compile with optimizations. 
This will strip @@ -29,29 +29,27 @@ lots of dead code and additionally generate much less debug information by default ```shell -$ cargo build --release -$ ls -l ./target/release/wasmtime --rwxr-xr-x@ 1 root root 33M Oct 18 08:34 target/release/wasmtime +$ cargo build -p wasmtime-c-api --release +$ ls -lh ./target/release/libwasmtime.so +-rwxrwxr-x 2 alex alex 19M Dec 12 07:46 target/release/libwasmtime.so ``` Much better, but still relatively large! The next thing that can be done is to -disable the default features of the `wasmtime-cli` crate. This will remove all +disable the default features of the C API. This will remove all optional functionality from the crate and strip it down to the bare bones -functionality. Note though that `run` is included to keep the ability to run -precompiled WebAssembly files as otherwise the CLI doesn't have any -functionality which isn't too useful. +functionality. ```shell -$ cargo build --release --no-default-features --features run -$ ls -l ./target/release/wasmtime --rwxr-xr-x@ 1 root root 6.7M Oct 18 08:37 target/release/wasmtime +$ cargo build -p wasmtime-c-api --release --no-default-features +$ ls -lh ./target/release/libwasmtime.so +-rwxrwxr-x 2 alex alex 2.1M Dec 12 07:47 target/release/libwasmtime.so ``` -Note that this executable is stripped to the bare minimum of functionality which +Note that this library is stripped to the bare minimum of functionality which notably means it does not have a compiler for WebAssembly files. This means that -`wasmtime compile` is no longer supported meaning that `*.cwasm` files must be -fed to `wasmtime run` to execute files. Additionally error messages will be -worse in this mode as less contextual information is provided. +compilation is no longer supported meaning that `*.cwasm` files must be used to +create a module. Additionally error messages will be worse in this mode as less +contextual information is provided. 
The final Wasmtime-specific optimization you can apply is to disable logging statements. Wasmtime and its dependencies make use of the [`log` @@ -63,9 +61,9 @@ feature which sets the `max_level_off` feature for the `log` and `tracing` crate. ```shell -$ cargo build --release --no-default-features --features run,disable-logging -$ ls -l ./target/release/wasmtime --rwxr-xr-x@ 1 root root 6.7M Oct 18 08:37 target/release/wasmtime +$ cargo build -p wasmtime-c-api --release --no-default-features --features disable-logging +$ ls -lh ./target/release/libwasmtime.so +-rwxrwxr-x 2 alex alex 2.1M Dec 12 07:49 target/release/libwasmtime.so ``` At this point the next line of tricks to apply to minimize binary size are @@ -81,9 +79,9 @@ this. ```shell $ export CARGO_PROFILE_RELEASE_OPT_LEVEL=s -$ cargo build --release --no-default-features --features run,disable-logging -$ ls -l ./target/release/wasmtime --rwxr-xr-x@ 1 root root 6.8M Oct 18 08:40 target/release/wasmtime +$ cargo build -p wasmtime-c-api --release --no-default-features --features disable-logging +$ ls -lh ./target/release/libwasmtime.so +-rwxrwxr-x 2 alex alex 2.4M Dec 12 07:49 target/release/libwasmtime.so ``` Note that the size has increased here slightly instead of going down. Optimizing @@ -101,9 +99,9 @@ executable. ```shell $ export CARGO_PROFILE_RELEASE_OPT_LEVEL=s $ export CARGO_PROFILE_RELEASE_PANIC=abort -$ cargo build --release --no-default-features --features run,disable-logging -$ ls -l ./target/release/wasmtime --rwxr-xr-x@ 1 root root 5.0M Oct 18 08:40 target/release/wasmtime +$ cargo build -p wasmtime-c-api --release --no-default-features --features disable-logging +$ ls -lh ./target/release/libwasmtime.so +-rwxrwxr-x 2 alex alex 2.0M Dec 12 07:49 target/release/libwasmtime.so ``` Next, if the compile time hit is acceptable, LTO can be enabled to provide @@ -116,9 +114,9 @@ to compile than previously. 
Here LTO is configured with $ export CARGO_PROFILE_RELEASE_OPT_LEVEL=s $ export CARGO_PROFILE_RELEASE_PANIC=abort $ export CARGO_PROFILE_RELEASE_LTO=true -$ cargo build --release --no-default-features --features run,disable-logging -$ ls -l ./target/release/wasmtime --rwxr-xr-x@ 1 root root 3.3M Oct 18 08:42 target/release/wasmtime +$ cargo build -p wasmtime-c-api --release --no-default-features --features disable-logging +$ ls -lh ./target/release/libwasmtime.so +-rwxrwxr-x 2 alex alex 1.2M Dec 12 07:50 target/release/libwasmtime.so ``` Similar to LTO above rustc can be further instructed to place all crates into @@ -131,9 +129,9 @@ $ export CARGO_PROFILE_RELEASE_OPT_LEVEL=s $ export CARGO_PROFILE_RELEASE_PANIC=abort $ export CARGO_PROFILE_RELEASE_LTO=true $ export CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1 -$ cargo build --release --no-default-features --features run,disable-logging -$ ls -l ./target/release/wasmtime --rwxr-xr-x@ 1 root root 3.3M Oct 18 08:43 target/release/wasmtime +$ cargo build -p wasmtime-c-api --release --no-default-features --features disable-logging +$ ls -lh ./target/release/libwasmtime.so +-rwxrwxr-x 2 alex alex 1.2M Dec 12 07:50 target/release/libwasmtime.so ``` Note that with LTO using a single codegen unit may only have marginal benefit. 
@@ -152,9 +150,9 @@ $ export CARGO_PROFILE_RELEASE_PANIC=abort $ export CARGO_PROFILE_RELEASE_LTO=true $ export CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1 $ export CARGO_PROFILE_RELEASE_STRIP=debuginfo -$ cargo build --release --no-default-features --features run,disable-logging -$ ls -l ./target/release/wasmtime --rwxr-xr-x@ 1 root root 2.4M Oct 18 08:44 target/release/wasmtime +$ cargo build -p wasmtime-c-api --release --no-default-features --features disable-logging +$ ls -lh ./target/release/libwasmtime.so +-rwxrwxr-x 2 alex alex 1.2M Dec 12 07:50 target/release/libwasmtime.so ``` Next, if your use case allows it, the Nightly Rust toolchain provides a number @@ -174,9 +172,9 @@ $ export CARGO_PROFILE_RELEASE_LTO=true $ export CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1 $ export CARGO_PROFILE_RELEASE_STRIP=debuginfo $ export RUSTFLAGS="-Zlocation-detail=none" -$ cargo +nightly build --release --no-default-features --features run,disable-logging -$ ls -l ./target/release/wasmtime --rwxr-xr-x@ 1 root root 2.4M Oct 18 08:43 target/release/wasmtime +$ cargo +nightly build -p wasmtime-c-api --release --no-default-features --features disable-logging +$ ls -lh ./target/release/libwasmtime.so +-rwxrwxr-x 2 alex alex 1.2M Dec 12 07:51 target/release/libwasmtime.so ``` Further along the line of nightly features the next optimization will recompile @@ -192,10 +190,10 @@ $ export CARGO_PROFILE_RELEASE_LTO=true $ export CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1 $ export CARGO_PROFILE_RELEASE_STRIP=debuginfo $ export RUSTFLAGS="-Zlocation-detail=none" -$ cargo +nightly build --release --no-default-features --features run,disable-logging \ +$ cargo +nightly build -p wasmtime-c-api --release --no-default-features --features disable-logging \ -Z build-std=std,panic_abort --target aarch64-apple-darwin -$ ls -l ./target/aarch64-apple-darwin/release/wasmtime --rwxr-xr-x@ 1 root root 2.3M Oct 18 09:39 target/aarch64-apple-darwin/release/wasmtime +$ ls -lh 
target/x86_64-unknown-linux-gnu/release/libwasmtime.so +-rwxrwxr-x 2 alex alex 941K Dec 12 07:52 target/x86_64-unknown-linux-gnu/release/libwasmtime.so ``` Next the Rust standard library has some optional features in addition to @@ -211,51 +209,52 @@ $ export CARGO_PROFILE_RELEASE_LTO=true $ export CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1 $ export CARGO_PROFILE_RELEASE_STRIP=debuginfo $ export RUSTFLAGS="-Zlocation-detail=none" -$ cargo +nightly build --release --no-default-features --features run,disable-logging \ +$ cargo +nightly build -p wasmtime-c-api --release --no-default-features --features disable-logging \ -Z build-std=std,panic_abort --target aarch64-apple-darwin \ -Z build-std-features= -$ ls -l ./target/aarch64-apple-darwin/release/wasmtime --rwxr-xr-x@ 1 root root 2.1M Oct 18 09:39 target/aarch64-apple-darwin/release/wasmtime +$ ls -lh target/x86_64-unknown-linux-gnu/release/libwasmtime.so +-rwxrwxr-x 2 alex alex 784K Dec 12 07:53 target/x86_64-unknown-linux-gnu/release/libwasmtime.so +``` + +And finally, if you can, enable the `panic_immediate_abort` feature of the Rust +standard library to shrink panics even further. Note that this comes at a cost +of making bugs/panics very difficult to debug. 
+ +```shell +$ export CARGO_PROFILE_RELEASE_OPT_LEVEL=s +$ export CARGO_PROFILE_RELEASE_PANIC=abort +$ export CARGO_PROFILE_RELEASE_LTO=true +$ export CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1 +$ export CARGO_PROFILE_RELEASE_STRIP=debuginfo +$ export RUSTFLAGS="-Zlocation-detail=none" +$ cargo +nightly build -p wasmtime-c-api --release --no-default-features --features disable-logging \ + -Z build-std=std,panic_abort --target aarch64-apple-darwin \ + -Z build-std-features=panic_immediate_abort +$ ls -lh target/x86_64-unknown-linux-gnu/release/libwasmtime.so +-rwxrwxr-x 2 alex alex 698K Dec 12 07:54 target/x86_64-unknown-linux-gnu/release/libwasmtime.so ``` ## Minimizing further -Above shows an example of taking the default `cargo build` result of 130M down -to a 2.1M binary for the `wasmtime` executable. Similar steps can be done to -reduce the size of the C API binary artifact as well which currently produces a -~2.8M dynamic library. This is currently the smallest size with the source code -as-is, but there are more size reductions which haven't been implemented yet. +Above shows an example of taking the default `cargo build` result of 260M down +to a 700K binary for the `libwasmtime.so` binary of the C API. Similar steps +can be done to reduce the size of the `wasmtime` CLI executable as well. This is +currently the smallest size with the source code as-is, but there are more size +reductions which haven't been implemented yet. This is a listing of some example sources of binary size. Some sources of binary size may not apply to custom embeddings since, for example, your custom embedding might already not use WASI and might already not be included. -* WASI in the Wasmtime CLI - currently the CLI includes all of WASI. This - includes two separate implementations of WASI - one for preview2 and one for - preview1. This accounts for 1M+ of space which is a significant chunk of the - remaining 2.1M. 
While removing just preview2 or preview1 would be easy enough - with a Cargo feature, the resulting executable wouldn't be able to do - anything. Something like a [plugin feature for the - CLI](https://github.com/bytecodealliance/wasmtime/issues/7348), however, would - enable removing WASI while still being a usable executable. - -* Argument parsing in the Wasmtime CLI - as a command line executable `wasmtime` - contains parsing of command line arguments which currently uses the `clap` - crate. This contributes ~200k of binary size to the final executable which - would likely not be present in a custom embedding of Wasmtime. While this - can't be removed from Wasmtime it's something to consider when evaluating the - size of CI artifacts. - -* Cranelift in the C API - one of the features of Wasmtime is the ability to - have a runtime without Cranelift that only supports precompiled (AOT) wasm - modules. It's [not possible to build the C API without - Cranelift](https://github.com/bytecodealliance/wasmtime/issues/7349) though - because defining host functions requires Cranelift at this time to emit some - stubs. This means that the C API is significantly larger than a custom Rust - embedding which doesn't suffer from the same restriction. This means that - while it's still possible to build an embedding of Wasmtime which doesn't have - Cranelift it's not easy to see what it might look like size-wise from - looking at the C API artifacts. +* Unused functionality in the C API - building `libwasmtime.{a,so}` can show a + misleading file size because the linker is unable to remove unused code. For + example `libwasmtime.so` contains all code for the C API but your embedding + may not be using all of the symbols present so in practice the final linked + binary will often be much smaller than `libwasmtime.so`. Similarly + `libwasmtime.a` is forced to contain the entire C API so its size is likely + much larger than a linked application. 
For a minimal embedding it's + recommended to link against `libwasmtime.a` with `--gc-sections` as a linker + flag and evaluate the size of your own application. * Formatting strings in Wasmtime - Wasmtime makes extensive use of formatting strings for error messages and other purposes throughout the implementation. @@ -266,14 +265,32 @@ embedding might already not use WASI and might already not be included. size is accounted for by formatting string is unknown, but it's well known in Rust that `std::fmt` is not the slimmest of modules. -* Cranelift vs Winch - the "min" builds on CI try to exclude Cranelift from - their binary footprint (e.g. the CLI excludes it) but this comes at a cost of - the final executable not supporting compilation of wasm modules. If this is - required then no effort has yet been put into minimizing the code size of - Cranelift itself. One possible tradeoff that can be made though is to choose - between the Winch baseline compiler vs Cranelift. Winch should be much smaller - from a compiled footprint point of view while not sacrificing everything in - terms of performance. Note though that Winch is still under development. +* CLI: WASI implementation - currently the CLI includes all of WASI. This + includes two separate implementations of WASI - one for preview2 and one for + preview1. This accounts for 1M+ of space which is a significant chunk of the + remaining ~2M. While removing just preview2 or preview1 would be easy enough + with a Cargo feature, the resulting executable wouldn't be able to do + anything. Something like a [plugin feature for the + CLI](https://github.com/bytecodealliance/wasmtime/issues/7348), however, would + enable removing WASI while still being a usable executable. Note that the C + API's implementation of WASI can be disabled because custom host functionality + can be provided. 
+ +* CLI: Argument parsing - as a command line executable `wasmtime` contains + parsing of command line arguments which currently uses the `clap` crate. This + contributes ~200k of binary size to the final executable which would likely + not be present in a custom embedding of Wasmtime. While this can't be removed + from Wasmtime it's something to consider when evaluating the size of CI + artifacts. + +* Cranelift vs Winch - the "min" builds on CI exclude Cranelift from their + binary footprint but this comes at a cost of the final binary not + supporting compilation of wasm modules. If this is required then no effort + has yet been put into minimizing the code size of Cranelift itself. One + possible tradeoff that can be made though is to choose between the Winch + baseline compiler vs Cranelift. Winch should be much smaller from a compiled + footprint point of view while not sacrificing everything in terms of + performance. Note though that Winch is still under development. Above are some future avenues to take in terms of reducing the binary size of Wasmtime and various tradeoffs that can be made. The Wasmtime project is eager @@ -284,22 +301,68 @@ and we'd be happy to discuss more how best to handle a particular use case. # Building Wasmtime for a Custom Platform -If you're not running on a built-in supported platform such as Windows, macOS, -or Linux, then Wasmtime won't work out-of-the-box for you. Wasmtime includes a -compilation mode, however, that enables you to define how to work with the -platform externally. - -This mode is enabled when `--cfg wasmtime_custom_platform` is passed to rustc, -via `RUSTFLAGS` for example when building through Cargo, when an existing -platform is not matched. This means that with this configuration Wasmtime may be -compiled for custom or previously unknown targets. - -Wasmtime's current "platform embedding API" which is required to operate is -defined at `examples/min-platform/embedding/wasmtime-platform.h`. 
That directory -additionally has an example of building a minimal `*.so` on Linux which has the -platform API implemented in C using Linux syscalls. While a bit contrived it -effectively shows a minimal Wasmtime embedding which has no dependencies other -than the platform API. +Wasmtime supports a wide range of functionality by default on major operating +systems such as Windows, macOS, and Linux, but this functionality is not +necessarily present on all platforms (much less custom platforms). Most of +Wasmtime's features are gated behind either platform-specific configuration +flags or Cargo feature flags. The `wasmtime` crate for example documents +[important crate +features](https://docs.rs/wasmtime/latest/wasmtime/#crate-features) which you'll +likely want to disable for custom platforms. + +Not all of Wasmtime's features are supported on all platforms, but many are +enabled by default. For example the `parallel-compilation` crate feature +requires the host platform to have threads, or in other words the Rust `rayon` +crate must compile for your platform. If the `parallel-compilation` feature is +disabled, though, then `rayon` won't be compiled. For a custom platform, one of +the first things you'll want to do is to disable the default features of the +`wasmtime` crate (or C API). + +Some important features to be aware of for custom platforms are: + +* `runtime` - you likely want to enable this feature since this includes the + runtime to actually execute WebAssembly binaries. + +* `cranelift` and `winch` - you likely want to disable these features. This + primarily cuts down on binary size. Note that you'll need to use `*.cwasm` + artifacts so wasm files will need to be compiled outside of the target + platform and transferred to it. + +* `signals-based-traps` - without this feature Wasmtime won't rely on host OS + signals (e.g. segfaults) at runtime and will instead perform manual checks to + avoid signals. 
This increases portability at the cost of runtime performance. + For maximal portability leave this disabled. + +When compiling Wasmtime for an unknown platform, for example "not Windows" or +"not Unix", then Wasmtime will need some symbols to be provided by the embedder +to operate correctly. The header file at +[`examples/min-platform/embedding/wasmtime-platform.h`][header] describes the +symbols that the Wasmtime runtime requires to work which your platform will need +to provide. Some important notes about this are: + +* `wasmtime_{setjmp,longjmp}` are required for trap handling at this time. These + are thin wrappers around the standard `setjmp` and `longjmp` symbols you'll + need to provide. An example implementation [looks like this][jumps]. In the + future this dependency is likely going to go away as trap handling and + unwinding is migrated to compiled code (e.g. Cranelift) itself. + +* `wasmtime_tls_{get,set}` are required for the runtime to operate. Effectively + a single pointer of TLS storage is necessary. Whether or not this is actually + stored in TLS is up to the embedder, for example [storage in `static` + memory][tls] is ok if the embedder knows it won't be using threads. + +* `WASMTIME_SIGNALS_BASED_TRAPS` - if this `#define` is given (e.g. the + `signals-based-traps` feature was enabled at compile time), then your platform + must have the concept of virtual memory and support `mmap`-like APIs and + signal handling. Many APIs in [this header][header] are disabled if + `WASMTIME_SIGNALS_BASED_TRAPS` is turned off which is why it's more portable, + but if you enable this feature all of these APIs must be implemented. + +You can find an example [in the `wasmtime` repository][example] of building a +minimal embedding. Note that for Rust code you'll be using `#![no_std]` and +you'll need to provide a memory allocator and a panic handler as well. 
The +memory allocator will likely get hooked up to your platform's memory allocator +and the panic handler mostly just needs to abort. Building Wasmtime for a custom platform is not a turnkey process right now, there are a number of points that need to be considered: @@ -308,15 +371,9 @@ there are a number of points that need to be considered: target](https://docs.rust-embedded.org/embedonomicon/custom-target.html). This means that Nightly Rust will be required. -* Wasmtime and its dependencies require the Rust standard library `std` to be - available. The Rust standard library can be compiled for any target with - unsupported functionality being stubbed out. This mode of compiling the Rust - standard library is not stable, however. Currently this is done through the - `-Zbuild-std` argument to Cargo along with a - `+RUSTC_BOOTSTRAP_SYNTHETIC_TARGET=1` environment variable. - -* Wasmtime additionally depends on the availability of a memory allocator (e.g. - `malloc`). Wasmtime assumes that failed memory allocation aborts the process. +* Wasmtime depends on the availability of a memory allocator (e.g. `malloc`). + Wasmtime assumes that failed memory allocation aborts execution (except for + the case of allocating linear memories and growing them). * Not all features for Wasmtime can be built for custom targets. For example WASI support does not work on custom targets. When building Wasmtime you'll @@ -326,3 +383,8 @@ there are a number of points that need to be considered: The `examples/min-platform` directory has an example of building this minimal embedding and some necessary steps. Combined with the above features about producing a minimal build currently produces a 400K library on Linux. 
+ +[header]: https://github.com/bytecodealliance/wasmtime/blob/main/examples/min-platform/embedding/wasmtime-platform.h +[jumps]: https://github.com/bytecodealliance/wasmtime/blob/e1307216f2aa74fd60c621c8fa326ba80e2a2f75/examples/min-platform/embedding/wasmtime-platform.c#L60-L72 +[tls]: https://github.com/bytecodealliance/wasmtime/blob/e1307216f2aa74fd60c621c8fa326ba80e2a2f75/examples/min-platform/embedding/wasmtime-platform.c#L144-L150 +[example]: https://github.com/bytecodealliance/wasmtime/blob/main/examples/min-platform/README.md
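
Reviewer note: the stable-toolchain steps in the updated "Building a minimal CLI" section repeat the same `export` lines in each block, so it may help to see them gathered in one place. The following is a minimal sketch, not part of the patch. The function names `set_min_size_profile` and `build_min_c_api` are hypothetical helpers introduced only for illustration, while the environment variables and the `cargo build` invocation are copied from the doc's own examples. Nightly-only flags (`-Zlocation-detail=none`, `-Z build-std`) are deliberately left out.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, values come from the doc above.

# Export the release-profile overrides used by the stable-toolchain steps.
set_min_size_profile() {
  export CARGO_PROFILE_RELEASE_OPT_LEVEL=s      # optimize for size
  export CARGO_PROFILE_RELEASE_PANIC=abort      # abort on panic instead of unwinding
  export CARGO_PROFILE_RELEASE_LTO=true         # enable link-time optimization
  export CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1  # single codegen unit
  export CARGO_PROFILE_RELEASE_STRIP=debuginfo  # strip debug info from the artifact
}

# Build the C API with default features off and logging disabled, then
# print the size of the resulting shared library (Linux file name assumed).
build_min_c_api() {
  set_min_size_profile
  cargo build -p wasmtime-c-api --release \
    --no-default-features --features disable-logging &&
    ls -lh ./target/release/libwasmtime.so
}
```

Sourcing the file and running `build_min_c_api` from a Wasmtime checkout should correspond to the `CARGO_PROFILE_RELEASE_STRIP=debuginfo` step in the doc; the exact size will vary by host and toolchain.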