diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml new file mode 100644 index 0000000..a6efca5 --- /dev/null +++ b/.github/workflows/main.yml @@ -0,0 +1,16 @@ +name: CI + +on: + push: + pull_request: + +jobs: + canonical_abi: + name: Run Canonical ABI Tests + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v3 + - uses: actions/setup-python@v3 + with: + python-version: '>= 3.10.0' + - run: python design/mvp/canonical-abi/run_tests.py diff --git a/README.md b/README.md new file mode 100644 index 0000000..c41092e --- /dev/null +++ b/README.md @@ -0,0 +1,38 @@ +# Component Model design and specification + +This repository is where the component model is being standardized. For a more user-focussed explanation, take a look at the **[Component Model Documentation]**. + +This repository describes the high-level [goals], [use cases], [design choices] +and [FAQ] of the component model as well as a more-detailed [assembly-level explainer], [IDL], +[binary format] and [ABI] covering the initial Minimum Viable Product (MVP) +release. + +In the future, this repository will additionally contain a [formal spec], +reference interpreter and test suite. + +## Milestones + +The Component Model is currently being incrementally developed and stabilized +as part of [WASI Preview 2]. The subsequent "Preview 3" milestone will be +primarily concerned with the addition of [async support]. + +## Contributing + +All Component Model work is done as part of the [W3C WebAssembly Community Group]. +To contribute to any of these repositories, see the Community Group's +[Contributing Guidelines]. 
+ +[Component Model Documentation]: https://component-model.bytecodealliance.org/ +[goals]: design/high-level/Goals.md +[use cases]: design/high-level/UseCases.md +[design choices]: design/high-level/Choices.md +[FAQ]: design/high-level/FAQ.md +[assembly-level explainer]: design/mvp/Explainer.md +[IDL]: design/mvp/WIT.md +[binary format]: design/mvp/Binary.md +[ABI]: design/mvp/CanonicalABI.md +[formal spec]: spec/ +[W3C WebAssembly Community Group]: https://www.w3.org/community/webassembly/ +[Contributing Guidelines]: https://webassembly.org/community/contributing/ +[WASI Preview 2]: https://github.com/WebAssembly/WASI/tree/main/preview2 +[Async Support]: https://docs.google.com/presentation/d/1MNVOZ8hdofO3tI0szg_i-Yoy0N2QPU2C--LzVuoGSlE/edit?usp=share_link diff --git a/design/LICENSE b/design/LICENSE new file mode 100644 index 0000000..8f71f43 --- /dev/null +++ b/design/LICENSE @@ -0,0 +1,202 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. 
+ + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. 
For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. 
If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. 
You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. 
Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "{}" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. 
+ + Copyright {yyyy} {name of copyright owner} + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + diff --git a/design/README.md b/design/README.md new file mode 100644 index 0000000..b9032ba --- /dev/null +++ b/design/README.md @@ -0,0 +1 @@ +See the [parent README](../README.md). diff --git a/design/high-level/Choices.md b/design/high-level/Choices.md new file mode 100644 index 0000000..ff48a3b --- /dev/null +++ b/design/high-level/Choices.md @@ -0,0 +1,34 @@ +# Component Model High-Level Design Choices + +Based on the [goals](Goals.md) and [use cases](UseCases.md), the component +model makes several high-level design choices that permeate the rest of the +component model. + +1. The component model adopts a shared-nothing architecture in which component + instances fully encapsulate their linear memories, tables, globals and, in + the future, GC memory. Component interfaces contain only immutable copied + values, opaque typed handles and immutable uninstantiated modules/components. + While handles and imports can be used as an indirect form of sharing, the + [dependency use cases](UseCases.md#component-dependencies) enable this degree + of sharing to be finely controlled. + +2. The component model introduces no global singletons, namespaces, registries, + locator services or frameworks through which components are configured or + linked. 
Instead, all related use cases are addressed through explicit + parametrization of components via imports (of data, functions, and types) + with every client of a component having the option to independently + instantiate the component with its own chosen import values. + +3. The component model assumes no global inter-component garbage or cycle + collector that is able to trace through cross-component cycles. Instead + resources have lifetimes and require explicit acyclic ownership through + handles. The explicit lifetimes allow resources to have destructors that are + called deterministically and can be used to release linear memory + allocations in non-garbage-collected languages. + +4. The component model assumes that Just-In-Time compilation is not available + at runtime and thus only provides declarative linking features that admit + Ahead-of-Time compilation, optimization and analysis. While component instances + can be created at runtime, the components being instantiated as well as their + dependencies and clients are known before execution begins. + (See also [this slide](https://docs.google.com/presentation/d/1PSC3Q5oFsJEaYyV5lNJvVgh-SNxhySWUqZ6puyojMi8/edit#slide=id.gceaf867ebf_0_10).) diff --git a/design/high-level/FAQ.md b/design/high-level/FAQ.md new file mode 100644 index 0000000..f403f8e --- /dev/null +++ b/design/high-level/FAQ.md @@ -0,0 +1,26 @@ +# FAQ + +### How does WASI relate to the Component Model? + +[WASI] is layered on top of the Component Model, with the Component Model +providing the foundational building blocks used to define WASI's interfaces, +including: +* the grammar of types that can be used in WASI interfaces; +* the linking functionality that WASI can assume is used to compose separate + modules of code, isolate their capabilities and virtualize WASI interfaces; +* the core wasm ABI that core wasm toolchains can compile against when targeting WASI. 
+ +By way of comparison to traditional Operating Systems, the Component Model +fills the role of an OS's process model (defining how processes start up and +communicate with each other) while WASI fills the role of an OS's many I/O +interfaces. + +Use of WASI does not force the client to target the Component Model, however. +Any core wasm producer can simply target the core wasm ABI defined by the +Component Model for a given WASI interface's signature. This approach reopens +many questions that are answered by the Component Model, particularly when more +than one wasm module is involved, but for single-module scenarios or highly +custom scenarios, this might be appropriate. + + +[WASI]: https://github.com/WebAssembly/WASI/blob/main/README.md diff --git a/design/high-level/Goals.md b/design/high-level/Goals.md new file mode 100644 index 0000000..b345e59 --- /dev/null +++ b/design/high-level/Goals.md @@ -0,0 +1,55 @@ +# Component Model High-Level Goals + +(For comparison, see WebAssembly's [original High-Level Goals].) + +1. Define a portable, load- and run-time-efficient binary format for + separately-compiled components built from WebAssembly core modules that + enable portable, cross-language composition. +2. Support the definition of portable, virtualizable, statically-analyzable, + capability-safe, language-agnostic interfaces, especially those being + defined by [WASI]. +3. Maintain and enhance WebAssembly's unique value proposition: + * *Language neutrality*: avoid biasing the component model toward just one + language or family of languages. + * *Embeddability*: design components to be embedded in a diverse set of + host execution environments, including browsers, servers, intermediaries, + small devices and data-intensive systems. + * *Optimizability*: maximize the static information available to + Ahead-of-Time compilers to minimize the cost of instantiation and + startup. 
+ * *Formal semantics*: define the component model within the same semantic + framework as core wasm. + * *Web platform integration*: ensure components can be natively supported + in browsers by extending the existing WebAssembly integration points: the + [JS API], [Web API] and [ESM-integration]. Before native support is + implemented, ensure components can be polyfilled in browsers via + Ahead-of-Time compilation to currently-supported browser functionality. +4. Define the component model *incrementally*: starting from a set of + [initial use cases] and expanding the set of use cases over time, + prioritized by feedback and experience. + +## Non-goals + +1. Don't attempt to solve 100% of WebAssembly embedding scenarios. + * Some scenarios will require features in conflict with the above-mentioned goal. + * With the layered approach to specification, unsupported embedding + scenarios can be solved via alternative layered specifications or by + directly embedding the existing WebAssembly core specification. +2. Don't attempt to solve problems that are better solved by some combination + of the toolchain, the platform or higher layer specifications, including: + * package management and version control; + * deployment and live upgrade / dynamic reconfiguration; + * persistence and storage; and + * distributed computing and partial failure. +3. Don't specify a set of "component services". + * Specifying services that may be implemented by a host and exposed to + components is the domain of WASI and out of scope of the component model. + * See also the [WASI FAQ entry](FAQ.md#how-does-wasi-relate-to-the-component-model). 
+ + +[original High-Level Goals]: https://github.com/WebAssembly/design/blob/main/HighLevelGoals.md +[WASI]: https://github.com/WebAssembly/WASI/blob/main/README.md +[JS API]: https://webassembly.github.io/spec/js-api/index.html +[Web API]: https://webassembly.github.io/spec/web-api/index.html +[ESM-integration]: https://github.com/WebAssembly/esm-integration/tree/main/proposals/esm-integration +[initial use cases]: UseCases.md#Initial-MVP diff --git a/design/high-level/README.md b/design/high-level/README.md new file mode 100644 index 0000000..5abbd28 --- /dev/null +++ b/design/high-level/README.md @@ -0,0 +1,5 @@ +# Component Model High-Level Design Documents + +This directory contains design documents describing the component model's +[goals](Goals.md), [use cases](UseCases.md), [design choices](Choices.md) +and [FAQ](FAQ.md). diff --git a/design/high-level/UseCases.md b/design/high-level/UseCases.md new file mode 100644 index 0000000..4c9c321 --- /dev/null +++ b/design/high-level/UseCases.md @@ -0,0 +1,340 @@ +# Component Model Use Cases + +## Initial (MVP) + +This section describes a collection of use cases that characterize active and +developing embeddings of wasm and the limitations of the core wasm +specification that they run into outside of a browser context. The use cases +have a high degree of overlap in their required features and help to define the +scope of an "MVP" (Minimum Viable Product) for the Component Model. + +### Hosts embedding components + +One way that components are to be used is by being directly instantiated and +executed by a host (an application, system or service embedding a wasm +runtime), using the component model to provide a common format and toolchain so +that each distinct host doesn't have to define its own custom conventions and +sets of tools for solving the same problems. 
+ +#### Value propositions to hosts for embedding components + +First, it's useful to enumerate some use cases for why the host wants to run +wasm in the first place (instead of using an alternative virtualization or +sandboxing technology): + +1. A native language runtime (like node.js or CPython) uses components as a + portable, sandboxed alternative to the runtime's native plugins, avoiding the + portability and security problems of native plugins. +2. A serverless platform wishing to move code closer to data or clients uses + wasm components in place of a fixed scripting language, leveraging wasm's + strong sandboxing and language neutrality. +3. A serverless platform wishing to spin up fresh execution contexts at high + volume with low latency uses wasm components due to their low overhead and fast + instantiation. +4. A system or service adds support for efficient, multi-language "scripting" + with only a modest amount of engineering effort by embedding an existing + component runtime, reusing existing WASI standards support where applicable. +5. A large application decouples the updating of modular pieces of the + application from the updating of the natively-installed base application, + by distributing and running the modular pieces as wasm components. +6. A monolithic application sandboxes an unsafe library by compiling it into a + wasm component and then AOT-compiling the wasm component into native code + linked into the monolithic application (e.g., [RLBox]). +7. A large application practices [Principle of Least Authority] and/or + [Modular Programming] by decomposing the application into wasm components, + leveraging the lightweight sandboxing model of wasm to avoid the overhead of + traditional process-based decomposition. + +#### Invoking component exports from the host + +Once a host chooses to embed wasm (for one of the preceding reasons), the first +design choice is how the host executes the wasm code. 
The core wasm [start function] +is sometimes used for this purpose, but its lack of parameters or results +rules out several use cases listed below, which suggest the use of exported +wasm functions with typed signatures instead. However, there are a number of +use cases that go beyond the ability of core wasm: + +1. A JS developer `import`s a component (via [ESM-integration]) and calls the + component's exports as JS functions, passing high-level JS values like strings, + objects and arrays which are automatically coerced according to the high-level, + typed interface of the invoked component. +2. A generic wasm runtime CLI allows the user to invoke the exports of a + component directly from the command line, automatically parsing argv and env + vars according to the high-level, typed interface of the invoked component. +3. A generic wasm runtime HTTP server maps HTTP endpoints onto the exports of a + component, automatically parsing request params, headers and body and + generating response headers and body according to the high-level, typed + interface of the invoked component. +4. A host implements a wasm execution platform by invoking wasm component + exports in response to domain-specific events (e.g., on new request, on new + chunk of data available for processing, on trigger firing) through a fixed + interface that is either standardized (e.g., via WASI) or specific to the host. + +The first three use cases demonstrate a more general use case of generically +reflecting typed component exports in terms of host-native concepts. + +#### Exposing host functionality to components as imports + +Once wasm has been invoked by the host, the next design choice is how to expose +the host's native functionality and resources to the wasm code while it executes. +Imports are the natural choice and are already used for this purpose, but there are +a number of use cases that go beyond what can be expressed with core wasm +imports: + +1.
A host defines imports in terms of explicit high-level value types (e.g., + numbers, strings, lists, records and variants) that can be automatically + bound to the calling component's source-language values. +2. A host returns non-value, non-copied resources (like files, storage + connections and requests/responses) to components via unforgeable handles + (analogous to Unix file descriptors). +3. A host exposes non-blocking and/or streaming I/O to components through + language-neutral interfaces that can be bound to different components' + source languages' concurrency features (such as promises, futures, + async/await and coroutines). +4. A host passes configuration (e.g., values from config files and secrets) to + a component through imports of typed high-level values and handles. +5. A component declares that a particular import is "optional", allowing that + component to execute on hosts with or without the imported functionality. +6. A developer instantiates a component with native host imports in production + and with mock or emulated imports in local development and testing. + +#### Host-determined component lifecycles and associativity + +Another design choice when a host embeds wasm is when to create new instances, +when to route events to existing instances, when to destroy existing +instances and how multiple live instances interact with +each other, if at all. Some use cases include: + +1. A host creates many ephemeral, concurrent component instances, each of which + is tied to a particular host-domain-specific entity's lifecycle (e.g., a + request-response pair, connection, session, job, client or tenant), with a + component instance being destroyed when the associated entity's + domain-specified lifecycle completes. +2.
A host delivers fine-grained events, for which component instantiation would + have too much overhead if performed per-event or for which retained mutable + state is desired, by making multiple export calls on the same component + instance over time. Export calls can be asynchronous, allowing multiple + fine-grained events to be processed concurrently. For example, multiple + packets could be delivered as multiple export calls to the component instance + for a connection. +3. A host represents associations between longer- and shorter-lived + host-domain-specific entities (e.g., a "connection's session" or a "session's + user") by having the shorter-lived component instances (e.g., "connections") + import the exports of the longer-lived component instances (e.g., "sessions"). + +### Component composition + +The other way components are to be used (other than via direct execution by the +host) is by other components, through component composition. + +#### Value propositions to developers for composing components + +Enumerating some of the reasons why we might want to compose components in the +first place (instead of simply using the module/package mechanisms built into +the programming language): + +1. A component developer reuses code already written in another language + instead of having to reimplement the functionality from scratch. +2. A component developer writing code in a high-level scripting language (e.g., + JS or Python) reuses high-performance code written in a lower-level language + (e.g., C++ or Rust). +3. A component developer mitigates the impact of supply-chain attacks by + putting their dependencies into several components and controlling the + capabilities delegated to each, taking advantage of the strong sandboxing model + of components. +4. A component runtime implements built-in host functionality as wasm + components to reduce the [Trusted Computing Base]. +5. 
An application developer applies the Unix philosophy without incurring the + full cost and OS-dependency of splitting their program into multiple processes + by instead having each component do one thing well and using the component + model to compose their program as a hierarchy of components. +6. An application developer composes multiple independently-developed + components that import and export the same interface (e.g., an HTTP + request-handling interface) by linking them together, exports-to-imports, being + able to create recursive, branching DAGs of linked components not otherwise + expressible with classic Unix-style pipelines. + +In all the above use cases, the developer has an additional goal of keeping the +component reuse as a private, fully-encapsulated implementation detail that +their client doesn't need to be aware of (either directly in code, or +indirectly in the developer workflow). + +#### Composition primitives + +Core wasm already provides the fundamental composition primitives of: imports, +exports and functions, allowing a module to export a function that is imported +by another module. Building from this starting point, there are a number of +use cases that require additional features: + +1. Developers importing or exporting functions use high-level value types in + their function signatures that include strings, lists, records, variants and + arbitrarily-nested combinations of these. Both developers (the caller and + callee) get to use the idiomatic values of their respective languages. + Values are passed by copy so that there is no shared mutation, ownership or + management of these values before or after the call that either developer + needs to worry about. +2. Developers importing or exporting functions use opaque typed handles in + their function signatures to pass resources that cannot or should not be copied + at the callsite.
Both developers (the caller and callee) use their respective + languages' abstract data type support for interacting with resources. Handles + can encapsulate `i32` pointers to linear memory allocations that need to be + safely freed when the last handle goes away. +3. Developers import or export functions with signatures containing + concurrency-oriented types (e.g., future and stream) to address + concurrency use cases like non-blocking I/O, early return and streaming. Both + developers (the caller and callee) are able to use their respective languages' + native concurrency support, if it exists, using the concurrency-oriented types + to establish a deterministic communication protocol that defines how the + cross-language composition behaves. +4. A component developer makes a minor [semver] update which changes the + component's type in a logically backwards-compatible manner (e.g., adding a new + case to a variant parameter type). The component model ensures that the new + component stays valid (at link-time and run-time) for use by existing clients + compiled against the older signature. +5. A component developer uses their language, toolchain and memory + representation of choice (including, in the future, [GC memory]), with these + implementation choices fully encapsulated by the component and thus hidden from + the client. The component developer can switch languages, toolchains or memory + representations in the future without breaking existing clients. + +The above use cases roughly correspond to the use cases of an [RPC] framework, +which have similar goals of crossing language boundaries. The major difference +is the dropping of the distributed computing goals (see [non-goals](Goals.md#non-goals)) +and the additional performance goals mentioned [below](#performance). 
+ +#### Component dependencies + +When a client component imports another component as a dependency, there are a +number of use cases for how the dependency's instance is configured and shared +or not shared with other clients of the same dependency. These use cases +require a greater degree of programmer control than allowed by most languages' +native module systems and most native code linking systems while not requiring +fully dynamic linking (e.g., as provided by the [JS API]). + +1. A component developer exposes their component's configuration to clients as + imports that are supplied when the component is instantiated by the client. +2. A component developer configures a dependency independently of any other + clients of the same dependency by creating a fresh private instance of the + dependency and supplying the desired configuration values at instantiation. +3. A component developer imports a dependency as an already-created instance, + giving the component's clients the responsibility to configure the + dependency and the freedom to share it with others. +4. A component developer creates a fresh private instance of a dependency to + isolate the dependency's mutable instance state in order to minimize the + damage that can be caused in the event of a supply chain attack or + exploitable bug in the dependency. +5. A component developer imports an already-created instance of a dependency, + allowing the dependency to use mutable instance state to deduplicate data or + cache common results, optimizing overall app performance. +6. A component developer imports a WASI interface and does not explicitly pass + the WASI interface to a privately-created dependency. The developer knows, + without manually auditing the code of the dependency, that the dependency + cannot access the WASI interface. +7. A component developer creates a private dependency instance, supplying it a + virtualized implementation of a WASI interface. 
The developer knows, without + manually auditing the code of the dependency, that the dependency exclusively + uses the virtualized implementation. +8. A component developer creates a fresh private instance of a dependency, + supplying the component's own functions as imports to the dependency. The + component does this to parameterize the dependency's behavior with the + component's own logic or implementation choices (achieving the goals usually + accomplished using callback registration or [dependency injection]). + +### Performance + +In pursuit of the above functional use cases, it's important that the component +model not sacrifice the performance properties that motivate the use of wasm in +the first place. Thus, the new features mentioned above should be consistent +with the predictable performance model established by core wasm by supporting +the following use cases: + +1. A component runtime implements cross-component calls with efficient, direct + control flow transfer without thread context switching or synchronization. +2. A component runtime implements component instances without needing to give + each instance its own event loop, green thread or message queue. +3. A component runtime or optimizing AOT compiler compiles all import and + export names into indices or more direct forms of reference (up to and + including direct inlining of cross-component definitions into uses). +4. A component runtime implements value passing between component instances + without ever creating an intermediate O(n) copy of aggregate data types, + outside of either component instance's explicitly-allocated linear memory. +5. A component runtime shares the compiled machine code of a component across + many instances of that component. +6. A component is composed of several core wasm modules that operate on a + single shared linear memory, some of which contain language runtime code + that is shared by all components produced from the same language toolchain. 
+ A component runtime shares the compiled machine code of the shared language + runtime module. +7. A component runtime implements the component model and achieves expected + performance without using any runtime code generation or Just-in-Time + compilation. + +## Post-MVP + +The following is a list of use cases that make sense to support eventually, +but not necessarily in the initial release. + +### Runtime dynamic linking + +* A component lazily creates an instance of its dependency on the first call + to its exports. +* A component dynamically instantiates, calls, then destroys its dependency, + avoiding persistent resource usage by the dependency if the dependency is used + infrequently and/or preventing the dependency from accumulating state across + calls, which could create supply chain attack risk. +* A component creates a fresh internal instance every time one of its exports + is called, avoiding any residual state between export calls and aligning with + the usual assumptions of C programs with a `main()`. + +### Parallelism + +* A component creates a new (green) thread to execute an export call to a + dependency, achieving task parallelism while avoiding low-level data races due + to the absence of shared mutable state between the component and the + dependency. +* Two component instances connected via stream execute in separate (green) + threads, achieving pipeline parallelism while preserving determinism due to the + absence of shared mutable state. + +### Copy Minimization + +* A component produces or consumes the high-level abstract value types using + its own arbitrary linear memory representation or procedural interface (like + iterator or generator) without having to make an intermediate copy in linear + memory or copy unwanted elements. 
+* A component is given a "blob" resource representing an immutable array of + bytes living outside any linear memory that can be semantically copied into + linear memory in a way that, if supported by the host, can be implemented via + copy-on-write memory-mapping. +* A component creates a stream directly from a data segment, avoiding the cost + of first copying the data segment into linear memory and then streaming from + linear memory. + +### Component-level multi-threading + +In the absence of these features, a component can assume its exports are +called in a single-threaded manner (just like core wasm). If and when core wasm +gets a primitive [`fork`] instruction, a component may, as a private +implementation detail, have its internal `shared` memory accessed by multiple +component-internal threads. However, these `fork`ed threads would not be able +to call imports, which could break other components' single-threaded assumptions. + +* A component explicitly annotates a function export with [`shared`], + opting in to it being called simultaneously from multiple threads. +* A component explicitly annotates a function import with `shared`, requiring + the imported function to have been explicitly `shared` and thus callable from + any `fork`ed thread. 
+ +[RLBox]: https://rlbox.dev/ +[Principle of Least Authority]: https://en.wikipedia.org/wiki/Principle_of_least_privilege +[Modular Programming]: https://en.wikipedia.org/wiki/Modular_programming +[start function]: https://webassembly.github.io/spec/core/intro/overview.html#semantic-phases +[ESM-integration]: https://github.com/WebAssembly/esm-integration/tree/main/proposals/esm-integration +[Trusted Computing Base]: https://en.wikipedia.org/wiki/Trusted_computing_base +[semver]: https://en.wikipedia.org/wiki/Software_versioning +[RPC]: https://en.wikipedia.org/wiki/Remote_procedure_call +[GC memory]: https://github.com/WebAssembly/gc/blob/master/proposals/gc/Overview.md +[JS API]: https://webassembly.github.io/spec/js-api/index.html +[dependency injection]: https://en.wikipedia.org/wiki/Dependency_injection +[`fork`]: https://dl.acm.org/doi/pdf/10.1145/3360559 +[`shared`]: https://dl.acm.org/doi/pdf/10.1145/3360559 diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md new file mode 100644 index 0000000..8561c07 --- /dev/null +++ b/design/mvp/Binary.md @@ -0,0 +1,387 @@ +# Component Model Binary Format Explainer + +This document defines the binary format for the AST defined in the +[explainer](Explainer.md). The top-level production is `component` and the +convention is that a file suffixed in `.wasm` may contain either a +[`core:module`] *or* a `component`, using the `layer` field to discriminate +between the two in the first 8 bytes (see [below](#component-definitions) for +more details). + +Note: this document is not meant to completely define the decoding or validation +rules, but rather merge the minimal need-to-know elements of both, with just +enough detail to create a prototype. A complete definition of the binary format +and validation will be present in the [formal specification](../../spec/). + +See the [explainer introduction](Explainer.md) for an explanation of 🪙. 
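As a rough illustration of the `layer`-based discrimination described above, the following Python sketch classifies a binary from its first 8 bytes. It is not part of the specification: the function name `classify_wasm` is hypothetical, and reading `version` and `layer` as two little-endian 16-bit integers is an assumption based on the byte sequences listed under Component Definitions (`0x0a 0x00` and `0x01 0x00`) and on the note that a core module's 4-byte version field implies a `layer` of `0x0`.

```python
import struct

# The 4-byte magic shared by core modules and components: 0x00 0x61 0x73 0x6D.
MAGIC = b"\x00asm"


def classify_wasm(header: bytes) -> tuple[str, int]:
    """Return (kind, version) for the first 8 bytes of a .wasm binary.

    Hypothetical helper: `kind` is "core module" when the layer is 0 and
    "component" when the layer is 1.
    """
    if len(header) < 8:
        raise ValueError("need at least 8 bytes to read the preamble")
    if header[:4] != MAGIC:
        raise ValueError("not a WebAssembly binary: bad magic")
    # Assumption: version and layer are each a little-endian unsigned 16-bit value.
    version, layer = struct.unpack("<HH", header[4:8])
    if layer == 0:
        return ("core module", version)
    if layer == 1:
        return ("component", version)
    raise ValueError(f"unknown layer {layer}")
```

For example, `classify_wasm(b"\x00asm\x0a\x00\x01\x00")` reports a component with the pre-standard version, while the familiar core preamble `b"\x00asm\x01\x00\x00\x00"` is reported as a core module. A real decoder would go on to validate the version and parse sections; this sketch stops at the preamble.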
+ + +## Component Definitions + +(See [Component Definitions](Explainer.md#component-definitions) in the explainer.) 
+```ebnf +component ::= <preamble> s*:<section>* => (component flatten(s*)) +preamble ::= <magic> <version> <layer> +magic ::= 0x00 0x61 0x73 0x6D +version ::= 0x0a 0x00 +layer ::= 0x01 0x00 +section ::= section_0(<core:custom>) => ϵ + | m:section_1(<core:module>) => [core-prefix(m)] + | i*:section_2(vec(<core:instance>)) => core-prefix(i)* + | t*:section_3(vec(<core:type>)) => core-prefix(t)* + | c: section_4(<component>) => [c] + | i*:section_5(vec(<instance>)) => i* + | a*:section_6(vec(<alias>)) => a* + | t*:section_7(vec(<type>)) => t* + | c*:section_8(vec(<canon>)) => c* + | s: section_9(<start>) => [s] + | i*:section_10(vec(<import>)) => i* + | e*:section_11(vec(<export>)) => e* +``` +Notes: +* Reused Core binary rules: [`core:section`], [`core:custom`], [`core:module`] +* The `core-prefix(t)` meta-function inserts a `core` token after the leftmost + paren of `t` (e.g., `core-prefix( (module (func)) )` is `(core module (func))`). +* The `version` given above is pre-standard. As the proposal changes before + final standardization, `version` will be bumped from `0xa` upwards to + coordinate prototypes. When the standard is finalized, `version` will be + changed one last time to `0x1`. (This mirrors the path taken for the Core + WebAssembly 1.0 spec.) +* The `layer` field is meant to distinguish modules from components early in + the binary format. (Core WebAssembly modules already implicitly have a + `layer` field of `0x0` in their 4 byte [`core:version`] field.) + + +## Instance Definitions + +(See [Instance Definitions](Explainer.md#instance-definitions) in the explainer.) 
+```ebnf +core:instance ::= ie: => (instance ie) +core:instanceexpr ::= 0x00 m: arg*:vec() => (instantiate m arg*) + | 0x01 e*:vec() => e* +core:instantiatearg ::= n: 0x12 i: => (with n (instance i)) +core:sortidx ::= sort: idx: => (sort idx) +core:sort ::= 0x00 => func + | 0x01 => table + | 0x02 => memory + | 0x03 => global + | 0x10 => type + | 0x11 => module + | 0x12 => instance +core:inlineexport ::= n: si: => (export n si) + +instance ::= ie: => (instance ie) +instanceexpr ::= 0x00 c: arg*:vec() => (instantiate c arg*) + | 0x01 e*:vec() => e* +instantiatearg ::= n: si: => (with n si) +string ::= s: => s +sortidx ::= sort: idx: => (sort idx) +sort ::= 0x00 cs: => core cs + | 0x01 => func + | 0x02 => value 🪙 + | 0x03 => type + | 0x04 => component + | 0x05 => instance +inlineexport ::= n: si: => (export n si) +``` +Notes: +* Reused Core binary rules: [`core:name`], (variable-length encoded) [`core:u32`] +* The `core:sort` values are chosen to match the discriminant opcodes of + [`core:importdesc`]. +* `type` is added to `core:sort` in anticipation of the [type-imports] proposal. Until that + proposal, core modules won't be able to actually import or export types, however, the + `type` sort is allowed as part of outer aliases (below). +* `module` and `instance` are added to `core:sort` in anticipation of the [module-linking] + proposal, which would add these types to Core WebAssembly. Until then, they are useful + for aliases (below). +* Validation of `core:instantiatearg` initially only allows the `instance` + sort, but would be extended to accept other sorts as core wasm is extended. +* Validation of `instantiate` requires each `` in `c` to match a + `string` in a `with` argument (compared as strings) and for the types to + match. 
+* When validating `instantiate`, after each individual type-import is supplied + via `with`, the actual type supplied is immediately substituted for all uses + of the import, so that subsequent imports and all exports are now specialized + to the actual type. +* The indices in `sortidx` are validated according to their `sort`'s index + spaces, which are built incrementally as each definition is validated. + + +## Alias Definitions + +(See [Alias Definitions](Explainer.md#alias-definitions) in the explainer.) +```ebnf +alias ::= s: t: => (alias t (s)) +aliastarget ::= 0x00 i: n: => export i n + | 0x01 i: n: => core export i n + | 0x02 ct: idx: => outer ct idx +``` +Notes: +* Reused Core binary rules: (variable-length encoded) [`core:u32`] +* For `export` aliases, `i` is validated to refer to an instance in the + instance index space that exports `n` with the specified `sort`. +* For `outer` aliases, `ct` is validated to be *less than or equal to* the number + of enclosing components and `idx` is validated to be a valid + index in the `sort` index space of the `ct`th enclosing component (counting + outward, starting with `0` referring to the current component). +* For `outer` aliases, validation restricts the `sort` to one + of `type`, `module` or `component` and additionally requires that the + outer-aliased type is not a `resource` type (which is generative). + + +## Type Definitions + +(See [Type Definitions](Explainer.md#type-definitions) in the explainer.) 
+```ebnf +core:type ::= dt: => (type dt) (GC proposal) +core:deftype ::= ft: => ft (WebAssembly 1.0) + | st: => st (GC proposal) + | at: => at (GC proposal) + | mt: => mt +core:moduletype ::= 0x50 md*:vec() => (module md*) +core:moduledecl ::= 0x00 i: => i + | 0x01 t: => t + | 0x02 a: => a + | 0x03 e: => e +core:alias ::= s: t: => (alias t (s)) +core:aliastarget ::= 0x01 ct: idx: => outer ct idx +core:importdecl ::= i: => i +core:exportdecl ::= n: d: => (export n d) +``` +Notes: +* Reused Core binary rules: [`core:import`], [`core:importdesc`], [`core:functype`] +* Validation of `core:moduledecl` rejects `core:moduletype` definitions + and `outer` aliases of `core:moduletype` definitions inside `type` + declarators. Thus, as an invariant, when validating a `core:moduletype`, the + core type index space will not contain any core module types. +* As described in the explainer, each module type is validated with an + initially-empty type index space. +* `alias` declarators currently only allow `outer` `type` aliases but + would add `export` aliases when core wasm adds type exports. +* Validation of `outer` aliases cannot see beyond the enclosing core type index + space. Since core modules and core module types cannot nest in the MVP, this + means that the maximum `ct` in an MVP `alias` declarator is `1`. + +```ebnf +type ::= dt: => (type dt) +deftype ::= dvt: => dvt + | ft: => ft + | ct: => ct + | it: => it +primvaltype ::= 0x7f => bool + | 0x7e => s8 + | 0x7d => u8 + | 0x7c => s16 + | 0x7b => u16 + | 0x7a => s32 + | 0x79 => u32 + | 0x78 => s64 + | 0x77 => u64 + | 0x76 => float32 + | 0x75 => float64 + | 0x74 => char + | 0x73 => string +defvaltype ::= pvt: => pvt + | 0x72 lt*:vec() => (record (field lt)*) (if |lt*| > 0) + | 0x71 case*:vec() => (variant case*) + | 0x70 t: => (list t) + | 0x6f t*:vec() => (tuple t+) (if |t*| > 0) + | 0x6e l*:vec() => (flags l+) (if |l*| > 0) + | 0x6d l*:vec() => (enum l*) + | 0x6b t: => (option t) + | 0x6a t?:? u?:? => (result t? 
(error u)?) + | 0x69 i: => (own i) + | 0x68 i: => (borrow i) +labelvaltype ::= l: t: => l t +case ::= l: t?:? 0x00 => (case l t?) +label' ::= len: l: