From f8524dfe8f6f9932230cd407f7f12fcf3f0bdac7 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 22 Sep 2021 14:30:47 -0500 Subject: [PATCH 001/301] Add initial drafts of high-level design docs --- README.md | 38 ++++ design/LICENSE | 202 ++++++++++++++++++++ design/README.md | 1 + design/high-level/Choices.md | 34 ++++ design/high-level/FAQ.md | 26 +++ design/high-level/Goals.md | 55 ++++++ design/high-level/UseCases.md | 342 ++++++++++++++++++++++++++++++++++ design/proposals/README.md | 5 + spec/README.md | 5 + 9 files changed, 708 insertions(+) create mode 100644 README.md create mode 100644 design/LICENSE create mode 100644 design/README.md create mode 100644 design/high-level/Choices.md create mode 100644 design/high-level/FAQ.md create mode 100644 design/high-level/Goals.md create mode 100644 design/high-level/UseCases.md create mode 100644 design/proposals/README.md create mode 100644 spec/README.md diff --git a/README.md b/README.md new file mode 100644 index 0000000..209025b --- /dev/null +++ b/README.md @@ -0,0 +1,38 @@ +# Component Model design and specification + +This repository contains documents describing the high-level [goals], +[use cases], [design choices] and [FAQ] of the component model. + +In the future, as proposals get merged, the repository will additionally +contain the spec, a reference interpreter, a test suite, and directories for +each proposal with the proposal's explainer and specific design documents. + +## Design Process & Contributing + +At this early stage, this repository only contains high-level design documents +and discussions about the Component Model in general. 
Detailed explainers, +specifications and discussions are broken into the following two repositories +which, together, will form the "MVP" of the Component Model: + +* The [module-linking] proposal will initialize the Component Model + specification, adding the ability for WebAssembly to import, nest, + instantiate and link multiple Core WebAssembly modules without host-specific + support. + +* The [interface-types] proposal will extend the Component Model specification + with a new set of high-level types for defining shared-nothing, + language-neutral "components". + +All Component Model work is done as part of the [W3C WebAssembly Community Group]. +To contribute to any of these repositories, see the Community Group's +[Contributing Guidelines]. + + +[goals]: design/high-level/Goals.md +[use cases]: design/high-level/UseCases.md +[design choices]: design/high-level/Choices.md +[FAQ]: design/high-level/FAQ.md +[module-linking]: https://github.com/webassembly/module-linking/ +[interface-types]: https://github.com/webassembly/interface-types/ +[W3C WebAssembly Community Group]: https://www.w3.org/community/webassembly/ +[Contributing Guidelines]: https://webassembly.org/community/contributing/ diff --git a/design/LICENSE b/design/LICENSE new file mode 100644 index 0000000..8f71f43 --- /dev/null +++ b/design/LICENSE @@ -0,0 +1,202 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. 
For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. 
For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. 
If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. 
You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. 
Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "{}" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. 
+ + Copyright {yyyy} {name of copyright owner} + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + diff --git a/design/README.md b/design/README.md new file mode 100644 index 0000000..b9032ba --- /dev/null +++ b/design/README.md @@ -0,0 +1 @@ +See the [parent README](../README.md). diff --git a/design/high-level/Choices.md b/design/high-level/Choices.md new file mode 100644 index 0000000..ff48a3b --- /dev/null +++ b/design/high-level/Choices.md @@ -0,0 +1,34 @@ +# Component Model High-Level Design Choices + +Based on the [goals](Goals.md) and [use cases](UseCases.md), the component +model makes several high-level design choices that permeate the rest of the +component model. + +1. The component model adopts a shared-nothing architecture in which component + instances fully encapsulate their linear memories, tables, globals and, in + the future, GC memory. Component interfaces contain only immutable copied + values, opaque typed handles and immutable uninstantiated modules/components. + While handles and imports can be used as an indirect form of sharing, the + [dependency use cases](UseCases.md#component-dependencies) enable this degree + of sharing to be finely controlled. + +2. The component model introduces no global singletons, namespaces, registries, + locator services or frameworks through which components are configured or + linked. 
Instead, all related use cases are addressed through explicit + parametrization of components via imports (of data, functions, and types) + with every client of a component having the option to independently + instantiate the component with its own chosen import values. + +3. The component model assumes no global inter-component garbage or cycle + collector that is able to trace through cross-component cycles. Instead + resources have lifetimes and require explicit acyclic ownership through + handles. The explicit lifetimes allow resources to have destructors that are + called deterministically and can be used to release linear memory + allocations in non-garbage-collected languages. + +4. The component model assumes that Just-In-Time compilation is not available + at runtime and thus only provides declarative linking features that admit + Ahead-of-Time compilation, optimization and analysis. While component instances + can be created at runtime, the components being instantiated as well as their + dependencies and clients are known before execution begins. + (See also [this slide](https://docs.google.com/presentation/d/1PSC3Q5oFsJEaYyV5lNJvVgh-SNxhySWUqZ6puyojMi8/edit#slide=id.gceaf867ebf_0_10).) diff --git a/design/high-level/FAQ.md b/design/high-level/FAQ.md new file mode 100644 index 0000000..f403f8e --- /dev/null +++ b/design/high-level/FAQ.md @@ -0,0 +1,26 @@ +# FAQ + +### How does WASI relate to the Component Model? + +[WASI] is layered on top of the Component Model, with the Component Model +providing the foundational building blocks used to define WASI's interfaces, +including: +* the grammar of types that can be used in WASI interfaces; +* the linking functionality that WASI can assume is used to compose separate + modules of code, isolate their capabilities and virtualize WASI interfaces; +* the core wasm ABI that core wasm toolchains can compile against when targeting WASI. 
+ +By way of comparison to traditional Operating Systems, the Component Model +fills the role of an OS's process model (defining how processes start up and +communicate with each other) while WASI fills the role of an OS's many I/O +interfaces. + +Use of WASI does not force the client to target the Component Model, however. +Any core wasm producer can simply target the core wasm ABI defined by the +Component Model for a given WASI interface's signature. This approach reopens +many questions that are answered by the Component Model, particularly when more +than one wasm module is involved, but for single-module scenarios or highly +custom scenarios, this might be appropriate. + + +[WASI]: https://github.com/WebAssembly/WASI/blob/main/README.md diff --git a/design/high-level/Goals.md b/design/high-level/Goals.md new file mode 100644 index 0000000..200c3c2 --- /dev/null +++ b/design/high-level/Goals.md @@ -0,0 +1,55 @@ +# Component Model High-Level Goals + +(For comparison, see WebAssembly's [original High-Level Goals].) + +1. Define a portable, load- and run-time-efficient binary format for + separately-compiled components built from WebAssembly core modules that + enable portable, cross-language composition. +2. Support the definition of portable, virtualizable, statically-analyzable, + capability-safe, language-agnostic interfaces, especially those being + defined by [WASI]. +3. Maintain and enhance WebAssembly's unique value proposition: + * *Language neutrality*: avoid biasing the component model toward just one + language or family of languages. + * *Embeddability*: design components to be embedded in a diverse set of + host execution environments, including browsers, servers, intermediaries, + small devices and data-intensive systems. + * *Optimizability*: maximize the static information available to + Ahead-of-Time compilers to minimize the cost of instantiation and + startup. 
+ * *Formal semantics*: define the component model within the same semantic + framework as core wasm. + * *Web platform integration*: ensure components can be natively supported + in browsers by extending the existing WebAssembly integration points: the + [JS API], [Web API] and [ESM-integration]. Before native support is + implemented, ensure components can be polyfilled in browsers via + Ahead-of-Time compilation to currently-supported browser functionality. +4. Define the component model *incrementally*: starting from a set of + [initial use cases] and expanding the set of use cases over time, + prioritized by feedback and experience. + +## Non-goals + +1. Don't attempt to solve 100% of WebAssembly embedding scenarios. + * Some scenarios will require features in conflict with the above-mentioned goals. + * With the layered approach to specification, unsupported embedding + scenarios can be solved via alternative layered specifications or by + directly embedding the existing WebAssembly core specification. +2. Don't attempt to solve problems that are better solved by some combination + of the toolchain, the platform or higher-layer specifications, including: + * package management and version control; + * deployment and live upgrade / dynamic reconfiguration; + * persistence and storage; and + * distributed computing and partial failure. +3. Don't specify a set of "component services". + * Specifying services that may be implemented by a host and exposed to + components is the domain of WASI and out of scope of the component model. + * See also the [WASI FAQ entry](FAQ.md#how-does-wasi-relate-to-the-component-model).
+ + +[original High-Level Goals]: https://github.com/WebAssembly/design/blob/main/HighLevelGoals.md +[WASI]: https://github.com/WebAssembly/WASI/blob/main/README.md +[JS API]: https://webassembly.github.io/spec/js-api/index.html +[Web API]: https://webassembly.github.io/spec/web-api/index.html +[ESM-integration]: https://github.com/WebAssembly/esm-integration/tree/main/proposals/esm-integration +[initial use cases]: UseCases.md#Initial-MVP diff --git a/design/high-level/UseCases.md b/design/high-level/UseCases.md new file mode 100644 index 0000000..608e6d0 --- /dev/null +++ b/design/high-level/UseCases.md @@ -0,0 +1,342 @@ +# Component Model Use Cases + +## Initial (MVP) + +This section describes a collection of use cases that characterize active and +developing embeddings of wasm and the limitations of the core wasm +specification that they run into outside of a browser context. The use cases +have a high degree of overlap in their required features and help to define the +scope of an "MVP" (Minimum Viable Product) for the Component Model. + +### Hosts embedding components + +One way that components are to be used is by being directly instantiated and +executed by a host (an application, system or service embedding a wasm +runtime), using the component model to provide a common format and toolchain so +that each distinct host doesn't have to define its own custom conventions and +sets of tools for solving the same problems. + +#### Value propositions to hosts for embedding components + +First, it's useful to enumerate some use cases for why the host wants to run +wasm in the first place (instead of using an alternative virtualization or +sandboxing technology): + +1. A native language runtime (like node.js or CPython) uses components as a + portable, sandboxed alternative to the runtime's native plugins, avoiding the + portability and security problems of native plugins. +2. 
A serverless platform wishing to move code closer to data or clients uses + wasm components in place of a fixed scripting language, leveraging wasm's + strong sandboxing and language neutrality. +3. A serverless platform wishing to spin up fresh execution contexts at high + volume with low latency uses wasm components due to their low overhead and fast + instantiation. +4. A system or service adds support for efficient, multi-language "scripting" + with only a modest amount of engineering effort by embedding an existing + component runtime, reusing existing WASI standards support where applicable. +5. A large application decouples the updating of modular pieces of the + application from the updating of the natively-installed base application, + by distributing and running the modular pieces as wasm components. +6. A monolithic application sandboxes an unsafe library by compiling it into a + wasm component and then AOT-compiling the wasm component into native code + linked into the monolithic application (e.g., [RLBox]). +7. A large application practices [Principle of Least Authority] and/or + [Modular Programming] by decomposing the application into wasm components, + leveraging the lightweight sandboxing model of wasm to avoid the overhead of + traditional process-based decomposition. + +#### Invoking component exports from the host + +Once a host chooses to embed wasm (for one of the preceding reasons), the first +design choice is how the host executes the wasm code. The core wasm [start function] +is sometimes used for this purpose; however, its lack of parameters or results +misses several use cases listed below, which suggest the use of exported +wasm functions with typed signatures instead. Even so, there are a number of +use cases that go beyond the ability of core wasm: + +1.
A JS developer `import`s a component (via [ESM-integration]) and calls the + component's exports as JS functions, passing high-level JS values like strings, + objects and arrays which are automatically coerced according to the high-level, + typed interface of the invoked component. +2. A generic wasm runtime CLI allows the user to invoke the exports of a + component directly from the command-line, automatically parsing argv and env + vars according to the high-level, typed interface of the invoked component. +3. A generic wasm runtime HTTP server maps HTTP endpoints onto the exports of a + component, automatically parsing request params, headers and body and + generating response headers and body according to the high-level, typed + interface of the invoked component. +4. A host implements a wasm execution platform by invoking wasm component + exports in response to domain-specific events (e.g., on new request, on new + chunk of data available for processing, on trigger firing) through a fixed + interface that is either standardized (e.g., via WASI) or specific to the host. + +The first three use cases demonstrate a more general use case of generically +reflecting typed component exports in terms of host-native concepts. + +#### Exposing host functionality to components as imports + +Once wasm has been invoked by the host, the next design choice is how to expose +the host's native functionality and resources to the wasm code while it executes. +Imports are the natural choice and already used for this purpose, but there are +a number of use cases that go beyond what can be expressed with core wasm +imports: + +1. A host defines imports in terms of explicit high-level value types (e.g., + numbers, strings, lists, records and variants) that can be automatically + bound to the calling component's source-language values. +2. 
A host returns non-value, non-copied resources (like files, storage + connections and requests/responses) to components via unforgeable handles + (analogous to Unix file descriptors). +3. A host exposes non-blocking and/or streaming I/O to components through + language-neutral interfaces that can be bound to different components' + source languages' concurrency features (such as promises, futures, + async/await and coroutines). +4. A host passes configuration (e.g., values from config files and secrets) to + a component through imports of typed high-level values and handles. +5. A component declares that a particular import is "optional", allowing that + component to execute on hosts with or without the imported functionality. +6. A developer instantiates a component with native host imports in production + and with mock or emulated imports in local development and testing. + +#### Host-determined component lifecycles and associativity + +Another design choice when a host embeds wasm is when to create new instances, +when to route events to existing instances, when to destroy existing +instances, and how multiple live instances interact with +each other, if at all. Some use cases include: + +1. A host creates many ephemeral, concurrent component instances, each of which + is tied to a particular host-domain-specific entity's lifecycle (e.g., a + request-response pair, connection, session, job, client or tenant), with a + component instance being destroyed when the associated entity's + domain-specified lifecycle completes. +2. A host delivers fine-grained events, for which component instantiation would + have too much overhead if performed per-event or for which retained mutable + state is desired, by making multiple export calls on the same component + instance over time. Export calls can be asynchronous, allowing multiple + fine-grained events to be processed concurrently.
For example, multiple + packets could be delivered as multiple export calls to the component instance + for a connection. +3. A host represents associations between longer- and shorter-lived + host-domain-specific entities (e.g., a "connection's session" or a "session's + user") by having the shorter-lived component instances (e.g., "connections") + import the exports of the longer-lived component instances (e.g., "sessions"). + +### Component composition + +The other way components are to be used (other than via direct execution by the +host) is by other components, through component composition. + +#### Value propositions to developers for composing components + +Enumerating some of the reasons why we might want to compose components in the +first place (instead of simply using the module/package mechanisms built into +the programming language): + +1. A component developer reuses code already written in another language + instead of having to reimplement the functionality from scratch. +2. A component developer writing code in a high-level scripting language (e.g., + JS or Python) reuses high-performance code written in a lower-level language + (e.g., C++ or Rust). +3. A component developer mitigates the impact of supply-chain attacks by + putting their dependencies into several components and controlling the + capabilities delegated to each, taking advantage of the strong sandboxing model + of components. +4. A component runtime implements built-in host functionality as wasm + components to reduce the [Trusted Computing Base]. +5. An application developer applies the Unix philosophy without incurring the + full cost and OS-dependency of splitting their program into multiple processes + by instead having each component do one thing well and using the component + model to compose their program as a hierarchy of components. +6. 
An application developer composes multiple independently-developed + components that import and export the same interface (e.g., an HTTP + request-handling interface) by linking them together, exports-to-imports, being + able to create recursive, branching DAGs of linked components not otherwise + expressible with classic Unix-style pipelines. + +In all the above use cases, the developer has an additional goal of keeping the +component reuse as a private, fully-encapsulated implementation detail that +their client doesn't need to be aware of (either directly in code, or +indirectly in the developer workflow). + +#### Composition primitives + +Core wasm already provides the fundamental composition primitives of imports, +exports and functions, allowing a module to export a function that is imported +by another module. Building from this starting point, there are a number of +use cases that require additional features: + +1. Developers importing or exporting functions use high-level value types in + their function signatures that include strings, lists, records, variants and + arbitrarily-nested combinations of these. Both developers (the caller and + callee) get to use the idiomatic values of their respective languages. + Values are passed by copy so that there is no shared mutation, ownership or + management of these values before or after the call that either developer + needs to worry about. +2. Developers importing or exporting functions use opaque typed handles in + their function signatures to pass resources that cannot or should not be copied + at the callsite. Both developers (the caller and callee) use their respective + languages' abstract data type support for interacting with resources. Handles + can encapsulate `i32` pointers to linear memory allocations that need to be + safely freed when the last handle goes away. +3.
Developers import or export functions with signatures containing + concurrency-oriented types (e.g., async, future and stream) to address + concurrency use cases like non-blocking I/O, early return and streaming. Both + developers (the caller and callee) are able to use their respective languages' + native concurrency support, if it exists, using the concurrency-oriented types + to establish a deterministic communication protocol that defines how the + cross-language composition behaves. +4. A component developer makes a minor [semver] update which changes the + component's type in a logically backwards-compatible manner (e.g., adding a new + case to a variant parameter type). The component model ensures that the new + component stays valid (at link-time and run-time) for use by existing clients + compiled against the older signature. +5. A component developer uses their language, toolchain and memory + representation of choice (including, in the future, [GC memory]), with these + implementation choices fully encapsulated by the component and thus hidden from + the client. The component developer can switch languages, toolchains or memory + representations in the future without breaking existing clients. + +The above use cases roughly correspond to the use cases of an [RPC] framework, +which have similar goals of crossing language boundaries. The major difference +is the dropping of the distributed computing goals (see [non-goals](Goals.md#non-goals)) +and the additional performance goals mentioned [below](#performance). + +#### Component dependencies + +When a client component imports another component as a dependency, there are a +number of use cases for how the dependency's instance is configured and shared +or not shared with other clients of the same dependency. 
These use cases +require a greater degree of programmer control than allowed by most languages' +native module systems and most native code linking systems while not requiring +fully dynamic linking (e.g., as provided by the [JS API]). + +1. A component developer exposes their component's configuration to clients as + imports that are supplied when the component is instantiated by the client. +2. A component developer configures a dependency independently of any other + clients of the same dependency by creating a fresh private instance of the + dependency and supplying the desired configuration values at instantiation. +3. A component developer imports a dependency as an already-created instance, + giving the component's clients the responsibility to configure the + dependency and the freedom to share it with others. +4. A component developer creates a fresh private instance of a dependency to + isolate the dependency's mutable instance state in order to minimize the + damage that can be caused in the event of a supply chain attack or + exploitable bug in the dependency. +5. A component developer imports an already-created instance of a dependency, + allowing the dependency to use mutable instance state to deduplicate data or + cache common results, optimizing overall app performance. +6. A component developer imports a WASI interface and does not explicitly pass + the WASI interface to a privately-created dependency. The developer knows, + without manually auditing the code of the dependency, that the dependency + cannot access the WASI interface. +7. A component developer creates a private dependency instance, supplying it a + virtualized implementation of a WASI interface. The developer knows, without + manually auditing the code of the dependency, that the dependency exclusively + uses the virtualized implementation. +8. 
A component developer creates a fresh private instance of a dependency, + supplying the component's own functions as imports to the dependency. The + component does this to parameterize the dependency's behavior with the + component's own logic or implementation choices (achieving the goals usually + accomplished using callback registration or [dependency injection]). + +### Performance + +In pursuit of the above functional use cases, it's important that the component +model not sacrifice the performance properties that motivate the use of wasm in +the first place. Thus, the new features mentioned above should be consistent +with the predictable performance model established by core wasm by supporting +the following use cases: + +1. A component runtime implements cross-component calls with efficient, direct + control flow transfer without thread context switching or synchronization. +2. A component runtime implements component instances without needing to give + each instance its own event loop, green thread or message queue. +3. A component runtime or optimizing AOT compiler compiles all import and + export names into indices or more direct forms of reference (up to and + including direct inlining of cross-component definitions into uses). +4. A component runtime implements value passing between component instances + without ever creating an intermediate O(n) copy of aggregate data types, + outside of either component instance's explicitly-allocated linear memory. +5. A component runtime shares the compiled machine code of a component across + many instances of that component. +6. A component is composed of several core wasm modules that operate on a + single shared linear memory, some of which contain language runtime code + that is shared by all components produced from the same language toolchain. + A component runtime shares the compiled machine code of the shared language + runtime module. +7. 
A component runtime implements the component model and achieves expected + performance without using any runtime code generation or Just-in-Time + compilation. + +## Post-MVP + +The following is a list of use cases that make sense to support eventually, +but not necessarily in the initial release. + +### Runtime dynamic linking + +* A component lazily creates an instance of its dependency on the first call + to its exports. +* A component dynamically instantiates, calls, then destroys its dependency, + avoiding persistent resource usage by the dependency if the dependency is used + infrequently and/or preventing the dependency from accumulating state across + calls which could create supply chain attack risk. +* A component creates a fresh internal instance every time one of its exports + is called, avoiding any residual state between export calls and aligning with + the usual assumptions of C programs with a `main()`. + +### Parallelism + +* A component creates a new (green) thread to execute an export call to a + dependency, achieving task parallelism while avoiding low-level data races due + to the absence of shared mutable state between the component and the + dependency. +* Two component instances connected via stream execute in separate (green) + threads, achieving pipeline parallelism while preserving determinism due to the + absence of shared mutable state. + +### Copy Minimization + +* A component produces or consumes the high-level abstract value types using + its own arbitrary linear memory representation or procedural interface (like + iterator or generator) without having to make an intermediate copy in linear + memory or copy unwanted elements. +* A component is given a "blob" resource representing an immutable array of + bytes living outside any linear memory that can be semantically copied into + linear memory in a way that, if supported by the host, can be implemented via + copy-on-write memory-mapping. 
+* A component creates a stream directly from a data segment, avoiding the cost + of first copying the data segment into linear memory and then streaming from + linear memory. + +### Component-level multi-threading + +In the absence of these features, a component can assume its exports are +called in a single-threaded manner (just like core wasm). If and when core wasm +gets a primitive [`fork`] instruction, a component may, as a private +implementation detail, have its internal `shared` memory accessed by multiple +component-internal threads. However, these `fork`ed threads would not be able +to call imports, which could break other components' single-threaded assumptions. + +* A component explicitly annotates a function export with [`shared`], + opting in to it being called simultaneously from multiple threads. +* A component explicitly annotates a function import with `shared`, requiring + the imported function to have been explicitly `shared` and thus callable from + any `fork`ed thread. 
+ + + +[RLBox]: https://plsyssec.github.io/rlbox_sandboxing_api/sphinx/ +[Principle of Least Authority]: https://en.wikipedia.org/wiki/Principle_of_least_privilege +[Modular Programming]: https://en.wikipedia.org/wiki/Modular_programming +[start function]: https://webassembly.github.io/spec/core/intro/overview.html#semantic-phases +[ESM-integration]: https://github.com/WebAssembly/esm-integration/tree/main/proposals/esm-integration +[Trusted Computing Base]: https://en.wikipedia.org/wiki/Trusted_computing_base +[semver]: https://en.wikipedia.org/wiki/Software_versioning +[RPC]: https://en.wikipedia.org/wiki/Remote_procedure_call +[GC memory]: https://github.com/WebAssembly/gc/blob/master/proposals/gc/Overview.md +[JS API]: https://webassembly.github.io/spec/js-api/index.html +[dependency injection]: https://en.wikipedia.org/wiki/Dependency_injection +[`fork`]: https://dl.acm.org/doi/pdf/10.1145/3360559 +[`shared`]: https://dl.acm.org/doi/pdf/10.1145/3360559 diff --git a/design/proposals/README.md b/design/proposals/README.md new file mode 100644 index 0000000..3672521 --- /dev/null +++ b/design/proposals/README.md @@ -0,0 +1,5 @@ +This subdirectory will contain the explainers specific to each proposal, +starting initially with [module-linking] and [interface-types]. + +[module-linking]: https://github.com/webassembly/module-linking/ +[interface-types]: https://github.com/webassembly/interface-types/ diff --git a/spec/README.md b/spec/README.md new file mode 100644 index 0000000..241758e --- /dev/null +++ b/spec/README.md @@ -0,0 +1,5 @@ +This directory will be initialized by the [module-linking] proposal to contain +the Component Model specification, analogous to the [Core spec repo]. 
+ +[module-linking]: https://github.com/webassembly/module-linking/ +[Core spec repo]: https://github.com/WebAssembly/spec/ From 70465d8bd461b7f7c87d52704984ee557ccb8be2 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Fri, 24 Sep 2021 15:02:51 -0500 Subject: [PATCH 002/301] Add README to design/high-level --- design/high-level/README.md | 5 +++++ 1 file changed, 5 insertions(+) create mode 100644 design/high-level/README.md diff --git a/design/high-level/README.md b/design/high-level/README.md new file mode 100644 index 0000000..5abbd28 --- /dev/null +++ b/design/high-level/README.md @@ -0,0 +1,5 @@ +# Component Model High-Level Design Documents + +This directory contains design documents describing the component model's +[goals](Goals.md), [use cases](UseCases.md), [design choices](Choices.md) +and [FAQ](FAQ.md). From 17f94ed1270a98218e0e796ca1dad1feb7e5c507 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 24 Jan 2022 14:07:48 -0600 Subject: [PATCH 003/301] Add skeleton Explainer.md containing only AST and Binary.md defining binary format --- README.md | 31 +- design/mvp/Binary.md | 252 +++++ design/mvp/CanonicalABI.md | 3 + design/mvp/Explainer.md | 915 ++++++++++++++++++ design/mvp/FutureFeatures.md | 80 ++ design/mvp/Subtyping.md | 24 + design/mvp/examples/LinkTimeVirtualization.md | 78 ++ .../SharedEverythingDynamicLinking.md | 422 ++++++++ .../images/link-time-virtualization.svg | 1 + .../shared-everything-dynamic-linking.svg | 1 + design/proposals/README.md | 5 - spec/README.md | 5 +- 12 files changed, 1787 insertions(+), 30 deletions(-) create mode 100644 design/mvp/Binary.md create mode 100644 design/mvp/CanonicalABI.md create mode 100644 design/mvp/Explainer.md create mode 100644 design/mvp/FutureFeatures.md create mode 100644 design/mvp/Subtyping.md create mode 100644 design/mvp/examples/LinkTimeVirtualization.md create mode 100644 design/mvp/examples/SharedEverythingDynamicLinking.md create mode 100644 
design/mvp/examples/images/link-time-virtualization.svg create mode 100644 design/mvp/examples/images/shared-everything-dynamic-linking.svg delete mode 100644 design/proposals/README.md diff --git a/README.md b/README.md index 209025b..7543cd5 100644 --- a/README.md +++ b/README.md @@ -1,27 +1,13 @@ # Component Model design and specification -This repository contains documents describing the high-level [goals], -[use cases], [design choices] and [FAQ] of the component model. +This repository describes the high-level [goals], [use cases], [design choices] +and [FAQ] of the component model as well as a more-detailed [explainer] and +[binary format] covering the initial Minimum Viable Product (MVP) release. -In the future, as proposals get merged, the repository will additionally -contain the spec, a reference interpreter, a test suite, and directories for -each proposal with the proposal's explainer and specific design documents. +In the future, this repository will additionally contain a [formal spec], +reference interpreter and test suite. -## Design Process & Contributing - -At this early stage, this repository only contains high-level design documents -and discussions about the Component Model in general. Detailed explainers, -specifications and discussions are broken into the following two repositories -which, together, will form the "MVP" of the Component Model: - -* The [module-linking] proposal will initialize the Component Model - specification, adding the ability for WebAssembly to import, nest, - instantiate and link multiple Core WebAssembly modules without host-specific - support. - -* The [interface-types] proposal will extend the Component Model specification - with a new set of high-level types for defining shared-nothing, - language-neutral "components". +## Contributing All Component Model work is done as part of the [W3C WebAssembly Community Group]. 
To contribute to any of these repositories, see the Community Group's @@ -32,7 +18,8 @@ To contribute to any of these repositories, see the Community Group's [use cases]: design/high-level/UseCases.md [design choices]: design/high-level/Choices.md [FAQ]: design/high-level/FAQ.md -[module-linking]: https://github.com/webassembly/module-linking/ -[interface-types]: https://github.com/webassembly/interface-types/ +[explainer]: design/mvp/Explainer.md +[binary format]: design/mvp/Binary.md +[formal spec]: spec/ [W3C WebAssembly Community Group]: https://www.w3.org/community/webassembly/ [Contributing Guidelines]: https://webassembly.org/community/contributing/ diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md new file mode 100644 index 0000000..90447c5 --- /dev/null +++ b/design/mvp/Binary.md @@ -0,0 +1,252 @@ +# Component Model Binary Format Explainer + +This document defines the binary format for the AST defined in the +[explainer](Explainer.md). The top-level production is `component` and the +convention is that a file suffixed in `.wasm` may contain either a +[`core:module`] *or* a `component`, using the `kind` field to discriminate +between the two in the first 8 bytes (see [below](#component-definitions) for +more details). + +Note: this document is not meant to completely define the decoding or validation +rules, but rather merge the minimal need-to-know elements of both, with just +enough detail to create a prototype. A complete definition of the binary format +and validation will be present in the [formal specification](../../spec/). + + +## Component Definitions + +(See [Component Definitions](Explainer.md#component-definitions) in the explainer.) +``` +component ::= s*:
* => (component flatten(s*)) +preamble ::= +magic ::= 0x00 0x61 0x73 0x6D +version ::= 0x0a 0x00 +kind ::= 0x01 0x00 +section ::= section_0() => ϵ + | t*:section_1(vec()) => t* + | i*:section_2(vec()) => i* + | f*:section_3(vec()) => f* + | m: section_4() => m + | c: section_5() => c + | i*:section_6(vec()) => i* + | e*:section_7(vec()) => e* + | s: section_8() => s + | a*:section_9(vec()) => a* +``` +Notes: +* Reused Core binary rules: [`core:section`], [`core:custom`], [`core:module`] +* The `version` given above is pre-standard. As the proposal changes before + final standardization, `version` will be bumped from `0xa` upwards to + coordinate prototypes. When the standard is finalized, `version` will be + changed one last time to `0x1`. (This mirrors the path taken for the Core + WebAssembly 1.0 spec.) +* The `kind` field is meant to distinguish modules from components early in the + binary format. (Core WebAssembly modules already implicitly have a `kind` + field of `0x0` in their 4 byte [`core:version`] field.) + + +## Instance Definitions + +(See [Instance Definitions](Explainer.md#instance-definitions) in the explainer.) +``` +instance ::= ie: => (instance ie) +instanceexpr ::= 0x00 0x00 m: a*:vec() => (instantiate (module m) (import a)*) + | 0x00 0x01 c: a*:vec() => (instantiate (component c) (import a)*) + | 0x01 e*:vec() => e* + | 0x02 e*:vec() => e* +modulearg ::= n: 0x02 i: => n (instance i) +componentarg ::= n: 0x00 m: => n (module m) + | n: 0x01 c: => n (component c) + | n: 0x02 i: => n (instance i) + | n: 0x03 f: => n (func f) + | n: 0x04 v: => n (value v) + | n: 0x05 t: => n (type t) (t must be an ) +export ::= a: => (export a) +name ::= n: => n +``` +Notes: +* Reused Core binary rules: [`core:export`], [`core:name`] +* The indices in `modulearg`/`componentarg` are validated according to their + respective index space, which are built incrementally as each definition is + validated. 
In general, unlike core modules, which support cyclic references + between (function) definitions, component definitions are strictly acyclic + and validated in a linear incremental manner, like core wasm instructions. +* The arguments supplied by `instantiate` are validated against the consuming + module/component according to the [subtyping](Subtyping.md) rules. + + +## Alias Definitions + +(See [Alias Definitions](Explainer.md#alias-definitions) in the explainer.) +``` +alias ::= 0x00 0x00 i: n: => (alias export i n (module)) + | 0x00 0x01 i: n: => (alias export i n (component)) + | 0x00 0x02 i: n: => (alias export i n (instance)) + | 0x00 0x03 i: n: => (alias export i n (func)) + | 0x00 0x04 i: n: => (alias export i n (value)) + | 0x01 0x00 i: n: => (alias export i n (func)) + | 0x01 0x01 i: n: => (alias export i n (table)) + | 0x01 0x02 i: n: => (alias export i n (memory)) + | 0x01 0x03 i: n: => (alias export i n (global)) + | ... other Post-MVP Core definition kinds + | 0x02 0x00 ct: i: => (alias outer ct i (module)) + | 0x02 0x01 ct: i: => (alias outer ct i (component)) + | 0x02 0x05 ct: i: => (alias outer ct i (type)) +``` +Notes: +* For instance-export aliases (opcodes `0x00` and `0x01`), `i` is validated to + refer to an instance in the instance index space that exports `n` with the + specified definition kind. +* For outer aliases (opcode `0x02`), `ct` is validated to be *less than or + equal to* the number of enclosing components and `i` is validated to be a valid + index in the specified definition's index space of the enclosing component + indicated by `ct` (counting outward, starting with `0` referring to the + current component). + + +## Type Definitions + +(See [Type Definitions](Explainer.md#type-definitions) in the explainer.) 
+``` +type ::= dt: => dt + | it: => it +deftype ::= mt: => mt + | ct: => ct + | it: => it + | ft: => ft + | vt: => vt +moduletype ::= 0x4f mtd*:vec() => (module mtd*) +moduletype-def ::= 0x01 dt: => dt + | 0x02 i: => i + | 0x07 n: d: => (export n d) +core:deftype ::= ft: => ft + | ... Post-MVP additions => ... +componenttype ::= 0x4e ctd*:vec() => (component ctd*) +instancetype ::= 0x4d itd*:vec() => (instance itd*) +componenttype-def ::= itd: => itd + | 0x02 i: => i +instancetype-def ::= 0x01 t: => t + | 0x07 n: dt: => (export n dt) + | 0x09 a: => a +import ::= n: dt: => (import n dt) +deftypeuse ::= i: => type-index-space[i] (must be ) +functype ::= 0x4c param*:vec() t: => (func param* (result t)) +param ::= n: t: => (param n t) +valuetype ::= 0x4b t: => (value t) +intertypeuse ::= i: => type-index-space[i] (must be ) + | pit: => pit +primintertype ::= 0x7f => unit + | 0x7e => bool + | 0x7d => s8 + | 0x7c => u8 + | 0x7b => s16 + | 0x7a => u16 + | 0x79 => s32 + | 0x78 => u32 + | 0x77 => s64 + | 0x76 => u64 + | 0x75 => float32 + | 0x74 => float64 + | 0x73 => char + | 0x72 => string +intertype ::= pit: => pit + | 0x71 field*:vec() => (record field*) + | 0x70 case*:vec() => (variant case*) + | 0x6f t: => (list t) + | 0x6e t*:vec() => (tuple t*) + | 0x6d n*:vec() => (flags n*) + | 0x6c n*:vec() => (enum n*) + | 0x6b t*:vec() => (union t*) + | 0x6a t: => (optional t) + | 0x69 t: u: => (expected t u) +field ::= n: t: => (field n t) +case ::= n: t: 0x0 => (case n t) + | n: t: 0x1 i: => (case n t (defaults-to case-label[i])) +``` +Notes: +* Reused Core binary rules: [`core:import`], [`core:importdesc`], [`core:functype`] +* The type opcodes follow the same negative-SLEB128 scheme as Core WebAssembly, + with type opcodes starting at SLEB128(-1) (`0x7f`) and going down, + reserving the nonnegative SLEB128s for type indices. +* The (`module`|`component`|`instance`)`type-def` opcodes match the corresponding + section numbers. 
+* Module, component and instance types create fresh type index spaces that are + populated and referenced by their contained definitions. E.g., for a module + type that imports a function, the `import` `moduletype-def` must be preceded + by either a `type` or `alias` `moduletype-def` that adds the function type to + the type index space. +* Currently, the only allowed form of `alias` in instance and module types + is `(alias outer ct li (type))`. In the future, other kinds of aliases + will be needed and this restriction will be relaxed. + + +## Function Definitions + +(See [Function Definitions](Explainer.md#function-definitions) in the explainer.) +``` +func ::= body: => (func body) +funcbody ::= 0x00 ft: opt*:vec() f: => (canon.lift ft opt* f) + | 0x01 opt*:* f: => (canon.lower opt* f) +canonopt ::= 0x00 => string=utf8 + | 0x01 => string=utf16 + | 0x02 => string=latin1+utf16 + | 0x03 i: => (into i) +``` +Notes: +* Validation prevents duplicate or conflicting options. +* Validation of `canon.lift` requires `f` to have a `core:functype` that matches + the canonical-ABI-defined lowering of `ft`. The function defined by + `canon.lift` has type `ft`. +* Validation of `canon.lower` requires `f` to have a `functype`. The function + defined by `canon.lower` has a `core:functype` defined by the canonical ABI + lowering of `f`'s type. +* If the lifting/lowering operations implied by `canon.lift` or `canon.lower` + require access to `memory`, `realloc` or `free`, then validation will require + the `(into i)` `canonopt` be present and the corresponding export be present + in `i`'s `instancetype`. + + +## Start Definitions + +(See [Start Definitions](Explainer.md#start-definitions) in the explainer.) +``` +start ::= f: arg*:vec() => (start f (value arg)*) +``` +Notes: +* Validation requires `f` have `functype` with `param` arity and types matching `arg*`. 
+* Validation appends the `result` types of `f` to the value index space (making + them available for reference by subsequent definitions). + +In addition to the type-compatibility checks mentioned above, the validation +rules for value definitions additionally require that each value is consumed +exactly once. Thus, during validation, each value has an associated "consumed" +boolean flag. When a value is first added to the value index space (via +`import`, `instance`, `alias` or `start`), the flag is clear. When a value is +used (via `export`, `instantiate` or `start`), the flag is set. After +validating the last definition of a component, validation requires all values' +flags are set. + + +## Import and Export Definitions + +(See [Import and Export Definitions](Explainer.md#import-and-export-definitions) in the explainer.) + +As described in the explainer, the binary decode rules of `import` and `export` +have already been defined above. + +Notes: +* Validation requires all import and export `name`s are unique. 
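The "consumed" flag bookkeeping described for values under Start Definitions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `ValueIndexSpace` and its method names are hypothetical, and rejecting a second use of the same value is one reading of the "consumed exactly once" requirement, not text taken from a formal specification.

```python
class ValueIndexSpace:
    """Hypothetical model of the value index space used during validation."""

    def __init__(self):
        # One boolean "consumed" flag per value, in index order.
        self._consumed = []

    def add(self):
        """Add a value (e.g. via import, instance, alias or start); flag starts clear."""
        self._consumed.append(False)
        return len(self._consumed) - 1

    def use(self, idx):
        """Consume a value (e.g. via export, instantiate or start); sets the flag."""
        if not 0 <= idx < len(self._consumed):
            raise ValueError(f"unknown value index {idx}")
        if self._consumed[idx]:
            # Assumption: "exactly once" means a second use is a validation error.
            raise ValueError(f"value {idx} consumed more than once")
        self._consumed[idx] = True

    def finish(self):
        """Called after the last definition: every flag must be set."""
        unconsumed = [i for i, done in enumerate(self._consumed) if not done]
        if unconsumed:
            raise ValueError(f"values never consumed: {unconsumed}")


space = ValueIndexSpace()
v = space.add()   # e.g. a value produced by a start function
space.use(v)      # e.g. the value is exported
space.finish()    # passes: every value was consumed
```

Keeping the flags in a list indexed by value index mirrors the incremental way the index space is built up as each definition is validated.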
+ + + +[`core:version`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-version +[`core:section`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-section +[`core:custom`]: https://webassembly.github.io/spec/core/binary/modules.html#custom-section +[`core:module`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-module +[`core:export`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-export +[`core:name`]: https://webassembly.github.io/spec/core/binary/values.html#binary-name +[`core:import`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-import +[`core:importdesc`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-importdesc +[`core:functype`]: https://webassembly.github.io/spec/core/binary/types.html#binary-functype + +[Future Core Type]: https://github.com/WebAssembly/gc/blob/master/proposals/gc/MVP.md#type-definitions diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md new file mode 100644 index 0000000..1e4ab38 --- /dev/null +++ b/design/mvp/CanonicalABI.md @@ -0,0 +1,3 @@ +# Canonical ABI (sketch) + +TODO: import and update [interface-types/#132](https://github.com/WebAssembly/interface-types/pull/132) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md new file mode 100644 index 0000000..d335119 --- /dev/null +++ b/design/mvp/Explainer.md @@ -0,0 +1,915 @@ +# Component Model Explainer + +This explainer walks through the assembly-level definition of a +[component](../high-level) and the proposed embedding of components into a +native JavaScript runtime. 
+ +* [Grammar](#grammar) + * [Component definitions](#component-definitions) + * [Instance definitions](#instance-definitions) + * [Alias definitions](#alias-definitions) + * [Type definitions](#type-definitions) + * [Function definitions](#function-definitions) + * [Start definitions](#start-definitions) + * [Import and export definitions](#import-and-export-definitions) +* [Component invariants](#component-invariants) +* [JavaScript embedding](#JavaScript-embedding) + * [JS API](#JS-API) + * [ESM-integration](#ESM-integration) +* [Examples](#examples) + +(Based on the previous [scoping and layering] proposal to the WebAssembly CG, +this repo merges and supersedes the [Module Linking] and [Interface Types] +proposals, pushing some of their original features into the post-MVP [future +feature](FutureFeatures.md) backlog.) + + +## Grammar + +This section defines components using an EBNF grammar that parses something in +between a pure Abstract Syntax Tree (like the Core WebAssembly spec's +[Structure Section]) and a complete text format (like the Core WebAssembly +spec's [Text Format Section]). The goal is to balance completeness with +succinctness, with just enough detail to write examples and define a [binary +format](Binary.md) in the style of the [Binary Format Section], deferring full +precision to the [formal specification](../../spec/). + +The main way the grammar hand-waves is regarding definition uses, where indices +referring to `X` definitions (written ``) should, in the real text +format, explicitly allow identifiers (``), checking at parse time that the +identifier resolves to an `X` definition and then embedding the resolved index +into the AST. + +Additionally, standard [abbreviations] defined by the Core WebAssembly text +format (e.g., inline export definitions) are assumed but not explicitly defined +below. 
+ + +### Component Definitions + +At the top-level, a `component` is a sequence of definitions of various kinds: +``` +component ::= (component ? *) +definition ::= + | + | + | + | + | + | + | + | +``` +Core WebAssembly modules (henceforth just "modules") are also sequences of +(different kinds of) definitions. However, unlike modules, components allow +arbitrarily interleaving the different kinds of definitions. As we'll see +below, this arbitrary interleaving reflects the need for different kinds of +definitions to be able to refer back to each other. Importantly, though, +component definitions are acyclic: definitions can only refer back to preceding +definitions (in the AST, text format or binary format). + +The first kind of component definition is a module, as defined by the existing +Core WebAssembly specification's [`core:module`] top-level production. Thus, +components physically embed one or more modules and can be thought of as a +kind of container format for modules. + +The second kind of definition is, recursively, a component itself. Thus, +components form trees with modules (and all other kinds of definitions) only +appearing at the leaves. + +With what's defined so far, we can define the following component: +```wasm +(component + (component + (module (func (export "one") (result i32) (i32.const 1))) + (module (func (export "two") (result f32) (f32.const 2))) + ) + (module (func (export "three") (result i64) (i64.const 3))) + (component + (component + (module (func (export "four") (result f64) (f64.const 4))) + ) + ) + (component) +) +``` +This top-level component roots a tree with 4 modules and 1 component as +leaves. However, in the absence of any `instance` definitions (introduced +next), nothing will be instantiated or executed at runtime: everything here is +dead code. 
+ + +### Instance Definitions + +Whereas modules and components represent immutable *code*, instances associate +code with potentially-mutable *state* (e.g., linear memory) and thus are +necessary to create before being able to *run* the code. Instance definitions +create module or component instances by selecting a module/component and +supplying a set of named *arguments* which satisfy all the named *imports* of +the selected module/component: +``` +instance ::= (instance ? ) +instanceexpr ::= (instantiate (module ) (import )*) + | (instantiate (component ) (import )*) + | * + | + +modulearg ::= (instance ) + | (instance +) +componentarg ::= (module ) + | (component ) + | (instance ) + | (func ) + | (value ) + | (type ) + | (instance *) +export ::= (export ) +``` +When instantiating a module via `(instantiate (module $M) *)`, the +two-level imports of the module `$M` are resolved as follows: +1. The first `name` of an import is looked up in the named list of `modulearg` + to select a module instance. +2. The second `name` of an import is looked up in the named list of exports of + the module instance found by the first step to select the imported + core definition (a `func`, `memory`, `table`, `global`, etc). + +Based on this, we can link two modules `$A` and `$B` together with the +following component: +```wasm +(component + (module $A + (func (export "one") (result i32) (i32.const 1)) + ) + (module $B + (func (import "a" "one") (result i32)) + ) + (instance $a (instantiate (module $A))) + (instance $b (instantiate (module $B) (import "a" (instance $a)))) +) +``` +Components, as we'll see below, have single-level imports, i.e., each import +has only a single `name`, and thus every different kind of definition can be +passed as a `componentarg` when instantiating a component, not just instances. +Component instantiation will be revisited below after introducing the +prerequisite type and import definitions. 
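As a rough illustration of the two-step lookup used to resolve a module's two-level imports, the following Python sketch models each module instance as a plain dictionary from export name to definition. The function `resolve_module_imports` and the dictionary representation are assumptions made for this example; they are not part of the proposal or of any runtime API.

```python
def resolve_module_imports(imports, moduleargs):
    """Resolve two-level imports against named module instance arguments.

    imports:    list of (first_name, second_name) pairs declared by the module
    moduleargs: dict mapping argument name -> module instance, where an
                instance is modeled as a dict of export name -> definition
    """
    resolved = {}
    for first, second in imports:
        # Step 1: the first name selects a module instance from the arguments.
        if first not in moduleargs:
            raise LookupError(f"no instance argument named {first!r}")
        instance = moduleargs[first]
        # Step 2: the second name selects an export of that instance.
        if second not in instance:
            raise LookupError(f"instance {first!r} has no export {second!r}")
        resolved[(first, second)] = instance[second]
    return resolved


# Mirrors the $A/$B example: $B imports "a" "one", and $a exports "one".
instance_a = {"one": lambda: 1}
resolved = resolve_module_imports([("a", "one")], {"a": instance_a})
```

Here `resolved[("a", "one")]` is the function exported by the instance standing in for `$a`, which is what the instance of `$B` would receive as its import.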
+ +Lastly, the `(instance *)` and `(instance +)` +expressions allow component and module instances to be created by directly +tupling together preceding definitions, without the need to `instantiate` +anything. To disambiguate the empty case, we observe that there is never +a need to import an empty module instance and thus `(instance)` is an empty +*component* instance. The "inline" forms of these expressions in `modulearg` +and `componentarg` are text format sugar for the "out of line" form in +`instanceexpr`. To show an example of how these instance-creation forms are +useful, we'll first need to introduce the `alias` definitions in the next +section. + + +### Alias Definitions + +Alias definitions project definitions out of other components' index spaces +into the current component's index spaces. As represented in the AST below, +there are two kinds of "targets" for an alias: the `export` of a component +instance, or a local definition of an `outer` component that contains the +current component: +``` +alias ::= (alias ) +aliastarget ::= export + | outer +aliaskind ::= (module ?) + | (component ?) + | (instance ?) + | (func ?) + | (value ?) + | (type ?) + | (table ?) + | (memory ?) + | (global ?) + | ... other Post-MVP Core definition kinds +``` +Aliases add a new element to the index space indicated by `aliaskind`. +(Validation ensures that the `aliastarget` does indeed refer to a matching +definition kind.) The `id` in `aliaskind` is bound to this new index and +thus can be used anywhere a normal `id` can be used. + +In the case of `export` aliases, validation requires that `instanceidx` refers +to an instance which exports `name`. + +In the case of `outer` aliases, the (`outeridx`, `idx`) pair serves as a +[de Bruijn index], with `outeridx` being the number of enclosing components to +skip and `idx` being an index into the target component's `aliaskind` index +space. 
In particular, `outeridx` can be `0`, in which case the outer alias +refers to the current component. To maintain the acyclicity of module +instantiation, outer aliases are only allowed to refer to *preceding* outer +definitions. + +Components containing outer aliases effectively produce a [closure] at +instantiation time, including a copy of the outer-aliased definitions. Because +of the prevalent assumption that components are (stateless) *values*, outer +aliases are restricted to only refer to stateless definitions: components, +modules and types. (In the future, outer aliases to all kinds of definitions +could be allowed by recording the statefulness of the resulting component in +its type via some kind of "`stateful`" type attribute.) + +Both kinds of aliases come with syntactic sugar for implicitly declaring them +inline: + +For `export` aliases, the inline sugar has the form `(kind +)` +and can be used anywhere a `kind` index appears in the AST. For example, the +following snippet uses an inline function alias: +```wasm +(instance $j (instantiate (component $J) (import "f" (func $i "f")))) +(export "x" (func $j "g" "h")) +``` +which is desugared into: +```wasm +(alias export $i "f" (func $f_alias)) +(instance $j (instantiate (component $J) (import "f" (func $f_alias)))) +(alias export $j "g" (instance $g_alias)) +(alias export $g_alias "h" (func $h_alias)) +(export "x" (func $h_alias)) +``` + +For `outer` aliases, the inline sugar is simply the identifier of the outer +definition, resolved using normal lexical scoping rules. For example, the +following component: +```wasm +(component + (module $M ...) + (component + (instance (instantiate (module $M))) + ) +) +``` +is desugared into: +```wasm +(component $C + (module $M ...) 
+ (component + (alias outer $C $M (module $C_M)) + (instance (instantiate (module $C_M))) + ) +) +``` + +With what's defined so far, we're able to link modules with arbitrary renamings: +```wasm +(component + (module $A + (func (export "one") (result i32) (i32.const 1)) + (func (export "two") (result i32) (i32.const 2)) + (func (export "three") (result i32) (i32.const 3)) + ) + (module $B + (func (import "a" "one") (result i32)) + ) + (instance $a (instantiate (module $A))) + (instance $b1 (instantiate (module $B) + (import "a" (instance $a)) ;; no renaming + )) + (alias export $a "two" (func $a_two)) + (instance $b2 (instantiate (module $B) + (import "a" (instance + (export "one" (func $a_two)) ;; renaming, using explicit alias + )) + )) + (instance $b3 (instantiate (module $B) + (import "a" (instance + (export "one" (func $a "three")) ;; renaming, using inline alias sugar + )) + )) +) +``` +To show analogous examples of linking components, we'll first need to define +a new set of types and functions for components to use. + + +### Type Definitions + +The type grammar below defines two levels of types, with the second level +building on the first: +1. `intertype` (also referred to as "interface types" below): the set of + types of first-class, high-level values communicated across shared-nothing + component interface boundaries +2. `deftype`: the set of types of second-class component definitions which are + imported/exported at instantiation-time. + +The top-level `type` definition is used to define types out-of-line so that +they can be reused via `typeidx` by future definitions. +``` +type ::= (type ? ) +typeexpr ::= + | +deftype ::= + | + | + | + | +moduletype ::= (module *) +moduletype-def ::= + | + | (export ) +core:deftype ::= + | ... 
Post-MVP additions +componenttype ::= (component (componenttype-def)*) +componenttype-def ::= + | +import ::= (import ) +instancetype ::= (instance (instancetype-def)*) +instancetype-def ::= + | + | (export ) +functype ::= (func (param )* (result )) +valuetype ::= (value ) +intertype ::= unit | bool + | s8 | u8 | s16 | u16 | s32 | u32 | s64 | u64 + | float32 | float64 + | char | string + | (record (field )*) + | (variant (case (defaults-to )?)*) + | (list ) + | (tuple *) + | (flags *) + | (enum *) + | (union *) + | (optional ) + | (expected ) +``` +On a technical note: this type grammar uses `` and `` +recursively to allow it to more-precisely indicate the kinds of types allowed. +The formal spec AST would instead use a `` with validation rules to +restrict the target type while the formal text format would use something like +[`core:typeuse`], allowing any of: (1) a `typeidx`, (2) an identifier `$T` +resolving to a type definition (using `(type $T)` in cases where there is a +grammatical ambiguity), or (3) an inline type definition that is desugared into +a deduplicated out-of-line type definition. 
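The desugaring of inline type definitions into deduplicated out-of-line definitions, mentioned in the technical note above, can be sketched in JavaScript. This is only an illustration under stated assumptions: the `TypeIndexSpace` class, its `intern` method and the JSON-based structural key are hypothetical and not part of the proposal.

```js
// Hypothetical model of a type index space in which structurally equal
// inline types are interned so that they share one out-of-line definition.
class TypeIndexSpace {
  constructor() {
    this.defs = [];         // out-of-line type definitions, indexed by typeidx
    this.byKey = new Map(); // structural key -> typeidx
  }

  // Returns the typeidx for `type`, adding a new definition only if an
  // equal one has not been seen before. The JSON key is an assumption of
  // this sketch and is sensitive to object property order.
  intern(type) {
    const key = JSON.stringify(type);
    if (!this.byKey.has(key)) {
      this.byKey.set(key, this.defs.length);
      this.defs.push(type);
    }
    return this.byKey.get(key);
  }
}

const types = new TypeIndexSpace();
const listOfU8 = types.intern({ list: "u8" });
const listOfString = types.intern({ list: "string" });
const listOfU8Again = types.intern({ list: "u8" }); // reuses the first definition
```

Here the two `{ list: "u8" }` uses resolve to the same index, so only two definitions end up in the index space.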
+ +Starting with interface types, the set of values allowed for the *fundamental* +interface types is given by the following table: +| Type | Values | +| ------------------------- | ------ | +| `unit` | just one [uninteresting value] | +| `bool` | `true` and `false` | +| `s8`, `s16`, `s32`, `s64` | integers in the range [-2^(N-1), 2^(N-1)-1] for bit width N | +| `u8`, `u16`, `u32`, `u64` | integers in the range [0, 2^N-1] for bit width N | +| `float32`, `float64` | [IEEE754] floating-point numbers with a single, canonical "Not a Number" ([NaN]) value | +| `char` | [Unicode Scalar Values] | +| `record` | heterogeneous [tuples] of named `intertype` values | +| `variant` | heterogeneous [tagged unions] of named `intertype` values | +| `list` | homogeneous, variable-length [sequences] of `intertype` values | + +The sets of values allowed for the remaining *specialized* interface types are +defined by the following mapping: +``` + string ↦ (list char) + (tuple *) ↦ (record ("𝒊" )*) for 𝒊=0,1,... + (flags *) ↦ (record (field bool)*) + (enum *) ↦ (variant (case unit)*) + (optional ) ↦ (variant (case "none") (case "some" )) + (union *) ↦ (variant (case "𝒊" )*) for 𝒊=0,1,... +(expected ) ↦ (variant (case "ok" ) (case "error" )) +``` +Building on these interface types, there are four kinds of types describing the +four kinds of importable/exportable component definitions. (In the future, a +fifth type will be added for [resource types][Resource and Handle Types].) + +A `functype` describes a component function whose parameters and results are +`intertype` values. Thus `functype` is completely disjoint from +[`core:functype`] in the WebAssembly Core spec, whose parameters and results +are [`core:valtype`] values. Moreover, since `core:functype` can only appear +syntactically within the `(module ...)` S-expression of a `moduletype`, there +is never a need to syntactically distinguish `functype` from `core:functype` +in the text format: the context dictates which one a `(func ...)` S-expression +parses into.
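The mapping from specialized to fundamental interface types given above can be sketched as a small recursive function. The JavaScript encoding of types used here (strings for primitives, objects tagged with `kind`) is purely an assumption of this sketch, as is modeling the payload-less `"none"` case as carrying `unit`.

```js
// Hypothetical encoding: primitive interface types are strings, compound
// types are objects tagged with `kind`. `desugar` rewrites every specialized
// type into the fundamental `record`, `variant` and `list` types.
function desugar(t) {
  if (t === "string") return { kind: "list", elem: "char" };
  if (typeof t === "string") return t; // already a fundamental primitive
  switch (t.kind) {
    case "record":
      return { kind: "record", fields: t.fields.map(([n, ty]) => [n, desugar(ty)]) };
    case "variant":
      return { kind: "variant", cases: t.cases.map(([n, ty]) => [n, desugar(ty)]) };
    case "list":
      return { kind: "list", elem: desugar(t.elem) };
    case "tuple": // fields are named "0", "1", ...
      return { kind: "record", fields: t.types.map((ty, i) => [String(i), desugar(ty)]) };
    case "flags": // one bool field per flag name
      return { kind: "record", fields: t.names.map((n) => [n, "bool"]) };
    case "enum": // one unit case per enum name
      return { kind: "variant", cases: t.names.map((n) => [n, "unit"]) };
    case "optional": // assumption: the "none" case is modeled as unit
      return { kind: "variant", cases: [["none", "unit"], ["some", desugar(t.type)]] };
    case "union": // cases are named "0", "1", ...
      return { kind: "variant", cases: t.types.map((ty, i) => [String(i), desugar(ty)]) };
    case "expected":
      return { kind: "variant", cases: [["ok", desugar(t.ok)], ["error", desugar(t.error)]] };
    default:
      throw new TypeError(`unknown interface type kind: ${t.kind}`);
  }
}

const optionalString = desugar({ kind: "optional", type: "string" });
const pair = desugar({ kind: "tuple", types: ["u8", "string"] });
```

In this encoding, `optionalString` is a two-case variant whose `"some"` case carries `(list char)`, and `pair` is a record with fields `"0"` and `"1"`.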
+ +A `valuetype` describes a single `intertype` value that is to be consumed +exactly once during component instantiation. How this happens is described +below along with [`start` definitions](#start-definitions). + +As described above, components and modules are immutable values representing +code that cannot be run until instantiated via an `instance` definition. Thus, +`moduletype` and `componenttype` describe *uninstantiated code*. `moduletype` +and `componenttype` contain not just import and export definitions, but also +type and alias definitions, allowing them to capture type sharing relationships +between imports and exports. This type sharing becomes necessary (not just a +size optimization) with the upcoming addition of [type imports and exports] to +Core WebAssembly and, symmetrically, [resource and handle types] to the +Component Model. + +The `instancetype` type constructor describes component instances, which are +named tuples of other definitions. Although `instance` definitions can produce +both module *and* component instances, only *component* instances can be +imported or exported (due to the overall [shared-nothing design](../high-level/Choices.md) +of the Component Model) and thus only *component* instances need explicit type +definitions. Consequently, the text format of `instancetype` does not include +a syntax for defining *module* instance types. As with `componenttype` and +`moduletype`, `instancetype` allows nested type and alias definitions to allow +type sharing. + +Lastly, to ensure cross-language interoperability, `moduletype`, +`componenttype` and `instancetype` all require import and export names to be +unique (within a particular module, component, instance or type thereof). In +the case of `moduletype` and two-level imports, this translates to requiring +that import name *pairs* must be *pair*-wise unique.
Since the current Core +WebAssembly validation rules allow duplicate imports, this means that some +valid modules will not be typeable and will fail validation if used with the +Component Model. + +The subtyping between all these types is described in a separate +[subtyping explainer](Subtyping.md). + +With what's defined so far, we can define component types using a mix of inline +and out-of-line type definitions: +```wasm +(component $C + (type $T (list (tuple string bool))) + (type $U (optional $T)) + (type $G (func (param (list $T)) (result $U))) + (type $D (component + (alias outer $C $T (type $C_T)) + (type $L (list $C_T)) + (import "f" (func (param $L) (result (list u8)))) + (import "g" $G) + (export "g" $G) + (export "h" (func (result $U))) + )) +) +``` +Note that the inline uses of `$G` and `$U` are inline `outer` aliases. + + +### Function Definitions + +To implement or call functions of type [`functype`](#type-definitions), we need +to be able to call across a shared-nothing boundary. Traditionally, this +problem is solved by defining a serialization format for copying data across +the boundary. The Component Model MVP takes roughly this same approach, +defining a linear-memory-based [ABI] called the *Canonical ABI* which +specifies, for any imported or exported `functype`, a corresponding +`core:functype` and rules for copying values into or out of linear memory. The +Component Model differs from traditional approaches, though, in that the ABI is +configurable, allowing different memory representations for the same abstract +value. In the MVP, this configurability is limited to the small set of +`canonopt` shown below. However, Post-MVP, [adapter functions] could be added +to allow far more programmatic control.
+ +The Canonical ABI, which is described in a separate [explainer](CanonicalABI.md), +is explicitly applied to "wrap" existing functions in one of two directions: +* `canon.lift` wraps a Core WebAssembly function (of type `core:functype`) + inside the current component to produce a Component Model function (of type + `functype`) that can be exported to other components. +* `canon.lower` wraps a Component Model function (of type `functype`) that may + have been imported from another component to produce a Core WebAssembly + function (of type `core:functype`) that can be imported and called from Core + WebAssembly code within the current component. + +Based on this, MVP function definitions simply specify one of these two +wrapping directions along with a set of Canonical ABI configurations. +``` +func ::= (func ? ) +funcbody ::= (canon.lift * ) + | (canon.lower * ) +canonopt ::= string=utf8 + | string=utf16 + | string=latin1+utf16 + | (into ) +``` +Validation fails if multiple conflicting options, such as two `string` +encodings, are given. The `latin1+utf16` encoding is [defined](CanonicalABI.md#latin1-utf16) +in the Canonical ABI explainer. If no string-encoding option is specified, the +default is `string=utf8`. + +The `into` option specifies a target instance which supplies the memory that +the Canonical ABI should operate on as well as functions that the Canonical ABI +can call to allocate, reallocate and free linear memory. Validation requires that +the given `instanceidx` is a module instance exporting the following fields: +``` +(export "memory" (memory 1)) +(export "realloc" (func (param i32 i32 i32 i32) (result i32))) +(export "free" (func (param i32 i32 i32))) +``` +The 4 parameters of `realloc` are: original allocation (or `0` for none), original +size (or `0` if none), alignment and new desired size. The 3 parameters of `free` +are the pointer, size and alignment.
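To make the role of the `into` instance more concrete, the following JavaScript sketch copies a string into linear memory through a `realloc` with the 4-parameter shape described above, then reads it back. It is not the Canonical ABI itself (which is defined in the separate explainer): the bump allocator, the helper names `makeIntoInstance`, `lowerStringUtf8` and `liftStringUtf8`, the (pointer, byte length) representation and the point at which `free` is called are all assumptions made for illustration.

```js
// Stand-in for the module instance named by `(into ...)`: a memory plus a
// trivial bump allocator exposing `realloc` and `free`.
function makeIntoInstance() {
  const memory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page
  let next = 8; // keep address 0 unused so that 0 can mean "no allocation"

  // Parameters: original allocation (or 0), original size (or 0),
  // alignment, new desired size.
  function realloc(origPtr, origSize, align, newSize) {
    const ptr = Math.ceil(next / align) * align; // round up to the alignment
    if (ptr + newSize > memory.buffer.byteLength) throw new RangeError("out of memory");
    next = ptr + newSize;
    if (origPtr !== 0) {
      // A real allocator would also release the old block; this one only copies.
      new Uint8Array(memory.buffer).copyWithin(ptr, origPtr, origPtr + Math.min(origSize, newSize));
    }
    return ptr;
  }

  // Parameters: pointer, size, alignment. A bump allocator never frees.
  function free(ptr, size, align) {}

  return { memory, realloc, free };
}

// Copy a JS string into linear memory as UTF-8 (the `string=utf8` default).
function lowerStringUtf8(into, str) {
  const encoded = new TextEncoder().encode(str);
  const ptr = into.realloc(0, 0, 1, encoded.length);
  new Uint8Array(into.memory.buffer).set(encoded, ptr);
  return [ptr, encoded.length];
}

// Read a UTF-8 string back out of linear memory.
function liftStringUtf8(into, ptr, len) {
  return new TextDecoder("utf-8").decode(new Uint8Array(into.memory.buffer, ptr, len));
}

const into = makeIntoInstance();
const [ptr, len] = lowerStringUtf8(into, "héllo");
const roundTripped = liftStringUtf8(into, ptr, len);
into.free(ptr, len, 1); // assumed ownership: the reader releases the buffer
```

Splitting the allocator from the copying code mirrors the reason `into` names a separate instance: the copying logic only needs a memory and the allocation functions, not the rest of the component's code.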
+ +With this, we can finally write a non-trivial component that takes a string, +does some logging, then returns a string. +```wasm +(component + (import "wasi:logging" (instance $logging + (export "log" (func (param string))) + )) + (import "libc" (module $Libc + (export "memory" (memory 1)) + (export "realloc" (func (param i32 i32 i32 i32) (result i32))) + (export "free" (func (param i32 i32 i32))) + )) + (instance $libc (instantiate (module $Libc))) + (func $log + (canon.lower (into $libc) (func $logging "log")) + ) + (module $Main + (import "libc" "memory" (memory 1)) + (import "libc" "realloc" (func (param i32 i32 i32 i32) (result i32))) + (import "libc" "free" (func (param i32 i32 i32))) + (import "wasi:logging" "log" (func $log (param i32 i32))) + (func (export "run") (param i32 i32) (result i32 i32) + ... (call $log) ... + ) + ) + (instance $main (instantiate (module $Main) + (import "libc" (instance $libc)) + (import "wasi:logging" (instance (export "log" (func $log)))) + )) + (func (export "run") + (canon.lift (func (param string) (result string)) (into $libc) (func $main "run")) + ) +) +``` +This example shows the pattern of splitting out a reusable language runtime +module (`$Libc`) from a component-specific, non-reusable module (`$Main`). In +addition to reducing code size and increasing code-sharing in multi-component +scenarios, this separation allows `$libc` to be created first, so that its +exports are available for reference by `canon.lower`. Without this separation +(if `$Main` contained the `memory` and allocation functions), there would be a +cyclic dependency between `canon.lower` and `$Main` that would have to be +broken by the toolchain emitting an auxiliary module that broke the cycle using +a shared `funcref` table and `call_indirect`. + +Component Model functions are different from Core WebAssembly functions in that +all control flow transfer is explicitly reflected in their type (`functype`).
+For example, with Core WebAssembly [exception handling] and [stack switching], +a `(func (result i32))` can return an `i32`, throw, suspend or trap. In +contrast, a Component Model `(func (result string))` may only return a `string` +or trap. To express failure, Component Model functions should return an +[`expected`](#type-definitions) type and languages with exception handling will +bind exceptions to the `error` case. Similarly, the future addition of +[future and stream types] would explicitly declare patterns of stack-switching +in Component Model function signatures. + + +### Start Definitions + +Like modules, components can have start functions that are called during +instantiation. Unlike modules, components can call start functions at multiple +points during instantiation with each such call having interface-typed +parameters and results. Thus, `start` definitions in components look like +function calls: +``` +start ::= (start (value )* (result (value ))?) +``` +The `(value )*` list specifies the arguments passed to `funcidx` by +indexing into the *value index space*. Value definitions (in the value index +space) are like immutable `global` definitions in Core WebAssembly except they +must be consumed exactly once at instantiation-time. + +As with any other definition kind, value definitions may be supplied to +components through `import` definitions. Using the grammar of `import` already +defined [above](#type-definitions), an example *value import* can be written: +``` +(import "env" (value $env (record (field "locale" (optional string))))) +``` +As this example suggests, value imports can serve as generalized [environment +variables], allowing not just `string`, but the full range of interface types +to describe the imported configuration schema. 
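The "consumed exactly once" rule for value definitions can be modeled with a small JavaScript sketch. The text above does not say whether the rule is enforced statically or dynamically; this sketch checks it dynamically, and the `ValueIndexSpace` class, its method names and the stand-in `start` function are hypothetical.

```js
// Hypothetical model of a component's value index space during instantiation.
class ValueIndexSpace {
  constructor() {
    this.slots = []; // each slot: { value, consumed }
  }

  // Define a new value, e.g. from a value import or a `start` result.
  define(value) {
    this.slots.push({ value, consumed: false });
    return this.slots.length - 1; // the new value index
  }

  // Consume a value, e.g. as a `start` argument or a value export.
  consume(idx) {
    const slot = this.slots[idx];
    if (slot === undefined) throw new RangeError(`no value at index ${idx}`);
    if (slot.consumed) throw new Error(`value ${idx} consumed more than once`);
    slot.consumed = true;
    return slot.value;
  }

  // By the end of instantiation, every value must have been consumed.
  assertAllConsumed() {
    const i = this.slots.findIndex((s) => !s.consumed);
    if (i !== -1) throw new Error(`value ${i} was never consumed`);
  }
}

const values = new ValueIndexSpace();
const env = values.define({ locale: "en-US" });          // the imported "env" value
const start = (e) => `locale=${e.locale ?? "unset"}`;    // stand-in for a start function
const result = values.define(start(values.consume(env))); // start consumes env, defines a result
const exported = values.consume(result);                  // e.g. exported as a value
values.assertAllConsumed();
```

A second `values.consume(env)` would throw in this model, which is the property that distinguishes value definitions from ordinary immutable globals.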
+ +With this, we can define a component that imports a string and computes a new +exported string, all at instantiation time: +```wasm +(component + (import "name" (value $name string)) + (import "libc" (module $Libc + (export "memory" (memory 1)) + (export "realloc" (func (param i32 i32 i32 i32) (result i32))) + (export "free" (func (param i32 i32 i32))) + )) + (instance $libc (instantiate (module $Libc))) + (module $Main + (import "libc" ...) + (func (export "start") (param i32 i32) (result i32 i32) + ... general-purpose compute + ) + ) + (instance $main (instantiate (module $Main) (import "libc" (instance $libc)))) + (func $start + (canon.lift (func (param string) (result string)) (into $libc) (func $main "start")) + ) + (start $start (value $name) (result (value $greeting))) + (export "greeting" (value $greeting)) +) +``` +As this example shows, start functions reuse the same Canonical ABI machinery +as normal imports and exports for getting interface typed values into and out +of linear memory. 
+ + +### Import and Export Definitions + +The rules for [`import`](#type-definitions) and [`export`](#instance-definitions) +definitions have actually already been defined above (with the caveat that the +real text format for `import` definitions would additionally allow binding an +identifier (e.g., adding the `$foo` in `(import "foo" (func $foo))`)): +``` +import ::= already defined above as part of , but allow binding an +export ::= already defined above as part of +``` + +With what's defined so far, we can define a component that imports, links and +exports other components: +```wasm +(component + (import "c" (instance $c + (export "f" (func (result string))) + )) + (import "d" (component $D + (import "c" (instance $c + (export "f" (func (result string))) + )) + (export "g" (func (result string))) + )) + (instance $d1 (instantiate (component $D) + (import "c" (instance $c)) + )) + (instance $d2 (instantiate (component $D) + (import "c" (instance + (export "f" (func $d1 "g")) + )) + )) + (export "d2" (instance $d2)) +) +``` +Here, the imported component `d` is instantiated *twice*: first, with its +import satisfied by the imported instance `c`, and second, with its import +satisfied with the first instance of `d`. While this seems a little circular, +note that all definitions are acyclic as is the resulting instance graph. + + +## Component Invariants + +As a consequence of the shared-nothing design described above, all calls into +or out of a component instance necessarily transit through a component function +definition. Thus, component functions form a "membrane" around the collection +of module instances contained by a component instance, allowing the Component +Model to establish invariants that increase optimizability and composability in +ways not otherwise possible in the shared-everything setting of Core +WebAssembly. The Component Model proposes establishing the following three +runtime invariants: +1.
Components define a "lockdown" state that prevents continued execution + after a trap. This both prevents continued execution with corrupt state and + also allows more-aggressive compiler optimizations (e.g., store reordering). + This was considered early in Core WebAssembly standardization but rejected + due to the lack of a clear trapping boundary. With components, each component + instance is given a mutable "lockdown" state that is set upon trap and + implicitly checked at every execution step by component functions. Thus, + after a trap, it's no longer possible to observe the internal state of a + component instance. +2. Components prevent unexpected reentrance by setting the "lockdown" state + (in the previous bullet) whenever calling out through an import, clearing + the lockdown state on return, thereby preventing reentrant export calls in + the interim. This establishes a clear contract between separate components + that both prevents obscure composition-time bugs and also enables + more-efficient non-reentrant runtime glue code (particularly in the middle + of the [Canonical ABI](CanonicalABI.md)). +3. Components enforce the current informal rule that `start` functions are + only for "internal" initialization by trapping if a component attempts to + call a component import during instantiation. In Core WebAssembly, this + invariant is not viable since cross-module calls are often necessary when + initializing shared linear memory (e.g., calling `libc`'s `malloc`). + However, at the granularity of components, this invariant appears viable and + would allow runtimes and toolchains considerable optimization flexibility + based on the resulting purity of instantiation. As one example, tools like + [`wizer`] could be used to *transparently* snapshot the post-instantiation + state of a component to reuse in future instantiations.
As another example, + a component runtime could optimize the instantiation of a component DAG by + transparently instantiating non-root components lazily and/or in parallel. + + +## JavaScript Embedding + +### JS API + +The [JS API] currently provides `WebAssembly.compile(Streaming)` which takes +raw bytes from an `ArrayBuffer` or `Response` object and produces +`WebAssembly.Module` objects that represent decoded and validated modules. To +natively support the Component Model, the JS API would be extended to allow +these same JS API functions to accept component binaries and produce new +`WebAssembly.Component` objects that represent decoded and validated +components. The [binary format of components](Binary.md) is designed to allow +modules and components to be distinguished by the first 8 bytes of the binary +(splitting the 32-bit [`version`] field into a 16-bit `version` field and a +16-bit `kind` field with `0` for modules and `1` for components). + +Once compiled, a `WebAssembly.Component` could be instantiated using the +existing JS API `WebAssembly.instantiate(Streaming)`. Since components have the +same basic import/export structure as modules, this mostly just means extending +the [*read the imports*] logic to support single-level imports as well as +imports of modules, components and instances. Since the result of +instantiating a component is a record of JavaScript values, just like an +instantiated module, `WebAssembly.instantiate` would always produce a +`WebAssembly.Instance` object for both module and component arguments. + +Lastly, when given a component binary, the compile-then-instantiate overloads +of `WebAssembly.instantiate(Streaming)` would inherit the compound behavior of +the above-mentioned functions (again, using the `version` field to eagerly +distinguish between modules and components).
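The 8-byte check described above can be sketched as follows. The magic number and the fact that today's modules carry the bytes `01 00 00 00` in the 32-bit `version` field come from the Core WebAssembly binary format; the exact placement of the two 16-bit halves (little-endian `version` at offset 4, `kind` at offset 6) is an assumption of this sketch, chosen so that existing module binaries read as `kind` 0. The function name `classifyBinary` and the component `version` value used below are hypothetical.

```js
const WASM_MAGIC = [0x00, 0x61, 0x73, 0x6d]; // "\0asm"

// Inspect the first 8 bytes of a binary (given as a Uint8Array) and report
// whether it looks like a module or a component.
function classifyBinary(bytes) {
  if (bytes.length < 8) throw new TypeError("binary is shorter than 8 bytes");
  if (!WASM_MAGIC.every((b, i) => bytes[i] === b)) throw new TypeError("bad magic number");
  const view = new DataView(bytes.buffer, bytes.byteOffset, 8);
  const version = view.getUint16(4, true); // assumed: low half of the old field
  const kind = view.getUint16(6, true);    // assumed: high half of the old field
  if (kind === 0) return { kind: "module", version };
  if (kind === 1) return { kind: "component", version };
  throw new TypeError(`unknown kind: ${kind}`);
}

// An existing Core WebAssembly module header, and a hypothetical component header.
const moduleHeader = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
const componentHeader = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x01, 0x00]);
const moduleInfo = classifyBinary(moduleHeader);
const componentInfo = classifyBinary(componentHeader);
```

Because the decision only needs the first 8 bytes, a streaming API can choose between the module and component decoders before the rest of the response has arrived.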
+ +For example, the following component: +```wasm +;; a.wasm +(component + (import "one" (func)) + (import "two" (value string)) + (import "three" (instance + (export "four" (instance + (export "five" (module + (import "six" "a" (func)) + (import "six" "b" (func)) + )) + )) + )) + ... +) +``` +and module: +```wasm +;; b.wasm +(module + (import "six" "a" (func)) + (import "six" "b" (func)) + ... +) +``` +could be successfully instantiated via: +```js +WebAssembly.instantiateStreaming(fetch('./a.wasm'), { + one: () => {}, + two: "hi", + three: { + four: { + five: await WebAssembly.compileStreaming(fetch('./b.wasm')) + } + } +}); +``` + +The other significant addition to the JS API would be the expansion of the set +of WebAssembly types coerced to and from JavaScript values (by [`ToJSValue`] +and [`ToWebAssemblyValue`]) to include all of [`intertype`](#type-definitions). +At a high level, the additional coercions would be: + +| Interface Type | `ToJSValue` | `ToWebAssemblyValue` | +| -------------- | ----------- | -------------------- | +| `unit` | `null` | accept everything | +| `bool` | `true` or `false` | `ToBoolean` | +| `s8`, `s16`, `s32` | as a Number value | `ToInt32` | +| `u8`, `u16`, `u32` | as a Number value | `ToUint32` | +| `s64` | as a BigInt value | `ToBigInt64` | +| `u64` | as a BigInt value | `ToBigUint64` | +| `float32`, `float64` | as a Number, mapping the canonical NaN to [JS NaN] | `ToNumber` mapping [JS NaN] to the canonical NaN | +| `char` | same as [`USVString`] | same as [`USVString`], throw if the USV length is not 1 | +| `record` | TBD: maybe a [JS Record]? | same as [`dictionary`] | +| `variant` | TBD | TBD | +| `list` | same as [`sequence`] | same as [`sequence`] | +| `string` | same as [`USVString`] | same as [`USVString`] | +| `tuple` | TBD: maybe a [JS Tuple]? | TBD | +| `flags` | TBD: maybe a [JS Record]?
| same as [`dictionary`] of `boolean` fields | +| `enum` | same as [`enum`] | same as [`enum`] | +| `optional` | same as [`T?`] | same as [`T?`] | +| `union` | same as [`union`] | same as [`union`] | +| `expected` | same as `variant`, but coerce a top-level `error` return value to a thrown exception | same as `variant`, but coerce uncaught exceptions to top-level `error` return values | + +Notes: +* The forthcoming addition of [resource and handle types] would additionally + allow coercion to and from the remaining Symbol and Object JavaScript value + types. +* The forthcoming addition of [future and stream types] would allow `Promise` + and `ReadableStream` values to be passed directly to and from components + without requiring handles or callbacks. +* When an imported JavaScript function is a built-in function wrapping a Web + IDL function, the specified behavior should allow the intermediate JavaScript + call to be optimized away when the types are sufficiently compatible, falling + back to a plain call through JavaScript when the types are incompatible or + when the engine does not provide a separate optimized call path. + + +### ESM-integration + +Like the JS API, [ESM-integration] can be extended to load components in all +the same places where modules can be loaded today, branching on the `kind` +field in the binary format to determine whether to decode as a module or a +component. The main question is how to deal with component imports having a +single string as well as the new importable component, module and instance +types. Going through these one by one: + +For component imports of module type, we need a new way to request that the ESM +loader parse or decode a module without *also* instantiating that module. 
+Recognizing this same need from JavaScript, there is a TC39 proposal called +[Import Reflection] that adds the ability to write, in JavaScript: +```js +import Foo from "./foo.wasm" as "wasm-module"; +assert(Foo instanceof WebAssembly.Module); +``` +With this extension to JavaScript and the ESM loader, a component import +of module type can be treated the same as `import ... as "wasm-module"`. + +Component imports of component type would work the same way as modules, +potentially replacing `"wasm-module"` with `"wasm-component"`. + +In all other cases, the (single) string imported by a component is first +resolved to a [Module Record] using the same process as resolving the +[Module Specifier] of a JavaScript `import`. After this, the handling of the +imported Module Record is determined by the import type: + +For imports of instance type, the ESM loader would treat the exports of the +instance type as if they were the [Named Imports] of a JavaScript `import`. +Thus, single-level imports of instance type act like the two-level imports +of Core WebAssembly modules where the first-level has been factored out. Since +the exports of an instance type can themselves be instance types, this process +must be performed recursively. + +Otherwise, function or value imports are treated like an [Imported Default Binding] +and the Module Record is converted to its default value. 
This allows the following +component: +```wasm +(component + (import "./foo.js" (func (result string))) +) +``` +to be satisfied by a JavaScript module via ESM-integration: +```js +// foo.js +export default () => "hi"; +``` + + +## Examples + +For some use-case-focused, worked examples, see: +* [Link-time virtualization example](examples/LinkTimeVirtualization.md) +* [Shared-everything dynamic linking example](examples/SharedEverythingDynamicLinking.md) +* [Component Examples presentation](https://docs.google.com/presentation/d/11lY9GBghZJ5nCFrf4MKWVrecQude0xy_buE--tnO9kQ) + + + +[Structure Section]: https://webassembly.github.io/spec/core/syntax/index.html +[`core:module`]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-module +[`core:export`]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-export +[`core:import`]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-import +[`core:importdesc`]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-importdesc +[`core:functype`]: https://webassembly.github.io/spec/core/syntax/types.html#syntax-functype +[`core:valtype`]: https://webassembly.github.io/spec/core/syntax/types.html#value-types + +[Text Format Section]: https://webassembly.github.io/spec/core/text/index.html +[Abbreviations]: https://webassembly.github.io/spec/core/text/conventions.html#abbreviations +[`core:typeuse`]: https://webassembly.github.io/spec/core/text/modules.html#type-uses +[func-import-abbrev]: https://webassembly.github.io/spec/core/text/modules.html#text-func-abbrev + +[Binary Format Section]: https://webassembly.github.io/spec/core/binary/index.html +[`version`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-version + +[JS API]: https://webassembly.github.io/spec/js-api/index.html +[*read the imports*]: https://webassembly.github.io/spec/js-api/index.html#read-the-imports +[`ToJSValue`]: https://webassembly.github.io/spec/js-api/index.html#tojsvalue 
+[`ToWebAssemblyValue`]: https://webassembly.github.io/spec/js-api/index.html#towebassemblyvalue +[`USVString`]: https://webidl.spec.whatwg.org/#es-USVString +[`sequence`]: https://webidl.spec.whatwg.org/#es-sequence +[`dictionary`]: https://webidl.spec.whatwg.org/#es-dictionary +[`enum`]: https://webidl.spec.whatwg.org/#es-enumeration +[`T?`]: https://webidl.spec.whatwg.org/#es-nullable-type +[`union`]: https://webidl.spec.whatwg.org/#es-union +[JS NaN]: https://tc39.es/ecma262/#sec-ecmascript-language-types-number-type +[Import Reflection]: https://github.com/tc39-transfer/proposal-import-reflection +[Module Record]: https://tc39.es/ecma262/#sec-abstract-module-records +[Module Specifier]: https://tc39.es/ecma262/multipage/ecmascript-language-scripts-and-modules.html#prod-ModuleSpecifier +[Named Imports]: https://tc39.es/ecma262/multipage/ecmascript-language-scripts-and-modules.html#prod-NamedImports +[Imported Default Binding]: https://tc39.es/ecma262/multipage/ecmascript-language-scripts-and-modules.html#prod-ImportedDefaultBinding + +[JS Tuple]: https://github.com/tc39/proposal-record-tuple +[JS Record]: https://github.com/tc39/proposal-record-tuple + +[De Bruijn Index]: https://en.wikipedia.org/wiki/De_Bruijn_index +[Closure]: https://en.wikipedia.org/wiki/Closure_(computer_programming) +[Uninteresting Value]: https://en.wikipedia.org/wiki/Unit_type#In_programming_languages +[IEEE754]: https://en.wikipedia.org/wiki/IEEE_754 +[NaN]: https://en.wikipedia.org/wiki/NaN +[Unicode Scalar Values]: https://unicode.org/glossary/#unicode_scalar_value +[Tuples]: https://en.wikipedia.org/wiki/Tuple +[Tagged Unions]: https://en.wikipedia.org/wiki/Tagged_union +[Sequences]: https://en.wikipedia.org/wiki/Sequence +[ABI]: https://en.wikipedia.org/wiki/Application_binary_interface +[Environment Variables]: https://en.wikipedia.org/wiki/Environment_variable + +[Module Linking]: https://github.com/webassembly/module-linking/ +[Interface Types]: 
https://github.com/webassembly/interface-types/ +[Type Imports and Exports]: https://github.com/WebAssembly/proposal-type-imports +[Exception Handling]: https://github.com/webAssembly/exception-handling +[Stack Switching]: https://github.com/WebAssembly/stack-switching +[ESM-integration]: https://github.com/WebAssembly/esm-integration + +[Adapter Functions]: FutureFeatures.md#custom-abis-via-adapter-functions +[Canonical ABI]: CanonicalABI.md + +[`wizer`]: https://github.com/bytecodealliance/wizer + +[Scoping and Layering]: https://docs.google.com/presentation/d/1PSC3Q5oFsJEaYyV5lNJvVgh-SNxhySWUqZ6puyojMi8 +[Resource and Handle Types]: https://docs.google.com/presentation/d/1ikwS2Ps-KLXFofuS5VAs6Bn14q4LBEaxMjPfLj61UZE +[Future and Stream Types]: https://docs.google.com/presentation/d/1WtnO_WlaoZu1wp4gI93yc7T_fWTuq3RZp8XUHlrQHl4 diff --git a/design/mvp/FutureFeatures.md b/design/mvp/FutureFeatures.md new file mode 100644 index 0000000..cf986b6 --- /dev/null +++ b/design/mvp/FutureFeatures.md @@ -0,0 +1,80 @@ +# Future Features + +As with Core WebAssembly 1.0, the Component Model 1.0 aims to be a Minimum +Viable Product (MVP), assuming incremental, backwards-compatible +standardization to continue after the initial "1.0" release. The following is +an incomplete list of specific features intentionally punted from the MVP. See +also the high-level [post-MVP use cases](../high-level/UseCases.md#post-mvp) +and [non-goals](../high-level/Goals.md#non-goals). + + +## Custom ABIs via "adapter functions" + +The original Interface Types proposal includes the goal of avoiding a fixed +serialization format, as this often incurs extra copying when the source or +destination language-runtime data structures don't precisely match the fixed +serialization format. A significant amount of work was spent designing a +language of [adapter functions] that provided fairly general programmatic +control over the process of serializing and deserializing interface-typed values. 
+(The Interface Types Explainer currently contains a snapshot of this design.) +However, a significant amount of additional design work remained, including +(likely) changing the underlying semantic foundations from lazy evaluation to +algebraic effects. + +In pursuit of a timely MVP and as part of the overall [scoping and layering proposal], +the goal of avoiding a fixed serialization format was dropped from the MVP, by +instead defining a [Canonical ABI](CanonicalABI.md) in the MVP. However, the +current design of [function definitions](Explainer.md#function-definitions) +anticipates a future extension whereby function bodies can contain not just the +fixed Canonical ABI-following `canon.lift` and `canon.lower` but, +alternatively, general adapter function code. + +In this future state, `canon.lift` and `canon.lower` could be specified by +simple expansion into the adapter code, making these instructions effectively +macros. However, even in this future state, there is still concrete value in +having a fixedly-defined Canonical ABI as it allows more-aggressive +optimization of calls between components (which both use the Canonical ABI) and +between a component and the host (which often must use a fixed ABI for calling +to and from the statically-compiled host implementation language). See +[`list.lift_canon` and `list.lower_canon`] for more details. + + +## Shared-some-things linking via "adapter modules" + +The original [Interface Types proposal] and the re-layered [Module Linking +proposal] both included an "adapter module" definition that allowed import and +export of both Core WebAssembly and Component Model Definitions and thus did +not establish a shared-nothing boundary. Since [component invariants] and +[GC-free runtime instantiation] both require a shared-nothing boundary, two +distinct "component" and "adapter module" concepts would need to be defined, +with all their own distinct types, index spaces, etc. 
Having both features in +the MVP adds non-trivial implementation complexity over having just one. +Additionally, having two similar-but-different, partially-overlapping concepts +makes the whole proposal harder to explain. Thus, the MVP drops the concept of +"adapter modules", including only shared-nothing "components". However, if +concrete future use cases emerged for creating modules that partially used +interface types and partially shared linear memory, "adapter modules" could be +added as a future feature. + + +## Shared-everything Module Linking in Core WebAssembly + +[Originally][Core Module Linking], Module Linking was proposed as an addition +to the Core WebAssembly specification, adding only the new concepts of instance +and module definitions (which, like other kinds of definitions, could be +imported and exported). As part of the overall [scoping and layering proposal], +Module Linking was moved into a layer above WebAssembly and merged with the +Interface Types proposal. However, it may still make sense and be complementary +to the Component Model to add Module Linking to Core WebAssembly in the future +as originally proposed.
+ + + +[Interface Types Proposal]: https://github.com/WebAssembly/interface-types/blob/main/proposals/interface-types/Explainer.md +[Module Linking Proposal]: https://github.com/WebAssembly/module-linking/blob/main/design/proposals/module-linking/Explainer.md +[Adapter Functions]: https://github.com/WebAssembly/interface-types/blob/main/proposals/interface-types/Explainer.md#adapter-functions +[Scoping and Layering Proposal]: https://docs.google.com/presentation/d/1PSC3Q5oFsJEaYyV5lNJvVgh-SNxhySWUqZ6puyojMi8 +[`list.lift_canon` and `list.lower_canon`]: https://github.com/WebAssembly/interface-types/blob/main/proposals/interface-types/Explainer.md#optimization-canonical-representation +[Component Invariants]: Explainer.md#component-invariants +[GC-free Runtime Instantiation]: https://docs.google.com/presentation/d/1PSC3Q5oFsJEaYyV5lNJvVgh-SNxhySWUqZ6puyojMi8/edit#slide=id.gd06989d984_1_274 +[Core Module Linking]: https://github.com/WebAssembly/module-linking/blob/63cd6c0e3ac5c0cdb798a985790f51ccdd77af00/proposals/module-linking/Explainer.md diff --git a/design/mvp/Subtyping.md b/design/mvp/Subtyping.md new file mode 100644 index 0000000..241f8cd --- /dev/null +++ b/design/mvp/Subtyping.md @@ -0,0 +1,24 @@ +# Subtyping + +TODO: write this up in more detail. 
+ +But roughly speaking: + +| Type | Subtyping | +| ------------------------- | --------- | +| `unit` | every interface type is a subtype of `unit` | +| `bool` | | +| `s8`, `s16`, `s32`, `s64`, `u8`, `u16`, `u32`, `u64` | lossless coercions are allowed | +| `float32`, `float64` | `float32 <: float64` | +| `char` | | +| `record` | fields can be reordered; covariant field payload subtyping; superfluous fields can be ignored in the subtype; `optional` fields can be ignored in the supertype | +| `variant` | cases can be reordered; contravariant case payload subtyping; superfluous cases can be ignored in the supertype; `defaults-to` cases can be ignored in the subtype | +| `list` | covariant element subtyping | +| `tuple` | `(tuple T ...) <: T` | +| `optional` | `T <: (optional T)` | +| `expected` | `T <: (expected T _)` | +| `union` | `T <: (union ... T ...)` | +| `func` | parameter names must match in order; covariant parameter subtyping; superfluous parameters can be ignored in the subtype; `optional` parameters can be ignored in the supertype; contravariant result subtyping | + +The remaining specialized interface types inherit their subtyping from their +fundamental interface types. diff --git a/design/mvp/examples/LinkTimeVirtualization.md b/design/mvp/examples/LinkTimeVirtualization.md new file mode 100644 index 0000000..584e6c1 --- /dev/null +++ b/design/mvp/examples/LinkTimeVirtualization.md @@ -0,0 +1,78 @@ +# Link-time Virtualization + +The idea with **link-time virtualization** use cases is to take the static +dependency graph on the left (where all 3 components import the +`wasi:filesystem` interface) and produce the runtime instance graph on the +right, where the `parent` instance has created a `virtualized` instance and +supplied it to a new `child` instance as the `wasi:filesystem` implementation. + +

+ +Importantly, the `child` instance has no access to the `wasi:filesystem` +instance imported by the `parent` instance. + +We start with the `child.wat` that has been written and compiled separately, +without regard to `parent.wasm`: +```wasm +;; child.wat +(component + (import "wasi:filesystem" (instance + (export "read" (func ...)) + (export "write" (func ...)) + )) + ... +) +``` + +We want to write a parent component that reuses the child component, giving the +child component a virtual filesystem. This virtual filesystem can be factored +out and reused as a separate component: +```wasm +;; virtualize.wat +(component + (import "wasi:filesystem" (instance $fs + (export "read" (func ...)) + (export "write" (func ...)) + )) + (func (export "read") + ... transitively calls (func $fs "read") + ) + (func (export "write") + ... transitively calls (func $fs "write") + ) +) +``` + +We now write the parent component by composing `child.wasm` with +`virtualize.wasm`: +```wasm +;; parent.wat +(component + (import "wasi:filesystem" (instance $real-fs ...)) + (import "./virtualize.wasm" (component $Virtualize ...)) + (import "./child.wasm" (component $Child ...)) + (instance $virtual-fs (instantiate (component $Virtualize) + (import "wasi:filesystem" (instance $real-fs)) + )) + (instance $child (instantiate (component $Child) + (import "wasi:filesystem" (instance $virtual-fs)) + )) +) +``` +Here we import the `child` and `virtualize` components, but they could also be +trivially copied in-line into the `parent` component using nested component +definitions in place of imports: +```wasm +;; parent.wat +(component + (import "wasi:filesystem" (instance $real-fs ...)) + (component $Virtualize ... copied inline ...) + (component $Child ... copied inline ...)
+ (instance $virtual-fs (instantiate (component $Virtualize) + (import "wasi:filesystem" (instance $real-fs)) + )) + (instance $child (instantiate (component $Child) + (import "wasi:filesystem" (instance $virtual-fs)) + )) +) +``` diff --git a/design/mvp/examples/SharedEverythingDynamicLinking.md b/design/mvp/examples/SharedEverythingDynamicLinking.md new file mode 100644 index 0000000..b03d169 --- /dev/null +++ b/design/mvp/examples/SharedEverythingDynamicLinking.md @@ -0,0 +1,422 @@ +# Shared-Everything Dynamic Linking + +*Shared-everything dynamic linking* refers to the ability to create a component +out of multiple Core WebAssembly modules so that common modules can be shared +with other components. This provides an alternative to *static linking* which +forces common code to be copied into each component. This type of linking +should be able to leverage existing support for native dynamic linking (of +`.dll`s or `.so`s) which uses a single shared linear memory (hence +*shared-everything* dynamic linking). + +Shared-everything dynamic linking should be *complementary* to the +shared-nothing dynamic linking of components described in the +[explainer](Explainer.md). In particular, dynamically-linked modules must not +share linear memory across component instance boundaries. For example, we want +the static dependency graph on the left to produce the runtime instance graph +on the right: + +

+ +Here, `libc` defines and exports a linear memory that is imported by the other +module instances contained within the same component instance. Thus, at +runtime, the composite application creates *three* instances of the `libc` +module (creating *three* linear memories) yet contains only *one* copy of the +`libc` code. This use case is tricky to implement in many module systems where +sharing module code implies sharing module instance state. + + +## `libc` + +As with native dynamic linking, shared-everything dynamic linking requires +toolchain conventions that are followed by all the toolchains producing the +participating modules. Here, as in most conventions, `libc` serves a special +role and is assumed to be bundled with the compiler. As part of this special +role, `libc` defines and exports linear memory, with the convention that +every other module imports memory from `libc`: +```wasm +;; libc.wat +(module + (memory (export "memory") 1) + (func (export "malloc") (param i32) (result i32) ...impl) + ... +) +``` + +Our compiler will also bundle standard library headers which contain +declarations compatible with `libc`. First, though, we need some helper +macros, which we'll place in `stddef.h`: +```c +/* stddef.h */ +#define WASM_IMPORT(module,name) __attribute__((import_module(#module), import_name(#name))) name +#define WASM_EXPORT(name) __attribute__((export_name(#name))) name +``` + +These macros use the clang-specific attributes [`import_module`], [`import_name`] +and [`export_name`] to ensure that the annotated C functions produce the +correct wasm import or export definitions. Other compilers would use their own +magic syntax to achieve the same effect.
Using these macros, we can declare +`malloc` in `stdlib.h`: +```c +/* stdlib.h */ +#include <stddef.h> +#define LIBC(name) WASM_IMPORT(libc, name) +void* LIBC(malloc)(size_t n); +``` + +With these annotations, C programs that include and call this function will be +compiled to contain the following import: +```wasm +(import "libc" "malloc" (func (param i32) (result i32))) +``` + + +## `libzip` + +The interface exposed by `libzip` to its clients is a header file: +```c +/* libzip.h */ +#include <stddef.h> +#define LIBZIP(name) WASM_IMPORT(libzip, name) +void* LIBZIP(zip)(void* in, size_t in_size, size_t* out_size); +``` +which can be implemented by the following source file: +```c +/* libzip.c */ +#include <stdlib.h> +void* WASM_EXPORT(zip)(void* in, size_t in_size, size_t* out_size) { + ... + void *p = malloc(n); + ... +} +``` + +Note that `libzip.h` annotates the `zip` declaration with an *import* attribute +so that client modules generate proper wasm *import definitions* while `libzip.c` +annotates the `zip` definition with an *export* attribute so that this function +generates a proper *export definition* in the compiled module. Compiling with +`clang -shared libzip.c` produces a module shaped like: +```wasm +;; libzip.wat +(module + (import "libc" "memory" (memory 1)) + (import "libc" "malloc" (func (param i32) (result i32))) + (func (export "zip") (param i32 i32 i32) (result i32) + ... + ) +) +``` + + +## `zipper` + +The main module of the `zipper` component is implemented by the following +source file: +```c +/* zipper.c */ +#include <stdlib.h> +#include "libzip.h" +int main(int argc, char* argv[]) { + ... + void *in = malloc(n); + ... + void *out = zip(in, n, &out_size); + ...
+} +``` + +When compiled by a (future) component-aware `clang`, the resulting component +would look like: +```wasm +;; zipper.wat +(component + (import "libc" (module $Libc + (export "memory" (memory 1)) + (export "malloc" (func (param i32) (result i32))) + )) + (import "libzip" (module $Libzip + (import "libc" "memory" (memory 1)) + (import "libc" "malloc" (func (param i32) (result i32))) + (export "zip" (func (param i32 i32 i32) (result i32))) + )) + + (module $Main + (import "libc" "memory" (memory 1)) + (import "libc" "malloc" (func (param i32) (result i32))) + (import "libzip" "zip" (func (param i32 i32 i32) (result i32))) + ... + (func (export "zip") (param i32 i32) (result i32 i32) + ... + ) + ) + + (instance $libc (instantiate (module $Libc))) + (instance $libzip (instantiate (module $Libzip) + (import "libc" (instance $libc)) + )) + (instance $main (instantiate (module $Main) + (import "libc" (instance $libc)) + (import "libzip" (instance $libzip)) + )) + (func (export "zip") + (canon.lift (func (param (list u8)) (result (list u8))) (into $libc) (func $main "zip")) + ) +) +``` +Here, `zipper` links its own private module code (`$Main`) with the shareable +`libc` and `libzip` module code, ensuring that each new instance of `zipper` +gets a fresh, private instance of `libc` and `libzip`. + + +## `libimg` + +Next we create a shared module `libimg` that depends on `libzip`: +```c +/* libimg.h */ +#include <stddef.h> +#define LIBIMG(name) WASM_IMPORT(libimg, name) +void* LIBIMG(compress)(void* in, size_t in_size, size_t* out_size); +``` +```c +/* libimg.c */ +#include <stdlib.h> +#include "libzip.h" +void* WASM_EXPORT(compress)(void* in, size_t in_size, size_t* out_size) { + ... + void *out = zip(in, in_size, out_size); + ...
+} +``` +Compiling with `clang -shared libimg.c` produces a `libimg` module: +```wat +;; libimg.wat +(module + (import "libc" "memory" (memory 1)) + (import "libc" "malloc" (func (param i32) (result i32))) + (import "libzip" "zip" (func (param i32 i32 i32) (result i32))) + (func (export "compress") (param i32 i32 i32) (result i32) + ... + ) +) +``` + + +## `imgmgk` + +The main module of the `imgmgk` component is implemented by including +`stddef.h`, `libzip.h` and `libimg.h`. When compiled by a (future) +component-aware `clang`, the resulting component would look like: +```wasm +;; imgmgk.wat +(component $Imgmgk + (import "libc" (module $Libc ...)) + (import "libzip" (module $Libzip ...)) + (import "libimg" (module $Libimg ...)) + + (module $Main + (import "libc" "memory" (memory 1)) + (import "libc" "malloc" (func (param i32) (result i32))) + (import "libimg" "compress" (func (param i32 i32 i32) (result i32))) + ... + (func (export "transform") (param i32 i32) (result i32 i32) + ... + ) + ) + + (instance $libc (instantiate (module $Libc))) + (instance $libzip (instantiate (module $Libzip) + (import "libc" (instance $libc)) + )) + (instance $libimg (instantiate (module $Libimg) + (import "libc" (instance $libc)) + (import "libzip" (instance $libzip)) + )) + (instance $main (instantiate (module $Main) + (import "libc" (instance $libc)) + (import "libimg" (instance $libimg)) + )) + (func (export "transform") + (canon.lift (func (param (list u8)) (result (list u8))) (into $libc) (func $main "transform")) + ) +) +``` +Here, we see the general pattern emerging of the dependency DAG between +dynamically-linked modules expressed through `instance` definitions. + + +## `app` + +Finally, we can create the `app` component by composing the `zipper` and `imgmgk` +components. 
The resulting component could look like: +```wasm +;; app.wat +(component + (import "libc" (module $Libc ...)) + (import "libzip" (module $Libzip ...)) + (import "libimg" (module $Libimg ...)) + + (import "zipper" (component $Zipper ...)) + (import "imgmgk" (component $Imgmgk ...)) + + (module $Main + (import "libc" "memory" (memory 1)) + (import "libc" "malloc" (func (param i32) (result i32))) + (import "zipper" "zip" (func (param i32 i32) (result i32 i32))) + (import "imgmgk" "transform" (func (param i32 i32) (result i32 i32))) + ... + (func (export "run") (param i32 i32) (result i32 i32) + ... + ) + ) + + (instance $zipper (instantiate (component $Zipper) + (import "libc" (module $Libc)) + (import "libzip" (module $Libzip)) + )) + (instance $imgmgk (instantiate (component $Imgmgk) + (import "libc" (module $Libc)) + (import "libzip" (module $Libzip)) + (import "libimg" (module $Libimg)) + )) + + (instance $libc (instantiate (module $Libc))) + (func $zip + (canon.lower (into $libc) (func $zipper "zip")) + ) + (func $transform + (canon.lower (into $libc) (func $imgmgk "transform")) + ) + (instance $main (instantiate (module $Main) + (import "libc" (instance $libc)) + (import "zipper" (instance (export "zip" (func $zipper "zip")))) + (import "imgmgk" (instance (export "transform" (func $imgmgk "transform")))) + )) + (func (export "run") + (canon.lift (func (param string) (result string)) (func $main "run")) + ) +) +``` +Note here that `$Libc` is passed to the nested `zipper` and `imgmgk` instances +as an (uninstantiated) module before `app` creates its own private instance +of `libc` linked with its private `$Main` module instance. Thus, the three +components share `libc` *code* without sharing `libc` *state*, realizing the +instance diagram at the beginning. 
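The code-versus-state distinction can already be observed from the host side. The following is a minimal sketch using the Core WebAssembly JS API rather than the Component Model itself. The byte array is a hand-assembled, hypothetical stand-in for `libc` (an assumption for illustration: it only exports one page of memory and has no `malloc`). It is compiled once and instantiated three times, giving one copy of the code and three independent linear memories.

```javascript
// Hypothetical stand-in for libc.wasm: (module (memory (export "memory") 1))
const libcBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic, version 1
  0x05, 0x03, 0x01, 0x00, 0x01,                   // memory section: one memory, min 1 page
  0x07, 0x0a, 0x01, 0x06, 0x6d, 0x65, 0x6d, 0x6f, // export section: name "memory" ...
  0x72, 0x79, 0x02, 0x00,                         // ... exported as memory index 0
]);

// Compile once: this is the shared *code*.
const Libc = new WebAssembly.Module(libcBytes);

// Instantiate three times: each instance gets its own private *state*.
const libcInstances = [0, 1, 2].map(() => new WebAssembly.Instance(Libc));
const heaps = libcInstances.map(
  (inst) => new Uint8Array(inst.exports.memory.buffer)
);

// A write through one instance's memory is not visible through the others.
heaps[0][0] = 42;
console.log(heaps.map((heap) => heap[0]));
```

In `app.wat` the same separation is expressed declaratively: `$Libc` plays the role of the compiled module, and each `(instantiate (module $Libc))` corresponds to one `new WebAssembly.Instance` call.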
+ + +## Cyclic Dependencies + +If cyclic dependencies are necessary, such cycles can be broken by: +* identifying a [spanning] DAG over the module dependency graph; +* keeping the calls along the spanning DAG's edges as normal function imports + and direct calls (as shown above); then +* converting calls along "back edges" into indirect calls (`call_indirect`) of + an imported `(global i32)` containing the index in the function table. + +For example, a cycle between modules `$A` and `$B` could be broken by arbitrarily +saying that `$B` gets to directly import `$A` and then routing `$A`'s imports +through a shared mutable `funcref` table via `call_indirect`: +```wat +(module $A + ;; A imports B.bar indirectly via table+index + (import "linkage" "table" (table funcref)) + (import "linkage" "bar-index" (global $bar-index (mut i32))) + + (type $FooType (func)) + (func $some_use + (call_indirect $FooType (global.get $bar-index)) + ) + + ;; A exports A.foo directly to B + (func (export "foo") + ... + ) +) +``` +```wat +(module $B + ;; B directly imports A.foo + (import "a" "foo" (func $a_foo)) ;; B gets to directly import A + + ;; B indirectly exports B.bar to A + (func $bar ...) + (import "linkage" "table" (table $ftbl funcref)) + (import "linkage" "bar-index" (global $bar-index (mut i32))) + (elem (table $ftbl) (offset (i32.const 0)) $bar) + (func $start (global.set $bar-index (i32.const 0))) + (start $start) +) +``` +Lastly, a toolchain can link these together into a whole program by emitting +a wrapper adapter module that supplies both `$A` and `$B` with a shared +function table and `bar-index` mutable global. 
+```wat +(component + (import "A" (module $A ...)) + (import "B" (module $B ...)) + (module $Linkage + (global (export "bar-index") (mut i32)) + (table (export "table") funcref 1) + ) + (instance $linkage (instantiate (module $Linkage))) + (instance $a (instantiate (module $A) + (import "linkage" (instance $linkage)) + )) + (instance $b (instantiate (module $B) + (import "a" (instance $a)) + (import "linkage" (instance $linkage)) + )) +) +``` + + +## Function Pointer Identity + +To ensure C function pointer identity across shared libraries, for each exported +function, a shared library will need to export both the `func` definition and a +`(global (mut i32))` containing that `func`'s index in the global `funcref` table. + +Because a shared library can't know the absolute offset in the global `funcref` +table for all of its exported functions, the table slots' offsets must be +dynamic. One way this could be achieved is by the shared library calling into a +`ftalloc` export of `libc` (analogous to `malloc`, but for allocating from the +global `funcref` table) from the shared library's `start` function. Elements could +then be written into the table at the allocated offset and their indices +written into the exported `(global (mut i32))`s. + +(In theory, more efficient schemes are possible when the main program has more +static knowledge of its shared libraries.) + + +## Linear-memory stack pointer + +To implement address-taken local variables, varargs, and other corner cases, +wasm compilers maintain a stack in linear memory that is maintained in +lock-step with the native WebAssembly stack. The pointer to the top of the +linear-memory stack is usually maintained in a single `(global (mut i32))` +variable that must be shared by all linked instances. Following the above +linking scheme, this global would naturally be exported by `libc` along +with linear memory. 
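The shared stack pointer can be sketched with the Core WebAssembly JS API. This is an illustration only, not part of the proposal: the host stands in for `libc` by creating the mutable global, and a hand-assembled, hypothetical module imports it as `libc.sp` and exports a `push` function that reserves a 16-byte frame. The names `sp` and `push` and the initial value are assumptions. Two instances linked against the same global observe each other's updates.

```javascript
// Hypothetical module, hand-assembled for this sketch:
// (module
//   (import "libc" "sp" (global (mut i32)))
//   (func (export "push")
//     (global.set 0 (i32.sub (global.get 0) (i32.const 16)))))
const frameBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic, version 1
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,             // type section: () -> ()
  0x02, 0x0c, 0x01,                               // import section: one import
  0x04, 0x6c, 0x69, 0x62, 0x63,                   //   module name "libc"
  0x02, 0x73, 0x70,                               //   field name "sp"
  0x03, 0x7f, 0x01,                               //   global, i32, mutable
  0x03, 0x02, 0x01, 0x00,                         // function section: func 0 has type 0
  0x07, 0x08, 0x01,                               // export section: one export
  0x04, 0x70, 0x75, 0x73, 0x68, 0x00, 0x00,       //   "push" -> func 0
  0x0a, 0x0b, 0x01, 0x09, 0x00,                   // code section: one body, no locals
  0x23, 0x00, 0x41, 0x10, 0x6b, 0x24, 0x00, 0x0b, //   sp = sp - 16; end
]);

// The single stack pointer shared by all linked instances (starts at 64 KiB).
const sp = new WebAssembly.Global({ value: "i32", mutable: true }, 65536);

const Frame = new WebAssembly.Module(frameBytes);
const first = new WebAssembly.Instance(Frame, { libc: { sp } });
const second = new WebAssembly.Instance(Frame, { libc: { sp } });

// Each call moves the one shared stack pointer down by 16 bytes.
first.exports.push();
second.exports.push();
console.log(sp.value);
```

In the linking scheme described above, `libc` itself would define and export this global alongside its memory; the host-created global here only keeps the sketch short.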
+ + +## Runtime Dynamic Linking + +The general case of runtime dynamic linking in the style of `dlopen`, where an +*a priori unknown* module is linked into the program at runtime, is not possible +to do purely within wasm with this proposal. Additional host-provided APIs are +required for: +* compiling files or bytes into a module; +* reading the import strings of a module; +* dynamically instantiating a module given a list of import values; and +* dynamically extracting the exports of an instance. + +Such APIs could be standardized as part of [WASI]. Moreover, the [JS API] +possesses all the above capabilities allowing the WASI APIs to be prototyped and +implemented in the browser. + + + +[`import_module`]: https://clang.llvm.org/docs/AttributeReference.html#import-module +[`import_name`]: https://clang.llvm.org/docs/AttributeReference.html#import-name +[`export_name`]: https://clang.llvm.org/docs/AttributeReference.html#export-name +[Spanning]: https://en.wikipedia.org/wiki/Spanning_tree +[WASI]: https://github.com/webassembly/wasi +[JS API]: https://webassembly.github.io/spec/js-api/index.html diff --git a/design/mvp/examples/images/link-time-virtualization.svg b/design/mvp/examples/images/link-time-virtualization.svg new file mode 100644 index 0000000..19b8bae --- /dev/null +++ b/design/mvp/examples/images/link-time-virtualization.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/design/mvp/examples/images/shared-everything-dynamic-linking.svg b/design/mvp/examples/images/shared-everything-dynamic-linking.svg new file mode 100644 index 0000000..294acc8 --- /dev/null +++ b/design/mvp/examples/images/shared-everything-dynamic-linking.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/design/proposals/README.md b/design/proposals/README.md deleted file mode 100644 index 3672521..0000000 --- a/design/proposals/README.md +++ /dev/null @@ -1,5 +0,0 @@ -This subdirectory will contain the explainers specific to each proposal, -starting initially 
with [module-linking] and [interface-types]. - -[module-linking]: https://github.com/webassembly/module-linking/ -[interface-types]: https://github.com/webassembly/interface-types/ diff --git a/spec/README.md b/spec/README.md index 241758e..c9a2417 100644 --- a/spec/README.md +++ b/spec/README.md @@ -1,5 +1,4 @@ -This directory will be initialized by the [module-linking] proposal to contain -the Component Model specification, analogous to the [Core spec repo]. +This directory will contain the formal Component Model specification, a +reference interpreter and test suite, similar to the [Core spec repo]. -[module-linking]: https://github.com/webassembly/module-linking/ [Core spec repo]: https://github.com/WebAssembly/spec/ From 8d429ed629ca2ea46451932b8232e74b9099ee14 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 22 Feb 2022 19:14:24 -0600 Subject: [PATCH 004/301] Factor out into --- design/mvp/Explainer.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index d335119..061d7ab 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -172,9 +172,9 @@ there are two kinds of "targets" for an alias: the `export` of a component instance, or a local definition of an `outer` component that contains the current component: ``` -alias ::= (alias ) -aliastarget ::= export - | outer +alias ::= (alias ) +aliastarget ::= export + | outer aliaskind ::= (module ?) | (component ?) | (instance ?) 
From 0f8e2d0e11c8ce0f256d82a7ae9fb13b51c2770f Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 24 Feb 2022 10:55:04 -0600 Subject: [PATCH 005/301] Fix bug in JS API example --- design/mvp/Explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 061d7ab..5272029 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -745,7 +745,7 @@ WebAssembly.instantiateStreaming(fetch('./a.wasm'), { two: "hi", three: { four: { - five: await WebAssembly.instantiateStreaming(fetch('./b.wasm')) + five: await WebAssembly.compileStreaming(fetch('./b.wasm')) } } }); From 63c8c0283a6da84c1e78b56ac1628d766ece49fe Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 24 Feb 2022 10:56:40 -0600 Subject: [PATCH 006/301] Fix other bug in JS API example --- design/mvp/Explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 5272029..81da04d 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -722,7 +722,7 @@ For example, the following component: (export "four" (instance (export "five" (module (import "six" "a" (func)) - (export "six" "b" (func)) + (import "six" "b" (func)) )) )) )) From 7946c16f7b8343cb86b05f8352accc11bd324ada Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 24 Feb 2022 11:03:18 -0600 Subject: [PATCH 007/301] Extend ESM-integration example a bit --- design/mvp/Explainer.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 81da04d..7967dce 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -830,8 +830,10 @@ Otherwise, function or value imports are treated like an [Imported Default Bindi and the Module Record is converted to its default value. This allows the following component: ```wasm +;; bar.wasm (component (import "./foo.js" (func (result string))) + ... 
) ``` to be satisfied by a JavaScript module via ESM-integration: @@ -839,6 +841,10 @@ to be satisfied by a JavaScript module via ESM-integration: // foo.js export default () => "hi"; ``` +when `bar.wasm` is loaded as an ESM: +``` + +``` ## Examples From 8bb66dc0b2a240877692548567a1835965511d35 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 24 Feb 2022 12:38:04 -0600 Subject: [PATCH 008/301] Fix 'type' typo in Binary.md, use varu32 consistently --- design/mvp/Binary.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 90447c5..0067435 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -160,8 +160,8 @@ intertype ::= pit: => pit | 0x6a t: => (optional t) | 0x69 t: u: => (expected t u) field ::= n: t: => (field n t) -case ::= n: t: 0x0 => (case n t) - | n: t: 0x1 i: => (case n t (defaults-to case-label[i])) +case ::= n: t: 0x0 => (case n t) + | n: t: 0x1 i: => (case n t (defaults-to case-label[i])) ``` Notes: * Reused Core binary rules: [`core:import`], [`core:importdesc`], [`core:functype`] From 3ee971141d1a063e850888043e56dad11ab4e8c2 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 24 Feb 2022 12:55:22 -0600 Subject: [PATCH 009/301] Put ? into the deftype type constructors --- design/mvp/Explainer.md | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 7967dce..a6f4b5e 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -304,22 +304,22 @@ deftype ::= | | | -moduletype ::= (module *) +moduletype ::= (module ? *) moduletype-def ::= | | (export ) core:deftype ::= | ... Post-MVP additions -componenttype ::= (component (componenttype-def)*) +componenttype ::= (component ? *) componenttype-def ::= | import ::= (import ) -instancetype ::= (instance (instancetype-def)*) +instancetype ::= (instance ? 
*) instancetype-def ::= | | (export ) -functype ::= (func (param )* (result )) -valuetype ::= (value ) +functype ::= (func ? (param )* (result )) +valuetype ::= (value ? ) intertype ::= unit | bool | s8 | u8 | s16 | u16 | s32 | u32 | s64 | u64 | float32 | float64 @@ -343,6 +343,11 @@ resolving to a type definition (using `(type $T)` in cases where there is a grammatical ambiguity), or (3) an inline type definition that is desugared into a deduplicated out-of-line type definition. +On another technical note: the optional `id` in all the `deftype` type +constructors (e.g., `(module ? ...)`) is only allowed to be present in the +context of `import` since this is the only context in which binding an +identifier makes sense. + Starting with interface types, the set of values allowed for the *fundamental* interface types is given by the following table: | Type | Values | @@ -609,7 +614,7 @@ definitions have actually already been defined above (with the caveat that the real text format for `import` definitions would additionally allow binding an identifier (e.g., adding the `$foo` in `(import "foo" (func $foo))`): ``` -import ::= already defined above as part of , but allow binding an +import ::= already defined above as part of export ::= already defined above as part of ``` From 9620c37f695a36baeb6d108ba2c16bc1a5b007fd Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 24 Feb 2022 14:25:16 -0600 Subject: [PATCH 010/301] Add TODO section --- design/mvp/Explainer.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index a6f4b5e..3c15745 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -17,6 +17,7 @@ native JavaScript runtime. 
* [JS API](#JS-API) * [ESM-integration](#ESM-integration) * [Examples](#examples) +* [TODO](#TODO) (Based on the previous [scoping and layering] proposal to the WebAssembly CG, this repo merges and supersedes the [Module Linking] and [Interface Types] @@ -860,6 +861,17 @@ For some use-case-focused, worked examples, see: * [Component Examples presentation](https://docs.google.com/presentation/d/11lY9GBghZJ5nCFrf4MKWVrecQude0xy_buE--tnO9kQ) +## TODO + +The following features are needed to address the [MVP Use Cases](../high-level/UseCases.md) +and will be added over the coming months to complete the MVP proposal: +* concurrency support ([slides][Future And Stream Types]) +* abstract ("resource") types ([slides][Resource and Handle Types]) +* optional imports, definitions and exports (subsuming + [WASI Optional Imports](https://github.com/WebAssembly/WASI/blob/main/legacy/optional-imports.md) + and maybe [conditional-sections](https://github.com/WebAssembly/conditional-sections/issues/22)) + + [Structure Section]: https://webassembly.github.io/spec/core/syntax/index.html [`core:module`]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-module From df13b2d062db52ad39d2028beef4d82b7876a39d Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 24 Feb 2022 15:17:06 -0600 Subject: [PATCH 011/301] Clarify no multi-threading in Component Invariants Co-authored-by: Dave Bakker --- design/mvp/Explainer.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 3c15745..400cf27 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -674,7 +674,8 @@ runtime invariants: the interim. This establishes a clear contract between separate components that both prevents obscure composition-time bugs and also enables more-efficient non-reentrant runtime glue code (particularly in the middle - of the [Canonical ABI](CanonicalABI.md)). + of the [Canonical ABI](CanonicalABI.md)). 
This implies that components by + default don't allow concurrency and multi-threaded access will trap. 3. Components enforce the current informal rule that `start` functions are only for "internal" initialization by trapping if a component attempts to call a component import during instantiation. In Core WebAssembly, this From 274af9d7c8c6f6fe2ab40276e8d0f36a1713143c Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Sat, 26 Feb 2022 10:27:47 -0600 Subject: [PATCH 012/301] Add explicit 'core' discriminant to --- design/mvp/Explainer.md | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 400cf27..39ae2bd 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -113,9 +113,9 @@ instance ::= (instance ? ) instanceexpr ::= (instantiate (module ) (import )*) | (instantiate (component ) (import )*) | * - | + + | core * modulearg ::= (instance ) - | (instance +) + | (instance *) componentarg ::= (module ) | (component ) | (instance ) @@ -153,12 +153,10 @@ passed as a `componentarg` when instantiating a component, not just instances. Component instantiation will be revisited below after introducing the prerequisite type and import definitions. -Lastly, the `(instance *)` and `(instance +)` +Lastly, the `(instance *)` and `(instance *)` expressions allow component and module instances to be created by directly tupling together preceding definitions, without the need to `instantiate` -anything. To disambiguate the empty case, we observe that there is never -a need to import an empty module instance and thus `(instance)` is an empty -*component* instance. The "inline" forms of these expressions in `modulearg` +anything. The "inline" forms of these expressions in `modulearg` and `componentarg` are text format sugar for the "out of line" form in `instanceexpr`. 
To show an example of how these instance-creation forms are useful, we'll first need to introduce the `alias` definitions in the next From b20acb3c95e454bb3370fcf740faa620a2286d28 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Sat, 26 Feb 2022 10:56:33 -0600 Subject: [PATCH 013/301] Make parameter names optional --- design/mvp/Binary.md | 3 ++- design/mvp/Explainer.md | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 0067435..f48a48f 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -131,7 +131,8 @@ instancetype-def ::= 0x01 t: => t import ::= n: dt: => (import n dt) deftypeuse ::= i: => type-index-space[i] (must be ) functype ::= 0x4c param*:vec() t: => (func param* (result t)) -param ::= n: t: => (param n t) +param ::= 0x00 t: => (param t) + | 0x01 n: t: => (param n t) valuetype ::= 0x4b t: => (value t) intertypeuse ::= i: => type-index-space[i] (must be ) | pit: => pit diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 39ae2bd..b833a20 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -317,7 +317,7 @@ instancetype ::= (instance ? *) instancetype-def ::= | | (export ) -functype ::= (func ? (param )* (result )) +functype ::= (func ? (param ? )* (result )) valuetype ::= (value ? 
) intertype ::= unit | bool | s8 | u8 | s16 | u16 | s32 | u32 | s64 | u64 From a6f9e4640459a5136cbb6ec84f7168b2ee65f3fe Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Sat, 26 Feb 2022 11:09:05 -0600 Subject: [PATCH 014/301] Revert option/optional name change --- design/mvp/Binary.md | 2 +- design/mvp/Explainer.md | 8 ++++---- design/mvp/Subtyping.md | 6 +++--- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index f48a48f..1b75a42 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -158,7 +158,7 @@ intertype ::= pit: => pit | 0x6d n*:vec() => (flags n*) | 0x6c n*:vec() => (enum n*) | 0x6b t*:vec() => (union t*) - | 0x6a t: => (optional t) + | 0x6a t: => (option t) | 0x69 t: u: => (expected t u) field ::= n: t: => (field n t) case ::= n: t: 0x0 => (case n t) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index b833a20..93a079e 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -330,7 +330,7 @@ intertype ::= unit | bool | (flags *) | (enum *) | (union *) - | (optional ) + | (option ) | (expected ) ``` On a technical note: this type grammar uses `` and `` @@ -368,7 +368,7 @@ defined by the following mapping: (tuple *) ↦ (record ("𝒊" )*) for 𝒊=0,1,... (flags *) ↦ (record (field bool)*) (enum *) ↦ (variant (case unit)*) - (optional ) ↦ (variant (case "none") (case "some" )) + (option ) ↦ (variant (case "none") (case "some" )) (union *) ↦ (variant (case "𝒊" )*) for 𝒊=0,1,... (expected ) ↦ (variant (case "ok" ) (case "error" )) ``` @@ -570,7 +570,7 @@ As with any other definition kind, value definitions may be supplied to components through `import` definitions. 
Using the grammar of `import` already defined [above](#type-definitions), an example *value import* can be written: ``` -(import "env" (value $env (record (field "locale" (optional string))))) +(import "env" (value $env (record (field "locale" (option string))))) ``` As this example suggests, value imports can serve as generalized [environment variables], allowing not just `string`, but the full range of interface types @@ -778,7 +778,7 @@ At a high level, the additional coercions would be: | `tuple` | TBD: maybe a [JS Tuple]? | TBD | | `flags` | TBD: maybe a [JS Record]? | same as [`dictionary`] of `boolean` fields | | `enum` | same as [`enum`] | same as [`enum`] | -| `optional` | same as [`T?`] | same as [`T?`] | +| `option` | same as [`T?`] | same as [`T?`] | | `union` | same as [`union`] | same as [`union`] | | `expected` | same as `variant`, but coerce a top-level `error` return value to a thrown exception | same as `variant`, but coerce uncaught exceptions to top-level `error` return values | diff --git a/design/mvp/Subtyping.md b/design/mvp/Subtyping.md index 241f8cd..f697078 100644 --- a/design/mvp/Subtyping.md +++ b/design/mvp/Subtyping.md @@ -11,14 +11,14 @@ But roughly speaking: | `s8`, `s16`, `s32`, `s64`, `u8`, `u16`, `u32`, `u64` | lossless coercions are allowed | | `float32`, `float64` | `float32 <: float64` | | `char` | | -| `record` | fields can be reordered; covariant field payload subtyping; superfluous fields can be ignored in the subtype; `optional` fields can be ignored in the supertype | +| `record` | fields can be reordered; covariant field payload subtyping; superfluous fields can be ignored in the subtype; `option` fields can be ignored in the supertype | | `variant` | cases can be reordered; contravariant case payload subtyping; superfluous cases can be ignored in the supertype; `defaults-to` cases can be ignored in the subtype | | `list` | covariant element subtyping | | `tuple` | `(tuple T ...) 
<: T` | -| `optional` | `T <: (optional T)` | +| `option` | `T <: (option T)` | | `expected` | `T <: (expected T _)` | | `union` | `T <: (union ... T ...)` | -| `func` | parameter names must match in order; covariant parameter subtyping; superfluous parameters can be ignored in the subtype; `optional` parameters can be ignored in the supertype; contravariant result subtyping | +| `func` | parameter names must match in order; covariant parameter subtyping; superfluous parameters can be ignored in the subtype; `option` parameters can be ignored in the supertype; contravariant result subtyping | The remaining specialized interface types inherit their subtyping from their fundamental interface types. From 4ebd8afa42ba182d439f19510317e973f19d5ac1 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 28 Feb 2022 09:58:10 -0600 Subject: [PATCH 015/301] Relax type export restriction in binary format to allow any type definition --- design/mvp/Binary.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 1b75a42..1438956 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -60,7 +60,7 @@ componentarg ::= n: 0x00 m: => n (module | n: 0x02 i: => n (instance i) | n: 0x03 f: => n (func f) | n: 0x04 v: => n (value v) - | n: 0x05 t: => n (type t) (t must be an ) + | n: 0x05 t: => n (type t) export ::= a: => (export a) name ::= n: => n ``` From 38d0a3c3d2c8ee3d7f9baac9c908420133fc8ef7 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 1 Mar 2022 15:41:05 -0600 Subject: [PATCH 016/301] Fix typo and variance direction in subtyping sketch --- design/mvp/Explainer.md | 2 +- design/mvp/Subtyping.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 93a079e..7af9985 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -365,7 +365,7 @@ The sets of values allowed for the remaining *specialized* interface types are 
defined by the following mapping: ``` string ↦ (list char) - (tuple *) ↦ (record ("𝒊" )*) for 𝒊=0,1,... + (tuple *) ↦ (record (field "𝒊" )*) for 𝒊=0,1,... (flags *) ↦ (record (field bool)*) (enum *) ↦ (variant (case unit)*) (option ) ↦ (variant (case "none") (case "some" )) diff --git a/design/mvp/Subtyping.md b/design/mvp/Subtyping.md index f697078..36c6277 100644 --- a/design/mvp/Subtyping.md +++ b/design/mvp/Subtyping.md @@ -12,13 +12,13 @@ But roughly speaking: | `float32`, `float64` | `float32 <: float64` | | `char` | | | `record` | fields can be reordered; covariant field payload subtyping; superfluous fields can be ignored in the subtype; `option` fields can be ignored in the supertype | -| `variant` | cases can be reordered; contravariant case payload subtyping; superfluous cases can be ignored in the supertype; `defaults-to` cases can be ignored in the subtype | +| `variant` | cases can be reordered; covariant case payload subtyping; superfluous cases can be ignored in the supertype; `defaults-to` cases can be ignored in the subtype | | `list` | covariant element subtyping | | `tuple` | `(tuple T ...) <: T` | | `option` | `T <: (option T)` | | `expected` | `T <: (expected T _)` | | `union` | `T <: (union ... T ...)` | -| `func` | parameter names must match in order; covariant parameter subtyping; superfluous parameters can be ignored in the subtype; `option` parameters can be ignored in the supertype; contravariant result subtyping | +| `func` | parameter names must match in order; contravariant parameter subtyping; superfluous parameters can be ignored in the subtype; `option` parameters can be ignored in the supertype; covariant result subtyping | The remaining specialized interface types inherit their subtyping from their fundamental interface types. 
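As a reading aid for the variance fix in this patch, the corrected `func` row (contravariant parameters, covariant results) can be sketched as a toy checker. This is only an illustration under stated assumptions: `FuncType`, `is_subtype` and `is_func_subtype` are hypothetical names, only the `float32 <: float64` fact from the table is modelled, and the rules about superfluous and `option` parameters are deliberately left out.

```python
# Toy model of the variance rules sketched in the Subtyping.md table.
# All names here (FuncType, is_subtype, is_func_subtype) are hypothetical
# and exist only for illustration; they are not part of any proposal.
from dataclasses import dataclass

# The only scalar subtyping fact modelled here, taken from the table.
SCALAR_SUBTYPES = {("float32", "float64")}


@dataclass(frozen=True)
class FuncType:
    params: tuple   # tuple of (name, type) pairs
    result: object  # a scalar type name (str) or another FuncType


def is_subtype(sub, sup):
    """Return True if a value of type `sub` may be used where `sup` is expected."""
    if isinstance(sub, FuncType) and isinstance(sup, FuncType):
        return is_func_subtype(sub, sup)
    if isinstance(sub, FuncType) or isinstance(sup, FuncType):
        return False
    return sub == sup or (sub, sup) in SCALAR_SUBTYPES


def is_func_subtype(sub, sup):
    # Simplification (assumption): equal arity, parameter names match in order.
    if len(sub.params) != len(sup.params):
        return False
    for (sub_name, sub_ty), (sup_name, sup_ty) in zip(sub.params, sup.params):
        if sub_name != sup_name:
            return False
        # Contravariant: whatever the supertype accepts, the subtype must accept.
        if not is_subtype(sup_ty, sub_ty):
            return False
    # Covariant: whatever the subtype returns must be valid for the supertype.
    return is_subtype(sub.result, sup.result)


accepts_wide = FuncType((("x", "float64"),), "float32")
accepts_narrow = FuncType((("x", "float32"),), "float64")
print(is_subtype(accepts_wide, accepts_narrow))
print(is_subtype(accepts_narrow, accepts_wide))
```

A function that accepts `float64` and returns `float32` can stand in for one that accepts `float32` and returns `float64`, but not the other way around, which is the direction this patch corrects in the table.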
From 0ccd2680df5972ca9964e11c5bd610a82e793147 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 7 Mar 2022 09:52:15 -0600 Subject: [PATCH 017/301] Update URL in CanonicalABI.md Co-authored-by: Liam Murphy --- design/mvp/CanonicalABI.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 1e4ab38..e0ead6b 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1,3 +1,3 @@ # Canonical ABI (sketch) -TODO: import and update [interface-types/#132](https://github.com/WebAssembly/interface-types/pull/132) +TODO: import and update [interface-types/#140](https://github.com/WebAssembly/interface-types/pull/140) From c339dc58f739f6f791fd3e925e72a7c72bef473c Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Mon, 7 Mar 2022 12:53:23 -0800 Subject: [PATCH 018/301] Change "import" to "arg" in the instantiate argument syntax. Fixes #2. --- design/mvp/Binary.md | 4 +-- design/mvp/Explainer.md | 31 ++++++++-------- design/mvp/examples/LinkTimeVirtualization.md | 8 ++--- .../SharedEverythingDynamicLinking.md | 36 +++++++++---------- 4 files changed, 40 insertions(+), 39 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 1438956..b6bc29c 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -50,8 +50,8 @@ Notes: (See [Instance Definitions](Explainer.md#instance-definitions) in the explainer.) 
``` instance ::= ie: => (instance ie) -instanceexpr ::= 0x00 0x00 m: a*:vec() => (instantiate (module m) (import a)*) - | 0x00 0x01 c: a*:vec() => (instantiate (component c) (import a)*) +instanceexpr ::= 0x00 0x00 m: a*:vec() => (instantiate (module m) (arg a)*) + | 0x00 0x01 c: a*:vec() => (instantiate (component c) (arg a)*) | 0x01 e*:vec() => e* | 0x02 e*:vec() => e* modulearg ::= n: 0x02 i: => n (instance i) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 7af9985..fa275b1 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -110,8 +110,8 @@ supplying a set of named *arguments* which satisfy all the named *imports* of the selected module/component: ``` instance ::= (instance ? ) -instanceexpr ::= (instantiate (module ) (import )*) - | (instantiate (component ) (import )*) +instanceexpr ::= (instantiate (module ) (arg )*) + | (instantiate (component ) (arg )*) | * | core * modulearg ::= (instance ) @@ -125,8 +125,9 @@ componentarg ::= (module ) | (instance *) export ::= (export ) ``` -When instantiating a module via `(instantiate (module $M) *)`, the -two-level imports of the module `$M` are resolved as follows: +When instantiating a module via +`(instantiate (module $M) (arg )*)`, the two-level imports of +the module `$M` are resolved as follows: 1. The first `name` of an import is looked up in the named list of `modulearg` to select a module instance. 2. The second `name` of an import is looked up in the named list of exports of @@ -144,7 +145,7 @@ following component: (func (import "a" "one") (result i32)) ) (instance $a (instantiate (module $A))) - (instance $b (instantiate (module $B) (import "a" (instance $a)))) + (instance $b (instantiate (module $B) (arg "a" (instance $a)))) ) ``` Components, as we'll see below, have single-level imports, i.e., each import @@ -216,13 +217,13 @@ For `export` aliases, the inline sugar has the form `(kind + and can be used anywhere a `kind` index appears in the AST. 
For example, the following snippet uses an inline function alias: ```wasm -(instance $j (instantiate (component $J) (import "f" (func $i "f")))) +(instance $j (instantiate (component $J) (arg "f" (func $i "f")))) (export "x" (func $j "g" "h")) ``` which is desugared into: ```wasm (alias export $i "f" (func $f_alias)) -(instance $j (instantiate (component $J) (import "f" (func $f_alias)))) +(instance $j (instantiate (component $J) (arg "f" (func $f_alias)))) (alias export $j "g" (instance $g_alias)) (alias export $g_alias "h" (func $h_alias)) (export "x" (func $h_alias)) @@ -263,16 +264,16 @@ With what's defined so far, we're able to link modules with arbitrary renamings: ) (instance $a (instantiate (module $A))) (instance $b1 (instantiate (module $B) - (import "a" (instance $a)) ;; no renaming + (arg "a" (instance $a)) ;; no renaming )) (alias export $a "two" (func $a_two)) (instance $b2 (instantiate (module $B) - (import "a" (instance + (arg "a" (instance (export "one" (func $a_two)) ;; renaming, using explicit alias )) )) (instance $b3 (instantiate (module $B) - (import "a" (instance + (arg "a" (instance (export "one" (func $a "three")) ;; renaming, using inline alias sugar )) )) @@ -521,8 +522,8 @@ does some logging, then returns a string. ) ) (instance $main (instantiate (module $Main) - (import "libc" (instance $libc)) - (import "wasi:logging" (instance (export "log" (func $log)))) + (arg "libc" (instance $libc)) + (arg "wasi:logging" (instance (export "log" (func $log)))) )) (func (export "run") (canon.lift (func (param string) (result string)) (into $libc) (func $main "run")) @@ -593,7 +594,7 @@ exported string, all at instantiation time: ... 
general-purpose compute ) ) - (instance $main (instantiate (module $Main) (import "libc" (instance $libc)))) + (instance $main (instantiate (module $Main) (arg "libc" (instance $libc)))) (func $start (canon.lift (func (param string) (result string)) (into $libc) (func $main "start")) ) @@ -631,10 +632,10 @@ exports other components: (export "g" (func (result string))) )) (instance $d1 (instantiate (component $D) - (import "c" (instance $c)) + (arg "c" (instance $c)) )) (instance $d2 (instantiate (component $D) - (import "c" (instance + (arg "c" (instance (export "f" (func $d1 "g")) )) )) diff --git a/design/mvp/examples/LinkTimeVirtualization.md b/design/mvp/examples/LinkTimeVirtualization.md index 584e6c1..f03ca40 100644 --- a/design/mvp/examples/LinkTimeVirtualization.md +++ b/design/mvp/examples/LinkTimeVirtualization.md @@ -52,10 +52,10 @@ We now write the parent component by composing `child.wasm` with (import "./virtualize.wasm" (component $Virtualize ...)) (import "./child.wasm" (component $Child ...)) (instance $virtual-fs (instantiate (component $Virtualize) - (import "wasi:filesystem" (instance $real-fs)) + (arg "wasi:filesystem" (instance $real-fs)) )) (instance $child (instantiate (component $Child) - (import "wasi:filesystem" (instance $virtual-fs)) + (arg "wasi:filesystem" (instance $virtual-fs)) )) ) ``` @@ -69,10 +69,10 @@ definitions in place of imports: (component $Virtualize ... copied inline ...) (component $Child ... copied inline ...) 
(instance $virtual-fs (instantiate (component $Virtualize) - (import "wasi:filesystem" (instance $real-fs)) + (arg "wasi:filesystem" (instance $real-fs)) )) (instance $child (instantiate (component $Child) - (import "wasi:filesystem" (instance $virtual-fs)) + (arg "wasi:filesystem" (instance $virtual-fs)) )) ) ``` diff --git a/design/mvp/examples/SharedEverythingDynamicLinking.md b/design/mvp/examples/SharedEverythingDynamicLinking.md index b03d169..1ae3dd1 100644 --- a/design/mvp/examples/SharedEverythingDynamicLinking.md +++ b/design/mvp/examples/SharedEverythingDynamicLinking.md @@ -151,11 +151,11 @@ would look like: (instance $libc (instantiate (module $Libc))) (instance $libzip (instantiate (module $Libzip)) - (import "libc" (instance $libc)) + (arg "libc" (instance $libc)) )) (instance $main (instantiate (module $Main) - (import "libc" (instance $libc)) - (import "libzip" (instance $libzip)) + (arg "libc" (instance $libc)) + (arg "libzip" (instance $libzip)) )) (func (export "zip") (canon.lift (func (param (list u8)) (result (list u8))) (into $libc) (func $main "zip")) @@ -224,15 +224,15 @@ component-aware `clang`, the resulting component would look like: (instance $libc (instantiate (module $Libc))) (instance $libzip (instantiate (module $Libzip) - (import "libc" (instance $libc)) + (arg "libc" (instance $libc)) )) (instance $libimg (instantiate (module $Libimg) - (import "libc" (instance $libc)) - (import "libzip" (instance $libzip)) + (arg "libc" (instance $libc)) + (arg "libzip" (instance $libzip)) )) (instance $main (instantiate (module $Main) - (import "libc" (instance $libc)) - (import "libimg" (instance $libimg)) + (arg "libc" (instance $libc)) + (arg "libimg" (instance $libimg)) )) (func (export "transform") (canon.lift (func (param (list u8)) (result (list u8))) (into $libc) (func $main "transform")) @@ -269,13 +269,13 @@ components. 
The resulting component could look like: ) (instance $zipper (instantiate (component $Zipper) - (import "libc" (module $Libc)) - (import "libzip" (module $Libzip)) + (arg "libc" (module $Libc)) + (arg "libzip" (module $Libzip)) )) (instance $imgmgk (instantiate (component $Imgmgk) - (import "libc" (module $Libc)) - (import "libzip" (module $Libzip)) - (import "libimg" (module $Libimg)) + (arg "libc" (module $Libc)) + (arg "libzip" (module $Libzip)) + (arg "libimg" (module $Libimg)) )) (instance $libc (instantiate (module $Libc))) @@ -286,9 +286,9 @@ components. The resulting component could look like: (canon.lower (into $libc) (func $imgmgk "transform")) ) (instance $main (instantiate (module $Main) - (import "libc" (instance $libc)) - (import "zipper" (instance (export "zip" (func $zipper "zip")))) - (import "imgmgk" (instance (export "transform" (func $imgmgk "transform")))) + (arg "libc" (instance $libc)) + (arg "zipper" (instance (export "zip" (func $zipper "zip")))) + (arg "imgmgk" (instance (export "transform" (func $imgmgk "transform")))) )) (func (export "run") (canon.lift (func (param string) (result string)) (func $main "run")) @@ -358,11 +358,11 @@ function table and `bar-index` mutable global. ) (instance $linkage (instantiate (module $Linkage))) (instance $a (instantiate (module $A) - (import "linkage" (instance $linkage)) + (arg "linkage" (instance $linkage)) )) (instance $b (instantiate (module $B) (import "a" (instance $a)) - (import "linkage" (instance $linkage)) + (arg "linkage" (instance $linkage)) )) ) ``` From 3b104680a9fe544e9710a2bcef9755fa458d77ba Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Wed, 9 Mar 2022 07:02:10 -0800 Subject: [PATCH 019/301] Change `arg` to `with`. 
--- design/mvp/Binary.md | 4 +-- design/mvp/Explainer.md | 28 +++++++-------- design/mvp/examples/LinkTimeVirtualization.md | 8 ++--- .../SharedEverythingDynamicLinking.md | 36 +++++++++---------- 4 files changed, 38 insertions(+), 38 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index b6bc29c..6e9bae4 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -50,8 +50,8 @@ Notes: (See [Instance Definitions](Explainer.md#instance-definitions) in the explainer.) ``` instance ::= ie: => (instance ie) -instanceexpr ::= 0x00 0x00 m: a*:vec() => (instantiate (module m) (arg a)*) - | 0x00 0x01 c: a*:vec() => (instantiate (component c) (arg a)*) +instanceexpr ::= 0x00 0x00 m: a*:vec() => (instantiate (module m) (with a)*) + | 0x00 0x01 c: a*:vec() => (instantiate (component c) (with a)*) | 0x01 e*:vec() => e* | 0x02 e*:vec() => e* modulearg ::= n: 0x02 i: => n (instance i) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index fa275b1..296704f 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -110,8 +110,8 @@ supplying a set of named *arguments* which satisfy all the named *imports* of the selected module/component: ``` instance ::= (instance ? ) -instanceexpr ::= (instantiate (module ) (arg )*) - | (instantiate (component ) (arg )*) +instanceexpr ::= (instantiate (module ) (with )*) + | (instantiate (component ) (with )*) | * | core * modulearg ::= (instance ) @@ -126,7 +126,7 @@ componentarg ::= (module ) export ::= (export ) ``` When instantiating a module via -`(instantiate (module $M) (arg )*)`, the two-level imports of +`(instantiate (module $M) (with )*)`, the two-level imports of the module `$M` are resolved as follows: 1. The first `name` of an import is looked up in the named list of `modulearg` to select a module instance. 
@@ -145,7 +145,7 @@ following component: (func (import "a" "one") (result i32)) ) (instance $a (instantiate (module $A))) - (instance $b (instantiate (module $B) (arg "a" (instance $a)))) + (instance $b (instantiate (module $B) (with "a" (instance $a)))) ) ``` Components, as we'll see below, have single-level imports, i.e., each import @@ -217,13 +217,13 @@ For `export` aliases, the inline sugar has the form `(kind + and can be used anywhere a `kind` index appears in the AST. For example, the following snippet uses an inline function alias: ```wasm -(instance $j (instantiate (component $J) (arg "f" (func $i "f")))) +(instance $j (instantiate (component $J) (with "f" (func $i "f")))) (export "x" (func $j "g" "h")) ``` which is desugared into: ```wasm (alias export $i "f" (func $f_alias)) -(instance $j (instantiate (component $J) (arg "f" (func $f_alias)))) +(instance $j (instantiate (component $J) (with "f" (func $f_alias)))) (alias export $j "g" (instance $g_alias)) (alias export $g_alias "h" (func $h_alias)) (export "x" (func $h_alias)) @@ -264,16 +264,16 @@ With what's defined so far, we're able to link modules with arbitrary renamings: ) (instance $a (instantiate (module $A))) (instance $b1 (instantiate (module $B) - (arg "a" (instance $a)) ;; no renaming + (with "a" (instance $a)) ;; no renaming )) (alias export $a "two" (func $a_two)) (instance $b2 (instantiate (module $B) - (arg "a" (instance + (with "a" (instance (export "one" (func $a_two)) ;; renaming, using explicit alias )) )) (instance $b3 (instantiate (module $B) - (arg "a" (instance + (with "a" (instance (export "one" (func $a "three")) ;; renaming, using inline alias sugar )) )) @@ -522,8 +522,8 @@ does some logging, then returns a string. 
) ) (instance $main (instantiate (module $Main) - (arg "libc" (instance $libc)) - (arg "wasi:logging" (instance (export "log" (func $log)))) + (with "libc" (instance $libc)) + (with "wasi:logging" (instance (export "log" (func $log)))) )) (func (export "run") (canon.lift (func (param string) (result string)) (into $libc) (func $main "run")) @@ -594,7 +594,7 @@ exported string, all at instantiation time: ... general-purpose compute ) ) - (instance $main (instantiate (module $Main) (arg "libc" (instance $libc)))) + (instance $main (instantiate (module $Main) (with "libc" (instance $libc)))) (func $start (canon.lift (func (param string) (result string)) (into $libc) (func $main "start")) ) @@ -632,10 +632,10 @@ exports other components: (export "g" (func (result string))) )) (instance $d1 (instantiate (component $D) - (arg "c" (instance $c)) + (with "c" (instance $c)) )) (instance $d2 (instantiate (component $D) - (arg "c" (instance + (with "c" (instance (export "f" (func $d1 "g")) )) )) diff --git a/design/mvp/examples/LinkTimeVirtualization.md b/design/mvp/examples/LinkTimeVirtualization.md index f03ca40..2af2897 100644 --- a/design/mvp/examples/LinkTimeVirtualization.md +++ b/design/mvp/examples/LinkTimeVirtualization.md @@ -52,10 +52,10 @@ We now write the parent component by composing `child.wasm` with (import "./virtualize.wasm" (component $Virtualize ...)) (import "./child.wasm" (component $Child ...)) (instance $virtual-fs (instantiate (component $Virtualize) - (arg "wasi:filesystem" (instance $real-fs)) + (with "wasi:filesystem" (instance $real-fs)) )) (instance $child (instantiate (component $Child) - (arg "wasi:filesystem" (instance $virtual-fs)) + (with "wasi:filesystem" (instance $virtual-fs)) )) ) ``` @@ -69,10 +69,10 @@ definitions in place of imports: (component $Virtualize ... copied inline ...) (component $Child ... copied inline ...) 
(instance $virtual-fs (instantiate (component $Virtualize) - (arg "wasi:filesystem" (instance $real-fs)) + (with "wasi:filesystem" (instance $real-fs)) )) (instance $child (instantiate (component $Child) - (arg "wasi:filesystem" (instance $virtual-fs)) + (with "wasi:filesystem" (instance $virtual-fs)) )) ) ``` diff --git a/design/mvp/examples/SharedEverythingDynamicLinking.md b/design/mvp/examples/SharedEverythingDynamicLinking.md index 1ae3dd1..b5b5370 100644 --- a/design/mvp/examples/SharedEverythingDynamicLinking.md +++ b/design/mvp/examples/SharedEverythingDynamicLinking.md @@ -151,11 +151,11 @@ would look like: (instance $libc (instantiate (module $Libc))) (instance $libzip (instantiate (module $Libzip)) - (arg "libc" (instance $libc)) + (with "libc" (instance $libc)) )) (instance $main (instantiate (module $Main) - (arg "libc" (instance $libc)) - (arg "libzip" (instance $libzip)) + (with "libc" (instance $libc)) + (with "libzip" (instance $libzip)) )) (func (export "zip") (canon.lift (func (param (list u8)) (result (list u8))) (into $libc) (func $main "zip")) @@ -224,15 +224,15 @@ component-aware `clang`, the resulting component would look like: (instance $libc (instantiate (module $Libc))) (instance $libzip (instantiate (module $Libzip) - (arg "libc" (instance $libc)) + (with "libc" (instance $libc)) )) (instance $libimg (instantiate (module $Libimg) - (arg "libc" (instance $libc)) - (arg "libzip" (instance $libzip)) + (with "libc" (instance $libc)) + (with "libzip" (instance $libzip)) )) (instance $main (instantiate (module $Main) - (arg "libc" (instance $libc)) - (arg "libimg" (instance $libimg)) + (with "libc" (instance $libc)) + (with "libimg" (instance $libimg)) )) (func (export "transform") (canon.lift (func (param (list u8)) (result (list u8))) (into $libc) (func $main "transform")) @@ -269,13 +269,13 @@ components. 
The resulting component could look like: ) (instance $zipper (instantiate (component $Zipper) - (arg "libc" (module $Libc)) - (arg "libzip" (module $Libzip)) + (with "libc" (module $Libc)) + (with "libzip" (module $Libzip)) )) (instance $imgmgk (instantiate (component $Imgmgk) - (arg "libc" (module $Libc)) - (arg "libzip" (module $Libzip)) - (arg "libimg" (module $Libimg)) + (with "libc" (module $Libc)) + (with "libzip" (module $Libzip)) + (with "libimg" (module $Libimg)) )) (instance $libc (instantiate (module $Libc))) @@ -286,9 +286,9 @@ components. The resulting component could look like: (canon.lower (into $libc) (func $imgmgk "transform")) ) (instance $main (instantiate (module $Main) - (arg "libc" (instance $libc)) - (arg "zipper" (instance (export "zip" (func $zipper "zip")))) - (arg "imgmgk" (instance (export "transform" (func $imgmgk "transform")))) + (with "libc" (instance $libc)) + (with "zipper" (instance (export "zip" (func $zipper "zip")))) + (with "imgmgk" (instance (export "transform" (func $imgmgk "transform")))) )) (func (export "run") (canon.lift (func (param string) (result string)) (func $main "run")) @@ -358,11 +358,11 @@ function table and `bar-index` mutable global. 
) (instance $linkage (instantiate (module $Linkage))) (instance $a (instantiate (module $A) - (arg "linkage" (instance $linkage)) + (with "linkage" (instance $linkage)) )) (instance $b (instantiate (module $B) (import "a" (instance $a)) - (arg "linkage" (instance $linkage)) + (with "linkage" (instance $linkage)) )) ) ``` From 84bdfe6bcc3affd15952bff9e77c4c607b275425 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 9 Mar 2022 18:55:33 -0600 Subject: [PATCH 020/301] Make (result) optional as syntactic sugar for (result unit) --- design/mvp/Explainer.md | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 296704f..55a7a33 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -380,11 +380,16 @@ fifth type will be added for [resource types][Resource and Handle Types].) A `functype` describes a component function whose parameters and results are `intertype` values. Thus `functype` is completely disjoint from [`core:functype`] in the WebAssembly Core spec, whose parameters and results -are [`core:valtype`] values. Morever, since `core:functype` can only appear -syntactically within the `(module ...)` S-expression of a `moduletype`, there -is never a need to syntactically distinguish `functype` from `core:functype` -in the text format: the context dictates which one a `(func ...)` S-expression -parses into. +are [`core:valtype`] values. As a low-level compiler target, `core:functype` +returns zero or more results. In contrast, as a high-level interface type +designed to be maximally bound to a variety of source languages, `functype` +always returns a single type, with `unit` being used for functions that don't +return an interesting value (analogous to "void" in some languages). As +syntactic sugar, the text format of `functype` additionally allows `result` to +be absent, interpreting this as `(result unit)`. 
Since `core:functype` can only +appear syntactically within a `(module ...)` S-expression, there is never a +need to syntactically distinguish `functype` from `core:functype` in the text +format: the context dictates which one a `(func ...)` S-expression parses into. A `valuetype` describes a single `intertype` value this is to be consumed exactly once during component instantiation. How this happens is described From ab612ecc39759504b3791465c1db5337decd8fc5 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 4 Apr 2022 17:37:36 -0500 Subject: [PATCH 021/301] Re-add inverted alias syntax from module-linking (#13) --- design/mvp/Explainer.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 55a7a33..403c365 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -251,6 +251,13 @@ is desugared into: ) ``` +Lastly, for symmetry with [imports][func-import-abbrev], aliases can be written +in an inverted form that puts the definition kind first: +```wasm +(func $f (import "i" "f")) ≡ (import "i" "f" (func $f)) ;; (existing) +(func $g (alias $i "g1")) ≡ (alias $i "g1" (func $g)) ;; (new) +``` + With what's defined so far, we're able to link modules with arbitrary renamings: ```wasm (component @@ -266,7 +273,7 @@ With what's defined so far, we're able to link modules with arbitrary renamings: (instance $b1 (instantiate (module $B) (with "a" (instance $a)) ;; no renaming )) - (alias export $a "two" (func $a_two)) + (func $a_two (alias export $a "two")) ;; ≡ (alias export $a "two" (func $a_two)) (instance $b2 (instantiate (module $B) (with "a" (instance (export "one" (func $a_two)) ;; renaming, using explicit alias From e03b70f4aa6a05943b0394640a43b9a3b2031dfb Mon Sep 17 00:00:00 2001 From: Radu Matei Date: Sat, 9 Apr 2022 14:01:17 +0200 Subject: [PATCH 022/301] chore: address a few typos Signed-off-by: Radu Matei --- design/high-level/UseCases.md | 4 +--- design/mvp/Explainer.md 
| 2 +- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/design/high-level/UseCases.md b/design/high-level/UseCases.md index 608e6d0..a65e76c 100644 --- a/design/high-level/UseCases.md +++ b/design/high-level/UseCases.md @@ -48,7 +48,7 @@ sandboxing technology): #### Invoking component exports from the host Once a host chooses to embed wasm (for one of the preceding reasons), the first -design choice is how host executes the wasm code. The core wasm [start function] +design choice is how the host executes the wasm code. The core wasm [start function] is sometimes used for this purpose, however the lack of parameters or results miss out on several use cases listed below, which suggest the use of exported wasm functions with typed signatures instead. However, there are a number of @@ -325,8 +325,6 @@ to call imports, which could break other components' single-threaded assumptions the imported function to have been explicitly `shared` and thus callable from any `fork`ed thread. - - [RLBox]: https://plsyssec.github.io/rlbox_sandboxing_api/sphinx/ [Principle of Least Authority]: https://en.wikipedia.org/wiki/Principle_of_least_privilege [Modular Programming]: https://en.wikipedia.org/wiki/Modular_programming diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 403c365..422c6ae 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -398,7 +398,7 @@ appear syntactically within a `(module ...)` S-expression, there is never a need to syntactically distinguish `functype` from `core:functype` in the text format: the context dictates which one a `(func ...)` S-expression parses into. -A `valuetype` describes a single `intertype` value this is to be consumed +A `valuetype` describes a single `intertype` value that is to be consumed exactly once during component instantiation. How this happens is described below along with [`start` definitions](#start-definitions). 
From 41d8902608cdb6389ea44de1b42ea1b65eb19b03 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 28 Feb 2022 18:35:02 -0600 Subject: [PATCH 023/301] Fill in CanonicalABI.md (based on interface-types/#140) --- README.md | 6 +- design/mvp/Binary.md | 29 +- design/mvp/CanonicalABI.md | 1264 ++++++++++++++++- design/mvp/Explainer.md | 200 +-- design/mvp/canonical-abi/.gitignore | 1 + design/mvp/canonical-abi/README.md | 5 + design/mvp/canonical-abi/definitions.py | 902 ++++++++++++ design/mvp/canonical-abi/run_tests.py | 323 +++++ .../SharedEverythingDynamicLinking.md | 38 +- 9 files changed, 2650 insertions(+), 118 deletions(-) create mode 100644 design/mvp/canonical-abi/.gitignore create mode 100644 design/mvp/canonical-abi/README.md create mode 100644 design/mvp/canonical-abi/definitions.py create mode 100644 design/mvp/canonical-abi/run_tests.py diff --git a/README.md b/README.md index 7543cd5..88b72e2 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,9 @@ # Component Model design and specification This repository describes the high-level [goals], [use cases], [design choices] -and [FAQ] of the component model as well as a more-detailed [explainer] and -[binary format] covering the initial Minimum Viable Product (MVP) release. +and [FAQ] of the component model as well as a more-detailed [explainer], +[binary format] and [ABI] covering the initial Minimum Viable Product (MVP) +release. In the future, this repository will additionally contain a [formal spec], reference interpreter and test suite. 
@@ -20,6 +21,7 @@ To contribute to any of these repositories, see the Community Group's [FAQ]: design/high-level/FAQ.md [explainer]: design/mvp/Explainer.md [binary format]: design/mvp/Binary.md +[ABI]: design/mvp/CanonicalABI.md [formal spec]: spec/ [W3C WebAssembly Community Group]: https://www.w3.org/community/webassembly/ [Contributing Guidelines]: https://webassembly.org/community/contributing/ diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 6e9bae4..5b60545 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -188,23 +188,26 @@ Notes: func ::= body: => (func body) funcbody ::= 0x00 ft: opt*:vec() f: => (canon.lift ft opt* f) | 0x01 opt*:* f: => (canon.lower opt* f) -canonopt ::= 0x00 => string=utf8 - | 0x01 => string=utf16 - | 0x02 => string=latin1+utf16 - | 0x03 i: => (into i) +canonopt ::= 0x00 => string-encoding=utf8 + | 0x01 => string-encoding=utf16 + | 0x02 => string-encoding=latin1+utf16 + | 0x03 m: => (memory m) + | 0x04 f: => (realloc f) + | 0x05 f: => (post-return f) ``` Notes: * Validation prevents duplicate or conflicting options. -* Validation of `canon.lift` requires `f` to have a `core:functype` that matches - the canonical-ABI-defined lowering of `ft`. The function defined by - `canon.lift` has type `ft`. -* Validation of `canon.lower` requires `f` to have a `functype`. The function - defined by `canon.lower` has a `core:functype` defined by the canonical ABI - lowering of `f`'s type. +* Validation of `canon.lift` requires `f` to have type `flatten(ft)` (defined + by the [Canonical ABI](CanonicalABI.md#flattening)). The function being + defined is given type `ft`. +* Validation of `canon.lower` requires `f` to be a component function. The + function being defined is given core function type `flatten(ft)` where `ft` + is the `functype` of `f`. 
* If the lifting/lowering operations implied by `canon.lift` or `canon.lower` - require access to `memory`, `realloc` or `free`, then validation will require - the `(into i)` `canonopt` be present and the corresponding export be present - in `i`'s `instancetype`. + require access to `memory` or `realloc`, then validation requires these + options to be present. If present, `realloc` must have type + `(func (param i32 i32 i32 i32) (result i32))`. +* `post-return` is always optional, but, if present, must have type `(func)`. ## Start Definitions diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index e0ead6b..f72da65 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1,3 +1,1263 @@ -# Canonical ABI (sketch) +# Canonical ABI Explainer -TODO: import and update [interface-types/#140](https://github.com/WebAssembly/interface-types/pull/140) +This explainer walks through the Canonical ABI used by [function definitions] +to convert between high-level interface-typed values and low-level Core +WebAssembly values. + +* [Supporting definitions](#supporting-definitions) + * [Despecialization](#Despecialization) + * [Alignment](#alignment) + * [Size](#size) + * [Loading](#loading) + * [Storing](#storing) + * [Flattening](#flattening) + * [Flat Lifting](#flat-lifting) + * [Flat Lowering](#flat-lowering) + * [Lifting and Lowering](#lifting-and-lowering) +* [Canonical ABI built-ins](#canonical-abi-built-ins) + * [`canon.lift`](#canonlift) + * [`canon.lower`](#canonlower) + + +## Supporting definitions + +The Canonical ABI specifies, for each interface-typed function signature, a +corresponding core function signature and the process for reading +interface-typed values into and out of linear memory. 
While a full formal +specification would specify the Canonical ABI in terms of macro-expansion into +Core WebAssembly instructions augmented with a new set of (spec-internal) +[administrative instructions], the informal presentation here instead specifies +the process in terms of Python code that would be logically executed at +validation- and run-time by a component model implementation. The Python code +is presented by interleaving definitions with descriptions and eliding some +boilerplate. For a complete listing of all Python definitions in a single +executable file with a small unit test suite, see the +[`canonical-abi`](canonical-abi/) directory. + +The convention followed by the Python code below is that all traps are raised +by explicit `trap()`/`trap_if()` calls; Python `assert()` statements should +never fire and are only included as hints to the reader. Similarly, there +should be no uncaught Python exceptions. + +While the Python code appears to perform a copy as part of lifting +the contents of linear memory into high-level Python values, a normal +implementation should never need to make this extra intermediate copy. +This claim is expanded upon [below](#calling-into-a-component). + +Lastly, independently of Python, the Canonical ABI defined below assumes that +out-of-memory conditions (such as `memory.grow` returning `-1` from within +`realloc`) will trap (via `unreachable`). This significantly simplifies the +Canonical ABI by avoiding the need to support the complicated protocols +necessary to support recovery in the middle of nested allocations. In the MVP, +for large allocations that can OOM, [streams](Explainer.md#TODO) would usually +be the appropriate type to use and streams will be able to explicitly express +failure in their type. 
Post-MVP, [adapter functions] would allow fully custom +OOM handling for all interface types, allowing a toolchain to intentionally +propagate OOM into the appropriate explicit return value of the function's +declared return type. + + +### Despecialization + +[In the explainer][Type Definitions], interface types are classified as either *fundamental* or +*specialized*, where the specialized interface types are defined by expansion +into fundamental interface types. In most cases, the canonical ABI of a +specialized interface type is the same as its expansion so, to avoid +repetition, the other definitions below use the following `despecialize` +function to replace specialized interface types with their expansion: +```python +def despecialize(t): + match t: + case Tuple(ts) : return Record([ Field(str(i), t) for i,t in enumerate(ts) ]) + case Unit() : return Record([]) + case Union(ts) : return Variant([ Case(str(i), t) for i,t in enumerate(ts) ]) + case Enum(labels) : return Variant([ Case(l, Unit()) for l in labels ]) + case Option(t) : return Variant([ Case("none", Unit()), Case("some", t) ]) + case Expected(ok, error) : return Variant([ Case("ok", ok), Case("error", error) ]) + case _ : return t +``` +The specialized interface types `string` and `flags` are missing from this list +because they are given specialized canonical ABI representations distinct from +their respective expansions. + + +### Alignment + +Each interface type is assigned an [alignment] which is used by subsequent +Canonical ABI definitions. 
Presenting the definition of `alignment` piecewise, +we start with the top-level case analysis: +```python +def alignment(t): + match despecialize(t): + case Bool() : return 1 + case S8() | U8() : return 1 + case S16() | U16() : return 2 + case S32() | U32() : return 4 + case S64() | U64() : return 8 + case Float32() : return 4 + case Float64() : return 8 + case Char() : return 4 + case String() | List(_) : return 4 + case Record(fields) : return max_alignment(types_of(fields)) + case Variant(cases) : return max_alignment(types_of(cases) + [discriminant_type(cases)]) + case Flags(labels) : return alignment_flags(labels) + +def types_of(fields_or_cases): + return [x.t for x in fields_or_cases] + +def max_alignment(ts): + a = 1 + for t in ts: + a = max(a, alignment(t)) + return a +``` + +As an optimization, `variant` discriminants are represented by the smallest integer +covering the number of cases in the variant. Depending on the payload type, +this can allow more compact representations of variants in memory. This smallest +integer type is selected by the following function, used above and below: +```python +def discriminant_type(cases): + n = len(cases) + assert(0 < n < (1 << 32)) + match math.ceil(math.log2(n)/8): + case 0: return U8() + case 1: return U8() + case 2: return U16() + case 3: return U32() +``` + +As an optimization, `flags` are represented as packed bit-vectors. Like variant +discriminants, `flags` use the smallest integer that fits all the bits, falling +back to sequences of `i32`s when there are more than 32 flags. 
+```python +def alignment_flags(labels): + n = len(labels) + if n <= 8: return 1 + if n <= 16: return 2 + return 4 +``` + + +### Size + +Each interface type is assigned two slightly-different measures of "size": +* its "byte size", which is the smallest number of bytes covering all its + fields when stored at an aligned address in linear memory; and +* its "element size", which is the size of the type when stored as an element + of a list, which may include additional padding at the end to ensure the + alignment of the next element. + +These two measures are defined by the following functions, which build on +the preceding alignment functions: +```python +def elem_size(t): + return align_to(byte_size(t), alignment(t)) + +def align_to(ptr, alignment): + return math.ceil(ptr / alignment) * alignment + +def byte_size(t): + match despecialize(t): + case Bool() : return 1 + case S8() | U8() : return 1 + case S16() | U16() : return 2 + case S32() | U32() : return 4 + case S64() | U64() : return 8 + case Float32() : return 4 + case Float64() : return 8 + case Char() : return 4 + case String() | List(_) : return 8 + case Record(fields) : return byte_size_record(fields) + case Variant(cases) : return byte_size_variant(cases) + case Flags(labels) : return byte_size_flags(labels) + +def byte_size_record(fields): + s = 0 + for f in fields: + s = align_to(s, alignment(f.t)) + s += byte_size(f.t) + return s + +def byte_size_variant(cases): + s = byte_size(discriminant_type(cases)) + s = align_to(s, max_alignment(types_of(cases))) + cs = 0 + for c in cases: + cs = max(cs, byte_size(c.t)) + return s + cs + +def byte_size_flags(labels): + n = len(labels) + if n <= 8: return 1 + if n <= 16: return 2 + return 4 * num_i32_flags(labels) + +def num_i32_flags(labels): + return math.ceil(len(labels) / 32) +``` + + +### Loading + +The `load` function defines how to read a value of a given interface type `t` +out of linear memory starting at offset `ptr`, returning an interface-typed +value
(here, as a Python value). The `Opts`/`opts` class/parameter contains the +[`canonopt`] immediates supplied as part of `canon.lift`/`canon.lower`. +Presenting the definition of `load` piecewise, we start with the top-level case +analysis: +```python +class Opts: + string_encoding: str + memory: bytearray + realloc: types.FunctionType + post_return: types.FunctionType + +def load(opts, ptr, t): + assert(ptr == align_to(ptr, alignment(t))) + match despecialize(t): + case Bool() : return bool(load_int(opts, ptr, 1)) + case U8() : return load_int(opts, ptr, 1) + case U16() : return load_int(opts, ptr, 2) + case U32() : return load_int(opts, ptr, 4) + case U64() : return load_int(opts, ptr, 8) + case S8() : return load_int(opts, ptr, 1, signed=True) + case S16() : return load_int(opts, ptr, 2, signed=True) + case S32() : return load_int(opts, ptr, 4, signed=True) + case S64() : return load_int(opts, ptr, 8, signed=True) + case Float32() : return canonicalize(reinterpret_i32_as_float(load_int(opts, ptr, 4))) + case Float64() : return canonicalize(reinterpret_i64_as_float(load_int(opts, ptr, 8))) + case Char() : return i32_to_char(opts, load_int(opts, ptr, 4)) + case String() : return load_string(opts, ptr) + case List(t) : return load_list(opts, ptr, t) + case Record(fields) : return load_record(opts, ptr, fields) + case Variant(cases) : return load_variant(opts, ptr, cases) + case Flags(labels) : return load_flags(opts, ptr, labels) +``` + +Integers are loaded directly from memory, with their high-order bit interpreted +according to the signedness of the type: +```python +def load_int(opts, ptr, nbytes, signed = False): + trap_if(ptr + nbytes > len(opts.memory)) + return int.from_bytes(opts.memory[ptr : ptr+nbytes], 'little', signed=signed) +``` + +Floats are loaded from memory and then "canonicalized", mapping all +Not-a-Number values to a single canonical `nan` bit-pattern: +```python +def reinterpret_i32_as_float(i): + return struct.unpack('!f', struct.pack('!I', 
i))[0] + +def reinterpret_i64_as_float(i): + return struct.unpack('!d', struct.pack('!Q', i))[0] + +def canonicalize(f): + if math.isnan(f): + return reinterpret_i64_as_float(0x7ff8000000000000) + return f +``` + +An `i32` is converted to a `char` (a [Unicode Scalar Value]) by dynamically +testing that its unsigned integral value is in the valid [Unicode Code Point] +range and not a [Surrogate]: +```python +def i32_to_char(opts, i): + trap_if(i >= 0x110000) + trap_if(0xD800 <= i <= 0xDFFF) + return chr(i) +``` + +Strings can be decoded in one of three ways, according to the `string-encoding` +option in [`canonopt`]. String interface values include their original encoding +and byte length as a "hint" that enables `store_string` (defined below) to make +better up-front allocation size choices in many cases. Thus, the interface +value produced by `load_string` isn't simply a Python `str`, but a *tuple* +containing a `str`, the original encoding and the original byte length. Lastly, +the custom `latin1+utf16` encoding represents a dynamic choice between `latin1` +(when all code points fit the one-byte Latin-1 encoding) and `utf16` +(otherwise). This dynamic choice is encoded in the high bit of the `i32` +containing the string's byte length. 
+```python +def load_string(opts, ptr): + begin = load_int(opts, ptr, 4) + packed_byte_length = load_int(opts, ptr + 4, 4) + return load_string_from_range(opts, begin, packed_byte_length) + +UTF16_BIT = 1 << 31 + +def load_string_from_range(opts, ptr, packed_byte_length): + match opts.string_encoding: + case 'utf8': + byte_length = packed_byte_length + encoding = 'utf-8' + case 'utf16': + byte_length = packed_byte_length + encoding = 'utf-16-le' + case 'latin1+utf16': + if bool(packed_byte_length & UTF16_BIT): + byte_length = packed_byte_length ^ UTF16_BIT + encoding = 'utf-16-le' + else: + byte_length = packed_byte_length + encoding = 'latin-1' + + trap_if(ptr + byte_length > len(opts.memory)) + try: + s = opts.memory[ptr : ptr+byte_length].decode(encoding) + except UnicodeError: + trap() + + return (s, opts.string_encoding, packed_byte_length) +``` + +Lists and records are loaded by recursively loading their elements/fields. +Note that lists use `elem_size` while records use `byte_size`. +```python +def load_list(opts, ptr, elem_type): + begin = load_int(opts, ptr, 4) + length = load_int(opts, ptr + 4, 4) + return load_list_from_range(opts, begin, length, elem_type) + +def load_list_from_range(opts, ptr, length, elem_type): + trap_if(ptr + length * elem_size(elem_type) > len(opts.memory)) + a = [] + for i in range(length): + a.append(load(opts, ptr + i * elem_size(elem_type), elem_type)) + return a + +def load_record(opts, ptr, fields): + record = {} + for field in fields: + ptr = align_to(ptr, alignment(field.t)) + record[field.label] = load(opts, ptr, field.t) + ptr += byte_size(field.t) + return record +``` +As a technical detail: the `align_to` in the loop in `load_record` is +guaranteed to be a no-op on the first iteration because the record as +a whole starts out aligned (as asserted at the top of `load`). + +Variants are loaded using the order of the cases in the type to determine the +case index. 
To support the subtyping allowed by `defaults-to`, a lifted variant +value semantically includes a full ordered list of its `defaults-to` case +labels so that the lowering code (defined below) can search this list to find a +case label it knows about. While the code below appears to perform case-label +lookup at runtime, a normal implementation can build the appropriate index +tables at compile-time so that variant-passing is always O(1) and not involving +string operations. +```python +def load_variant(opts, ptr, cases): + disc_size = byte_size(discriminant_type(cases)) + disc = load_int(opts, ptr, disc_size) + ptr += disc_size + trap_if(disc >= len(cases)) + case = cases[disc] + ptr = align_to(ptr, max_alignment(types_of(cases))) + return { case_label_with_defaults(case, cases): load(opts, ptr, case.t) } + +def case_label_with_defaults(case, cases): + label = case.label + while case.defaults_to is not None: + case = cases[find_case(case.defaults_to, cases)] + label += '|' + case.label + return label + +def find_case(label, cases): + matches = [i for i,c in enumerate(cases) if c.label == label] + assert(len(matches) <= 1) + if len(matches) == 1: + return matches[0] + return -1 +``` + +Finally, flags are converted from a bit-vector to a dictionary whose keys are +derived from the ordered labels of the `flags` type. The code here takes +advantage of Python's support for integers of arbitrary width. +```python +def load_flags(opts, ptr, labels): + i = load_int(opts, ptr, byte_size_flags(labels)) + return unpack_flags_from_int(i, labels) + +def unpack_flags_from_int(i, labels): + record = {} + for l in labels: + record[l] = bool(i & 1) + i >>= 1 + trap_if(i) + return record +``` + +### Storing + +The `store` function defines how to write a value `v` of a given interface type +`t` into linear memory starting at offset `ptr`. 
Presenting the definition of +`store` piecewise, we start with the top-level case analysis: +```python +def store(opts, v, t, ptr): + assert(ptr == align_to(ptr, alignment(t))) + match despecialize(t): + case Bool() : store_int(opts, int(bool(v)), ptr, 1) + case U8() : store_int(opts, v, ptr, 1) + case U16() : store_int(opts, v, ptr, 2) + case U32() : store_int(opts, v, ptr, 4) + case U64() : store_int(opts, v, ptr, 8) + case S8() : store_int(opts, v, ptr, 1, signed=True) + case S16() : store_int(opts, v, ptr, 2, signed=True) + case S32() : store_int(opts, v, ptr, 4, signed=True) + case S64() : store_int(opts, v, ptr, 8, signed=True) + case Float32() : store_int(opts, reinterpret_float_as_i32(v), ptr, 4) + case Float64() : store_int(opts, reinterpret_float_as_i64(v), ptr, 8) + case Char() : store_int(opts, char_to_i32(v), ptr, 4) + case String() : store_string(opts, v, ptr) + case List(t) : store_list(opts, v, ptr, t) + case Record(fields) : store_record(opts, v, ptr, fields) + case Variant(cases) : store_variant(opts, v, ptr, cases) + case Flags(labels) : store_flags(opts, v, ptr, labels) +``` + +Integers are stored directly into memory. Because the input domain is exactly +the integers in range for the given type, no extra range checks are necessary; +the `signed` parameter is only present to ensure that the internal range checks +of `int.to_bytes` are satisfied. +```python +def store_int(opts, v, ptr, nbytes, signed = False): + trap_if(ptr + nbytes > len(opts.memory)) + opts.memory[ptr : ptr+nbytes] = int.to_bytes(v, nbytes, 'little', signed=signed) +``` + +Floats are stored directly into memory. Because the input domain is exactly the +set of interface values which includes only a single `nan` value (which we +assume is the canonical one), no additional runtime canonicalization is +necessary. 
+```python +def reinterpret_float_as_i32(f): + return struct.unpack('!I', struct.pack('!f', f))[0] + +def reinterpret_float_as_i64(f): + return struct.unpack('!Q', struct.pack('!d', f))[0] +``` + +The integral value of a `char` (a [Unicode Scalar Value]) is a valid unsigned +`i32` and thus no runtime conversion or checking is necessary: +```python +def char_to_i32(c): + i = ord(c) + assert(0 <= i <= 0xD7FF or 0xE000 <= i <= 0x10FFFF) + return i +``` + +Storing strings is complicated by the goal of attempting to optimize the +different transcoding cases. In particular, one challenge is choosing the +linear memory allocation size *before* examining the contents of the string. +The reason for this constraint is that, in some settings where single-pass +iterators are involved (host calls and post-MVP [adapter functions]), examining +the contents of a string more than once would require making an engine-internal +temporary copy of the whole string, which the component model specifically aims +not to do. To avoid multiple passes, the canonical ABI instead uses a `realloc` +approach to update the allocation size during the single copy. A blind +`realloc` approach would normally suffer from multiple reallocations per string +(e.g., using the standard doubling-growth strategy). However, as already shown +in `load_string` above, interface-typed strings come with two useful hints: +their original encoding and byte length. From this hint data, `store_string` can +do a much better job of minimizing the number of reallocations.
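To make the value of these hints concrete, here is a minimal sketch that is not part of the Canonical ABI definitions. It assumes a UTF-8 destination and computes, from the source encoding and byte length alone, an optimistic and a worst-case destination size without looking at the string contents. The helper name `utf8_size_bounds` is hypothetical and introduced only for illustration; the size factors are the usual bounds for transcoding to UTF-8.

```python
# Illustrative only: `utf8_size_bounds` is a hypothetical helper, not one of
# the Canonical ABI definitions.
def utf8_size_bounds(src_encoding, src_byte_length):
    """Return (optimistic, worst_case) byte sizes for a UTF-8 destination."""
    if src_encoding == 'utf8':
        # Same encoding on both sides: the hint is exact.
        return (src_byte_length, src_byte_length)
    if src_encoding == 'utf16':
        # One UTF-8 byte per UTF-16 code unit if the text is ASCII,
        # and never more than three bytes per code unit.
        code_units = src_byte_length // 2
        return (code_units, code_units * 3)
    if src_encoding == 'latin1':
        # One UTF-8 byte per Latin-1 byte if the text is ASCII,
        # and never more than two bytes per Latin-1 byte.
        return (src_byte_length, src_byte_length * 2)
    raise ValueError('unknown source encoding: ' + src_encoding)

# A five-character Latin-1 string with one non-ASCII character.
s = 'h\u00e9llo'
optimistic, worst_case = utf8_size_bounds('latin1', len(s.encode('latin-1')))
actual = len(s.encode('utf-8'))
print(optimistic, actual, worst_case)
```

With bounds like these, a single optimistic allocation suffices whenever the text is ASCII, and at most one growing reallocation (to the worst case) plus one shrinking reallocation is needed otherwise.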
+ +We start with a case analysis to enumerate all the meaningful encoding +combinations, subdividing the `latin1+utf16` encoding into either `latin1` or +`utf16` based on the `UTF16_BIT` flag set by `load_string`: +```python +def store_string(opts, v, ptr): + begin, packed_byte_length = store_string_into_range(opts, v) + store_int(opts, begin, ptr, 4) + store_int(opts, packed_byte_length, ptr + 4, 4) + +def store_string_into_range(opts, v): + src, src_encoding, src_packed_byte_length = v + + if src_encoding == 'latin1+utf16': + if bool(src_packed_byte_length & UTF16_BIT): + src_byte_length = src_packed_byte_length ^ UTF16_BIT + src_unpacked_encoding = 'utf16' + else: + src_byte_length = src_packed_byte_length + src_unpacked_encoding = 'latin1' + else: + src_byte_length = src_packed_byte_length + src_unpacked_encoding = src_encoding + + match opts.string_encoding: + case 'utf8': + match src_unpacked_encoding: + case 'utf8' : return store_string_copy(opts, src, src_byte_length, 'utf-8') + case 'utf16' : return store_utf16_to_utf8(opts, src, src_byte_length) + case 'latin1' : return store_latin1_to_utf8(opts, src, src_byte_length) + case 'utf16': + match src_unpacked_encoding: + case 'utf8' : return store_utf8_to_utf16(opts, src, src_byte_length) + case 'utf16' : return store_string_copy(opts, src, src_byte_length, 'utf-16-le') + case 'latin1' : return store_string_copy(opts, src, src_byte_length, 'utf-16-le', inflation = 2) + case 'latin1+utf16': + match src_encoding: + case 'utf8' : return store_utf8_to_latin1_or_utf16(opts, src, src_byte_length) + case 'utf16' : return store_utf16_to_latin1_or_utf16(opts, src, src_byte_length) + case 'latin1+utf16' : + match src_unpacked_encoding: + case 'latin1' : return store_string_copy(opts, src, src_byte_length, 'latin-1') + case 'utf16' : return store_probably_utf16_to_latin1_or_utf16(opts, src, src_byte_length) +``` + +The simplest 4 cases above can compute the exact destination size and then copy +with a simple loop (that
possibly inflates Latin-1 to UTF-16 by injecting a 0 +byte after every Latin-1 byte). +```python +MAX_STRING_BYTE_LENGTH = (1 << 31) - 1 + +def store_string_copy(opts, src, src_byte_length, dst_encoding, inflation = 1): + byte_length = src_byte_length * inflation + trap_if(byte_length > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 1, byte_length) + encoded = src.encode(dst_encoding) + assert(byte_length == len(encoded)) + opts.memory[ptr : ptr+len(encoded)] = encoded + return (ptr, byte_length) +``` +The choice of `MAX_STRING_BYTE_LENGTH` constant ensures that the high bit of a +string's byte length is never set, keeping it clear for `UTF16_BIT`. + +The next 3 cases can all be mapped down to a generic transcoding algorithm that +makes an initial optimistic size allocation that falls back to a second worst-case +size reallocation that is "fixed up" at the end with a third (hopefully O(1)) +shrinking reallocation. +```python +def store_utf16_to_utf8(opts, src, src_byte_length): + optimistic_size = int(src_byte_length / 2) + worst_case_size = optimistic_size * 3 + return store_string_transcode(opts, src, 'utf-8', optimistic_size, worst_case_size) + +def store_latin1_to_utf8(opts, src, src_byte_length): + optimistic_size = src_byte_length + worst_case_size = optimistic_size * 2 + return store_string_transcode(opts, src, 'utf-8', optimistic_size, worst_case_size) + +def store_utf8_to_utf16(opts, src, src_byte_length): + optimistic_size = src_byte_length * 2 + worst_case_size = optimistic_size + return store_string_transcode(opts, src, 'utf-16-le', optimistic_size, worst_case_size) + +def store_string_transcode(opts, src, dst_encoding, optimistic_size, worst_case_size): + trap_if(optimistic_size > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 1, optimistic_size) + encoded = src.encode(dst_encoding) + bytes_copied = min(len(encoded), optimistic_size) + opts.memory[ptr : ptr+bytes_copied] = encoded[0 : bytes_copied] + if bytes_copied < optimistic_size: + ptr = 
opts.realloc(ptr, optimistic_size, 1, bytes_copied) + elif bytes_copied < len(encoded): + trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(ptr, optimistic_size, 1, worst_case_size) + opts.memory[ptr+bytes_copied : ptr+len(encoded)] = encoded[bytes_copied : ] + if worst_case_size > len(encoded): + ptr = opts.realloc(ptr, worst_case_size, 1, len(encoded)) + return (ptr, len(encoded)) +``` + +The remaining cases handle the `latin1+utf16` encoding, where the general +goal is to fit the incoming string into Latin-1 if possible based on the code +points of the incoming string. The UTF-8 and UTF-16 cases are similar to the +preceding transcoding algorithm in that they make a best-effort optimistic +allocation, speculating that all code points *do* fit into Latin-1, before +falling back to a worst-case allocation size when a code point is found outside +Latin-1. In this fallback case, the previously-stored Latin-1 bytes are +inflated *in place*, inserting a 0 byte after every Latin-1 byte (iterating +in reverse to avoid clobbering later bytes): +```python +def store_utf8_to_latin1_or_utf16(opts, src, src_byte_length): + optimistic_size = src_byte_length + worst_case_size = 2 * src_byte_length + return store_string_to_latin1_or_utf16(opts, src, optimistic_size, worst_case_size) + +def store_utf16_to_latin1_or_utf16(opts, src, src_byte_length): + optimistic_size = int(src_byte_length / 2) + worst_case_size = src_byte_length + return store_string_to_latin1_or_utf16(opts, src, optimistic_size, worst_case_size) + +def store_string_to_latin1_or_utf16(opts, src, optimistic_size, worst_case_size): + trap_if(optimistic_size > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 1, optimistic_size) + dst_byte_length = 0 + for usv in src: + if ord(usv) < (1 << 8): + opts.memory[ptr + dst_byte_length] = ord(usv) + dst_byte_length += 1 + else: + trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(ptr, optimistic_size, 1, worst_case_size) + for j
in range(dst_byte_length-1, -1, -1): + opts.memory[ptr + 2*j] = opts.memory[ptr + j] + opts.memory[ptr + 2*j + 1] = 0 + encoded = src.encode('utf-16-le') + opts.memory[ptr+2*dst_byte_length : ptr+len(encoded)] = encoded[2*dst_byte_length : ] + if worst_case_size > len(encoded): + ptr = opts.realloc(ptr, worst_case_size, 1, len(encoded)) + return (ptr, len(encoded) | UTF16_BIT) + if dst_byte_length < optimistic_size: + ptr = opts.realloc(ptr, optimistic_size, 1, dst_byte_length) + return (ptr, dst_byte_length) +``` + +The final string transcoding case takes advantage of the extra heuristic +information that the incoming UTF-16 bytes were intentionally chosen over +Latin-1 by the producer, indicating that they *probably* contain code points +outside Latin-1 and thus *probably* require inflation. Based on this +information, the transcoding algorithm pessimistically allocates storage for +UTF-16, deflating at the end if indeed no non-Latin-1 code points were +encountered. This Latin-1 deflation ensures that if a group of components +are all using `latin1+utf16` and *one* component over-uses UTF-16, other +components can recover the Latin-1 compression. (The Latin-1 check can be +inexpensively fused with the UTF-16 validate+copy loop.) +```python +def store_probably_utf16_to_latin1_or_utf16(opts, src, src_byte_length): + trap_if(src_byte_length > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 1, src_byte_length) + encoded = src.encode('utf-16-le') + opts.memory[ptr : ptr+len(encoded)] = encoded + if any(ord(c) >= (1 << 8) for c in src): + return (ptr, len(encoded) | UTF16_BIT) + latin1_size = int(len(encoded) / 2) + for i in range(latin1_size): + opts.memory[ptr + i] = opts.memory[ptr + 2*i] + ptr = opts.realloc(ptr, src_byte_length, 1, latin1_size) + return (ptr, latin1_size) +``` + +Lists and records are stored by recursively storing their elements and +are symmetric to the loading functions. 
Unlike strings, lists can +simply allocate based on the up-front knowledge of length and static +element size. +```python +def store_list(opts, v, ptr, elem_type): + begin, length = store_list_into_range(opts, v, elem_type) + store_int(opts, begin, ptr, 4) + store_int(opts, length, ptr + 4, 4) + +def store_list_into_range(opts, v, elem_type): + byte_length = len(v) * elem_size(elem_type) + trap_if(byte_length >= (1 << 32)) + ptr = opts.realloc(0, 0, alignment(elem_type), byte_length) + trap_if(ptr + byte_length > len(opts.memory)) + for i,e in enumerate(v): + store(opts, e, elem_type, ptr + i * elem_size(elem_type)) + return (ptr, len(v)) + +def store_record(opts, v, ptr, fields): + for f in fields: + ptr = align_to(ptr, alignment(f.t)) + store(opts, v[f.label], f.t, ptr) + ptr += byte_size(f.t) +``` + +Variants are stored using the `|`-separated list of `defaults-to` cases built +by `case_label_with_defaults` (above) to iteratively find a matching case (which +validation guarantees will succeed). While this code appears to do O(n) string +matching, a normal implementation can statically fuse `store_variant` with its +matching `load_variant` to ultimately build a dense array that maps the producer's +case indices to the consumer's case indices.
+```python +def store_variant(opts, v, ptr, cases): + case_index, case_value = match_case(v, cases) + disc_size = byte_size(discriminant_type(cases)) + store_int(opts, case_index, ptr, disc_size) + ptr += disc_size + ptr = align_to(ptr, max_alignment(types_of(cases))) + store(opts, case_value, cases[case_index].t, ptr) + +def match_case(v, cases): + assert(len(v.keys()) == 1) + key = list(v.keys())[0] + value = list(v.values())[0] + for label in key.split('|'): + case_index = find_case(label, cases) + if case_index != -1: + return (case_index, value) +``` + +Finally, flags are converted from a dictionary to a bit-vector by iterating +through the case-labels of the variant in the order they were listed in the +type definition and OR-ing all the bits together. Flag lifting/lowering can be +statically fused into array/integer operations (with a simple byte copy when +the case lists are the same) to avoid any string operations in a similar manner +to variants. +```python +def store_flags(opts, v, ptr, labels): + i = pack_flags_into_int(v, labels) + store_int(opts, i, ptr, byte_size_flags(labels)) + +def pack_flags_into_int(v, labels): + i = 0 + shift = 0 + for l in labels: + i |= (int(bool(v[l])) << shift) + shift += 1 + return i +``` + +### Flattening + +With only the definitions above, the Canonical ABI would be forced to place all +parameters and results in linear memory. While this is necessary in the general +case, in many cases performance can be improved by passing small-enough values +in registers by using core function parameters and results. To support this +optimization, the Canonical ABI defines `flatten` to map interface function +types to core function types by attempting to decompose all the +non-dynamically-sized interface types into core parameters and results. + +For a variety of [practical][Implementation Limits] reasons, we need to limit +the total number of flattened parameters and results, falling back to storing +everything in linear memory. 
The number of flattened results is currently +limited to 1 due to various parts of the toolchain (notably LLVM) not yet fully +supporting [multi-value]. Hopefully this limitation is temporary and can be +lifted before the Component Model is fully standardized. + +When there are too many flat values, in general, a single `i32` pointer can be +passed instead (pointing to a tuple in linear memory). When lowering *into* +linear memory, this requires the Canonical ABI to call `realloc` (in `lower` +below) to allocate space to put the tuple. As an optimization, when lowering +the return value of an imported function (lowered by `canon.lower`), the caller +can have already allocated space for the return value (e.g., efficiently on the +stack), passing in an `i32` pointer as a parameter instead of returning an +`i32` as a return value. + +Given all this, the top-level definition of `flatten` is: +```python +MAX_FLAT_PARAMS = 16 +MAX_FLAT_RESULTS = 1 + +def flatten(functype, context): + flat_params = flatten_types(functype.params) + if len(flat_params) > MAX_FLAT_PARAMS: + flat_params = ['i32'] + + flat_results = flatten_type(functype.result) + if len(flat_results) > MAX_FLAT_RESULTS: + match context: + case 'canon.lift': + flat_results = ['i32'] + case 'canon.lower': + flat_params += ['i32'] + flat_results = [] + + return { 'params': flat_params, 'results': flat_results } + +def flatten_types(ts): + return [ft for t in ts for ft in flatten_type(t)] +``` + +Presenting the definition of `flatten_type` piecewise, we start with the +top-level case analysis: +```python +def flatten_type(t): + match despecialize(t): + case Bool() : return ['i32'] + case U8() | U16() | U32() : return ['i32'] + case S8() | S16() | S32() : return ['i32'] + case S64() | U64() : return ['i64'] + case Float32() : return ['f32'] + case Float64() : return ['f64'] + case Char() : return ['i32'] + case String() | List(_) : return ['i32', 'i32'] + case Record(fields) : return 
flatten_types(types_of(fields)) + case Variant(cases) : return flatten_variant(cases) + case Flags(labels) : return ['i32'] * num_i32_flags(labels) +``` + +Variant flattening is more involved due to the fact that each case payload can +have a totally different flattening. Rather than giving up when there is a type +mismatch, the Canonical ABI relies on the fact that the 4 core value types can +be easily bit-cast between each other and defines a `join` operator to pick the +tightest approximation. What this means is that, regardless of the dynamic +case, all flattened variants are passed with the same static set of core types, +which may involve, e.g., reinterpreting an `f32` as an `i32` or zero-extending +an `i32` into an `i64`. +```python +def flatten_variant(cases): + flat = [] + for c in cases: + for i,ft in enumerate(flatten_type(c.t)): + if i < len(flat): + flat[i] = join(flat[i], ft) + else: + flat.append(ft) + return flatten_type(discriminant_type(cases)) + flat + +def join(a, b): + if a == b: return a + if (a == 'i32' and b == 'f32') or (a == 'f32' and b == 'i32'): return 'i32' + return 'i64' +``` + +### Flat Lifting + +The `lift_flat` function defines how to convert zero or more core values into a +single high-level value of interface type `t`. The values are given by a value +iterator that iterates over a complete parameter or result list and asserts +that the expected and actual types line up. 
Presenting the definition of +`lift_flat` piecewise, we start with the top-level case analysis: +```python +@dataclass +class Value: + t: str # 'i32'|'i64'|'f32'|'f64' + v: int|float + +@dataclass +class ValueIter: + values: [Value] + i = 0 + def next(self, t): + v = self.values[self.i] + self.i += 1 + assert(v.t == t) + return v.v + +def lift_flat(opts, vi, t): + match despecialize(t): + case Bool() : return bool(vi.next('i32')) + case U8() : return lift_flat_unsigned(vi, 32, 8) + case U16() : return lift_flat_unsigned(vi, 32, 16) + case U32() : return lift_flat_unsigned(vi, 32, 32) + case U64() : return lift_flat_unsigned(vi, 64, 64) + case S8() : return lift_flat_signed(vi, 32, 8) + case S16() : return lift_flat_signed(vi, 32, 16) + case S32() : return lift_flat_signed(vi, 32, 32) + case S64() : return lift_flat_signed(vi, 64, 64) + case Float32() : return canonicalize(vi.next('f32')) + case Float64() : return canonicalize(vi.next('f64')) + case Char() : return i32_to_char(opts, vi.next('i32')) + case String() : return lift_flat_string(opts, vi) + case List(t) : return lift_flat_list(opts, vi, t) + case Record(fields) : return lift_flat_record(opts, vi, fields) + case Variant(cases) : return lift_flat_variant(opts, vi, cases) + case Flags(labels) : return lift_flat_flags(vi, labels) +``` + +Integers are lifted from core `i32` or `i64` values using the signedness of the +interface type to interpret the high-order bit. When the interface type is +narrower than an `i32`, the Canonical ABI specifies a dynamic range check in +order to catch bugs. The conversion logic here assumes that `i32` values are +always represented as unsigned Python `int`s and thus lifting to a signed type +performs a manual 2s complement conversion in the Python (which would be a +no-op in hardware). 
+```python +def lift_flat_unsigned(vi, core_width, t_width): + i = vi.next('i' + str(core_width)) + assert(0 <= i < (1 << core_width)) + trap_if(i >= (1 << t_width)) + return i + +def lift_flat_signed(vi, core_width, t_width): + i = vi.next('i' + str(core_width)) + assert(0 <= i < (1 << core_width)) + if i >= (1 << (t_width - 1)): + i -= (1 << core_width) + trap_if(i < -(1 << (t_width - 1))) + return i + trap_if(i >= (1 << (t_width - 1))) + return i +``` + +The contents of strings and lists are always stored in memory so lifting these +types is essentially the same as loading them from memory; the only difference +is that the pointer and length come from `i32` values instead of from linear +memory: +```python +def lift_flat_string(opts, vi): + ptr = vi.next('i32') + packed_byte_length = vi.next('i32') + return load_string_from_range(opts, ptr, packed_byte_length) + +def lift_flat_list(opts, vi, elem_type): + ptr = vi.next('i32') + length = vi.next('i32') + return load_list_from_range(opts, ptr, length, elem_type) +``` + +Records are lifted by recursively lifting their fields: +```python +def lift_flat_record(opts, vi, fields): + record = {} + for f in fields: + record[f.label] = lift_flat(opts, vi, f.t) + return record +``` + +Variants are also lifted recursively. Lifting a variant must carefully follow +the definition of `flatten_variant` above, consuming the exact same core types +regardless of the dynamic case payload being lifted. 
Because of the `join` +performed by `flatten_variant`, we need a more-permissive value iterator that +reinterprets between the different types appropriately and also traps if the +high bits of an `i64` are set for a 32-bit type: +```python +def lift_flat_variant(opts, vi, cases): + flat_types = flatten_variant(cases) + assert(flat_types.pop(0) == 'i32') + disc = vi.next('i32') + trap_if(disc >= len(cases)) + case = cases[disc] + class CoerceValueIter: + def next(self, want): + have = flat_types.pop(0) + x = vi.next(have) + match (have, want): + case ('i32', 'f32') : return reinterpret_i32_as_float(x) + case ('i64', 'i32') : return narrow_i64_to_i32(x) + case ('i64', 'f32') : return reinterpret_i32_as_float(narrow_i64_to_i32(x)) + case ('i64', 'f64') : return reinterpret_i64_as_float(x) + case _ : return x + v = lift_flat(opts, CoerceValueIter(), case.t) + for have in flat_types: + _ = vi.next(have) + return { case_label_with_defaults(case, cases): v } + +def narrow_i64_to_i32(i): + trap_if(i >= (1 << 32)) + return i +``` + +Finally, flags are lifted by OR-ing together all the flattened `i32` values +and then lifting to a record the same way as when loading flags from linear +memory. The dynamic checks in `unpack_flags_from_int` will trap if any +bits are set in an `i32` that don't correspond to a flag. +```python +def lift_flat_flags(vi, labels): + i = 0 + shift = 0 + for _ in range(num_i32_flags(labels)): + i |= (vi.next('i32') << shift) + shift += 32 + return unpack_flags_from_int(i, labels) +``` + +### Flat Lowering + +The `lower_flat` function defines how to convert a value `v` of a given +interface type `t` into zero or more core values. 
Presenting the definition of +`lower_flat` piecewise, we start with the top-level case analysis: +```python +def lower_flat(opts, v, t): + match despecialize(t): + case Bool() : return [Value('i32', int(v))] + case U8() : return [Value('i32', v)] + case U16() : return [Value('i32', v)] + case U32() : return [Value('i32', v)] + case U64() : return [Value('i64', v)] + case S8() : return lower_flat_signed(v, 32) + case S16() : return lower_flat_signed(v, 32) + case S32() : return lower_flat_signed(v, 32) + case S64() : return lower_flat_signed(v, 64) + case Float32() : return [Value('f32', v)] + case Float64() : return [Value('f64', v)] + case Char() : return [Value('i32', char_to_i32(v))] + case String() : return lower_flat_string(opts, v) + case List(t) : return lower_flat_list(opts, v, t) + case Record(fields) : return lower_flat_record(opts, v, fields) + case Variant(cases) : return lower_flat_variant(opts, v, cases) + case Flags(labels) : return lower_flat_flags(v, labels) +``` + +Since interface-typed values are assumed to be in-range and, as previously stated, +core `i32` values are always internally represented as unsigned `int`s, +unsigned interface values need no extra conversion. 
Signed interface values are +converted to unsigned core `i32`s by 2s complement arithmetic (which again +would be a no-op in hardware): +```python +def lower_flat_signed(i, core_bits): + if i < 0: + i += (1 << core_bits) + return [Value('i' + str(core_bits), i)] +``` + +Since strings and lists are stored in linear memory, lowering can reuse the +previous definitions; only the resulting pointers are returned differently +(as `i32` values instead of as a pair in linear memory): +```python +def lower_flat_string(opts, v): + ptr, packed_byte_length = store_string_into_range(opts, v) + return [Value('i32', ptr), Value('i32', packed_byte_length)] + +def lower_flat_list(opts, v, elem_type): + (ptr, length) = store_list_into_range(opts, v, elem_type) + return [Value('i32', ptr), Value('i32', length)] +``` + +Records are lowered by recursively lowering their fields: +```python +def lower_flat_record(opts, v, fields): + flat = [] + for f in fields: + flat += lower_flat(opts, v[f.label], f.t) + return flat +``` + +Variants are also lowered recursively. 
Symmetric to `lift_flat_variant` above, +`lower_flat_variant` must consume all flattened types of `flatten_variant`, +manually coercing the otherwise-incompatible type pairings allowed by `join`: +```python +def lower_flat_variant(opts, v, cases): + case_index, case_value = match_case(v, cases) + flat_types = flatten_variant(cases) + assert(flat_types.pop(0) == 'i32') + payload = lower_flat(opts, case_value, cases[case_index].t) + for i,have in enumerate(payload): + want = flat_types.pop(0) + match (have.t, want): + case ('f32', 'i32') : payload[i] = Value('i32', reinterpret_float_as_i32(have.v)) + case ('i32', 'i64') : payload[i] = Value('i64', have.v) + case ('f32', 'i64') : payload[i] = Value('i64', reinterpret_float_as_i32(have.v)) + case ('f64', 'i64') : payload[i] = Value('i64', reinterpret_float_as_i64(have.v)) + case _ : pass + for want in flat_types: + payload.append(Value(want, 0)) + return [Value('i32', case_index)] + payload +``` + +Finally, flags are lowered by slicing the bit vector into `i32` chunks: +```python +def lower_flat_flags(v, labels): + i = pack_flags_into_int(v, labels) + flat = [] + for _ in range(num_i32_flags(labels)): + flat.append(Value('i32', i & 0xffffffff)) + i >>= 32 + assert(i == 0) + return flat +``` + +### Lifting and Lowering + +The `lift` function defines how to lift a list of at most `max_flat` core +parameters or results given by the `ValueIter` `vi` into a tuple of interface +values with types `ts`: +```python +def lift(opts, max_flat, vi, ts): + flat_types = flatten_types(ts) + if len(flat_types) > max_flat: + return list(load(opts, vi.next('i32'), Tuple(ts)).values()) + else: + return [ lift_flat(opts, vi, t) for t in ts ] +``` + +The `lower` function defines how to lower a list of interface values `vs` of +types `ts` into a list of at most `max_flat` core values. 
As already described +for [`flatten`](#flattening) above, lowering handles the +greater-than-`max_flat` case by either allocating storage with `realloc` or +accepting a caller-allocated buffer as an out-param: +```python +def lower(opts, max_flat, vs, ts, out_param = None): + flat_types = flatten_types(ts) + if len(flat_types) > max_flat: + tuple_type = Tuple(ts) + tuple_value = {str(i): v for i,v in enumerate(vs)} + if out_param is None: + ptr = opts.realloc(0, 0, alignment(tuple_type), byte_size(tuple_type)) + else: + ptr = out_param.next('i32') + store(opts, tuple_value, tuple_type, ptr) + return [ Value('i32', ptr) ] + else: + flat_vals = [] + for i in range(len(vs)): + flat_vals += lower_flat(opts, vs[i], ts[i]) + return flat_vals +``` + +## Canonical ABI built-ins + +Using the above supporting definitions, we can describe the static and dynamic +semantics of [`func`], whose AST is defined in the main explainer as: +``` +func ::= (func ? ) +funcbody ::= (canon.lift * ) + | (canon.lower * ) +``` +The following subsections define the static and dynamic semantics of each +case of `funcbody`. + + +### `canon.lift` + +For a function: +``` +(func $f (canon.lift $ft: $opts:* $callee:)) +``` +validation specifies: + * `$callee` must have type `flatten($ft, 'canon.lift')` + * `$f` is given type `$ft` + +When instantiating component instance `$inst`: +* Define `$f` to be the closure `lambda args: canon_lift($opts, $inst, $callee, $ft, args)` + +Thus, `$f` captures `$opts`, `$inst`, `$callee` and `$ft` in a closure which can be +subsequently exported or passed into a child instance (via `with`). If `$f` +ends up being called by the host, the host is responsible for, in a +host-defined manner, conjuring up interface values suitable for passing into +`lower` and, conversely, consuming the interface values produced by `lift`. 
For +example, if the host is a native JS runtime, the [JavaScript embedding] would +specify how native JavaScript values are converted to and from interface +values. Alternatively, if the host is a Unix CLI that invokes component exports +directly from the command line, the CLI could choose to automatically parse +`argv` into interface values according to the declared interface types of the +export. In any case, `canon.lift` specifies how these variously-produced +interface values are consumed as parameters (and produced as results) by a +*single host-agnostic component*. + +The `$inst` captured above is assumed to have at least the following two fields, +which are used to implement the [component invariants]: +```python +class Instance: + may_leave = True + may_enter = True + # ... +``` +The `may_leave` state indicates whether the instance may call out to an import +and the `may_enter` state indicates whether the instance may be called from +the outside world through an export. + +Given the above closure arguments, `canon_lift` is defined: +```python +def canon_lift(callee_opts, callee_instance, callee, functype, args): + trap_if(not callee_instance.may_enter) + + assert(callee_instance.may_leave) + callee_instance.may_leave = False + flat_args = lower(callee_opts, MAX_FLAT_PARAMS, args, functype.params) + callee_instance.may_leave = True + + try: + flat_results = callee(flat_args) + except CoreWebAssemblyException: + trap() + + callee_instance.may_enter = False + [result] = lift(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), [functype.result]) + def post_return(): + callee_instance.may_enter = True + callee_opts.post_return() + + return (result, post_return) +``` +There are a number of things to note about this definition: + +Uncaught Core WebAssembly [exceptions] result in a trap at component +boundaries. 
Thus, if a component wishes to signal an error, it must +use some sort of explicit interface type such as `expected` (whose `error` case +particular language bindings may choose to map to and from exceptions). + +The contract assumed by `canon_lift` (and ensured by `canon_lower` below) is +that the caller of `canon_lift` *must* call `post_return` right after lowering +`result`. This ordering ensures that the engine can reliably copy directly from +the callee's linear memory (read by `lift`) into the caller's linear memory +(written by `lower`). If `post_return` were called earlier (e.g., before +`canon_lift` returned), the callee's linear memory would have already been +freed and so the engine would need to eagerly make an intermediate copy in +`lift`. + +Even assuming this `post_return` contract, if the callee could be re-entered +by the caller in the middle of the caller's `lower` (e.g., via `realloc`), then +either the engine has to make an eager intermediate copy in `lift` *or* the +Canonical ABI would have to specify a precise interleaving of side effects +which is more complicated and would inhibit some optimizations. Instead, the +`may_enter` guard set before `lift` and cleared in `post_return` prevents this +re-entrance. Thus, it is the combination of `post_return` and the re-entrance +guard that ensures `lift` does not need to make an eager copy. + +The `may_leave` guard wrapping the lowering of parameters conservatively +ensures that `realloc` calls during lowering do not accidentally call imports +that re-enter the instance that lifted the same parameters. +While the `may_enter` guards of *those* component instances would also prevent +this re-entrance, it would be an error that only manifested in certain +component linking configurations, hence the eager error helps ensure +compositionality. 
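
As a rough illustration of the re-entrance guard described above, the following self-contained sketch strips `canon_lift` down to its `may_enter` bookkeeping. The `Trap` exception, the `guarded_call` helper and its `during_lift` callback are hypothetical stand-ins that do not appear in the definitions above, and all actual lifting and lowering is omitted.

```python
class Trap(Exception):
  pass

def trap_if(cond):
  if cond:
    raise Trap()

class Instance:
  def __init__(self):
    self.may_leave = True
    self.may_enter = True

# Hypothetical stand-in for canon_lift with lifting/lowering elided.
def guarded_call(inst, callee, during_lift):
  trap_if(not inst.may_enter)
  result = callee()
  inst.may_enter = False   # guard is active from here until post_return runs
  during_lift()            # stands in for work done before post_return is called
  def post_return():
    inst.may_enter = True
  return (result, post_return)

inst = Instance()

# A normal call: the guard is held until the caller invokes post_return.
result, post_return = guarded_call(inst, lambda: 42, lambda: None)
assert not inst.may_enter
post_return()
assert inst.may_enter

# A re-entrance attempt while the guard is held traps immediately.
def reenter():
  guarded_call(inst, lambda: 0, lambda: None)

try:
  guarded_call(inst, lambda: 1, reenter)
  reentered = True
except Trap:
  reentered = False
```

In this sketch the instance is left with `may_enter` unset after the trap, mirroring the idea that a trapped call never reaches `post_return`.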
+ + +### `canon.lower` + +For a function: +``` +(func $f (canon.lower $opts:* $callee:)) +``` +where `$callee` has type `$ft`, validation specifies: +* `$f` is given type `flatten($ft, 'canon.lower')` + +When instantiating component instance `$inst`: +* Define `$f` to be the closure: `lambda args: canon_lower($opts, $inst, $callee, $ft, args)` + +Thus, from the perspective of Core WebAssembly, `$f` is a [function instance] +containing a `hostfunc` that closes over `$opts`, `$inst`, `$callee` and `$ft` +and, when called from Core WebAssembly code, calls `canon_lower`, which is defined as: +```python +def canon_lower(caller_opts, caller_instance, callee, functype, flat_args): + trap_if(not caller_instance.may_leave) + + assert(caller_instance.may_enter) + caller_instance.may_enter = False + + flat_args = ValueIter(flat_args) + args = lift(caller_opts, MAX_FLAT_PARAMS, flat_args, functype.params) + + result, post_return = callee(args) + + caller_instance.may_leave = False + flat_results = lower(caller_opts, MAX_FLAT_RESULTS, [result], [functype.result], flat_args) + caller_instance.may_leave = True + + post_return() + + caller_instance.may_enter = True + return flat_results +``` +The definitions of `canon_lift` and `canon_lower` are mostly symmetric (swapping +lifting and lowering), with a few exceptions: +* The calling instance cannot be re-entered over the course of the entire call, + not just while lifting the parameters. This ensures not just the needs of the + Canonical ABI, but the general non-re-entrance expectations outlined in the + [component invariants]. +* The caller does not need a `post-return` function since the Core WebAssembly + caller simply regains control when `canon_lower` returns, allowing it to free + (or not) any memory passed as `flat_args`. +* When handling the too-many-flat-values case, instead of relying on `realloc`, + the caller passes in a pointer to caller-allocated memory as a final + `i32` parameter. 
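
To make the last difference concrete, here is a minimal sketch of how the too-many-results case changes the core signature in each direction. It mirrors the shape of `flatten` from the Flattening section, but `flatten_sketch` is a hypothetical helper that takes already-flattened lists of core type names rather than interface types, so it should be read as an illustration under that assumption rather than as the definition itself.

```python
MAX_FLAT_PARAMS = 16
MAX_FLAT_RESULTS = 1

# Hypothetical helper: inputs are already-flattened core type names.
def flatten_sketch(flat_params, flat_results, context):
  flat_params = list(flat_params)
  flat_results = list(flat_results)
  if len(flat_params) > MAX_FLAT_PARAMS:
    flat_params = ['i32']        # pointer to a tuple in linear memory
  if len(flat_results) > MAX_FLAT_RESULTS:
    if context == 'canon.lift':
      flat_results = ['i32']     # callee returns a pointer to the results
    elif context == 'canon.lower':
      flat_params += ['i32']     # caller passes an out-pointer as a final param
      flat_results = []
  return { 'params': flat_params, 'results': flat_results }

# (func (param string) (result string)): each string flattens to (ptr, length).
lifted  = flatten_sketch(['i32', 'i32'], ['i32', 'i32'], 'canon.lift')
lowered = flatten_sketch(['i32', 'i32'], ['i32', 'i32'], 'canon.lower')
```

With these inputs, the lifted signature keeps two parameters and returns a single `i32` pointer, while the lowered signature gains a third `i32` parameter and has no results.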
+ +A useful consequence of the above rules for `may_enter` and `may_leave` is that +attempting to `canon.lower` to a `callee` in the same instance is a guaranteed, +immediate trap which a link-time compiler can eagerly compile to an +`unreachable`. This avoids what would otherwise be a surprising form of memory +aliasing that could introduce obscure bugs. + +The net effect here is that any cross-component call necessarily +transits through a composed `canon_lower`/`canon_lift` pair, allowing a link-time +compiler to fuse the lifting/lowering steps of these two definitions into a +single, efficient trampoline. This fusion model allows efficient compilation of +the permissive [subtyping](Subtyping.md) allowed between components (including +the elimination of string operations on the labels of records and variants) as +well as post-MVP [adapter functions]. + + +[Function Definitions]: Explainer.md#function-definitions +[`canonopt`]: Explainer.md#function-definitions +[`func`]: Explainer.md#function-definitions +[Type Definitions]: Explainer.md#type-definitions +[Component Invariants]: Explainer.md#component-invariants +[JavaScript Embedding]: Explainer.md#JavaScript-embedding +[Adapter Functions]: FutureFeatures.md#custom-abis-via-adapter-functions + +[Administrative Instructions]: https://webassembly.github.io/spec/core/exec/runtime.html#syntax-instr-admin +[Implementation Limits]: https://webassembly.github.io/spec/core/appendix/implementation.html +[Function Instance]: https://webassembly.github.io/spec/core/exec/runtime.html#function-instances + +[Multi-value]: https://github.com/WebAssembly/multi-value/blob/master/proposals/multi-value/Overview.md +[Exceptions]: https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md + +[Alignment]: https://en.wikipedia.org/wiki/Data_structure_alignment +[Unicode Scalar Value]: https://unicode.org/glossary/#unicode_scalar_value +[Unicode Code Point]: 
https://unicode.org/glossary/#code_point +[Surrogate]: https://unicode.org/faq/utf_bom.html#utf16-2 diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 422c6ae..087b645 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -332,7 +332,7 @@ intertype ::= unit | bool | float32 | float64 | char | string | (record (field )*) - | (variant (case (defaults-to )?)*) + | (variant (case (defaults-to )?)+) | (list ) | (tuple *) | (flags *) @@ -359,7 +359,6 @@ Starting with interface types, the set of values allowed for the *fundamental* interface types is given by the following table: | Type | Values | | ------------------------- | ------ | -| `unit` | just one [uninteresting value] | | `bool` | `true` and `false` | | `s8`, `s16`, `s32`, `s64` | integers in the range [-2N-1, 2N-1-1] | | `u8`, `u16`, `u32`, `u64` | integers in the range [0, 2N-1] | @@ -372,14 +371,20 @@ interface types is given by the following table: The sets of values allowed for the remaining *specialized* interface types are defined by the following mapping: ``` - string ↦ (list char) (tuple *) ↦ (record (field "𝒊" )*) for 𝒊=0,1,... (flags *) ↦ (record (field bool)*) - (enum *) ↦ (variant (case unit)*) + unit ↦ (record) + (enum +) ↦ (variant (case unit)+) (option ) ↦ (variant (case "none") (case "some" )) - (union *) ↦ (variant (case "𝒊" )*) for 𝒊=0,1,... + (union +) ↦ (variant (case "𝒊" )+) for 𝒊=0,1,... (expected ) ↦ (variant (case "ok" ) (case "error" )) + string ↦ (list char) ``` +Note that, at least initially, variants are required to have a non-empty list of +cases. This could be relaxed in the future to allow an empty list of cases, with +the empty `(variant)` effectively serving as a [bottom type] and indicating +unreachability. + Building on these interface types, there are four kinds of types describing the four kinds of importable/exportable component definitions. (In the future, a fifth type will be added for [resource types][Resource and Handle Types].) 
@@ -456,78 +461,109 @@ Note that the inline use of `$G` and `$U` are inline `outer` aliases. ### Function Definitions -To implement or call functions of type [`functype`](#type-definitions), we need -to be able to call across a shared-nothing boundary. Traditionally, this -problem is solved by defining a serialization format for copying data across -the boundary. The Component Model MVP takes roughly this same approach, -defining a linear-memory-based [ABI] called the *Canonical ABI* which -specifies, for any imported or exported `functype`, a corresponding -`core:functype` and rules for copying values into or out of linear memory. The -Component Model differs from traditional approaches, though, in that the ABI is -configurable, allowing different memory representations for the same abstract -value. In the MVP, this configurability is limited to the small set of -`canonopt` shown below. However, Post-MVP, [adapter functions] could be added -to allow far more programmatic control. - -The Canonical ABI, which is described in a separate [explainer](CanonicalABI.md), -is explicitly applied to "wrap" existing functions in one of two directions: -* `canon.lift` wraps a Core WebAssembly function (of type `core:functype`) - inside the current component to produce a Component Model function (of type - `functype`) that can be exported to other components. -* `canon.lower` wraps a Component Model function (of type `functype`) that can - have been imported from another component to produce a Core WebAssembly - function (of type `core:functype`) that can be imported and called from Core - WebAssembly code within the current component. - -Based on this, MVP function definitions simply specify one of these two -wrapping directions along with a set of Canonical ABI configurations. +To implement or call interface-typed functions, we need to be able to cross a +shared-nothing boundary. 
Traditionally, this problem is solved by defining a +serialization format for copying data across the boundary. The Component Model +MVP takes roughly this same approach, defining a linear-memory-based [ABI] +called the "Canonical ABI" which specifies, for any interface function type, a +[corresponding](CanonicalABI.md#flattening) core function type and +[rules](CanonicalABI.md#lifting-and-lowering) for copying values into or out of +linear memory. The Component Model differs from traditional approaches, though, +in that the ABI is configurable, allowing different memory representations for +the same abstract value. In the MVP, this configurability is limited to the +small set of `canonopt` shown below. However, Post-MVP, [adapter functions] +could be added to allow far more programmatic control. + +The Canonical ABI is explicitly applied to "wrap" existing functions in one of +two directions: +* `canon.lift` wraps a core function (of type `core:functype`) inside the + current component to produce a component function (of type `functype`) + that can be exported to other components. +* `canon.lower` wraps a component function (of type `functype`) that can + have been imported from another component to produce a core function (of type + `core:functype`) that can be imported and called from Core WebAssembly code + within the current component. + +Function definitions specify one of these two wrapping directions along with a +set of Canonical ABI configuration options. ``` func ::= (func ? ) funcbody ::= (canon.lift * ) | (canon.lower * ) -canonopt ::= string=utf8 - | string=utf16 - | string=latin1+utf16 - | (into ) -``` -Validation fails if multiple conflicting options, such as two `string` -encodings, are given. The `latin1+utf16` encoding is [defined](CanonicalABI.md#latin1-utf16) -in the Canonical ABI explainer. If no string-encoding option is specified, the -default is `string=utf8`. 
- -The `into` option specifies a target instance which supplies the memory that -the canonical ABI should operate on as well as functions that the canonical ABI -can call to allocate, reallocate and free linear memory. Validation requires that -the given `instanceidx` is a module instance exporting the following fields: -``` -(export "memory" (memory 1)) -(export "realloc" (func (param i32 i32 i32 i32) (result i32))) -(export "free" (func (param i32 i32 i32))) -``` -The 4 parameters of `realloc` are: original allocation (or `0` for none), original -size (or `0` if none), alignment and new desired size. The 3 parameters of `free` -are the pointer, size and alignment. - -With this, we can finally write a non-trivial component that takes a string, -does some logging, then returns a string. +canonopt ::= string-encoding=utf8 + | string-encoding=utf16 + | string-encoding=latin1+utf16 + | (memory ) + | (realloc ) + | (post-return ) +``` +The `string-encoding` option specifies the encoding the Canonical ABI will use +for the `string` type. The `latin1+utf16` encoding captures a common string +encoding across Java, JavaScript and .NET VMs and allows a dynamic choice +between either Latin-1 (which has a fixed 1-byte encoding, but limited Code +Point range) or UTF-16 (which can express all Code Points, but uses either +2 or 4 bytes per Code Point). If no `string-encoding` option is specified, the +default is UTF-8. It is a validation error to include more than one +`string-encoding` option. + +The `(memory )` option specifies the memory that the Canonical ABI will +use to load and store values. If the Canonical ABI needs to load or store, +validation requires this option to be present (there is no default). 
+ +The `(realloc )` option specifies a core function that is validated to +have the following signature: +```wasm +(func (param $originalPtr i32) + (param $originalSize i32) + (param $alignment i32) + (param $newSize i32) + (result i32)) +``` +The Canonical ABI will use `realloc` both to allocate (passing `0` for the +first two parameters) and reallocate. If the Canonical ABI needs `realloc`, +validation requires this option to be present (there is no default). + +The `(post-return )` option may only be present in `canon.lift` and +specifies a core function to be called after the return value has been fully +read, giving a chance for the runtime to deallocate memory and/or call +destructors. This option is always optional but, if present, is validated to +have the empty function signature `(func)`. + +Based on this description of the AST, the [Canonical ABI explainer][Canonical ABI] +gives a detailed walkthrough of the static and dynamic semantics of +`canon.lift` and `canon.lower`. + +One high-level consequence of the dynamic semantics of `canon.lift` given in +the Canonical ABI explainer is that component functions are different from core +functions in that all control flow transfer is explicitly reflected in their +type. For example, with Core WebAssembly [exception handling] and +[stack switching], a core function with type `(func (result i32))` can return +an `i32`, throw, suspend or trap. In contrast, a component function with type +`(func (result string))` may only return a `string` or trap. To express +failure, component functions can return `expected` and languages with exception +handling can bind exceptions to the `error` case. Similarly, the forthcoming +addition of [future and stream types] would explicitly declare patterns of +stack-switching in component function signatures. + +Using function definitions, we can finally write a non-trivial component that +takes a string, does some logging, then returns a string. 
```wasm (component (import "wasi:logging" (instance $logging (export "log" (func (param string))) )) (import "libc" (module $Libc - (export "memory" (memory 1)) + (export "mem" (memory 1)) (export "realloc" (func (param i32 i32) (result i32))) - (export "free" (func (param i32))) )) (instance $libc (instantiate (module $Libc))) - (func $log - (canon.lower (into $libc) (func $logging "log")) - ) + (func $log (canon.lower + (memory (memory $libc "mem")) (realloc (func $libc "realloc")) + (func $logging "log") + )) (module $Main (import "libc" "memory" (memory 1)) (import "libc" "realloc" (func (param i32 i32) (result i32))) - (import "libc" "free" (func (param i32))) (import "wasi:logging" "log" (func $log (param i32 i32))) (func (export "run") (param i32 i32) (result i32 i32) ... (call $log) ... @@ -537,9 +573,11 @@ does some logging, then returns a string. (with "libc" (instance $libc)) (with "wasi:logging" (instance (export "log" (func $log)))) )) - (func (export "run") - (canon.lift (func (param string) (result string)) (into $libc) (func $main "run")) - ) + (func (export "run") (canon.lift + (func (param string) (result string)) + (memory (memory $libc "mem")) (realloc (func $libc "realloc")) + (func $main "run") + )) ) ``` This example shows the pattern of splitting out a reusable language runtime @@ -552,17 +590,6 @@ cyclic dependency between `canon.lower` and `$Main` that would have to be broken by the toolchain emitting an auxiliary module that broke the cycle using a shared `funcref` table and `call_indirect`. -Component Model functions are different from Core WebAssembly functions in that -all control flow transfer is explicitly reflected in their type (`functype`). -For example, with Core WebAssembly [exception handling] and [stack switching], -a `(func (result i32))` can return an `i32`, throw, suspend or trap. In -contrast, a Component Model `(func (result string))` may only return a `string` -or trap. 
To express failure, Component Model functions should return an -[`expected`](#type-definitions) type and languages with exception handling will -bind exceptions to the `error` case. Similarly, the future addition of -[future and stream types] would explicitly declare patterns of stack-switching -in Component Model function signatures. - ### Start Definitions @@ -597,7 +624,6 @@ exported string, all at instantiation time: (import "libc" (module $Libc (export "memory" (memory 1)) (export "realloc" (func (param i32 i32 i32 i32) (result i32))) - (export "free" (func (param i32 i32 i32))) )) (instance $libc (instantiate (module $Libc))) (module $Main @@ -607,9 +633,11 @@ exported string, all at instantiation time: ) ) (instance $main (instantiate (module $Main) (with "libc" (instance $libc)))) - (func $start - (canon.lift (func (param string) (result string)) (into $libc) (func $main "start")) - ) + (func $start (canon.lift + (func (param string) (result string)) + (memory (memory $libc "mem")) (realloc (func $libc "realloc")) + (func $main "start") + )) (start $start (value $name) (result (value $greeting))) (export "greeting" (value $greeting)) ) @@ -923,7 +951,7 @@ and will be added over the coming months to complete the MVP proposal: [De Bruijn Index]: https://en.wikipedia.org/wiki/De_Bruijn_index [Closure]: https://en.wikipedia.org/wiki/Closure_(computer_programming) -[Uninteresting Value]: https://en.wikipedia.org/wiki/Unit_type#In_programming_languages +[Bottom Type]: https://en.wikipedia.org/wiki/Bottom_type [IEEE754]: https://en.wikipedia.org/wiki/IEEE_754 [NaN]: https://en.wikipedia.org/wiki/NaN [Unicode Scalar Values]: https://unicode.org/glossary/#unicode_scalar_value @@ -933,12 +961,12 @@ and will be added over the coming months to complete the MVP proposal: [ABI]: https://en.wikipedia.org/wiki/Application_binary_interface [Environment Variables]: https://en.wikipedia.org/wiki/Environment_variable -[Module Linking]: 
https://github.com/webassembly/module-linking/ -[Interface Types]: https://github.com/webassembly/interface-types/ -[Type Imports and Exports]: https://github.com/WebAssembly/proposal-type-imports -[Exception Handling]: https://github.com/webAssembly/exception-handling -[Stack Switching]: https://github.com/WebAssembly/stack-switching -[ESM-integration]: https://github.com/WebAssembly/esm-integration +[Module Linking]: https://github.com/WebAssembly/module-linking/blob/main/design/proposals/module-linking/Explainer.md +[Interface Types]: https://github.com/WebAssembly/interface-types/blob/main/proposals/interface-types/Explainer.md +[Type Imports and Exports]: https://github.com/WebAssembly/proposal-type-imports/blob/master/proposals/type-imports/Overview.md +[Exception Handling]: https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md +[Stack Switching]: https://github.com/WebAssembly/stack-switching/blob/main/proposals/stack-switching/Overview.md +[ESM-integration]: https://github.com/WebAssembly/esm-integration/tree/main/proposals/esm-integration [Adapter Functions]: FutureFeatures.md#custom-abis-via-adapter-functions [Canonical ABI]: CanonicalABI.md diff --git a/design/mvp/canonical-abi/.gitignore b/design/mvp/canonical-abi/.gitignore new file mode 100644 index 0000000..c18dd8d --- /dev/null +++ b/design/mvp/canonical-abi/.gitignore @@ -0,0 +1 @@ +__pycache__/ diff --git a/design/mvp/canonical-abi/README.md b/design/mvp/canonical-abi/README.md new file mode 100644 index 0000000..04cf92e --- /dev/null +++ b/design/mvp/canonical-abi/README.md @@ -0,0 +1,5 @@ +# Canonical ABI Code + +This directory contains: +* `definitions.py`: contains the source definitions copied into the [canonical ABI explainer](../CanonicalABI.md) +* `run_tests.py`: can be run via `python3 run_tests.py` (version >=3.10) to run all the tests diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py new file 
mode 100644 index 0000000..d3faf1c --- /dev/null +++ b/design/mvp/canonical-abi/definitions.py @@ -0,0 +1,902 @@ +# After the Boilerplate section, this file is ordered to line up with the code +# blocks in ../CanonicalABI.md (split by # comment lines). If you update this +# file, don't forget to update ../CanonicalABI.md. + +### Boilerplate + +import math +import struct +import types +from dataclasses import dataclass + +class Trap(BaseException): pass +class CoreWebAssemblyException(BaseException): pass + +def trap(): + raise Trap() + +def trap_if(cond): + if cond: + raise Trap() + +class InterfaceType: pass +class Unit(InterfaceType): pass +class Bool(InterfaceType): pass +class S8(InterfaceType): pass +class U8(InterfaceType): pass +class S16(InterfaceType): pass +class U16(InterfaceType): pass +class S32(InterfaceType): pass +class U32(InterfaceType): pass +class S64(InterfaceType): pass +class U64(InterfaceType): pass +class Float32(InterfaceType): pass +class Float64(InterfaceType): pass +class Char(InterfaceType): pass +class String(InterfaceType): pass + +@dataclass +class List(InterfaceType): + t: InterfaceType + +@dataclass +class Field: + label: str + t: InterfaceType + +@dataclass +class Record(InterfaceType): + fields: [Field] + +@dataclass +class Tuple(InterfaceType): + ts: [InterfaceType] + +@dataclass +class Flags(InterfaceType): + labels: [str] + +@dataclass +class Case: + label: str + t: InterfaceType + defaults_to: str = None + +@dataclass +class Variant(InterfaceType): + cases: [Case] + +@dataclass +class Enum(InterfaceType): + labels: [str] + +@dataclass +class Union(InterfaceType): + ts: [InterfaceType] + +@dataclass +class Option(InterfaceType): + t: InterfaceType + +@dataclass +class Expected(InterfaceType): + ok: InterfaceType + error: InterfaceType + +@dataclass +class Func: + params: [InterfaceType] + result: InterfaceType + +### Despecialization + +def despecialize(t): + match t: + case Tuple(ts) : return Record([ Field(str(i), t) for 
i,t in enumerate(ts) ]) + case Unit() : return Record([]) + case Union(ts) : return Variant([ Case(str(i), t) for i,t in enumerate(ts) ]) + case Enum(labels) : return Variant([ Case(l, Unit()) for l in labels ]) + case Option(t) : return Variant([ Case("none", Unit()), Case("some", t) ]) + case Expected(ok, error) : return Variant([ Case("ok", ok), Case("error", error) ]) + case _ : return t + +### Alignment + +def alignment(t): + match despecialize(t): + case Bool() : return 1 + case S8() | U8() : return 1 + case S16() | U16() : return 2 + case S32() | U32() : return 4 + case S64() | U64() : return 8 + case Float32() : return 4 + case Float64() : return 8 + case Char() : return 4 + case String() | List(_) : return 4 + case Record(fields) : return max_alignment(types_of(fields)) + case Variant(cases) : return max_alignment(types_of(cases) + [discriminant_type(cases)]) + case Flags(labels) : return alignment_flags(labels) + +def types_of(fields_or_cases): + return [x.t for x in fields_or_cases] + +def max_alignment(ts): + a = 1 + for t in ts: + a = max(a, alignment(t)) + return a + +# + +def discriminant_type(cases): + n = len(cases) + assert(0 < n < (1 << 32)) + match math.ceil(math.log2(n)/8): + case 0: return U8() + case 1: return U8() + case 2: return U16() + case 3: return U32() + +# + +def alignment_flags(labels): + n = len(labels) + if n <= 8: return 1 + if n <= 16: return 2 + return 4 + +### Size + +def elem_size(t): + return align_to(byte_size(t), alignment(t)) + +def align_to(ptr, alignment): + return math.ceil(ptr / alignment) * alignment + +def byte_size(t): + match despecialize(t): + case Bool() : return 1 + case S8() | U8() : return 1 + case S16() | U16() : return 2 + case S32() | U32() : return 4 + case S64() | U64() : return 8 + case Float32() : return 4 + case Float64() : return 8 + case Char() : return 4 + case String() | List(_) : return 8 + case Record(fields) : return byte_size_record(fields) + case Variant(cases) : return 
byte_size_variant(cases) + case Flags(labels) : return byte_size_flags(labels) + +def byte_size_record(fields): + s = 0 + for f in fields: + s = align_to(s, alignment(f.t)) + s += byte_size(f.t) + return s + +def byte_size_variant(cases): + s = byte_size(discriminant_type(cases)) + s = align_to(s, max_alignment(types_of(cases))) + cs = 0 + for c in cases: + cs = max(cs, byte_size(c.t)) + return s + cs + +def byte_size_flags(labels): + n = len(labels) + if n <= 8: return 1 + if n <= 16: return 2 + return 4 * num_i32_flags(labels) + +def num_i32_flags(labels): + return math.ceil(len(labels) / 32) + +### Loading + +class Opts: + string_encoding: str + memory: bytearray + realloc: types.FunctionType + post_return: types.FunctionType + +def load(opts, ptr, t): + assert(ptr == align_to(ptr, alignment(t))) + match despecialize(t): + case Bool() : return bool(load_int(opts, ptr, 1)) + case U8() : return load_int(opts, ptr, 1) + case U16() : return load_int(opts, ptr, 2) + case U32() : return load_int(opts, ptr, 4) + case U64() : return load_int(opts, ptr, 8) + case S8() : return load_int(opts, ptr, 1, signed=True) + case S16() : return load_int(opts, ptr, 2, signed=True) + case S32() : return load_int(opts, ptr, 4, signed=True) + case S64() : return load_int(opts, ptr, 8, signed=True) + case Float32() : return canonicalize(reinterpret_i32_as_float(load_int(opts, ptr, 4))) + case Float64() : return canonicalize(reinterpret_i64_as_float(load_int(opts, ptr, 8))) + case Char() : return i32_to_char(opts, load_int(opts, ptr, 4)) + case String() : return load_string(opts, ptr) + case List(t) : return load_list(opts, ptr, t) + case Record(fields) : return load_record(opts, ptr, fields) + case Variant(cases) : return load_variant(opts, ptr, cases) + case Flags(labels) : return load_flags(opts, ptr, labels) + +# + +def load_int(opts, ptr, nbytes, signed = False): + trap_if(ptr + nbytes > len(opts.memory)) + return int.from_bytes(opts.memory[ptr : ptr+nbytes], 'little', 
signed=signed) + +# + +def reinterpret_i32_as_float(i): + return struct.unpack('!f', struct.pack('!I', i))[0] + +def reinterpret_i64_as_float(i): + return struct.unpack('!d', struct.pack('!Q', i))[0] + +def canonicalize(f): + if math.isnan(f): + return reinterpret_i64_as_float(0x7ff8000000000000) + return f + +# + +def i32_to_char(opts, i): + trap_if(i >= 0x110000) + trap_if(0xD800 <= i <= 0xDFFF) + return chr(i) + +# + +def load_string(opts, ptr): + begin = load_int(opts, ptr, 4) + packed_byte_length = load_int(opts, ptr + 4, 4) + return load_string_from_range(opts, begin, packed_byte_length) + +UTF16_BIT = 1 << 31 + +def load_string_from_range(opts, ptr, packed_byte_length): + match opts.string_encoding: + case 'utf8': + byte_length = packed_byte_length + encoding = 'utf-8' + case 'utf16': + byte_length = packed_byte_length + encoding = 'utf-16-le' + case 'latin1+utf16': + if bool(packed_byte_length & UTF16_BIT): + byte_length = packed_byte_length ^ UTF16_BIT + encoding = 'utf-16-le' + else: + byte_length = packed_byte_length + encoding = 'latin-1' + + trap_if(ptr + byte_length > len(opts.memory)) + try: + s = opts.memory[ptr : ptr+byte_length].decode(encoding) + except UnicodeError: + trap() + + return (s, opts.string_encoding, packed_byte_length) + +# + +def load_list(opts, ptr, elem_type): + begin = load_int(opts, ptr, 4) + length = load_int(opts, ptr + 4, 4) + return load_list_from_range(opts, begin, length, elem_type) + +def load_list_from_range(opts, ptr, length, elem_type): + trap_if(ptr + length * elem_size(elem_type) > len(opts.memory)) + a = [] + for i in range(length): + a.append(load(opts, ptr + i * elem_size(elem_type), elem_type)) + return a + +def load_record(opts, ptr, fields): + record = {} + for field in fields: + ptr = align_to(ptr, alignment(field.t)) + record[field.label] = load(opts, ptr, field.t) + ptr += byte_size(field.t) + return record + +# + +def load_variant(opts, ptr, cases): + disc_size = byte_size(discriminant_type(cases)) + disc = 
load_int(opts, ptr, disc_size) + ptr += disc_size + trap_if(disc >= len(cases)) + case = cases[disc] + ptr = align_to(ptr, max_alignment(types_of(cases))) + return { case_label_with_defaults(case, cases): load(opts, ptr, case.t) } + +def case_label_with_defaults(case, cases): + label = case.label + while case.defaults_to is not None: + case = cases[find_case(case.defaults_to, cases)] + label += '|' + case.label + return label + +def find_case(label, cases): + matches = [i for i,c in enumerate(cases) if c.label == label] + assert(len(matches) <= 1) + if len(matches) == 1: + return matches[0] + return -1 + +# + +def load_flags(opts, ptr, labels): + i = load_int(opts, ptr, byte_size_flags(labels)) + return unpack_flags_from_int(i, labels) + +def unpack_flags_from_int(i, labels): + record = {} + for l in labels: + record[l] = bool(i & 1) + i >>= 1 + trap_if(i) + return record + +### Storing + +def store(opts, v, t, ptr): + assert(ptr == align_to(ptr, alignment(t))) + match despecialize(t): + case Bool() : store_int(opts, int(bool(v)), ptr, 1) + case U8() : store_int(opts, v, ptr, 1) + case U16() : store_int(opts, v, ptr, 2) + case U32() : store_int(opts, v, ptr, 4) + case U64() : store_int(opts, v, ptr, 8) + case S8() : store_int(opts, v, ptr, 1, signed=True) + case S16() : store_int(opts, v, ptr, 2, signed=True) + case S32() : store_int(opts, v, ptr, 4, signed=True) + case S64() : store_int(opts, v, ptr, 8, signed=True) + case Float32() : store_int(opts, reinterpret_float_as_i32(v), ptr, 4) + case Float64() : store_int(opts, reinterpret_float_as_i64(v), ptr, 8) + case Char() : store_int(opts, char_to_i32(v), ptr, 4) + case String() : store_string(opts, v, ptr) + case List(t) : store_list(opts, v, ptr, t) + case Record(fields) : store_record(opts, v, ptr, fields) + case Variant(cases) : store_variant(opts, v, ptr, cases) + case Flags(labels) : store_flags(opts, v, ptr, labels) + +# + +def store_int(opts, v, ptr, nbytes, signed = False): + trap_if(ptr + nbytes > 
len(opts.memory)) + opts.memory[ptr : ptr+nbytes] = int.to_bytes(v, nbytes, 'little', signed=signed) + +# + +def reinterpret_float_as_i32(f): + return struct.unpack('!I', struct.pack('!f', f))[0] + +def reinterpret_float_as_i64(f): + return struct.unpack('!Q', struct.pack('!d', f))[0] + +# + +def char_to_i32(c): + i = ord(c) + assert(0 <= i <= 0xD7FF or 0xE000 <= i <= 0x10FFFF) + return i + +# + +def store_string(opts, v, ptr): + begin, packed_byte_length = store_string_into_range(opts, v) + store_int(opts, begin, ptr, 4) + store_int(opts, packed_byte_length, ptr + 4, 4) + +def store_string_into_range(opts, v): + src, src_encoding, src_packed_byte_length = v + + if src_encoding == 'latin1+utf16': + if bool(src_packed_byte_length & UTF16_BIT): + src_byte_length = src_packed_byte_length ^ UTF16_BIT + src_unpacked_encoding = 'utf16' + else: + src_byte_length = src_packed_byte_length + src_unpacked_encoding = 'latin1' + else: + src_byte_length = src_packed_byte_length + src_unpacked_encoding = src_encoding + + match opts.string_encoding: + case 'utf8': + match src_unpacked_encoding: + case 'utf8' : return store_string_copy(opts, src, src_byte_length, 'utf-8') + case 'utf16' : return store_utf16_to_utf8(opts, src, src_byte_length) + case 'latin1' : return store_latin1_to_utf8(opts, src, src_byte_length) + case 'utf16': + match src_unpacked_encoding: + case 'utf8' : return store_utf8_to_utf16(opts, src, src_byte_length) + case 'utf16' : return store_string_copy(opts, src, src_byte_length, 'utf-16-le') + case 'latin1' : return store_string_copy(opts, src, src_byte_length, 'utf-16-le', inflation = 2) + case 'latin1+utf16': + match src_encoding: + case 'utf8' : return store_utf8_to_latin1_or_utf16(opts, src, src_byte_length) + case 'utf16' : return store_utf16_to_latin1_or_utf16(opts, src, src_byte_length) + case 'latin1+utf16' : + match src_unpacked_encoding: + case 'latin1' : return store_string_copy(opts, src, src_byte_length, 'latin-1') + case 'utf16' : return
store_probably_utf16_to_latin1_or_utf16(opts, src, src_byte_length) + +# + +MAX_STRING_BYTE_LENGTH = (1 << 31) - 1 + +def store_string_copy(opts, src, src_byte_length, dst_encoding, inflation = 1): + byte_length = src_byte_length * inflation + trap_if(byte_length > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 1, byte_length) + encoded = src.encode(dst_encoding) + assert(byte_length == len(encoded)) + opts.memory[ptr : ptr+len(encoded)] = encoded + return (ptr, byte_length) + +# + +def store_utf16_to_utf8(opts, src, src_byte_length): + optimistic_size = int(src_byte_length / 2) + worst_case_size = optimistic_size * 3 + return store_string_transcode(opts, src, 'utf-8', optimistic_size, worst_case_size) + +def store_latin1_to_utf8(opts, src, src_byte_length): + optimistic_size = src_byte_length + worst_case_size = optimistic_size * 2 + return store_string_transcode(opts, src, 'utf-8', optimistic_size, worst_case_size) + +def store_utf8_to_utf16(opts, src, src_byte_length): + optimistic_size = src_byte_length * 2 + worst_case_size = optimistic_size + return store_string_transcode(opts, src, 'utf-16-le', optimistic_size, worst_case_size) + +def store_string_transcode(opts, src, dst_encoding, optimistic_size, worst_case_size): + trap_if(optimistic_size > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 1, optimistic_size) + encoded = src.encode(dst_encoding) + bytes_copied = min(len(encoded), optimistic_size) + opts.memory[ptr : ptr+bytes_copied] = encoded[0 : bytes_copied] + if bytes_copied < optimistic_size: + ptr = opts.realloc(ptr, optimistic_size, 1, bytes_copied) + elif bytes_copied < len(encoded): + trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(ptr, optimistic_size, 1, worst_case_size) + opts.memory[ptr+bytes_copied : ptr+len(encoded)] = encoded[bytes_copied : ] + if worst_case_size > len(encoded): + ptr = opts.realloc(ptr, worst_case_size, 1, len(encoded)) + return (ptr, len(encoded)) + +# + +def 
store_utf8_to_latin1_or_utf16(opts, src, src_byte_length): + optimistic_size = src_byte_length + worst_case_size = 2 * src_byte_length + return store_string_to_latin1_or_utf16(opts, src, optimistic_size, worst_case_size) + +def store_utf16_to_latin1_or_utf16(opts, src, src_byte_length): + optimistic_size = int(src_byte_length / 2) + worst_case_size = src_byte_length + return store_string_to_latin1_or_utf16(opts, src, optimistic_size, worst_case_size) + +def store_string_to_latin1_or_utf16(opts, src, optimistic_size, worst_case_size): + trap_if(optimistic_size > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 1, optimistic_size) + dst_byte_length = 0 + for usv in src: + if ord(usv) < (1 << 8): + opts.memory[ptr + dst_byte_length] = ord(usv) + dst_byte_length += 1 + else: + trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(ptr, optimistic_size, 1, worst_case_size) + for j in range(dst_byte_length-1, -1, -1): + opts.memory[ptr + 2*j] = opts.memory[ptr + j] + opts.memory[ptr + 2*j + 1] = 0 + encoded = src.encode('utf-16-le') + opts.memory[ptr+2*dst_byte_length : ptr+len(encoded)] = encoded[2*dst_byte_length : ] + if worst_case_size > len(encoded): + ptr = opts.realloc(ptr, worst_case_size, 1, len(encoded)) + return (ptr, len(encoded) | UTF16_BIT) + if dst_byte_length < optimistic_size: + ptr = opts.realloc(ptr, optimistic_size, 1, dst_byte_length) + return (ptr, dst_byte_length) + +# + +def store_probably_utf16_to_latin1_or_utf16(opts, src, src_byte_length): + trap_if(src_byte_length > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 1, src_byte_length) + encoded = src.encode('utf-16-le') + opts.memory[ptr : ptr+len(encoded)] = encoded + if any(ord(c) >= (1 << 8) for c in src): + return (ptr, len(encoded) | UTF16_BIT) + latin1_size = int(len(encoded) / 2) + for i in range(latin1_size): + opts.memory[ptr + i] = opts.memory[ptr + 2*i] + ptr = opts.realloc(ptr, src_byte_length, 1, latin1_size) + return (ptr, latin1_size) + +# + +def 
store_list(opts, v, ptr, elem_type): + begin, length = store_list_into_range(opts, v, elem_type) + store_int(opts, begin, ptr, 4) + store_int(opts, length, ptr + 4, 4) + +def store_list_into_range(opts, v, elem_type): + byte_length = len(v) * elem_size(elem_type) + trap_if(byte_length >= (1 << 32)) + ptr = opts.realloc(0, 0, alignment(elem_type), byte_length) + trap_if(ptr + byte_length > len(opts.memory)) + for i,e in enumerate(v): + store(opts, e, elem_type, ptr + i * elem_size(elem_type)) + return (ptr, len(v)) + +def store_record(opts, v, ptr, fields): + for f in fields: + ptr = align_to(ptr, alignment(f.t)) + store(opts, v[f.label], f.t, ptr) + ptr += byte_size(f.t) + +# + +def store_variant(opts, v, ptr, cases): + case_index, case_value = match_case(v, cases) + disc_size = byte_size(discriminant_type(cases)) + store_int(opts, case_index, ptr, disc_size) + ptr += disc_size + ptr = align_to(ptr, max_alignment(types_of(cases))) + store(opts, case_value, cases[case_index].t, ptr) + +def match_case(v, cases): + assert(len(v.keys()) == 1) + key = list(v.keys())[0] + value = list(v.values())[0] + for label in key.split('|'): + case_index = find_case(label, cases) + if case_index != -1: + return (case_index, value) + +# + +def store_flags(opts, v, ptr, labels): + i = pack_flags_into_int(v, labels) + store_int(opts, i, ptr, byte_size_flags(labels)) + +def pack_flags_into_int(v, labels): + i = 0 + shift = 0 + for l in labels: + i |= (int(bool(v[l])) << shift) + shift += 1 + return i + +### Flattening + +MAX_FLAT_PARAMS = 16 +MAX_FLAT_RESULTS = 1 + +def flatten(functype, context): + flat_params = flatten_types(functype.params) + if len(flat_params) > MAX_FLAT_PARAMS: + flat_params = ['i32'] + + flat_results = flatten_type(functype.result) + if len(flat_results) > MAX_FLAT_RESULTS: + match context: + case 'canon.lift': + flat_results = ['i32'] + case 'canon.lower': + flat_params += ['i32'] + flat_results = [] + + return { 'params': flat_params, 'results': flat_results } 
+ +def flatten_types(ts): + return [ft for t in ts for ft in flatten_type(t)] + +# + +def flatten_type(t): + match despecialize(t): + case Bool() : return ['i32'] + case U8() | U16() | U32() : return ['i32'] + case S8() | S16() | S32() : return ['i32'] + case S64() | U64() : return ['i64'] + case Float32() : return ['f32'] + case Float64() : return ['f64'] + case Char() : return ['i32'] + case String() | List(_) : return ['i32', 'i32'] + case Record(fields) : return flatten_types(types_of(fields)) + case Variant(cases) : return flatten_variant(cases) + case Flags(labels) : return ['i32'] * num_i32_flags(labels) + +# + +def flatten_variant(cases): + flat = [] + for c in cases: + for i,ft in enumerate(flatten_type(c.t)): + if i < len(flat): + flat[i] = join(flat[i], ft) + else: + flat.append(ft) + return flatten_type(discriminant_type(cases)) + flat + +def join(a, b): + if a == b: return a + if (a == 'i32' and b == 'f32') or (a == 'f32' and b == 'i32'): return 'i32' + return 'i64' + +### Flat Lifting + +@dataclass +class Value: + t: str # 'i32'|'i64'|'f32'|'f64' + v: int|float + +@dataclass +class ValueIter: + values: [Value] + i = 0 + def next(self, t): + v = self.values[self.i] + self.i += 1 + assert(v.t == t) + return v.v + +def lift_flat(opts, vi, t): + match despecialize(t): + case Bool() : return bool(vi.next('i32')) + case U8() : return lift_flat_unsigned(vi, 32, 8) + case U16() : return lift_flat_unsigned(vi, 32, 16) + case U32() : return lift_flat_unsigned(vi, 32, 32) + case U64() : return lift_flat_unsigned(vi, 64, 64) + case S8() : return lift_flat_signed(vi, 32, 8) + case S16() : return lift_flat_signed(vi, 32, 16) + case S32() : return lift_flat_signed(vi, 32, 32) + case S64() : return lift_flat_signed(vi, 64, 64) + case Float32() : return canonicalize(vi.next('f32')) + case Float64() : return canonicalize(vi.next('f64')) + case Char() : return i32_to_char(opts, vi.next('i32')) + case String() : return lift_flat_string(opts, vi) + case List(t) : return 
lift_flat_list(opts, vi, t) + case Record(fields) : return lift_flat_record(opts, vi, fields) + case Variant(cases) : return lift_flat_variant(opts, vi, cases) + case Flags(labels) : return lift_flat_flags(vi, labels) + +# + +def lift_flat_unsigned(vi, core_width, t_width): + i = vi.next('i' + str(core_width)) + assert(0 <= i < (1 << core_width)) + trap_if(i >= (1 << t_width)) + return i + +def lift_flat_signed(vi, core_width, t_width): + i = vi.next('i' + str(core_width)) + assert(0 <= i < (1 << core_width)) + if i >= (1 << (t_width - 1)): + i -= (1 << core_width) + trap_if(i < -(1 << (t_width - 1))) + return i + trap_if(i >= (1 << (t_width - 1))) + return i + +# + +def lift_flat_string(opts, vi): + ptr = vi.next('i32') + packed_byte_length = vi.next('i32') + return load_string_from_range(opts, ptr, packed_byte_length) + +def lift_flat_list(opts, vi, elem_type): + ptr = vi.next('i32') + length = vi.next('i32') + return load_list_from_range(opts, ptr, length, elem_type) + +# + +def lift_flat_record(opts, vi, fields): + record = {} + for f in fields: + record[f.label] = lift_flat(opts, vi, f.t) + return record + +# + +def lift_flat_variant(opts, vi, cases): + flat_types = flatten_variant(cases) + assert(flat_types.pop(0) == 'i32') + disc = vi.next('i32') + trap_if(disc >= len(cases)) + case = cases[disc] + class CoerceValueIter: + def next(self, want): + have = flat_types.pop(0) + x = vi.next(have) + match (have, want): + case ('i32', 'f32') : return reinterpret_i32_as_float(x) + case ('i64', 'i32') : return narrow_i64_to_i32(x) + case ('i64', 'f32') : return reinterpret_i32_as_float(narrow_i64_to_i32(x)) + case ('i64', 'f64') : return reinterpret_i64_as_float(x) + case _ : return x + v = lift_flat(opts, CoerceValueIter(), case.t) + for have in flat_types: + _ = vi.next(have) + return { case_label_with_defaults(case, cases): v } + +def narrow_i64_to_i32(i): + trap_if(i >= (1 << 32)) + return i + +# + +def lift_flat_flags(vi, labels): + i = 0 + shift = 0 + for _ in 
range(num_i32_flags(labels)): + i |= (vi.next('i32') << shift) + shift += 32 + return unpack_flags_from_int(i, labels) + +### Flat Lowering + +def lower_flat(opts, v, t): + match despecialize(t): + case Bool() : return [Value('i32', int(v))] + case U8() : return [Value('i32', v)] + case U16() : return [Value('i32', v)] + case U32() : return [Value('i32', v)] + case U64() : return [Value('i64', v)] + case S8() : return lower_flat_signed(v, 32) + case S16() : return lower_flat_signed(v, 32) + case S32() : return lower_flat_signed(v, 32) + case S64() : return lower_flat_signed(v, 64) + case Float32() : return [Value('f32', v)] + case Float64() : return [Value('f64', v)] + case Char() : return [Value('i32', char_to_i32(v))] + case String() : return lower_flat_string(opts, v) + case List(t) : return lower_flat_list(opts, v, t) + case Record(fields) : return lower_flat_record(opts, v, fields) + case Variant(cases) : return lower_flat_variant(opts, v, cases) + case Flags(labels) : return lower_flat_flags(v, labels) + +# + +def lower_flat_signed(i, core_bits): + if i < 0: + i += (1 << core_bits) + return [Value('i' + str(core_bits), i)] + +# + +def lower_flat_string(opts, v): + ptr, packed_byte_length = store_string_into_range(opts, v) + return [Value('i32', ptr), Value('i32', packed_byte_length)] + +def lower_flat_list(opts, v, elem_type): + (ptr, length) = store_list_into_range(opts, v, elem_type) + return [Value('i32', ptr), Value('i32', length)] + +# + +def lower_flat_record(opts, v, fields): + flat = [] + for f in fields: + flat += lower_flat(opts, v[f.label], f.t) + return flat + +# + +def lower_flat_variant(opts, v, cases): + case_index, case_value = match_case(v, cases) + flat_types = flatten_variant(cases) + assert(flat_types.pop(0) == 'i32') + payload = lower_flat(opts, case_value, cases[case_index].t) + for i,have in enumerate(payload): + want = flat_types.pop(0) + match (have.t, want): + case ('f32', 'i32') : payload[i] = Value('i32', 
reinterpret_float_as_i32(have.v)) + case ('i32', 'i64') : payload[i] = Value('i64', have.v) + case ('f32', 'i64') : payload[i] = Value('i64', reinterpret_float_as_i32(have.v)) + case ('f64', 'i64') : payload[i] = Value('i64', reinterpret_float_as_i64(have.v)) + case _ : pass + for want in flat_types: + payload.append(Value(want, 0)) + return [Value('i32', case_index)] + payload + +# + +def lower_flat_flags(v, labels): + i = pack_flags_into_int(v, labels) + flat = [] + for _ in range(num_i32_flags(labels)): + flat.append(Value('i32', i & 0xffffffff)) + i >>= 32 + assert(i == 0) + return flat + +### Lifting and Lowering + +def lift(opts, max_flat, vi, ts): + flat_types = flatten_types(ts) + if len(flat_types) > max_flat: + return list(load(opts, vi.next('i32'), Tuple(ts)).values()) + else: + return [ lift_flat(opts, vi, t) for t in ts ] + +# + +def lower(opts, max_flat, vs, ts, out_param = None): + flat_types = flatten_types(ts) + if len(flat_types) > max_flat: + tuple_type = Tuple(ts) + tuple_value = {str(i): v for i,v in enumerate(vs)} + if out_param is None: + ptr = opts.realloc(0, 0, alignment(tuple_type), byte_size(tuple_type)) + else: + ptr = out_param.next('i32') + store(opts, tuple_value, tuple_type, ptr) + return [ Value('i32', ptr) ] + else: + flat_vals = [] + for i in range(len(vs)): + flat_vals += lower_flat(opts, vs[i], ts[i]) + return flat_vals + +### `canon.lift` + +class Instance: + may_leave = True + may_enter = True + # ...
+ +def canon_lift(callee_opts, callee_instance, callee, functype, args): + trap_if(not callee_instance.may_enter) + + assert(callee_instance.may_leave) + callee_instance.may_leave = False + flat_args = lower(callee_opts, MAX_FLAT_PARAMS, args, functype.params) + callee_instance.may_leave = True + + try: + flat_results = callee(flat_args) + except CoreWebAssemblyException: + trap() + + callee_instance.may_enter = False + [result] = lift(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), [functype.result]) + def post_return(): + callee_instance.may_enter = True + if callee_opts.post_return is not None: + callee_opts.post_return() + + return (result, post_return) + +### `canon.lower` + +def canon_lower(caller_opts, caller_instance, callee, functype, flat_args): + trap_if(not caller_instance.may_leave) + + assert(caller_instance.may_enter) + caller_instance.may_enter = False + + flat_args = ValueIter(flat_args) + args = lift(caller_opts, MAX_FLAT_PARAMS, flat_args, functype.params) + + result, post_return = callee(args) + + caller_instance.may_leave = False + flat_results = lower(caller_opts, MAX_FLAT_RESULTS, [result], [functype.result], flat_args) + caller_instance.may_leave = True + + post_return() + + caller_instance.may_enter = True + return flat_results diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py new file mode 100644 index 0000000..13b05a5 --- /dev/null +++ b/design/mvp/canonical-abi/run_tests.py @@ -0,0 +1,323 @@ +import definitions +from definitions import * + +def equal_modulo_string_encoding(s, t): + if isinstance(s, (bool,int,float,str)) and isinstance(t, (bool,int,float,str)): + return s == t + if isinstance(s, tuple) and isinstance(t, tuple): + if s == () and t == (): + return True + assert(isinstance(s[0], str)) + assert(isinstance(t[0], str)) + return s[0] == t[0] + if isinstance(s, dict) and isinstance(t, dict): + return all(equal_modulo_string_encoding(sv,tv) for sv,tv in zip(s.values(), t.values(), strict=True)) + if isinstance(s, list) and
isinstance(t, list): + return all(equal_modulo_string_encoding(sv,tv) for sv,tv in zip(s, t, strict=True)) + assert(False) + +class Heap: + def __init__(self, arg): + self.memory = bytearray(arg) + self.last_alloc = 0 + + def realloc(self, original_ptr, original_size, alignment, new_size): + if original_ptr != 0 and new_size < original_size: + return align_to(original_ptr, alignment) + ret = align_to(self.last_alloc, alignment) + self.last_alloc = ret + new_size + if self.last_alloc > len(self.memory): + print('oom: have {} need {}'.format(len(self.memory), self.last_alloc)) + trap() + self.memory[ret : ret + original_size] = self.memory[original_ptr : original_ptr + original_size] + return ret + +def mk_opts(memory, encoding, realloc, post_return): + opts = Opts() + opts.memory = memory + opts.string_encoding = encoding + opts.realloc = realloc + opts.post_return = post_return + return opts + +def mk_str(s): + return (s, 'utf8', len(s.encode('utf-8'))) + +def mk_tup(*a): + def mk_tup_rec(x): + if isinstance(x, list): + return { str(i):mk_tup_rec(v) for i,v in enumerate(x) } + return x + return { str(i):mk_tup_rec(v) for i,v in enumerate(a) } + +def fail(msg): + raise BaseException(msg) + +def test(t, vals_to_lift, v, + opts = mk_opts(bytearray(), 'utf8', None, None), + dst_encoding = None, + lower_t = None, + lower_v = None): + def test_name(): + return "test({},{},{}):".format(t, vals_to_lift, v) + + vi = ValueIter([Value(ft, v) for ft,v in zip(flatten_type(t), vals_to_lift, strict=True)]) + + if v is None: + try: + got = lift_flat(opts, vi, t) + fail("{} expected trap, but got {}".format(test_name(), got)) + except Trap: + return + + got = lift_flat(opts, vi, t) + assert(vi.i == len(vi.values)) + if got != v: + fail("{} initial lift_flat() expected {} but got {}".format(test_name(), v, got)) + + if lower_t is None: + lower_t = t + if lower_v is None: + lower_v = v + + heap = Heap(5*len(opts.memory)) + if dst_encoding is None: + dst_encoding = 
opts.string_encoding + opts = mk_opts(heap.memory, dst_encoding, heap.realloc, None) + lowered_vals = lower_flat(opts, v, lower_t) + assert(flatten_type(lower_t) == list(map(lambda v: v.t, lowered_vals))) + + vi = ValueIter(lowered_vals) + got = lift_flat(opts, vi, lower_t) + if not equal_modulo_string_encoding(got, lower_v): + fail("{} re-lift expected {} but got {}".format(test_name(), lower_v, got)) + +test(Unit(), [], {}) +test(Record([Field('x',U8()), Field('y',U16()), Field('z',U32())]), [1,2,3], {'x':1,'y':2,'z':3}) +test(Tuple([Tuple([U8(),U8()]),U8()]), [1,2,3], {'0':{'0':1,'1':2},'1':3}) +t = Flags(['a','b']) +test(t, [0], {'a':False,'b':False}) +test(t, [2], {'a':False,'b':True}) +test(t, [3], {'a':True,'b':True}) +test(t, [4], None) +test(Flags([str(i) for i in range(33)]), [0xffffffff,0x1], { str(i):True for i in range(33) }) +t = Variant([Case('x',U8()),Case('y',Float32()),Case('z',Unit())]) +test(t, [0,42], {'x': 42}) +test(t, [0,256], None) +test(t, [1,0x4048f5c3], {'y': 3.140000104904175}) +test(t, [2,0xffffffff], {'z': {}}) +t = Union([U32(),U64()]) +test(t, [0,42], {'0':42}) +test(t, [0,(1<<35)], None) +test(t, [1,(1<<35)], {'1':(1<<35)}) +t = Union([Float32(), U64()]) +test(t, [0,0x4048f5c3], {'0': 3.140000104904175}) +test(t, [0,(1<<35)], None) +test(t, [1,(1<<35)], {'1': (1<<35)}) +t = Union([Float64(), U64()]) +test(t, [0,0x40091EB851EB851F], {'0': 3.14}) +test(t, [0,(1<<35)], {'0': 1.69759663277e-313}) +test(t, [1,(1<<35)], {'1': (1<<35)}) +t = Union([U8()]) +test(t, [0,42], {'0':42}) +test(t, [1,256], None) +test(t, [0,256], None) +t = Union([Tuple([U8(),Float32()]), U64()]) +test(t, [0,42,3.14], {'0': {'0':42, '1':3.14}}) +test(t, [1,(1<<35),0], {'1': (1<<35)}) +t = Option(Float32()) +test(t, [0,3.14], {'none':{}}) +test(t, [1,3.14], {'some':3.14}) +t = Expected(U8(),U32()) +test(t, [0, 42], {'ok':42}) +test(t, [1, 1000], {'error':1000}) +t = Variant([Case('w',U8()), Case('x',U8(),'w'), Case('y',U8()), Case('z',U8(),'x')]) +test(t, [0, 
42], {'w':42}) +test(t, [1, 42], {'x|w':42}) +test(t, [2, 42], {'y':42}) +test(t, [3, 42], {'z|x|w':42}) +t2 = Variant([Case('w',U8())]) +test(t, [0, 42], {'w':42}, lower_t=t2, lower_v={'w':42}) +test(t, [1, 42], {'x|w':42}, lower_t=t2, lower_v={'w':42}) +test(t, [3, 42], {'z|x|w':42}, lower_t=t2, lower_v={'w':42}) + +def test_pairs(t, pairs): + for arg,expect in pairs: + test(t, [arg], expect) + +test_pairs(Bool(), [(0,False),(1,True),(2,True),(4294967295,True)]) +test_pairs(U8(), [(127,127),(128,128),(255,255),(256,None), + (4294967295,None),(4294967168,None),(4294967167,None)]) +test_pairs(S8(), [(127,127),(128,None),(255,None),(256,None), + (4294967295,-1),(4294967168,-128),(4294967167,None)]) +test_pairs(U16(), [(32767,32767),(32768,32768),(65535,65535),(65536,None), + ((1<<32)-1,None),((1<<32)-32768,None),((1<<32)-32769,None)]) +test_pairs(S16(), [(32767,32767),(32768,None),(65535,None),(65536,None), + ((1<<32)-1,-1),((1<<32)-32768,-32768),((1<<32)-32769,None)]) +test_pairs(U32(), [((1<<31)-1,(1<<31)-1),(1<<31,1<<31),(((1<<32)-1),(1<<32)-1)]) +test_pairs(S32(), [((1<<31)-1,(1<<31)-1),(1<<31,-(1<<31)),((1<<32)-1,-1)]) +test_pairs(U64(), [((1<<63)-1,(1<<63)-1), (1<<63,1<<63), ((1<<64)-1,(1<<64)-1)]) +test_pairs(S64(), [((1<<63)-1,(1<<63)-1), (1<<63,-(1<<63)), ((1<<64)-1,-1)]) +test_pairs(Float32(), [(3.14,3.14)]) +test_pairs(Float64(), [(3.14,3.14)]) +test_pairs(Char(), [(0,'\x00'), (65,'A'), (0xD7FF,'\uD7FF'), (0xD800,None), (0xDFFF,None)]) +test_pairs(Char(), [(0xE000,'\uE000'), (0x10FFFF,'\U0010FFFF'), (0x110000,None), (0xFFFFFFFF,None)]) +test_pairs(Enum(['a','b']), [(0,{'a':{}}), (1,{'b':{}}), (2,None)]) + +def test_string_internal(src_encoding, dst_encoding, s, encoded, utf16_bit = False): + heap = Heap(len(encoded)) + heap.memory[:] = encoded[:] + opts = mk_opts(heap.memory, src_encoding, None, None) + packed_byte_length = len(encoded) + if utf16_bit: + packed_byte_length |= UTF16_BIT + v = (s, src_encoding, packed_byte_length) + test(String(), [0, 
packed_byte_length], v, opts, dst_encoding) + +def test_string(src_encoding, dst_encoding, s): + if src_encoding == 'utf8': + encoded = s.encode('utf-8') + test_string_internal(src_encoding, dst_encoding, s, encoded) + elif src_encoding == 'utf16': + encoded = s.encode('utf-16-le') + test_string_internal(src_encoding, dst_encoding, s, encoded) + elif src_encoding == 'latin1+utf16': + try: + encoded = s.encode('latin-1') + test_string_internal(src_encoding, dst_encoding, s, encoded) + except UnicodeEncodeError: + pass + encoded = s.encode('utf-16-le') + test_string_internal(src_encoding, dst_encoding, s, encoded, utf16_bit = True) + +encodings = ['utf8', 'utf16', 'latin1+utf16'] + +fun_strings = ['', 'a', 'hi', '\x00', 'a\x00b', '\x80', '\x80b', 'ab\xefc', + '\u01ffy', 'xy\u01ff', 'a\ud7ffb', 'a\u02ff\u03ff\u04ffbc', + '\uf123', '\uf123\uf123abc', 'abcdef\uf123'] + +for src_encoding in encodings: + for dst_encoding in encodings: + for s in fun_strings: + test_string(src_encoding, dst_encoding, s) + +def test_heap(t, expect, args, byte_array): + heap = Heap(byte_array) + opts = mk_opts(heap.memory, 'utf8', None, None) + test(t, args, expect, opts) + +test_heap(List(Unit()), [{},{},{}], [0,3], []) +test_heap(List(Bool()), [True,False,True], [0,3], [1,0,1]) +test_heap(List(Bool()), [True,False,True], [3,3], [0xff,0xff,0xff, 1,0,1]) +test_heap(List(U8()), [1,2,3], [0,3], [1,2,3]) +test_heap(List(U16()), [1,2,3], [0,3], [1,0, 2,0, 3,0 ]) +test_heap(List(U32()), [1,2,3], [0,3], [1,0,0,0, 2,0,0,0, 3,0,0,0]) +test_heap(List(U64()), [1,2], [0,2], [1,0,0,0,0,0,0,0, 2,0,0,0,0,0,0,0]) +test_heap(List(S8()), [-1,-2,-3], [0,3], [0xff,0xfe,0xfd]) +test_heap(List(S16()), [-1,-2,-3], [0,3], [0xff,0xff, 0xfe,0xff, 0xfd,0xff]) +test_heap(List(S32()), [-1,-2,-3], [0,3], [0xff,0xff,0xff,0xff, 0xfe,0xff,0xff,0xff, 0xfd,0xff,0xff,0xff]) +test_heap(List(S64()), [-1,-2], [0,2], [0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, 0xfe,0xff,0xff,0xff,0xff,0xff,0xff,0xff]) +test_heap(List(Char()), 
['A','B','c'], [0,3], [65,00,00,00, 66,00,00,00, 99,00,00,00]) +test_heap(List(String()), [mk_str("hi"),mk_str("wat")], [0,2], + [16,0,0,0, 2,0,0,0, 21,0,0,0, 3,0,0,0, + ord('h'), ord('i'), 0xf,0xf,0xf, ord('w'), ord('a'), ord('t')]) +test_heap(List(List(U8())), [[3,4,5],[],[6,7]], [0,3], + [24,0,0,0, 3,0,0,0, 0,0,0,0, 0,0,0,0, 27,0,0,0, 2,0,0,0, + 3,4,5, 6,7]) +test_heap(List(Tuple([U8(),U8(),U16(),U32()])), [mk_tup(6,7,8,9),mk_tup(4,5,6,7)], [0,2], + [6, 7, 8,0, 9,0,0,0, 4, 5, 6,0, 7,0,0,0]) +test_heap(List(Tuple([U8(),U16(),U8(),U32()])), [mk_tup(6,7,8,9),mk_tup(4,5,6,7)], [0,2], + [6,0xff, 7,0, 8,0xff,0xff,0xff, 9,0,0,0, 4,0xff, 5,0, 6,0xff,0xff,0xff, 7,0,0,0]) +test_heap(List(Tuple([U16(),U8()])), [mk_tup(6,7),mk_tup(8,9)], [0,2], + [6,0, 7, 0x0ff, 8,0, 9, 0xff]) +test_heap(List(Tuple([Tuple([U16(),U8()]),U8()])), [mk_tup([4,5],6),mk_tup([7,8],9)], [0,2], + [4,0, 5, 6, 7,0, 8, 9]) +test_heap(List(Union([Unit(),U8(),Tuple([U8(),U16()])])), [{'0':{}}, {'1':42}, {'2':mk_tup(6,7)}], [0,3], + [0,0xff,0xff,0xff,0xff,0xff, 1,0xff,42,0xff,0xff,0xff, 2,0xff,6,0xff,7,0]) +test_heap(List(Union([U32(),U8()])), [{'0':256}, {'1':42}], [0,2], + [0,0xff,0xff,0xff,0,1,0,0, 1,0xff,0xff,0xff,42,0xff,0xff,0xff]) +test_heap(List(Tuple([Union([U8(),Tuple([U16(),U8()])]),U8()])), + [mk_tup({'1':mk_tup(5,6)},7),mk_tup({'0':8},9)], [0,2], + [1,0xff,5,0,6,7, 0,0xff,8,0xff,0xff,9]) +test_heap(List(Union([U8()])), [{'0':6},{'0':7},{'0':8}], [0,3], + [0,6, 0,7, 0,8]) +t = List(Flags(['a','b'])) +test_heap(t, [{'a':False,'b':False},{'a':False,'b':True},{'a':True,'b':True}], [0,3], + [0,2,3]) +test_heap(t, None, [0,3], + [0,2,4]) +t = List(Flags([str(i) for i in range(9)])) +test_heap(t, [{ str(i):b for i in range(9) } for b in [True,False]], [0,2], + [0xff,0x1, 0,0]) +test_heap(t, None, [0,2], + [0xff,0x3, 0,0]) +t = List(Flags([str(i) for i in range(17)])) +test_heap(t, [{ str(i):b for i in range(17) } for b in [True,False]], [0,2], + [0xff,0xff,0x1,0, 0,0,0,0]) +test_heap(t, None, [0,2], 
+ [0xff,0xff,0x3,0, 0,0,0,0]) +t = List(Flags([str(i) for i in range(33)])) +test_heap(t, [{ str(i):b for i in range(33) } for b in [True,False]], [0,2], + [0xff,0xff,0xff,0xff,0x1,0,0,0, 0,0,0,0,0,0,0,0]) +test_heap(t, None, [0,2], + [0xff,0xff,0xff,0xff,0x3,0,0,0, 0,0,0,0,0,0,0,0]) + +def test_flatten(t, params, results): + expect = { 'params':params, 'results':results } + + if len(params) > definitions.MAX_FLAT_PARAMS: + expect['params'] = ['i32'] + + if len(results) > definitions.MAX_FLAT_RESULTS: + expect['results'] = ['i32'] + got = flatten(t, 'canon.lift') + assert(got == expect) + + if len(results) > definitions.MAX_FLAT_RESULTS: + expect['params'] += ['i32'] + expect['results'] = [] + got = flatten(t, 'canon.lower') + assert(got == expect) + +test_flatten(Func([U8(),Float32(),Float64()],Unit()), ['i32','f32','f64'], []) +test_flatten(Func([U8(),Float32(),Float64()],Float32()), ['i32','f32','f64'], ['f32']) +test_flatten(Func([U8(),Float32(),Float64()],U8()), ['i32','f32','f64'], ['i32']) +test_flatten(Func([U8(),Float32(),Float64()],Tuple([Float32()])), ['i32','f32','f64'], ['f32']) +test_flatten(Func([U8(),Float32(),Float64()],Tuple([Float32(),Float32()])), ['i32','f32','f64'], ['f32','f32']) +test_flatten(Func([U8() for _ in range(17)],Unit()), ['i32' for _ in range(17)], []) +test_flatten(Func([U8() for _ in range(17)],Tuple([U8(),U8()])), ['i32' for _ in range(17)], ['i32','i32']) + +def test_roundtrip(t, v): + before = definitions.MAX_FLAT_RESULTS + definitions.MAX_FLAT_RESULTS = 16 + + ft = Func([t],t) + callee_instance = Instance() + callee = lambda x: x + + callee_heap = Heap(1000) + callee_opts = mk_opts(callee_heap.memory, 'utf8', callee_heap.realloc, lambda: ()) + lifted_callee = lambda args: canon_lift(callee_opts, callee_instance, callee, ft, args) + + caller_heap = Heap(1000) + caller_instance = Instance() + caller_opts = mk_opts(caller_heap.memory, 'utf8', caller_heap.realloc, None) + + flat_args = lower_flat(caller_opts, v, t) + 
flat_results = canon_lower(caller_opts, caller_instance, lifted_callee, ft, flat_args) + got = lift_flat(caller_opts, ValueIter(flat_results), t) + + if got != v: + fail("test_roundtrip({},{}) got {}".format(t, v, got)) + + assert(caller_instance.may_leave and caller_instance.may_enter) + assert(callee_instance.may_leave and callee_instance.may_enter) + definitions.MAX_FLAT_RESULTS = before + +test_roundtrip(S8(), -1) +test_roundtrip(Tuple([U16(),U16()]), mk_tup(3,4)) +test_roundtrip(List(String()), [mk_str("hello there")]) +test_roundtrip(List(List(String())), [[mk_str("one"),mk_str("two")],[mk_str("three")]]) +test_roundtrip(List(Option(Tuple([String(),U16()]))), [{'some':mk_tup(mk_str("answer"),42)}]) + +print("All tests passed") diff --git a/design/mvp/examples/SharedEverythingDynamicLinking.md b/design/mvp/examples/SharedEverythingDynamicLinking.md index b5b5370..0957faa 100644 --- a/design/mvp/examples/SharedEverythingDynamicLinking.md +++ b/design/mvp/examples/SharedEverythingDynamicLinking.md @@ -157,9 +157,11 @@ would look like: (with "libc" (instance $libc)) (with "libzip" (instance $libzip)) )) - (func (export "zip") - (canon.lift (func (param (list u8)) (result (list u8))) (into $libc) (func $main "zip")) - ) + (func (export "zip") (canon.lift + (func (param (list u8)) (result (list u8))) + (memory (memory $libc "memory")) (realloc (func $libc "realloc")) + (func $main "zip") + )) ) ``` Here, `zipper` links its own private module code (`$Main`) with the shareable @@ -234,9 +236,11 @@ component-aware `clang`, the resulting component would look like: (with "libc" (instance $libc)) (with "libimg" (instance $libimg)) )) - (func (export "transform") - (canon.lift (func (param (list u8)) (result (list u8))) (into $libc) (func $main "transform")) - ) + (func (export "transform") (canon.lift + (func (param (list u8)) (result (list u8))) + (memory (memory $libc "memory")) (realloc (func $libc "realloc")) + (func $main "transform") + )) ) ```
Here, we see the general pattern emerging of the dependency DAG between @@ -279,20 +283,24 @@ components. The resulting component could look like: )) (instance $libc (instantiate (module $Libc))) - (func $zip - (canon.lower (into $libc) (func $zipper "zip")) - ) - (func $transform - (canon.lower (into $libc) (func $imgmgk "transform")) - ) + (func $zip (canon.lower + (memory (memory $libc "memory")) (realloc (func $libc "realloc")) + (func $zipper "zip") + )) + (func $transform (canon.lower + (memory (memory $libc "memory")) (realloc (func $libc "realloc")) + (func $imgmgk "transform") + )) (instance $main (instantiate (module $Main) (with "libc" (instance $libc)) (with "zipper" (instance (export "zip" (func $zipper "zip")))) (with "imgmgk" (instance (export "transform" (func $imgmgk "transform")))) )) - (func (export "run") - (canon.lift (func (param string) (result string)) (func $main "run")) - ) + (func (export "run") (canon.lift + (func (param string) (result string)) + (memory (memory $libc "memory")) (realloc (func $libc "realloc")) + (func $main "run") + )) ) ``` Note here that `$Libc` is passed to the nested `zipper` and `imgmgk` instances From 62f8fb897ea64d6a2d46515e2e205dfa85f7380f Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 13 Apr 2022 15:07:48 -0500 Subject: [PATCH 024/301] Add dynamic alignment checks to load and store --- design/mvp/CanonicalABI.md | 8 +++++++- design/mvp/canonical-abi/definitions.py | 8 +++++++- design/mvp/canonical-abi/run_tests.py | 7 +++++++ 3 files changed, 21 insertions(+), 2 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index f72da65..420d6eb 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -320,6 +320,7 @@ def load_list(opts, ptr, elem_type): return load_list_from_range(opts, begin, length, elem_type) def load_list_from_range(opts, ptr, length, elem_type): + trap_if(ptr != align_to(ptr, alignment(elem_type))) trap_if(ptr + length * elem_size(elem_type) 
> len(opts.memory)) a = [] for i in range(length): @@ -644,6 +645,7 @@ def store_list_into_range(opts, v, elem_type): byte_length = len(v) * elem_size(elem_type) trap_if(byte_length >= (1 << 32)) ptr = opts.realloc(0, 0, alignment(elem_type), byte_length) + trap_if(ptr != align_to(ptr, alignment(elem_type))) trap_if(ptr + byte_length > len(opts.memory)) for i,e in enumerate(v): store(opts, e, elem_type, ptr + i * elem_size(elem_type)) @@ -1040,7 +1042,10 @@ values with types `ts`: def lift(opts, max_flat, vi, ts): flat_types = flatten_types(ts) if len(flat_types) > max_flat: - return list(load(opts, vi.next('i32'), Tuple(ts)).values()) + ptr = vi.next('i32') + tuple_type = Tuple(ts) + trap_if(ptr != align_to(ptr, alignment(tuple_type))) + return list(load(opts, ptr, tuple_type).values()) else: return [ lift_flat(opts, vi, t) for t in ts ] ``` @@ -1060,6 +1065,7 @@ def lower(opts, max_flat, vs, ts, out_param = None): ptr = opts.realloc(0, 0, alignment(tuple_type), byte_size(tuple_type)) else: ptr = out_param.next('i32') + trap_if(ptr != align_to(ptr, alignment(tuple_type))) store(opts, tuple_value, tuple_type, ptr) return [ Value('i32', ptr) ] else: diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index d3faf1c..575f9f6 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -288,6 +288,7 @@ def load_list(opts, ptr, elem_type): return load_list_from_range(opts, begin, length, elem_type) def load_list_from_range(opts, ptr, length, elem_type): + trap_if(ptr != align_to(ptr, alignment(elem_type))) trap_if(ptr + length * elem_size(elem_type) > len(opts.memory)) a = [] for i in range(length): @@ -533,6 +534,7 @@ def store_list_into_range(opts, v, elem_type): byte_length = len(v) * elem_size(elem_type) trap_if(byte_length >= (1 << 32)) ptr = opts.realloc(0, 0, alignment(elem_type), byte_length) + trap_if(ptr != align_to(ptr, alignment(elem_type))) trap_if(ptr + byte_length > 
len(opts.memory)) for i,e in enumerate(v): store(opts, e, elem_type, ptr + i * elem_size(elem_type)) @@ -828,7 +830,10 @@ def lower_flat_flags(v, labels): def lift(opts, max_flat, vi, ts): flat_types = flatten_types(ts) if len(flat_types) > max_flat: - return list(load(opts, vi.next('i32'), Tuple(ts)).values()) + ptr = vi.next('i32') + tuple_type = Tuple(ts) + trap_if(ptr != align_to(ptr, alignment(tuple_type))) + return list(load(opts, ptr, tuple_type).values()) else: return [ lift_flat(opts, vi, t) for t in ts ] @@ -843,6 +848,7 @@ def lower(opts, max_flat, vs, ts, out_param = None): ptr = opts.realloc(0, 0, alignment(tuple_type), byte_size(tuple_type)) else: ptr = out_param.next('i32') + trap_if(ptr != align_to(ptr, alignment(tuple_type))) store(opts, tuple_value, tuple_type, ptr) return [ Value('i32', ptr) ] else: diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index 13b05a5..46e371b 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -211,6 +211,7 @@ def test_heap(t, expect, args, byte_array): test_heap(List(Bool()), [True,False,True], [3,3], [0xff,0xff,0xff, 1,0,1]) test_heap(List(U8()), [1,2,3], [0,3], [1,2,3]) test_heap(List(U16()), [1,2,3], [0,3], [1,0, 2,0, 3,0 ]) +test_heap(List(U16()), None, [1,3], [0, 1,0, 2,0, 3,0 ]) test_heap(List(U32()), [1,2,3], [0,3], [1,0,0,0, 2,0,0,0, 3,0,0,0]) test_heap(List(U64()), [1,2], [0,2], [1,0,0,0,0,0,0,0, 2,0,0,0,0,0,0,0]) test_heap(List(S8()), [-1,-2,-3], [0,3], [0xff,0xfe,0xfd]) @@ -224,6 +225,12 @@ def test_heap(t, expect, args, byte_array): test_heap(List(List(U8())), [[3,4,5],[],[6,7]], [0,3], [24,0,0,0, 3,0,0,0, 0,0,0,0, 0,0,0,0, 27,0,0,0, 2,0,0,0, 3,4,5, 6,7]) +test_heap(List(List(U16())), [[5,6]], [0,1], + [8,0,0,0, 2,0,0,0, + 5,0, 6,0]) +test_heap(List(List(U16())), None, [0,1], + [9,0,0,0, 2,0,0,0, + 0, 5,0, 6,0]) test_heap(List(Tuple([U8(),U8(),U16(),U32()])), [mk_tup(6,7,8,9),mk_tup(4,5,6,7)], [0,2], [6, 7, 8,0, 
9,0,0,0, 4, 5, 6,0, 7,0,0,0]) test_heap(List(Tuple([U8(),U16(),U8(),U32()])), [mk_tup(6,7,8,9),mk_tup(4,5,6,7)], [0,2], From 34db917543841928e2c0177252625ced62496eb2 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 13 Apr 2022 15:20:11 -0500 Subject: [PATCH 025/301] Split out separate canonicalize32/64 so there is an explicit float32 bit-pattern --- design/mvp/canonical-abi/definitions.py | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 575f9f6..9187ee5 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -212,8 +212,8 @@ def load(opts, ptr, t): case S16() : return load_int(opts, ptr, 2, signed=True) case S32() : return load_int(opts, ptr, 4, signed=True) case S64() : return load_int(opts, ptr, 8, signed=True) - case Float32() : return canonicalize(reinterpret_i32_as_float(load_int(opts, ptr, 4))) - case Float64() : return canonicalize(reinterpret_i64_as_float(load_int(opts, ptr, 8))) + case Float32() : return canonicalize32(reinterpret_i32_as_float(load_int(opts, ptr, 4))) + case Float64() : return canonicalize64(reinterpret_i64_as_float(load_int(opts, ptr, 8))) case Char() : return i32_to_char(opts, load_int(opts, ptr, 4)) case String() : return load_string(opts, ptr) case List(t) : return load_list(opts, ptr, t) @@ -235,7 +235,12 @@ def reinterpret_i32_as_float(i): def reinterpret_i64_as_float(i): return struct.unpack('!d', struct.pack('!Q', i))[0] -def canonicalize(f): +def canonicalize32(f): + if math.isnan(f): + return reinterpret_i32_as_float(0x7fc00000) + return f + +def canonicalize64(f): if math.isnan(f): return reinterpret_i64_as_float(0x7ff8000000000000) return f @@ -664,8 +669,8 @@ def lift_flat(opts, vi, t): case S16() : return lift_flat_signed(vi, 32, 16) case S32() : return lift_flat_signed(vi, 32, 32) case S64() : return lift_flat_signed(vi, 64, 64) - case Float32() : return
canonicalize(vi.next('f32')) - case Float64() : return canonicalize(vi.next('f64')) + case Float32() : return canonicalize32(vi.next('f32')) + case Float64() : return canonicalize64(vi.next('f64')) case Char() : return i32_to_char(opts, vi.next('i32')) case String() : return lift_flat_string(opts, vi) case List(t) : return lift_flat_list(opts, vi, t) From fe033246e1030e808ca629b62a0a01304857c48e Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 13 Apr 2022 15:30:00 -0500 Subject: [PATCH 026/301] Sync CanonicalABI.md --- design/mvp/CanonicalABI.md | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 420d6eb..6253950 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -225,8 +225,8 @@ def load(opts, ptr, t): case S16() : return load_int(opts, ptr, 2, signed=True) case S32() : return load_int(opts, ptr, 4, signed=True) case S64() : return load_int(opts, ptr, 8, signed=True) - case Float32() : return canonicalize(reinterpret_i32_as_float(load_int(opts, ptr, 4))) - case Float64() : return canonicalize(reinterpret_i64_as_float(load_int(opts, ptr, 8))) + case Float32() : return canonicalize32(reinterpret_i32_as_float(load_int(opts, ptr, 4))) + case Float64() : return canonicalize64(reinterpret_i64_as_float(load_int(opts, ptr, 8))) case Char() : return i32_to_char(opts, load_int(opts, ptr, 4)) case String() : return load_string(opts, ptr) case List(t) : return load_list(opts, ptr, t) @@ -252,7 +252,12 @@ def reinterpret_i32_as_float(i): def reinterpret_i64_as_float(i): return struct.unpack('!d', struct.pack('!Q', i))[0] -def canonicalize(f): +def canonicalize32(f): + if math.isnan(f): + return reinterpret_i32_as_float(0x7fc00000) + return f + +def canonicalize64(f): if math.isnan(f): return reinterpret_i64_as_float(0x7ff8000000000000) return f @@ -831,8 +836,8 @@ def lift_flat(opts, vi, t): case S16() : return lift_flat_signed(vi, 32, 16) case S32() : 
return lift_flat_signed(vi, 32, 32) case S64() : return lift_flat_signed(vi, 64, 64) - case Float32() : return canonicalize(vi.next('f32')) - case Float64() : return canonicalize(vi.next('f64')) + case Float32() : return canonicalize32(vi.next('f32')) + case Float64() : return canonicalize64(vi.next('f64')) case Char() : return i32_to_char(opts, vi.next('i32')) case String() : return lift_flat_string(opts, vi) case List(t) : return lift_flat_list(opts, vi, t) From cb5259cff0cbcc6902fc9f0920edf9c2c7818449 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 13 Apr 2022 15:30:35 -0500 Subject: [PATCH 027/301] Add comment for pack/unpack usage --- design/mvp/CanonicalABI.md | 8 ++++---- design/mvp/canonical-abi/definitions.py | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 6253950..306e823 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -247,10 +247,10 @@ Floats are loaded from memory and then "canonicalized", mapping all Not-a-Number values to a single canonical `nan` bit-pattern: ```python def reinterpret_i32_as_float(i): - return struct.unpack('!f', struct.pack('!I', i))[0] + return struct.unpack('!f', struct.pack('!I', i))[0] # f32.reinterpret_i32 def reinterpret_i64_as_float(i): - return struct.unpack('!d', struct.pack('!Q', i))[0] + return struct.unpack('!d', struct.pack('!Q', i))[0] # f64.reinterpret_i64 def canonicalize32(f): if math.isnan(f): @@ -438,10 +438,10 @@ assume is the canonical one), no additional runtime canonicalization is necessary. 
```python def reinterpret_float_as_i32(f): - return struct.unpack('!I', struct.pack('!f', f))[0] + return struct.unpack('!I', struct.pack('!f', f))[0] # i32.reinterpret_f32 def reinterpret_float_as_i64(f): - return struct.unpack('!Q', struct.pack('!d', f))[0] + return struct.unpack('!Q', struct.pack('!d', f))[0] # i64.reinterpret_f64 ``` The integral value of a `char` (a [Unicode Scalar Value]) is a valid unsigned diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 9187ee5..c012f89 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -230,10 +230,10 @@ def load_int(opts, ptr, nbytes, signed = False): # def reinterpret_i32_as_float(i): - return struct.unpack('!f', struct.pack('!I', i))[0] + return struct.unpack('!f', struct.pack('!I', i))[0] # f32.reinterpret_i32 def reinterpret_i64_as_float(i): - return struct.unpack('!d', struct.pack('!Q', i))[0] + return struct.unpack('!d', struct.pack('!Q', i))[0] # f64.reinterpret_i64 def canonicalize32(f): if math.isnan(f): @@ -379,10 +379,10 @@ def store_int(opts, v, ptr, nbytes, signed = False): # def reinterpret_float_as_i32(f): - return struct.unpack('!I', struct.pack('!f', f))[0] + return struct.unpack('!I', struct.pack('!f', f))[0] # i32.reinterpret_f32 def reinterpret_float_as_i64(f): - return struct.unpack('!Q', struct.pack('!d', f))[0] + return struct.unpack('!Q', struct.pack('!d', f))[0] # i64.reinterpret_f64 # From 7a2ee91109bd75e303b4c60a4d0710306e2f72b1 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 13 Apr 2022 15:50:14 -0500 Subject: [PATCH 028/301] Add NaN canonicalization explanation --- design/mvp/CanonicalABI.md | 5 +++-- design/mvp/Explainer.md | 9 +++++++++ 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 306e823..a689043 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -243,8 +243,9 @@ def 
load_int(opts, ptr, nbytes, signed = False): return int.from_bytes(opts.memory[ptr : ptr+nbytes], 'little', signed=signed) ``` -Floats are loaded from memory and then "canonicalized", mapping all -Not-a-Number values to a single canonical `nan` bit-pattern: +For reasons [given](Explainer.md#type-definitions) in the explainer, floats are +loaded from memory and then "canonicalized", mapping all Not-a-Number bit +patterns to a single canonical `nan` value. ```python def reinterpret_i32_as_float(i): return struct.unpack('!f', struct.pack('!I', i))[0] # f32.reinterpret_i32 diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 087b645..ce37148 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -368,6 +368,14 @@ interface types is given by the following table: | `variant` | heterogeneous [tagged unions] of named `intertype` values | | `list` | homogeneous, variable-length [sequences] of `intertype` values | +NaN values are canonicalized to a single value so that: +1. consumers of NaN values are free to use the rest of the NaN payload for + optimization purposes (like [NaN boxing]) without needing to worry about + whether the NaN payload bits were significant; and +2. producers of NaN values across component boundaries do not develop brittle + assumptions that NaN payload bits are preserved by the other side (since + they often aren't). 
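The effect of the canonicalization rule described above can be illustrated with a small, self-contained sketch. It reuses the names `reinterpret_i64_as_float`, `reinterpret_float_as_i64` and `canonicalize64` from `definitions.py`; the specific NaN payloads fed in are made up for illustration, and the 32-bit case (canonical pattern `0x7fc00000`) is assumed to behave analogously.

```python
import math
import struct

def reinterpret_i64_as_float(i):
    # Same bit-cast as f64.reinterpret_i64: the integer's bits become the double's bits.
    return struct.unpack('!d', struct.pack('!Q', i))[0]

def reinterpret_float_as_i64(f):
    # Same bit-cast as i64.reinterpret_f64.
    return struct.unpack('!Q', struct.pack('!d', f))[0]

def canonicalize64(f):
    # Every NaN, whatever its sign or payload, collapses to one canonical bit pattern.
    if math.isnan(f):
        return reinterpret_i64_as_float(0x7ff8000000000000)
    return f

# Illustrative inputs (assumptions, not taken from the test suite):
# a quiet NaN carrying a payload, and a NaN with the sign bit set.
payload_nan = reinterpret_i64_as_float(0x7ff8000000abcdef)
negative_nan = reinterpret_i64_as_float(0xfff8000000000000)

canonical_bits = [reinterpret_float_as_i64(canonicalize64(f))
                  for f in (payload_nan, negative_nan)]
ordinary = canonicalize64(1.5)  # non-NaN values pass through unchanged
```

After canonicalization both inputs carry the same bits, so a consumer that uses NaN payload bits for something like NaN boxing never sees a producer-chosen payload.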
+ The sets of values allowed for the remaining *specialized* interface types are defined by the following mapping: ``` @@ -954,6 +962,7 @@ and will be added over the coming months to complete the MVP proposal: [Bottom Type]: https://en.wikipedia.org/wiki/Bottom_type [IEEE754]: https://en.wikipedia.org/wiki/IEEE_754 [NaN]: https://en.wikipedia.org/wiki/NaN +[NaN Boxing]: https://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations [Unicode Scalar Values]: https://unicode.org/glossary/#unicode_scalar_value [Tuples]: https://en.wikipedia.org/wiki/Tuple [Tagged Unions]: https://en.wikipedia.org/wiki/Tagged_union From 26aa846d6e174bc42cd21e8f16c913c69a6b7dd3 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 13 Apr 2022 15:57:13 -0500 Subject: [PATCH 029/301] Canonicalize before storing/lowering to make it clear which bit pattern is used for NaNs --- design/mvp/CanonicalABI.md | 15 +++++++-------- design/mvp/canonical-abi/definitions.py | 8 ++++---- 2 files changed, 11 insertions(+), 12 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index a689043..ca40fd4 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -413,8 +413,8 @@ def store(opts, v, t, ptr): case S16() : store_int(opts, v, ptr, 2, signed=True) case S32() : store_int(opts, v, ptr, 4, signed=True) case S64() : store_int(opts, v, ptr, 8, signed=True) - case Float32() : store_int(opts, reinterpret_float_as_i32(v), ptr, 4) - case Float64() : store_int(opts, reinterpret_float_as_i64(v), ptr, 8) + case Float32() : store_int(opts, reinterpret_float_as_i32(canonicalize32(v)), ptr, 4) + case Float64() : store_int(opts, reinterpret_float_as_i64(canonicalize64(v)), ptr, 8) case Char() : store_int(opts, char_to_i32(v), ptr, 4) case String() : store_string(opts, v, ptr) case List(t) : store_list(opts, v, ptr, t) @@ -433,10 +433,9 @@ def store_int(opts, v, ptr, nbytes, signed = False): opts.memory[ptr : ptr+nbytes] = int.to_bytes(v, 
nbytes, 'little', signed=signed) ``` -Floats are stored directly into memory. Because the input domain is exactly the -set of interface values which includes only a single `nan` value (which we -assume is the canonical one), no additional runtime canonicalization is -necessary. +Floats are stored directly into memory (in the case of NaNs, using the +32-/64-bit canonical NaN bit pattern selected by +`canonicalize32`/`canonicalize64`): ```python def reinterpret_float_as_i32(f): return struct.unpack('!I', struct.pack('!f', f))[0] # i32.reinterpret_f32 @@ -961,8 +960,8 @@ def lower_flat(opts, v, t): case S16() : return lower_flat_signed(v, 32) case S32() : return lower_flat_signed(v, 32) case S64() : return lower_flat_signed(v, 64) - case Float32() : return [Value('f32', v)] - case Float64() : return [Value('f64', v)] + case Float32() : return [Value('f32', canonicalize32(v))] + case Float64() : return [Value('f64', canonicalize64(v))] case Char() : return [Value('i32', char_to_i32(v))] case String() : return lower_flat_string(opts, v) case List(t) : return lower_flat_list(opts, v, t) diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index c012f89..4cb6938 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -361,8 +361,8 @@ def store(opts, v, t, ptr): case S16() : store_int(opts, v, ptr, 2, signed=True) case S32() : store_int(opts, v, ptr, 4, signed=True) case S64() : store_int(opts, v, ptr, 8, signed=True) - case Float32() : store_int(opts, reinterpret_float_as_i32(v), ptr, 4) - case Float64() : store_int(opts, reinterpret_float_as_i64(v), ptr, 8) + case Float32() : store_int(opts, reinterpret_float_as_i32(canonicalize32(v)), ptr, 4) + case Float64() : store_int(opts, reinterpret_float_as_i64(canonicalize64(v)), ptr, 8) case Char() : store_int(opts, char_to_i32(v), ptr, 4) case String() : store_string(opts, v, ptr) case List(t) : store_list(opts, v, ptr, t) @@ -766,8 +766,8 
@@ def lower_flat(opts, v, t): case S16() : return lower_flat_signed(v, 32) case S32() : return lower_flat_signed(v, 32) case S64() : return lower_flat_signed(v, 64) - case Float32() : return [Value('f32', v)] - case Float64() : return [Value('f64', v)] + case Float32() : return [Value('f32', canonicalize32(v))] + case Float64() : return [Value('f64', canonicalize64(v))] case Char() : return [Value('i32', char_to_i32(v))] case String() : return lower_flat_string(opts, v) case List(t) : return lower_flat_list(opts, v, t) From 6a00372cede6a2d83f2d98287e7c02d2d7520e49 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 13 Apr 2022 18:22:27 -0500 Subject: [PATCH 030/301] Add string encoding links and reword string paragraph so they show up first --- design/mvp/CanonicalABI.md | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index ca40fd4..9c4597c 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -274,16 +274,15 @@ def i32_to_char(opts, i): return chr(i) ``` -Strings can be decoded in one of three ways, according to the `string-encoding` -option in [`canonopt`]. String interface values include their original encoding -and byte length as a "hint" that enables `store_string` (defined below) to make -better up-front allocation size choices in many cases. Thus, the interface -value produced by `load_string` isn't simply a Python `str`, but a *tuple* -containing a `str`, the original encoding and the original byte length. Lastly, -the custom `latin1+utf16` encoding represents a dynamic choice between `latin1` -(when all code points fit the one-byte Latin-1 encoding) and `utf16` -(otherwise). This dynamic choice is encoded in the high bit of the `i32` -containing the string's byte length. +Strings are loaded from two `i32` values: a pointer (offset in linear memory) +and a number of bytes. 
There are three supported string encodings in [`canonopt`]: +[UTF-8], [UTF-16] and `latin1+utf16`. This last option allows a *dynamic* +choice between [Latin-1] and UTF-16, indicated by the high bit of the second `i32`. +String interface values include their original encoding and byte length as a +"hint" that enables `store_string` (defined below) to make better up-front +allocation size choices in many cases. Thus, the interface value produced by +`load_string` isn't simply a Python `str`, but a *tuple* containing a `str`, +the original encoding and the original byte length. ```python def load_string(opts, ptr): begin = load_int(opts, ptr, 4) @@ -1269,6 +1268,9 @@ well as post-MVP [adapter functions]. [Exceptions]: https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md [Alignment]: https://en.wikipedia.org/wiki/Data_structure_alignment +[UTF-8]: https://en.wikipedia.org/wiki/UTF-8 +[UTF-16]: https://en.wikipedia.org/wiki/UTF-16 +[Latin-1]: https://en.wikipedia.org/wiki/ISO/IEC_8859-1 [Unicode Scalar Value]: https://unicode.org/glossary/#unicode_scalar_value [Unicode Code Point]: https://unicode.org/glossary/#code_point [Surrogate]: https://unicode.org/faq/utf_bom.html#utf16-2 From da5d058046ba7e31d93f2a85227ef33435d24477 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 13 Apr 2022 18:26:12 -0500 Subject: [PATCH 031/301] Fix text about multi-value limitation --- design/mvp/CanonicalABI.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 9c4597c..af659be 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -720,9 +720,9 @@ non-dynamically-sized interface types into core parameters and results. For a variety of [practical][Implementation Limits] reasons, we need to limit the total number of flattened parameters and results, falling back to storing everything in linear memory.
The number of flattened results is currently -limited to 1 due to various parts of the toolchain (notably LLVM) not yet fully -supporting [multi-value]. Hopefully this limitation is temporary and can be -lifted before the Component Model is fully standardized. +limited to 1 due to various parts of the toolchain (notably the C ABI) not yet +being able to express [multi-value] returns. Hopefully this limitation is +temporary and can be lifted before the Component Model is fully standardized. When there are too many flat values, in general, a single `i32` pointer can be passed instead (pointing to a tuple in linear memory). When lowering *into* From b52a78c725ff50c3637f72d543839cb4c86a82c5 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 13 Apr 2022 18:35:27 -0500 Subject: [PATCH 032/301] Add blurb about defaults-to in Explainer.md --- design/mvp/Explainer.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index ce37148..60c4940 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -376,6 +376,13 @@ NaN values are canonicalized to a single value so that: assumptions that NaN payload bits are preserved by the other side (since they often aren't). +The subtyping between all these types is described in a separate +[subtyping explainer](Subtyping.md). Of note here, though: the optional +`defaults-to` field in the `case`s of `variant`s is exclusively concerned with +subtyping. In particular, a `variant` subtype can contain a `case` not present +in the supertype if the subtype's `case` `defaults-to` (directly or transitively) +some `case` in the supertype. + The sets of values allowed for the remaining *specialized* interface types are defined by the following mapping: ``` @@ -444,9 +451,6 @@ WebAssembly validation rules allow duplicate imports, this means that some valid modules will not be typeable and will fail validation if used with the Component Model. 
-The subtyping between all these types is described in a separate -[subtyping explainer](Subtyping.md). - With what's defined so far, we can define component types using a mix of inline and out-of-line type definitions: ```wasm From 4035e4e7fd7235b9060c1c3d7052361221beb8fb Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 13 Apr 2022 19:31:25 -0500 Subject: [PATCH 033/301] Sync CanonicalABI.md with definitions.py --- design/mvp/canonical-abi/definitions.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 4cb6938..1de3d7b 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -237,7 +237,7 @@ def reinterpret_i64_as_float(i): def canonicalize32(f): if math.isnan(f): - return reinterpret_i64_as_float(0x7fc00000) + return reinterpret_i32_as_float(0x7fc00000) return f def canonicalize64(f): From 5679ffb06737b2bbfb30eb3e9e6cc9961595673f Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 13 Apr 2022 19:32:17 -0500 Subject: [PATCH 034/301] Pass string length in code units, not bytes, and fix string realloc alignment --- design/mvp/CanonicalABI.md | 207 ++++++++++++------------ design/mvp/canonical-abi/definitions.py | 170 ++++++++++--------- design/mvp/canonical-abi/run_tests.py | 21 +-- 3 files changed, 199 insertions(+), 199 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index af659be..cfa2141 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -286,25 +286,25 @@ the original encoding and the original byte length. 
```python def load_string(opts, ptr): begin = load_int(opts, ptr, 4) - packed_byte_length = load_int(opts, ptr + 4, 4) - return load_string_from_range(opts, begin, packed_byte_length) + tagged_code_units = load_int(opts, ptr + 4, 4) + return load_string_from_range(opts, begin, tagged_code_units) -UTF16_BIT = 1 << 31 +UTF16_TAG = 1 << 31 -def load_string_from_range(opts, ptr, packed_byte_length): +def load_string_from_range(opts, ptr, tagged_code_units): match opts.string_encoding: case 'utf8': - byte_length = packed_byte_length + byte_length = tagged_code_units encoding = 'utf-8' case 'utf16': - byte_length = packed_byte_length + byte_length = 2 * tagged_code_units encoding = 'utf-16-le' case 'latin1+utf16': - if bool(packed_byte_length & UTF16_BIT): - byte_length = packed_byte_length ^ UTF16_BIT + if bool(tagged_code_units & UTF16_TAG): + byte_length = 2 * (tagged_code_units ^ UTF16_TAG) encoding = 'utf-16-le' else: - byte_length = packed_byte_length + byte_length = tagged_code_units encoding = 'latin-1' trap_if(ptr + byte_length > len(opts.memory)) @@ -313,7 +313,7 @@ def load_string_from_range(opts, ptr, packed_byte_length): except UnicodeError: trap() - return (s, opts.string_encoding, packed_byte_length) + return (s, opts.string_encoding, tagged_code_units) ``` Lists and records are loaded by recursively loading their elements/fields. 
@@ -472,43 +472,43 @@ combinations, subdividing the `latin1+utf16` encoding into either `latin1` or `utf16` based on the `UTF16_BIT` flag set by `load_string`: ```python def store_string(opts, v, ptr): - begin, packed_byte_length = store_string_into_range(opts, v) + begin, tagged_code_units = store_string_into_range(opts, v) store_int(opts, begin, ptr, 4) - store_int(opts, packed_byte_length, ptr + 4, 4) + store_int(opts, tagged_code_units, ptr + 4, 4) def store_string_into_range(opts, v): - src, src_encoding, src_packed_byte_length = v + src, src_encoding, src_tagged_code_units = v if src_encoding == 'latin1+utf16': - if bool(src_packed_byte_length & UTF16_BIT): - src_byte_length = src_packed_byte_length ^ UTF16_BIT - src_unpacked_encoding = 'utf16' + if bool(src_tagged_code_units & UTF16_TAG): + src_simple_encoding = 'utf16' + src_code_units = src_tagged_code_units ^ UTF16_TAG else: - src_byte_length = src_packed_byte_length - src_unpacked_encoding = 'latin1' + src_simple_encoding = 'latin1' + src_code_units = src_tagged_code_units else: - src_byte_length = src_packed_byte_length - src_unpacked_encoding = src_encoding + src_simple_encoding = src_encoding + src_code_units = src_tagged_code_units match opts.string_encoding: case 'utf8': - match src_unpacked_encoding: - case 'utf8' : return store_string_copy(opts, src, src_byte_length, 'utf-8') - case 'utf16' : return store_utf16_to_utf8(opts, src, src_byte_length) - case 'latin1' : return store_latin1_to_utf8(opts, src, src_byte_length) + match src_simple_encoding: + case 'utf8' : return store_string_copy(opts, src, src_code_units, 1, 'utf-8') + case 'utf16' : return store_utf16_to_utf8(opts, src, src_code_units) + case 'latin1' : return store_latin1_to_utf8(opts, src, src_code_units) case 'utf16': - match src_unpacked_encoding: - case 'utf8' : return store_utf8_to_utf16(opts, src, src_byte_length) - case 'utf16' : return store_string_copy(opts, src, src_byte_length, 'utf-16-le') - case 'latin1' : return 
store_string_copy(opts, src, src_byte_length, 'utf-16-le', inflation = 2) + match src_simple_encoding: + case 'utf8' : return store_utf8_to_utf16(opts, src, src_code_units) + case 'utf16' : return store_string_copy(opts, src, src_code_units, 2, 'utf-16-le') + case 'latin1' : return store_string_copy(opts, src, src_code_units, 2, 'utf-16-le') case 'latin1+utf16': match src_encoding: - case 'utf8' : return store_utf8_to_latin1_or_utf16(opts, src, src_byte_length) - case 'utf16' : return store_utf16_to_latin1_or_utf16(opts, src, src_byte_length) + case 'utf8' : return store_string_to_latin1_or_utf16(opts, src, src_code_units) + case 'utf16' : return store_string_to_latin1_or_utf16(opts, src, src_code_units) case 'latin1+utf16' : - match src_unpacked_encoding: - case 'latin1' : return store_string_copy(opts, src, src_byte_length, 'latin-1') - case 'utf16' : return store_probably_utf16_to_latin1_or_utf16(opts, src, src_byte_length) + match src_simple_encoding: + case 'latin1' : return store_string_copy(opts, src, src_code_units, 1, 'latin-1') + case 'utf16' : return store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units) ``` The simplest 4 cases above can compute the exact destination size and then copy @@ -517,100 +517,99 @@ byte after every Latin-1 byte). 
```python MAX_STRING_BYTE_LENGTH = (1 << 31) - 1 -def store_string_copy(opts, src, src_byte_length, dst_encoding, inflation = 1): - byte_length = src_byte_length * inflation - trap_if(byte_length > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 1, byte_length) +def store_string_copy(opts, src, src_code_units, dst_code_unit_size, dst_encoding): + dst_byte_length = dst_code_unit_size * src_code_units + trap_if(dst_byte_length > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, dst_code_unit_size, dst_byte_length) encoded = src.encode(dst_encoding) - assert(byte_length == len(encoded)) + assert(dst_byte_length == len(encoded)) opts.memory[ptr : ptr+len(encoded)] = encoded - return (ptr, byte_length) + return (ptr, src_code_units) ``` The choice of `MAX_STRING_BYTE_LENGTH` constant ensures that the high bit of a string's byte length is never set, keeping it clear for `UTF16_BIT`. -The next 3 cases can all be mapped down to a generic transcoding algorithm that -makes an initial optimistic size allocation that falls back to a second worst-case -size reallocation that is "fixed up" at the end with a third (hopefully O(1)) -shrinking reallocation. +The 2 cases of transcoding into UTF-8 share an algorithm that starts by +optimistically assuming that each code unit of the source string fits in a +single UTF-8 byte and then, failing that, reallocates to a worst-case size, +finishes the copy, and then finishes with a shrinking reallocation. 
```python -def store_utf16_to_utf8(opts, src, src_byte_length): - optimistic_size = int(src_byte_length / 2) - worst_case_size = optimistic_size * 3 - return store_string_transcode(opts, src, 'utf-8', optimistic_size, worst_case_size) - -def store_latin1_to_utf8(opts, src, src_byte_length): - optimistic_size = src_byte_length - worst_case_size = optimistic_size * 2 - return store_string_transcode(opts, src, 'utf-8', optimistic_size, worst_case_size) - -def store_utf8_to_utf16(opts, src, src_byte_length): - optimistic_size = src_byte_length * 2 - worst_case_size = optimistic_size - return store_string_transcode(opts, src, 'utf-16-le', optimistic_size, worst_case_size) - -def store_string_transcode(opts, src, dst_encoding, optimistic_size, worst_case_size): - trap_if(optimistic_size > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 1, optimistic_size) - encoded = src.encode(dst_encoding) - bytes_copied = min(len(encoded), optimistic_size) - opts.memory[ptr : ptr+bytes_copied] = encoded[0 : bytes_copied] - if bytes_copied < optimistic_size: - ptr = opts.realloc(ptr, optimistic_size, 1, bytes_copied) - elif bytes_copied < len(encoded): +def store_utf16_to_utf8(opts, src, src_code_units): + worst_case_size = src_code_units * 3 + return store_string_to_utf8(opts, src, src_code_units, worst_case_size) + +def store_latin1_to_utf8(opts, src, src_code_units): + worst_case_size = src_code_units * 2 + return store_string_to_utf8(opts, src, src_code_units, worst_case_size) + +def store_string_to_utf8(opts, src, src_code_units, worst_case_size): + assert(src_code_units <= MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 1, src_code_units) + encoded = src.encode('utf-8') + assert(src_code_units <= len(encoded)) + opts.memory[ptr : ptr+src_code_units] = encoded[0 : src_code_units] + if src_code_units < len(encoded): trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(ptr, optimistic_size, 1, worst_case_size) - opts.memory[ptr+bytes_copied : 
ptr+len(encoded)] = encoded[bytes_copied : ] + ptr = opts.realloc(ptr, src_code_units, 1, worst_case_size) + opts.memory[ptr+src_code_units : ptr+len(encoded)] = encoded[src_code_units : ] if worst_case_size > len(encoded): ptr = opts.realloc(ptr, worst_case_size, 1, len(encoded)) return (ptr, len(encoded)) ``` -The remaining cases handle the `latin1+utf16` encoding, where there general +Converting from UTF-8 to UTF-16 performs an initial worst-case size allocation +(assuming each UTF-8 byte encodes a whole code point that inflates into a +two-byte UTF-16 code unit) and then does a shrinking reallocation at the end +if multiple UTF-8 bytes were collapsed into a single 2-byte UTF-16 code unit: +```python +def store_utf8_to_utf16(opts, src, src_code_units): + worst_case_size = 2 * src_code_units + trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 2, worst_case_size) + encoded = src.encode('utf-16-le') + opts.memory[ptr : ptr+len(encoded)] = encoded + if len(encoded) < worst_case_size: + ptr = opts.realloc(ptr, worst_case_size, 2, len(encoded)) + code_units = int(len(encoded) / 2) + return (ptr, code_units) +``` + +The next transcoding case handles `latin1+utf16` encoding, where the general goal is to fit the incoming string into Latin-1 if possible based on the code -points of the incoming string. The UTF-8 and UTF-16 cases are similar to the -preceding transcoding algorithm in that they make a best-effort optimistic -allocation, speculating that all code points *do* fit into Latin-1, before -falling back to a worst-case allocation size when a code point is found outside -Latin-1. In this fallback case, the previously-stored Latin-1 bytes are -inflated *in place*, inserting a 0 byte after every Latin-1 byte (iterating -in reverse to avoid clobbering later bytes): +points of the incoming string.
The algorithm speculates that all code points +*do* fit into Latin-1 and then falls back to a worst-case allocation size when +a code point is found outside Latin-1. In this fallback case, the +previously-copied Latin-1 bytes are inflated *in place*, inserting a 0 byte +after every Latin-1 byte (iterating in reverse to avoid clobbering later +bytes): ```python -def store_utf8_to_latin1_or_utf16(opts, src, src_byte_length): - optimistic_size = src_byte_length - worst_case_size = 2 * src_byte_length - return store_string_to_latin1_or_utf16(opts, src, optimistic_size, worst_case_size) - -def store_utf16_to_latin1_or_utf16(opts, src, src_byte_length): - optimistic_size = int(src_byte_length / 2) - worst_case_size = src_byte_length - return store_string_to_latin1_or_utf16(opts, src, optimistic_size, worst_case_size) - -def store_string_to_latin1_or_utf16(opts, src, optimistic_size, worst_case_size): - trap_if(optimistic_size > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 1, optimistic_size) +def store_string_to_latin1_or_utf16(opts, src, src_code_units): + assert(src_code_units <= MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 1, src_code_units) dst_byte_length = 0 for usv in src: if ord(usv) < (1 << 8): opts.memory[ptr + dst_byte_length] = ord(usv) dst_byte_length += 1 else: + worst_case_size = 2 * src_code_units trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(ptr, optimistic_size, 1, worst_case_size) + ptr = opts.realloc(ptr, src_code_units, 2, worst_case_size) for j in range(dst_byte_length-1, -1, -1): opts.memory[ptr + 2*j] = opts.memory[ptr + j] opts.memory[ptr + 2*j + 1] = 0 encoded = src.encode('utf-16-le') opts.memory[ptr+2*dst_byte_length : ptr+len(encoded)] = encoded[2*dst_byte_length : ] if worst_case_size > len(encoded): - ptr = opts.realloc(ptr, worst_case_size, 1, len(encoded)) - return (ptr, len(encoded) | UTF16_BIT) - if dst_byte_length < optimistic_size: - ptr = opts.realloc(ptr, optimistic_size, 1, dst_byte_length) + 
ptr = opts.realloc(ptr, worst_case_size, 2, len(encoded)) + tagged_code_units = int(len(encoded) / 2) | UTF16_TAG + return (ptr, tagged_code_units) + if dst_byte_length < src_code_units: + ptr = opts.realloc(ptr, src_code_units, 1, dst_byte_length) return (ptr, dst_byte_length) ``` -The final string transcoding case takes advantage of the extra heuristic +The final transcoding case takes advantage of the extra heuristic information that the incoming UTF-16 bytes were intentionally chosen over Latin-1 by the producer, indicating that they *probably* contain code points outside Latin-1 and thus *probably* require inflation. Based on this @@ -621,13 +620,15 @@ are all using `latin1+utf16` and *one* component over-uses UTF-16, other components can recover the Latin-1 compression. (The Latin-1 check can be inexpensively fused with the UTF-16 validate+copy loop.) ```python -def store_probably_utf16_to_latin1_or_utf16(opts, src, src_byte_length): +def store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units): + src_byte_length = 2 * src_code_units trap_if(src_byte_length > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 1, src_byte_length) + ptr = opts.realloc(0, 0, 2, src_byte_length) encoded = src.encode('utf-16-le') opts.memory[ptr : ptr+len(encoded)] = encoded if any(ord(c) >= (1 << 8) for c in src): - return (ptr, len(encoded) | UTF16_BIT) + tagged_code_units = int(len(encoded) / 2) | UTF16_TAG + return (ptr, tagged_code_units) latin1_size = int(len(encoded) / 2) for i in range(latin1_size): opts.memory[ptr + i] = opts.memory[ptr + 2*i] @@ -877,8 +878,8 @@ memory: ```python def lift_flat_string(opts, vi): ptr = vi.next('i32') - packed_byte_length = vi.next('i32') - return load_string_from_range(opts, ptr, packed_byte_length) + packed_length = vi.next('i32') + return load_string_from_range(opts, ptr, packed_length) def lift_flat_list(opts, vi, elem_type): ptr = vi.next('i32') @@ -986,8 +987,8 @@ previous definitions; only the resulting pointers are 
returned differently (as `i32` values instead of as a pair in linear memory): ```python def lower_flat_string(opts, v): - ptr, packed_byte_length = store_string_into_range(opts, v) - return [Value('i32', ptr), Value('i32', packed_byte_length)] + ptr, packed_length = store_string_into_range(opts, v) + return [Value('i32', ptr), Value('i32', packed_length)] def lower_flat_list(opts, v, elem_type): (ptr, length) = store_list_into_range(opts, v, elem_type) diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 1de3d7b..e11ea6b 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -256,25 +256,25 @@ def i32_to_char(opts, i): def load_string(opts, ptr): begin = load_int(opts, ptr, 4) - packed_byte_length = load_int(opts, ptr + 4, 4) - return load_string_from_range(opts, begin, packed_byte_length) + tagged_code_units = load_int(opts, ptr + 4, 4) + return load_string_from_range(opts, begin, tagged_code_units) -UTF16_BIT = 1 << 31 +UTF16_TAG = 1 << 31 -def load_string_from_range(opts, ptr, packed_byte_length): +def load_string_from_range(opts, ptr, tagged_code_units): match opts.string_encoding: case 'utf8': - byte_length = packed_byte_length + byte_length = tagged_code_units encoding = 'utf-8' case 'utf16': - byte_length = packed_byte_length + byte_length = 2 * tagged_code_units encoding = 'utf-16-le' case 'latin1+utf16': - if bool(packed_byte_length & UTF16_BIT): - byte_length = packed_byte_length ^ UTF16_BIT + if bool(tagged_code_units & UTF16_TAG): + byte_length = 2 * (tagged_code_units ^ UTF16_TAG) encoding = 'utf-16-le' else: - byte_length = packed_byte_length + byte_length = tagged_code_units encoding = 'latin-1' trap_if(ptr + byte_length > len(opts.memory)) @@ -283,7 +283,7 @@ def load_string_from_range(opts, ptr, packed_byte_length): except UnicodeError: trap() - return (s, opts.string_encoding, packed_byte_length) + return (s, opts.string_encoding, tagged_code_units) # @@ 
-394,134 +394,132 @@ def char_to_i32(c): # def store_string(opts, v, ptr): - begin, packed_byte_length = store_string_into_range(opts, v) + begin, tagged_code_units = store_string_into_range(opts, v) store_int(opts, begin, ptr, 4) - store_int(opts, packed_byte_length, ptr + 4, 4) + store_int(opts, tagged_code_units, ptr + 4, 4) def store_string_into_range(opts, v): - src, src_encoding, src_packed_byte_length = v + src, src_encoding, src_tagged_code_units = v if src_encoding == 'latin1+utf16': - if bool(src_packed_byte_length & UTF16_BIT): - src_byte_length = src_packed_byte_length ^ UTF16_BIT - src_unpacked_encoding = 'utf16' + if bool(src_tagged_code_units & UTF16_TAG): + src_simple_encoding = 'utf16' + src_code_units = src_tagged_code_units ^ UTF16_TAG else: - src_byte_length = src_packed_byte_length - src_unpacked_encoding = 'latin1' + src_simple_encoding = 'latin1' + src_code_units = src_tagged_code_units else: - src_byte_length = src_packed_byte_length - src_unpacked_encoding = src_encoding + src_simple_encoding = src_encoding + src_code_units = src_tagged_code_units match opts.string_encoding: case 'utf8': - match src_unpacked_encoding: - case 'utf8' : return store_string_copy(opts, src, src_byte_length, 'utf-8') - case 'utf16' : return store_utf16_to_utf8(opts, src, src_byte_length) - case 'latin1' : return store_latin1_to_utf8(opts, src, src_byte_length) + match src_simple_encoding: + case 'utf8' : return store_string_copy(opts, src, src_code_units, 1, 'utf-8') + case 'utf16' : return store_utf16_to_utf8(opts, src, src_code_units) + case 'latin1' : return store_latin1_to_utf8(opts, src, src_code_units) case 'utf16': - match src_unpacked_encoding: - case 'utf8' : return store_utf8_to_utf16(opts, src, src_byte_length) - case 'utf16' : return store_string_copy(opts, src, src_byte_length, 'utf-16-le') - case 'latin1' : return store_string_copy(opts, src, src_byte_length, 'utf-16-le', inflation = 2) + match src_simple_encoding: + case 'utf8' : return 
store_utf8_to_utf16(opts, src, src_code_units) + case 'utf16' : return store_string_copy(opts, src, src_code_units, 2, 'utf-16-le') + case 'latin1' : return store_string_copy(opts, src, src_code_units, 2, 'utf-16-le') case 'latin1+utf16': match src_encoding: - case 'utf8' : return store_utf8_to_latin1_or_utf16(opts, src, src_byte_length) - case 'utf16' : return store_utf16_to_latin1_or_utf16(opts, src, src_byte_length) + case 'utf8' : return store_string_to_latin1_or_utf16(opts, src, src_code_units) + case 'utf16' : return store_string_to_latin1_or_utf16(opts, src, src_code_units) case 'latin1+utf16' : - match src_unpacked_encoding: - case 'latin1' : return store_string_copy(opts, src, src_byte_length, 'latin-1') - case 'utf16' : return store_probably_utf16_to_latin1_or_utf16(opts, src, src_byte_length) + match src_simple_encoding: + case 'latin1' : return store_string_copy(opts, src, src_code_units, 1, 'latin-1') + case 'utf16' : return store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units) # MAX_STRING_BYTE_LENGTH = (1 << 31) - 1 -def store_string_copy(opts, src, src_byte_length, dst_encoding, inflation = 1): - byte_length = src_byte_length * inflation - trap_if(byte_length > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 1, byte_length) +def store_string_copy(opts, src, src_code_units, dst_code_unit_size, dst_encoding): + dst_byte_length = dst_code_unit_size * src_code_units + trap_if(dst_byte_length > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, dst_code_unit_size, dst_byte_length) encoded = src.encode(dst_encoding) - assert(byte_length == len(encoded)) + assert(dst_byte_length == len(encoded)) opts.memory[ptr : ptr+len(encoded)] = encoded - return (ptr, byte_length) + return (ptr, src_code_units) # -def store_utf16_to_utf8(opts, src, src_byte_length): - optimistic_size = int(src_byte_length / 2) - worst_case_size = optimistic_size * 3 - return store_string_transcode(opts, src, 'utf-8', optimistic_size, worst_case_size) +def 
store_utf16_to_utf8(opts, src, src_code_units): + worst_case_size = src_code_units * 3 + return store_string_to_utf8(opts, src, src_code_units, worst_case_size) -def store_latin1_to_utf8(opts, src, src_byte_length): - optimistic_size = src_byte_length - worst_case_size = optimistic_size * 2 - return store_string_transcode(opts, src, 'utf-8', optimistic_size, worst_case_size) +def store_latin1_to_utf8(opts, src, src_code_units): + worst_case_size = src_code_units * 2 + return store_string_to_utf8(opts, src, src_code_units, worst_case_size) -def store_utf8_to_utf16(opts, src, src_byte_length): - optimistic_size = src_byte_length * 2 - worst_case_size = optimistic_size - return store_string_transcode(opts, src, 'utf-16-le', optimistic_size, worst_case_size) - -def store_string_transcode(opts, src, dst_encoding, optimistic_size, worst_case_size): - trap_if(optimistic_size > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 1, optimistic_size) - encoded = src.encode(dst_encoding) - bytes_copied = min(len(encoded), optimistic_size) - opts.memory[ptr : ptr+bytes_copied] = encoded[0 : bytes_copied] - if bytes_copied < optimistic_size: - ptr = opts.realloc(ptr, optimistic_size, 1, bytes_copied) - elif bytes_copied < len(encoded): +def store_string_to_utf8(opts, src, src_code_units, worst_case_size): + assert(src_code_units <= MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 1, src_code_units) + encoded = src.encode('utf-8') + assert(src_code_units <= len(encoded)) + opts.memory[ptr : ptr+src_code_units] = encoded[0 : src_code_units] + if src_code_units < len(encoded): trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(ptr, optimistic_size, 1, worst_case_size) - opts.memory[ptr+bytes_copied : ptr+len(encoded)] = encoded[bytes_copied : ] + ptr = opts.realloc(ptr, src_code_units, 1, worst_case_size) + opts.memory[ptr+src_code_units : ptr+len(encoded)] = encoded[src_code_units : ] if worst_case_size > len(encoded): ptr = opts.realloc(ptr, worst_case_size, 
1, len(encoded)) return (ptr, len(encoded)) # -def store_utf8_to_latin1_or_utf16(opts, src, src_byte_length): - optimistic_size = src_byte_length - worst_case_size = 2 * src_byte_length - return store_string_to_latin1_or_utf16(opts, src, optimistic_size, worst_case_size) +def store_utf8_to_utf16(opts, src, src_code_units): + worst_case_size = 2 * src_code_units + trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 2, worst_case_size) + encoded = src.encode('utf-16-le') + opts.memory[ptr : ptr+len(encoded)] = encoded + if len(encoded) < worst_case_size: + ptr = opts.realloc(ptr, worst_case_size, 2, len(encoded)) + code_units = int(len(encoded) / 2) + return (ptr, code_units) -def store_utf16_to_latin1_or_utf16(opts, src, src_byte_length): - optimistic_size = int(src_byte_length / 2) - worst_case_size = src_byte_length - return store_string_to_latin1_or_utf16(opts, src, optimistic_size, worst_case_size) +# -def store_string_to_latin1_or_utf16(opts, src, optimistic_size, worst_case_size): - trap_if(optimistic_size > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 1, optimistic_size) +def store_string_to_latin1_or_utf16(opts, src, src_code_units): + assert(src_code_units <= MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 1, src_code_units) dst_byte_length = 0 for usv in src: if ord(usv) < (1 << 8): opts.memory[ptr + dst_byte_length] = ord(usv) dst_byte_length += 1 else: + worst_case_size = 2 * src_code_units trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(ptr, optimistic_size, 1, worst_case_size) + ptr = opts.realloc(ptr, src_code_units, 2, worst_case_size) for j in range(dst_byte_length-1, -1, -1): opts.memory[ptr + 2*j] = opts.memory[ptr + j] opts.memory[ptr + 2*j + 1] = 0 encoded = src.encode('utf-16-le') opts.memory[ptr+2*dst_byte_length : ptr+len(encoded)] = encoded[2*dst_byte_length : ] if worst_case_size > len(encoded): - ptr = opts.realloc(ptr, worst_case_size, 1, len(encoded)) - return (ptr, len(encoded) | 
UTF16_BIT) - if dst_byte_length < optimistic_size: - ptr = opts.realloc(ptr, optimistic_size, 1, dst_byte_length) + ptr = opts.realloc(ptr, worst_case_size, 2, len(encoded)) + tagged_code_units = int(len(encoded) / 2) | UTF16_TAG + return (ptr, tagged_code_units) + if dst_byte_length < src_code_units: + ptr = opts.realloc(ptr, src_code_units, 1, dst_byte_length) return (ptr, dst_byte_length) # -def store_probably_utf16_to_latin1_or_utf16(opts, src, src_byte_length): +def store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units): + src_byte_length = 2 * src_code_units trap_if(src_byte_length > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 1, src_byte_length) + ptr = opts.realloc(0, 0, 2, src_byte_length) encoded = src.encode('utf-16-le') opts.memory[ptr : ptr+len(encoded)] = encoded if any(ord(c) >= (1 << 8) for c in src): - return (ptr, len(encoded) | UTF16_BIT) + tagged_code_units = int(len(encoded) / 2) | UTF16_TAG + return (ptr, tagged_code_units) latin1_size = int(len(encoded) / 2) for i in range(latin1_size): opts.memory[ptr + i] = opts.memory[ptr + 2*i] @@ -700,8 +698,8 @@ def lift_flat_signed(vi, core_width, t_width): def lift_flat_string(opts, vi): ptr = vi.next('i32') - packed_byte_length = vi.next('i32') - return load_string_from_range(opts, ptr, packed_byte_length) + packed_length = vi.next('i32') + return load_string_from_range(opts, ptr, packed_length) def lift_flat_list(opts, vi, elem_type): ptr = vi.next('i32') @@ -785,8 +783,8 @@ def lower_flat_signed(i, core_bits): # def lower_flat_string(opts, v): - ptr, packed_byte_length = store_string_into_range(opts, v) - return [Value('i32', ptr), Value('i32', packed_byte_length)] + ptr, packed_length = store_string_into_range(opts, v) + return [Value('i32', ptr), Value('i32', packed_length)] def lower_flat_list(opts, v, elem_type): (ptr, length) = store_list_into_range(opts, v, elem_type) diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index 
46e371b..bbc9526 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -164,31 +164,32 @@ def test_pairs(t, pairs): test_pairs(Char(), [(0xE000,'\uE000'), (0x10FFFF,'\U0010FFFF'), (0x110000,None), (0xFFFFFFFF,None)]) test_pairs(Enum(['a','b']), [(0,{'a':{}}), (1,{'b':{}}), (2,None)]) -def test_string_internal(src_encoding, dst_encoding, s, encoded, utf16_bit = False): +def test_string_internal(src_encoding, dst_encoding, s, encoded, tagged_code_units): heap = Heap(len(encoded)) heap.memory[:] = encoded[:] opts = mk_opts(heap.memory, src_encoding, None, None) - packed_byte_length = len(encoded) - if utf16_bit: - packed_byte_length |= UTF16_BIT - v = (s, src_encoding, packed_byte_length) - test(String(), [0, packed_byte_length], v, opts, dst_encoding) + v = (s, src_encoding, tagged_code_units) + test(String(), [0, tagged_code_units], v, opts, dst_encoding) def test_string(src_encoding, dst_encoding, s): if src_encoding == 'utf8': encoded = s.encode('utf-8') - test_string_internal(src_encoding, dst_encoding, s, encoded) + tagged_code_units = len(encoded) + test_string_internal(src_encoding, dst_encoding, s, encoded, tagged_code_units) elif src_encoding == 'utf16': encoded = s.encode('utf-16-le') - test_string_internal(src_encoding, dst_encoding, s, encoded) + tagged_code_units = int(len(encoded) / 2) + test_string_internal(src_encoding, dst_encoding, s, encoded, tagged_code_units) elif src_encoding == 'latin1+utf16': try: encoded = s.encode('latin-1') - test_string_internal(src_encoding, dst_encoding, s, encoded) + tagged_code_units = len(encoded) + test_string_internal(src_encoding, dst_encoding, s, encoded, tagged_code_units) except UnicodeEncodeError: pass encoded = s.encode('utf-16-le') - test_string_internal(src_encoding, dst_encoding, s, encoded, utf16_bit = True) + tagged_code_units = int(len(encoded) / 2) | UTF16_TAG + test_string_internal(src_encoding, dst_encoding, s, encoded, tagged_code_units) encodings = 
['utf8', 'utf16', 'latin1+utf16'] From 171e7068b65b66bc75437f274c68f9fbe191dcc0 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 14 Apr 2022 11:21:18 -0500 Subject: [PATCH 035/301] Merge elem_size and byte_size into size --- design/mvp/CanonicalABI.md | 70 +++++++++++-------------- design/mvp/canonical-abi/definitions.py | 56 ++++++++++---------- design/mvp/canonical-abi/run_tests.py | 4 +- 3 files changed, 59 insertions(+), 71 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index cfa2141..6fe7788 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -141,23 +141,10 @@ def alignment_flags(labels): ### Size -Each interface type is assigned two slightly-different measures of "size": -* its "byte size", which is the smallest number of bytes covering all its - fields when stored at an aligned address in linear memory; and -* its "element size", which is the size of the type when stored as an element - of a list, which may include additional padding at the end to ensure the - alignment of the next element. 
- -These two measures are defined by the following functions, which build on -the preceding alignment functions: +Each interface type is also assigned a `size`, measured in bytes, which +corresponds to the `sizeof` operator in C: ```python -def elem_size(t): - return align_to(byte_size(t), alignment(t)) - -def align_to(ptr, alignment): - return math.ceil(ptr / alignment) * alignment - -def byte_size(t): +def size(t): match despecialize(t): case Bool() : return 1 case S8() | U8() : return 1 @@ -168,26 +155,30 @@ def byte_size(t): case Float64() : return 8 case Char() : return 4 case String() | List(_) : return 8 - case Record(fields) : return byte_size_record(fields) - case Variant(cases) : return byte_size_variant(cases) - case Flags(labels) : return byte_size_flags(labels) + case Record(fields) : return size_record(fields) + case Variant(cases) : return size_variant(cases) + case Flags(labels) : return size_flags(labels) -def byte_size_record(fields): +def size_record(fields): s = 0 for f in fields: s = align_to(s, alignment(f.t)) - s += byte_size(f.t) - return s + s += size(f.t) + return align_to(s, alignment(Record(fields))) + +def align_to(ptr, alignment): + return math.ceil(ptr / alignment) * alignment -def byte_size_variant(cases): - s = byte_size(discriminant_type(cases)) +def size_variant(cases): + s = size(discriminant_type(cases)) s = align_to(s, max_alignment(types_of(cases))) cs = 0 for c in cases: - cs = max(cs, byte_size(c.t)) - return s + cs + cs = max(cs, size(c.t)) + s += cs + return align_to(s, alignment(Variant(cases))) -def byte_size_flags(labels): +def size_flags(labels): n = len(labels) if n <= 8: return 1 if n <= 16: return 2 @@ -316,8 +307,7 @@ def load_string_from_range(opts, ptr, tagged_code_units): return (s, opts.string_encoding, tagged_code_units) ``` -Lists and records are loaded by recursively loading their elements/fields. -Note that lists use `elem_size` while records use `byte_size`. 
+Lists and records are loaded by recursively loading their elements/fields: ```python def load_list(opts, ptr, elem_type): begin = load_int(opts, ptr, 4) @@ -326,10 +316,10 @@ def load_list(opts, ptr, elem_type): def load_list_from_range(opts, ptr, length, elem_type): trap_if(ptr != align_to(ptr, alignment(elem_type))) - trap_if(ptr + length * elem_size(elem_type) > len(opts.memory)) + trap_if(ptr + length * size(elem_type) > len(opts.memory)) a = [] for i in range(length): - a.append(load(opts, ptr + i * elem_size(elem_type), elem_type)) + a.append(load(opts, ptr + i * size(elem_type), elem_type)) return a def load_record(opts, ptr, fields): @@ -337,7 +327,7 @@ def load_record(opts, ptr, fields): for field in fields: ptr = align_to(ptr, alignment(field.t)) record[field.label] = load(opts, ptr, field.t) - ptr += byte_size(field.t) + ptr += size(field.t) return record ``` As a technical detail: the `align_to` in the loop in `load_record` is @@ -354,7 +344,7 @@ tables at compile-time so that variant-passing is always O(1) and not involving string operations. ```python def load_variant(opts, ptr, cases): - disc_size = byte_size(discriminant_type(cases)) + disc_size = size(discriminant_type(cases)) disc = load_int(opts, ptr, disc_size) ptr += disc_size trap_if(disc >= len(cases)) @@ -382,7 +372,7 @@ derived from the ordered labels of the `flags` type. The code here takes advantage of Python's support for integers of arbitrary width. 
```python def load_flags(opts, ptr, labels): - i = load_int(opts, ptr, byte_size_flags(labels)) + i = load_int(opts, ptr, size_flags(labels)) return unpack_flags_from_int(i, labels) def unpack_flags_from_int(i, labels): @@ -647,20 +637,20 @@ def store_list(opts, v, ptr, elem_type): store_int(opts, length, ptr + 4, 4) def store_list_into_range(opts, v, elem_type): - byte_length = len(v) * elem_size(elem_type) + byte_length = len(v) * size(elem_type) trap_if(byte_length >= (1 << 32)) ptr = opts.realloc(0, 0, alignment(elem_type), byte_length) trap_if(ptr != align_to(ptr, alignment(elem_type))) trap_if(ptr + byte_length > len(opts.memory)) for i,e in enumerate(v): - store(opts, e, elem_type, ptr + i * elem_size(elem_type)) + store(opts, e, elem_type, ptr + i * size(elem_type)) return (ptr, len(v)) def store_record(opts, v, ptr, fields): for f in fields: ptr = align_to(ptr, alignment(f.t)) store(opts, v[f.label], f.t, ptr) - ptr += byte_size(f.t) + ptr += size(f.t) ``` Variants are stored using the `|`-separated list of `defaults-to` cases built @@ -672,7 +662,7 @@ case indices to the consumer's case indices. ```python def store_variant(opts, v, ptr, cases): case_index, case_value = match_case(v, cases) - disc_size = byte_size(discriminant_type(cases)) + disc_size = size(discriminant_type(cases)) store_int(opts, case_index, ptr, disc_size) ptr += disc_size ptr = align_to(ptr, max_alignment(types_of(cases))) @@ -697,7 +687,7 @@ to variants. 
```python def store_flags(opts, v, ptr, labels): i = pack_flags_into_int(v, labels) - store_int(opts, i, ptr, byte_size_flags(labels)) + store_int(opts, i, ptr, size_flags(labels)) def pack_flags_into_int(v, labels): i = 0 @@ -1067,7 +1057,7 @@ def lower(opts, max_flat, vs, ts, out_param = None): tuple_type = Tuple(functype.params) tuple_value = {str(i): v for i,v in enumerate(vs)} if out_param is None: - ptr = opts.realloc(0, 0, alignment(tuple_type), byte_size(tuple_type)) + ptr = opts.realloc(0, 0, alignment(tuple_type), size(tuple_type)) else: ptr = out_param.next('i32') trap_if(ptr != align_to(ptr, alignment(tuple_type))) diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index e11ea6b..27a5dba 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -147,13 +147,7 @@ def alignment_flags(labels): ### Size -def elem_size(t): - return align_to(byte_size(t), alignment(t)) - -def align_to(ptr, alignment): - return math.ceil(ptr / alignment) * alignment - -def byte_size(t): +def size(t): match despecialize(t): case Bool() : return 1 case S8() | U8() : return 1 @@ -164,26 +158,30 @@ def byte_size(t): case Float64() : return 8 case Char() : return 4 case String() | List(_) : return 8 - case Record(fields) : return byte_size_record(fields) - case Variant(cases) : return byte_size_variant(cases) - case Flags(labels) : return byte_size_flags(labels) + case Record(fields) : return size_record(fields) + case Variant(cases) : return size_variant(cases) + case Flags(labels) : return size_flags(labels) -def byte_size_record(fields): +def size_record(fields): s = 0 for f in fields: s = align_to(s, alignment(f.t)) - s += byte_size(f.t) - return s + s += size(f.t) + return align_to(s, alignment(Record(fields))) + +def align_to(ptr, alignment): + return math.ceil(ptr / alignment) * alignment -def byte_size_variant(cases): - s = byte_size(discriminant_type(cases)) +def size_variant(cases): + 
s = size(discriminant_type(cases)) s = align_to(s, max_alignment(types_of(cases))) cs = 0 for c in cases: - cs = max(cs, byte_size(c.t)) - return s + cs + cs = max(cs, size(c.t)) + s += cs + return align_to(s, alignment(Variant(cases))) -def byte_size_flags(labels): +def size_flags(labels): n = len(labels) if n <= 8: return 1 if n <= 16: return 2 @@ -294,10 +292,10 @@ def load_list(opts, ptr, elem_type): def load_list_from_range(opts, ptr, length, elem_type): trap_if(ptr != align_to(ptr, alignment(elem_type))) - trap_if(ptr + length * elem_size(elem_type) > len(opts.memory)) + trap_if(ptr + length * size(elem_type) > len(opts.memory)) a = [] for i in range(length): - a.append(load(opts, ptr + i * elem_size(elem_type), elem_type)) + a.append(load(opts, ptr + i * size(elem_type), elem_type)) return a def load_record(opts, ptr, fields): @@ -305,13 +303,13 @@ def load_record(opts, ptr, fields): for field in fields: ptr = align_to(ptr, alignment(field.t)) record[field.label] = load(opts, ptr, field.t) - ptr += byte_size(field.t) + ptr += size(field.t) return record # def load_variant(opts, ptr, cases): - disc_size = byte_size(discriminant_type(cases)) + disc_size = size(discriminant_type(cases)) disc = load_int(opts, ptr, disc_size) ptr += disc_size trap_if(disc >= len(cases)) @@ -336,7 +334,7 @@ def find_case(label, cases): # def load_flags(opts, ptr, labels): - i = load_int(opts, ptr, byte_size_flags(labels)) + i = load_int(opts, ptr, size_flags(labels)) return unpack_flags_from_int(i, labels) def unpack_flags_from_int(i, labels): @@ -534,26 +532,26 @@ def store_list(opts, v, ptr, elem_type): store_int(opts, length, ptr + 4, 4) def store_list_into_range(opts, v, elem_type): - byte_length = len(v) * elem_size(elem_type) + byte_length = len(v) * size(elem_type) trap_if(byte_length >= (1 << 32)) ptr = opts.realloc(0, 0, alignment(elem_type), byte_length) trap_if(ptr != align_to(ptr, alignment(elem_type))) trap_if(ptr + byte_length > len(opts.memory)) for i,e in 
enumerate(v): - store(opts, e, elem_type, ptr + i * elem_size(elem_type)) + store(opts, e, elem_type, ptr + i * size(elem_type)) return (ptr, len(v)) def store_record(opts, v, ptr, fields): for f in fields: ptr = align_to(ptr, alignment(f.t)) store(opts, v[f.label], f.t, ptr) - ptr += byte_size(f.t) + ptr += size(f.t) # def store_variant(opts, v, ptr, cases): case_index, case_value = match_case(v, cases) - disc_size = byte_size(discriminant_type(cases)) + disc_size = size(discriminant_type(cases)) store_int(opts, case_index, ptr, disc_size) ptr += disc_size ptr = align_to(ptr, max_alignment(types_of(cases))) @@ -572,7 +570,7 @@ def match_case(v, cases): def store_flags(opts, v, ptr, labels): i = pack_flags_into_int(v, labels) - store_int(opts, i, ptr, byte_size_flags(labels)) + store_int(opts, i, ptr, size_flags(labels)) def pack_flags_into_int(v, labels): i = 0 @@ -848,7 +846,7 @@ def lower(opts, max_flat, vs, ts, out_param = None): tuple_type = Tuple(functype.params) tuple_value = {str(i): v for i,v in enumerate(vs)} if out_param is None: - ptr = opts.realloc(0, 0, alignment(tuple_type), byte_size(tuple_type)) + ptr = opts.realloc(0, 0, alignment(tuple_type), size(tuple_type)) else: ptr = out_param.next('i32') trap_if(ptr != align_to(ptr, alignment(tuple_type))) diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index bbc9526..1fc1ead 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -239,14 +239,14 @@ def test_heap(t, expect, args, byte_array): test_heap(List(Tuple([U16(),U8()])), [mk_tup(6,7),mk_tup(8,9)], [0,2], [6,0, 7, 0x0ff, 8,0, 9, 0xff]) test_heap(List(Tuple([Tuple([U16(),U8()]),U8()])), [mk_tup([4,5],6),mk_tup([7,8],9)], [0,2], - [4,0, 5, 6, 7,0, 8, 9]) + [4,0, 5,0xff, 6,0xff, 7,0, 8,0xff, 9,0xff]) test_heap(List(Union([Unit(),U8(),Tuple([U8(),U16()])])), [{'0':{}}, {'1':42}, {'2':mk_tup(6,7)}], [0,3], [0,0xff,0xff,0xff,0xff,0xff, 1,0xff,42,0xff,0xff,0xff, 
2,0xff,6,0xff,7,0]) test_heap(List(Union([U32(),U8()])), [{'0':256}, {'1':42}], [0,2], [0,0xff,0xff,0xff,0,1,0,0, 1,0xff,0xff,0xff,42,0xff,0xff,0xff]) test_heap(List(Tuple([Union([U8(),Tuple([U16(),U8()])]),U8()])), [mk_tup({'1':mk_tup(5,6)},7),mk_tup({'0':8},9)], [0,2], - [1,0xff,5,0,6,7, 0,0xff,8,0xff,0xff,9]) + [1,0xff,5,0,6,0xff,7,0xff, 0,0xff,8,0xff,0xff,0xff,9,0xff]) test_heap(List(Union([U8()])), [{'0':6},{'0':7},{'0':8}], [0,3], [0,6, 0,7, 0,8]) t = List(Flags(['a','b'])) From 32f8a0ccf5ebffac5ca91e85863d51e106212180 Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Thu, 14 Apr 2022 10:41:15 -0700 Subject: [PATCH 036/301] Add GitHub Actions workflow to test canonical abi --- .github/workflows/main.yml | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) create mode 100644 .github/workflows/main.yml diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml new file mode 100644 index 0000000..a6efca5 --- /dev/null +++ b/.github/workflows/main.yml @@ -0,0 +1,16 @@ +name: CI + +on: + push: + pull_request: + +jobs: + canonical_abi: + name: Run Canonical ABI Tests + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v3 + - uses: actions/setup-python@v3 + with: + python-version: '>= 3.10.0' + - run: python design/mvp/canonical-abi/run_tests.py From 6d77ea9778aa65bb2d8ca7311c2000bda73759db Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 18 Apr 2022 11:25:44 -0500 Subject: [PATCH 037/301] Make byte-to-bool conversion trap if greater than 1 --- design/mvp/CanonicalABI.md | 14 ++++++++++++-- design/mvp/canonical-abi/definitions.py | 12 ++++++++++-- design/mvp/canonical-abi/run_tests.py | 3 ++- 3 files changed, 24 insertions(+), 5 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 6fe7788..ff45716 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -207,7 +207,7 @@ class Opts: def load(opts, ptr, t): assert(ptr == align_to(ptr, alignment(t))) match despecialize(t): - case Bool() : 
return bool(load_int(opts, ptr, 1)) + case Bool() : return narrow_uint_to_bool(load_int(opts, ptr, 1)) case U8() : return load_int(opts, ptr, 1) case U16() : return load_int(opts, ptr, 2) case U32() : return load_int(opts, ptr, 4) @@ -227,13 +227,22 @@ def load(opts, ptr, t): ``` Integers are loaded directly from memory, with their high-order bit interpreted -according to the signedness of the type: +according to the signedness of the type. ```python def load_int(opts, ptr, nbytes, signed = False): trap_if(ptr + nbytes > len(opts.memory)) return int.from_bytes(opts.memory[ptr : ptr+nbytes], 'little', signed=signed) ``` +As a general rule, the Canonical ABI traps when given extraneous bits, so the +narrowing conversion from a byte to a `bool` traps if the high 7 bits are set. +```python +def narrow_uint_to_bool(i): + assert(i >= 0) + trap_if(i > 1) + return bool(i) +``` + For reasons [given](Explainer.md#type-definitions) in the explainer, floats are loaded from memory and then "canonicalized", mapping all Not-a-Number bit patterns to a single canonical `nan` value. 
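The canonicalization step described above can be sketched in isolation. This is a minimal, self-contained illustration under stated assumptions, not the patch's own code: the inverse helper `reinterpret_float_as_i32`, the wrapper `canonical_bits32`, and the little-endian `struct` formats are chosen for this example, and the constant uses the `0x7fc00000` bit pattern that appears elsewhere in this series.

```python
import math
import struct

# Bit pattern of the single canonical float32 NaN (quiet NaN, zero payload).
CANONICAL_FLOAT32_NAN = 0x7fc00000

def reinterpret_i32_as_float(i):
    # Reinterpret a 32-bit unsigned integer as an IEEE 754 float32.
    return struct.unpack('<f', struct.pack('<I', i))[0]

def reinterpret_float_as_i32(f):
    # Assumed inverse helper for this sketch: float32 back to its 32 bits.
    return struct.unpack('<I', struct.pack('<f', f))[0]

def canonicalize32(f):
    # Every NaN, whatever its sign or payload, maps to the canonical NaN;
    # all other values (including infinities) pass through unchanged.
    if math.isnan(f):
        return reinterpret_i32_as_float(CANONICAL_FLOAT32_NAN)
    return f

def canonical_bits32(bits):
    # Hypothetical convenience wrapper: bits in, canonicalized bits out.
    return reinterpret_float_as_i32(canonicalize32(reinterpret_i32_as_float(bits)))
```

Working at the level of bit patterns, as `canonical_bits32` does, makes it easy to check that distinct NaN encodings collapse to one value while non-NaN inputs are left alone.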
@@ -915,6 +924,7 @@ def lift_flat_variant(opts, vi, cases): return { case_label_with_defaults(case, cases): v } def narrow_i64_to_i32(i): + assert(0 <= i < (1 << 64)) trap_if(i >= (1 << 32)) return i ``` diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 27a5dba..185f9e9 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -201,7 +201,7 @@ class Opts: def load(opts, ptr, t): assert(ptr == align_to(ptr, alignment(t))) match despecialize(t): - case Bool() : return bool(load_int(opts, ptr, 1)) + case Bool() : return narrow_uint_to_bool(load_int(opts, ptr, 1)) case U8() : return load_int(opts, ptr, 1) case U16() : return load_int(opts, ptr, 2) case U32() : return load_int(opts, ptr, 4) @@ -227,6 +227,13 @@ def load_int(opts, ptr, nbytes, signed = False): # +def narrow_uint_to_bool(i): + assert(i >= 0) + trap_if(i > 1) + return bool(i) + +# + def reinterpret_i32_as_float(i): return struct.unpack('!f', struct.pack('!I', i))[0] # f32.reinterpret_i32 @@ -656,7 +663,7 @@ def next(self, t): def lift_flat(opts, vi, t): match despecialize(t): - case Bool() : return bool(vi.next('i32')) + case Bool() : return narrow_uint_to_bool(vi.next('i32')) case U8() : return lift_flat_unsigned(vi, 32, 8) case U16() : return lift_flat_unsigned(vi, 32, 16) case U32() : return lift_flat_unsigned(vi, 32, 32) @@ -736,6 +743,7 @@ def next(self, want): return { case_label_with_defaults(case, cases): v } def narrow_i64_to_i32(i): + assert(0 <= i < (1 << 64)) trap_if(i >= (1 << 32)) return i diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index 1fc1ead..1abc185 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -145,7 +145,7 @@ def test_pairs(t, pairs): for arg,expect in pairs: test(t, [arg], expect) -test_pairs(Bool(), [(0,False),(1,True),(2,True),(4294967295,True)]) +test_pairs(Bool(), 
[(0,False),(1,True),(2,None),(4294967295,None)]) test_pairs(U8(), [(127,127),(128,128),(255,255),(256,None), (4294967295,None),(4294967168,None),(4294967167,None)]) test_pairs(S8(), [(127,127),(128,None),(255,None),(256,None), @@ -209,6 +209,7 @@ def test_heap(t, expect, args, byte_array): test_heap(List(Unit()), [{},{},{}], [0,3], []) test_heap(List(Bool()), [True,False,True], [0,3], [1,0,1]) +test_heap(List(Bool()), None, [0,3], [1,0,2]) test_heap(List(Bool()), [True,False,True], [3,3], [0xff,0xff,0xff, 1,0,1]) test_heap(List(U8()), [1,2,3], [0,3], [1,2,3]) test_heap(List(U16()), [1,2,3], [0,3], [1,0, 2,0, 3,0 ]) From 4bf614efa385ef81829efdbeab418b7ba18c8176 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 18 Apr 2022 15:44:17 -0500 Subject: [PATCH 038/301] Add + to enum/union to match the other places --- design/mvp/Explainer.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 60c4940..5e890a3 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -336,8 +336,8 @@ intertype ::= unit | bool | (list ) | (tuple *) | (flags *) - | (enum *) - | (union *) + | (enum +) + | (union +) | (option ) | (expected ) ``` From 5e438f44d318d500ca47a916c7f009ca85dd4951 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 18 Apr 2022 16:13:06 -0500 Subject: [PATCH 039/301] Pass the return values to the post-return function --- design/mvp/CanonicalABI.md | 9 ++++++++- design/mvp/Explainer.md | 9 +++++---- design/mvp/canonical-abi/definitions.py | 3 ++- design/mvp/canonical-abi/run_tests.py | 2 +- 4 files changed, 16 insertions(+), 7 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index ff45716..a217bbb 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1102,6 +1102,9 @@ For a function: validation specifies: * `$callee` must have type `flatten($ft, 'canon.lift')` * `$f` is given type `$ft` + * a `memory` is present if required 
by lifting and is a subtype of `(memory 1)` + * a `realloc` is present if required by lifting and has type `(func (param i32 i32 i32 i32) (result i32))` + * if a `post-return` is present, it has type `(func (param flatten($ft)['results']))` When instantiating component instance `$inst`: * Define `$f` to be the closure `lambda args: canon_lift($opts, $inst, $callee, $ft, args)` @@ -1151,7 +1154,8 @@ def canon_lift(callee_opts, callee_instance, callee, functype, args): [result] = lift(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), [functype.result]) def post_return(): callee_instance.may_enter = True - callee_opts.post_return() + if callee_opts.post_return is not None: + callee_opts.post_return(flat_results) return (result, post_return) ``` @@ -1197,6 +1201,9 @@ For a function: ``` where `$callee` has type `$ft`, validation specifies: * `$f` is given type `flatten($ft, 'canon.lower')` + * a `memory` is present if required by lifting and is a subtype of `(memory 1)` + * a `realloc` is present if required by lifting and has type `(func (param i32 i32 i32 i32) (result i32))` + * there is no `post-return` in `$opts` When instantiating component instance `$inst`: * Define `$f` to be the closure: `lambda args: canon_lower($opts, $inst, $callee, $ft, args)` diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 5e890a3..44105c4 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -536,10 +536,11 @@ first two parameters) and reallocate. If the Canonical ABI needs `realloc`, validation requires this option to be present (there is no default). The `(post-return )` option may only be present in `canon.lift` and -specifies a core function to be called after the return value has been fully -read, giving a chance for the runtime to deallocate memory and/or call -destructors. This option is always optional but, if present, is validated to -have the empty function signature `(func)`. 
+specifies a core function to be called with the original return values after +they have finished being read, allowing memory to be deallocated and +destructors called. This immediate is always optional but, if present, is +validated to have parameters matching the callee's return type and empty +results. Based on this description of the AST, the [Canonical ABI explainer][Canonical ABI] gives a detailed walkthrough of the static and dynamic semantics of diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 185f9e9..86d0e9f 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -890,7 +890,8 @@ def canon_lift(callee_opts, callee_instance, callee, functype, args): [result] = lift(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), [functype.result]) def post_return(): callee_instance.may_enter = True - callee_opts.post_return() + if callee_opts.post_return is not None: + callee_opts.post_return(flat_results) return (result, post_return) diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index 1abc185..fff8337 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -305,7 +305,7 @@ def test_roundtrip(t, v): callee = lambda x: x callee_heap = Heap(1000) - callee_opts = mk_opts(callee_heap.memory, 'utf8', callee_heap.realloc, lambda: ()) + callee_opts = mk_opts(callee_heap.memory, 'utf8', callee_heap.realloc, lambda x: () ) lifted_callee = lambda args: canon_lift(callee_opts, callee_instance, callee, ft, args) caller_heap = Heap(1000) From 62524b766835c5175e336529d7fc2e4eb5c811d0 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 20 Apr 2022 17:46:17 -0500 Subject: [PATCH 040/301] Fix typos Co-authored-by: Dan Gohman --- design/mvp/CanonicalABI.md | 6 +++--- design/mvp/Explainer.md | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/design/mvp/CanonicalABI.md 
b/design/mvp/CanonicalABI.md index a217bbb..db6aeb7 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -665,7 +665,7 @@ def store_record(opts, v, ptr, fields): Variants are stored using the `|`-separated list of `defaults-to` cases built by `case_label_with_default` (above) to iteratively find a matching case (which validation guarantees will succeed). While this code appears to do O(n) string -matching, a normal implemention can statically fuse `store_variant` with its +matching, a normal implementation can statically fuse `store_variant` with its matching `load_variant` to ultimately build a dense array that maps producer's case indices to the consumer's case indices. ```python @@ -1162,7 +1162,7 @@ def canon_lift(callee_opts, callee_instance, callee, functype, args): There are a number of things to note about this definition: Uncaught Core WebAssembly [exceptions] result in a trap at component -boundaries. Thus, if a component wishes to signal signal an error, it must +boundaries. Thus, if a component wishes to signal an error, it must use some sort of explicit interface type such as `expected` (whose `error` case particular language bindings may choose to map to and from exceptions). @@ -1242,7 +1242,7 @@ lifting and lowering), with a few exceptions: caller simply regains control when `canon_lower` returns, allowing it to free (or not) any memory passed as `flat_args`. * When handling the too-many-flat-values case, instead of relying on `realloc`, - the caller passs in a pointer to caller-allocated memory as a final + the caller passes in a pointer to caller-allocated memory as a final `i32` parameter. A useful consequence of the above rules for `may_enter` and `may_leave` is that diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 44105c4..351fbd0 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -519,7 +519,7 @@ default is UTF-8. 
It is a validation error to include more than one `string-encoding` option. The `(memory )` option specifies the memory that the Canonical ABI will -use to load and store values. If the Canoical ABI needs to load or store, +use to load and store values. If the Canonical ABI needs to load or store, validation requires this option to be present (there is no default). The `(realloc )` option specifies a core function that is validated to From 818e6d3ce2c7602c1f5a8b8835a6af17f7c9ea7a Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 20 Apr 2022 18:16:05 -0500 Subject: [PATCH 041/301] Add NaN canonicalization tests --- design/mvp/canonical-abi/definitions.py | 7 ++++-- design/mvp/canonical-abi/run_tests.py | 33 +++++++++++++++++++++++++ 2 files changed, 38 insertions(+), 2 deletions(-) diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 86d0e9f..a664d2c 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -240,14 +240,17 @@ def reinterpret_i32_as_float(i): def reinterpret_i64_as_float(i): return struct.unpack('!d', struct.pack('!Q', i))[0] # f64.reinterpret_i64 +CANONICAL_FLOAT32_NAN = 0x7fc00000 +CANONICAL_FLOAT64_NAN = 0x7ff8000000000000 + def canonicalize32(f): if math.isnan(f): - return reinterpret_i32_as_float(0x7fc00000) + return reinterpret_i32_as_float(CANONICAL_FLOAT32_NAN) return f def canonicalize64(f): if math.isnan(f): - return reinterpret_i64_as_float(0x7ff8000000000000) + return reinterpret_i64_as_float(CANONICAL_FLOAT64_NAN) return f # diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index fff8337..9e6bb0c 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -164,6 +164,39 @@ def test_pairs(t, pairs): test_pairs(Char(), [(0xE000,'\uE000'), (0x10FFFF,'\U0010FFFF'), (0x110000,None), (0xFFFFFFFF,None)]) test_pairs(Enum(['a','b']), [(0,{'a':{}}), (1,{'b':{}}), (2,None)]) 
+def test_nan32(inbits, outbits): + f = lift_flat(Opts(), ValueIter([Value('f32', reinterpret_i32_as_float(inbits))]), Float32()) + assert(reinterpret_float_as_i32(f) == outbits) + load_opts = Opts() + load_opts.memory = bytearray(4) + load_opts.memory = int.to_bytes(inbits, 4, 'little') + f = load(load_opts, 0, Float32()) + assert(reinterpret_float_as_i32(f) == outbits) + +def test_nan64(inbits, outbits): + f = lift_flat(Opts(), ValueIter([Value('f64', reinterpret_i64_as_float(inbits))]), Float64()) + assert(reinterpret_float_as_i64(f) == outbits) + load_opts = Opts() + load_opts.memory = bytearray(8) + load_opts.memory = int.to_bytes(inbits, 8, 'little') + f = load(load_opts, 0, Float64()) + assert(reinterpret_float_as_i64(f) == outbits) + +test_nan32(0x7fc00000, CANONICAL_FLOAT32_NAN) +test_nan32(0x7fc00001, CANONICAL_FLOAT32_NAN) +test_nan32(0x7fe00000, CANONICAL_FLOAT32_NAN) +test_nan32(0x7fffffff, CANONICAL_FLOAT32_NAN) +test_nan32(0xffffffff, CANONICAL_FLOAT32_NAN) +test_nan32(0x7f800000, 0x7f800000) +test_nan32(0x3fc00000, 0x3fc00000) +test_nan64(0x7ff8000000000000, CANONICAL_FLOAT64_NAN) +test_nan64(0x7ff8000000000001, CANONICAL_FLOAT64_NAN) +test_nan64(0x7ffc000000000000, CANONICAL_FLOAT64_NAN) +test_nan64(0x7fffffffffffffff, CANONICAL_FLOAT64_NAN) +test_nan64(0xffffffffffffffff, CANONICAL_FLOAT64_NAN) +test_nan64(0x7ff0000000000000, 0x7ff0000000000000) +test_nan64(0x3ff0000000000000, 0x3ff0000000000000) + def test_string_internal(src_encoding, dst_encoding, s, encoded, tagged_code_units): heap = Heap(len(encoded)) heap.memory[:] = encoded[:] From bf73065dc5574d88d0bb8f7447a0d12d53e2cffb Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 21 Apr 2022 09:37:42 -0500 Subject: [PATCH 042/301] Sync CanonicalABI.md with canonical-abi/definitions.py --- design/mvp/CanonicalABI.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index db6aeb7..eca2100 100644 --- 
a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -253,14 +253,17 @@ def reinterpret_i32_as_float(i): def reinterpret_i64_as_float(i): return struct.unpack('!d', struct.pack('!Q', i))[0] # f64.reinterpret_i64 +CANONICAL_FLOAT32_NAN = 0x7fc00000 +CANONICAL_FLOAT64_NAN = 0x7ff8000000000000 + def canonicalize32(f): if math.isnan(f): - return reinterpret_i32_as_float(0x7fc00000) + return reinterpret_i32_as_float(CANONICAL_FLOAT32_NAN) return f def canonicalize64(f): if math.isnan(f): - return reinterpret_i64_as_float(0x7ff8000000000000) + return reinterpret_i64_as_float(CANONICAL_FLOAT64_NAN) return f ``` @@ -826,7 +829,7 @@ class ValueIter: def lift_flat(opts, vi, t): match despecialize(t): - case Bool() : return bool(vi.next('i32')) + case Bool() : return narrow_uint_to_bool(vi.next('i32')) case U8() : return lift_flat_unsigned(vi, 32, 8) case U16() : return lift_flat_unsigned(vi, 32, 16) case U32() : return lift_flat_unsigned(vi, 32, 32) From 2e5f73ff6f604e4aed83e4e60da8349e5194c3bd Mon Sep 17 00:00:00 2001 From: Dave Bakker Date: Sun, 24 Apr 2022 20:18:51 +0200 Subject: [PATCH 043/301] Avoid potential name clash with default values. 
--- design/mvp/Binary.md | 2 +- design/mvp/Explainer.md | 6 +++--- design/mvp/Subtyping.md | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 5b60545..afbcc28 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -162,7 +162,7 @@ intertype ::= pit: => pit | 0x69 t: u: => (expected t u) field ::= n: t: => (field n t) case ::= n: t: 0x0 => (case n t) - | n: t: 0x1 i: => (case n t (defaults-to case-label[i])) + | n: t: 0x1 i: => (case n t (subtype-of case-label[i])) ``` Notes: * Reused Core binary rules: [`core:import`], [`core:importdesc`], [`core:functype`] diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 351fbd0..4636434 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -332,7 +332,7 @@ intertype ::= unit | bool | float32 | float64 | char | string | (record (field )*) - | (variant (case (defaults-to )?)+) + | (variant (case (subtype-of )?)+) | (list ) | (tuple *) | (flags *) @@ -378,9 +378,9 @@ NaN values are canonicalized to a single value so that: The subtyping between all these types is described in a separate [subtyping explainer](Subtyping.md). Of note here, though: the optional -`defaults-to` field in the `case`s of `variant`s is exclusively concerned with +`subtype-of` field in the `case`s of `variant`s is exclusively concerned with subtyping. In particular, a `variant` subtype can contain a `case` not present -in the supertype if the subtype's `case` `defaults-to` (directly or transitively) +in the supertype if the subtype's `case` `subtype-of` (directly or transitively) some `case` in the supertype. 
The sets of values allowed for the remaining *specialized* interface types are diff --git a/design/mvp/Subtyping.md b/design/mvp/Subtyping.md index 36c6277..461cb67 100644 --- a/design/mvp/Subtyping.md +++ b/design/mvp/Subtyping.md @@ -12,7 +12,7 @@ But roughly speaking: | `float32`, `float64` | `float32 <: float64` | | `char` | | | `record` | fields can be reordered; covariant field payload subtyping; superfluous fields can be ignored in the subtype; `option` fields can be ignored in the supertype | -| `variant` | cases can be reordered; covariant case payload subtyping; superfluous cases can be ignored in the supertype; `defaults-to` cases can be ignored in the subtype | +| `variant` | cases can be reordered; covariant case payload subtyping; superfluous cases can be ignored in the supertype; `subtype-of` cases can be ignored in the subtype | | `list` | covariant element subtyping | | `tuple` | `(tuple T ...) <: T` | | `option` | `T <: (option T)` | From d4a0c42eaf29f1e832ea65305fcccc889a3ee7e0 Mon Sep 17 00:00:00 2001 From: Dave Bakker Date: Mon, 23 May 2022 22:40:59 +0200 Subject: [PATCH 044/301] Avoid potential name clash with default values. 
Rename `defaults-to` & `subtype-of` to: `refines` --- design/mvp/Binary.md | 2 +- design/mvp/CanonicalABI.md | 18 +++++++++--------- design/mvp/Explainer.md | 6 +++--- design/mvp/Subtyping.md | 2 +- design/mvp/canonical-abi/definitions.py | 12 ++++++------ 5 files changed, 20 insertions(+), 20 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index afbcc28..4db0707 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -162,7 +162,7 @@ intertype ::= pit: => pit | 0x69 t: u: => (expected t u) field ::= n: t: => (field n t) case ::= n: t: 0x0 => (case n t) - | n: t: 0x1 i: => (case n t (subtype-of case-label[i])) + | n: t: 0x1 i: => (case n t (refines case-label[i])) ``` Notes: * Reused Core binary rules: [`core:import`], [`core:importdesc`], [`core:functype`] diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index eca2100..96c4592 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -347,8 +347,8 @@ guaranteed to be a no-op on the first iteration because the record as a whole starts out aligned (as asserted at the top of `load`). Variants are loaded using the order of the cases in the type to determine the -case index. To support the subtyping allowed by `defaults-to`, a lifted variant -value semantically includes a full ordered list of its `defaults-to` case +case index. To support the subtyping allowed by `refines`, a lifted variant +value semantically includes a full ordered list of its `refines` case labels so that the lowering code (defined below) can search this list to find a case label it knows about. 
While the code below appears to perform case-label lookup at runtime, a normal implementation can build the appropriate index @@ -362,12 +362,12 @@ def load_variant(opts, ptr, cases): trap_if(disc >= len(cases)) case = cases[disc] ptr = align_to(ptr, max_alignment(types_of(cases))) - return { case_label_with_defaults(case, cases): load(opts, ptr, case.t) } + return { case_label_with_refinements(case, cases): load(opts, ptr, case.t) } -def case_label_with_defaults(case, cases): +def case_label_with_refinements(case, cases): label = case.label - while case.defaults_to is not None: - case = cases[find_case(case.defaults_to, cases)] + while case.refines is not None: + case = cases[find_case(case.refines, cases)] label += '|' + case.label return label @@ -665,8 +665,8 @@ def store_record(opts, v, ptr, fields): ptr += size(f.t) ``` -Variants are stored using the `|`-separated list of `defaults-to` cases built -by `case_label_with_default` (above) to iteratively find a matching case (which +Variants are stored using the `|`-separated list of `refines` cases built +by `case_label_with_refinements` (above) to iteratively find a matching case (which validation guarantees will succeed). 
While this code appears to do O(n) string matching, a normal implementation can statically fuse `store_variant` with its matching `load_variant` to ultimately build a dense array that maps producer's @@ -924,7 +924,7 @@ def lift_flat_variant(opts, vi, cases): v = lift_flat(opts, CoerceValueIter(), case.t) for have in flat_types: _ = vi.next(have) - return { case_label_with_defaults(case, cases): v } + return { case_label_with_refinements(case, cases): v } def narrow_i64_to_i32(i): assert(0 <= i < (1 << 64)) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 4636434..85d418a 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -332,7 +332,7 @@ intertype ::= unit | bool | float32 | float64 | char | string | (record (field )*) - | (variant (case (subtype-of )?)+) + | (variant (case (refines )?)+) | (list ) | (tuple *) | (flags *) @@ -378,9 +378,9 @@ NaN values are canonicalized to a single value so that: The subtyping between all these types is described in a separate [subtyping explainer](Subtyping.md). Of note here, though: the optional -`subtype-of` field in the `case`s of `variant`s is exclusively concerned with +`refines` field in the `case`s of `variant`s is exclusively concerned with subtyping. In particular, a `variant` subtype can contain a `case` not present -in the supertype if the subtype's `case` `subtype-of` (directly or transitively) +in the supertype if the subtype's `case` `refines` (directly or transitively) some `case` in the supertype. 
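To make the `refines` rule concrete, here is a minimal, self-contained sketch of how a `|`-separated label chain can be built for a subtype's case and then resolved against a supertype that only knows some of the labels. It mirrors `case_label_with_refinements` from the patch. The simplified `Case` class (payload type omitted), the body of `find_case`, the helper `match_known_case`, and the example case names are assumptions made for illustration, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    label: str
    refines: Optional[str] = None  # label of the case this one refines, if any


def find_case(label, cases):
    """Return the index of the case named `label`, or -1 if it is absent."""
    for i, case in enumerate(cases):
        if case.label == label:
            return i
    return -1


def case_label_with_refinements(case, cases):
    """Join a case's label with the labels of every case it transitively refines."""
    label = case.label
    while case.refines is not None:
        case = cases[find_case(case.refines, cases)]
        label += '|' + case.label
    return label


def match_known_case(joined_label, known_cases):
    """Hypothetical helper: index of the first label in the chain the consumer knows."""
    for label in joined_label.split('|'):
        i = find_case(label, known_cases)
        if i != -1:
            return i
    raise ValueError('no label in the chain is known to the consumer')


# Producer-side (subtype) variant: two cases refine `error`, one transitively.
producer_cases = [
    Case('ok'),
    Case('error'),
    Case('not-found', refines='error'),
    Case('not-found-in-cache', refines='not-found'),
]

# Consumer-side (supertype) variant: only knows `ok` and `error`.
consumer_cases = [Case('ok'), Case('error')]

chain = case_label_with_refinements(producer_cases[3], producer_cases)
chosen = match_known_case(chain, consumer_cases)
```

In this sketch the consumer falls back through the chain until it reaches `error`, which is the behaviour the paragraph above describes for a subtype case that is not present in the supertype.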
The sets of values allowed for the remaining *specialized* interface types are diff --git a/design/mvp/Subtyping.md b/design/mvp/Subtyping.md index 461cb67..608dc08 100644 --- a/design/mvp/Subtyping.md +++ b/design/mvp/Subtyping.md @@ -12,7 +12,7 @@ But roughly speaking: | `float32`, `float64` | `float32 <: float64` | | `char` | | | `record` | fields can be reordered; covariant field payload subtyping; superfluous fields can be ignored in the subtype; `option` fields can be ignored in the supertype | -| `variant` | cases can be reordered; covariant case payload subtyping; superfluous cases can be ignored in the supertype; `subtype-of` cases can be ignored in the subtype | +| `variant` | cases can be reordered; covariant case payload subtyping; superfluous cases can be ignored in the supertype; `refines` cases can be ignored in the subtype | | `list` | covariant element subtyping | | `tuple` | `(tuple T ...) <: T` | | `option` | `T <: (option T)` | diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index a664d2c..949ae02 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -60,7 +60,7 @@ class Flags(InterfaceType): class Case: label: str t: InterfaceType - defaults_to: str = None + refines: str = None @dataclass class Variant(InterfaceType): @@ -325,12 +325,12 @@ def load_variant(opts, ptr, cases): trap_if(disc >= len(cases)) case = cases[disc] ptr = align_to(ptr, max_alignment(types_of(cases))) - return { case_label_with_defaults(case, cases): load(opts, ptr, case.t) } + return { case_label_with_refinements(case, cases): load(opts, ptr, case.t) } -def case_label_with_defaults(case, cases): +def case_label_with_refinements(case, cases): label = case.label - while case.defaults_to is not None: - case = cases[find_case(case.defaults_to, cases)] + while case.refines is not None: + case = cases[find_case(case.refines, cases)] label += '|' + case.label return label @@ -743,7 
+743,7 @@ def next(self, want): v = lift_flat(opts, CoerceValueIter(), case.t) for have in flat_types: _ = vi.next(have) - return { case_label_with_defaults(case, cases): v } + return { case_label_with_refinements(case, cases): v } def narrow_i64_to_i32(i): assert(0 <= i < (1 << 64)) From 85873a605d893da16cef481c4f6b3295a219482f Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 18 Apr 2022 15:39:46 -0500 Subject: [PATCH 045/301] Split core definitions out into separate index spaces --- design/mvp/Binary.md | 344 ++++--- design/mvp/CanonicalABI.md | 161 ++-- design/mvp/Explainer.md | 868 ++++++++++-------- design/mvp/FutureFeatures.md | 25 +- design/mvp/Subtyping.md | 6 +- design/mvp/canonical-abi/definitions.py | 76 +- design/mvp/canonical-abi/run_tests.py | 4 +- .../SharedEverythingDynamicLinking.md | 20 +- 8 files changed, 819 insertions(+), 685 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 4db0707..a37f4c5 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -3,7 +3,7 @@ This document defines the binary format for the AST defined in the [explainer](Explainer.md). The top-level production is `component` and the convention is that a file suffixed in `.wasm` may contain either a -[`core:module`] *or* a `component`, using the `kind` field to discriminate +[`core:module`] *or* a `component`, using the `layer` field to discriminate between the two in the first 8 bytes (see [below](#component-definitions) for more details). @@ -17,197 +17,234 @@ and validation will be present in the [formal specification](../../spec/). (See [Component Definitions](Explainer.md#component-definitions) in the explainer.) ``` -component ::= s*:
* => (component flatten(s*)) -preamble ::= +component ::= s*:
* => (component flatten(s*)) +preamble ::= magic ::= 0x00 0x61 0x73 0x6D version ::= 0x0a 0x00 -kind ::= 0x01 0x00 -section ::= section_0() => ϵ - | t*:section_1(vec()) => t* - | i*:section_2(vec()) => i* - | f*:section_3(vec()) => f* - | m: section_4() => m - | c: section_5() => c - | i*:section_6(vec()) => i* - | e*:section_7(vec()) => e* - | s: section_8() => s - | a*:section_9(vec()) => a* +layer ::= 0x01 0x00 +section ::= section_0() => ϵ + | m*:section_1() => [core-prefix(m)] + | i*:section_2(vec()) => core-prefix(i)* + | a*:section_3(vec()) => core-prefix(a)* + | t*:section_4(vec()) => core-prefix(t)* + | c: section_5() => [c] + | i*:section_6(vec()) => i* + | a*:section_7(vec()) => a* + | t*:section_8(vec()) => t* + | c*:section_9(vec()) => c* + | s: section_10() => [s] + | i*:section_11(vec()) => i* + | e*:section_12(vec()) => e* ``` Notes: * Reused Core binary rules: [`core:section`], [`core:custom`], [`core:module`] +* The `core-prefix(t)` meta-function inserts a `core` token after the leftmost + paren of `t` (e.g., `core-prefix( (module (func)) )` is `(core module (func))`). * The `version` given above is pre-standard. As the proposal changes before final standardization, `version` will be bumped from `0xa` upwards to coordinate prototypes. When the standard is finalized, `version` will be changed one last time to `0x1`. (This mirrors the path taken for the Core WebAssembly 1.0 spec.) -* The `kind` field is meant to distinguish modules from components early in the - binary format. (Core WebAssembly modules already implicitly have a `kind` - field of `0x0` in their 4 byte [`core:version`] field.) +* The `layer` field is meant to distinguish modules from components early in + the binary format. (Core WebAssembly modules already implicitly have a + `layer` field of `0x0` in their 4 byte [`core:version`] field.) ## Instance Definitions (See [Instance Definitions](Explainer.md#instance-definitions) in the explainer.) 
``` -instance ::= ie: => (instance ie) -instanceexpr ::= 0x00 0x00 m: a*:vec() => (instantiate (module m) (with a)*) - | 0x00 0x01 c: a*:vec() => (instantiate (component c) (with a)*) - | 0x01 e*:vec() => e* - | 0x02 e*:vec() => e* -modulearg ::= n: 0x02 i: => n (instance i) -componentarg ::= n: 0x00 m: => n (module m) - | n: 0x01 c: => n (component c) - | n: 0x02 i: => n (instance i) - | n: 0x03 f: => n (func f) - | n: 0x04 v: => n (value v) - | n: 0x05 t: => n (type t) -export ::= a: => (export a) -name ::= n: => n +core:instance ::= ie: => (instance ie) +core:instanceexpr ::= 0x00 m: arg*:vec() => (instantiate m arg*) + | 0x01 e*:vec() => e* +core:instantiatearg ::= n: si: => (with n si) +core:sortidx ::= sort: idx: => (sort idx) +core:sort ::= 0x00 => func + | 0x01 => table + | 0x02 => memory + | 0x03 => global + | 0x04 => type + | 0x10 => module + | 0x11 => instance +core:export ::= n: si: => (export n si) + +instance ::= ie: => (instance ie) +instanceexpr ::= 0x00 c: arg*:vec() => (instantiate c arg*) + | 0x01 e*:vec() => e* +instantiatearg ::= n: si: => (with n si) +sortidx ::= sort: idx: => (sort idx) +sort ::= 0x00 si: => si + | 0x01 => func + | 0x02 => value + | 0x03 => type + | 0x04 => component + | 0x05 => instance +export ::= n: si: => (export n si) ``` Notes: -* Reused Core binary rules: [`core:export`], [`core:name`] -* The indices in `modulearg`/`componentarg` are validated according to their - respective index space, which are built incrementally as each definition is - validated. In general, unlike core modules, which supports cyclic references - between (function) definitions, component definitions are strictly acyclic - and validated in a linear incremental manner, like core wasm instructions. -* The arguments supplied by `instantiate` are validated against the consuming - module/component according to the [subtyping](Subtyping.md) rules. 
- +* Reused Core binary rules: [`core:name`] +* The `core:sort` values are chosen to match the discriminant opcodes of + [`core:importdesc`] so that `core:exportdesc` (below) is identical. +* `type` is added to `core:sort` in anticipation of the [type-imports] proposal. Until that + proposal, core modules won't be able to actually import or export types, however, the + `type` sort is allowed as part of outer aliases (below). +* `module` and `instance` are added to `core:sort` in anticipation of the [module-linking] + proposal, which would add these types to Core WebAssembly. Again, core modules won't be + able to actually import or export modules/instances, but they are used for aliases. +* The indices in `sortidx` are validated according to their `sort`'s index + spaces, which are built incrementally as each definition is validated. +* The types of arguments supplied by `instantiate` are validated against the + types of the matching import according to the [subtyping](Subtyping.md) rules. ## Alias Definitions (See [Alias Definitions](Explainer.md#alias-definitions) in the explainer.) ``` -alias ::= 0x00 0x00 i: n: => (alias export i n (module)) - | 0x00 0x01 i: n: => (alias export i n (component)) - | 0x00 0x02 i: n: => (alias export i n (instance)) - | 0x00 0x03 i: n: => (alias export i n (func)) - | 0x00 0x04 i: n: => (alias export i n (value)) - | 0x01 0x00 i: n: => (alias export i n (func)) - | 0x01 0x01 i: n: => (alias export i n (table)) - | 0x01 0x02 i: n: => (alias export i n (memory)) - | 0x01 0x03 i: n: => (alias export i n (global)) - | ... 
other Post-MVP Core definition kinds - | 0x02 0x00 ct: i: => (alias outer ct i (module)) - | 0x02 0x01 ct: i: => (alias outer ct i (component)) - | 0x02 0x05 ct: i: => (alias outer ct i (type)) +core:alias ::= sort: target: => (core alias target (sort)) +core:aliastarget ::= 0x00 i: n: => export i n + | 0x01 ct: idx: => outer ct idx + +alias ::= sort: target: => (alias target (sort)) +aliastarget ::= 0x00 i: n: => export i n + | 0x01 ct: idx: => outer ct idx ``` Notes: -* For instance-export aliases (opcodes `0x00` and `0x01`), `i` is validated to - refer to an instance in the instance index space that exports `n` with the - specified definition kind. -* For outer aliases (opcode `0x02`), `ct` is validated to be *less or equal - than* the number of enclosing components and `i` is validated to be a valid - index in the specified definition's index space of the enclosing component - indicated by `ct` (counting outward, starting with `0` referring to the - current component). +* For `export` aliases, `i` is validated to refer to an instance in the + instance index space that exports `n` with the specified `sort`. +* For `outer` aliases, `ct` is validated to be *less or equal than* the number + of enclosing components and `i` is validated to be a valid + index in the `sort` index space of the `i`th enclosing component (counting + outward, starting with `0` referring to the current component). +* For `outer` aliases, validation restricts the `sort` of the `aliastarget` + to one of `type`, `module` or `component`. ## Type Definitions (See [Type Definitions](Explainer.md#type-definitions) in the explainer.) ``` -type ::= dt: => dt - | it: => it -deftype ::= mt: => mt - | ct: => ct - | it: => it - | ft: => ft - | vt: => vt -moduletype ::= 0x4f mtd*:vec() => (module mtd*) -moduletype-def ::= 0x01 dt: => dt - | 0x02 i: => i - | 0x07 n: d: => (export n d) -core:deftype ::= ft: => ft - | ... Post-MVP additions => ... 
-componenttype ::= 0x4e ctd*:vec() => (component ctd*) -instancetype ::= 0x4d itd*:vec() => (instance itd*) -componenttype-def ::= itd: => itd - | 0x02 i: => i -instancetype-def ::= 0x01 t: => t - | 0x07 n: dt: => (export n dt) - | 0x09 a: => a -import ::= n: dt: => (import n dt) -deftypeuse ::= i: => type-index-space[i] (must be ) -functype ::= 0x4c param*:vec() t: => (func param* (result t)) -param ::= 0x00 t: => (param t) - | 0x01 n: t: => (param n t) -valuetype ::= 0x4b t: => (value t) -intertypeuse ::= i: => type-index-space[i] (must be ) - | pit: => pit -primintertype ::= 0x7f => unit - | 0x7e => bool - | 0x7d => s8 - | 0x7c => u8 - | 0x7b => s16 - | 0x7a => u16 - | 0x79 => s32 - | 0x78 => u32 - | 0x77 => s64 - | 0x76 => u64 - | 0x75 => float32 - | 0x74 => float64 - | 0x73 => char - | 0x72 => string -intertype ::= pit: => pit - | 0x71 field*:vec() => (record field*) - | 0x70 case*:vec() => (variant case*) - | 0x6f t: => (list t) - | 0x6e t*:vec() => (tuple t*) - | 0x6d n*:vec() => (flags n*) - | 0x6c n*:vec() => (enum n*) - | 0x6b t*:vec() => (union t*) - | 0x6a t: => (option t) - | 0x69 t: u: => (expected t u) -field ::= n: t: => (field n t) -case ::= n: t: 0x0 => (case n t) - | n: t: 0x1 i: => (case n t (refines case-label[i])) +core:type ::= dt: => (type dt) (GC proposal) +core:deftype ::= ft: => ft (WebAssembly 1.0) + | st: => st (GC proposal) + | at: => at (GC proposal) + | mt: => mt +core:moduletype ::= 0x50 md*:vec() => (module md*) +core:moduledecl ::= 0x00 i: => i + | 0x01 t: => t + | 0x02 a: => a + | 0x03 e: => e +core:import ::= m: f: ed: => (import m f ed) (WebAssembly 1.0) +core:externdesc ::= id: => id (WebAssembly 1.0) +core:exportdecl ::= n: ed: => (export n ed) +``` +Notes: +* Reused Core binary rules: [`core:importdesc`], [`core:functype`] +* `core:import` as written above is binary-compatible with [`core:import`]. 
+* Validation of `core:moduledecl` (currently) rejects `core:moduletype` definitions + inside `type` declarators (i.e., nested core module types). +* Validation of `core:moduledecl` (currently) only allows `outer` `type` + `alias` declarators. +* As described in the explainer, each module type is validated with an + initially-empty type index space. Outer aliases can be used to pull + in type definitions from containing components. + +``` +type ::= dt: => (type dt) +deftype ::= vt: => vt + | ft: => ft + | ct: => ct + | it: => it +functype ::= 0x40 param*:vec() t: => (func param* (result t)) +param ::= 0x00 t: => (param t) + | 0x01 n: t: => (param n t) +componenttype ::= 0x41 cd*:vec() => (component cd*) +instancetype ::= 0x42 id*:vec() => (instance id*) +componentdecl ::= 0x00 id: => id + | id: => id +instancedecl ::= 0x01 t: => t + | 0x02 a: => a + | 0x03 ed: => ed +importdecl ::= n: ed: => (import n ed) +exportdecl ::= n: ed: => (export n ed) +externdesc ::= 0x00 i: => core-type-index-space[i] (must be moduletype) + | 0x01 i: => type-index-space[i] (must be func|instance|componenttype) + | 0x02 t: => (value t) + | 0x03 tb: => (type tb) +typebound ::= 0x00 i: => (eq type-index-space[i]) (any deftype) + | 0x00 t: => (eq t) +valtype ::= i: => type-index-space[i] (must be valtype) + | 0x7f => unit + | 0x7e => bool + | 0x7d => s8 + | 0x7c => u8 + | 0x7b => s16 + | 0x7a => u16 + | 0x79 => s32 + | 0x78 => u32 + | 0x77 => s64 + | 0x76 => u64 + | 0x75 => float32 + | 0x74 => float64 + | 0x73 => char + | 0x72 => string + | 0x71 field*:vec() => (record field*) + | 0x70 case*:vec() => (variant case*) + | 0x6f t: => (list t) + | 0x6e t*:vec() => (tuple t*) + | 0x6d n*:vec() => (flags n*) + | 0x6c n*:vec() => (enum n*) + | 0x6b t*:vec() => (union t*) + | 0x6a t: => (option t) + | 0x69 t: u: => (expected t u) +field ::= n: t: => (field n t) +case ::= n: t: 0x0 => (case n t) + | n: t: 0x1 i: => (case n t (refines case-label[i])) ``` Notes: -* Reused Core binary rules: 
[`core:import`], [`core:importdesc`], [`core:functype`] * The type opcodes follow the same negative-SLEB128 scheme as Core WebAssembly, with type opcodes starting at SLEB128(-1) (`0x7f`) and going down, reserving the nonnegative SLEB128s for type indices. -* The (`module`|`component`|`instance`)`type-def` opcodes match the corresponding - section numbers. -* Module, component and instance types create fresh type index spaces that are - populated and referenced by their contained definitions. E.g., for a module - type that imports a function, the `import` `moduletype-def` must be preceded - by either a `type` or `alias` `moduletype-def` that adds the function type to - the type index space. -* Currently, the only allowed form of `alias` in instance and module types - is `(alias outer ct li (type))`. In the future, other kinds of aliases - will be needed and this restriction will be relaxed. +* Validation of `moduledecl` (currently) only allows `outer` `type` `alias` + declarators. +* As described in the explainer, each component and instance type is validated + with an initially-empty type index space. Outer aliases can be used to pull + in type definitions from containing components. +* The rule for `typebound` contains both an unrestricted `` case and, + within `valtype`, a `valtype`-restricted `` case. Since the former + is a strict generalization of the latter, there is no ambiguity. The net + effect is that `eq` accepts all types. -## Function Definitions +## Canonical Definitions -(See [Function Definitions](Explainer.md#function-definitions) in the explainer.) +(See [Canonical Definitions](Explainer.md#canonical-definitions) in the explainer.) 
``` -func ::= body: => (func body) -funcbody ::= 0x00 ft: opt*:vec() f: => (canon.lift ft opt* f) - | 0x01 opt*:* f: => (canon.lower opt* f) -canonopt ::= 0x00 => string-encoding=utf8 - | 0x01 => string-encoding=utf16 - | 0x02 => string-encoding=latin1+utf16 - | 0x03 m: => (memory m) - | 0x04 f: => (realloc f) - | 0x05 f: => (post-return f) +canon ::= 0x00 0x00 f: ft: opts: => (canon lift f type-index-space[ft] opts (func)) + | 0x01 0x00 f: opts: => (canon lower f opts (core func)) +opts ::= opt*:vec() => opt* +canonopt ::= 0x00 => string-encoding=utf8 + | 0x01 => string-encoding=utf16 + | 0x02 => string-encoding=latin1+utf16 + | 0x03 m: => (memory m) + | 0x04 f: => (realloc f) + | 0x05 f: => (post-return f) ``` Notes: -* Validation prevents duplicate or conflicting options. -* Validation of `canon.lift` requires `f` to have type `flatten(ft)` (defined +* The second `0x00` byte in `canon` stands for the `func` sort and thus the + `0x00 ` pair stands for a `func` `sortidx` or `core:sortidx`. +* Validation prevents duplicate or conflicting `canonopt`. +* Validation of `canon lift` requires `f` to have type `flatten(ft)` (defined by the [Canonical ABI](CanonicalABI.md#flattening)). The function being defined is given type `ft`. -* Validation of `canon.lower` requires `f` to be a component function. The +* Validation of `canon lower` requires `f` to be a component function. The function being defined is given core function type `flatten(ft)` where `ft` is the `functype` of `f`. -* If the lifting/lowering operations implied by `canon.lift` or `canon.lower` - require access to `memory` or `realloc`, then validation requires these - options to be present. If present, `realloc` must have type +* If the lifting/lowering operations implied by `lift` or `lower` require + access to `memory` or `realloc`, then validation requires these options to be + present. If present, `realloc` must have core type `(func (param i32 i32 i32 i32) (result i32))`.
-* `post-return` is always optional, but, if present, must have type `(func)`. +* `post-return` is always optional, but, if present, must have core type + `(func)`. ## Start Definitions @@ -233,24 +270,25 @@ flags are set. ## Import and Export Definitions -(See [Import and Export Definitions](Explainer.md#import-and-export-definitions) in the explainer.) - -As described in the explainer, the binary decode rules of `import` and `export` -have already been defined above. - +(See [Import and Export Definitions](Explainer.md#import-and-export-definitions) +in the explainer.) +``` +import ::= n: ed: => (import n ed) +export ::= n: si: => (export n si) +``` Notes: * Validation requires all import and export `name`s are unique. -[`core:version`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-version [`core:section`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-section [`core:custom`]: https://webassembly.github.io/spec/core/binary/modules.html#custom-section [`core:module`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-module -[`core:export`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-export +[`core:version`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-version [`core:name`]: https://webassembly.github.io/spec/core/binary/values.html#binary-name [`core:import`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-import [`core:importdesc`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-importdesc [`core:functype`]: https://webassembly.github.io/spec/core/binary/types.html#binary-functype -[Future Core Type]: https://github.com/WebAssembly/gc/blob/master/proposals/gc/MVP.md#type-definitions +[type-imports]: https://github.com/WebAssembly/proposal-type-imports/blob/master/proposals/type-imports/Overview.md +[module-linking]: https://github.com/WebAssembly/module-linking/blob/main/proposals/module-linking/Explainer.md diff 
--git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 96c4592..02173fc 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1,7 +1,7 @@ # Canonical ABI Explainer -This explainer walks through the Canonical ABI used by [function definitions] -to convert between high-level interface-typed values and low-level Core +This explainer walks through the Canonical ABI used by [canonical definitions] +to convert between high-level Component Model values and low-level Core WebAssembly values. * [Supporting definitions](#supporting-definitions) @@ -14,16 +14,16 @@ WebAssembly values. * [Flat Lifting](#flat-lifting) * [Flat Lowering](#flat-lowering) * [Lifting and Lowering](#lifting-and-lowering) -* [Canonical ABI built-ins](#canonical-abi-built-ins) - * [`canon.lift`](#canonlift) - * [`canon.lower`](#canonlower) +* [Canonical definitions](#canonical-definitions) + * [`lift`](#lift) + * [`lower`](#lower) ## Supporting definitions -The Canonical ABI specifies, for each interface-typed function signature, a +The Canonical ABI specifies, for each component function signature, a corresponding core function signature and the process for reading -interface-typed values into and out of linear memory. While a full formal +component-level values into and out of linear memory. While a full formal specification would specify the Canonical ABI in terms of macro-expansion into Core WebAssembly instructions augmented with a new set of (spec-internal) [administrative instructions], the informal presentation here instead specifies @@ -52,19 +52,19 @@ necessary to support recovery in the middle of nested allocations. In the MVP, for large allocations that can OOM, [streams](Explainer.md#TODO) would usually be the appropriate type to use and streams will be able to explicitly express failure in their type. 
Post-MVP, [adapter functions] would allow fully custom -OOM handling for all interface types, allowing a toolchain to intentionally -propagate OOM into the appropriate explicit return value of the function's -declared return type. +OOM handling for all component-level types, allowing a toolchain to +intentionally propagate OOM into the appropriate explicit return value of the +function's declared return type. ### Despecialization -[In the explainer][Type Definitions], interface types are classified as either *fundamental* or -*specialized*, where the specialized interface types are defined by expansion -into fundamental interface types. In most cases, the canonical ABI of a -specialized interface type is the same as its expansion so, to avoid +[In the explainer][Type Definitions], component value types are classified as +either *fundamental* or *specialized*, where the specialized value types are +defined by expansion into fundamental value types. In most cases, the canonical +ABI of a specialized value type is the same as its expansion so, to avoid repetition, the other definitions below use the following `despecialize` -function to replace specialized interface types with their expansion: +function to replace specialized value types with their expansion: ```python def despecialize(t): match t: @@ -76,14 +76,14 @@ def despecialize(t): case Expected(ok, error) : return Variant([ Case("ok", ok), Case("error", error) ]) case _ : return t ``` -The specialized interface types `string` and `flags` are missing from this list +The specialized value types `string` and `flags` are missing from this list because they are given specialized canonical ABI representations distinct from their respective expansions. ### Alignment -Each interface type is assigned an [alignment] which is used by subsequent +Each value type is assigned an [alignment] which is used by subsequent Canonical ABI definitions. 
Presenting the definition of `alignment` piecewise, we start with the top-level case analysis: ```python @@ -141,8 +141,8 @@ def alignment_flags(labels): ### Size -Each interface type is also assigned a `size`, measured in bytes, which -corresponds the `sizeof` operator in C: +Each value type is also assigned a `size`, measured in bytes, which corresponds +the `sizeof` operator in C: ```python def size(t): match despecialize(t): @@ -191,10 +191,10 @@ def num_i32_flags(labels): ### Loading -The `load` function defines how to read a value of a given interface type `t` -out of linear memory starting at offset `ptr`, returning a interface-typed -value (here, as a Python value). The `Opts`/`opts` class/parameter contains the -[`canonopt`] immediates supplied as part of `canon.lift`/`canon.lower`. +The `load` function defines how to read a value of a given value type `t` +out of linear memory starting at offset `ptr`, returning the value represented +as a Python value. The `Opts`/`opts` class/parameter contains the +[`canonopt`] immediates supplied as part of `canon lift`/`canon lower`. Presenting the definition of `load` piecewise, we start with the top-level case analysis: ```python @@ -280,10 +280,10 @@ def i32_to_char(opts, i): Strings are loaded from two `i32` values: a pointer (offset in linear memory) and a number of bytes. There are three supported string encodings in [`canonopt`]: [UTF-8], [UTF-16] and `latin1+utf16`. This last options allows a *dynamic* -choice between [Latin-1] and UTF-16, indicated by the high bit of the second `i32`. -String interface values include their original encoding and byte length as a +choice between [Latin-1] and UTF-16, indicated by the high bit of the second +`i32`. String values include their original encoding and byte length as a "hint" that enables `store_string` (defined below) to make better up-front -allocation size choices in many cases. Thus, the interface value produced by +allocation size choices in many cases. 
Thus, the value produced by `load_string` isn't simply a Python `str`, but a *tuple* containing a `str`, the original encoding and the original byte length. ```python @@ -398,7 +398,7 @@ def unpack_flags_from_int(i, labels): ### Storing -The `store` function defines how to write a value `v` of a given interface type +The `store` function defines how to write a value `v` of a given value type `t` into linear memory starting at offset `ptr`. Presenting the definition of `store` piecewise, we start with the top-level case analysis: ```python @@ -465,9 +465,9 @@ not to do. To avoid multiple passes, the canonical ABI instead uses a `realloc` approach to update the allocation size during the single copy. A blind `realloc` approach would normally suffer from multiple reallocations per string (e.g., using the standard doubling-growth strategy). However, as already shown -in `load_string` above, interface-typed strings come with two useful hints: -their original encoding and byte length. From this hint data, `store_string` can -do a much better job minimizing the number of reallocations. +in `load_string` above, string values come with two useful hints: their +original encoding and byte length. From this hint data, `store_string` can do a +much better job minimizing the number of reallocations. We start with a case analysis to enumerate all the meaningful encoding combinations, subdividing the `latin1+utf16` encoding into either `latin1` or @@ -716,9 +716,9 @@ With only the definitions above, the Canonical ABI would be forced to place all parameters and results in linear memory. While this is necessary in the general case, in many cases performance can be improved by passing small-enough values in registers by using core function parameters and results. 
To support this -optimization, the Canonical ABI defines `flatten` to map interface function +optimization, the Canonical ABI defines `flatten` to map component function types to core function types by attempting to decompose all the -non-dynamically-sized interface types into core parameters and results. +non-dynamically-sized component value types into core value types. For a variety of [practical][Implementation Limits] reasons, we need to limit the total number of flattened parameters and results, falling back to storing @@ -731,8 +731,8 @@ When there are too many flat values, in general, a single `i32` pointer can be passed instead (pointing to a tuple in linear memory). When lowering *into* linear memory, this requires the Canonical ABI to call `realloc` (in `lower` below) to allocate space to put the tuple. As an optimization, when lowering -the return value of an imported function (lowered by `canon.lower`), the caller -can have already allocated space for the return value (e.g., efficiently on the +the return value of an imported function (via `canon lower`), the caller can +have already allocated space for the return value (e.g., efficiently on the stack), passing in an `i32` pointer as an parameter instead of returning an `i32` as a return value. @@ -749,9 +749,9 @@ def flatten(functype, context): flat_results = flatten_type(functype.result) if len(flat_results) > MAX_FLAT_RESULTS: match context: - case 'canon.lift': + case 'lift': flat_results = ['i32'] - case 'canon.lower': + case 'lower': flat_params += ['i32'] flat_results = [] @@ -807,10 +807,10 @@ def join(a, b): ### Flat Lifting The `lift_flat` function defines how to convert zero or more core values into a -single high-level value of interface type `t`. The values are given by a value -iterator that iterates over a complete parameter or result list and asserts -that the expected and actual types line up. 
Presenting the definition of -`lift_flat` piecewise, we start with the top-level case analysis: +single high-level value of type `t`. The values are given by a value iterator +that iterates over a complete parameter or result list and asserts that the +expected and actual types line up. Presenting the definition of `lift_flat` +piecewise, we start with the top-level case analysis: ```python @dataclass class Value: @@ -849,10 +849,10 @@ def lift_flat(opts, vi, t): ``` Integers are lifted from core `i32` or `i64` values using the signedness of the -interface type to interpret the high-order bit. When the interface type is -narrower than an `i32`, the Canonical ABI specifies a dynamic range check in -order to catch bugs. The conversion logic here assumes that `i32` values are -always represented as unsigned Python `int`s and thus lifting to a signed type +target type to interpret the high-order bit. When the target type is narrower +than an `i32`, the Canonical ABI specifies a dynamic range check in order to +catch bugs. The conversion logic here assumes that `i32` values are always +represented as unsigned Python `int`s and thus lifting to a signed type performs a manual 2s complement conversion in the Python (which would be a no-op in hardware). ```python @@ -948,9 +948,9 @@ def lift_flat_flags(vi, labels): ### Flat Lowering -The `lower_flat` function defines how to convert a value `v` of a given -interface type `t` into zero or more core values. Presenting the definition of -`lower_flat` piecewise, we start with the top-level case analysis: +The `lower_flat` function defines how to convert a value `v` of a given type +`t` into zero or more core values. 
Presenting the definition of `lower_flat` +piecewise, we start with the top-level case analysis: ```python def lower_flat(opts, v, t): match despecialize(t): @@ -973,9 +973,9 @@ def lower_flat(opts, v, t): case Flags(labels) : return lower_flat_flags(v, labels) ``` -Since interface-typed values are assumed to in-range and, as previously stated, +Since component-level values are assumed in-range and, as previously stated, core `i32` values are always internally represented as unsigned `int`s, -unsigned interface values need no extra conversion. Signed interface values are +unsigned integer values need no extra conversion. Signed integer values are converted to unsigned core `i32`s by 2s complement arithmetic (which again would be a no-op in hardware): ```python @@ -1044,8 +1044,8 @@ def lower_flat_flags(v, labels): ### Lifting and Lowering The `lift` function defines how to lift a list of at most `max_flat` core -parameters or results given by the `ValueIter` `vi` into a tuple of interface -values with types `ts`: +parameters or results given by the `ValueIter` `vi` into a tuple of values with +types `ts`: ```python def lift(opts, max_flat, vi, ts): flat_types = flatten_types(ts) @@ -1058,9 +1058,9 @@ def lift(opts, max_flat, vi, ts): return [ lift_flat(opts, vi, t) for t in ts ] ``` -The `lower` function defines how to lower a list of interface values `vs` of -types `ts` into a list of at most `max_flat` core values. As already described -for [`flatten`](#flattening) above, lowering handles the +The `lower` function defines how to lower a list of component-level values `vs` +of types `ts` into a list of at most `max_flat` core values. 
As already +described for [`flatten`](#flattening) above, lowering handles the greater-than-`max_flat` case by either allocating storage with `realloc` or accepting a caller-allocated buffer as an out-param: ```python @@ -1086,24 +1086,23 @@ def lower(opts, max_flat, vs, ts, out_param = None): ## Canonical ABI built-ins Using the above supporting definitions, we can describe the static and dynamic -semantics of [`func`], whose AST is defined in the main explainer as: +semantics of [`canon`], whose AST is defined in the main explainer as: ``` -func ::= (func ? ) -funcbody ::= (canon.lift * ) - | (canon.lower * ) +canon ::= (canon lift * (func ?)) + | (canon lower * (core func ?)) ``` The following subsections define the static and dynamic semantics of each case of `funcbody`. -### `canon.lift` +### `lift` For a function: ``` -(func $f (canon.lift $ft: $opts:* $callee:)) +(canon lift $ft: $opts:* $callee: (func $f)) ``` validation specifies: - * `$callee` must have type `flatten($ft, 'canon.lift')` + * `$callee` must have type `flatten($ft, 'lift')` * `$f` is given type `$ft` * a `memory` is present if required by lifting and is a subtype of `(memory 1)` * a `realloc` is present if required by lifting and has type `(func (param i32 i32 i32 i32) (result i32))` @@ -1112,19 +1111,19 @@ validation specifies: When instantiating component instance `$inst`: * Define `$f` to be the closure `lambda args: canon_lift($opts, $inst, $callee, $ft, args)` -Thus, `$f` captures `$opts`, `$inst`, `$callee` and `$ft` in a closure which can be -subsequently exported or passed into a child instance (via `with`). If `$f` -ends up being called by the host, the host is responsible for, in a -host-defined manner, conjuring up interface values suitable for passing into -`lower` and, conversely, consuming the interface values produced by `lift`. 
For +Thus, `$f` captures `$opts`, `$inst`, `$callee` and `$ft` in a closure which +can be subsequently exported or passed into a child instance (via `with`). If +`$f` ends up being called by the host, the host is responsible for, in a +host-defined manner, conjuring up component values suitable for passing into +`lower` and, conversely, consuming the component values produced by `lift`. For example, if the host is a native JS runtime, the [JavaScript embedding] would -specify how native JavaScript values are converted to and from interface +specify how native JavaScript values are converted to and from component values. Alternatively, if the host is a Unix CLI that invokes component exports directly from the command line, the CLI could choose to automatically parse -`argv` into interface values according to the declared interface types of the -export. In any case, `canon.lift` specifies how these variously-produced -interface values are consumed as parameters (and produced as results) by a -*single host-agnostic component*. +`argv` into component-level values according to the declared types of the +export. In any case, `canon lift` specifies how these variously-produced values +are consumed as parameters (and produced as results) by a *single host-agnostic +component*. The `$inst` captured above is assumed to have at least the following two fields, which are used to implement the [component invariants]: @@ -1165,9 +1164,9 @@ def canon_lift(callee_opts, callee_instance, callee, functype, args): There are a number of things to note about this definition: Uncaught Core WebAssembly [exceptions] result in a trap at component -boundaries. Thus, if a component wishes to signal an error, it must -use some sort of explicit interface type such as `expected` (whose `error` case -particular language bindings may choose to map to and from exceptions). +boundaries. 
Thus, if a component wishes to signal an error, it must use some +sort of explicit type such as `expected` (whose `error` case particular +language bindings may choose to map to and from exceptions). The contract assumed by `canon_lift` (and ensured by `canon_lower` below) is that the caller of `canon_lift` *must* call `post_return` right after lowering @@ -1196,14 +1195,14 @@ component linking configurations, hence the eager error helps ensure compositionality. -### `canon.lower` +### `lower` For a function: ``` -(func $f (canon.lower $opts:* $callee:)) +(canon lower $opts:* $callee: (core func $f)) ``` where `$callee` has type `$ft`, validation specifies: -* `$f` is given type `flatten($ft, 'canon.lower')` +* `$f` is given type `flatten($ft, 'lower')` * a `memory` is present if required by lifting and is a subtype of `(memory 1)` * a `realloc` is present if required by lifting and has type `(func (param i32 i32 i32 i32) (result i32))` * there is no `post-return` in `$opts` @@ -1249,7 +1248,7 @@ lifting and lowering), with a few exceptions: `i32` parameter. A useful consequence of the above rules for `may_enter` and `may_leave` is that -attempting to `canon.lower` to a `callee` in the same instance is a guaranteed, +attempting to `canon lower` to a `callee` in the same instance is a guaranteed, immediate trap which a link-time compiler can eagerly compile to an `unreachable`. This avoids what would otherwise be a surprising form of memory aliasing that could introduce obscure bugs. @@ -1263,9 +1262,9 @@ the elimination of string operations on the labels of records and variants) as well as post-MVP [adapter functions]. 
-[Function Definitions]: Explainer.md#function-definitions -[`canonopt`]: Explainer.md#function-definitions -[`func`]: Explainer.md#function-definitions +[Canonical Definitions]: Explainer.md#canonical-definitions +[`canonopt`]: Explainer.md#canonical-definitions +[`canon`]: Explainer.md#canonical-definitions [Type Definitions]: Explainer.md#type-definitions [Component Invariants]: Explainer.md#component-invariants [JavaScript Embedding]: Explainer.md#JavaScript-embedding diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 85d418a..6eb1df9 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -1,15 +1,15 @@ # Component Model Explainer This explainer walks through the assembly-level definition of a -[component](../high-level) and the proposed embedding of components into a -native JavaScript runtime. +[component](../high-level) and the proposed embedding of components into +native JavaScript runtimes. * [Grammar](#grammar) * [Component definitions](#component-definitions) * [Instance definitions](#instance-definitions) * [Alias definitions](#alias-definitions) * [Type definitions](#type-definitions) - * [Function definitions](#function-definitions) + * [Canonical definitions](#canonical-definitions) * [Start definitions](#start-definitions) * [Import and export definitions](#import-and-export-definitions) * [Component invariants](#component-invariants) @@ -20,7 +20,7 @@ native JavaScript runtime. * [TODO](#TODO) (Based on the previous [scoping and layering] proposal to the WebAssembly CG, -this repo merges and supersedes the [Module Linking] and [Interface Types] +this repo merges and supersedes the [module-linking] and [interface-types] proposals, pushing some of their original features into the post-MVP [future feature](FutureFeatures.md) backlog.) @@ -51,44 +51,62 @@ below. At the top-level, a `component` is a sequence of definitions of various kinds: ``` component ::= (component ? 
*) -definition ::= +definition ::= core-prefix() + | core-prefix() + | core-prefix() + | core-prefix() | | | | - | + | | | | ``` -Core WebAssembly modules (henceforth just "modules") are also sequences of -(different kinds of) definitions. However, unlike modules, components allow -arbitrarily interleaving the different kinds of definitions. As we'll see -below, this arbitrary interleaving reflects the need for different kinds of -definitions to be able to refer back to each other. Importantly, though, -component definitions are acyclic: definitions can only refer back to preceding -definitions (in the AST, text format or binary format). - -The first kind of component definition is a module, as defined by the existing -Core WebAssembly specification's [`core:module`] top-level production. Thus, -components physically embed one or more modules and can be thought of as a -kind of container format for modules. - -The second kind of definition is, recursively, a component itself. Thus, -components form trees with modules (and all other kinds of definitions) only -appearing at the leaves. - -With what's defined so far, we can define the following component: +Components are like Core WebAssembly modules in that their contained +definitions are acyclic: definitions can only refer to preceding definitions +(in the AST, text format and binary format). However, unlike modules, +components can arbitrarily interleave different kinds of definitions. + +The `core-prefix` meta-function transforms a grammatical rule for parsing a +Core WebAssembly definition into a grammatical rule for parsing the same +definition, but with a `core` token added right after the leftmost paren: +``` +core-prefix(X) ::= '(' 'core' Y ')' where X = '(' Y ')' +``` +For example, `core:module` accepts `(module (func))` so +`core-prefix()` accepts `(core module (func))`. 
Note that the +inner `func` doesn't need a `core` prefix; the `core` token is used to mark the +*transition* from parsing component definitions into core definitions. + +The [`core:module`] production is unmodified by the Component Model and thus +components embed Core WebAssembly (text and binary format) modules as currently +standardized, allowing reuse of an unmodified Core WebAssembly implementation. +The next two productions, `core:instance` and `core:alias`, are not currently +included in Core WebAssembly, but would be if Core WebAssembly adopted the +[module-linking] proposal. These two new core definitions are introduced below, +alongside their component-level counterparts. Finally, the existing +[`core:type`] production is extended below to add core module types as proposed +for module-linking. Thus, the overall idea is to represent core definitions (in +the AST, binary and text format) as if they had already been added to Core +WebAssembly so that, if they eventually are, the implementation of decoding and +validation can be shared in a layered fashion. + +The next kind of definition is, recursively, a component itself. Thus, +components form trees with all other kinds of definitions only appearing at the +leaves.
For example, with what's defined so far, we can write the following +component: ```wasm (component (component - (module (func (export "one") (result i32) (i32.const 1))) - (module (func (export "two") (result f32) (f32.const 2))) + (core module (func (export "one") (result i32) (i32.const 1))) + (core module (func (export "two") (result f32) (f32.const 2))) ) - (module (func (export "three") (result i64) (i64.const 3))) + (core module (func (export "three") (result i64) (i64.const 3))) (component (component - (module (func (export "four") (result f64) (f64.const 4))) + (core module (func (export "four") (result f64) (f64.const 4))) ) ) (component) @@ -96,7 +114,7 @@ With what's defined so far, we can define the following component: ``` This top-level component roots a tree with 4 modules and 1 component as leaves. However, in the absence of any `instance` definitions (introduced -next), nothing will be instantiated or executed at runtime: everything here is +next), nothing will be instantiated or executed at runtime; everything here is dead code. @@ -105,125 +123,150 @@ dead code. Whereas modules and components represent immutable *code*, instances associate code with potentially-mutable *state* (e.g., linear memory) and thus are necessary to create before being able to *run* the code. Instance definitions -create module or component instances by selecting a module/component and -supplying a set of named *arguments* which satisfy all the named *imports* of -the selected module/component: -``` -instance ::= (instance ? ) -instanceexpr ::= (instantiate (module ) (with )*) - | (instantiate (component ) (with )*) - | * - | core * -modulearg ::= (instance ) - | (instance *) -componentarg ::= (module ) - | (component ) - | (instance ) - | (func ) - | (value ) - | (type ) - | (instance *) -export ::= (export ) -``` -When instantiating a module via -`(instantiate (module $M) (with )*)`, the two-level imports of -the module `$M` are resolved as follows: -1. 
The first `name` of an import is looked up in the named list of `modulearg` - to select a module instance. -2. The second `name` of an import is looked up in the named list of exports of - the module instance found by the first step to select the imported - core definition (a `func`, `memory`, `table`, `global`, etc). - -Based on this, we can link two modules `$A` and `$B` together with the +create module or component instances by selecting a module or component and +then supplying a set of named *arguments* which satisfy all the named *imports* +of the selected module or component. + +The syntax for defining a core module instance is: +``` +core:instance ::= (instance ? ) +core:instanceexpr ::= (instantiate *) + | * +core:instantiatearg ::= (with ) + | (with (instance *)) +core:sortidx ::= ( ) +core:sort ::= func + | table + | memory + | global + | type + | module + | instance +core:export ::= (export ) +``` +When instantiating a module via `instantiate`, the two-level imports of the +core modules are resolved as follows: +1. The first `name` of the import is looked up in the named list of + `core:instantiatearg` to select a core module instance. +2. The second `name` of the import is looked up in the named list of exports of + the core module instance found by the first step to select the imported + core definition. + +Each `core:sort` corresponds 1:1 with a distinct [index space] that contains +only core definitions of that *sort*. The `varu32` field of `core:sortidx` +indexes into the sort's associated index space to select a definition. 
+ +Based on this, we can link two core modules `$A` and `$B` together with the following component: ```wasm (component - (module $A + (core module $A (func (export "one") (result i32) (i32.const 1)) ) - (module $B + (core module $B (func (import "a" "one") (result i32)) ) - (instance $a (instantiate (module $A))) - (instance $b (instantiate (module $B) (with "a" (instance $a)))) + (core instance $a (instantiate $A)) + (core instance $b (instantiate $B (with "a" (instance $a)))) ) ``` -Components, as we'll see below, have single-level imports, i.e., each import -has only a single `name`, and thus every different kind of definition can be -passed as a `componentarg` when instantiating a component, not just instances. -Component instantiation will be revisited below after introducing the -prerequisite type and import definitions. +To see examples of other sorts, we'll need `alias` definitions, which are +introduced in the next section. + +The `*` form of `core:instanceexpr` allows module instances to be +created by directly tupling together preceding definitions, without the need to +`instantiate` a helper module. The "inline" form of `*` inside +`(with ...)` is syntactic sugar that is expanded during text format parsing +into an out-of-line instance definition referenced by `with`. To show an +example of these, we'll also need the `alias` definitions introduced in the +next section. + +The syntax for defining component instances is symmetric to core module +instances, but with a distinct component-level definition of `sort`: +``` +instance ::= (instance ? 
) +instanceexpr ::= (instantiate *) + | * +instantiatearg ::= (with ) + | (with (instance *)) +sortidx ::= ( ) +sort ::= core-prefix() + | func + | value + | type + | component + | instance +export ::= (export ) +``` +Because component-level function, type and instance definitions are different +than core-level function, type and instance definitions, they are put into +disjoint index spaces which are indexed separately by `sortidx` and +`core:sortidx`, respectively. Components may import or export core modules +(since core modules are immutable values and thus do not break the +[shared-nothing] model) and so `sortidx` includes `core:sortidx` (which +validation then restricts to core modules; in the future, other immutable core +definitions could be allowed, such as `data` segments). -Lastly, the `(instance *)` and `(instance *)` -expressions allow component and module instances to be created by directly -tupling together preceding definitions, without the need to `instantiate` -anything. The "inline" forms of these expressions in `modulearg` -and `componentarg` are text format sugar for the "out of line" form in -`instanceexpr`. To show an example of how these instance-creation forms are -useful, we'll first need to introduce the `alias` definitions in the next -section. +To see a non-trivial example of component instantiation, we'll first need to +introduce a few other definitions below that allow components to import, define +and export component functions. ### Alias Definitions -Alias definitions project definitions out of other components' index spaces +Alias definitions project definitions out of other components' index spaces and into the current component's index spaces. As represented in the AST below, -there are two kinds of "targets" for an alias: the `export` of a component -instance, or a local definition of an `outer` component that contains the -current component: -``` -alias ::= (alias ) -aliastarget ::= export - | outer -aliaskind ::= (module ?) 
- | (component ?) - | (instance ?) - | (func ?) - | (value ?) - | (type ?) - | (table ?) - | (memory ?) - | (global ?) - | ... other Post-MVP Core definition kinds -``` -Aliases add a new element to the index space indicated by `aliaskind`. -(Validation ensures that the `aliastarget` does indeed refer to a matching -definition kind.) The `id` in `aliaskind` is bound to this new index and -thus can be used anywhere a normal `id` can be used. - -In the case of `export` aliases, validation requires that `instanceidx` refers -to an instance which exports `name`. - -In the case of `outer` aliases, the (`outeridx`, `idx`) pair serves as a -[de Bruijn index], with `outeridx` being the number of enclosing components to -skip and `idx` being an index into the target component's `aliaskind` index -space. In particular, `outeridx` can be `0`, in which case the outer alias -refers to the current component. To maintain the acyclicity of module +there are two kinds of "targets" for an alias: the `export` of an instance and +a definition in an index space of an `outer` component (containing the current +component): +``` +core:alias ::= (alias ( ?)) +core:aliastarget ::= export + | outer + +alias ::= (alias ( ?)) +aliastarget ::= export + | outer +``` +The `core:sort`/`sort` immediate of the alias specifies which index space in +the target component is being read from and which index space of the containing +component is being added to. If present, the `id` of the alias is bound to the +new index added by the alias and can be used anywhere a normal `id` can be +used. + +In the case of `export` aliases, validation ensures `name` is an export in the +target instance and has a matching sort. + +In the case of `outer` aliases, the `varu32` pair serves as a [de Bruijn +index], with the first `varu32` being the number of enclosing components to skip +and the second `varu32` being an index into the target component's sort's index +space.
In particular, the first `varu32` can be `0`, in which case the outer +alias refers to the current component. To maintain the acyclicity of module instantiation, outer aliases are only allowed to refer to *preceding* outer definitions. Components containing outer aliases effectively produce a [closure] at instantiation time, including a copy of the outer-aliased definitions. Because -of the prevalent assumption that components are (stateless) *values*, outer -aliases are restricted to only refer to stateless definitions: components, -modules and types. (In the future, outer aliases to all kinds of definitions -could be allowed by recording the statefulness of the resulting component in -its type via some kind of "`stateful`" type attribute.) +of the prevalent assumption that components are immutable values, outer aliases +are restricted to only refer to immutable definitions: types, modules and +components. (In the future, outer aliases to all sorts of definitions could be +allowed by recording the statefulness of the resulting component in its type +via some kind of "`stateful`" type attribute.) Both kinds of aliases come with syntactic sugar for implicitly declaring them inline: -For `export` aliases, the inline sugar has the form `(kind +)` -and can be used anywhere a `kind` index appears in the AST. For example, the +For `export` aliases, the inline sugar has the form `(sort +)` +and can be used anywhere a `sort` index appears in the AST. 
For example, the following snippet uses an inline function alias: ```wasm -(instance $j (instantiate (component $J) (with "f" (func $i "f")))) -(export "x" (func $j "g" "h")) +(instance $j (instantiate $J (with "f" (func $i "f")))) +(export "x" (func (func $j "g" "h"))) ``` which is desugared into: ```wasm (alias export $i "f" (func $f_alias)) -(instance $j (instantiate (component $J) (with "f" (func $f_alias)))) +(instance $j (instantiate $J (with "f" (func $f_alias)))) (alias export $j "g" (instance $g_alias)) (alias export $g_alias "h" (func $h_alias)) (export "x" (func $h_alias)) @@ -234,129 +277,186 @@ definition, resolved using normal lexical scoping rules. For example, the following component: ```wasm (component - (module $M ...) + (core module $M ...) (component - (instance (instantiate (module $M))) + (core instance (instantiate $M)) ) ) ``` is desugared into: ```wasm (component $C - (module $M ...) + (core module $M ...) (component - (alias outer $C $M (module $C_M)) - (instance (instantiate (module $C_M))) + (core alias outer $C $M (module $C_M)) + (core instance (instantiate $C_M)) ) ) ``` Lastly, for symmetry with [imports][func-import-abbrev], aliases can be written -in an inverted form that puts the definition kind first: +in an inverted form that puts the sort first: ```wasm -(func $f (import "i" "f")) ≡ (import "i" "f" (func $f)) ;; (existing) -(func $g (alias $i "g1")) ≡ (alias $i "g1" (func $g)) ;; (new) +(func $f (import "i" "f")) ≡ (import "i" "f" (func $f)) (WebAssembly 1.0) +(func $g (alias export $i "g1")) ≡ (alias export $i "g1" (func $g)) +(core func $g (alias export $i "g1")) ≡ (core alias export $i "g1" (func $g)) ``` With what's defined so far, we're able to link modules with arbitrary renamings: ```wasm (component - (module $A + (core module $A (func (export "one") (result i32) (i32.const 1)) (func (export "two") (result i32) (i32.const 2)) (func (export "three") (result i32) (i32.const 3)) ) - (module $B + (core module $B (func 
(import "a" "one") (result i32)) ) - (instance $a (instantiate (module $A))) - (instance $b1 (instantiate (module $B) - (with "a" (instance $a)) ;; no renaming + (core instance $a (instantiate $A)) + (core instance $b1 (instantiate $B + (with "a" (instance $a)) ;; no renaming )) - (func $a_two (alias export $a "two")) ;; ≡ (alias export $a "two" (func $a_two)) - (instance $b2 (instantiate (module $B) + (core func $a_two (alias export $a "two")) ;; ≡ (core alias export $a "two" (func $a_two)) + (core instance $b2 (instantiate $B (with "a" (instance - (export "one" (func $a_two)) ;; renaming, using explicit alias + (export "one" (func $a_two)) ;; renaming, using out-of-line alias )) )) - (instance $b3 (instantiate (module $B) + (core instance $b3 (instantiate $B (with "a" (instance - (export "one" (func $a "three")) ;; renaming, using inline alias sugar + (export "one" (func $a "three")) ;; renaming, using inline alias sugar )) )) ) ``` -To show analogous examples of linking components, we'll first need to define -a new set of types and functions for components to use. +To show analogous examples of linking components, we'll need component-level +type and function definitions which are introduced in the next two sections. ### Type Definitions -The type grammar below defines two levels of types, with the second level -building on the first: -1. `intertype` (also referred to as "interface types" below): the set of - types of first-class, high-level values communicated across shared-nothing - component interface boundaries -2. `deftype`: the set of types of second-class component definitions which are - imported/exported at instantiation-time. - -The top-level `type` definition is used to define types out-of-line so that -they can be reused via `typeidx` by future definitions. -``` -type ::= (type ? ) -typeexpr ::= - | -deftype ::= - | - | - | - | -moduletype ::= (module ? *) -moduletype-def ::= - | - | (export ) -core:deftype ::= - | ... 
Post-MVP additions -componenttype ::= (component ? *) -componenttype-def ::= - | -import ::= (import ) -instancetype ::= (instance ? *) -instancetype-def ::= - | - | (export ) -functype ::= (func ? (param ? )* (result )) -valuetype ::= (value ? ) -intertype ::= unit | bool - | s8 | u8 | s16 | u16 | s32 | u32 | s64 | u64 - | float32 | float64 - | char | string - | (record (field )*) - | (variant (case (refines )?)+) - | (list ) - | (tuple *) - | (flags *) - | (enum +) - | (union +) - | (option ) - | (expected ) -``` -On a technical note: this type grammar uses `` and `` -recursively to allow it to more-precisely indicate the kinds of types allowed. -The formal spec AST would instead use a `` with validation rules to -restrict the target type while the formal text format would use something like -[`core:typeuse`], allowing any of: (1) a `typeidx`, (2) an identifier `$T` -resolving to a type definition (using `(type $T)` in cases where there is a -grammatical ambiguity), or (3) an inline type definition that is desugared into -a deduplicated out-of-line type definition. - -On another technical note: the optional `id` in all the `deftype` type -constructors (e.g., `(module ? ...)`) is only allowed to be present in the -context of `import` since this is the only context in which binding an -identifier makes sense. - -Starting with interface types, the set of values allowed for the *fundamental* -interface types is given by the following table: +The syntax for defining core types extends the existing core type definition +syntax, adding a `module` type constructor: +``` +core:type ::= (type ? ) (GC proposal) +core:deftype ::= (WebAssembly 1.0) + | (GC proposal) + | (GC proposal) + | +core:moduletype ::= (module ? 
*) +core:moduledecl ::= + | + | + | +core:importdecl ::= (import ) +core:exportdecl ::= (export ) +core:externdesc ::= (WebAssembly 1.0) +``` +Here, `core:deftype` (short for "defined type") is inherited from the [gc] +proposal and extended with a `module` type constructor. If module-linking is +added to Core WebAssembly, an `instance` type constructor would be added as +well but, for now, it's left out since it's unnecessary. Also, in the MVP, +validation will reject nested `core:moduletype`, since, before module-linking, +core modules cannot themselves import or export other core modules. + +The body of a module type contains an ordered list of "module declarators" +which describe, at a type level, the imports and exports of the module. In a +module-type context, import and export declarators can both reuse the existing +[`core:importdesc`] production defined in WebAssembly 1.0. To avoid confusion, +`core:importdesc` is renamed to `core:externdesc` (for symmetry with +[`core:externtype`]). + +In preparation for the forthcoming addition of [type-imports] to Core +WebAssembly, module types start with an empty type index space so that the type +index space can be populated with fresh type definitions constructed from type +imports. Thus, `core:moduledecl` also includes a `type` declarator for defining +the types used by the `import` and `export` declarators. An `alias` declarator +is also necessary in the future for defining type-sharing constraints between +type imports. In the short-term, `alias` declarators are restricted to only +allowing `outer` `type` aliases, thereby enabling a module type to reuse a +parent's type definition instead of re-defining it locally. + +As an example, the following component defines two equivalent module types, +where the former defines the function type via a `type` declarator and the latter +via an `alias` declarator. In both cases, the type is given index `0` since the +module type starts with an empty type index space.
+```wasm +(component $C + (core type $M1 (module + (type (func (param i32) (result i32))) + (import "a" "b" (func (type 0))) + (export "c" (func (type 0))) + )) + (core type $F (func (param i32) (result i32))) + (core type $M2 (module + (alias outer $C $F (type)) + (import "a" "b" (func (type 0))) + (export "c" (func (type 0))) + )) +) +``` + +Component-level type definitions are symmetric to core-level type definitions, +but use a completely different set of value types. Unlike [`core:valtype`] +which is low-level and assumes a shared linear memory for communicating +compound values, component-level value types assume no shared memory and must +therefore be high-level, describing entire compound values. +``` +type ::= (type ? ) +deftype ::= + | + | + | +functype ::= (func ? (param ? )* (result )) +componenttype ::= (component ? *) +instancetype ::= (instance ? *) +componentdecl ::= + | +instancedecl ::= + | + | +importdecl ::= (import ) +exportdecl ::= (export ) +externdesc ::= core-prefix() + | + | + | + | (value ? ) + | (type ? ) +typebound ::= (eq ) +valtype ::= unit + | bool + | s8 | u8 | s16 | u16 | s32 | u32 | s64 | u64 + | float32 | float64 + | char | string + | (record (field )*) + | (variant (case (refines )?)+) + | (list ) + | (tuple *) + | (flags *) + | (enum +) + | (union +) + | (option ) + | (expected ) +``` +This type grammar uses productions like `` and `` recursively +to allow it to more-precisely indicate what's allowed. The formal AST and +[binary format](Binary.md#type-definitions) instead use a `` with +validation rules to restrict the target type while the formal text format would +use something like [`core:typeuse`], allowing any of: (1) a `typeidx`, (2) an +identifier `$T` resolving to a type definition (using `(type $T)` in cases +where there is a grammatical ambiguity), or (3) an inline type definition that +is desugared into a deduplicated out-of-line type definition. + +The optional `id` after all the type constructors (e.g., `(module ? 
...)`) +is only allowed to be present in the context of `import` since this is the only +context in which binding an identifier makes sense. + +The value types in `valtype` can be broken into two categories: *fundamental* +value types and *specialized* value types, where the latter are defined by +expansion into the former. The *fundamental value types* have the following +sets of abstract values: | Type | Values | | ------------------------- | ------ | | `bool` | `true` and `false` | @@ -364,11 +464,12 @@ interface types is given by the following table: | `u8`, `u16`, `u32`, `u64` | integers in the range [0, 2^N-1] | | `float32`, `float64` | [IEEE754] floating-point numbers with a single, canonical "Not a Number" ([NaN]) value | | `char` | [Unicode Scalar Values] | -| `record` | heterogeneous [tuples] of named `intertype` values | -| `variant` | heterogeneous [tagged unions] of named `intertype` values | -| `list` | homogeneous, variable-length [sequences] of `intertype` values | +| `record` | heterogeneous [tuples] of named values | +| `variant` | heterogeneous [tagged unions] of named values | +| `list` | homogeneous, variable-length [sequences] of values | -NaN values are canonicalized to a single value so that: +The `float32` and `float64` values have their NaNs canonicalized to a single +value so that: 1. consumers of NaN values are free to use the rest of the NaN payload for optimization purposes (like [NaN boxing]) without needing to worry about whether the NaN payload bits were significant; and @@ -383,73 +484,64 @@ subtyping. In particular, a `variant` subtype can contain a `case` not present in the supertype if the subtype's `case` `refines` (directly or transitively) some `case` in the supertype. -The sets of values allowed for the remaining *specialized* interface types are +The sets of values allowed for the remaining *specialized value types* are defined by the following mapping: ``` - (tuple *) ↦ (record (field "𝒊" )*) for 𝒊=0,1,... 
- (flags *) ↦ (record (field bool)*) - unit ↦ (record) - (enum +) ↦ (variant (case unit)+) - (option ) ↦ (variant (case "none") (case "some" )) - (union +) ↦ (variant (case "𝒊" )+) for 𝒊=0,1,... -(expected ) ↦ (variant (case "ok" ) (case "error" )) - string ↦ (list char) + (tuple *) ↦ (record (field "𝒊" )*) for 𝒊=0,1,... + (flags *) ↦ (record (field bool)*) + unit ↦ (record) + (enum +) ↦ (variant (case unit)+) + (option ) ↦ (variant (case "none") (case "some" )) + (union +) ↦ (variant (case "𝒊" )+) for 𝒊=0,1,... +(expected ) ↦ (variant (case "ok" ) (case "error" )) + string ↦ (list char) ``` Note that, at least initially, variants are required to have a non-empty list of cases. This could be relaxed in the future to allow an empty list of cases, with the empty `(variant)` effectively serving as a [bottom type] and indicating unreachability. -Building on these interface types, there are four kinds of types describing the -four kinds of importable/exportable component definitions. (In the future, a -fifth type will be added for [resource types][Resource and Handle Types].) - -A `functype` describes a component function whose parameters and results are -`intertype` values. Thus `functype` is completely disjoint from -[`core:functype`] in the WebAssembly Core spec, whose parameters and results -are [`core:valtype`] values. As a low-level compiler target, `core:functype` -returns zero or more results. In contrast, as a high-level interface type -designed to be maximally bound to a variety of source languages, `functype` -always returns a single type, with `unit` being used for functions that don't -return an interesting value (analogous to "void" in some languages). As -syntactic sugar, the text format of `functype` additionally allows `result` to -be absent, interpreting this as `(result unit)`. 
Since `core:functype` can only -appear syntactically within a `(module ...)` S-expression, there is never a -need to syntactically distinguish `functype` from `core:functype` in the text -format: the context dictates which one a `(func ...)` S-expression parses into. - -A `valuetype` describes a single `intertype` value that is to be consumed -exactly once during component instantiation. How this happens is described +The remaining 5 type constructors use `valtype` to complete the description +of a shared-nothing component interface: + +The `func` type constructor describes a component-level function definition +that takes and returns component-level value types. In contrast to +[`core:functype`] which, as a low-level compiler target for a stack machine, +returns zero or more results, `functype` always returns a single type, with +`unit` being used for functions that don't return an interesting value +(analogous to "void" in some languages). Having a single return type simplifies +the binding of `functype` into a wide variety of source languages. As syntactic +sugar, the text format of `functype` additionally allows `result` to be absent, +interpreting this as `(result unit)`. + +The `component` type constructor is symmetric to the core `module` type +constructor, although its grammar is factored to share declarators with the +`instance` type constructor. The `import` and `export` declarator names +must be distinct within a single type. + +The `externdesc` production (used to declare the types of imported/exported +values) includes two additional type constructors that are not currently +present in `deftype` (since there is currently no reason for allowing them to +be shared or named as type definitions): + +The `value` case describes an imported or exported `valtype` value that is to +be consumed exactly once during instantiation. How this happens is described below along with [`start` definitions](#start-definitions). 
-As described above, components and modules are immutable values representing -code that cannot be run until instantiated via `instance` definition. Thus, -`moduletype` and `componenttype` describe *uninstantiated code*. `moduletype` -and `componenttype` contain not just import and export definitions, but also -type and alias definitions, allowing them to capture type sharing relationships -between imports and exports. This type sharing becomes necessary (not just a -size optimization) with the upcoming addition of [type imports and exports] to -Core WebAssembly and, symmetrically, [resource and handle types] to the -Component Model. - -The `instancetype` type constructor describes component instances, which are -named tuples of other definitions. Although `instance` definitions can produce -both module *and* component instances, only *component* instances can be -imported or exported (due to the overall [shared-nothing design](../high-level/Choices.md) -of the Component Model) and thus only *component* instances need explicit type -definitions. Consequently, the text format of `instancetype` does not include -a syntax for defining *module* instance types. As with `componenttype` and -`moduletype`, `instancetype` allows nested type and alias definitions to allow -type sharing. - -Lastly, to ensure cross-language interoperability, `moduletype`, -`componenttype` and `instancetype` all require import and export names to be -unique (within a particular module, component, instance or type thereof). In -the case of `moduletype` and two-level imports, this translates to requiring -that import name *pairs* must be *pair*-wise unique. Since the current Core -WebAssembly validation rules allow duplicate imports, this means that some -valid modules will not be typeable and will fail validation if used with the -Component Model. 
+The `type` case describes an imported or exported type along with its bounds, +which currently only has an `eq` option that says that the imported/exported +type must be exactly equal to the given immediate type. There are two main use +cases for this in the short-term: +* Type exports allow a component or interface to associate a name with a + structural type (e.g., `(export "nanos" (type (eq u64)))`) which bindings + generators can use to generate type aliases (e.g., `typedef uint64_t nanos;`). +* Type imports and exports allow a component to explicitly specify the + type parameters used to monomorphize a generic interface being imported + or exported. + +When [resource and handle types] are added to the explainer, `typebound` will +be extended with a `sub` option (symmetric to the [type-imports] proposal) that +allows importing and exporting *abstract* types. With what's defined so far, we can define component types using a mix of inline and out-of-line type definitions: @@ -462,52 +554,50 @@ and out-of-line type definitions: (alias outer $C $T (type $C_T)) (type $L (list $C_T)) (import "f" (func (param $L) (result (list u8)))) - (import "g" $G) - (export "g" $G) + (import "g" (func (type $G))) + (export "g" (func (type $G))) (export "h" (func (result $U))) )) ) ``` -Note that the inline use of `$G` and `$U` are inline `outer` aliases. +Note that the inline use of `$G` and `$U` are syntactic sugar for `outer` +aliases. -### Function Definitions +### Canonical Definitions -To implement or call interface-typed functions, we need to be able to cross a +To implement or call a component-level function, we need to cross a shared-nothing boundary. Traditionally, this problem is solved by defining a -serialization format for copying data across the boundary. 
The Component Model -MVP takes roughly this same approach, defining a linear-memory-based [ABI] -called the "Canonical ABI" which specifies, for any interface function type, a -[corresponding](CanonicalABI.md#flattening) core function type and -[rules](CanonicalABI.md#lifting-and-lowering) for copying values into or out of -linear memory. The Component Model differs from traditional approaches, though, -in that the ABI is configurable, allowing different memory representations for -the same abstract value. In the MVP, this configurability is limited to the -small set of `canonopt` shown below. However, Post-MVP, [adapter functions] -could be added to allow far more programmatic control. +serialization format. The Component Model MVP uses roughly this same approach, +defining a linear-memory-based [ABI] called the "Canonical ABI" which +specifies, for any `functype`, a [corresponding](CanonicalABI.md#flattening) +`core:functype` and [rules](CanonicalABI.md#lifting-and-lowering) for copying +values into and out of linear memory. The Component Model differs from +traditional approaches, though, in that the ABI is configurable, allowing +multiple different memory representations of the same abstract value. In the +MVP, this configurability is limited to the small set of `canonopt` shown +below. However, Post-MVP, [adapter functions] could be added to allow far more +programmatic control. The Canonical ABI is explicitly applied to "wrap" existing functions in one of two directions: -* `canon.lift` wraps a core function (of type `core:functype`) inside the - current component to produce a component function (of type `functype`) - that can be exported to other components. -* `canon.lower` wraps a component function (of type `functype`) that can - have been imported from another component to produce a core function (of type - `core:functype`) that can be imported and called from Core WebAssembly code - within the current component. 
- -Function definitions specify one of these two wrapping directions along with a -set of Canonical ABI configuration options. -``` -func ::= (func ? ) -funcbody ::= (canon.lift * ) - | (canon.lower * ) -canonopt ::= string-encoding=utf8 - | string-encoding=utf16 - | string-encoding=latin1+utf16 - | (memory ) - | (realloc ) - | (post-return ) +* `lift` wraps a core function (of type `core:functype`) to produce a component + function (of type `functype`) that can be passed to other components. +* `lower` wraps a component function (of type `functype`) to produce a core + function (of type `core:functype`) that can be imported and called from Core + WebAssembly code inside the current component. + +Canonical definitions specify one of these two wrapping directions, the function +to wrap and a list of configuration options: +``` +canon ::= (canon lift core-prefix() * (func ?)) + | (canon lower * (core func ?)) +canonopt ::= string-encoding=utf8 + | string-encoding=utf16 + | string-encoding=latin1+utf16 + | (memory core-prefix()) + | (realloc core-prefix()) + | (post-return core-prefix()) ``` The `string-encoding` option specifies the encoding the Canonical ABI will use for the `string` type. The `latin1+utf16` encoding captures a common string @@ -518,12 +608,12 @@ Point range) or UTF-16 (which can express all Code Points, but uses either default is UTF-8. It is a validation error to include more than one `string-encoding` option. -The `(memory )` option specifies the memory that the Canonical ABI will +The `(memory ...)` option specifies the memory that the Canonical ABI will use to load and store values. If the Canonical ABI needs to load or store, validation requires this option to be present (there is no default). 
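As a rough illustration of the option rules described so far, the following JavaScript sketch checks a list of `canonopt`-like records. The record shape (`kind`/`value`), the function name `checkCanonOpts`, and the `needsMemory` flag are all hypothetical and chosen only for this example; real validation is defined by the Canonical ABI explainer, not by this code.

```javascript
// Hypothetical model: each option is { kind, value }, e.g.
// { kind: "string-encoding", value: "utf16" } or { kind: "memory", value: "mem" }.
const STRING_ENCODINGS = new Set(["utf8", "utf16", "latin1+utf16"]);

function checkCanonOpts(opts, needsMemory) {
  // At most one string-encoding option may be present.
  const encodings = opts.filter((o) => o.kind === "string-encoding");
  if (encodings.length > 1) {
    throw new Error("more than one string-encoding option");
  }
  // When no string-encoding option is given, the default is UTF-8.
  const encoding = encodings.length === 1 ? encodings[0].value : "utf8";
  if (!STRING_ENCODINGS.has(encoding)) {
    throw new Error(`unknown string encoding: ${encoding}`);
  }
  // The memory option has no default: it must be given whenever the
  // (assumed) caller says values need to be loaded or stored.
  const memory = opts.find((o) => o.kind === "memory");
  if (needsMemory && memory === undefined) {
    throw new Error("memory option required but not present");
  }
  return { encoding, memory: memory ? memory.value : null };
}
```

For example, `checkCanonOpts([], false)` resolves the encoding to `"utf8"`, while passing two `string-encoding` records throws.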
-The `(realloc )` option specifies a core function that is validated to -have the following signature: +The `(realloc ...)` option specifies a core function that is validated to +have the following core function type: ```wasm (func (param $originalPtr i32) (param $originalSize i32) @@ -535,22 +625,22 @@ The Canonical ABI will use `realloc` both to allocate (passing `0` for the first two parameters) and reallocate. If the Canonical ABI needs `realloc`, validation requires this option to be present (there is no default). -The `(post-return )` option may only be present in `canon.lift` and -specifies a core function to be called with the original return values after -they have finished being read, allowing memory to be deallocated and +The `(post-return ...)` option may only be present in `canon lift` +and specifies a core function to be called with the original return values +after they have finished being read, allowing memory to be deallocated and destructors called. This immediate is always optional but, if present, is validated to have parameters matching the callee's return type and empty results. -Based on this description of the AST, the [Canonical ABI explainer][Canonical ABI] -gives a detailed walkthrough of the static and dynamic semantics of -`canon.lift` and `canon.lower`. +Based on this description of the AST, the [Canonical ABI explainer][Canonical +ABI] gives a detailed walkthrough of the static and dynamic semantics of `lift` +and `lower`. -One high-level consequence of the dynamic semantics of `canon.lift` given in +One high-level consequence of the dynamic semantics of `canon lift` given in the Canonical ABI explainer is that component functions are different from core functions in that all control flow transfer is explicitly reflected in their -type. For example, with Core WebAssembly [exception handling] and -[stack switching], a core function with type `(func (result i32))` can return +type. 
For example, with Core WebAssembly [exception-handling] and +[stack-switching], a core function with type `(func (result i32))` can return an `i32`, throw, suspend or trap. In contrast, a component function with type `(func (result string))` may only return a `string` or trap. To express failure, component functions can return `expected` and languages with exception @@ -558,23 +648,33 @@ handling can bind exceptions to the `error` case. Similarly, the forthcoming addition of [future and stream types] would explicitly declare patterns of stack-switching in component function signatures. -Using function definitions, we can finally write a non-trivial component that +Similar to the `import` and `alias` abbreviations shown above, `canon` +definitions can also be written in an inverted form that puts the sort first: +```wasm + (func $f (import "i" "f")) ≡ (import "i" "f" (func $f)) (WebAssembly 1.0) + (func $h (canon lift ...)) ≡ (canon lift ... (func $h)) +(core func $h (canon lower ...)) ≡ (canon lower ... (core func $h)) +``` +Note: in the future, `canon` may be generalized to define other sorts than +functions (such as types), hence the explicit `sort`. + +Using canonical definitions, we can finally write a non-trivial component that takes a string, does some logging, then returns a string. 
```wasm (component (import "wasi:logging" (instance $logging (export "log" (func (param string))) )) - (import "libc" (module $Libc + (import "libc" (core module $Libc (export "mem" (memory 1)) (export "realloc" (func (param i32 i32) (result i32))) )) - (instance $libc (instantiate (module $Libc))) - (func $log (canon.lower - (memory (memory $libc "mem")) (realloc (func $libc "realloc")) + (core instance $libc (instantiate $Libc)) + (core func $log (canon lower (func $logging "log") + (memory (core memory $libc "mem")) (realloc (core func $libc "realloc")) )) - (module $Main + (core module $Main (import "libc" "memory" (memory 1)) (import "libc" "realloc" (func (param i32 i32) (result i32))) (import "wasi:logging" "log" (func $log (param i32 i32))) @@ -582,14 +682,14 @@ takes a string, does some logging, then returns a string. ... (call $log) ... ) ) - (instance $main (instantiate (module $Main) + (core instance $main (instantiate $Main (with "libc" (instance $libc)) (with "wasi:logging" (instance (export "log" (func $log)))) )) - (func (export "run") (canon.lift + (func (export "run") (canon lift + (core func $main "run") (func (param string) (result string)) - (memory (memory $libc "mem")) (realloc (func $libc "realloc")) - (func $main "run") + (memory (core memory $libc "mem")) (realloc (core func $libc "realloc")) )) ) ``` @@ -597,81 +697,76 @@ This example shows the pattern of splitting out a reusable language runtime module (`$Libc`) from a component-specific, non-reusable module (`$Main`). In addition to reducing code size and increasing code-sharing in multi-component scenarios, this separation allows `$libc` to be created first, so that its -exports are available for reference by `canon.lower`. Without this separation +exports are available for reference by `canon lower`. 
Without this separation (if `$Main` contained the `memory` and allocation functions), there would be a -cyclic dependency between `canon.lower` and `$Main` that would have to be -broken by the toolchain emitting an auxiliary module that broke the cycle using -a shared `funcref` table and `call_indirect`. +cyclic dependency between `canon lower` and `$Main` that would have to be +broken using an auxiliary module performing `call_indirect`. ### Start Definitions Like modules, components can have start functions that are called during instantiation. Unlike modules, components can call start functions at multiple -points during instantiation with each such call having interface-typed -parameters and results. Thus, `start` definitions in components look like -function calls: +points during instantiation with each such call having parameters and results. +Thus, `start` definitions in components look like function calls: ``` start ::= (start (value )* (result (value ))?) ``` The `(value )*` list specifies the arguments passed to `funcidx` by indexing into the *value index space*. Value definitions (in the value index -space) are like immutable `global` definitions in Core WebAssembly except they -must be consumed exactly once at instantiation-time. +space) are like immutable `global` definitions in Core WebAssembly except that +validation requires them to be consumed exactly once at instantiation-time +(i.e., they are [linear]). -As with any other definition kind, value definitions may be supplied to -components through `import` definitions. Using the grammar of `import` already -defined [above](#type-definitions), an example *value import* can be written: +As with all definition sorts, values may be imported and exported by +components. 
As an example value import: ``` (import "env" (value $env (record (field "locale" (option string))))) ``` As this example suggests, value imports can serve as generalized [environment -variables], allowing not just `string`, but the full range of interface types -to describe the imported configuration schema. +variables], allowing not just `string`, but the full range of `valtype`. With this, we can define a component that imports a string and computes a new -exported string, all at instantiation time: +exported string at instantiation time: ```wasm (component (import "name" (value $name string)) - (import "libc" (module $Libc + (import "libc" (core module $Libc (export "memory" (memory 1)) (export "realloc" (func (param i32 i32 i32 i32) (result i32))) )) - (instance $libc (instantiate (module $Libc))) - (module $Main + (core instance $libc (instantiate $Libc)) + (core module $Main (import "libc" ...) (func (export "start") (param i32 i32) (result i32 i32) ... general-purpose compute ) ) - (instance $main (instantiate (module $Main) (with "libc" (instance $libc)))) - (func $start (canon.lift + (core instance $main (instantiate $Main (with "libc" (instance $libc)))) + (func $start (canon lift + (core func $main "start") (func (param string) (result string)) - (memory (memory $libc "mem")) (realloc (func $libc "realloc")) - (func $main "start") + (memory (core memory $libc "mem")) (realloc (core func $libc "realloc")) )) (start $start (value $name) (result (value $greeting))) (export "greeting" (value $greeting)) ) ``` As this example shows, start functions reuse the same Canonical ABI machinery -as normal imports and exports for getting interface typed values into and out -of linear memory. +as normal imports and exports for getting component-level values into and out +of core linear memory. 
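To make the idea of moving a `string` into and out of core linear memory more concrete, here is a minimal JavaScript sketch. It is not the Canonical ABI itself: the bump allocator, the names `lowerString` and `liftString`, and the reading of the last two `realloc` parameters as alignment and new size are assumptions made for illustration. It only mirrors the pattern visible in the examples above, where a `string` parameter becomes an `i32 i32` pair and the default encoding is UTF-8.

```javascript
// Hypothetical stand-ins for the memory and "realloc" exports of $Libc.
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page
let nextFree = 8; // bump pointer; address 0 is left unused

// Assumed parameter order: (originalPtr, originalSize, alignment, newSize).
// Passing 0 for the first two parameters requests a fresh allocation.
function realloc(originalPtr, originalSize, alignment, newSize) {
  const ptr = Math.ceil(nextFree / alignment) * alignment;
  if (ptr + newSize > memory.buffer.byteLength) {
    throw new Error("out of linear memory");
  }
  nextFree = ptr + newSize;
  if (originalSize > 0) {
    // Reallocation: carry over the bytes that still fit.
    const heap = new Uint8Array(memory.buffer);
    heap.copyWithin(ptr, originalPtr, originalPtr + Math.min(originalSize, newSize));
  }
  return ptr;
}

// "Lower": copy a JS string into linear memory as UTF-8 and return
// the (pointer, byte length) pair that core code would receive.
function lowerString(str) {
  const bytes = new TextEncoder().encode(str);
  const ptr = realloc(0, 0, 1, bytes.length);
  new Uint8Array(memory.buffer, ptr, bytes.length).set(bytes);
  return [ptr, bytes.length];
}

// "Lift": read a (pointer, byte length) pair back out as a JS string.
function liftString(ptr, byteLength) {
  const bytes = new Uint8Array(memory.buffer, ptr, byteLength);
  return new TextDecoder("utf-8").decode(bytes);
}
```

A round trip such as `liftString(...lowerString("héllo"))` returns the original string; the byte length is 6 rather than 5 because `é` takes two bytes in UTF-8.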
### Import and Export Definitions -The rules for [`import`](#type-definitions) and [`export`](#instance-definitions) -definitions have actually already been defined above (with the caveat that the -real text format for `import` definitions would additionally allow binding an -identifier (e.g., adding the `$foo` in `(import "foo" (func $foo))`): +Lastly, imports and exports are defined in terms of the above as: ``` -import ::= already defined above as part of -export ::= already defined above as part of +import ::= (import ) +export ::= (export ) ``` +All import and export names within a component must be unique, respectively. -With what's defined so far, we can define a component that imports, links and +With what's defined so far, we can write a component that imports, links and exports other components: ```wasm (component @@ -684,10 +779,10 @@ exports other components: )) (export "g" (func (result string))) )) - (instance $d1 (instantiate (component $D) + (instance $d1 (instantiate $D (with "c" (instance $c)) )) - (instance $d2 (instantiate (component $D) + (instance $d2 (instantiate $D (with "c" (instance (export "f" (func $d1 "g")) )) @@ -706,11 +801,11 @@ note that all definitions are acyclic as is the resulting instance graph. As a consequence of the shared-nothing design described above, all calls into or out of a component instance necessarily transit through a component function definition. Thus, component functions form a "membrane" around the collection -of module instances contained by a component instance, allowing the Component -Model to establish invariants that increase optimizability and composability in -ways not otherwise possible in the shared-everything setting of Core -WebAssembly. 
The Component Model proposes establishing the following three -runtime invariants: +of core module instances contained by a component instance, allowing the +Component Model to establish invariants that increase optimizability and +composability in ways not otherwise possible in the shared-everything setting +of Core WebAssembly. The Component Model proposes establishing the following +three runtime invariants: 1. Components define a "lockdown" state that prevents continued execution after a trap. This both prevents continued execution with corrupt state and also allows more-aggressive compiler optimizations (e.g., store reordering). @@ -754,8 +849,8 @@ these same JS API functions to accept component binaries and produce new `WebAssembly.Component` objects that represent decoded and validated components. The [binary format of components](Binary.md) is designed to allow modules and components to be distinguished by the first 8 bytes of the binary -(splitting the 32-bit [`version`] field into a 16-bit `version` field and a -16-bit `kind` field with `0` for modules and `1` for components). +(splitting the 32-bit [`core:version`] field into a 16-bit `version` field and +a 16-bit `layer` field with `0` for modules and `1` for components). Once compiled, a `WebAssembly.Component` could be instantiated using the existing JS API `WebAssembly.instantiate(Streaming)`. Since components have the @@ -768,7 +863,7 @@ instantiated module, `WebAssembly.instantiate` would always produce a Lastly, when given a component binary, the compile-then-instantiate overloads of `WebAssembly.instantiate(Streaming)` would inherit the compound behavior of -the abovementioned functions (again, using the `layer` field to eagerly distinguish between modules and components). 
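The 8-byte check described above can be sketched in JavaScript as follows. The function name `readPreamble` is hypothetical, and the layout is an assumption: the sketch takes the 16-bit `version` to be the low half and the 16-bit `layer` to be the high half of the former little-endian 32-bit field, which is at least consistent with existing core modules (version bytes `01 00 00 00`) having layer `0`.

```javascript
// The four magic bytes "\0asm" that begin every WebAssembly binary.
const MAGIC = [0x00, 0x61, 0x73, 0x6d];

// Classify a binary (a Uint8Array) by its first 8 bytes.
function readPreamble(bytes) {
  if (bytes.length < 8) {
    throw new Error("binary too short to contain a preamble");
  }
  for (let i = 0; i < MAGIC.length; i++) {
    if (bytes[i] !== MAGIC[i]) throw new Error("bad magic number");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const version = view.getUint16(4, true); // assumed: low 16 bits, little-endian
  const layer = view.getUint16(6, true);   // assumed: high 16 bits, little-endian
  const kind = layer === 0 ? "module" : layer === 1 ? "component" : "unknown";
  return { version, layer, kind };
}
```

Under these assumptions, a core module preamble `00 61 73 6d 01 00 00 00` is classified as `"module"`, and any preamble whose last two bytes are `01 00` is classified as `"component"`.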
For example, the following component: @@ -779,7 +874,7 @@ For example, the following component: (import "two" (value string)) (import "three" (instance (export "four" (instance - (export "five" (module + (export "five" (core module (import "six" "a" (func)) (import "six" "b" (func)) )) @@ -812,11 +907,11 @@ WebAssembly.instantiateStreaming(fetch('./a.wasm'), { The other significant addition to the JS API would be the expansion of the set of WebAssembly types coerced to and from JavaScript values (by [`ToJSValue`] -and [`ToWebAssemblyValue`]) to include all of [`intertype`](#type-definitions). +and [`ToWebAssemblyValue`]) to include all of [`valtype`](#type-definitions). At a high level, the additional coercions would be: -| Interface Type | `ToJSValue` | `ToWebAssemblyValue` | -| -------------- | ----------- | -------------------- | +| Type | `ToJSValue` | `ToWebAssemblyValue` | +| ---- | ----------- | -------------------- | | `unit` | `null` | accept everything | | `bool` | `true` or `false` | `ToBoolean` | | `s8`, `s16`, `s32` | as a Number value | `ToInt32` | @@ -852,8 +947,8 @@ Notes: ### ESM-integration -Like the JS API, [ESM-integration] can be extended to load components in all -the same places where modules can be loaded today, branching on the `kind` +Like the JS API, [esm-integration] can be extended to load components in all +the same places where modules can be loaded today, branching on the `layer` field in the binary format to determine whether to decode as a module or a component. 
The main question is how to deal with component imports having a single string as well as the new importable component, module and instance @@ -927,20 +1022,21 @@ and will be added over the coming months to complete the MVP proposal: [Structure Section]: https://webassembly.github.io/spec/core/syntax/index.html -[`core:module`]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-module -[`core:export`]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-export -[`core:import`]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-import -[`core:importdesc`]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-importdesc -[`core:functype`]: https://webassembly.github.io/spec/core/syntax/types.html#syntax-functype -[`core:valtype`]: https://webassembly.github.io/spec/core/syntax/types.html#value-types - [Text Format Section]: https://webassembly.github.io/spec/core/text/index.html +[Binary Format Section]: https://webassembly.github.io/spec/core/binary/index.html + +[Index Space]: https://webassembly.github.io/spec/core/syntax/modules.html#indices [Abbreviations]: https://webassembly.github.io/spec/core/text/conventions.html#abbreviations + +[`core:module`]: https://webassembly.github.io/spec/core/text/modules.html#text-module +[`core:type`]: https://webassembly.github.io/spec/core/text/modules.html#types +[`core:importdesc`]: https://webassembly.github.io/spec/core/text/modules.html#text-importdesc +[`core:externtype`]: https://webassembly.github.io/spec/core/syntax/types.html#external-types +[`core:valtype`]: https://webassembly.github.io/spec/core/text/types.html#value-types [`core:typeuse`]: https://webassembly.github.io/spec/core/text/modules.html#type-uses +[`core:functype`]: https://webassembly.github.io/spec/core/text/types.html#function-types [func-import-abbrev]: https://webassembly.github.io/spec/core/text/modules.html#text-func-abbrev - -[Binary Format Section]: 
https://webassembly.github.io/spec/core/binary/index.html -[`version`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-version +[`core:version`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-version [JS API]: https://webassembly.github.io/spec/js-api/index.html [*read the imports*]: https://webassembly.github.io/spec/js-api/index.html#read-the-imports @@ -958,7 +1054,6 @@ and will be added over the coming months to complete the MVP proposal: [Module Specifier]: https://tc39.es/ecma262/multipage/ecmascript-language-scripts-and-modules.html#prod-ModuleSpecifier [Named Imports]: https://tc39.es/ecma262/multipage/ecmascript-language-scripts-and-modules.html#prod-NamedImports [Imported Default Binding]: https://tc39.es/ecma262/multipage/ecmascript-language-scripts-and-modules.html#prod-ImportedDefaultBinding - [JS Tuple]: https://github.com/tc39/proposal-record-tuple [JS Record]: https://github.com/tc39/proposal-record-tuple @@ -974,16 +1069,19 @@ and will be added over the coming months to complete the MVP proposal: [Sequences]: https://en.wikipedia.org/wiki/Sequence [ABI]: https://en.wikipedia.org/wiki/Application_binary_interface [Environment Variables]: https://en.wikipedia.org/wiki/Environment_variable +[Linear]: https://en.wikipedia.org/wiki/Substructural_type_system#Linear_type_systems -[Module Linking]: https://github.com/WebAssembly/module-linking/blob/main/design/proposals/module-linking/Explainer.md -[Interface Types]: https://github.com/WebAssembly/interface-types/blob/main/proposals/interface-types/Explainer.md -[Type Imports and Exports]: https://github.com/WebAssembly/proposal-type-imports/blob/master/proposals/type-imports/Overview.md -[Exception Handling]: https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md -[Stack Switching]: https://github.com/WebAssembly/stack-switching/blob/main/proposals/stack-switching/Overview.md -[ESM-integration]: 
https://github.com/WebAssembly/esm-integration/tree/main/proposals/esm-integration +[module-linking]: https://github.com/WebAssembly/module-linking/blob/main/design/proposals/module-linking/Explainer.md +[interface-types]: https://github.com/WebAssembly/interface-types/blob/main/proposals/interface-types/Explainer.md +[type-imports]: https://github.com/WebAssembly/proposal-type-imports/blob/master/proposals/type-imports/Overview.md +[exception-handling]: https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md +[stack-switching]: https://github.com/WebAssembly/stack-switching/blob/main/proposals/stack-switching/Overview.md +[esm-integration]: https://github.com/WebAssembly/esm-integration/tree/main/proposals/esm-integration +[gc]: https://github.com/WebAssembly/gc/blob/main/proposals/gc/MVP.md [Adapter Functions]: FutureFeatures.md#custom-abis-via-adapter-functions [Canonical ABI]: CanonicalABI.md +[Shared-Nothing]: ../high-level/Choices.md [`wizer`]: https://github.com/bytecodealliance/wizer diff --git a/design/mvp/FutureFeatures.md b/design/mvp/FutureFeatures.md index cf986b6..360a77e 100644 --- a/design/mvp/FutureFeatures.md +++ b/design/mvp/FutureFeatures.md @@ -15,23 +15,22 @@ serialization format, as this often incurs extra copying when the source or destination language-runtime data structures don't precisely match the fixed serialization format. A significant amount of work was spent designing a language of [adapter functions] that provided fairly general programmatic -control over the process of serializing and deserializing interface-typed values. +control over the process of serializing and deserializing high-level values. (The Interface Types Explainer currently contains a snapshot of this design.) However, a significant amount of additional design work remained, including (likely) changing the underlying semantic foundations from lazy evaluation to algebraic effects. 
-In pursuit of a timely MVP and as part of the overall [scoping and layering proposal], -the goal of avoiding a fixed serialization format was dropped from the MVP, by -instead defining a [Canonical ABI](CanonicalABI.md) in the MVP. However, the -current design of [function definitions](Explainer.md#function-definitions) -anticipates a future extension whereby function bodies can contain not just the -fixed Canonical ABI-following `canon.lift` and `canon.lower` but, -alternatively, general adapter function code. +In pursuit of a timely MVP and as part of the overall [scoping and layering +proposal], the goal of avoiding a fixed serialization format was dropped from +the MVP by instead defining a [Canonical ABI](CanonicalABI.md) in the MVP. +However, the current design anticipates a future extension whereby lifting and +lowering functions can be generated not just from `canon lift` and `canon +lower`, but, alternatively, general-purpose serialization/deserialization code. -In this future state, `canon.lift` and `canon.lower` could be specified by -simple expansion into the adapter code, making these instructions effectively -macros. However, even in this future state, there is still concrete value in +In this future state, `canon lift` and `canon lower` could be specified by +simple expansion into the general-purpose code, making these instructions +effectively macros. However, even in this future state, there is still value in having a fixedly-defined Canonical ABI as it allows more-aggressive optimization of calls between components (which both use the Canonical ABI) and between a component and the host (which often must use a fixed ABI for calling @@ -53,8 +52,8 @@ Additionally, having two similar-but-different, partially-overlapping concepts makes the whole proposal harder to explain. Thus, the MVP drops the concept of "adapter modules", including only shared-nothing "components". 
However, if concrete future use cases emerged for creating modules that partially used -interface types and partially shared linear memory, "adapter modules" could be -added as a future feature. +shared-nothing component values and partially shared linear memory, "adapter +modules" could be added as a future feature. ## Shared-everything Module Linking in Core WebAssembly diff --git a/design/mvp/Subtyping.md b/design/mvp/Subtyping.md index 608dc08..7114f05 100644 --- a/design/mvp/Subtyping.md +++ b/design/mvp/Subtyping.md @@ -6,7 +6,7 @@ But roughly speaking: | Type | Subtyping | | ------------------------- | --------- | -| `unit` | every interface type is a subtype of `unit` | +| `unit` | every value type is a subtype of `unit` | | `bool` | | | `s8`, `s16`, `s32`, `s64`, `u8`, `u16`, `u32`, `u64` | lossless coercions are allowed | | `float32`, `float64` | `float32 <: float64` | @@ -20,5 +20,5 @@ But roughly speaking: | `union` | `T <: (union ... T ...)` | | `func` | parameter names must match in order; contravariant parameter subtyping; superfluous parameters can be ignored in the subtype; `option` parameters can be ignored in the supertype; covariant result subtyping | -The remaining specialized interface types inherit their subtyping from their -fundamental interface types. +The remaining specialized value types inherit their subtyping from their +fundamental value types. 
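Reviewer note on the Subtyping.md hunk above: the integer, float and `unit` rows of the table can be sketched as a small check. This is only an illustration, not part of `definitions.py`. It assumes that "lossless coercions are allowed" means the source integer range fits entirely inside the destination range; the names `INT_RANGES`, `is_lossless_int_coercion` and `is_subtype` are hypothetical, and the rows for compound types (`record`, `variant`, `func`, ...) are deliberately not modelled.

```python
# Hypothetical sketch of three rows of the subtyping table; not the spec's definition.
# Assumption: an integer coercion is "lossless" when every source value is
# representable in the destination type (range containment).

INT_RANGES = {
    "s8": (-2**7, 2**7 - 1),    "u8": (0, 2**8 - 1),
    "s16": (-2**15, 2**15 - 1), "u16": (0, 2**16 - 1),
    "s32": (-2**31, 2**31 - 1), "u32": (0, 2**32 - 1),
    "s64": (-2**63, 2**63 - 1), "u64": (0, 2**64 - 1),
}

def is_lossless_int_coercion(src: str, dst: str) -> bool:
    """True when the whole range of `src` fits inside the range of `dst`."""
    src_lo, src_hi = INT_RANGES[src]
    dst_lo, dst_hi = INT_RANGES[dst]
    return dst_lo <= src_lo and src_hi <= dst_hi

def is_subtype(sub: str, sup: str) -> bool:
    """Covers only the `unit`, integer and float rows of the table."""
    if sup == "unit":                 # every value type is a subtype of `unit`
        return True
    if sub == sup:                    # reflexivity
        return True
    if sub in INT_RANGES and sup in INT_RANGES:
        return is_lossless_int_coercion(sub, sup)
    return (sub, sup) == ("float32", "float64")   # float32 <: float64

if __name__ == "__main__":
    print(is_subtype("u8", "s16"))    # u8 range fits inside s16
    print(is_subtype("s8", "u16"))    # negative s8 values do not fit in u16
```

Under this reading, widening an unsigned type into a larger signed type is accepted, while any signed-to-unsigned coercion is rejected because negative values have no representation in the target.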
diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 949ae02..183ed04 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -19,74 +19,74 @@ def trap_if(cond): if cond: raise Trap() -class InterfaceType: pass -class Unit(InterfaceType): pass -class Bool(InterfaceType): pass -class S8(InterfaceType): pass -class U8(InterfaceType): pass -class S16(InterfaceType): pass -class U16(InterfaceType): pass -class S32(InterfaceType): pass -class U32(InterfaceType): pass -class S64(InterfaceType): pass -class U64(InterfaceType): pass -class Float32(InterfaceType): pass -class Float64(InterfaceType): pass -class Char(InterfaceType): pass -class String(InterfaceType): pass +class ValType: pass +class Unit(ValType): pass +class Bool(ValType): pass +class S8(ValType): pass +class U8(ValType): pass +class S16(ValType): pass +class U16(ValType): pass +class S32(ValType): pass +class U32(ValType): pass +class S64(ValType): pass +class U64(ValType): pass +class Float32(ValType): pass +class Float64(ValType): pass +class Char(ValType): pass +class String(ValType): pass @dataclass -class List(InterfaceType): - t: InterfaceType +class List(ValType): + t: ValType @dataclass class Field: label: str - t: InterfaceType + t: ValType @dataclass -class Record(InterfaceType): +class Record(ValType): fields: [Field] @dataclass -class Tuple(InterfaceType): - ts: [InterfaceType] +class Tuple(ValType): + ts: [ValType] @dataclass -class Flags(InterfaceType): +class Flags(ValType): labels: [str] @dataclass class Case: label: str - t: InterfaceType + t: ValType refines: str = None @dataclass -class Variant(InterfaceType): +class Variant(ValType): cases: [Case] @dataclass -class Enum(InterfaceType): +class Enum(ValType): labels: [str] @dataclass -class Union(InterfaceType): - ts: [InterfaceType] +class Union(ValType): + ts: [ValType] @dataclass -class Option(InterfaceType): - t: InterfaceType +class 
Option(ValType): + t: ValType @dataclass -class Expected(InterfaceType): - ok: InterfaceType - error: InterfaceType +class Expected(ValType): + ok: ValType + error: ValType @dataclass class Func: - params: [InterfaceType] - result: InterfaceType + params: [ValType] + result: ValType ### Despecialization @@ -603,9 +603,9 @@ def flatten(functype, context): flat_results = flatten_type(functype.result) if len(flat_results) > MAX_FLAT_RESULTS: match context: - case 'canon.lift': + case 'lift': flat_results = ['i32'] - case 'canon.lower': + case 'lower': flat_params += ['i32'] flat_results = [] @@ -869,7 +869,7 @@ def lower(opts, max_flat, vs, ts, out_param = None): flat_vals += lower_flat(opts, vs[i], ts[i]) return flat_vals -### `canon.lift` +### `lift` class Instance: may_leave = True @@ -898,7 +898,7 @@ def post_return(): return (result, post_return) -### `canon.lower` +### `lower` def canon_lower(caller_opts, caller_instance, callee, functype, flat_args): trap_if(not caller_instance.may_leave) diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index 9e6bb0c..8f270bd 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -312,13 +312,13 @@ def test_flatten(t, params, results): if len(results) > definitions.MAX_FLAT_RESULTS: expect['results'] = ['i32'] - got = flatten(t, 'canon.lift') + got = flatten(t, 'lift') assert(got == expect) if len(results) > definitions.MAX_FLAT_RESULTS: expect['params'] += ['i32'] expect['results'] = [] - got = flatten(t, 'canon.lower') + got = flatten(t, 'lower') assert(got == expect) test_flatten(Func([U8(),Float32(),Float64()],Unit()), ['i32','f32','f64'], []) diff --git a/design/mvp/examples/SharedEverythingDynamicLinking.md b/design/mvp/examples/SharedEverythingDynamicLinking.md index 0957faa..30f7590 100644 --- a/design/mvp/examples/SharedEverythingDynamicLinking.md +++ b/design/mvp/examples/SharedEverythingDynamicLinking.md @@ -157,10 +157,10 @@ would 
look like: (with "libc" (instance $libc)) (with "libzip" (instance $libzip)) )) - (func (export "zip") (canon.lift + (func (export "zip") (canon lift + (func $main "zip") (func (param (list u8)) (result (list u8))) (memory (memory $libc "memory")) (realloc (func $libc "realloc")) - (func $main "zip") )) ) ``` @@ -236,10 +236,10 @@ component-aware `clang`, the resulting component would look like: (with "libc" (instance $libc)) (with "libimg" (instance $libimg)) )) - (func (export "transform") (canon.lift + (func (export "transform") (canon lift + (func $main "transform") (func (param (list u8)) (result (list u8))) (memory (memory $libc "memory")) (realloc (func $libc "realloc")) - (func $main "transform") )) ) ``` @@ -283,23 +283,23 @@ components. The resulting component could look like: )) (instance $libc (instantiate (module $Libc))) - (func $zip (canon.lower - (memory (memory $libc "memory")) (realloc (func $libc "realloc")) + (func $zip (canon lower (func $zipper "zip") - )) - (func $transform (canon.lower (memory (memory $libc "memory")) (realloc (func $libc "realloc")) + )) + (func $transform (canon lower (func $imgmgk "transform") + (memory (memory $libc "memory")) (realloc (func $libc "realloc")) )) (instance $main (instantiate (module $Main) (with "libc" (instance $libc)) (with "zipper" (instance (export "zip" (func $zipper "zip")))) (with "imgmgk" (instance (export "transform" (func $imgmgk "transform")))) )) - (func (export "run") (canon.lift + (func (export "run") (canon lift + (func $main "run") (func (param string) (result string)) (memory (memory $libc "memory")) (realloc (func $libc "realloc")) - (func $main "run") )) ) ``` From 6e78729e14b0f7cbf1bf83670515de87fde1b518 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 3 May 2022 13:11:05 -0500 Subject: [PATCH 046/301] Restore value type binary encoding, refactor type grammar slightly --- design/mvp/Binary.md | 51 +++++++++---------- design/mvp/Explainer.md | 106 
++++++++++++++++++++-------------------- 2 files changed, 78 insertions(+), 79 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index a37f4c5..7da2ee9 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -73,7 +73,7 @@ instanceexpr ::= 0x00 c: arg*:vec() => (i | 0x01 e*:vec() => e* instantiatearg ::= n: si: => (with n si) sortidx ::= sort: idx: => (sort idx) -sort ::= 0x00 si: => si +sort ::= 0x00 => core module | 0x01 => func | 0x02 => value | 0x03 => type @@ -150,30 +150,12 @@ Notes: ``` type ::= dt: => (type dt) -deftype ::= vt: => vt +deftype ::= dvt: => dvt | ft: => ft + | tt: => tt | ct: => ct | it: => it -functype ::= 0x40 param*:vec() t: => (func param* (result t)) -param ::= 0x00 t: => (param t) - | 0x01 n: t: => (param n t) -componenttype ::= 0x41 cd*:vec() => (component cd*) -instancetype ::= 0x42 id*:vec() => (instance id*) -componentdecl ::= 0x00 id: => id - | id: => id -instancedecl ::= 0x01 t: => t - | 0x02 a: => a - | 0x03 ed: => ed -importdecl ::= n: ed: => (import n ed) -exportdecl ::= n: ed: => (export n ed) -externdesc ::= 0x00 i: => core-type-index-space[i] (must be moduletype) - | 0x01 i: => type-index-space[i] (must be func|instance|componenttype) - | 0x02 t: => (value t) - | 0x03 tb: => (type tb) -typebound ::= 0x00 i: => (eq type-index-space[i]) (any deftype) - | 0x00 t: => (eq t) -valtype ::= i: => type-index-space[i] (must be valtype) - | 0x7f => unit +primvaltype ::= 0x7f => unit | 0x7e => bool | 0x7d => s8 | 0x7c => u8 @@ -187,6 +169,7 @@ valtype ::= i: => type-index-space[i] ( | 0x74 => float64 | 0x73 => char | 0x72 => string +defvaltype ::= pvt: => pvt | 0x71 field*:vec() => (record field*) | 0x70 case*:vec() => (variant case*) | 0x6f t: => (list t) @@ -196,9 +179,27 @@ valtype ::= i: => type-index-space[i] ( | 0x6b t*:vec() => (union t*) | 0x6a t: => (option t) | 0x69 t: u: => (expected t u) +valtype ::= i: => type-index-space[i] (must be defvaltype) + | pit: => pit field ::= n: t: => (field n t) case 
::= n: t: 0x0 => (case n t) | n: t: 0x1 i: => (case n t (refines case-label[i])) +typetype ::= tb: (type tb) +typebound ::= 0x00 i: => (eq type-index-space[i]) +functype ::= 0x40 param*:vec() t: => (func param* (result t)) +param ::= 0x00 t: => (param t) + | 0x01 n: t: => (param n t) +componenttype ::= 0x41 cd*:vec() => (component cd*) +instancetype ::= 0x42 id*:vec() => (instance id*) +componentdecl ::= 0x00 id: => id + | id: => id +instancedecl ::= 0x01 t: => t + | 0x02 a: => a + | 0x03 ed: => ed +importdecl ::= n: et: => (import n et) +exportdecl ::= n: et: => (export n et) +externtype ::= 0x00 i: => (core module core-type-index-space[i]) + | sort: i: => (sort type-index-space[i]) (sort must match type) ``` Notes: * The type opcodes follow the same negative-SLEB128 scheme as Core WebAssembly, @@ -209,10 +210,6 @@ Notes: * As described in the explainer, each component and instance type is validated with an initially-empty type index space. Outer aliases can be used to pull in type definitions from containing components. -* The rule for `typebound` contains both an unrestricted `` case and, - within `valtype`, a `valtype`-restricted `` case. Since the former - is a strict generalization of the latter, there is no ambiguity. The net - effect is that `eq` accepts all types. ## Canonical Definitions @@ -273,7 +270,7 @@ flags are set. (See [Import and Export Definitions](Explainer.md#import-and-export-definitions) in the explainer.) 
``` -import ::= n: ed: => (import n ed) +import ::= n: et: => (import n et) export ::= n: si: => (export n si) ``` Notes: diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 6eb1df9..5994aba 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -190,7 +190,7 @@ instanceexpr ::= (instantiate *) instantiatearg ::= (with ) | (with (instance *)) sortidx ::= ( ) -sort ::= core-prefix() +sort ::= core module | func | value | type @@ -201,11 +201,14 @@ export ::= (export ) Because component-level function, type and instance definitions are different than core-level function, type and instance definitions, they are put into disjoint index spaces which are indexed separately by `sortidx` and -`core:sortidx`, respectively. Components may import or export core modules -(since core modules are immutable values and thus do not break the -[shared-nothing] model) and so `sortidx` includes `core:sortidx` (which -validation then restricts to core modules; in the future, other immutable core -definitions could be allowed, such as `data` segments). +`core:sortidx`, respectively. Components may also import or export core modules +since core modules are immutable values and thus do not break the +[shared-nothing] model. In the future, other immutable core sorts could be +added to this list such as, if it was made importable/exportable, `data`. + +The `value` sort refers to a value that is provided and consumed during +instantiation. How this works is described in the +[start definitions](#start-definitions) section. To see a non-trivial example of component instantiation, we'll first need to introduce a few other definitions below that allow components to import, define @@ -405,26 +408,11 @@ therefore be high-level, describing entire compound values. ``` type ::= (type ? ) deftype ::= - | - | - | -functype ::= (func ? (param ? )* (result )) -componenttype ::= (component ? *) -instancetype ::= (instance ? 
*) -componentdecl ::= - | -instancedecl ::= - | - | -importdecl ::= (import ) -exportdecl ::= (export ) -externdesc ::= core-prefix() - | + | +nonvaltype ::= + | | | - | (value ? ) - | (type ? ) -typebound ::= (eq ) valtype ::= unit | bool | s8 | u8 | s16 | u16 | s32 | u32 | s64 | u64 @@ -439,10 +427,25 @@ valtype ::= unit | (union +) | (option ) | (expected ) +functype ::= (func ? (param ? )* (result )) +typetype ::= (type ? ) +typebound ::= (eq ) +componenttype ::= (component ? *) +instancetype ::= (instance ? *) +componentdecl ::= + | +instancedecl ::= + | + | +importdecl ::= (import ) +exportdecl ::= (export ) +externtype ::= core-prefix() + | (value ? ) + | ``` -This type grammar uses productions like `` and `` recursively -to allow it to more-precisely indicate what's allowed. The formal AST and -[binary format](Binary.md#type-definitions) instead use a `` with +This grammar defines `type` recursively to allow it to more-precisely indicate +what's allowed at each point in the recursion. The formal AST and +[binary format](Binary.md#type-definitions) would instead use a `typeidx` with validation rules to restrict the target type while the formal text format would use something like [`core:typeuse`], allowing any of: (1) a `typeidx`, (2) an identifier `$T` resolving to a type definition (using `(type $T)` in cases @@ -504,6 +507,21 @@ unreachability. The remaining 5 type constructors use `valtype` to complete the description of a shared-nothing component interface: +The `type` type-constructor describes an imported or exported type along with +its bounds, which currently only has an `eq` option that says that the +imported/exported type must be exactly equal to the given immediate type. 
There +are two main use cases for this in the short-term: +* Type exports allow a component or interface to associate a name with a + structural type (e.g., `(export "nanos" (type (eq u64)))`) which bindings + generators can use to generate type aliases (e.g., `typedef uint64_t nanos;`). +* Type imports and exports allow a component to explicitly specify the + type parameters used to monomorphize a generic interface being imported + or exported. + +When [resource and handle types] are added to the explainer, `typebound` will +be extended with a `sub` option (symmetric to the [type-imports] proposal) that +allows importing and exporting *abstract* types. + The `func` type constructor describes a component-level function definition that takes and returns component-level value types. In contrast to [`core:functype`] which, as a low-level compiler target for a stack machine, @@ -517,31 +535,15 @@ interpreting this as `(result unit)`. The `component` type constructor is symmetric to the core `module` type constructor, although its grammar is factored to share declarators with the `instance` type constructor. The `import` and `export` declarator names -must be distinct within a single type. - -The `externdesc` production (used to declare the types of imported/exported -values) includes two additional type constructors that are not currently -present in `deftype` (since there is currently no reason for allowing them to -be shared or named as type definitions): +must be distinct within a single type. The `externtype` production shared by +the `import` and `export` declarators is symmetric to [`core:externtype`] and +includes all importable/exportable types. -The `value` case describes an imported or exported `valtype` value that is to -be consumed exactly once during instantiation. How this happens is described -below along with [`start` definitions](#start-definitions). 
- -The `type` case describes an imported or exported type along with its bounds, -which currently only has an `eq` option that says that the imported/exported -type must be exactly equal to the given immediate type. There are two main use -cases for this in the short-term: -* Type exports allow a component or interface to associate a name with a - structural type (e.g., `(export "nanos" (type (eq u64)))`) which bindings - generators can use to generate type aliases (e.g., `typedef uint64_t nanos;`). -* Type imports and exports allow a component to explicitly specify the - type parameters used to monomorphize a generic interface being imported - or exported. - -When [resource and handle types] are added to the explainer, `typebound` will -be extended with a `sub` option (symmetric to the [type-imports] proposal) that -allows importing and exporting *abstract* types. +The family of value types, `valtype`, is unified by a *single* type +constructor, `value`, that corresponds 1:1 with the `value` sort (described in +the [start definitions](#start-definitions) section below). As a type +constructor, `value` is symmetric to `global` in Core WebAssembly, but without +a mutability option. With what's defined so far, we can define component types using a mix of inline and out-of-line type definitions: @@ -761,7 +763,7 @@ of core linear memory. Lastly, imports and exports are defined in terms of the above as: ``` -import ::= (import ) +import ::= (import ) export ::= (export ) ``` All import and export names within a component must be unique, respectively. 
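Reviewer note on the Binary.md hunks in the patch above: the notes say the type opcodes follow the same negative-SLEB128 scheme as Core WebAssembly. The sketch below shows what that means for the opcodes visible in these hunks: each is a single byte that decodes, as a signed LEB128 integer, to a small negative number. The helper names (`decode_sleb128`, `sleb_view`, `SHOWN_OPCODES`) are hypothetical, and the table lists only opcodes that appear in the hunks; rows hidden between hunks are intentionally left out rather than guessed.

```python
# Hypothetical helper, not part of the patch: decodes one signed LEB128 integer.
def decode_sleb128(data: bytes, offset: int = 0):
    """Return (value, next_offset) for the SLEB128 integer starting at `offset`."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        shift += 7
        if byte & 0x80 == 0:               # high bit clear: this was the last byte
            if byte & 0x40:                # bit 6 of the last byte is the sign bit
                result -= 1 << shift       # sign-extend
            return result, offset

# Only opcodes that are visible in the hunks above.
SHOWN_OPCODES = {
    0x7F: "unit", 0x7E: "bool", 0x7D: "s8", 0x7C: "u8",
    0x74: "float64", 0x73: "char", 0x72: "string",
    0x71: "record", 0x70: "variant", 0x6F: "list",
    0x6B: "union", 0x6A: "option", 0x69: "expected",
    0x40: "func", 0x41: "component", 0x42: "instance",
}

def sleb_view(opcode: int) -> int:
    """Interpret a one-byte opcode as a signed LEB128 value."""
    value, _ = decode_sleb128(bytes([opcode]))
    return value

if __name__ == "__main__":
    for opcode, name in SHOWN_OPCODES.items():
        print(f"{opcode:#04x} {name:10s} -> {sleb_view(opcode)}")
```

Read this way, `0x7f` (`unit`) is -1 and `0x40` (`func`) is -64, which is why a decoder can tell an opcode apart from a non-negative type index when both are read as signed LEB128.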
From 24975fc0d758beb0179b8489c6fb80aee44982e6 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 3 May 2022 18:42:34 -0500 Subject: [PATCH 047/301] Remove 'outer' option from core:alias --- design/mvp/Binary.md | 4 --- design/mvp/Explainer.md | 55 +++++++++++++++++++++-------------------- 2 files changed, 28 insertions(+), 31 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 7da2ee9..66a2189 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -102,7 +102,6 @@ Notes: ``` core:alias ::= sort: target: => (core alias target (sort)) core:aliastarget ::= 0x00 i: n: => export i n - | 0x01 ct: idx: => outer ct idx alias ::= sort: target: => (alias target (sort)) aliastarget ::= 0x00 i: n: => export i n @@ -131,7 +130,6 @@ core:deftype ::= ft: => ft ( core:moduletype ::= 0x50 md*:vec() => (module md*) core:moduledecl ::= 0x00 i: => i | 0x01 t: => t - | 0x02 a: => a | 0x03 e: => e core:import ::= m: f: ed: => (import m f ed) (WebAssembly 1.0) core:externdesc ::= id: => id (WebAssembly 1.0) @@ -142,8 +140,6 @@ Notes: * `core:import` as written above is binary-compatible with [`core:import`]. * Validation of `core:moduledecl` (currently) rejects `core:moduletype` definitions inside `type` declarators (i.e., nested core module types). -* Validation of `core:moduledecl` (currently) only allows `outer` `type` - `alias` declarators. * As described in the explainer, each module type is validated with an initially-empty type index space. Outer aliases can be used to pull in type definitions from containing components. diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 5994aba..3b01731 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -225,7 +225,6 @@ component): ``` core:alias ::= (alias ( ?)) core:aliastarget ::= export - | outer alias ::= (alias ( ?)) aliastarget ::= export @@ -248,6 +247,11 @@ alias refers to the current component. 
To maintain the acyclicity of module instantiation, outer aliases are only allowed to refer to *preceding* outer definitions. +There is no `outer` option in `core:aliastarget` because it would only be able +to refer to enclosing *core* modules and module types and, until +module-linking, modules and module types can't nest. In a module-linking +future, outer aliases would be added, making `core:alias` symmetric to `alias`. + Components containing outer aliases effectively produce a [closure] at instantiation time, including a copy of the outer-aliased definitions. Because of the prevalent assumption that components are immutable values, outer aliases @@ -350,7 +354,6 @@ core:deftype ::= (WebAssembly 1.0) core:moduletype ::= (module ? *) core:moduledecl ::= | - | | core:importdecl ::= (import ) core:exportdecl ::= (export ) @@ -374,31 +377,7 @@ In preparation for the forthcoming addition of [type-imports] to Core WebAssembly, module types start with an empty type index space so that the type index space can be populated with fresh type definitions constructed from type imports. Thus, `core:moduledecl` also includes a `type` declarator for defining -the types used by the `import` and `export` declarators. An `alias` declarator -is also necessary in the future for defining type-sharing constraints between -type imports. In the short-term, `alias` declarators are restricted to only -allowing `outer` `type` aliases, thereby enabling a module type to reuse a -parent's type definition instead of re-defining it locally. - -As an example, the following component defines two equivalent module types, -where the former defines the function via `type` declarator and the latter via -`alias` declarator. In both cases, the type is given index `0` since the module -type starts with an empty type index space. 
-```wasm -(component $C - (core type $M1 (module - (type (func (param i32) (result i32))) - (import "a" "b" (func (type 0))) - (export "c" (func (type 0))) - )) - (core type $F (func (param i32) (result i32))) - (core type $M2 (module - (alias outer $C $F (type)) - (import "a" "b" (func (type 0))) - (export "c" (func (type 0))) - )) -) -``` +the types used by the `import` and `export` declarators. Component-level type definitions are symmetric to core-level type definitions, but use a completely different set of value types. Unlike [`core:valtype`] @@ -539,6 +518,28 @@ must be distinct within a single type. The `externtype` production shared by the `import` and `export` declarators is symmetric to [`core:externtype`] and includes all importable/exportable types. +Component and instance types also include an `alias` declarator for projecting +the exports out of imported instances and sharing types with outer components. +As an example, the following component defines two equivalent component types, +where the former defines the function type via `type` declarator and the latter +via `alias` declarator. In both cases, the type is given index `0` since +component types start with an empty type index space. +```wasm +(component $C + (type $C1 (component + (type (func (param string) (result string))) + (import "a" "b" (func (type 0))) + (export "c" (func (type 0))) + )) + (type $F (func (param string) (result string))) + (type $C2 (component + (alias outer $C $F (type)) + (import "a" "b" (func (type 0))) + (export "c" (func (type 0))) + )) +) +``` + The family of value types, `valtype`, is unified by a *single* type constructor, `value`, that corresponds 1:1 with the `value` sort (described in the [start definitions](#start-definitions) section below). 
As a type From 67608edac5aeeac6c9b198cb28365d778bce359d Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 4 May 2022 09:40:48 -0500 Subject: [PATCH 048/301] Tweak grammar to be more regular --- design/mvp/Binary.md | 4 ++-- design/mvp/Explainer.md | 15 ++++++++------- 2 files changed, 10 insertions(+), 9 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 66a2189..b75c187 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -73,7 +73,7 @@ instanceexpr ::= 0x00 c: arg*:vec() => (i | 0x01 e*:vec() => e* instantiatearg ::= n: si: => (with n si) sortidx ::= sort: idx: => (sort idx) -sort ::= 0x00 => core module +sort ::= 0x00 csi: => core csi | 0x01 => func | 0x02 => value | 0x03 => type @@ -194,7 +194,7 @@ instancedecl ::= 0x01 t: => t | 0x03 ed: => ed importdecl ::= n: et: => (import n et) exportdecl ::= n: et: => (export n et) -externtype ::= 0x00 i: => (core module core-type-index-space[i]) +externtype ::= 0x00 0x10 i: => (core module core-type-index-space[i]) (must be moduletype) | sort: i: => (sort type-index-space[i]) (sort must match type) ``` Notes: diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 3b01731..5b3c8f9 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -182,7 +182,7 @@ example of these, we'll also need the `alias` definitions introduced in the next section. The syntax for defining component instances is symmetric to core module -instances, but with a distinct component-level definition of `sort`: +instances, but with an expanded component-level definition of `sort`: ``` instance ::= (instance ? 
) instanceexpr ::= (instantiate *) @@ -190,7 +190,7 @@ instanceexpr ::= (instantiate *) instantiatearg ::= (with ) | (with (instance *)) sortidx ::= ( ) -sort ::= core module +sort ::= core-prefix() | func | value | type @@ -200,11 +200,12 @@ export ::= (export ) ``` Because component-level function, type and instance definitions are different than core-level function, type and instance definitions, they are put into -disjoint index spaces which are indexed separately by `sortidx` and -`core:sortidx`, respectively. Components may also import or export core modules -since core modules are immutable values and thus do not break the -[shared-nothing] model. In the future, other immutable core sorts could be -added to this list such as, if it was made importable/exportable, `data`. +disjoint index spaces which are indexed separately. Components may import +and export various core definitions (when they are compatible with the +[shared-nothing] model, which currently means only `module`, but may in the +future include `data`). Thus, component-level `sort` injects the full set +of `core:sort`, so that they may be referenced (leaving it up to validation +rules to throw out the core sorts that aren't allowed in various contexts). The `value` sort refers to a value that is provided and consumed during instantiation. 
How this works is described in the From 14ae2d05d42e27cd068d4392e2cc6ccab4ddb301 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 4 May 2022 10:31:14 -0500 Subject: [PATCH 049/301] Fix whitespace --- design/mvp/Explainer.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 5b3c8f9..a2e52e3 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -594,14 +594,14 @@ two directions: Canonical definitions specify one of these two wrapping directions, the function to wrap and a list of configuration options: ``` -canon ::= (canon lift core-prefix() * (func ?)) - | (canon lower * (core func ?)) -canonopt ::= string-encoding=utf8 - | string-encoding=utf16 - | string-encoding=latin1+utf16 - | (memory core-prefix()) - | (realloc core-prefix()) - | (post-return core-prefix()) +canon ::= (canon lift core-prefix() * (func ?)) + | (canon lower * (core func ?)) +canonopt ::= string-encoding=utf8 + | string-encoding=utf16 + | string-encoding=latin1+utf16 + | (memory core-prefix()) + | (realloc core-prefix()) + | (post-return core-prefix()) ``` The `string-encoding` option specifies the encoding the Canonical ABI will use for the `string` type. The `latin1+utf16` encoding captures a common string From 0d99c78b0da9596877f114ee59cf246ccf7b49ea Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 5 May 2022 13:25:26 -0500 Subject: [PATCH 050/301] Fix bug in outer alias example --- design/mvp/Explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index a2e52e3..4caa7b8 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -296,7 +296,7 @@ is desugared into: (component $C (core module $M ...) 
(component - (core alias outer $C $M (module $C_M)) + (alias outer $C $M (core module $C_M)) (core instance (instantiate $C_M)) ) ) From 18917081b558b3352f9037cc1c255d99841aa77e Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 5 May 2022 13:39:08 -0500 Subject: [PATCH 051/301] Fix bug in outer alias example (better) --- design/mvp/Explainer.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 4caa7b8..88764e2 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -285,19 +285,19 @@ definition, resolved using normal lexical scoping rules. For example, the following component: ```wasm (component - (core module $M ...) + (component $C ...) (component - (core instance (instantiate $M)) + (instance (instantiate $C)) ) ) ``` is desugared into: ```wasm -(component $C - (core module $M ...) +(component $Parent + (component $C ...) (component - (alias outer $C $M (core module $C_M)) - (core instance (instantiate $C_M)) + (alias outer $Parent $C (component $Parent_C)) + (instance (instantiate $Parent_C)) ) ) ``` From 9e510969a59d0b60c0bcc7e4dafad51496b1569a Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 5 May 2022 14:54:57 -0500 Subject: [PATCH 052/301] Fix thinko in definition of 'sort' --- design/mvp/Explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 88764e2..21bc9ae 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -190,7 +190,7 @@ instanceexpr ::= (instantiate *) instantiatearg ::= (with ) | (with (instance *)) sortidx ::= ( ) -sort ::= core-prefix() +sort ::= core-prefix() | func | value | type From a6e40d16cfc63c695227d6e3387e7e48597b4270 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 5 May 2022 14:57:02 -0500 Subject: [PATCH 053/301] ... 
and in Binary.md too --- design/mvp/Binary.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index b75c187..29ed7b2 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -73,7 +73,7 @@ instanceexpr ::= 0x00 c: arg*:vec() => (i | 0x01 e*:vec() => e* instantiatearg ::= n: si: => (with n si) sortidx ::= sort: idx: => (sort idx) -sort ::= 0x00 csi: => core csi +sort ::= 0x00 cs: => core cs | 0x01 => func | 0x02 => value | 0x03 => type From fce98d20916c265116ee1e8786cef82d271d1af4 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 10 May 2022 15:22:20 -0500 Subject: [PATCH 054/301] Remove ambiguous hand-waving from type grammar --- design/mvp/Binary.md | 31 +++++---- design/mvp/Explainer.md | 150 ++++++++++++++++++++-------------------- 2 files changed, 93 insertions(+), 88 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 29ed7b2..af9014c 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -131,13 +131,11 @@ core:moduletype ::= 0x50 md*:vec() => (module md*) core:moduledecl ::= 0x00 i: => i | 0x01 t: => t | 0x03 e: => e -core:import ::= m: f: ed: => (import m f ed) (WebAssembly 1.0) -core:externdesc ::= id: => id (WebAssembly 1.0) -core:exportdecl ::= n: ed: => (export n ed) +core:importdecl ::= i: => i +core:exportdecl ::= n: d: => (export n d) ``` Notes: -* Reused Core binary rules: [`core:importdesc`], [`core:functype`] -* `core:import` as written above is binary-compatible with [`core:import`]. +* Reused Core binary rules: [`core:import`], [`core:importdesc`], [`core:functype`] * Validation of `core:moduledecl` (currently) rejects `core:moduletype` definitions inside `type` declarators (i.e., nested core module types). 
* As described in the explainer, each module type is validated with an @@ -148,7 +146,6 @@ Notes: type ::= dt: => (type dt) deftype ::= dvt: => dvt | ft: => ft - | tt: => tt | ct: => ct | it: => it primvaltype ::= 0x7f => unit @@ -175,13 +172,11 @@ defvaltype ::= pvt: => pvt | 0x6b t*:vec() => (union t*) | 0x6a t: => (option t) | 0x69 t: u: => (expected t u) -valtype ::= i: => type-index-space[i] (must be defvaltype) - | pit: => pit field ::= n: t: => (field n t) case ::= n: t: 0x0 => (case n t) | n: t: 0x1 i: => (case n t (refines case-label[i])) -typetype ::= tb: (type tb) -typebound ::= 0x00 i: => (eq type-index-space[i]) +valtype ::= i: => i + | pvt: => pvt functype ::= 0x40 param*:vec() t: => (func param* (result t)) param ::= 0x00 t: => (param t) | 0x01 n: t: => (param n t) @@ -192,20 +187,28 @@ componentdecl ::= 0x00 id: => id instancedecl ::= 0x01 t: => t | 0x02 a: => a | 0x03 ed: => ed -importdecl ::= n: et: => (import n et) -exportdecl ::= n: et: => (export n et) -externtype ::= 0x00 0x10 i: => (core module core-type-index-space[i]) (must be moduletype) - | sort: i: => (sort type-index-space[i]) (sort must match type) +importdecl ::= n: ed: => (import n ed) +exportdecl ::= n: ed: => (export n ed) +externdesc ::= 0x00 0x10 i: => (core module (type i)) + | 0x01 i: => (func (type i)) + | 0x02 t: => (value t) + | 0x03 b: => (type b) + | 0x04 i: => (instance (type i)) + | 0x05 i: => (component (type i)) +typebound ::= 0x00 i: => (eq i) ``` Notes: * The type opcodes follow the same negative-SLEB128 scheme as Core WebAssembly, with type opcodes starting at SLEB128(-1) (`0x7f`) and going down, reserving the nonnegative SLEB128s for type indices. +* Validation of `valtype` requires the `typeidx` to refer to a `defvaltype`. * Validation of `moduledecl` (currently) only allows `outer` `type` `alias` declarators. * As described in the explainer, each component and instance type is validated with an initially-empty type index space. 
Outer aliases can be used to pull in type definitions from containing components. +* Validation of `externdesc` requires the various `typeidx` type constructors + to match the preceding `sort`. ## Canonical Definitions diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 21bc9ae..978b2e8 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -63,6 +63,8 @@ definition ::= core-prefix() | | | + +where core-prefix(X) parses '(' 'core' Y ')' when X parses '(' Y ')' ``` Components are like Core WebAssembly modules in that their contained definitions are acyclic: definitions can only refer to preceding definitions @@ -71,10 +73,7 @@ components can arbitrarily interleave different kinds of definitions. The `core-prefix` meta-function transforms a grammatical rule for parsing a Core WebAssembly definition into a grammatical rule for parsing the same -definition, but with a `core` token added right after the leftmost paren: -``` -core-prefix(X) ::= '(' 'core' Y ')' where X = '(' Y ')' -``` +definition, but with a `core` token added right after the leftmost paren. For example, `core:module` accepts `(module (func))` so `core-prefix()` accepts `(core module (func))`. Note that the inner `func` doesn't need a `core` prefix; the `core` token is used to mark the @@ -356,10 +355,13 @@ core:moduletype ::= (module ? *) core:moduledecl ::= | | -core:importdecl ::= (import ) -core:exportdecl ::= (export ) -core:externdesc ::= (WebAssembly 1.0) +core:importdecl ::= (import ) (WebAssembly 1.0) +core:exportdecl ::= (export ) +core:exportdesc ::= strip-id() + +where strip-id(X) parses '(' sort Y ')' when X parses '(' sort ? Y ')' ``` + Here, `core:deftype` (short for "defined type") is inherited from the [gc] proposal and extended with a `module` type constructor. If module-linking is added to Core WebAssembly, an `instance` type constructor would be added as @@ -370,9 +372,9 @@ core modules cannot themselves import or export other core modules. 
The body of a module type contains an ordered list of "module declarators" which describe, at a type level, the imports and exports of the module. In a module-type context, import and export declarators can both reuse the existing -[`core:importdesc`] production defined in WebAssembly 1.0. To avoid confusion, -`core:importdesc` is renamed to `core:externdesc` (for symmetry with -[`core:externtype`]). +[`core:importdesc`] production defined in WebAssembly 1.0, with the only +difference being that, in the text format, `core:importdesc` can bind an +identifier for later reuse while `core:exportdesc` cannot. In preparation for the forthcoming addition of [type-imports] to Core WebAssembly, module types start with an empty type index space so that the type @@ -387,13 +389,11 @@ compound values, component-level value types assume no shared memory and must therefore be high-level, describing entire compound values. ``` type ::= (type ? ) -deftype ::= - | -nonvaltype ::= - | +deftype ::= + | | | -valtype ::= unit +defvaltype ::= unit | bool | s8 | u8 | s16 | u16 | s32 | u32 | s64 | u64 | float32 | float64 @@ -407,35 +407,30 @@ valtype ::= unit | (union +) | (option ) | (expected ) -functype ::= (func ? (param ? )* (result )) -typetype ::= (type ? ) -typebound ::= (eq ) -componenttype ::= (component ? *) -instancetype ::= (instance ? *) +valtype ::= + | +functype ::= (func (param ? )* (result )) +componenttype ::= (component *) +instancetype ::= (instance *) componentdecl ::= | instancedecl ::= | | -importdecl ::= (import ) -exportdecl ::= (export ) -externtype ::= core-prefix() - | (value ? ) - | -``` -This grammar defines `type` recursively to allow it to more-precisely indicate -what's allowed at each point in the recursion. 
The formal AST and -[binary format](Binary.md#type-definitions) would instead use a `typeidx` with -validation rules to restrict the target type while the formal text format would -use something like [`core:typeuse`], allowing any of: (1) a `typeidx`, (2) an -identifier `$T` resolving to a type definition (using `(type $T)` in cases -where there is a grammatical ambiguity), or (3) an inline type definition that -is desugared into a deduplicated out-of-line type definition. - -The optional `id` after all the type constructors (e.g., `(module ? ...)`) -is only allowed to be present in the context of `import` since this is the only -context in which binding an identifier makes sense. +importdecl ::= (import ) +exportdecl ::= (export ) +importdesc ::= bind-id() +exportdesc ::= ( (type ) ) + | core-prefix() + | + | + | + | (value ) + | (type ) +typebound ::= (eq ) +where bind-id(X) parses '(' sort ? Y ')' when X parses '(' sort Y ')' +``` The value types in `valtype` can be broken into two categories: *fundamental* value types and *specialized* value types, where the latter are defined by expansion into the former. The *fundamental value types* have the following @@ -484,13 +479,43 @@ cases. This could be relaxed in the future to allow an empty list of cases, with the empty `(variant)` effectively serving as a [bottom type] and indicating unreachability. -The remaining 5 type constructors use `valtype` to complete the description -of a shared-nothing component interface: +The remaining 3 type constructors in `deftype` use `valtype` to describe +shared-nothing functions, components and component instances: + +The `func` type constructor describes a component-level function definition +that takes and returns `valtype`. 
In contrast to [`core:functype`] which, as a +low-level compiler target for a stack machine, returns zero or more results, +`functype` always returns a single type, with `unit` being used for functions +that don't return an interesting value (analogous to "void" in some languages). +Having a single return type simplifies the binding of `functype` into a wide +variety of source languages. As syntactic sugar, the text format of `functype` +additionally allows `result` to be absent, interpreting this as `(result +unit)`. + +The `instance` type constructor represents the result of instantiating a +component and thus is the same as a `component` type minus the description +of imports. -The `type` type-constructor describes an imported or exported type along with -its bounds, which currently only has an `eq` option that says that the -imported/exported type must be exactly equal to the given immediate type. There -are two main use cases for this in the short-term: +The `component` type constructor is symmetric to the core `module` type +constructor and is built from a sequence of "declarators" which are used to +describe the imports and exports of the component. There are four kinds of +declarators: + +As with core modules, `importdecl` and `exportdecl` classify component `import` +and `export` definitions, with `importdecl` allowing an identifier to be +bound for use within the type. Following the precedent of [`core:typeuse`], the +text format allows both references to out-of-line type definitions (via +`(type )`) and inline type expressions that the text format desugars +into out-of-line type definitions. + +The `value` case of `importdesc`/`exportdesc` describes a runtime value +that is imported or exported at instantiation time as described in the [start +definitions](#start-definitions) section below. + +The `type` case of `importdesc`/`exportdesc` describes an imported or exported +type along with its bounds. 
The bounds currently only have an `eq` option that +says that the imported/exported type must be exactly equal to the referenced +type. There are two main use cases for this in the short-term: * Type exports allow a component or interface to associate a name with a structural type (e.g., `(export "nanos" (type (eq u64)))`) which bindings generators can use to generate type aliases (e.g., `typedef uint64_t nanos;`). @@ -502,29 +527,12 @@ When [resource and handle types] are added to the explainer, `typebound` will be extended with a `sub` option (symmetric to the [type-imports] proposal) that allows importing and exporting *abstract* types. -The `func` type constructor describes a component-level function definition -that takes and returns component-level value types. In contrast to -[`core:functype`] which, as a low-level compiler target for a stack machine, -returns zero or more results, `functype` always returns a single type, with -`unit` being used for functions that don't return an interesting value -(analogous to "void" in some languages). Having a single return type simplifies -the binding of `functype` into a wide variety of source languages. As syntactic -sugar, the text format of `functype` additionally allows `result` to be absent, -interpreting this as `(result unit)`. - -The `component` type constructor is symmetric to the core `module` type -constructor, although its grammar is factored to share declarators with the -`instance` type constructor. The `import` and `export` declarator names -must be distinct within a single type. The `externtype` production shared by -the `import` and `export` declarators is symmetric to [`core:externtype`] and -includes all importable/exportable types. - -Component and instance types also include an `alias` declarator for projecting -the exports out of imported instances and sharing types with outer components. 
-As an example, the following component defines two equivalent component types, -where the former defines the function type via `type` declarator and the latter -via `alias` declarator. In both cases, the type is given index `0` since -component types start with an empty type index space. +Lastly, component and instance types also include an `alias` declarator for +projecting the exports out of imported instances and sharing types with outer +components. As an example, the following component defines two equivalent +component types, where the former defines the function type via `type` +declarator and the latter via `alias` declarator. In both cases, the type is +given index `0` since component types start with an empty type index space. ```wasm (component $C (type $C1 (component @@ -541,12 +549,6 @@ component types start with an empty type index space. ) ``` -The family of value types, `valtype`, is unified by a *single* type -constructor, `value`, that corresponds 1:1 with the `value` sort (described in -the [start definitions](#start-definitions) section below). As a type -constructor, `value` is symmetric to `global` in Core WebAssembly, but without -a mutability option. - With what's defined so far, we can define component types using a mix of inline and out-of-line type definitions: ```wasm @@ -765,7 +767,7 @@ of core linear memory. Lastly, imports and exports are defined in terms of the above as: ``` -import ::= (import ) +import ::= (import ) export ::= (export ) ``` All import and export names within a component must be unique, respectively. 
From 9d50001b21eadda59faea9e7e99a715edef93e4b Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 10 May 2022 18:28:29 -0500 Subject: [PATCH 055/301] Add better validation notes in Binary.md, normalize on 'externdesc' --- design/mvp/Binary.md | 15 ++++++++------- design/mvp/Explainer.md | 8 ++++---- 2 files changed, 12 insertions(+), 11 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index af9014c..aaa72a6 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -84,17 +84,17 @@ export ::= n: si: => (e Notes: * Reused Core binary rules: [`core:name`] * The `core:sort` values are chosen to match the discriminant opcodes of - [`core:importdesc`] so that `core:exportdesc` (below) is identical. + [`core:importdesc`]. * `type` is added to `core:sort` in anticipation of the [type-imports] proposal. Until that proposal, core modules won't be able to actually import or export types, however, the `type` sort is allowed as part of outer aliases (below). * `module` and `instance` are added to `core:sort` in anticipation of the [module-linking] - proposal, which would add these types to Core WebAssembly. Again, core modules won't be - able to actually import or export modules/instances, but they are used for aliases. + proposal, which would add these types to Core WebAssembly. Until then, they are useful + for aliases (below). +* Validation of `core:instantiatearg` would initially only allow the `instance` + sort, but would be extended to accept other sorts as core wasm is extended. * The indices in `sortidx` are validated according to their `sort`'s index spaces, which are built incrementally as each definition is validated. -* The types of arguments supplied by `instantiate` are validated against the - types of the matching import according to the [subtyping](Subtyping.md) rules. ## Alias Definitions @@ -269,12 +269,13 @@ flags are set. (See [Import and Export Definitions](Explainer.md#import-and-export-definitions) in the explainer.) 
``` -import ::= n: et: => (import n et) +import ::= n: ed: => (import n ed) export ::= n: si: => (export n si) ``` Notes: * Validation requires all import and export `name`s are unique. - +* Validation requires any exported `sortidx` to have a valid `externdesc` + (which disallows core sorts other than `core module`). [`core:section`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-section diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 978b2e8..4cf5faf 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -355,7 +355,7 @@ core:moduletype ::= (module ? *) core:moduledecl ::= | | -core:importdecl ::= (import ) (WebAssembly 1.0) +core:importdecl ::= (import ) core:exportdecl ::= (export ) core:exportdesc ::= strip-id() @@ -418,9 +418,9 @@ instancedecl ::= | | importdecl ::= (import ) -exportdecl ::= (export ) -importdesc ::= bind-id() -exportdesc ::= ( (type ) ) +exportdecl ::= (export ) +importdesc ::= bind-id() +externdesc ::= ( (type ) ) | core-prefix() | | From 45f433b59f951f03199041b2d7f784e201319a14 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 11 May 2022 10:21:44 -0500 Subject: [PATCH 056/301] s/varu32/u32/ because to match actual core wasm spec --- design/mvp/Binary.md | 14 ++++++++------ design/mvp/Explainer.md | 18 +++++++++--------- 2 files changed, 17 insertions(+), 15 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index aaa72a6..73e53c3 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -58,7 +58,7 @@ core:instance ::= ie: => (i core:instanceexpr ::= 0x00 m: arg*:vec() => (instantiate m arg*) | 0x01 e*:vec() => e* core:instantiatearg ::= n: si: => (with n si) -core:sortidx ::= sort: idx: => (sort idx) +core:sortidx ::= sort: idx: => (sort idx) core:sort ::= 0x00 => func | 0x01 => table | 0x02 => memory @@ -72,7 +72,7 @@ instance ::= ie: => (i instanceexpr ::= 0x00 c: arg*:vec() => (instantiate c arg*) | 0x01 e*:vec() => e* instantiatearg ::= n: si: => 
(with n si) -sortidx ::= sort: idx: => (sort idx) +sortidx ::= sort: idx: => (sort idx) sort ::= 0x00 cs: => core cs | 0x01 => func | 0x02 => value @@ -82,7 +82,7 @@ sort ::= 0x00 cs: => co export ::= n: si: => (export n si) ``` Notes: -* Reused Core binary rules: [`core:name`] +* Reused Core binary rules: [`core:name`], (variable-length encoded) [`core:u32`] * The `core:sort` values are chosen to match the discriminant opcodes of [`core:importdesc`]. * `type` is added to `core:sort` in anticipation of the [type-imports] proposal. Until that @@ -105,9 +105,10 @@ core:aliastarget ::= 0x00 i: n: => export i n alias ::= sort: target: => (alias target (sort)) aliastarget ::= 0x00 i: n: => export i n - | 0x01 ct: idx: => outer ct idx + | 0x01 ct: idx: => outer ct idx ``` Notes: +* Reused Core binary rules: (variable-length encoded) [`core:u32`] * For `export` aliases, `i` is validated to refer to an instance in the instance index space that exports `n` with the specified `sort`. * For `outer` aliases, `ct` is validated to be *less or equal than* the number @@ -174,7 +175,7 @@ defvaltype ::= pvt: => pvt | 0x69 t: u: => (expected t u) field ::= n: t: => (field n t) case ::= n: t: 0x0 => (case n t) - | n: t: 0x1 i: => (case n t (refines case-label[i])) + | n: t: 0x1 i: => (case n t (refines case-label[i])) valtype ::= i: => i | pvt: => pvt functype ::= 0x40 param*:vec() t: => (func param* (result t)) @@ -227,7 +228,7 @@ canonopt ::= 0x00 => string-encod ``` Notes: * The second `0x00` byte in `canon` stands for the `func` sort and thus the - `0x00 ` pair standards for a `func` `sortidx` or `core:sortidx`. + `0x00 ` pair standards for a `func` `sortidx` or `core:sortidx`. * Validation prevents duplicate or conflicting `canonopt`. * Validation of `canon lift` requires `f` to have type `flatten(ft)` (defined by the [Canonical ABI](CanonicalABI.md#flattening)). The function being @@ -278,6 +279,7 @@ Notes: (which disallows core sorts other than `core module`). 
+[`core:u32`]: https://webassembly.github.io/spec/core/binary/values.html#integers [`core:section`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-section [`core:custom`]: https://webassembly.github.io/spec/core/binary/modules.html#custom-section [`core:module`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-module diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 4cf5faf..2c9f520 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -133,7 +133,7 @@ core:instanceexpr ::= (instantiate *) | * core:instantiatearg ::= (with ) | (with (instance *)) -core:sortidx ::= ( ) +core:sortidx ::= ( ) core:sort ::= func | table | memory @@ -152,7 +152,7 @@ core modules are resolved as follows: core definition. Each `core:sort` corresponds 1:1 with a distinct [index space] that contains -only core definitions of that *sort*. The `varu32` field of `core:sortidx` +only core definitions of that *sort*. The `u32` field of `core:sortidx` indexes into the sort's associated index space to select a definition. Based on this, we can link two core modules `$A` and `$B` together with the @@ -188,7 +188,7 @@ instanceexpr ::= (instantiate *) | * instantiatearg ::= (with ) | (with (instance *)) -sortidx ::= ( ) +sortidx ::= ( ) sort ::= core-prefix() | func | value @@ -228,7 +228,7 @@ core:aliastarget ::= export alias ::= (alias ( ?)) aliastarget ::= export - | outer + | outer ``` The `core:sort`/`sort` immediate of the alias specifies which index space in the target component is being read from and which index space of the containing @@ -239,10 +239,10 @@ used. In the case of `export` aliases, validation ensures `name` is an export in the target instance and has a matching sort. -In the case of `outer` aliases, the `varu32` pair serves as a [de Bruijn -index], with first `varu32` being the number of enclosing components to skip -and the second `varu32` being an index into the target component's sort's index -space. 
In particular, the first `varu32` can be `0`, in which case the outer +In the case of `outer` aliases, the `u32` pair serves as a [de Bruijn +index], with first `u32` being the number of enclosing components to skip +and the second `u32` being an index into the target component's sort's index +space. In particular, the first `u32` can be `0`, in which case the outer alias refers to the current component. To maintain the acyclicity of module instantiation, outer aliases are only allowed to refer to *preceding* outer definitions. @@ -420,7 +420,7 @@ instancedecl ::= importdecl ::= (import ) exportdecl ::= (export ) importdesc ::= bind-id() -externdesc ::= ( (type ) ) +externdesc ::= ( (type ) ) | core-prefix() | | From 2e7167610db9dd3427074baa2fc31ab8f0d60035 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 11 May 2022 10:47:32 -0500 Subject: [PATCH 057/301] Clamp down core (with ...) expressions to just the 'instance' sort --- design/mvp/Binary.md | 4 ++-- design/mvp/Explainer.md | 6 ++++-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 73e53c3..5faf204 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -57,7 +57,7 @@ Notes: core:instance ::= ie: => (instance ie) core:instanceexpr ::= 0x00 m: arg*:vec() => (instantiate m arg*) | 0x01 e*:vec() => e* -core:instantiatearg ::= n: si: => (with n si) +core:instantiatearg ::= n: 0x11 i: => (with n (instance i)) core:sortidx ::= sort: idx: => (sort idx) core:sort ::= 0x00 => func | 0x01 => table @@ -91,7 +91,7 @@ Notes: * `module` and `instance` are added to `core:sort` in anticipation of the [module-linking] proposal, which would add these types to Core WebAssembly. Until then, they are useful for aliases (below). 
-* Validation of `core:instantiatearg` would initially only allow the `instance` +* Validation of `core:instantiatearg` initially only allows the `instance` sort, but would be extended to accept other sorts as core wasm is extended. * The indices in `sortidx` are validated according to their `sort`'s index spaces, which are built incrementally as each definition is validated. diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 2c9f520..2c501c8 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -131,7 +131,7 @@ The syntax for defining a core module instance is: core:instance ::= (instance ? ) core:instanceexpr ::= (instantiate *) | * -core:instantiatearg ::= (with ) +core:instantiatearg ::= (with (instance )) | (with (instance *)) core:sortidx ::= ( ) core:sort ::= func @@ -146,7 +146,9 @@ core:export ::= (export ) When instantiating a module via `instantiate`, the two-level imports of the core modules are resolved as follows: 1. The first `name` of the import is looked up in the named list of - `core:instantiatearg` to select a core module instance. + `core:instantiatearg` to select a core module instance. (In the future, + other `core:sort`s could be allowed if core wasm adds single-level + imports.) 2. The second `name` of the import is looked up in the named list of exports of the core module instance found by the first step to select the imported core definition. From e84e499df60fc095f1fb5f1f05bbe73b0f3706d4 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 11 May 2022 10:51:04 -0500 Subject: [PATCH 058/301] Remove dangling ? --- design/mvp/Explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 2c501c8..ac59b46 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -353,7 +353,7 @@ core:deftype ::= (WebAssembly 1.0) | (GC proposal) | (GC proposal) | -core:moduletype ::= (module ? 
*) +core:moduletype ::= (module *) core:moduledecl ::= | | From e3e1a9852dbe9d935c103304df09bab0456c837f Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 11 May 2022 10:52:50 -0500 Subject: [PATCH 059/301] Fix typo in core:exportdesc --- design/mvp/Explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index ac59b46..824b043 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -359,7 +359,7 @@ core:moduledecl ::= | core:importdecl ::= (import ) core:exportdecl ::= (export ) -core:exportdesc ::= strip-id() +core:exportdesc ::= strip-id() where strip-id(X) parses '(' sort Y ')' when X parses '(' sort ? Y ')' ``` From 5934e70230be9d7e92aec35ad4a8feda451e8db5 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 11 May 2022 12:03:17 -0500 Subject: [PATCH 060/301] Improve explanation of type imports and fresh type index spaces --- design/mvp/Explainer.md | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 824b043..93ef4b5 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -378,11 +378,24 @@ module-type context, import and export declarators can both reuse the existing difference being that, in the text format, `core:importdesc` can bind an identifier for later reuse while `core:exportdesc` cannot. -In preparation for the forthcoming addition of [type-imports] to Core -WebAssembly, module types start with an empty type index space so that the type -index space can be populated with fresh type definitions constructed from type -imports. Thus, `core:moduledecl` also includes a `type` declarator for defining -the types used by the `import` and `export` declarators. +With the Core WebAssembly [type-imports], module types will need the ability to +define the types of exports based on the types of imports. 
In preparation for +this, module types start with an empty type index space that is populated by +`type` declarators, so that, in the future, these `type` declarators can refer to +type imports local to the module type itself. For example, in the future, the +following module type would be expressible: +``` +(component $C + (type $M (module + (import "" "T" (type $T)) + (type $PairT (struct (field (ref $T)) (field (ref $T)))) + (export "make_pair" (func (param (ref $T)) (result (ref $PairT)))) + )) +) +``` +In this example, `$M` has a distinct type index space from `$C`, where element +0 is the imported type, element 1 is the `struct` type, and element 2 is an +implicitly-created `func` type referring to both. Component-level type definitions are symmetric to core-level type definitions, but use a completely different set of value types. Unlike [`core:valtype`] From 43a615682a736a24479f014bd8ee888691942a67 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 11 May 2022 12:39:06 -0500 Subject: [PATCH 061/301] s/Bottom type/Empty type/ --- design/mvp/Explainer.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 93ef4b5..156f595 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -491,7 +491,7 @@ defined by the following mapping: ``` Note that, at least initially, variants are required to have a non-empty list of cases. This could be relaxed in the future to allow an empty list of cases, with -the empty `(variant)` effectively serving as a [bottom type] and indicating +the empty `(variant)` effectively serving as an [empty type] and indicating unreachability.
The remaining 3 type constructors in `deftype` use `valtype` to describe @@ -1080,7 +1080,7 @@ and will be added over the coming months to complete the MVP proposal: [De Bruijn Index]: https://en.wikipedia.org/wiki/De_Bruijn_index [Closure]: https://en.wikipedia.org/wiki/Closure_(computer_programming) -[Bottom Type]: https://en.wikipedia.org/wiki/Bottom_type +[Empty Type]: https://en.wikipedia.org/w/index.php?title=Empty_type [IEEE754]: https://en.wikipedia.org/wiki/IEEE_754 [NaN]: https://en.wikipedia.org/wiki/NaN [NaN Boxing]: https://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations From 88816fc721529c0613f80f73512ed618b2b868d7 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 11 May 2022 12:57:45 -0500 Subject: [PATCH 062/301] Tweak wording around type imports/exports rationale --- design/mvp/Explainer.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 156f595..6957ced 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -534,9 +534,8 @@ type. There are two main use cases for this in the short-term: * Type exports allow a component or interface to associate a name with a structural type (e.g., `(export "nanos" (type (eq u64)))`) which bindings generators can use to generate type aliases (e.g., `typedef uint64_t nanos;`). -* Type imports and exports allow a component to explicitly specify the - type parameters used to monomorphize a generic interface being imported - or exported. +* Type imports and exports can provide additional information to toolchains and + runtimes for defining the behavior of host APIs. 
When [resource and handle types] are added to the explainer, `typebound` will be extended with a `sub` option (symmetric to the [type-imports] proposal) that From 1d8607691fa43871de6d4da333a91ff431d162dd Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 11 May 2022 14:29:39 -0500 Subject: [PATCH 063/301] Don't use core-prefix in --- design/mvp/Explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 6957ced..4930f89 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -191,7 +191,7 @@ instanceexpr ::= (instantiate *) instantiatearg ::= (with ) | (with (instance *)) sortidx ::= ( ) -sort ::= core-prefix() +sort ::= core | func | value | type From 61314cf0b10f4b20a9c931a27af5ad7e2e20fb2f Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 11 May 2022 14:38:21 -0500 Subject: [PATCH 064/301] Fix bug in example --- design/mvp/Explainer.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 4930f89..74d9fb9 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -551,14 +551,14 @@ given index `0` since component types start with an empty type index space. 
(component $C (type $C1 (component (type (func (param string) (result string))) - (import "a" "b" (func (type 0))) - (export "c" (func (type 0))) + (import "a" (func (type 0))) + (export "b" (func (type 0))) )) (type $F (func (param string) (result string))) (type $C2 (component (alias outer $C $F (type)) - (import "a" "b" (func (type 0))) - (export "c" (func (type 0))) + (import "a" (func (type 0))) + (export "b" (func (type 0))) )) ) ``` From a0eb04369903047edc493d8472ec58cbf587d8c5 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 11 May 2022 14:41:17 -0500 Subject: [PATCH 065/301] Update example to match explicit sort in exportdesc --- design/mvp/Explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 74d9fb9..4899de3 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -804,7 +804,7 @@ exports other components: )) (instance $d2 (instantiate $D (with "c" (instance - (export "f" (func $d1 "g")) + (export "f" (func (func $d1 "g"))) )) )) (export "d2" (instance $d2)) From 89ed4355005c06040d00b811d1009121b9076616 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 11 May 2022 17:44:24 -0500 Subject: [PATCH 066/301] Revert previous; update inline alias syntax description to match all the examples --- design/mvp/Explainer.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 4899de3..ba84901 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -266,13 +266,14 @@ Both kinds of aliases come with syntactic sugar for implicitly declaring them inline: For `export` aliases, the inline sugar has the form `(sort +)` -and can be used anywhere a `sort` index appears in the AST. For example, the -following snippet uses an inline function alias: +and can be used in place of a `sortidx` or any sort-specific index (such as a +`typeidx` or `funcidx`). 
For example, the following snippet uses two inline +function aliases: ```wasm (instance $j (instantiate $J (with "f" (func $i "f")))) -(export "x" (func (func $j "g" "h"))) +(export "x" (func $j "g" "h")) ``` -which is desugared into: +which are desugared into: ```wasm (alias export $i "f" (func $f_alias)) (instance $j (instantiate $J (with "f" (func $f_alias)))) @@ -804,7 +805,7 @@ exports other components: )) (instance $d2 (instantiate $D (with "c" (instance - (export "f" (func (func $d1 "g"))) + (export "f" (func $d1 "g")) )) )) (export "d2" (instance $d2)) From f3d60a233c79efb7a5ab6b4f4905af62da287f19 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Sat, 21 May 2022 01:23:49 +0200 Subject: [PATCH 067/301] Tweak validation wording Co-authored-by: Peter Huene --- design/mvp/Binary.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 5faf204..ba20cbb 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -203,7 +203,7 @@ Notes: with type opcodes starting at SLEB128(-1) (`0x7f`) and going down, reserving the nonnegative SLEB128s for type indices. * Validation of `valtype` requires the `typeidx` to refer to a `defvaltype`. -* Validation of `moduledecl` (currently) only allows `outer` `type` `alias` +* Validation of `instancedecl` (currently) only allows `outer` `type` `alias` declarators. * As described in the explainer, each component and instance type is validated with an initially-empty type index space. 
Outer aliases can be used to pull From 33f8e37e5d3c3abe193e67b89f48cd6ff27df0f3 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Sat, 21 May 2022 01:24:36 +0200 Subject: [PATCH 068/301] Avoid EH conflicts in binary encoding of core:sort Co-authored-by: Peter Huene --- design/mvp/Binary.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index ba20cbb..9128b10 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -63,9 +63,9 @@ core:sort ::= 0x00 => fu | 0x01 => table | 0x02 => memory | 0x03 => global - | 0x04 => type - | 0x10 => module - | 0x11 => instance + | 0x10 => type + | 0x11 => module + | 0x12 => instance core:export ::= n: si: => (export n si) instance ::= ie: => (instance ie) From 080f4c3f8fd53c7e512cbf665238f4baeada5d3b Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 23 May 2022 14:21:01 -0500 Subject: [PATCH 069/301] Sync externdesc with preceding binary format opcode change Co-authored-by: Peter Huene --- design/mvp/Binary.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 9128b10..06a0fe9 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -190,7 +190,7 @@ instancedecl ::= 0x01 t: => t | 0x03 ed: => ed importdecl ::= n: ed: => (import n ed) exportdecl ::= n: ed: => (export n ed) -externdesc ::= 0x00 0x10 i: => (core module (type i)) +externdesc ::= 0x00 0x11 i: => (core module (type i)) | 0x01 i: => (func (type i)) | 0x02 t: => (value t) | 0x03 b: => (type b) From 7401e4c80bf9ea8af7a8a67f64d36f838890a7b6 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 24 May 2022 16:58:23 -0500 Subject: [PATCH 070/301] Sync core:instantiatearg with preceding binary format opcode change Co-authored-by: Peter Huene --- design/mvp/Binary.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 06a0fe9..8354d91 100644 --- a/design/mvp/Binary.md 
+++ b/design/mvp/Binary.md @@ -57,7 +57,7 @@ Notes: core:instance ::= ie: => (instance ie) core:instanceexpr ::= 0x00 m: arg*:vec() => (instantiate m arg*) | 0x01 e*:vec() => e* -core:instantiatearg ::= n: 0x11 i: => (with n (instance i)) +core:instantiatearg ::= n: 0x12 i: => (with n (instance i)) core:sortidx ::= sort: idx: => (sort idx) core:sort ::= 0x00 => func | 0x01 => table From 49fb1171a49259e1aa6dd74e8f05feb8df856306 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 24 May 2022 18:04:47 -0500 Subject: [PATCH 071/301] Make in 'canon lift' symmetric to imports --- design/mvp/Binary.md | 2 +- design/mvp/Explainer.md | 38 ++++++++++--------- .../SharedEverythingDynamicLinking.md | 12 +++--- 3 files changed, 27 insertions(+), 25 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 8354d91..d3c7918 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -216,7 +216,7 @@ Notes: (See [Canonical Definitions](Explainer.md#canonical-definitions) in the explainer.) ``` -canon ::= 0x00 0x00 f: ft: opts: => (canon lift f type-index-space[ft] opts (func)) +canon ::= 0x00 0x00 f: opts: ft: => (canon lift f opts type-index-space[ft]) | 0x01 0x00 f: opts: => (canon lower f opts (core func)) opts ::= opt*:vec() => opt* canonopt ::= 0x00 => string-encoding=utf8 diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index ba84901..2995774 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -433,9 +433,8 @@ componentdecl ::= instancedecl ::= | | -importdecl ::= (import ) +importdecl ::= (import bind-id()) exportdecl ::= (export ) -importdesc ::= bind-id() externdesc ::= ( (type ) ) | core-prefix() | @@ -524,14 +523,14 @@ text format allows both references to out-of-line type definitions (via `(type )`) and inline type expressions that the text format desugars into out-of-line type definitions. 
-The `value` case of `importdesc`/`exportdesc` describes a runtime value -that is imported or exported at instantiation time as described in the [start -definitions](#start-definitions) section below. +The `value` case of `externdesc` describes a runtime value that is imported or +exported at instantiation time as described in the +[start definitions](#start-definitions) section below. -The `type` case of `importdesc`/`exportdesc` describes an imported or exported -type along with its bounds. The bounds currently only have an `eq` option that -says that the imported/exported type must be exactly equal to the referenced -type. There are two main use cases for this in the short-term: +The `type` case of `externdesc` describes an imported or exported type along +with its bounds. The bounds currently only have an `eq` option that says that +the imported/exported type must be exactly equal to the referenced type. There +are two main use cases for this in the short-term: * Type exports allow a component or interface to associate a name with a structural type (e.g., `(export "nanos" (type (eq u64)))`) which bindings generators can use to generate type aliases (e.g., `typedef uint64_t nanos;`). @@ -611,7 +610,7 @@ two directions: Canonical definitions specify one of these two wrapping directions, the function to wrap and a list of configuration options: ``` -canon ::= (canon lift core-prefix() * (func ?)) +canon ::= (canon lift core-prefix() * bind-id()) | (canon lower * (core func ?)) canonopt ::= string-encoding=utf8 | string-encoding=utf16 @@ -620,6 +619,10 @@ canonopt ::= string-encoding=utf8 | (realloc core-prefix()) | (post-return core-prefix()) ``` +While the production `externdesc` accepts any `sort`, the validation rules +for `canon lift` would only allow the `func` sort. In the future, other sorts +may be added (viz., types), hence the explicit sort. + The `string-encoding` option specifies the encoding the Canonical ABI will use for the `string` type. 
The `latin1+utf16` encoding captures a common string encoding across Java, JavaScript and .NET VMs and allows a dynamic choice @@ -672,9 +675,9 @@ stack-switching in component function signatures. Similar to the `import` and `alias` abbreviations shown above, `canon` definitions can also be written in an inverted form that puts the sort first: ```wasm - (func $f (import "i" "f")) ≡ (import "i" "f" (func $f)) (WebAssembly 1.0) - (func $h (canon lift ...)) ≡ (canon lift ... (func $h)) -(core func $h (canon lower ...)) ≡ (canon lower ... (core func $h)) + (func $f ...type... (import "i" "f")) ≡ (import "i" "f" (func $f ...type...)) (WebAssembly 1.0) + (func $h ...type... (canon lift ...)) ≡ (canon lift ... (func $h ...type...)) +(core func $h ...type... (canon lower ...)) ≡ (canon lower ... (core func $h ...type...)) ``` Note: in the future, `canon` may be generalized to define other sorts than functions (such as types), hence the explicit `sort`. @@ -707,11 +710,11 @@ takes a string, does some logging, then returns a string. (with "libc" (instance $libc)) (with "wasi:logging" (instance (export "log" (func $log)))) )) - (func (export "run") (canon lift + (func $run (param string) (result string) (canon lift (core func $main "run") - (func (param string) (result string)) (memory (core memory $libc "mem")) (realloc (core func $libc "realloc")) )) + (export "run" (func $run)) ) ``` This example shows the pattern of splitting out a reusable language runtime @@ -764,9 +767,8 @@ exported string at instantiation time: ) ) (core instance $main (instantiate $Main (with "libc" (instance $libc)))) - (func $start (canon lift + (func $start (param string) (result string) (canon lift (core func $main "start") - (func (param string) (result string)) (memory (core memory $libc "mem")) (realloc (core func $libc "realloc")) )) (start $start (value $name) (result (value $greeting))) @@ -782,7 +784,7 @@ of core linear memory. 
Lastly, imports and exports are defined in terms of the above as: ``` -import ::= (import ) +import ::= export ::= (export ) ``` All import and export names within a component must be unique, respectively. diff --git a/design/mvp/examples/SharedEverythingDynamicLinking.md b/design/mvp/examples/SharedEverythingDynamicLinking.md index 30f7590..2ccfd4b 100644 --- a/design/mvp/examples/SharedEverythingDynamicLinking.md +++ b/design/mvp/examples/SharedEverythingDynamicLinking.md @@ -157,11 +157,11 @@ would look like: (with "libc" (instance $libc)) (with "libzip" (instance $libzip)) )) - (func (export "zip") (canon lift + (func $zip (param (list u8)) (result (list u8)) (canon lift (func $main "zip") - (func (param (list u8)) (result (list u8))) (memory (memory $libc "memory")) (realloc (func $libc "realloc")) )) + (export "zip" (func $zip)) ) ``` Here, `zipper` links its own private module code (`$Main`) with the shareable @@ -236,11 +236,11 @@ component-aware `clang`, the resulting component would look like: (with "libc" (instance $libc)) (with "libimg" (instance $libimg)) )) - (func (export "transform") (canon lift + (func $transform (param (list u8)) (result (list u8)) (canon lift (func $main "transform") - (func (param (list u8)) (result (list u8))) (memory (memory $libc "memory")) (realloc (func $libc "realloc")) )) + (export "transform" (func $transform)) ) ``` Here, we see the general pattern emerging of the dependency DAG between @@ -296,11 +296,11 @@ components. 
The resulting component could look like: (with "zipper" (instance (export "zip" (func $zipper "zip")))) (with "imgmgk" (instance (export "transform" (func $imgmgk "transform")))) )) - (func (export "run") (canon lift + (func $run (param string) (result string) (canon lift (func $main "run") - (func (param string) (result string)) (memory (memory $libc "memory")) (realloc (func $libc "realloc")) )) + (export "run" (func $run)) ) ``` Note here that `$Libc` is passed to the nested `zipper` and `imgmgk` instances From 2d1d00fdbd20b82a7a1bdd754a07904f1c5a4618 Mon Sep 17 00:00:00 2001 From: Peter Huene Date: Sat, 28 May 2022 10:44:50 -0700 Subject: [PATCH 072/301] Fix inverted func/alias syntax examples. This commit fixes the syntax examples for the inverted forms of canon definitions and function aliases. --- design/mvp/Explainer.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 2995774..2aa5dee 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -307,7 +307,7 @@ is desugared into: Lastly, for symmetry with [imports][func-import-abbrev], aliases can be written in an inverted form that puts the sort first: ```wasm -(func $f (import "i" "f")) ≡ (import "i" "f" (func $f)) (WebAssembly 1.0) +(func $f (import "i" "f") ...type...) ≡ (import "i" "f" (func $f ...type...)) (WebAssembly 1.0) (func $g (alias export $i "g1")) ≡ (alias export $i "g1" (func $g)) (core func $g (alias export $i "g1")) ≡ (core alias export $i "g1" (func $g)) ``` @@ -675,9 +675,9 @@ stack-switching in component function signatures. Similar to the `import` and `alias` abbreviations shown above, `canon` definitions can also be written in an inverted form that puts the sort first: ```wasm - (func $f ...type... (import "i" "f")) ≡ (import "i" "f" (func $f ...type...)) (WebAssembly 1.0) - (func $h ...type... (canon lift ...)) ≡ (canon lift ... (func $h ...type...)) -(core func $h ...type... 
(canon lower ...)) ≡ (canon lower ... (core func $h ...type...)) +(func $f (import "i" "f") ...type...) ≡ (import "i" "f" (func $f ...type...)) (WebAssembly 1.0) +(func $g ...type... (canon lift ...)) ≡ (canon lift ... (func $g ...type...)) +(core func $h (canon lower ...)) ≡ (canon lower ... (core func $h)) ``` Note: in the future, `canon` may be generalized to define other sorts than functions (such as types), hence the explicit `sort`. From 912d32b407eb1230b66e1f4fa0dc096b333f6083 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 1 Jun 2022 14:52:52 -0500 Subject: [PATCH 073/301] Remove trapping checks on unused lifted bytes --- design/mvp/CanonicalABI.md | 39 +++++++++----------- design/mvp/canonical-abi/definitions.py | 26 ++++++-------- design/mvp/canonical-abi/run_tests.py | 47 +++++++++++++------------ 3 files changed, 51 insertions(+), 61 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 02173fc..93b0c7e 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -207,7 +207,7 @@ class Opts: def load(opts, ptr, t): assert(ptr == align_to(ptr, alignment(t))) match despecialize(t): - case Bool() : return narrow_uint_to_bool(load_int(opts, ptr, 1)) + case Bool() : return convert_int_to_bool(load_int(opts, ptr, 1)) case U8() : return load_int(opts, ptr, 1) case U16() : return load_int(opts, ptr, 2) case U32() : return load_int(opts, ptr, 4) @@ -234,12 +234,11 @@ def load_int(opts, ptr, nbytes, signed = False): return int.from_bytes(opts.memory[ptr : ptr+nbytes], 'little', signed=signed) ``` -As a general rule, the Canonical ABI traps when given extraneous bits, so the -narrowing conversion from a byte to a `bool` traps if the high 7 bits are set. 
+Integer-to-boolean conversions treat `0` as `false` and all other bit-patterns +as `true`: ```python -def narrow_uint_to_bool(i): +def convert_int_to_bool(i): assert(i >= 0) - trap_if(i > 1) return bool(i) ``` @@ -392,7 +391,6 @@ def unpack_flags_from_int(i, labels): for l in labels: record[l] = bool(i & 1) i >>= 1 - trap_if(i) return record ``` @@ -829,7 +827,7 @@ class ValueIter: def lift_flat(opts, vi, t): match despecialize(t): - case Bool() : return narrow_uint_to_bool(vi.next('i32')) + case Bool() : return convert_int_to_bool(vi.next('i32')) case U8() : return lift_flat_unsigned(vi, 32, 8) case U16() : return lift_flat_unsigned(vi, 32, 16) case U32() : return lift_flat_unsigned(vi, 32, 32) @@ -850,26 +848,22 @@ def lift_flat(opts, vi, t): Integers are lifted from core `i32` or `i64` values using the signedness of the target type to interpret the high-order bit. When the target type is narrower -than an `i32`, the Canonical ABI specifies a dynamic range check in order to -catch bugs. The conversion logic here assumes that `i32` values are always -represented as unsigned Python `int`s and thus lifting to a signed type -performs a manual 2s complement conversion in the Python (which would be a -no-op in hardware). +than an `i32`, the Canonical ABI ignores the unused high bits (like `load_int`). +The conversion logic here assumes that `i32` values are always represented as +unsigned Python `int`s and thus lifting to a signed type performs a manual 2s +complement conversion in the Python (which would be a no-op in hardware).
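The wrapping behavior described above can be sketched standalone, assuming plain Python `int`s in place of the spec's `ValueIter`. The names `int_to_bool`, `lift_unsigned` and `lift_signed` are illustrative and not part of the patch; the signed case subtracts `1 << t_width`, as `definitions.py` does. This is a rough sketch under those assumptions, not the normative definition.

```python
def int_to_bool(i: int) -> bool:
    # Any non-zero bit pattern is treated as true; no trap on values > 1.
    assert i >= 0
    return bool(i)


def lift_unsigned(i: int, core_width: int, t_width: int) -> int:
    # Keep only the low t_width bits of the core value; high bits are ignored.
    assert 0 <= i < (1 << core_width)
    return i % (1 << t_width)


def lift_signed(i: int, core_width: int, t_width: int) -> int:
    # Wrap to t_width bits, then reinterpret the top bit as the sign
    # (a manual two's complement conversion on an unsigned Python int).
    assert 0 <= i < (1 << core_width)
    i %= (1 << t_width)
    if i >= (1 << (t_width - 1)):
        return i - (1 << t_width)
    return i
```

For instance, an `i32` value of `256` lifted to `u8` wraps to `0`, and `255` lifted to `s8` becomes `-1`, which agrees with the updated expectations in `run_tests.py`.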
```python def lift_flat_unsigned(vi, core_width, t_width): i = vi.next('i' + str(core_width)) assert(0 <= i < (1 << core_width)) - trap_if(i >= (1 << t_width)) - return i + return i % (1 << t_width) def lift_flat_signed(vi, core_width, t_width): i = vi.next('i' + str(core_width)) assert(0 <= i < (1 << core_width)) + i %= (1 << t_width) if i >= (1 << (t_width - 1)): - i -= (1 << core_width) - trap_if(i < -(1 << (t_width - 1))) - return i - trap_if(i >= (1 << (t_width - 1))) + return i - (1 << t_width) return i ``` @@ -917,8 +911,8 @@ def lift_flat_variant(opts, vi, cases): x = vi.next(have) match (have, want): case ('i32', 'f32') : return reinterpret_i32_as_float(x) - case ('i64', 'i32') : return narrow_i64_to_i32(x) - case ('i64', 'f32') : return reinterpret_i32_as_float(narrow_i64_to_i32(x)) + case ('i64', 'i32') : return wrap_i64_to_i32(x) + case ('i64', 'f32') : return reinterpret_i32_as_float(wrap_i64_to_i32(x)) case ('i64', 'f64') : return reinterpret_i64_as_float(x) case _ : return x v = lift_flat(opts, CoerceValueIter(), case.t) @@ -926,10 +920,9 @@ def lift_flat_variant(opts, vi, cases): _ = vi.next(have) return { case_label_with_refinements(case, cases): v } -def narrow_i64_to_i32(i): +def wrap_i64_to_i32(i): assert(0 <= i < (1 << 64)) - trap_if(i >= (1 << 32)) - return i + return i % (1 << 32) ``` Finally, flags are lifted by OR-ing together all the flattened `i32` values diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 183ed04..6db94f0 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -201,7 +201,7 @@ class Opts: def load(opts, ptr, t): assert(ptr == align_to(ptr, alignment(t))) match despecialize(t): - case Bool() : return narrow_uint_to_bool(load_int(opts, ptr, 1)) + case Bool() : return convert_int_to_bool(load_int(opts, ptr, 1)) case U8() : return load_int(opts, ptr, 1) case U16() : return load_int(opts, ptr, 2) case U32() : return 
load_int(opts, ptr, 4) @@ -227,9 +227,8 @@ def load_int(opts, ptr, nbytes, signed = False): # -def narrow_uint_to_bool(i): +def convert_int_to_bool(i): assert(i >= 0) - trap_if(i > 1) return bool(i) # @@ -352,7 +351,6 @@ def unpack_flags_from_int(i, labels): for l in labels: record[l] = bool(i & 1) i >>= 1 - trap_if(i) return record ### Storing @@ -666,7 +664,7 @@ def next(self, t): def lift_flat(opts, vi, t): match despecialize(t): - case Bool() : return narrow_uint_to_bool(vi.next('i32')) + case Bool() : return convert_int_to_bool(vi.next('i32')) case U8() : return lift_flat_unsigned(vi, 32, 8) case U16() : return lift_flat_unsigned(vi, 32, 16) case U32() : return lift_flat_unsigned(vi, 32, 32) @@ -689,17 +687,14 @@ def lift_flat(opts, vi, t): def lift_flat_unsigned(vi, core_width, t_width): i = vi.next('i' + str(core_width)) assert(0 <= i < (1 << core_width)) - trap_if(i >= (1 << t_width)) - return i + return i % (1 << t_width) def lift_flat_signed(vi, core_width, t_width): i = vi.next('i' + str(core_width)) assert(0 <= i < (1 << core_width)) + i %= (1 << t_width) if i >= (1 << (t_width - 1)): - i -= (1 << core_width) - trap_if(i < -(1 << (t_width - 1))) - return i - trap_if(i >= (1 << (t_width - 1))) + return i - (1 << t_width) return i # @@ -736,8 +731,8 @@ def next(self, want): x = vi.next(have) match (have, want): case ('i32', 'f32') : return reinterpret_i32_as_float(x) - case ('i64', 'i32') : return narrow_i64_to_i32(x) - case ('i64', 'f32') : return reinterpret_i32_as_float(narrow_i64_to_i32(x)) + case ('i64', 'i32') : return wrap_i64_to_i32(x) + case ('i64', 'f32') : return reinterpret_i32_as_float(wrap_i64_to_i32(x)) case ('i64', 'f64') : return reinterpret_i64_as_float(x) case _ : return x v = lift_flat(opts, CoerceValueIter(), case.t) @@ -745,10 +740,9 @@ def next(self, want): _ = vi.next(have) return { case_label_with_refinements(case, cases): v } -def narrow_i64_to_i32(i): +def wrap_i64_to_i32(i): assert(0 <= i < (1 << 64)) - trap_if(i >= (1 << 32)) 
- return i + return i % (1 << 32) # diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index 8f270bd..1d85edd 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -99,20 +99,20 @@ def test_name(): test(t, [0], {'a':False,'b':False}) test(t, [2], {'a':False,'b':True}) test(t, [3], {'a':True,'b':True}) -test(t, [4], None) +test(t, [4], {'a':False,'b':False}) test(Flags([str(i) for i in range(33)]), [0xffffffff,0x1], { str(i):True for i in range(33) }) t = Variant([Case('x',U8()),Case('y',Float32()),Case('z',Unit())]) test(t, [0,42], {'x': 42}) -test(t, [0,256], None) +test(t, [0,256], {'x': 0}) test(t, [1,0x4048f5c3], {'y': 3.140000104904175}) test(t, [2,0xffffffff], {'z': {}}) t = Union([U32(),U64()]) test(t, [0,42], {'0':42}) -test(t, [0,(1<<35)], None) +test(t, [0,(1<<35)], {'0':0}) test(t, [1,(1<<35)], {'1':(1<<35)}) t = Union([Float32(), U64()]) test(t, [0,0x4048f5c3], {'0': 3.140000104904175}) -test(t, [0,(1<<35)], None) +test(t, [0,(1<<35)], {'0': 0}) test(t, [1,(1<<35)], {'1': (1<<35)}) t = Union([Float64(), U64()]) test(t, [0,0x40091EB851EB851F], {'0': 3.14}) @@ -121,7 +121,7 @@ def test_name(): t = Union([U8()]) test(t, [0,42], {'0':42}) test(t, [1,256], None) -test(t, [0,256], None) +test(t, [0,256], {'0':0}) t = Union([Tuple([U8(),Float32()]), U64()]) test(t, [0,42,3.14], {'0': {'0':42, '1':3.14}}) test(t, [1,(1<<35),0], {'1': (1<<35)}) @@ -145,15 +145,15 @@ def test_pairs(t, pairs): for arg,expect in pairs: test(t, [arg], expect) -test_pairs(Bool(), [(0,False),(1,True),(2,None),(4294967295,None)]) -test_pairs(U8(), [(127,127),(128,128),(255,255),(256,None), - (4294967295,None),(4294967168,None),(4294967167,None)]) -test_pairs(S8(), [(127,127),(128,None),(255,None),(256,None), - (4294967295,-1),(4294967168,-128),(4294967167,None)]) -test_pairs(U16(), [(32767,32767),(32768,32768),(65535,65535),(65536,None), - ((1<<32)-1,None),((1<<32)-32768,None),((1<<32)-32769,None)]) 
-test_pairs(S16(), [(32767,32767),(32768,None),(65535,None),(65536,None), - ((1<<32)-1,-1),((1<<32)-32768,-32768),((1<<32)-32769,None)]) +test_pairs(Bool(), [(0,False),(1,True),(2,True),(4294967295,True)]) +test_pairs(U8(), [(127,127),(128,128),(255,255),(256,0), + (4294967295,255),(4294967168,128),(4294967167,127)]) +test_pairs(S8(), [(127,127),(128,-128),(255,-1),(256,0), + (4294967295,-1),(4294967168,-128),(4294967167,127)]) +test_pairs(U16(), [(32767,32767),(32768,32768),(65535,65535),(65536,0), + ((1<<32)-1,65535),((1<<32)-32768,32768),((1<<32)-32769,32767)]) +test_pairs(S16(), [(32767,32767),(32768,-32768),(65535,-1),(65536,0), + ((1<<32)-1,-1),((1<<32)-32768,-32768),((1<<32)-32769,32767)]) test_pairs(U32(), [((1<<31)-1,(1<<31)-1),(1<<31,1<<31),(((1<<32)-1),(1<<32)-1)]) test_pairs(S32(), [((1<<31)-1,(1<<31)-1),(1<<31,-(1<<31)),((1<<32)-1,-1)]) test_pairs(U64(), [((1<<63)-1,(1<<63)-1), (1<<63,1<<63), ((1<<64)-1,(1<<64)-1)]) @@ -242,7 +242,7 @@ def test_heap(t, expect, args, byte_array): test_heap(List(Unit()), [{},{},{}], [0,3], []) test_heap(List(Bool()), [True,False,True], [0,3], [1,0,1]) -test_heap(List(Bool()), None, [0,3], [1,0,2]) +test_heap(List(Bool()), [True,False,True], [0,3], [1,0,2]) test_heap(List(Bool()), [True,False,True], [3,3], [0xff,0xff,0xff, 1,0,1]) test_heap(List(U8()), [1,2,3], [0,3], [1,2,3]) test_heap(List(U16()), [1,2,3], [0,3], [1,0, 2,0, 3,0 ]) @@ -286,22 +286,25 @@ def test_heap(t, expect, args, byte_array): t = List(Flags(['a','b'])) test_heap(t, [{'a':False,'b':False},{'a':False,'b':True},{'a':True,'b':True}], [0,3], [0,2,3]) -test_heap(t, None, [0,3], +test_heap(t, [{'a':False,'b':False},{'a':False,'b':True},{'a':False,'b':False}], [0,3], [0,2,4]) t = List(Flags([str(i) for i in range(9)])) -test_heap(t, [{ str(i):b for i in range(9) } for b in [True,False]], [0,2], +v = [{ str(i):b for i in range(9) } for b in [True,False]] +test_heap(t, v, [0,2], [0xff,0x1, 0,0]) -test_heap(t, None, [0,2], +test_heap(t, v, [0,2], [0xff,0x3, 
0,0]) t = List(Flags([str(i) for i in range(17)])) -test_heap(t, [{ str(i):b for i in range(17) } for b in [True,False]], [0,2], +v = [{ str(i):b for i in range(17) } for b in [True,False]] +test_heap(t, v, [0,2], [0xff,0xff,0x1,0, 0,0,0,0]) -test_heap(t, None, [0,2], +test_heap(t, v, [0,2], [0xff,0xff,0x3,0, 0,0,0,0]) t = List(Flags([str(i) for i in range(33)])) -test_heap(t, [{ str(i):b for i in range(33) } for b in [True,False]], [0,2], +v = [{ str(i):b for i in range(33) } for b in [True,False]] +test_heap(t, v, [0,2], [0xff,0xff,0xff,0xff,0x1,0,0,0, 0,0,0,0,0,0,0,0]) -test_heap(t, None, [0,2], +test_heap(t, v, [0,2], [0xff,0xff,0xff,0xff,0x3,0,0,0, 0,0,0,0,0,0,0,0]) def test_flatten(t, params, results): From 3781cfe4b4a217b6fa4d165b7efff385dcdfb928 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 2 Jun 2022 17:44:00 -0500 Subject: [PATCH 074/301] Update presentation link --- design/mvp/Explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 2aa5dee..8a21b26 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -1110,4 +1110,4 @@ and will be added over the coming months to complete the MVP proposal: [Scoping and Layering]: https://docs.google.com/presentation/d/1PSC3Q5oFsJEaYyV5lNJvVgh-SNxhySWUqZ6puyojMi8 [Resource and Handle Types]: https://docs.google.com/presentation/d/1ikwS2Ps-KLXFofuS5VAs6Bn14q4LBEaxMjPfLj61UZE -[Future and Stream Types]: https://docs.google.com/presentation/d/1WtnO_WlaoZu1wp4gI93yc7T_fWTuq3RZp8XUHlrQHl4 +[Future and Stream Types]: https://docs.google.com/presentation/d/1MNVOZ8hdofO3tI0szg_i-Yoy0N2QPU2C--LzVuoGSlE From 051e5bf3dca34bba13349477fd9122085a5dbe2d Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Fri, 3 Jun 2022 09:40:37 -0500 Subject: [PATCH 075/301] Add to --- design/mvp/Binary.md | 7 ++++--- design/mvp/Explainer.md | 3 ++- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md 
index d3c7918..668b283 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -183,11 +183,12 @@ param ::= 0x00 t: => (param t) | 0x01 n: t: => (param n t) componenttype ::= 0x41 cd*:vec() => (component cd*) instancetype ::= 0x42 id*:vec() => (instance id*) -componentdecl ::= 0x00 id: => id +componentdecl ::= 0x03 id: => id | id: => id -instancedecl ::= 0x01 t: => t +instancedecl ::= 0x00 t: => t + | 0x01 t: => t | 0x02 a: => a - | 0x03 ed: => ed + | 0x04 ed: => ed importdecl ::= n: ed: => (import n ed) exportdecl ::= n: ed: => (export n ed) externdesc ::= 0x00 0x11 i: => (core module (type i)) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 8a21b26..971d689 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -430,7 +430,8 @@ componenttype ::= (component *) instancetype ::= (instance *) componentdecl ::= | -instancedecl ::= +instancedecl ::= core-prefix() + | | | importdecl ::= (import bind-id()) From 7481ec903f0c7fdf52ce29ae1f9f597f65904d27 Mon Sep 17 00:00:00 2001 From: Peter Huene Date: Sun, 5 Jun 2022 14:04:53 -0700 Subject: [PATCH 076/301] Update post-return validation in binary spec. This commit updates the post-return validation to align it with what is specified in the explainer. The post-return option should accept the return values from the core function being lifted as parameters and itself have no return values. --- design/mvp/Binary.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index d3c7918..4181fd3 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -240,8 +240,10 @@ Notes: access to `memory` or `realloc`, then validation requires these options to be present. If present, `realloc` must have core type `(func (param i32 i32 i32 i32) (result i32))`. -* `post-return` is always optional, but, if present, must have core type - `(func)`. 
+* The `post-return` option is only valid for `canon lift` and it is always + optional; if present, it must have core type `(func (param ...))` where the + number and types of the parameters must match the results of the core function + being lifted and itself have no result values. ## Start Definitions From 0ab926ded9bf73e554d18e87f88884d3b5c4cede Mon Sep 17 00:00:00 2001 From: Peter Huene Date: Mon, 6 Jun 2022 11:43:07 -0700 Subject: [PATCH 077/301] Clarify type validation rules in the binary format. This commit clarifies that names within certain type definitions must be unique. It also adds validation of the refines clause in a variant case. --- design/mvp/Binary.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 9d4be8f..b3469fd 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -211,6 +211,11 @@ Notes: in type definitions from containing components. * Validation of `externdesc` requires the various `typeidx` type constructors to match the preceding `sort`. +* Validation of record field names, variant case names, flag names, and enum case + names requires that the name be unique for the record, variant, flags, or enum + type definition. +* Validation of the optional `refines` clause of a variant case requires that + the case index is within bounds for the variant type's cases. ## Canonical Definitions From 7093bbe8697a9a4ed3b3bb10f1bdbb84531a0ac3 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 7 Jun 2022 14:20:45 -0500 Subject: [PATCH 078/301] Tweak text format of refines clause --- design/mvp/Binary.md | 3 ++- design/mvp/Explainer.md | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index b3469fd..1d89b12 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -215,7 +215,8 @@ Notes: names requires that the name be unique for the record, variant, flags, or enum type definition. 
* Validation of the optional `refines` clause of a variant case requires that - the case index is within bounds for the variant type's cases. + the case index is less than the current case's index (and therefore + cases are acyclic). ## Canonical Definitions diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 971d689..cd7841d 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -415,7 +415,7 @@ defvaltype ::= unit | float32 | float64 | char | string | (record (field )*) - | (variant (case (refines )?)+) + | (variant (case ? (refines )?)+) | (list ) | (tuple *) | (flags *) From 7d29dee0ca8d6794f46ad6df1458d9f1a60dae3c Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Wed, 8 Jun 2022 09:24:27 -0700 Subject: [PATCH 079/301] Adjust canonical ABI option syntax to be less verbose Currently the syntax for specifying a memory with a canonical option is: (memory (core memory $memory)) and optionally instance alias sugar can also be used: (memory (core memory $libc "memory")) This PR proposes changing these two syntaxes to: (memory $memory) (memory $libc "memory") with the theory that the "core" part is already implied by the canonical option itself and otherwise saying "memory" twice is redundant. --- design/mvp/Explainer.md | 12 ++++++------ .../mvp/examples/SharedEverythingDynamicLinking.md | 10 +++++----- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index cd7841d..e8043c5 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -616,9 +616,9 @@ canon ::= (canon lift core-prefix() * bind-id()) - | (realloc core-prefix()) - | (post-return core-prefix()) + | (memory ) + | (realloc ) + | (post-return ) ``` While the production `externdesc` accepts any `sort`, the validation rules for `canon lift` would only allow the `func` sort. In the future, other sorts @@ -697,7 +697,7 @@ takes a string, does some logging, then returns a string. 
(core instance $libc (instantiate $Libc)) (core func $log (canon lower (func $logging "log") - (memory (core memory $libc "mem")) (realloc (core func $libc "realloc")) + (memory (core memory $libc "mem")) (realloc (func $libc "realloc")) )) (core module $Main (import "libc" "memory" (memory 1)) @@ -713,7 +713,7 @@ takes a string, does some logging, then returns a string. )) (func $run (param string) (result string) (canon lift (core func $main "run") - (memory (core memory $libc "mem")) (realloc (core func $libc "realloc")) + (memory $libc "mem") (realloc (func $libc "realloc")) )) (export "run" (func $run)) ) @@ -770,7 +770,7 @@ exported string at instantiation time: (core instance $main (instantiate $Main (with "libc" (instance $libc)))) (func $start (param string) (result string) (canon lift (core func $main "start") - (memory (core memory $libc "mem")) (realloc (core func $libc "realloc")) + (memory $libc "mem") (realloc (func $libc "realloc")) )) (start $start (value $name) (result (value $greeting))) (export "greeting" (value $greeting)) diff --git a/design/mvp/examples/SharedEverythingDynamicLinking.md b/design/mvp/examples/SharedEverythingDynamicLinking.md index 2ccfd4b..247d65f 100644 --- a/design/mvp/examples/SharedEverythingDynamicLinking.md +++ b/design/mvp/examples/SharedEverythingDynamicLinking.md @@ -159,7 +159,7 @@ would look like: )) (func $zip (param (list u8)) (result (list u8)) (canon lift (func $main "zip") - (memory (memory $libc "memory")) (realloc (func $libc "realloc")) + (memory $libc "memory") (realloc (func $libc "realloc")) )) (export "zip" (func $zip)) ) @@ -238,7 +238,7 @@ component-aware `clang`, the resulting component would look like: )) (func $transform (param (list u8)) (result (list u8)) (canon lift (func $main "transform") - (memory (memory $libc "memory")) (realloc (func $libc "realloc")) + (memory $libc "memory") (realloc (func $libc "realloc")) )) (export "transform" (func $transform)) ) @@ -285,11 +285,11 @@ components. 
The resulting component could look like: (instance $libc (instantiate (module $Libc))) (func $zip (canon lower (func $zipper "zip") - (memory (memory $libc "memory")) (realloc (func $libc "realloc")) + (memory $libc "memory") (realloc (func $libc "realloc")) )) (func $transform (canon lower (func $imgmgk "transform") - (memory (memory $libc "memory")) (realloc (func $libc "realloc")) + (memory $libc "memory") (realloc (func $libc "realloc")) )) (instance $main (instantiate (module $Main) (with "libc" (instance $libc)) @@ -298,7 +298,7 @@ components. The resulting component could look like: )) (func $run (param string) (result string) (canon lift (func $main "run") - (memory (memory $libc "memory")) (realloc (func $libc "realloc")) + (memory $libc "memory") (realloc (func $libc "realloc")) )) (export "run" (func $run)) ) From 9248a5f92ac49a5da9ac006c6860aa2ad560f8f4 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 9 Jun 2022 15:17:18 -0500 Subject: [PATCH 080/301] Add to --- design/mvp/Binary.md | 8 ++++-- design/mvp/Explainer.md | 55 ++++++++++++++++++++++++----------------- 2 files changed, 38 insertions(+), 25 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 1d89b12..9d4594a 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -131,6 +131,7 @@ core:deftype ::= ft: => ft ( core:moduletype ::= 0x50 md*:vec() => (module md*) core:moduledecl ::= 0x00 i: => i | 0x01 t: => t + | 0x02 a: => a | 0x03 e: => e core:importdecl ::= i: => i core:exportdecl ::= n: d: => (export n d) @@ -140,8 +141,11 @@ Notes: * Validation of `core:moduledecl` (currently) rejects `core:moduletype` definitions inside `type` declarators (i.e., nested core module types). * As described in the explainer, each module type is validated with an - initially-empty type index space. Outer aliases can be used to pull - in type definitions from containing components. + initially-empty type index space. 
+* Validation of `alias` declarators only allows `outer` `type` aliases. + Validation of these aliases cannot see beyond the enclosing core type index + space. Since core modules and core module types cannot nest in the MVP, this + means that the maximum `ct` in an MVP `alias` declarator is `1`. ``` type ::= dt: => (type dt) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index e8043c5..1ed1689 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -357,6 +357,7 @@ core:deftype ::= (WebAssembly 1.0) core:moduletype ::= (module *) core:moduledecl ::= | + | | core:importdecl ::= (import ) core:exportdecl ::= (export ) @@ -387,7 +388,7 @@ type imports local to the module type itself. For example, in the future, the following module type would be expressible: ``` (component $C - (type $M (module + (core type $M (module (import "" "T" (type $T)) (type $PairT (struct (field (ref $T)) (field (ref $T)))) (export "make_pair" (func (param (ref $T)) (result (ref $PairT)))) @@ -398,6 +399,36 @@ In this example, `$M` has a distinct type index space from `$C`, where element 0 is the imported type, element 1 is the `struct` type, and element 2 is an implicitly-created `func` type referring to both. +Lastly, the `core:alias` module declarator allows a module type definition to +reuse (rather than redefine) type definitions in the enclosing component's core +type index space via `outer` `type` alias. In the MVP, validation restricts +`core:alias` module declarators to *only* allow `outer` `type` aliases but, +in the future, more kinds of aliases would be meaningful and allowed. + +As an example, the following component defines two semantically-equivalent +module types, where the former defines the function type via `type` declarator +and the latter refers via `alias` declarator. 
Note that, since core type +definitions are validated in a Core WebAssembly context that doesn't "know" +anything about components, the module type `$C2` can't name `$C` directly in +the text format but must instead use the appropriate [de Bruijn] index (`1`). +In both cases, the defined/aliased function type is given index `0` since +module types always start with an empty type index space. +```wasm +(component $C + (core type $C1 (module + (type (func (param i32) (result i32))) + (import "a" (func (type 0))) + (export "b" (func (type 0))) + )) + (core type $F (func (param i32) (result i32))) + (core type $C2 (module + (alias outer 1 $F (type)) + (import "a" (func (type 0))) + (export "b" (func (type 0))) + )) +) +``` + Component-level type definitions are symmetric to core-level type definitions, but use a completely different set of value types. Unlike [`core:valtype`] which is low-level and assumes a shared linear memory for communicating @@ -542,28 +573,6 @@ When [resource and handle types] are added to the explainer, `typebound` will be extended with a `sub` option (symmetric to the [type-imports] proposal) that allows importing and exporting *abstract* types. -Lastly, component and instance types also include an `alias` declarator for -projecting the exports out of imported instances and sharing types with outer -components. As an example, the following component defines two equivalent -component types, where the former defines the function type via `type` -declarator and the latter via `alias` declarator. In both cases, the type is -given index `0` since component types start with an empty type index space. 
-```wasm -(component $C - (type $C1 (component - (type (func (param string) (result string))) - (import "a" (func (type 0))) - (export "b" (func (type 0))) - )) - (type $F (func (param string) (result string))) - (type $C2 (component - (alias outer $C $F (type)) - (import "a" (func (type 0))) - (export "b" (func (type 0))) - )) -) -``` - With what's defined so far, we can define component types using a mix of inline and out-of-line type definitions: ```wasm From 488b5d7f59f01b859687ad2b0e1e8dc1bab5cd64 Mon Sep 17 00:00:00 2001 From: Liam Murphy Date: Mon, 13 Jun 2022 20:14:26 +1000 Subject: [PATCH 081/301] Clarify conversion from JS values to flags Resolves #26 Clarify that the booleans are optional, and default to false when not provided. --- design/mvp/Explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 1ed1689..195c782 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -958,7 +958,7 @@ At a high level, the additional coercions would be: | `list` | same as [`sequence`] | same as [`sequence`] | | `string` | same as [`USVString`] | same as [`USVString`] | | `tuple` | TBD: maybe a [JS Tuple]? | TBD | -| `flags` | TBD: maybe a [JS Record]? | same as [`dictionary`] of `boolean` fields | +| `flags` | TBD: maybe a [JS Record]? | same as [`dictionary`] of optional `boolean` fields with default values of `false` | | `enum` | same as [`enum`] | same as [`enum`] | | `option` | same as [`T?`] | same as [`T?`] | | `union` | same as [`union`] | same as [`union`] | From 85acf897c3972ec0b430d9c3086b14f6de1de80b Mon Sep 17 00:00:00 2001 From: Peter Huene Date: Mon, 13 Jun 2022 12:32:02 -0700 Subject: [PATCH 082/301] Fix binary encoding for outer core aliases. This commit is a follow-up to #44 that was missed in review. For outer aliases in module type decls to work, `core:aliastarget` must have an "outer" variant. 
The validation rules for this variant should only permit `type` sort. --- design/mvp/Binary.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 9d4594a..fdeefe7 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -102,6 +102,7 @@ Notes: ``` core:alias ::= sort: target: => (core alias target (sort)) core:aliastarget ::= 0x00 i: n: => export i n + | 0x01 ct: idx: => outer ct idx alias ::= sort: target: => (alias target (sort)) aliastarget ::= 0x00 i: n: => export i n @@ -115,8 +116,10 @@ Notes: of enclosing components and `i` is validated to be a valid index in the `sort` index space of the `i`th enclosing component (counting outward, starting with `0` referring to the current component). -* For `outer` aliases, validation restricts the `sort` of the `aliastarget` - to one of `type`, `module` or `component`. +* For `outer` aliases of `core:aliastarget`, validation restricts the `sort` to + `type`. +* For `outer` aliases of `aliastarget`, validation restricts the `sort` to one + of `type`, `module` or `component`. ## Type Definitions From 03af667c99f42f11aef284a22d21697a8c39ff19 Mon Sep 17 00:00:00 2001 From: Joel Dice Date: Fri, 17 Jun 2022 11:05:38 -0600 Subject: [PATCH 083/301] clarify inline sugar syntax in Explainer.md I had trouble understanding the examples in the `Aliases` section, partly due to not seeing how the abstract syntax in the prose matched up with the concrete syntax in the examples. This change will hopefully save others from the same confusion. Also, I corrected a couple of misspellings. 
--- design/mvp/Explainer.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 195c782..a4c6f9e 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -80,7 +80,7 @@ inner `func` doesn't need a `core` prefix; the `core` token is used to mark the *transition* from parsing component definitions into core definitions. The [`core:module`] production is unmodified by the Component Model and thus -components embed Core WebAssemby (text and binary format) modules as currently +components embed Core WebAssembly (text and binary format) modules as currently standardized, allowing reuse of an unmodified Core WebAssembly implementation. The next two productions, `core:instance` and `core:alias`, are not currently included in Core WebAssembly, but would be if Core WebAssembly adopted the @@ -265,7 +265,7 @@ via some kind of "`stateful`" type attribute.) Both kinds of aliases come with syntactic sugar for implicitly declaring them inline: -For `export` aliases, the inline sugar has the form `(sort <instanceidx> <name>+)` +For `export` aliases, the inline sugar has the form `(<sort> <instanceidx> <name>+)` and can be used in place of a `sortidx` or any sort-specific index (such as a `typeidx` or `funcidx`). For example, the following snippet uses two inline function aliases: @@ -885,7 +885,7 @@ modules and components to be distinguished by the first 8 bytes of the binary (splitting the 32-bit [`core:version`] field into a 16-bit `version` field and a 16-bit `layer` field with `0` for modules and `1` for components). -Once compiled, a `WebAssemby.Component` could be instantiated using the +Once compiled, a `WebAssembly.Component` could be instantiated using the existing JS API `WebAssembly.instantiate(Streaming)`.
Since components have the same basic import/export structure as modules, this mostly just means extending the [*read the imports*] logic to support single-level imports as well as From 3361a13038c6179846853cea05e5eef8257e4d13 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Fri, 17 Jun 2022 14:06:10 -0500 Subject: [PATCH 084/301] Use sub-32-bit int conversions in JS API #48 --- design/mvp/Explainer.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 195c782..8e04041 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -947,8 +947,8 @@ At a high level, the additional coercions would be: | ---- | ----------- | -------------------- | | `unit` | `null` | accept everything | | `bool` | `true` or `false` | `ToBoolean` | -| `s8`, `s16`, `s32` | as a Number value | `ToInt32` | -| `u8`, `u16`, `u32` | as a Number value | `ToUint32` | +| `s8`, `s16`, `s32` | as a Number value | `ToInt8`, `ToInt16`, `ToInt32` | +| `u8`, `u16`, `u32` | as a Number value | `ToUint8`, `ToUint16`, `ToUint32` | | `s64` | as a BigInt value | `ToBigInt64` | | `u64` | as a BigInt value | `ToBigUint64` | | `float32`, `float64` | as a Number, mapping the canonical NaN to [JS NaN] | `ToNumber` mapping [JS NaN] to the canonical NaN | From 42a9939eb96e79e7d660f9ca0b68b8bbffa0544f Mon Sep 17 00:00:00 2001 From: Brett Cannon Date: Fri, 17 Jun 2022 15:10:45 -0700 Subject: [PATCH 085/301] Update URL for the RLBox docs --- design/high-level/UseCases.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/high-level/UseCases.md b/design/high-level/UseCases.md index a65e76c..7aa2cc6 100644 --- a/design/high-level/UseCases.md +++ b/design/high-level/UseCases.md @@ -325,7 +325,7 @@ to call imports, which could break other components' single-threaded assumptions the imported function to have been explicitly `shared` and thus callable from any `fork`ed thread. 
-[RLBox]: https://plsyssec.github.io/rlbox_sandboxing_api/sphinx/ +[RLBox]: https://docs.rlbox.dev/ [Principle of Least Authority]: https://en.wikipedia.org/wiki/Principle_of_least_privilege [Modular Programming]: https://en.wikipedia.org/wiki/Modular_programming [start function]: https://webassembly.github.io/spec/core/intro/overview.html#semantic-phases From cc35c094c75ab474533b7edd0962d8db21a2241b Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Tue, 21 Jun 2022 11:42:31 -0700 Subject: [PATCH 086/301] Change utf16+latin1 to always use alignment of 2 Currently the `realloc` signature doesn't allow for specifying both a new and an old alignment so switching the alignment isn't valid for existing implementations. Instead of starting with alignment 1 this instead starts with alignment 2 for allocations related to latin1+utf16 strings and keeps the alignment at 2 even if the latin1 encoding ends up being used. --- design/mvp/CanonicalABI.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 93b0c7e..1b139a4 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -585,7 +585,7 @@ bytes): ```python def store_string_to_latin1_or_utf16(opts, src, src_code_units): assert(src_code_units <= MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 1, src_code_units) + ptr = opts.realloc(0, 0, 2, src_code_units) dst_byte_length = 0 for usv in src: if ord(usv) < (1 << 8): @@ -605,7 +605,7 @@ def store_string_to_latin1_or_utf16(opts, src, src_code_units): tagged_code_units = int(len(encoded) / 2) | UTF16_TAG return (ptr, tagged_code_units) if dst_byte_length < src_code_units: - ptr = opts.realloc(ptr, src_code_units, 1, dst_byte_length) + ptr = opts.realloc(ptr, src_code_units, 2, dst_byte_length) return (ptr, dst_byte_length) ``` From 08a068fed14a50a76a594484cafe0ade4236d55a Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 28 Jun 2022 12:41:26 -0500 Subject: [PATCH 087/301] 
Tweak may_enter rules to ensure lockdown-on-trap #55 --- design/mvp/CanonicalABI.md | 40 ++++++++++--------------- design/mvp/canonical-abi/definitions.py | 8 ++--- 2 files changed, 17 insertions(+), 31 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 93b0c7e..b543c27 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1134,6 +1134,7 @@ Given the above closure arguments, `canon_lift` is defined: ```python def canon_lift(callee_opts, callee_instance, callee, functype, args): trap_if(not callee_instance.may_enter) + callee_instance.may_enter = False assert(callee_instance.may_leave) callee_instance.may_leave = False @@ -1145,12 +1146,11 @@ def canon_lift(callee_opts, callee_instance, callee, functype, args): except CoreWebAssemblyException: trap() - callee_instance.may_enter = False [result] = lift(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), [functype.result]) def post_return(): - callee_instance.may_enter = True if callee_opts.post_return is not None: callee_opts.post_return(flat_results) + callee_instance.may_enter = True return (result, post_return) ``` @@ -1161,6 +1161,14 @@ boundaries. Thus, if a component wishes to signal an error, it must use some sort of explicit type such as `expected` (whose `error` case particular language bindings may choose to map to and from exceptions). +The clearing of `may_enter` for the entire duration of `canon_lift` and the +fact that `canon_lift` brackets all calls into a component ensure that +components cannot be reentered, which is a [component invariant]. Furthermore, +because `may_enter` is not cleared on the exceptional exit path taken by +`trap()`, if there is a trap during Core WebAssembly execution or +lifting/lowering, the component is left permanently un-enterable, ensuring the +lockdown-after-trap [component invariant]. 
+ The contract assumed by `canon_lift` (and ensured by `canon_lower` below) is that the caller of `canon_lift` *must* call `post_return` right after lowering `result`. This ordering ensures that the engine can reliably copy directly from @@ -1170,22 +1178,12 @@ the callee's linear memory (read by `lift`) into the caller's linear memory freed and so the engine would need to eagerly make an intermediate copy in `lift`. -Even assuming this `post_return` contract, if the callee could be re-entered -by the caller in the middle of the caller's `lower` (e.g., via `realloc`), then -either the engine has to make an eager intermediate copy in `lift` *or* the -Canonical ABI would have to specify a precise interleaving of side effects -which is more complicated and would inhibit some optimizations. Instead, the -`may_enter` guard set before `lift` and cleared in `post_return` prevents this -re-entrance. Thus, it is the combination of `post_return` and the re-entrance -guard that ensures `lift` does not need to make an eager copy. - The `may_leave` guard wrapping the lowering of parameters conservatively -ensures that `realloc` calls during lowering do not accidentally call imports -that accidentally re-enter the instance that lifted the same parameters. -While the `may_enter` guards of *those* component instances would also prevent -this re-entrance, it would be an error that only manifested in certain -component linking configurations, hence the eager error helps ensure -compositionality. +ensures that `realloc` calls during lowering do not call imports that +indirectly re-enter the instance that lifted the same parameters. While the +`may_enter` guards of *those* component instances would also prevent this +re-entrance, it would be an error that only manifested in certain component +linking configurations, hence the eager error helps ensure compositionality. 
### `lower` @@ -1210,9 +1208,6 @@ and, when called from Core WebAssembly code, calls `canon_lower`, which is defin def canon_lower(caller_opts, caller_instance, callee, functype, flat_args): trap_if(not caller_instance.may_leave) - assert(caller_instance.may_enter) - caller_instance.may_enter = False - flat_args = ValueIter(flat_args) args = lift(caller_opts, MAX_FLAT_PARAMS, flat_args, functype.params) @@ -1224,15 +1219,10 @@ def canon_lower(caller_opts, caller_instance, callee, functype, flat_args): post_return() - caller_instance.may_enter = True return flat_results ``` The definitions of `canon_lift` and `canon_lower` are mostly symmetric (swapping lifting and lowering), with a few exceptions: -* The calling instance cannot be re-entered over the course of the entire call, - not just while lifting the parameters. This ensures not just the needs of the - Canonical ABI, but the general non-re-entrance expectations outlined in the - [component invariants]. * The caller does not need a `post-return` function since the Core WebAssembly caller simply regains control when `canon_lower` returns, allowing it to free (or not) any memory passed as `flat_args`. 
diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 6db94f0..7b3f9a2 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -872,6 +872,7 @@ class Instance: def canon_lift(callee_opts, callee_instance, callee, functype, args): trap_if(not callee_instance.may_enter) + callee_instance.may_enter = False assert(callee_instance.may_leave) callee_instance.may_leave = False @@ -883,12 +884,11 @@ def canon_lift(callee_opts, callee_instance, callee, functype, args): except CoreWebAssemblyException: trap() - callee_instance.may_enter = False [result] = lift(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), [functype.result]) def post_return(): - callee_instance.may_enter = True if callee_opts.post_return is not None: callee_opts.post_return(flat_results) + callee_instance.may_enter = True return (result, post_return) @@ -897,9 +897,6 @@ def post_return(): def canon_lower(caller_opts, caller_instance, callee, functype, flat_args): trap_if(not caller_instance.may_leave) - assert(caller_instance.may_enter) - caller_instance.may_enter = False - flat_args = ValueIter(flat_args) args = lift(caller_opts, MAX_FLAT_PARAMS, flat_args, functype.params) @@ -911,5 +908,4 @@ def canon_lower(caller_opts, caller_instance, callee, functype, flat_args): post_return() - caller_instance.may_enter = True return flat_results From 712e038abf597687de3398ecd79014249cb12e11 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 30 Jun 2022 14:57:38 -0500 Subject: [PATCH 088/301] Ensure 2-byte latin1+utf16 alignment in all cases Resolves #59 --- design/mvp/canonical-abi/definitions.py | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 7b3f9a2..05c8be1 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -421,31 +421,31 @@ def 
store_string_into_range(opts, v): match opts.string_encoding: case 'utf8': match src_simple_encoding: - case 'utf8' : return store_string_copy(opts, src, src_code_units, 1, 'utf-8') + case 'utf8' : return store_string_copy(opts, src, src_code_units, 1, 1, 'utf-8') case 'utf16' : return store_utf16_to_utf8(opts, src, src_code_units) case 'latin1' : return store_latin1_to_utf8(opts, src, src_code_units) case 'utf16': match src_simple_encoding: case 'utf8' : return store_utf8_to_utf16(opts, src, src_code_units) - case 'utf16' : return store_string_copy(opts, src, src_code_units, 2, 'utf-16-le') - case 'latin1' : return store_string_copy(opts, src, src_code_units, 2, 'utf-16-le') + case 'utf16' : return store_string_copy(opts, src, src_code_units, 2, 2, 'utf-16-le') + case 'latin1' : return store_string_copy(opts, src, src_code_units, 2, 2, 'utf-16-le') case 'latin1+utf16': match src_encoding: case 'utf8' : return store_string_to_latin1_or_utf16(opts, src, src_code_units) case 'utf16' : return store_string_to_latin1_or_utf16(opts, src, src_code_units) case 'latin1+utf16' : match src_simple_encoding: - case 'latin1' : return store_string_copy(opts, src, src_code_units, 1, 'latin-1') + case 'latin1' : return store_string_copy(opts, src, src_code_units, 1, 2, 'latin-1') case 'utf16' : return store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units) # MAX_STRING_BYTE_LENGTH = (1 << 31) - 1 -def store_string_copy(opts, src, src_code_units, dst_code_unit_size, dst_encoding): +def store_string_copy(opts, src, src_code_units, dst_code_unit_size, dst_alignment, dst_encoding): dst_byte_length = dst_code_unit_size * src_code_units trap_if(dst_byte_length > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, dst_code_unit_size, dst_byte_length) + ptr = opts.realloc(0, 0, dst_alignment, dst_byte_length) encoded = src.encode(dst_encoding) assert(dst_byte_length == len(encoded)) opts.memory[ptr : ptr+len(encoded)] = encoded From 799a7f7f2819ce7e0cfb9ac76fde2d9c09f10d00 Mon 
Sep 17 00:00:00 2001 From: Luke Wagner Date: Fri, 1 Jul 2022 17:31:06 -0500 Subject: [PATCH 089/301] Re-sync CanonicalABI.md and canonical_abi/definitions.py --- design/mvp/CanonicalABI.md | 14 +++++++------- design/mvp/canonical-abi/definitions.py | 4 ++-- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index ae01f1d..bb0ddac 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -493,21 +493,21 @@ def store_string_into_range(opts, v): match opts.string_encoding: case 'utf8': match src_simple_encoding: - case 'utf8' : return store_string_copy(opts, src, src_code_units, 1, 'utf-8') + case 'utf8' : return store_string_copy(opts, src, src_code_units, 1, 1, 'utf-8') case 'utf16' : return store_utf16_to_utf8(opts, src, src_code_units) case 'latin1' : return store_latin1_to_utf8(opts, src, src_code_units) case 'utf16': match src_simple_encoding: case 'utf8' : return store_utf8_to_utf16(opts, src, src_code_units) - case 'utf16' : return store_string_copy(opts, src, src_code_units, 2, 'utf-16-le') - case 'latin1' : return store_string_copy(opts, src, src_code_units, 2, 'utf-16-le') + case 'utf16' : return store_string_copy(opts, src, src_code_units, 2, 2, 'utf-16-le') + case 'latin1' : return store_string_copy(opts, src, src_code_units, 2, 2, 'utf-16-le') case 'latin1+utf16': match src_encoding: case 'utf8' : return store_string_to_latin1_or_utf16(opts, src, src_code_units) case 'utf16' : return store_string_to_latin1_or_utf16(opts, src, src_code_units) case 'latin1+utf16' : match src_simple_encoding: - case 'latin1' : return store_string_copy(opts, src, src_code_units, 1, 'latin-1') + case 'latin1' : return store_string_copy(opts, src, src_code_units, 1, 2, 'latin-1') case 'utf16' : return store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units) ``` @@ -517,10 +517,10 @@ byte after every Latin-1 byte). 
```python MAX_STRING_BYTE_LENGTH = (1 << 31) - 1 -def store_string_copy(opts, src, src_code_units, dst_code_unit_size, dst_encoding): +def store_string_copy(opts, src, src_code_units, dst_code_unit_size, dst_alignment, dst_encoding): dst_byte_length = dst_code_unit_size * src_code_units trap_if(dst_byte_length > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, dst_code_unit_size, dst_byte_length) + ptr = opts.realloc(0, 0, dst_alignment, dst_byte_length) encoded = src.encode(dst_encoding) assert(dst_byte_length == len(encoded)) opts.memory[ptr : ptr+len(encoded)] = encoded @@ -863,7 +863,7 @@ def lift_flat_signed(vi, core_width, t_width): assert(0 <= i < (1 << core_width)) i %= (1 << t_width) if i >= (1 << (t_width - 1)): - return i - (1 << (t_width - 1)) + return i - (1 << t_width) return i ``` diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 05c8be1..ed9490a 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -492,7 +492,7 @@ def store_utf8_to_utf16(opts, src, src_code_units): def store_string_to_latin1_or_utf16(opts, src, src_code_units): assert(src_code_units <= MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 1, src_code_units) + ptr = opts.realloc(0, 0, 2, src_code_units) dst_byte_length = 0 for usv in src: if ord(usv) < (1 << 8): @@ -512,7 +512,7 @@ def store_string_to_latin1_or_utf16(opts, src, src_code_units): tagged_code_units = int(len(encoded) / 2) | UTF16_TAG return (ptr, tagged_code_units) if dst_byte_length < src_code_units: - ptr = opts.realloc(ptr, src_code_units, 1, dst_byte_length) + ptr = opts.realloc(ptr, src_code_units, 2, dst_byte_length) return (ptr, dst_byte_length) # From a513df658be88448a60964cd6390d4ad89574d1f Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Fri, 1 Jul 2022 18:13:44 -0500 Subject: [PATCH 090/301] Loosen the reentrance rules to allow parent components to wrap child components' imports and exports --- 
design/mvp/CanonicalABI.md | 76 +++++++++++++------------ design/mvp/canonical-abi/definitions.py | 12 ++-- design/mvp/canonical-abi/run_tests.py | 2 +- 3 files changed, 48 insertions(+), 42 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index bb0ddac..0323e55 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1132,9 +1132,12 @@ the outside world through an export. Given the above closure arguments, `canon_lift` is defined: ```python -def canon_lift(callee_opts, callee_instance, callee, functype, args): - trap_if(not callee_instance.may_enter) - callee_instance.may_enter = False +def canon_lift(callee_opts, callee_instance, callee, functype, args, called_as_export): + if called_as_export: + trap_if(not callee_instance.may_enter) + callee_instance.may_enter = False + else: + assert(not callee_instance.may_enter) assert(callee_instance.may_leave) callee_instance.may_leave = False @@ -1150,7 +1153,8 @@ def canon_lift(callee_opts, callee_instance, callee, functype, args): def post_return(): if callee_opts.post_return is not None: callee_opts.post_return(flat_results) - callee_instance.may_enter = True + if called_as_export: + callee_instance.may_enter = True return (result, post_return) ``` @@ -1161,29 +1165,20 @@ boundaries. Thus, if a component wishes to signal an error, it must use some sort of explicit type such as `expected` (whose `error` case particular language bindings may choose to map to and from exceptions). -The clearing of `may_enter` for the entire duration of `canon_lift` and the -fact that `canon_lift` brackets all calls into a component ensure that -components cannot be reentered, which is a [component invariant]. Furthermore, -because `may_enter` is not cleared on the exceptional exit path taken by -`trap()`, if there is a trap during Core WebAssembly execution or -lifting/lowering, the component is left permanently un-enterable, ensuring the -lockdown-after-trap [component invariant]. 
+The `called_as_export` parameter indicates whether `canon_lift` is being called +as part of a component export or whether this `canon_lift` is being called +internally (for example, by a child component instance). By clearing +`may_enter` for the duration of `canon_lift` when called as an export, the +dynamic traps ensure that components cannot be reentered, which is a [component +invariant]. Furthermore, because `may_enter` is not cleared on the exceptional +exit path taken by `trap()`, if there is a trap during Core WebAssembly +execution or lifting/lowering, the component is left permanently un-enterable, +ensuring the lockdown-after-trap [component invariant]. The contract assumed by `canon_lift` (and ensured by `canon_lower` below) is that the caller of `canon_lift` *must* call `post_return` right after lowering -`result`. This ordering ensures that the engine can reliably copy directly from -the callee's linear memory (read by `lift`) into the caller's linear memory -(written by `lower`). If `post_return` were called earlier (e.g., before -`canon_lift` returned), the callee's linear memory would have already been -freed and so the engine would need to eagerly make an intermediate copy in -`lift`. - -The `may_leave` guard wrapping the lowering of parameters conservatively -ensures that `realloc` calls during lowering do not call imports that -indirectly re-enter the instance that lifted the same parameters. While the -`may_enter` guards of *those* component instances would also prevent this -re-entrance, it would be an error that only manifested in certain component -linking configurations, hence the eager error helps ensure compositionality. +`result`. This ensures that `post_return` can be used to perform cleanup +actions after the lowering is complete. ### `lower` @@ -1230,19 +1225,26 @@ lifting and lowering), with a few exceptions: the caller pass in a pointer to caller-allocated memory as a final `i32` parameter. 
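
The `may_enter` bookkeeping that this patch adds to `canon_lift` can be sketched in isolation. This is a minimal illustration under stated assumptions, not the spec's definition: `guarded_call`, the `Trap` exception and the stripped-down `Instance` class are hypothetical stand-ins, lifting and lowering are omitted, and the flag is restored right after the call, whereas `canon_lift` restores it in `post_return`.

```python
class Trap(Exception):
    """Hypothetical stand-in for the trap raised by the spec's trap()."""


class Instance:
    """Stripped-down instance holding only the reentrance flag."""

    def __init__(self):
        self.may_enter = True


def trap_if(cond):
    if cond:
        raise Trap()


def guarded_call(instance, callee, called_as_export):
    # Hypothetical helper mirroring only the may_enter handling of canon_lift.
    if called_as_export:
        # An export call traps on reentrance and then blocks further entry.
        trap_if(not instance.may_enter)
        instance.may_enter = False
    else:
        # An internal call is only expected while an export call is active.
        assert not instance.may_enter
    result = callee()
    if called_as_export:
        # Not reached if callee() trapped, so the instance stays locked down.
        instance.may_enter = True
    return result
```

In this sketch, an export call that tries to re-enter the same instance traps and leaves `may_enter` cleared, while an internal call (`called_as_export=False`) made during an export call passes the assertion and leaves the flag untouched.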
-A useful consequence of the above rules for `may_enter` and `may_leave` is that -attempting to `canon lower` to a `callee` in the same instance is a guaranteed, -immediate trap which a link-time compiler can eagerly compile to an -`unreachable`. This avoids what would otherwise be a surprising form of memory -aliasing that could introduce obscure bugs. - -The net effect here is that any cross-component call necessarily -transits through a composed `canon_lower`/`canon_lift` pair, allowing a link-time -compiler to fuse the lifting/lowering steps of these two definitions into a -single, efficient trampoline. This fusion model allows efficient compilation of -the permissive [subtyping](Subtyping.md) allowed between components (including -the elimination of string operations on the labels of records and variants) as -well as post-MVP [adapter functions]. +Since any cross-component call necessarily transits through a statically-known +`canon_lower`+`canon_lift` call pair, an AOT compiler can fuse `canon_lift` and +`canon_lower` into a single, efficient trampoline. This allows efficient +compilation of the permissive [subtyping](Subtyping.md) allowed between +components (including the elimination of string operations on the labels of +records and variants) as well as post-MVP [adapter functions]. + +The `may_leave` flag set during lowering in `canon_lift` and `canon_lower` +ensures that the relative ordering of the side effects of `lift` and `lower` +cannot be observed via import calls and thus an implementation may reliably +interleave `lift` and `lower` whenever making a cross-component call to avoid +the intermediate copy performed by `lift`. This unobservability of interleaving +depends on the shared-nothing property of components which guarantees that all +the low-level state touched by `lift` and `lower` are disjoint. 
Though it +should be rare, same-component-instance `canon_lift`+`canon_lower` call pairs +are technically allowed by the above rules (and may arise unintentionally in +component reexport scenarios). Such cases can be statically distinguished by +the AOT compiler as requiring an intermediate copy to implement the above +`lift`-then-`lower` semantics. + [Canonical Definitions]: Explainer.md#canonical-definitions diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index ed9490a..3a90ec5 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -870,9 +870,12 @@ class Instance: may_enter = True # ... -def canon_lift(callee_opts, callee_instance, callee, functype, args): - trap_if(not callee_instance.may_enter) - callee_instance.may_enter = False +def canon_lift(callee_opts, callee_instance, callee, functype, args, called_as_export): + if called_as_export: + trap_if(not callee_instance.may_enter) + callee_instance.may_enter = False + else: + assert(not callee_instance.may_enter) assert(callee_instance.may_leave) callee_instance.may_leave = False @@ -888,7 +891,8 @@ def canon_lift(callee_opts, callee_instance, callee, functype, args): def post_return(): if callee_opts.post_return is not None: callee_opts.post_return(flat_results) - callee_instance.may_enter = True + if called_as_export: + callee_instance.may_enter = True return (result, post_return) diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index 1d85edd..e658637 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -342,7 +342,7 @@ def test_roundtrip(t, v): callee_heap = Heap(1000) callee_opts = mk_opts(callee_heap.memory, 'utf8', callee_heap.realloc, lambda x: () ) - lifted_callee = lambda args: canon_lift(callee_opts, callee_instance, callee, ft, args) + lifted_callee = lambda args: canon_lift(callee_opts, callee_instance, callee, ft, args, 
True) caller_heap = Heap(1000) caller_instance = Instance() From 7323ff69d7e77f0b1d5902c990be02a29c790a3c Mon Sep 17 00:00:00 2001 From: Ivan Mikushin Date: Sat, 9 Jul 2022 14:38:31 -0700 Subject: [PATCH 091/301] Fix the link to Explainer.md --- design/mvp/examples/SharedEverythingDynamicLinking.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/examples/SharedEverythingDynamicLinking.md b/design/mvp/examples/SharedEverythingDynamicLinking.md index 247d65f..552ee5e 100644 --- a/design/mvp/examples/SharedEverythingDynamicLinking.md +++ b/design/mvp/examples/SharedEverythingDynamicLinking.md @@ -10,7 +10,7 @@ should be able to leverage of existing support for native dynamic linking (of Shared-everything dynamic linking should be *complementary* to the shared-nothing dynamic linking of components described in the -[explainer](Explainer.md). In particular, dynamically-linked modules must not +[explainer](../Explainer.md). In particular, dynamically-linked modules must not share linear memory across component instance boundaries. For example, we want the static dependency graph on the left to produce the runtime instance graph on the right: create the runtime instance graph on the right: From 7674850a1d67dbebaefd6da9b43bce3b19f96589 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 11 Jul 2022 18:45:26 -0500 Subject: [PATCH 092/301] Remove stale sentence about trapping Made stale by #35. --- design/mvp/CanonicalABI.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 0323e55..16f3164 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -927,8 +927,7 @@ def wrap_i64_to_i32(i): Finally, flags are lifted by OR-ing together all the flattened `i32` values and then lifting to a record the same way as when loading flags from linear -memory. 
The dynamic checks in `unpack_flags_from_int` will trap if any -bits are set in an `i32` that don't correspond to a flag. +memory. ```python def lift_flat_flags(vi, labels): i = 0 From c6065fd1937dcb0acccd0fe3fdc537220d18a106 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 12 Jul 2022 10:15:56 -0500 Subject: [PATCH 093/301] Say lists of numbers produce typed arrays in the JS API Resolves #66 --- design/mvp/Explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 329f48f..ea4f0f0 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -955,7 +955,7 @@ At a high level, the additional coercions would be: | `char` | same as [`USVString`] | same as [`USVString`], throw if the USV length is not 1 | | `record` | TBD: maybe a [JS Record]? | same as [`dictionary`] | | `variant` | TBD | TBD | -| `list` | same as [`sequence`] | same as [`sequence`] | +| `list` | create a typed array copy for number types; otherwise produce a JS array (like [`sequence`]) | same as [`sequence`] | | `string` | same as [`USVString`] | same as [`USVString`] | | `tuple` | TBD: maybe a [JS Tuple]? | TBD | | `flags` | TBD: maybe a [JS Record]? 
| same as [`dictionary`] of optional `boolean` fields with default values of `false` | From 1e29f9a5ba5880792275123e956545b642e23000 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 18 Jul 2022 12:18:48 -0500 Subject: [PATCH 094/301] Add WIT.md and link into existing docs --- README.md | 3 +- design/mvp/Explainer.md | 23 +- design/mvp/WIT.md | 550 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 569 insertions(+), 7 deletions(-) create mode 100644 design/mvp/WIT.md diff --git a/README.md b/README.md index 88b72e2..0e91a1a 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,7 @@ # Component Model design and specification This repository describes the high-level [goals], [use cases], [design choices] -and [FAQ] of the component model as well as a more-detailed [explainer], +and [FAQ] of the component model as well as a more-detailed [explainer], [IDL], [binary format] and [ABI] covering the initial Minimum Viable Product (MVP) release. @@ -20,6 +20,7 @@ To contribute to any of these repositories, see the Community Group's [design choices]: design/high-level/Choices.md [FAQ]: design/high-level/FAQ.md [explainer]: design/mvp/Explainer.md +[IDL]: design/mvp/WIT.md [binary format]: design/mvp/Binary.md [ABI]: design/mvp/CanonicalABI.md [formal spec]: spec/ diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index ea4f0f0..c88074f 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -539,14 +539,24 @@ variety of source languages. As syntactic sugar, the text format of `functype` additionally allows `result` to be absent, interpreting this as `(result unit)`. -The `instance` type constructor represents the result of instantiating a -component and thus is the same as a `component` type minus the description -of imports. +The `instance` type constructor describes a list of named, typed definitions +that can be imported or exported by a component. 
Informally, instance types +correspond to the usual concept of an "interface" and instance types thus serve +as static interface descriptions. In addition to the S-Expression text format +defined here, which is meant to go inside component definitions, interfaces can +also be defined as standalone, human-friendly text files in the [`wit`](WIT.md) +[Interface Definition Language]. The `component` type constructor is symmetric to the core `module` type -constructor and is built from a sequence of "declarators" which are used to -describe the imports and exports of the component. There are four kinds of -declarators: +constructor and contains *two* lists of named definitions for the imports +and exports of a component, respectively. As suggested above, instance types +can show up in *both* the import and export types of a component type. + +Both `instance` and `component` type constructors are built from a sequence of +"declarators", of which there are four kinds—`type`, `alias`, `import` and +`export`—where only `component` type constructors can contain `import` +declarators. The meanings of these declarators is basically the same as the +core module declarators introduced above. 
As with core modules, `importdecl` and `exportdecl` classify component `import` and `export` definitions, with `importdecl` allowing an identifier to be @@ -1103,6 +1113,7 @@ and will be added over the coming months to complete the MVP proposal: [ABI]: https://en.wikipedia.org/wiki/Application_binary_interface [Environment Variables]: https://en.wikipedia.org/wiki/Environment_variable [Linear]: https://en.wikipedia.org/wiki/Substructural_type_system#Linear_type_systems +[Interface Definition Language]: https://en.wikipedia.org/wiki/Interface_description_language [module-linking]: https://github.com/WebAssembly/module-linking/blob/main/design/proposals/module-linking/Explainer.md [interface-types]: https://github.com/WebAssembly/interface-types/blob/main/proposals/interface-types/Explainer.md diff --git a/design/mvp/WIT.md b/design/mvp/WIT.md new file mode 100644 index 0000000..6f28bfc --- /dev/null +++ b/design/mvp/WIT.md @@ -0,0 +1,550 @@ +# The `wit` format + +This is intended to document the `wit` format as it exists today. The goal is +to provide an overview to understand what features `wit` files give you and how +they're structured. This isn't intended to be a formal grammar, although it's +expected that one day we'll have a formal grammar for `wit` files. + +If you're curious to give things a spin try out the [online +demo](https://bytecodealliance.github.io/wit-bindgen/) of `wit-bindgen` where +you can input `wit` on the left and see output of generated bindings for +languages on the right. If you're looking to start you can try out the +"markdown" output mode which generates documentation for the input document on +the left. + +## Lexical structure + +The `wit` format is a curly-braced-based format where whitespace is optional (but +recommended). It is intended to be easily human readable and supports features +like comments, multi-line comments, and custom identifiers. 
A `wit` document +is parsed as a unicode string, and when stored in a file is expected to be +encoded as UTF-8. + +Additionally, wit files must not contain any bidirectional override scalar values, +control codes other than newline, carriage return, and horizontal tab, or +codepoints that Unicode officially deprecates or strongly discourages. + +The current structure of tokens are: + +```wit +token ::= whitespace + | comment + | operator + | keyword + | identifier +``` + +Whitespace and comments are ignored when parsing structures defined elsewhere +here. + +### Whitespace + +A `whitespace` token in `wit` is a space, a newline, a carriage return, or a +tab character: + +```wit +whitespace ::= ' ' | '\n' | '\r' | '\t' +``` + +### Comments + +A `comment` token in `wit` is either a line comment preceded with `//` which +ends at the next newline (`\n`) character or it's a block comment which starts +with `/*` and ends with `*/`. Note that block comments are allowed to be nested +and their delimiters must be balanced + +```wit +comment ::= '//' character-that-isnt-a-newline* + | '/*' any-unicode-character* '*/' +``` + +There is a special type of comment called `documentation comment`. A +`doc-comment` is either a line comment preceded with `///` whichends at the next +newline (`\n`) character or it's a block comment which starts with `/**` and ends +with `*/`. Note that block comments are allowed to be nested and their delimiters +must be balanced + +```wit +doc-comment ::= '///' character-that-isnt-a-newline* + | '/**' any-unicode-character* '*/' +``` + +### Operators + +There are some common operators in the lexical structure of `wit` used for +various constructs. Note that delimiters such as `{` and `(` must all be +balanced. + +```wit +operator ::= '=' | ',' | ':' | ';' | '(' | ')' | '{' | '}' | '<' | '>' | '*' | '->' +``` + +### Keywords + +Certain identifiers are reserved for use in `wit` documents and cannot be used +bare as an identifier. 
These are used to help parse the format, and the list of +keywords is still in flux at this time but the current set is: + +```wit +keyword ::= 'use' + | 'type' + | 'resource' + | 'func' + | 'u8' | 'u16' | 'u32' | 'u64' + | 's8' | 's16' | 's32' | 's64' + | 'float32' | 'float64' + | 'char' + | 'handle' + | 'record' + | 'enum' + | 'flags' + | 'variant' + | 'union' + | 'bool' + | 'string' + | 'option' + | 'list' + | 'expected' + | 'unit' + | 'as' + | 'from' + | 'static' + | 'interface' + | 'tuple' + | 'async' + | 'future' + | 'stream' +``` + +## Top-level items + +A `wit` document is a sequence of items specified at the top level. These items +come one after another and it's recommended to separate them with newlines for +readability but this isn't required. + +## Item: `use` + +A `use` statement enables importing type or resource definitions from other +wit documents. The structure of a use statement is: + +```wit +use * from other-file +use { a, list, of, names } from another-file +use { name as other-name } from yet-another-file +``` + +Specifically the structure of this is: + +```wit +use-item ::= 'use' use-names 'from' id + +use-names ::= '*' + | '{' use-names-list '}' + +use-names-list ::= use-names-item + | use-names-item ',' use-names-list? + +use-names-item ::= id + | id 'as' id +``` + +Note: Here `use-names-list?` means at least one `use-name-list` term. + +## Items: type + +There are a number of methods of defining types in a `wit` document, and all of +the types that can be defined in `wit` are intended to map directly to types in +the [interface types specification](https://github.com/WebAssembly/interface-types). + +### Item: `type` (alias) + +A `type` statement declares a new named type in the `wit` document. This name can +be later referred to when defining items using this type. 
This construct is +similar to a type alias in other languages + +```wit +type my-awesome-u32 = u32 +type my-complicated-tuple = tuple +``` + +Specifically the structure of this is: + +```wit +type-item ::= 'type' id '=' ty +``` + +### Item: `record` (bag of named fields) + +A `record` statement declares a new named structure with named fields. Records +are similar to a `struct` in many languages. Instances of a `record` always have +their fields defined. + +```wit +record pair { + x: u32, + y: u32, +} + +record person { + name: string, + age: u32, + has-lego-action-figure: bool, +} +``` + +Specifically the structure of this is: + +```wit +record-item ::= 'record' id '{' record-fields '}' + +record-fields ::= record-field + | record-field ',' record-fields? + +record-field ::= id ':' ty +``` + +### Item: `flags` (bag-of-bools) + +A `flags` statement defines a new `record`-like structure where all the fields +are booleans. The `flags` type is distinct from `record` in that it typically is +represented as a bit flags representation in the canonical ABI. For the purposes +of type-checking, however, it's simply syntactic sugar for a record-of-booleans. + +```wit +flags properties { + lego, + marvel-superhero, + supervillan, +} + +// type-wise equivalent to: +// +// record properties { +// lego: bool, +// marvel-superhero: bool, +// supervillan: bool, +// } +``` + +Specifically the structure of this is: + +```wit +flags-items ::= 'flags' id '{' flags-fields '}' + +flags-fields ::= id, + | id ',' flags-fields? +``` + +### Item: `variant` (one of a set of types) + +A `variant` statement defines a new type where instances of the type match +exactly one of the variants listed for the type. This is similar to a "sum" type +in algebraic datatypes (or an `enum` in Rust if you're familiar with it). +Variants can be thought of as tagged unions as well. 
+ +Each case of a variant can have an optional type associated with it which is +present when values have that particular case's tag. + +All `variant` type must have at least one case specified. + +```wit +variant filter { + all, + none, + some(list), +} +``` + +Specifically the structure of this is: + +```wit +variant-items ::= 'variant' id '{' variant-cases '}' + +variant-cases ::= variant-case, + | variant-case ',' variant-cases? + +variant-case ::= id + | id '(' ty ')' +``` + +### Item: `enum` (variant but with no payload) + +An `enum` statement defines a new type which is semantically equivalent to a +`variant` where none of the cases have a payload type. This is special-cased, +however, to possibly have a different representation in the language ABIs or +have different bindings generated in for languages. + +```wit +enum color { + red, + green, + blue, + yellow, + other, +} + +// type-wise equivalent to: +// +// variant color { +// red, +// green, +// blue, +// yellow, +// other, +// } +``` + +Specifically the structure of this is: + +```wit +enum-items ::= 'enum' id '{' enum-cases '}' + +enum-cases ::= id, + | id ',' enum-cases? +``` + +### Item: `union` (variant but with no case names) + +A `union` statement defines a new type which is semantically equivalent to a +`variant` where all of the cases have a payload type and the case names are +numerical. This is special-cased, however, to possibly have a different +representation in the language ABIs or have different bindings generated in for +languages. + +```wit +union configuration { + string, + list, +} + +// type-wise equivalent to: +// +// variant configuration { +// 0(string), +// 1(list), +// } +``` + +Specifically the structure of this is: + +```wit +union-items ::= 'union' id '{' union-cases '}' + +union-cases ::= ty, + | ty ',' union-cases? +``` + +## Item: `func` + +Functions can also be defined in a `wit` document. Functions have a name, +parameters, and results. 
Functions can optionally also be declared as `async` +functions. + +```wit +thunk: func() +fibonacci: func(n: u32) -> u32 +sleep: async func(ms: u64) +``` + +Specifically functions have the structure: + +```wit +func-item ::= id ':' 'async'? 'func' '(' func-args ')' func-ret + +func-args ::= func-arg + | func-arg ',' func-args? + +func-arg ::= id ':' ty + +func-ret ::= nil + | '->' ty +``` + +## Item: `resource` + +Resources represent a value that has a hidden representation not known to the +outside world. This means that the resource is operated on through a "handle" (a +pointer of sorts). Resources also have ownership associated with them and +languages will have to manage the lifetime of resources manually (they're +similar to file descriptors). + +Resources can also optionally have functions defined within them which adds an +implicit "self" argument as the first argument to each function of the same type +of the including resource, unless the function is flagged as `static`. + +```wit +resource file-descriptor + +resource request { + static new: func() -> request + + body: async func() -> list + headers: func() -> list +} +``` + +Specifically resources have the structure: + +```wit +resource-item ::= 'resource' id resource-contents + +resource-contents ::= nil + | '{' resource-defs '}' + +resource-defs ::= resource-def resource-defs? + +resource-def ::= 'static'? func-item +``` + +## Types + +As mentioned previously the intention of `wit` is to allow defining types +corresponding to the interface types specification. Many of the top-level items +above are introducing new named types but "anonymous" types are also supported, +such as built-ins. 
For example: + +```wit +type number = u32 +type fallible-function-result = expected +type headers = list +``` + +Specifically the following types are available: + +```wit +ty ::= 'u8' | 'u16' | 'u32' | 'u64' + | 's8' | 's16' | 's32' | 's64' + | 'float32' | 'float64' + | 'char' + | 'bool' + | 'string' + | 'unit' + | tuple + | list + | option + | expected + | future + | stream + | id + +tuple ::= 'tuple' '<' tuple-list '>' +tuple-list ::= ty + | ty ',' tuple-list? + +list ::= 'list' '<' ty '>' + +option ::= 'option' '<' ty '>' + +expected ::= 'expected' '<' ty ',' ty '>' + +future ::= 'future' '<' ty '>' + +stream ::= 'stream' '<' ty ',' ty '>' +``` + +The `tuple` type is semantically equivalent to a `record` with numerical fields, +but it frequently can have language-specific meaning so it's provided as a +first-class type. + +Similarly the `option` and `expected` types are semantically equivalent to the +variants: + +```wit +variant option { + none, + some(ty), +} + +variant expected { + ok(ok-ty) + err(err-ty), +} +``` + +These types are so frequently used and frequently have language-specific +meanings though so they're also provided as first-class types. + +Finally the last case of a `ty` is simply an `id` which is intended to refer to +another type or resource defined in the document. Note that definitions can come +through a `use` statement or they can be defined locally. + +## Identifiers + +Identifiers in `wit` can be defined with two different forms. 
The first is a +lower-case [stream-safe] [NFC] [kebab-case] identifier where each part delimited +by '-'s starts with a `XID_Start` scalar value with a zero Canonical Combining +Class: + +```wit +foo: func(bar: u32) + +red-green-blue: func(r: u32, g: u32, b: u32) +``` + +This form can't name identifiers which have the same name as wit keywords, so +the second form is the same syntax with the same restrictions as the first, but +prefixed with '%': + +```wit +%foo: func(%bar: u32) + +%red-green-blue: func(%r: u32, %g: u32, %b: u32) + +// This form also supports identifiers that would otherwise be keywords. +%variant: func(%enum: s32) +``` + +[kebab-case]: https://en.wikipedia.org/wiki/Letter_case#Kebab_case +[Unicode identifier]: http://www.unicode.org/reports/tr31/ +[stream-safe]: https://unicode.org/reports/tr15/#Stream_Safe_Text_Format +[NFC]: https://unicode.org/reports/tr15/#Norm_Forms + +## Name resolution + +A `wit` document is resolved after parsing to ensure that all names resolve +correctly. For example this is not a valid `wit` document: + +```wit +type foo = bar // ERROR: name `bar` not defined +``` + +Type references primarily happen through the `id` production of `ty`. + +Additionally names in a `wit` document can only be defined once: + +```wit +type foo = u32 +type foo = u64 // ERROR: name `foo` already defined +``` + +Names do not need to be defined before they're used (unlike in C or C++), +it's ok to define a type after it's used: + +```wit +type foo = bar + +record bar { + age: u32, +} +``` + +Types, however, cannot be recursive: + +```wit +type foo = foo // ERROR: cannot refer to itself + +record bar1 { + a: bar2, +} + +record bar2 { + a: bar1, // ERROR: record cannot refer to itself +} +``` + +The intention of `wit` is that it maps down to interface types, so the goal of +name resolution is to effectively create the type section of a wasm module using +interface types. 
The restrictions about self-referential types and such come +from how types can be defined in the interface types section. Additionally +definitions of named types such as `record foo { ... }` are intended to map +roughly to declarations in the type section of new types. From 4d1e879cc1373886a285fc94fe3416bb97a51b8f Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 14 Jul 2022 18:07:29 -0500 Subject: [PATCH 095/301] Switch (back) to multi-return, remove unit, s/expected/result/ Resolves #41 --- design/mvp/Binary.md | 113 +++++++++++----------- design/mvp/CanonicalABI.md | 117 ++++++++++++++-------- design/mvp/Explainer.md | 55 ++++++----- design/mvp/Subtyping.md | 1 - design/mvp/canonical-abi/definitions.py | 123 +++++++++++++++--------- design/mvp/canonical-abi/run_tests.py | 97 ++++++++++--------- 6 files changed, 292 insertions(+), 214 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index fdeefe7..906a6f1 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -151,60 +151,59 @@ Notes: means that the maximum `ct` in an MVP `alias` declarator is `1`. 
``` -type ::= dt: => (type dt) -deftype ::= dvt: => dvt - | ft: => ft - | ct: => ct - | it: => it -primvaltype ::= 0x7f => unit - | 0x7e => bool - | 0x7d => s8 - | 0x7c => u8 - | 0x7b => s16 - | 0x7a => u16 - | 0x79 => s32 - | 0x78 => u32 - | 0x77 => s64 - | 0x76 => u64 - | 0x75 => float32 - | 0x74 => float64 - | 0x73 => char - | 0x72 => string -defvaltype ::= pvt: => pvt - | 0x71 field*:vec() => (record field*) - | 0x70 case*:vec() => (variant case*) - | 0x6f t: => (list t) - | 0x6e t*:vec() => (tuple t*) - | 0x6d n*:vec() => (flags n*) - | 0x6c n*:vec() => (enum n*) - | 0x6b t*:vec() => (union t*) - | 0x6a t: => (option t) - | 0x69 t: u: => (expected t u) -field ::= n: t: => (field n t) -case ::= n: t: 0x0 => (case n t) - | n: t: 0x1 i: => (case n t (refines case-label[i])) -valtype ::= i: => i - | pvt: => pvt -functype ::= 0x40 param*:vec() t: => (func param* (result t)) -param ::= 0x00 t: => (param t) - | 0x01 n: t: => (param n t) -componenttype ::= 0x41 cd*:vec() => (component cd*) -instancetype ::= 0x42 id*:vec() => (instance id*) -componentdecl ::= 0x03 id: => id - | id: => id -instancedecl ::= 0x00 t: => t - | 0x01 t: => t - | 0x02 a: => a - | 0x04 ed: => ed -importdecl ::= n: ed: => (import n ed) -exportdecl ::= n: ed: => (export n ed) -externdesc ::= 0x00 0x11 i: => (core module (type i)) - | 0x01 i: => (func (type i)) - | 0x02 t: => (value t) - | 0x03 b: => (type b) - | 0x04 i: => (instance (type i)) - | 0x05 i: => (component (type i)) -typebound ::= 0x00 i: => (eq i) +type ::= dt: => (type dt) +deftype ::= dvt: => dvt + | ft: => ft + | ct: => ct + | it: => it +primvaltype ::= 0x7f => bool + | 0x7e => s8 + | 0x7d => u8 + | 0x7c => s16 + | 0x7b => u16 + | 0x7a => s32 + | 0x79 => u32 + | 0x78 => s64 + | 0x77 => u64 + | 0x76 => float32 + | 0x75 => float64 + | 0x74 => char + | 0x73 => string +defvaltype ::= pvt: => pvt + | 0x72 nt*:vec() => (record (field nt)*) + | 0x71 case*:vec() => (variant case*) + | 0x70 t: => (list t) + | 0x6f t*:vec() => (tuple t*) + 
| 0x6e n*:vec() => (flags n*) + | 0x6d n*:vec() => (enum n*) + | 0x6c t*:vec() => (union t*) + | 0x6b t: => (option t) + | 0x6a t*:vec() u*:vec() => (result t* (error u*)) +namedtype ::= n: t: => (field n t) +case ::= nt*:vec() 0x0 => (case nt*) + | nt*:vec() 0x1 i: => (case nt* (refines case-label[i])) +valtype ::= i: => i + | pvt: => pvt +functype ::= 0x40 p*: r*: => (func (param p)* (result r)*) +prlist ::= 0x00 t: => [t] + | 0x01 nt*:vec() => nt* +componenttype ::= 0x41 cd*:vec() => (component cd*) +instancetype ::= 0x42 id*:vec() => (instance id*) +componentdecl ::= 0x03 id: => id + | id: => id +instancedecl ::= 0x00 t: => t + | 0x01 t: => t + | 0x02 a: => a + | 0x04 ed: => ed +importdecl ::= n: ed: => (import n ed) +exportdecl ::= n: ed: => (export n ed) +externdesc ::= 0x00 0x11 i: => (core module (type i)) + | 0x01 i: => (func (type i)) + | 0x02 t: => (value t) + | 0x03 b: => (type b) + | 0x04 i: => (instance (type i)) + | 0x05 i: => (component (type i)) +typebound ::= 0x00 i: => (eq i) ``` Notes: * The type opcodes follow the same negative-SLEB128 scheme as Core WebAssembly, @@ -218,9 +217,9 @@ Notes: in type definitions from containing components. * Validation of `externdesc` requires the various `typeidx` type constructors to match the preceding `sort`. -* Validation of record field names, variant case names, flag names, and enum case - names requires that the name be unique for the record, variant, flags, or enum - type definition. +* Validation of function parameter and result names, record field names, + variant case names, flag names, and enum case names requires that the name be + unique for the func, record, variant, flags, or enum type definition. * Validation of the optional `refines` clause of a variant case requires that the case index is less than the current case's index (and therefore cases are acyclic). 
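
The first note above says the type opcodes follow the negative-SLEB128 scheme of Core WebAssembly. As a rough illustration of what that means for the renumbered opcodes in this patch, the sketch below decodes one-byte SLEB128 values. The helper name `decode_sleb128_byte` is an assumption made for this example and it handles only single-byte encodings.

```python
def decode_sleb128_byte(b):
    """Decode a one-byte SLEB128 value (continuation bit must be clear)."""
    assert 0 <= b < 0x80, "only one-byte encodings are handled in this sketch"
    # Bit 6 is the sign bit of a one-byte SLEB128 value.
    return b - 0x80 if b & 0x40 else b


# Opcodes taken from the post-patch grammar above.
for name, opcode in [("bool", 0x7F), ("string", 0x73), ("func", 0x40)]:
    print(f"{name}: 0x{opcode:02x} -> {decode_sleb128_byte(opcode)}")
```

Read this way, removing `unit` shifts `bool` from `0x7e` to `0x7f`, so the primitive value types now count down from -1.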
diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 16f3164..c1771e3 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -68,13 +68,12 @@ function to replace specialized value types with their expansion: ```python def despecialize(t): match t: - case Tuple(ts) : return Record([ Field(str(i), t) for i,t in enumerate(ts) ]) - case Unit() : return Record([]) - case Union(ts) : return Variant([ Case(str(i), t) for i,t in enumerate(ts) ]) - case Enum(labels) : return Variant([ Case(l, Unit()) for l in labels ]) - case Option(t) : return Variant([ Case("none", Unit()), Case("some", t) ]) - case Expected(ok, error) : return Variant([ Case("ok", ok), Case("error", error) ]) - case _ : return t + case Tuple(ts) : return Record([ Field(str(i), t) for i,t in enumerate(ts) ]) + case Union(ts) : return Variant([ Case(str(i), [t]) for i,t in enumerate(ts) ]) + case Enum(labels) : return Variant([ Case(l, []) for l in labels ]) + case Option(t) : return Variant([ Case("none", []), Case("some", [t]) ]) + case Result(ok, error) : return Variant([ Case("ok", ok), Case("error", error) ]) + case _ : return t ``` The specialized value types `string` and `flags` are missing from this list because they are given specialized canonical ABI representations distinct from @@ -98,14 +97,17 @@ def alignment(t): case Float64() : return 8 case Char() : return 4 case String() | List(_) : return 4 - case Record(fields) : return max_alignment(types_of(fields)) - case Variant(cases) : return max_alignment(types_of(cases) + [discriminant_type(cases)]) + case Record(fields) : return alignment_tuple(field_types(fields)) + case Variant(cases) : return alignment_variant(cases) case Flags(labels) : return alignment_flags(labels) +``` -def types_of(fields_or_cases): - return [x.t for x in fields_or_cases] +Record alignment is tuple alignment, with the definitions split for reuse below: +```python +def field_types(fields): + return [f.t for f in fields] -def 
max_alignment(ts): +def alignment_tuple(ts): a = 1 for t in ts: a = max(a, alignment(t)) @@ -117,6 +119,9 @@ covering the number of cases in the variant. Depending on the payload type, this can allow more compact representations of variants in memory. This smallest integer type is selected by the following function, used above and below: ```python +def alignment_variant(cases): + return max(alignment(discriminant_type(cases)), max_case_alignment(cases)) + def discriminant_type(cases): n = len(cases) assert(0 < n < (1 << 32)) @@ -125,6 +130,12 @@ def discriminant_type(cases): case 1: return U8() case 2: return U16() case 3: return U32() + +def max_case_alignment(cases): + a = 1 + for c in cases: + a = max(a, alignment_tuple(c.ts)) + return a ``` As an optimization, `flags` are represented as packed bit-vectors. Like variant @@ -155,28 +166,28 @@ def size(t): case Float64() : return 8 case Char() : return 4 case String() | List(_) : return 8 - case Record(fields) : return size_record(fields) + case Record(fields) : return size_tuple(field_types(fields)) case Variant(cases) : return size_variant(cases) case Flags(labels) : return size_flags(labels) -def size_record(fields): +def size_tuple(ts): s = 0 - for f in fields: - s = align_to(s, alignment(f.t)) - s += size(f.t) - return align_to(s, alignment(Record(fields))) + for t in ts: + s = align_to(s, alignment(t)) + s += size(t) + return align_to(s, alignment_tuple(ts)) def align_to(ptr, alignment): return math.ceil(ptr / alignment) * alignment def size_variant(cases): s = size(discriminant_type(cases)) - s = align_to(s, max_alignment(types_of(cases))) + s = align_to(s, max_case_alignment(cases)) cs = 0 for c in cases: - cs = max(cs, size(c.t)) + cs = max(cs, size_tuple(c.ts)) s += cs - return align_to(s, alignment(Variant(cases))) + return align_to(s, alignment_variant(cases)) def size_flags(labels): n = len(labels) @@ -360,8 +371,8 @@ def load_variant(opts, ptr, cases): ptr += disc_size trap_if(disc >= len(cases)) 
case = cases[disc] - ptr = align_to(ptr, max_alignment(types_of(cases))) - return { case_label_with_refinements(case, cases): load(opts, ptr, case.t) } + ptr = align_to(ptr, max_case_alignment(cases)) + return { case_label_with_refinements(case, cases): load_tuple(opts, ptr, case.ts) } def case_label_with_refinements(case, cases): label = case.label @@ -376,6 +387,14 @@ def find_case(label, cases): if len(matches) == 1: return matches[0] return -1 + +def load_tuple(opts, ptr, ts): + a = [] + for t in ts: + ptr = align_to(ptr, alignment(t)) + a.append(load(opts, ptr, t)) + ptr += size(t) + return a ``` Finally, flags are converted from a bit-vector to a dictionary whose keys are @@ -675,8 +694,8 @@ def store_variant(opts, v, ptr, cases): disc_size = size(discriminant_type(cases)) store_int(opts, case_index, ptr, disc_size) ptr += disc_size - ptr = align_to(ptr, max_alignment(types_of(cases))) - store(opts, case_value, cases[case_index].t, ptr) + ptr = align_to(ptr, max_case_alignment(cases)) + store_tuple(opts, case_value, ptr, cases[case_index].ts) def match_case(v, cases): assert(len(v.keys()) == 1) @@ -686,6 +705,12 @@ def match_case(v, cases): case_index = find_case(label, cases) if case_index != -1: return (case_index, value) + +def store_tuple(opts, v, ptr, ts): + for i,t in enumerate(ts): + ptr = align_to(ptr, alignment(t)) + store(opts, v[i], t, ptr) + ptr += size(t) ``` Finally, flags are converted from a dictionary to a bit-vector by iterating @@ -740,11 +765,11 @@ MAX_FLAT_PARAMS = 16 MAX_FLAT_RESULTS = 1 def flatten(functype, context): - flat_params = flatten_types(functype.params) + flat_params = flatten_tuple(functype.params) if len(flat_params) > MAX_FLAT_PARAMS: flat_params = ['i32'] - flat_results = flatten_type(functype.result) + flat_results = flatten_tuple(functype.results) if len(flat_results) > MAX_FLAT_RESULTS: match context: case 'lift': @@ -755,7 +780,7 @@ def flatten(functype, context): return { 'params': flat_params, 'results': 
flat_results } -def flatten_types(ts): +def flatten_tuple(ts): return [ft for t in ts for ft in flatten_type(t)] ``` @@ -772,7 +797,7 @@ def flatten_type(t): case Float64() : return ['f64'] case Char() : return ['i32'] case String() | List(_) : return ['i32', 'i32'] - case Record(fields) : return flatten_types(types_of(fields)) + case Record(fields) : return flatten_tuple(field_types(fields)) case Variant(cases) : return flatten_variant(cases) case Flags(labels) : return ['i32'] * num_i32_flags(labels) ``` @@ -789,7 +814,7 @@ an `i32` into an `i64`. def flatten_variant(cases): flat = [] for c in cases: - for i,ft in enumerate(flatten_type(c.t)): + for i,ft in enumerate(flatten_tuple(c.ts)): if i < len(flat): flat[i] = join(flat[i], ft) else: @@ -915,7 +940,7 @@ def lift_flat_variant(opts, vi, cases): case ('i64', 'f32') : return reinterpret_i32_as_float(wrap_i64_to_i32(x)) case ('i64', 'f64') : return reinterpret_i64_as_float(x) case _ : return x - v = lift_flat(opts, CoerceValueIter(), case.t) + v = lift_flat_tuple(opts, CoerceValueIter(), case.ts) for have in flat_types: _ = vi.next(have) return { case_label_with_refinements(case, cases): v } @@ -923,6 +948,12 @@ def lift_flat_variant(opts, vi, cases): def wrap_i64_to_i32(i): assert(0 <= i < (1 << 64)) return i % (1 << 32) + +def lift_flat_tuple(opts, vi, ts): + a = [] + for t in ts: + a.append(lift_flat(opts, vi, t)) + return a ``` Finally, flags are lifted by OR-ing together all the flattened `i32` values @@ -1007,7 +1038,7 @@ def lower_flat_variant(opts, v, cases): case_index, case_value = match_case(v, cases) flat_types = flatten_variant(cases) assert(flat_types.pop(0) == 'i32') - payload = lower_flat(opts, case_value, cases[case_index].t) + payload = lower_flat_tuple(opts, case_value, cases[case_index].ts) for i,have in enumerate(payload): want = flat_types.pop(0) match (have.t, want): @@ -1019,6 +1050,12 @@ def lower_flat_variant(opts, v, cases): for want in flat_types: payload.append(Value(want, 0)) return 
[Value('i32', case_index)] + payload + +def lower_flat_tuple(opts, v, ts): + flat = [] + for i,t in enumerate(ts): + flat += lower_flat(opts, v[i], t) + return flat ``` Finally, flags are lowered by slicing the bit vector into `i32` chunks: @@ -1040,7 +1077,7 @@ parameters or results given by the `ValueIter` `vi` into a tuple of values with types `ts`: ```python def lift(opts, max_flat, vi, ts): - flat_types = flatten_types(ts) + flat_types = flatten_tuple(ts) if len(flat_types) > max_flat: ptr = vi.next('i32') tuple_type = Tuple(ts) @@ -1057,7 +1094,7 @@ greater-than-`max_flat` case by either allocating storage with `realloc` or accepting a caller-allocated buffer as an out-param: ```python def lower(opts, max_flat, vs, ts, out_param = None): - flat_types = flatten_types(ts) + flat_types = flatten_tuple(ts) if len(flat_types) > max_flat: tuple_type = Tuple(functype.params) tuple_value = {str(i): v for i,v in enumerate(vs)} @@ -1148,21 +1185,21 @@ def canon_lift(callee_opts, callee_instance, callee, functype, args, called_as_e except CoreWebAssemblyException: trap() - [result] = lift(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), [functype.result]) + results = lift(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), functype.results) def post_return(): if callee_opts.post_return is not None: callee_opts.post_return(flat_results) if called_as_export: callee_instance.may_enter = True - return (result, post_return) + return (results, post_return) ``` There are a number of things to note about this definition: Uncaught Core WebAssembly [exceptions] result in a trap at component boundaries. Thus, if a component wishes to signal an error, it must use some -sort of explicit type such as `expected` (whose `error` case particular -language bindings may choose to map to and from exceptions). +sort of explicit type such as `result` (whose `error` case particular language +bindings may choose to map to and from exceptions). 
The `called_as_export` parameter indicates whether `canon_lift` is being called as part of a component export or whether this `canon_lift` is being called @@ -1205,10 +1242,10 @@ def canon_lower(caller_opts, caller_instance, callee, functype, flat_args): flat_args = ValueIter(flat_args) args = lift(caller_opts, MAX_FLAT_PARAMS, flat_args, functype.params) - result, post_return = callee(args) + results, post_return = callee(args) caller_instance.may_leave = False - flat_results = lower(caller_opts, MAX_FLAT_RESULTS, [result], [functype.result], flat_args) + flat_results = lower(caller_opts, MAX_FLAT_RESULTS, results, functype.results, flat_args) caller_instance.may_leave = True post_return() diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index c88074f..4be4e37 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -440,23 +440,26 @@ deftype ::= | | | -defvaltype ::= unit - | bool +defvaltype ::= bool | s8 | u8 | s16 | u16 | s32 | u32 | s64 | u64 | float32 | float64 | char | string | (record (field )*) - | (variant (case ? (refines )?)+) + | (variant (case ? * (refines )?)+) | (list ) | (tuple *) | (flags *) | (enum +) | (union +) | (option ) - | (expected ) + | (result (error *)?) valtype ::= | -functype ::= (func (param ? )* (result )) +functype ::= (func ) +paramlist ::= (param )* + | (param ) +resultlist ::= (result )* + | (result *) instancetype ::= (instance *) componentdecl ::= @@ -512,14 +515,13 @@ some `case` in the supertype. The sets of values allowed for the remaining *specialized value types* are defined by the following mapping: ``` - (tuple *) ↦ (record (field "𝒊" )*) for 𝒊=0,1,... - (flags *) ↦ (record (field bool)*) - unit ↦ (record) - (enum +) ↦ (variant (case unit)+) - (option ) ↦ (variant (case "none") (case "some" )) - (union +) ↦ (variant (case "𝒊" )+) for 𝒊=0,1,... -(expected ) ↦ (variant (case "ok" ) (case "error" )) - string ↦ (list char) + (tuple *) ↦ (record (field "𝒊" )*) for 𝒊=0,1,... 
+ (flags *) ↦ (record (field bool)*) + (enum +) ↦ (variant (case )+) + (option ) ↦ (variant (case "none") (case "some" )) + (union +) ↦ (variant (case "𝒊" )+) for 𝒊=0,1,... +(result * (error *)?) ↦ (variant (case "ok" *) (case "error" *)) + string ↦ (list char) ``` Note that, at least initially, variants are required to have a non-empty list of cases. This could be relaxed in the future to allow an empty list of cases, with @@ -530,14 +532,13 @@ The remaining 3 type constructors in `deftype` use `valtype` to describe shared-nothing functions, components and component instances: The `func` type constructor describes a component-level function definition -that takes and returns `valtype`. In contrast to [`core:functype`] which, as a -low-level compiler target for a stack machine, returns zero or more results, -`functype` always returns a single type, with `unit` being used for functions -that don't return an interesting value (analogous to "void" in some languages). -Having a single return type simplifies the binding of `functype` into a wide -variety of source languages. As syntactic sugar, the text format of `functype` -additionally allows `result` to be absent, interpreting this as `(result -unit)`. +that takes and returns a list of `valtype`. In contrast to [`core:functype`], +the parameters and results of `functype` can have associated names which +validation requires to be unique. If a name is not present, the name is taken +to be a special "empty" name and uniqueness still requires there to only be one +unnamed parameter/result. To avoid unnecessary complexity for language binding +generators, parameter and result lists are not allowed to contain both named +and unnamed parameters. The `instance` type constructor describes a list of named, typed definitions that can be imported or exported by a component. Informally, instance types @@ -687,7 +688,7 @@ type. 
For example, with Core WebAssembly [exception-handling] and [stack-switching], a core function with type `(func (result i32))` can return an `i32`, throw, suspend or trap. In contrast, a component function with type `(func (result string))` may only return a `string` or trap. To express -failure, component functions can return `expected` and languages with exception +failure, component functions can return `result` and languages with exception handling can bind exceptions to the `error` case. Similarly, the forthcoming addition of [future and stream types] would explicitly declare patterns of stack-switching in component function signatures. @@ -955,7 +956,6 @@ At a high level, the additional coercions would be: | Type | `ToJSValue` | `ToWebAssemblyValue` | | ---- | ----------- | -------------------- | -| `unit` | `null` | accept everything | | `bool` | `true` or `false` | `ToBoolean` | | `s8`, `s16`, `s32` | as a Number value | `ToInt8`, `ToInt16`, `ToInt32` | | `u8`, `u16`, `u32` | as a Number value | `ToUint8`, `ToUint16`, `ToUint32` | @@ -972,9 +972,16 @@ At a high level, the additional coercions would be: | `enum` | same as [`enum`] | same as [`enum`] | | `option` | same as [`T?`] | same as [`T?`] | | `union` | same as [`union`] | same as [`union`] | -| `expected` | same as `variant`, but coerce a top-level `error` return value to a thrown exception | same as `variant`, but coerce uncaught exceptions to top-level `error` return values | +| `result` | same as `variant`, but coerce a top-level `error` return value to a thrown exception | same as `variant`, but coerce uncaught exceptions to top-level `error` return values | Notes: +* Function parameter names are ignored since JavaScript doesn't have named + parameters. +* If a function's result type list is empty, the JavaScript function returns + `undefined`. If the result type list contains a single unnamed result, then + the return value is specified by `ToJSValue` above. 
Otherwise, the function + result is wrapped into a JS object whose field names are taken from the result + names and whose field values are specified by `ToJSValue` above. * The forthcoming addition of [resource and handle types] would additionally allow coercion to and from the remaining Symbol and Object JavaScript value types. diff --git a/design/mvp/Subtyping.md b/design/mvp/Subtyping.md index 7114f05..e6f86f7 100644 --- a/design/mvp/Subtyping.md +++ b/design/mvp/Subtyping.md @@ -6,7 +6,6 @@ But roughly speaking: | Type | Subtyping | | ------------------------- | --------- | -| `unit` | every value type is a subtype of `unit` | | `bool` | | | `s8`, `s16`, `s32`, `s64`, `u8`, `u16`, `u32`, `u64` | lossless coercions are allowed | | `float32`, `float64` | `float32 <: float64` | diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 3a90ec5..e7f4608 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -20,7 +20,6 @@ def trap_if(cond): raise Trap() class ValType: pass -class Unit(ValType): pass class Bool(ValType): pass class S8(ValType): pass class U8(ValType): pass @@ -59,7 +58,7 @@ class Flags(ValType): @dataclass class Case: label: str - t: ValType + ts: [ValType] refines: str = None @dataclass @@ -79,26 +78,25 @@ class Option(ValType): t: ValType @dataclass -class Expected(ValType): - ok: ValType - error: ValType +class Result(ValType): + ok: [ValType] + error: [ValType] @dataclass class Func: params: [ValType] - result: ValType + results: [ValType] ### Despecialization def despecialize(t): match t: - case Tuple(ts) : return Record([ Field(str(i), t) for i,t in enumerate(ts) ]) - case Unit() : return Record([]) - case Union(ts) : return Variant([ Case(str(i), t) for i,t in enumerate(ts) ]) - case Enum(labels) : return Variant([ Case(l, Unit()) for l in labels ]) - case Option(t) : return Variant([ Case("none", Unit()), Case("some", t) ]) - case Expected(ok, error) : 
return Variant([ Case("ok", ok), Case("error", error) ]) - case _ : return t + case Tuple(ts) : return Record([ Field(str(i), t) for i,t in enumerate(ts) ]) + case Union(ts) : return Variant([ Case(str(i), [t]) for i,t in enumerate(ts) ]) + case Enum(labels) : return Variant([ Case(l, []) for l in labels ]) + case Option(t) : return Variant([ Case("none", []), Case("some", [t]) ]) + case Result(ok, error) : return Variant([ Case("ok", ok), Case("error", error) ]) + case _ : return t ### Alignment @@ -113,14 +111,16 @@ def alignment(t): case Float64() : return 8 case Char() : return 4 case String() | List(_) : return 4 - case Record(fields) : return max_alignment(types_of(fields)) - case Variant(cases) : return max_alignment(types_of(cases) + [discriminant_type(cases)]) + case Record(fields) : return alignment_tuple(field_types(fields)) + case Variant(cases) : return alignment_variant(cases) case Flags(labels) : return alignment_flags(labels) -def types_of(fields_or_cases): - return [x.t for x in fields_or_cases] +# + +def field_types(fields): + return [f.t for f in fields] -def max_alignment(ts): +def alignment_tuple(ts): a = 1 for t in ts: a = max(a, alignment(t)) @@ -128,6 +128,9 @@ def max_alignment(ts): # +def alignment_variant(cases): + return max(alignment(discriminant_type(cases)), max_case_alignment(cases)) + def discriminant_type(cases): n = len(cases) assert(0 < n < (1 << 32)) @@ -137,6 +140,12 @@ def discriminant_type(cases): case 2: return U16() case 3: return U32() +def max_case_alignment(cases): + a = 1 + for c in cases: + a = max(a, alignment_tuple(c.ts)) + return a + # def alignment_flags(labels): @@ -158,28 +167,28 @@ def size(t): case Float64() : return 8 case Char() : return 4 case String() | List(_) : return 8 - case Record(fields) : return size_record(fields) + case Record(fields) : return size_tuple(field_types(fields)) case Variant(cases) : return size_variant(cases) case Flags(labels) : return size_flags(labels) -def size_record(fields): 
+def size_tuple(ts): s = 0 - for f in fields: - s = align_to(s, alignment(f.t)) - s += size(f.t) - return align_to(s, alignment(Record(fields))) + for t in ts: + s = align_to(s, alignment(t)) + s += size(t) + return align_to(s, alignment_tuple(ts)) def align_to(ptr, alignment): return math.ceil(ptr / alignment) * alignment def size_variant(cases): s = size(discriminant_type(cases)) - s = align_to(s, max_alignment(types_of(cases))) + s = align_to(s, max_case_alignment(cases)) cs = 0 for c in cases: - cs = max(cs, size(c.t)) + cs = max(cs, size_tuple(c.ts)) s += cs - return align_to(s, alignment(Variant(cases))) + return align_to(s, alignment_variant(cases)) def size_flags(labels): n = len(labels) @@ -323,8 +332,8 @@ def load_variant(opts, ptr, cases): ptr += disc_size trap_if(disc >= len(cases)) case = cases[disc] - ptr = align_to(ptr, max_alignment(types_of(cases))) - return { case_label_with_refinements(case, cases): load(opts, ptr, case.t) } + ptr = align_to(ptr, max_case_alignment(cases)) + return { case_label_with_refinements(case, cases): load_tuple(opts, ptr, case.ts) } def case_label_with_refinements(case, cases): label = case.label @@ -340,6 +349,14 @@ def find_case(label, cases): return matches[0] return -1 +def load_tuple(opts, ptr, ts): + a = [] + for t in ts: + ptr = align_to(ptr, alignment(t)) + a.append(load(opts, ptr, t)) + ptr += size(t) + return a + # def load_flags(opts, ptr, labels): @@ -562,8 +579,8 @@ def store_variant(opts, v, ptr, cases): disc_size = size(discriminant_type(cases)) store_int(opts, case_index, ptr, disc_size) ptr += disc_size - ptr = align_to(ptr, max_alignment(types_of(cases))) - store(opts, case_value, cases[case_index].t, ptr) + ptr = align_to(ptr, max_case_alignment(cases)) + store_tuple(opts, case_value, ptr, cases[case_index].ts) def match_case(v, cases): assert(len(v.keys()) == 1) @@ -574,6 +591,12 @@ def match_case(v, cases): if case_index != -1: return (case_index, value) +def store_tuple(opts, v, ptr, ts): + for i,t 
in enumerate(ts): + ptr = align_to(ptr, alignment(t)) + store(opts, v[i], t, ptr) + ptr += size(t) + # def store_flags(opts, v, ptr, labels): @@ -594,11 +617,11 @@ def pack_flags_into_int(v, labels): MAX_FLAT_RESULTS = 1 def flatten(functype, context): - flat_params = flatten_types(functype.params) + flat_params = flatten_tuple(functype.params) if len(flat_params) > MAX_FLAT_PARAMS: flat_params = ['i32'] - flat_results = flatten_type(functype.result) + flat_results = flatten_tuple(functype.results) if len(flat_results) > MAX_FLAT_RESULTS: match context: case 'lift': @@ -609,7 +632,7 @@ def flatten(functype, context): return { 'params': flat_params, 'results': flat_results } -def flatten_types(ts): +def flatten_tuple(ts): return [ft for t in ts for ft in flatten_type(t)] # @@ -624,7 +647,7 @@ def flatten_type(t): case Float64() : return ['f64'] case Char() : return ['i32'] case String() | List(_) : return ['i32', 'i32'] - case Record(fields) : return flatten_types(types_of(fields)) + case Record(fields) : return flatten_tuple(field_types(fields)) case Variant(cases) : return flatten_variant(cases) case Flags(labels) : return ['i32'] * num_i32_flags(labels) @@ -633,7 +656,7 @@ def flatten_type(t): def flatten_variant(cases): flat = [] for c in cases: - for i,ft in enumerate(flatten_type(c.t)): + for i,ft in enumerate(flatten_tuple(c.ts)): if i < len(flat): flat[i] = join(flat[i], ft) else: @@ -735,7 +758,7 @@ def next(self, want): case ('i64', 'f32') : return reinterpret_i32_as_float(wrap_i64_to_i32(x)) case ('i64', 'f64') : return reinterpret_i64_as_float(x) case _ : return x - v = lift_flat(opts, CoerceValueIter(), case.t) + v = lift_flat_tuple(opts, CoerceValueIter(), case.ts) for have in flat_types: _ = vi.next(have) return { case_label_with_refinements(case, cases): v } @@ -744,6 +767,12 @@ def wrap_i64_to_i32(i): assert(0 <= i < (1 << 64)) return i % (1 << 32) +def lift_flat_tuple(opts, vi, ts): + a = [] + for t in ts: + a.append(lift_flat(opts, vi, t)) + 
return a + # def lift_flat_flags(vi, labels): @@ -807,7 +836,7 @@ def lower_flat_variant(opts, v, cases): case_index, case_value = match_case(v, cases) flat_types = flatten_variant(cases) assert(flat_types.pop(0) == 'i32') - payload = lower_flat(opts, case_value, cases[case_index].t) + payload = lower_flat_tuple(opts, case_value, cases[case_index].ts) for i,have in enumerate(payload): want = flat_types.pop(0) match (have.t, want): @@ -820,6 +849,12 @@ def lower_flat_variant(opts, v, cases): payload.append(Value(want, 0)) return [Value('i32', case_index)] + payload +def lower_flat_tuple(opts, v, ts): + flat = [] + for i,t in enumerate(ts): + flat += lower_flat(opts, v[i], t) + return flat + # def lower_flat_flags(v, labels): @@ -834,7 +869,7 @@ def lower_flat_flags(v, labels): ### Lifting and Lowering def lift(opts, max_flat, vi, ts): - flat_types = flatten_types(ts) + flat_types = flatten_tuple(ts) if len(flat_types) > max_flat: ptr = vi.next('i32') tuple_type = Tuple(ts) @@ -846,7 +881,7 @@ def lift(opts, max_flat, vi, ts): # def lower(opts, max_flat, vs, ts, out_param = None): - flat_types = flatten_types(ts) + flat_types = flatten_tuple(ts) if len(flat_types) > max_flat: tuple_type = Tuple(functype.params) tuple_value = {str(i): v for i,v in enumerate(vs)} @@ -887,14 +922,14 @@ def canon_lift(callee_opts, callee_instance, callee, functype, args, called_as_e except CoreWebAssemblyException: trap() - [result] = lift(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), [functype.result]) + results = lift(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), functype.results) def post_return(): if callee_opts.post_return is not None: callee_opts.post_return(flat_results) if called_as_export: callee_instance.may_enter = True - return (result, post_return) + return (results, post_return) ### `lower` @@ -904,10 +939,10 @@ def canon_lower(caller_opts, caller_instance, callee, functype, flat_args): flat_args = ValueIter(flat_args) args = lift(caller_opts, 
MAX_FLAT_PARAMS, flat_args, functype.params) - result, post_return = callee(args) + results, post_return = callee(args) caller_instance.may_leave = False - flat_results = lower(caller_opts, MAX_FLAT_RESULTS, [result], [functype.result], flat_args) + flat_results = lower(caller_opts, MAX_FLAT_RESULTS, results, functype.results, flat_args) caller_instance.may_leave = True post_return() diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index e658637..7190343 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -92,7 +92,7 @@ def test_name(): if not equal_modulo_string_encoding(got, lower_v): fail("{} re-lift expected {} but got {}".format(test_name(), lower_v, got)) -test(Unit(), [], {}) +test(Record([]), [], {}) test(Record([Field('x',U8()), Field('y',U16()), Field('z',U32())]), [1,2,3], {'x':1,'y':2,'z':3}) test(Tuple([Tuple([U8(),U8()]),U8()]), [1,2,3], {'0':{'0':1,'1':2},'1':3}) t = Flags(['a','b']) @@ -101,45 +101,45 @@ def test_name(): test(t, [3], {'a':True,'b':True}) test(t, [4], {'a':False,'b':False}) test(Flags([str(i) for i in range(33)]), [0xffffffff,0x1], { str(i):True for i in range(33) }) -t = Variant([Case('x',U8()),Case('y',Float32()),Case('z',Unit())]) -test(t, [0,42], {'x': 42}) -test(t, [0,256], {'x': 0}) -test(t, [1,0x4048f5c3], {'y': 3.140000104904175}) -test(t, [2,0xffffffff], {'z': {}}) +t = Variant([Case('x',[U8()]),Case('y',[Float32()]),Case('z',[])]) +test(t, [0,42], {'x': [42]}) +test(t, [0,256], {'x': [0]}) +test(t, [1,0x4048f5c3], {'y':[3.140000104904175]}) +test(t, [2,0xffffffff], {'z':[]}) t = Union([U32(),U64()]) -test(t, [0,42], {'0':42}) -test(t, [0,(1<<35)], {'0':0}) -test(t, [1,(1<<35)], {'1':(1<<35)}) +test(t, [0,42], {'0':[42]}) +test(t, [0,(1<<35)], {'0':[0]}) +test(t, [1,(1<<35)], {'1':[1<<35]}) t = Union([Float32(), U64()]) -test(t, [0,0x4048f5c3], {'0': 3.140000104904175}) -test(t, [0,(1<<35)], {'0': 0}) -test(t, [1,(1<<35)], {'1': (1<<35)}) 
+test(t, [0,0x4048f5c3], {'0':[3.140000104904175]}) +test(t, [0,(1<<35)], {'0':[0]}) +test(t, [1,(1<<35)], {'1':[1<<35]}) t = Union([Float64(), U64()]) -test(t, [0,0x40091EB851EB851F], {'0': 3.14}) -test(t, [0,(1<<35)], {'0': 1.69759663277e-313}) -test(t, [1,(1<<35)], {'1': (1<<35)}) +test(t, [0,0x40091EB851EB851F], {'0':[3.14]}) +test(t, [0,(1<<35)], {'0':[1.69759663277e-313]}) +test(t, [1,(1<<35)], {'1':[1<<35]}) t = Union([U8()]) -test(t, [0,42], {'0':42}) +test(t, [0,42], {'0':[42]}) test(t, [1,256], None) -test(t, [0,256], {'0':0}) +test(t, [0,256], {'0':[0]}) t = Union([Tuple([U8(),Float32()]), U64()]) -test(t, [0,42,3.14], {'0': {'0':42, '1':3.14}}) -test(t, [1,(1<<35),0], {'1': (1<<35)}) +test(t, [0,42,3.14], {'0':[{'0':42, '1':3.14}]}) +test(t, [1,(1<<35),0], {'1':[1<<35]}) t = Option(Float32()) -test(t, [0,3.14], {'none':{}}) -test(t, [1,3.14], {'some':3.14}) -t = Expected(U8(),U32()) -test(t, [0, 42], {'ok':42}) -test(t, [1, 1000], {'error':1000}) -t = Variant([Case('w',U8()), Case('x',U8(),'w'), Case('y',U8()), Case('z',U8(),'x')]) -test(t, [0, 42], {'w':42}) -test(t, [1, 42], {'x|w':42}) -test(t, [2, 42], {'y':42}) -test(t, [3, 42], {'z|x|w':42}) -t2 = Variant([Case('w',U8())]) -test(t, [0, 42], {'w':42}, lower_t=t2, lower_v={'w':42}) -test(t, [1, 42], {'x|w':42}, lower_t=t2, lower_v={'w':42}) -test(t, [3, 42], {'z|x|w':42}, lower_t=t2, lower_v={'w':42}) +test(t, [0,3.14], {'none':[]}) +test(t, [1,3.14], {'some':[3.14]}) +t = Result([U8()],[U32()]) +test(t, [0, 42], {'ok':[42]}) +test(t, [1, 1000], {'error':[1000]}) +t = Variant([Case('w',[U8()]), Case('x',[U8()],'w'), Case('y',[U8()]), Case('z',[U8()],'x')]) +test(t, [0, 42], {'w':[42]}) +test(t, [1, 42], {'x|w':[42]}) +test(t, [2, 42], {'y':[42]}) +test(t, [3, 42], {'z|x|w':[42]}) +t2 = Variant([Case('w',[U8()])]) +test(t, [0, 42], {'w':[42]}, lower_t=t2, lower_v={'w':[42]}) +test(t, [1, 42], {'x|w':[42]}, lower_t=t2, lower_v={'w':[42]}) +test(t, [3, 42], {'z|x|w':[42]}, lower_t=t2, 
lower_v={'w':[42]}) def test_pairs(t, pairs): for arg,expect in pairs: @@ -162,7 +162,7 @@ def test_pairs(t, pairs): test_pairs(Float64(), [(3.14,3.14)]) test_pairs(Char(), [(0,'\x00'), (65,'A'), (0xD7FF,'\uD7FF'), (0xD800,None), (0xDFFF,None)]) test_pairs(Char(), [(0xE000,'\uE000'), (0x10FFFF,'\U0010FFFF'), (0x110000,None), (0xFFFFFFFF,None)]) -test_pairs(Enum(['a','b']), [(0,{'a':{}}), (1,{'b':{}}), (2,None)]) +test_pairs(Enum(['a','b']), [(0,{'a':[]}), (1,{'b':[]}), (2,None)]) def test_nan32(inbits, outbits): f = lift_flat(Opts(), ValueIter([Value('f32', reinterpret_i32_as_float(inbits))]), Float32()) @@ -240,7 +240,7 @@ def test_heap(t, expect, args, byte_array): opts = mk_opts(heap.memory, 'utf8', None, None) test(t, args, expect, opts) -test_heap(List(Unit()), [{},{},{}], [0,3], []) +test_heap(List(Record([])), [{},{},{}], [0,3], []) test_heap(List(Bool()), [True,False,True], [0,3], [1,0,1]) test_heap(List(Bool()), [True,False,True], [0,3], [1,0,2]) test_heap(List(Bool()), [True,False,True], [3,3], [0xff,0xff,0xff, 1,0,1]) @@ -274,14 +274,14 @@ def test_heap(t, expect, args, byte_array): [6,0, 7, 0x0ff, 8,0, 9, 0xff]) test_heap(List(Tuple([Tuple([U16(),U8()]),U8()])), [mk_tup([4,5],6),mk_tup([7,8],9)], [0,2], [4,0, 5,0xff, 6,0xff, 7,0, 8,0xff, 9,0xff]) -test_heap(List(Union([Unit(),U8(),Tuple([U8(),U16()])])), [{'0':{}}, {'1':42}, {'2':mk_tup(6,7)}], [0,3], +test_heap(List(Union([Tuple([]),U8(),Tuple([U8(),U16()])])), [{'0':[{}]}, {'1':[42]}, {'2':[mk_tup(6,7)]}], [0,3], [0,0xff,0xff,0xff,0xff,0xff, 1,0xff,42,0xff,0xff,0xff, 2,0xff,6,0xff,7,0]) -test_heap(List(Union([U32(),U8()])), [{'0':256}, {'1':42}], [0,2], +test_heap(List(Union([U32(),U8()])), [{'0':[256]}, {'1':[42]}], [0,2], [0,0xff,0xff,0xff,0,1,0,0, 1,0xff,0xff,0xff,42,0xff,0xff,0xff]) test_heap(List(Tuple([Union([U8(),Tuple([U16(),U8()])]),U8()])), - [mk_tup({'1':mk_tup(5,6)},7),mk_tup({'0':8},9)], [0,2], + [mk_tup({'1':[mk_tup(5,6)]},7),mk_tup({'0':[8]},9)], [0,2], [1,0xff,5,0,6,0xff,7,0xff, 
0,0xff,8,0xff,0xff,0xff,9,0xff]) -test_heap(List(Union([U8()])), [{'0':6},{'0':7},{'0':8}], [0,3], +test_heap(List(Union([U8()])), [{'0':[6]},{'0':[7]},{'0':[8]}], [0,3], [0,6, 0,7, 0,8]) t = List(Flags(['a','b'])) test_heap(t, [{'a':False,'b':False},{'a':False,'b':True},{'a':True,'b':True}], [0,3], @@ -324,19 +324,20 @@ def test_flatten(t, params, results): got = flatten(t, 'lower') assert(got == expect) -test_flatten(Func([U8(),Float32(),Float64()],Unit()), ['i32','f32','f64'], []) -test_flatten(Func([U8(),Float32(),Float64()],Float32()), ['i32','f32','f64'], ['f32']) -test_flatten(Func([U8(),Float32(),Float64()],U8()), ['i32','f32','f64'], ['i32']) -test_flatten(Func([U8(),Float32(),Float64()],Tuple([Float32()])), ['i32','f32','f64'], ['f32']) -test_flatten(Func([U8(),Float32(),Float64()],Tuple([Float32(),Float32()])), ['i32','f32','f64'], ['f32','f32']) -test_flatten(Func([U8() for _ in range(17)],Unit()), ['i32' for _ in range(17)], []) -test_flatten(Func([U8() for _ in range(17)],Tuple([U8(),U8()])), ['i32' for _ in range(17)], ['i32','i32']) +test_flatten(Func([U8(),Float32(),Float64()],[]), ['i32','f32','f64'], []) +test_flatten(Func([U8(),Float32(),Float64()],[Float32()]), ['i32','f32','f64'], ['f32']) +test_flatten(Func([U8(),Float32(),Float64()],[U8()]), ['i32','f32','f64'], ['i32']) +test_flatten(Func([U8(),Float32(),Float64()],[Tuple([Float32()])]), ['i32','f32','f64'], ['f32']) +test_flatten(Func([U8(),Float32(),Float64()],[Tuple([Float32(),Float32()])]), ['i32','f32','f64'], ['f32','f32']) +test_flatten(Func([U8(),Float32(),Float64()],[Float32(),Float32()]), ['i32','f32','f64'], ['f32','f32']) +test_flatten(Func([U8() for _ in range(17)],[]), ['i32' for _ in range(17)], []) +test_flatten(Func([U8() for _ in range(17)],[Tuple([U8(),U8()])]), ['i32' for _ in range(17)], ['i32','i32']) def test_roundtrip(t, v): before = definitions.MAX_FLAT_RESULTS definitions.MAX_FLAT_RESULTS = 16 - ft = Func([t],t) + ft = Func([t],[t]) callee_instance = Instance() 
callee = lambda x: x @@ -363,6 +364,6 @@ def test_roundtrip(t, v): test_roundtrip(Tuple([U16(),U16()]), mk_tup(3,4)) test_roundtrip(List(String()), [mk_str("hello there")]) test_roundtrip(List(List(String())), [[mk_str("one"),mk_str("two")],[mk_str("three")]]) -test_roundtrip(List(Option(Tuple([String(),U16()]))), [{'some':mk_tup(mk_str("answer"),42)}]) +test_roundtrip(List(Option(Tuple([String(),U16()]))), [{'some':[mk_tup(mk_str("answer"),42)]}]) print("All tests passed") From 2cf4cd186bff5700a7e44ef08993e32047cafb05 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 14 Jul 2022 18:59:06 -0500 Subject: [PATCH 096/301] Add missing * to result type list --- design/mvp/Explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 4be4e37..efa1da0 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -452,7 +452,7 @@ defvaltype ::= bool | (enum +) | (union +) | (option ) - | (result (error *)?) + | (result * (error *)?) 
 valtype ::=
           |
 functype ::= (func )

From e91bcca259ff044e60d0fd0411da2e6b14134bf4 Mon Sep 17 00:00:00 2001
From: Luke Wagner
Date: Fri, 15 Jul 2022 13:53:24 -0500
Subject: [PATCH 097/301] Fix bug in binary encoding of case

Co-authored-by: Peter Huene
---
 design/mvp/Binary.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md
index 906a6f1..eff1476 100644
--- a/design/mvp/Binary.md
+++ b/design/mvp/Binary.md
@@ -181,7 +181,7 @@ defvaltype ::= pvt: => pvt
              | 0x6a t*:vec() u*:vec() => (result t* (error u*))
 namedtype ::= n: t: => (field n t)
 case ::= nt*:vec() 0x0 => (case nt*)
-       | nt*:vec() 0x1 i: => (case nt* (refines case-label[i]))
+       | n: t*:vec() 0x1 i: => (case n t* (refines case-label[i]))
 valtype ::= i: => i
           | pvt: => pvt
 functype ::= 0x40 p*: r*: => (func (param p)* (result r)*)

From 95abbc874af8fecac6e78f7ad2edc776d0ac4759 Mon Sep 17 00:00:00 2001
From: Luke Wagner
Date: Fri, 15 Jul 2022 15:24:08 -0500
Subject: [PATCH 098/301] Fix the other case of case

Co-authored-by: Peter Huene
---
 design/mvp/Binary.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md
index eff1476..85e0514 100644
--- a/design/mvp/Binary.md
+++ b/design/mvp/Binary.md
@@ -180,7 +180,7 @@ defvaltype ::= pvt: => pvt
              | 0x6b t: => (option t)
              | 0x6a t*:vec() u*:vec() => (result t* (error u*))
 namedtype ::= n: t: => (field n t)
-case ::= nt*:vec() 0x0 => (case nt*)
+case ::= n: t*:vec() 0x0 => (case n t*)
        | n: t*:vec() 0x1 i: => (case n t* (refines case-label[i]))
 valtype ::= i: => i
           | pvt: => pvt

From fc877eed110626791fbfc8182ce47c8f8cf32ff5 Mon Sep 17 00:00:00 2001
From: Luke Wagner
Date: Sat, 16 Jul 2022 15:11:49 -0500
Subject: [PATCH 099/301] Add missing >

---
 design/mvp/Explainer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md
index efa1da0..9e7bffb 100644
--- a/design/mvp/Explainer.md
+++
b/design/mvp/Explainer.md @@ -459,7 +459,7 @@ functype ::= (func ) paramlist ::= (param )* | (param ) resultlist ::= (result )* - | (result ) componenttype ::= (component *) instancetype ::= (instance *) componentdecl ::= From 4c19f5ed4bb9671d4aa4015dc5e80ec7f91d3c9c Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Sat, 16 Jul 2022 16:32:26 -0500 Subject: [PATCH 100/301] Switch variant/result case payloads from T* to T? --- design/mvp/Binary.md | 108 +++++++++--------- design/mvp/CanonicalABI.md | 139 +++++++++++------------ design/mvp/Explainer.md | 18 +-- design/mvp/canonical-abi/definitions.py | 145 ++++++++++++------------ design/mvp/canonical-abi/run_tests.py | 80 ++++++------- 5 files changed, 241 insertions(+), 249 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 85e0514..ae79921 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -151,59 +151,61 @@ Notes: means that the maximum `ct` in an MVP `alias` declarator is `1`. ``` -type ::= dt: => (type dt) -deftype ::= dvt: => dvt - | ft: => ft - | ct: => ct - | it: => it -primvaltype ::= 0x7f => bool - | 0x7e => s8 - | 0x7d => u8 - | 0x7c => s16 - | 0x7b => u16 - | 0x7a => s32 - | 0x79 => u32 - | 0x78 => s64 - | 0x77 => u64 - | 0x76 => float32 - | 0x75 => float64 - | 0x74 => char - | 0x73 => string -defvaltype ::= pvt: => pvt - | 0x72 nt*:vec() => (record (field nt)*) - | 0x71 case*:vec() => (variant case*) - | 0x70 t: => (list t) - | 0x6f t*:vec() => (tuple t*) - | 0x6e n*:vec() => (flags n*) - | 0x6d n*:vec() => (enum n*) - | 0x6c t*:vec() => (union t*) - | 0x6b t: => (option t) - | 0x6a t*:vec() u*:vec() => (result t* (error u*)) -namedtype ::= n: t: => (field n t) -case ::= n: t*:vec() 0x0 => (case n t*) - | n: t*:vec() 0x1 i: => (case n t* (refines case-label[i])) -valtype ::= i: => i - | pvt: => pvt -functype ::= 0x40 p*: r*: => (func (param p)* (result r)*) -prlist ::= 0x00 t: => [t] - | 0x01 nt*:vec() => nt* -componenttype ::= 0x41 cd*:vec() => (component cd*) 
-instancetype ::= 0x42 id*:vec() => (instance id*) -componentdecl ::= 0x03 id: => id - | id: => id -instancedecl ::= 0x00 t: => t - | 0x01 t: => t - | 0x02 a: => a - | 0x04 ed: => ed -importdecl ::= n: ed: => (import n ed) -exportdecl ::= n: ed: => (export n ed) -externdesc ::= 0x00 0x11 i: => (core module (type i)) - | 0x01 i: => (func (type i)) - | 0x02 t: => (value t) - | 0x03 b: => (type b) - | 0x04 i: => (instance (type i)) - | 0x05 i: => (component (type i)) -typebound ::= 0x00 i: => (eq i) +type ::= dt: => (type dt) +deftype ::= dvt: => dvt + | ft: => ft + | ct: => ct + | it: => it +primvaltype ::= 0x7f => bool + | 0x7e => s8 + | 0x7d => u8 + | 0x7c => s16 + | 0x7b => u16 + | 0x7a => s32 + | 0x79 => u32 + | 0x78 => s64 + | 0x77 => u64 + | 0x76 => float32 + | 0x75 => float64 + | 0x74 => char + | 0x73 => string +defvaltype ::= pvt: => pvt + | 0x72 nt*:vec() => (record (field nt)*) + | 0x71 case*:vec() => (variant case*) + | 0x70 t: => (list t) + | 0x6f t*:vec() => (tuple t*) + | 0x6e n*:vec() => (flags n*) + | 0x6d n*:vec() => (enum n*) + | 0x6c t*:vec() => (union t*) + | 0x6b t: => (option t) + | 0x6a t?: u?: => (result t? (error u)?) +namedvaltype ::= n: t: => n t +case ::= n: t?: 0x0 => (case n t?) + | n: t?: 0x1 i: => (case n t? 
(refines case-label[i])) +casetype ::= 0x00 => + | 0x01 t: => t +valtype ::= i: => i + | pvt: => pvt +functype ::= 0x40 p*: r*: => (func (param p)* (result r)*) +prlist ::= 0x00 t: => [t] + | 0x01 nt*:vec() => nt* +componenttype ::= 0x41 cd*:vec() => (component cd*) +instancetype ::= 0x42 id*:vec() => (instance id*) +componentdecl ::= 0x03 id: => id + | id: => id +instancedecl ::= 0x00 t: => t + | 0x01 t: => t + | 0x02 a: => a + | 0x04 ed: => ed +importdecl ::= n: ed: => (import n ed) +exportdecl ::= n: ed: => (export n ed) +externdesc ::= 0x00 0x11 i: => (core module (type i)) + | 0x01 i: => (func (type i)) + | 0x02 t: => (value t) + | 0x03 b: => (type b) + | 0x04 i: => (instance (type i)) + | 0x05 i: => (component (type i)) +typebound ::= 0x00 i: => (eq i) ``` Notes: * The type opcodes follow the same negative-SLEB128 scheme as Core WebAssembly, diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index c1771e3..bd62b31 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -69,9 +69,9 @@ function to replace specialized value types with their expansion: def despecialize(t): match t: case Tuple(ts) : return Record([ Field(str(i), t) for i,t in enumerate(ts) ]) - case Union(ts) : return Variant([ Case(str(i), [t]) for i,t in enumerate(ts) ]) - case Enum(labels) : return Variant([ Case(l, []) for l in labels ]) - case Option(t) : return Variant([ Case("none", []), Case("some", [t]) ]) + case Union(ts) : return Variant([ Case(str(i), t) for i,t in enumerate(ts) ]) + case Enum(labels) : return Variant([ Case(l, None) for l in labels ]) + case Option(t) : return Variant([ Case("none", None), Case("some", t) ]) case Result(ok, error) : return Variant([ Case("ok", ok), Case("error", error) ]) case _ : return t ``` @@ -97,20 +97,17 @@ def alignment(t): case Float64() : return 8 case Char() : return 4 case String() | List(_) : return 4 - case Record(fields) : return alignment_tuple(field_types(fields)) + case Record(fields) : return 
alignment_record(fields) case Variant(cases) : return alignment_variant(cases) case Flags(labels) : return alignment_flags(labels) ``` Record alignment is tuple alignment, with the definitions split for reuse below: ```python -def field_types(fields): - return [f.t for f in fields] - -def alignment_tuple(ts): +def alignment_record(fields): a = 1 - for t in ts: - a = max(a, alignment(t)) + for f in fields: + a = max(a, alignment(f.t)) return a ``` @@ -134,7 +131,8 @@ def discriminant_type(cases): def max_case_alignment(cases): a = 1 for c in cases: - a = max(a, alignment_tuple(c.ts)) + if c.t is not None: + a = max(a, alignment(c.t)) return a ``` @@ -166,16 +164,16 @@ def size(t): case Float64() : return 8 case Char() : return 4 case String() | List(_) : return 8 - case Record(fields) : return size_tuple(field_types(fields)) + case Record(fields) : return size_record(fields) case Variant(cases) : return size_variant(cases) case Flags(labels) : return size_flags(labels) -def size_tuple(ts): +def size_record(fields): s = 0 - for t in ts: - s = align_to(s, alignment(t)) - s += size(t) - return align_to(s, alignment_tuple(ts)) + for f in fields: + s = align_to(s, alignment(f.t)) + s += size(f.t) + return align_to(s, alignment_record(fields)) def align_to(ptr, alignment): return math.ceil(ptr / alignment) * alignment @@ -185,7 +183,8 @@ def size_variant(cases): s = align_to(s, max_case_alignment(cases)) cs = 0 for c in cases: - cs = max(cs, size_tuple(c.ts)) + if c.t is not None: + cs = max(cs, size(c.t)) s += cs return align_to(s, alignment_variant(cases)) @@ -367,18 +366,21 @@ string operations. 
```python def load_variant(opts, ptr, cases): disc_size = size(discriminant_type(cases)) - disc = load_int(opts, ptr, disc_size) + case_index = load_int(opts, ptr, disc_size) ptr += disc_size - trap_if(disc >= len(cases)) - case = cases[disc] + trap_if(case_index >= len(cases)) + c = cases[case_index] ptr = align_to(ptr, max_case_alignment(cases)) - return { case_label_with_refinements(case, cases): load_tuple(opts, ptr, case.ts) } - -def case_label_with_refinements(case, cases): - label = case.label - while case.refines is not None: - case = cases[find_case(case.refines, cases)] - label += '|' + case.label + case_label = case_label_with_refinements(c, cases) + if c.t is None: + return { case_label: None } + return { case_label: load(opts, ptr, c.t) } + +def case_label_with_refinements(c, cases): + label = c.label + while c.refines is not None: + c = cases[find_case(c.refines, cases)] + label += '|' + c.label return label def find_case(label, cases): @@ -387,14 +389,6 @@ def find_case(label, cases): if len(matches) == 1: return matches[0] return -1 - -def load_tuple(opts, ptr, ts): - a = [] - for t in ts: - ptr = align_to(ptr, alignment(t)) - a.append(load(opts, ptr, t)) - ptr += size(t) - return a ``` Finally, flags are converted from a bit-vector to a dictionary whose keys are @@ -695,7 +689,9 @@ def store_variant(opts, v, ptr, cases): store_int(opts, case_index, ptr, disc_size) ptr += disc_size ptr = align_to(ptr, max_case_alignment(cases)) - store_tuple(opts, case_value, ptr, cases[case_index].ts) + c = cases[case_index] + if c.t is not None: + store(opts, case_value, c.t, ptr) def match_case(v, cases): assert(len(v.keys()) == 1) @@ -705,12 +701,6 @@ def match_case(v, cases): case_index = find_case(label, cases) if case_index != -1: return (case_index, value) - -def store_tuple(opts, v, ptr, ts): - for i,t in enumerate(ts): - ptr = align_to(ptr, alignment(t)) - store(opts, v[i], t, ptr) - ptr += size(t) ``` Finally, flags are converted from a dictionary to a 
bit-vector by iterating @@ -765,11 +755,11 @@ MAX_FLAT_PARAMS = 16 MAX_FLAT_RESULTS = 1 def flatten(functype, context): - flat_params = flatten_tuple(functype.params) + flat_params = flatten_types(functype.params) if len(flat_params) > MAX_FLAT_PARAMS: flat_params = ['i32'] - flat_results = flatten_tuple(functype.results) + flat_results = flatten_types(functype.results) if len(flat_results) > MAX_FLAT_RESULTS: match context: case 'lift': @@ -780,7 +770,7 @@ def flatten(functype, context): return { 'params': flat_params, 'results': flat_results } -def flatten_tuple(ts): +def flatten_types(ts): return [ft for t in ts for ft in flatten_type(t)] ``` @@ -797,11 +787,20 @@ def flatten_type(t): case Float64() : return ['f64'] case Char() : return ['i32'] case String() | List(_) : return ['i32', 'i32'] - case Record(fields) : return flatten_tuple(field_types(fields)) + case Record(fields) : return flatten_record(fields) case Variant(cases) : return flatten_variant(cases) case Flags(labels) : return ['i32'] * num_i32_flags(labels) ``` +Record flattening simply flattens each field in sequence. +```python +def flatten_record(fields): + flat = [] + for f in fields: + flat += flatten_type(f.t) + return flat +``` + Variant flattening is more involved due to the fact that each case payload can have a totally different flattening. Rather than giving up when there is a type mismatch, the Canonical ABI relies on the fact that the 4 core value types can @@ -814,11 +813,12 @@ an `i32` into an `i64`. 
def flatten_variant(cases): flat = [] for c in cases: - for i,ft in enumerate(flatten_tuple(c.ts)): - if i < len(flat): - flat[i] = join(flat[i], ft) - else: - flat.append(ft) + if c.t is not None: + for i,ft in enumerate(flatten_type(c.t)): + if i < len(flat): + flat[i] = join(flat[i], ft) + else: + flat.append(ft) return flatten_type(discriminant_type(cases)) + flat def join(a, b): @@ -927,9 +927,8 @@ high bits of an `i64` are set for a 32-bit type: def lift_flat_variant(opts, vi, cases): flat_types = flatten_variant(cases) assert(flat_types.pop(0) == 'i32') - disc = vi.next('i32') - trap_if(disc >= len(cases)) - case = cases[disc] + case_index = vi.next('i32') + trap_if(case_index >= len(cases)) class CoerceValueIter: def next(self, want): have = flat_types.pop(0) @@ -940,20 +939,18 @@ def lift_flat_variant(opts, vi, cases): case ('i64', 'f32') : return reinterpret_i32_as_float(wrap_i64_to_i32(x)) case ('i64', 'f64') : return reinterpret_i64_as_float(x) case _ : return x - v = lift_flat_tuple(opts, CoerceValueIter(), case.ts) + c = cases[case_index] + if c.t is None: + v = None + else: + v = lift_flat(opts, CoerceValueIter(), c.t) for have in flat_types: _ = vi.next(have) - return { case_label_with_refinements(case, cases): v } + return { case_label_with_refinements(c, cases): v } def wrap_i64_to_i32(i): assert(0 <= i < (1 << 64)) return i % (1 << 32) - -def lift_flat_tuple(opts, vi, ts): - a = [] - for t in ts: - a.append(lift_flat(opts, vi, t)) - return a ``` Finally, flags are lifted by OR-ing together all the flattened `i32` values @@ -1038,7 +1035,11 @@ def lower_flat_variant(opts, v, cases): case_index, case_value = match_case(v, cases) flat_types = flatten_variant(cases) assert(flat_types.pop(0) == 'i32') - payload = lower_flat_tuple(opts, case_value, cases[case_index].ts) + c = cases[case_index] + if c.t is None: + payload = [] + else: + payload = lower_flat(opts, case_value, c.t) for i,have in enumerate(payload): want = flat_types.pop(0) match (have.t, 
want): @@ -1050,12 +1051,6 @@ def lower_flat_variant(opts, v, cases): for want in flat_types: payload.append(Value(want, 0)) return [Value('i32', case_index)] + payload - -def lower_flat_tuple(opts, v, ts): - flat = [] - for i,t in enumerate(ts): - flat += lower_flat(opts, v[i], t) - return flat ``` Finally, flags are lowered by slicing the bit vector into `i32` chunks: @@ -1077,7 +1072,7 @@ parameters or results given by the `ValueIter` `vi` into a tuple of values with types `ts`: ```python def lift(opts, max_flat, vi, ts): - flat_types = flatten_tuple(ts) + flat_types = flatten_types(ts) if len(flat_types) > max_flat: ptr = vi.next('i32') tuple_type = Tuple(ts) @@ -1094,7 +1089,7 @@ greater-than-`max_flat` case by either allocating storage with `realloc` or accepting a caller-allocated buffer as an out-param: ```python def lower(opts, max_flat, vs, ts, out_param = None): - flat_types = flatten_tuple(ts) + flat_types = flatten_types(ts) if len(flat_types) > max_flat: tuple_type = Tuple(functype.params) tuple_value = {str(i): v for i,v in enumerate(vs)} diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 9e7bffb..4ac3720 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -445,14 +445,14 @@ defvaltype ::= bool | float32 | float64 | char | string | (record (field )*) - | (variant (case ? * (refines )?)+) + | (variant (case ? ? (refines )?)+) | (list ) | (tuple *) | (flags *) | (enum +) | (union +) | (option ) - | (result * (error *)?) + | (result ? (error )?) valtype ::= | functype ::= (func ) @@ -515,13 +515,13 @@ some `case` in the supertype. The sets of values allowed for the remaining *specialized value types* are defined by the following mapping: ``` - (tuple *) ↦ (record (field "𝒊" )*) for 𝒊=0,1,... - (flags *) ↦ (record (field bool)*) - (enum +) ↦ (variant (case )+) - (option ) ↦ (variant (case "none") (case "some" )) - (union +) ↦ (variant (case "𝒊" )+) for 𝒊=0,1,... -(result * (error *)?) 
↦ (variant (case "ok" *) (case "error" *)) - string ↦ (list char) + (tuple *) ↦ (record (field "𝒊" )*) for 𝒊=0,1,... + (flags *) ↦ (record (field bool)*) + (enum +) ↦ (variant (case )+) + (option ) ↦ (variant (case "none") (case "some" )) + (union +) ↦ (variant (case "𝒊" )+) for 𝒊=0,1,... +(result ? (error )?) ↦ (variant (case "ok" ?) (case "error" ?)) + string ↦ (list char) ``` Note that, at least initially, variants are required to have a non-empty list of cases. This could be relaxed in the future to allow an empty list of cases, with diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index e7f4608..6caf2e2 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -8,6 +8,7 @@ import struct import types from dataclasses import dataclass +from typing import Optional class Trap(BaseException): pass class CoreWebAssemblyException(BaseException): pass @@ -58,7 +59,7 @@ class Flags(ValType): @dataclass class Case: label: str - ts: [ValType] + t: Optional[ValType] refines: str = None @dataclass @@ -79,8 +80,8 @@ class Option(ValType): @dataclass class Result(ValType): - ok: [ValType] - error: [ValType] + ok: Optional[ValType] + error: Optional[ValType] @dataclass class Func: @@ -92,9 +93,9 @@ class Func: def despecialize(t): match t: case Tuple(ts) : return Record([ Field(str(i), t) for i,t in enumerate(ts) ]) - case Union(ts) : return Variant([ Case(str(i), [t]) for i,t in enumerate(ts) ]) - case Enum(labels) : return Variant([ Case(l, []) for l in labels ]) - case Option(t) : return Variant([ Case("none", []), Case("some", [t]) ]) + case Union(ts) : return Variant([ Case(str(i), t) for i,t in enumerate(ts) ]) + case Enum(labels) : return Variant([ Case(l, None) for l in labels ]) + case Option(t) : return Variant([ Case("none", None), Case("some", t) ]) case Result(ok, error) : return Variant([ Case("ok", ok), Case("error", error) ]) case _ : return t @@ -111,19 +112,16 @@ def 
alignment(t): case Float64() : return 8 case Char() : return 4 case String() | List(_) : return 4 - case Record(fields) : return alignment_tuple(field_types(fields)) + case Record(fields) : return alignment_record(fields) case Variant(cases) : return alignment_variant(cases) case Flags(labels) : return alignment_flags(labels) # -def field_types(fields): - return [f.t for f in fields] - -def alignment_tuple(ts): +def alignment_record(fields): a = 1 - for t in ts: - a = max(a, alignment(t)) + for f in fields: + a = max(a, alignment(f.t)) return a # @@ -143,7 +141,8 @@ def discriminant_type(cases): def max_case_alignment(cases): a = 1 for c in cases: - a = max(a, alignment_tuple(c.ts)) + if c.t is not None: + a = max(a, alignment(c.t)) return a # @@ -167,16 +166,16 @@ def size(t): case Float64() : return 8 case Char() : return 4 case String() | List(_) : return 8 - case Record(fields) : return size_tuple(field_types(fields)) + case Record(fields) : return size_record(fields) case Variant(cases) : return size_variant(cases) case Flags(labels) : return size_flags(labels) -def size_tuple(ts): +def size_record(fields): s = 0 - for t in ts: - s = align_to(s, alignment(t)) - s += size(t) - return align_to(s, alignment_tuple(ts)) + for f in fields: + s = align_to(s, alignment(f.t)) + s += size(f.t) + return align_to(s, alignment_record(fields)) def align_to(ptr, alignment): return math.ceil(ptr / alignment) * alignment @@ -186,7 +185,8 @@ def size_variant(cases): s = align_to(s, max_case_alignment(cases)) cs = 0 for c in cases: - cs = max(cs, size_tuple(c.ts)) + if c.t is not None: + cs = max(cs, size(c.t)) s += cs return align_to(s, alignment_variant(cases)) @@ -328,18 +328,21 @@ def load_record(opts, ptr, fields): def load_variant(opts, ptr, cases): disc_size = size(discriminant_type(cases)) - disc = load_int(opts, ptr, disc_size) + case_index = load_int(opts, ptr, disc_size) ptr += disc_size - trap_if(disc >= len(cases)) - case = cases[disc] + trap_if(case_index >= 
len(cases)) + c = cases[case_index] ptr = align_to(ptr, max_case_alignment(cases)) - return { case_label_with_refinements(case, cases): load_tuple(opts, ptr, case.ts) } - -def case_label_with_refinements(case, cases): - label = case.label - while case.refines is not None: - case = cases[find_case(case.refines, cases)] - label += '|' + case.label + case_label = case_label_with_refinements(c, cases) + if c.t is None: + return { case_label: None } + return { case_label: load(opts, ptr, c.t) } + +def case_label_with_refinements(c, cases): + label = c.label + while c.refines is not None: + c = cases[find_case(c.refines, cases)] + label += '|' + c.label return label def find_case(label, cases): @@ -349,14 +352,6 @@ def find_case(label, cases): return matches[0] return -1 -def load_tuple(opts, ptr, ts): - a = [] - for t in ts: - ptr = align_to(ptr, alignment(t)) - a.append(load(opts, ptr, t)) - ptr += size(t) - return a - # def load_flags(opts, ptr, labels): @@ -580,7 +575,9 @@ def store_variant(opts, v, ptr, cases): store_int(opts, case_index, ptr, disc_size) ptr += disc_size ptr = align_to(ptr, max_case_alignment(cases)) - store_tuple(opts, case_value, ptr, cases[case_index].ts) + c = cases[case_index] + if c.t is not None: + store(opts, case_value, c.t, ptr) def match_case(v, cases): assert(len(v.keys()) == 1) @@ -591,12 +588,6 @@ def match_case(v, cases): if case_index != -1: return (case_index, value) -def store_tuple(opts, v, ptr, ts): - for i,t in enumerate(ts): - ptr = align_to(ptr, alignment(t)) - store(opts, v[i], t, ptr) - ptr += size(t) - # def store_flags(opts, v, ptr, labels): @@ -617,11 +608,11 @@ def pack_flags_into_int(v, labels): MAX_FLAT_RESULTS = 1 def flatten(functype, context): - flat_params = flatten_tuple(functype.params) + flat_params = flatten_types(functype.params) if len(flat_params) > MAX_FLAT_PARAMS: flat_params = ['i32'] - flat_results = flatten_tuple(functype.results) + flat_results = flatten_types(functype.results) if len(flat_results) > 
MAX_FLAT_RESULTS: match context: case 'lift': @@ -632,7 +623,7 @@ def flatten(functype, context): return { 'params': flat_params, 'results': flat_results } -def flatten_tuple(ts): +def flatten_types(ts): return [ft for t in ts for ft in flatten_type(t)] # @@ -647,20 +638,29 @@ def flatten_type(t): case Float64() : return ['f64'] case Char() : return ['i32'] case String() | List(_) : return ['i32', 'i32'] - case Record(fields) : return flatten_tuple(field_types(fields)) + case Record(fields) : return flatten_record(fields) case Variant(cases) : return flatten_variant(cases) case Flags(labels) : return ['i32'] * num_i32_flags(labels) # +def flatten_record(fields): + flat = [] + for f in fields: + flat += flatten_type(f.t) + return flat + +# + def flatten_variant(cases): flat = [] for c in cases: - for i,ft in enumerate(flatten_tuple(c.ts)): - if i < len(flat): - flat[i] = join(flat[i], ft) - else: - flat.append(ft) + if c.t is not None: + for i,ft in enumerate(flatten_type(c.t)): + if i < len(flat): + flat[i] = join(flat[i], ft) + else: + flat.append(ft) return flatten_type(discriminant_type(cases)) + flat def join(a, b): @@ -745,9 +745,8 @@ def lift_flat_record(opts, vi, fields): def lift_flat_variant(opts, vi, cases): flat_types = flatten_variant(cases) assert(flat_types.pop(0) == 'i32') - disc = vi.next('i32') - trap_if(disc >= len(cases)) - case = cases[disc] + case_index = vi.next('i32') + trap_if(case_index >= len(cases)) class CoerceValueIter: def next(self, want): have = flat_types.pop(0) @@ -758,21 +757,19 @@ def next(self, want): case ('i64', 'f32') : return reinterpret_i32_as_float(wrap_i64_to_i32(x)) case ('i64', 'f64') : return reinterpret_i64_as_float(x) case _ : return x - v = lift_flat_tuple(opts, CoerceValueIter(), case.ts) + c = cases[case_index] + if c.t is None: + v = None + else: + v = lift_flat(opts, CoerceValueIter(), c.t) for have in flat_types: _ = vi.next(have) - return { case_label_with_refinements(case, cases): v } + return { 
case_label_with_refinements(c, cases): v } def wrap_i64_to_i32(i): assert(0 <= i < (1 << 64)) return i % (1 << 32) -def lift_flat_tuple(opts, vi, ts): - a = [] - for t in ts: - a.append(lift_flat(opts, vi, t)) - return a - # def lift_flat_flags(vi, labels): @@ -836,7 +833,11 @@ def lower_flat_variant(opts, v, cases): case_index, case_value = match_case(v, cases) flat_types = flatten_variant(cases) assert(flat_types.pop(0) == 'i32') - payload = lower_flat_tuple(opts, case_value, cases[case_index].ts) + c = cases[case_index] + if c.t is None: + payload = [] + else: + payload = lower_flat(opts, case_value, c.t) for i,have in enumerate(payload): want = flat_types.pop(0) match (have.t, want): @@ -849,12 +850,6 @@ def lower_flat_variant(opts, v, cases): payload.append(Value(want, 0)) return [Value('i32', case_index)] + payload -def lower_flat_tuple(opts, v, ts): - flat = [] - for i,t in enumerate(ts): - flat += lower_flat(opts, v[i], t) - return flat - # def lower_flat_flags(v, labels): @@ -869,7 +864,7 @@ def lower_flat_flags(v, labels): ### Lifting and Lowering def lift(opts, max_flat, vi, ts): - flat_types = flatten_tuple(ts) + flat_types = flatten_types(ts) if len(flat_types) > max_flat: ptr = vi.next('i32') tuple_type = Tuple(ts) @@ -881,7 +876,7 @@ def lift(opts, max_flat, vi, ts): # def lower(opts, max_flat, vs, ts, out_param = None): - flat_types = flatten_tuple(ts) + flat_types = flatten_types(ts) if len(flat_types) > max_flat: tuple_type = Tuple(functype.params) tuple_value = {str(i): v for i,v in enumerate(vs)} diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index 7190343..cc8d3a5 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -2,11 +2,11 @@ from definitions import * def equal_modulo_string_encoding(s, t): + if s is None and t is None: + return True if isinstance(s, (bool,int,float,str)) and isinstance(t, (bool,int,float,str)): return s == t if isinstance(s, tuple) 
and isinstance(t, tuple): - if s == () and t == (): - return True assert(isinstance(s[0], str)) assert(isinstance(t[0], str)) return s[0] == t[0] @@ -101,45 +101,45 @@ def test_name(): test(t, [3], {'a':True,'b':True}) test(t, [4], {'a':False,'b':False}) test(Flags([str(i) for i in range(33)]), [0xffffffff,0x1], { str(i):True for i in range(33) }) -t = Variant([Case('x',[U8()]),Case('y',[Float32()]),Case('z',[])]) -test(t, [0,42], {'x': [42]}) -test(t, [0,256], {'x': [0]}) -test(t, [1,0x4048f5c3], {'y':[3.140000104904175]}) -test(t, [2,0xffffffff], {'z':[]}) +t = Variant([Case('x',U8()),Case('y',Float32()),Case('z',None)]) +test(t, [0,42], {'x': 42}) +test(t, [0,256], {'x': 0}) +test(t, [1,0x4048f5c3], {'y': 3.140000104904175}) +test(t, [2,0xffffffff], {'z': None}) t = Union([U32(),U64()]) -test(t, [0,42], {'0':[42]}) -test(t, [0,(1<<35)], {'0':[0]}) -test(t, [1,(1<<35)], {'1':[1<<35]}) +test(t, [0,42], {'0':42}) +test(t, [0,(1<<35)], {'0':0}) +test(t, [1,(1<<35)], {'1':(1<<35)}) t = Union([Float32(), U64()]) -test(t, [0,0x4048f5c3], {'0':[3.140000104904175]}) -test(t, [0,(1<<35)], {'0':[0]}) -test(t, [1,(1<<35)], {'1':[1<<35]}) +test(t, [0,0x4048f5c3], {'0': 3.140000104904175}) +test(t, [0,(1<<35)], {'0': 0}) +test(t, [1,(1<<35)], {'1': (1<<35)}) t = Union([Float64(), U64()]) -test(t, [0,0x40091EB851EB851F], {'0':[3.14]}) -test(t, [0,(1<<35)], {'0':[1.69759663277e-313]}) -test(t, [1,(1<<35)], {'1':[1<<35]}) +test(t, [0,0x40091EB851EB851F], {'0': 3.14}) +test(t, [0,(1<<35)], {'0': 1.69759663277e-313}) +test(t, [1,(1<<35)], {'1': (1<<35)}) t = Union([U8()]) -test(t, [0,42], {'0':[42]}) +test(t, [0,42], {'0':42}) test(t, [1,256], None) -test(t, [0,256], {'0':[0]}) +test(t, [0,256], {'0':0}) t = Union([Tuple([U8(),Float32()]), U64()]) -test(t, [0,42,3.14], {'0':[{'0':42, '1':3.14}]}) -test(t, [1,(1<<35),0], {'1':[1<<35]}) +test(t, [0,42,3.14], {'0': {'0':42, '1':3.14}}) +test(t, [1,(1<<35),0], {'1': (1<<35)}) t = Option(Float32()) -test(t, [0,3.14], {'none':[]}) 
-test(t, [1,3.14], {'some':[3.14]}) -t = Result([U8()],[U32()]) -test(t, [0, 42], {'ok':[42]}) -test(t, [1, 1000], {'error':[1000]}) -t = Variant([Case('w',[U8()]), Case('x',[U8()],'w'), Case('y',[U8()]), Case('z',[U8()],'x')]) -test(t, [0, 42], {'w':[42]}) -test(t, [1, 42], {'x|w':[42]}) -test(t, [2, 42], {'y':[42]}) -test(t, [3, 42], {'z|x|w':[42]}) -t2 = Variant([Case('w',[U8()])]) -test(t, [0, 42], {'w':[42]}, lower_t=t2, lower_v={'w':[42]}) -test(t, [1, 42], {'x|w':[42]}, lower_t=t2, lower_v={'w':[42]}) -test(t, [3, 42], {'z|x|w':[42]}, lower_t=t2, lower_v={'w':[42]}) +test(t, [0,3.14], {'none':None}) +test(t, [1,3.14], {'some':3.14}) +t = Result(U8(),U32()) +test(t, [0, 42], {'ok':42}) +test(t, [1, 1000], {'error':1000}) +t = Variant([Case('w',U8()), Case('x',U8(),'w'), Case('y',U8()), Case('z',U8(),'x')]) +test(t, [0, 42], {'w':42}) +test(t, [1, 42], {'x|w':42}) +test(t, [2, 42], {'y':42}) +test(t, [3, 42], {'z|x|w':42}) +t2 = Variant([Case('w',U8())]) +test(t, [0, 42], {'w':42}, lower_t=t2, lower_v={'w':42}) +test(t, [1, 42], {'x|w':42}, lower_t=t2, lower_v={'w':42}) +test(t, [3, 42], {'z|x|w':42}, lower_t=t2, lower_v={'w':42}) def test_pairs(t, pairs): for arg,expect in pairs: @@ -162,7 +162,7 @@ def test_pairs(t, pairs): test_pairs(Float64(), [(3.14,3.14)]) test_pairs(Char(), [(0,'\x00'), (65,'A'), (0xD7FF,'\uD7FF'), (0xD800,None), (0xDFFF,None)]) test_pairs(Char(), [(0xE000,'\uE000'), (0x10FFFF,'\U0010FFFF'), (0x110000,None), (0xFFFFFFFF,None)]) -test_pairs(Enum(['a','b']), [(0,{'a':[]}), (1,{'b':[]}), (2,None)]) +test_pairs(Enum(['a','b']), [(0,{'a':None}), (1,{'b':None}), (2,None)]) def test_nan32(inbits, outbits): f = lift_flat(Opts(), ValueIter([Value('f32', reinterpret_i32_as_float(inbits))]), Float32()) @@ -274,14 +274,14 @@ def test_heap(t, expect, args, byte_array): [6,0, 7, 0x0ff, 8,0, 9, 0xff]) test_heap(List(Tuple([Tuple([U16(),U8()]),U8()])), [mk_tup([4,5],6),mk_tup([7,8],9)], [0,2], [4,0, 5,0xff, 6,0xff, 7,0, 8,0xff, 9,0xff]) 
-test_heap(List(Union([Tuple([]),U8(),Tuple([U8(),U16()])])), [{'0':[{}]}, {'1':[42]}, {'2':[mk_tup(6,7)]}], [0,3], +test_heap(List(Union([Record([]),U8(),Tuple([U8(),U16()])])), [{'0':{}}, {'1':42}, {'2':mk_tup(6,7)}], [0,3], [0,0xff,0xff,0xff,0xff,0xff, 1,0xff,42,0xff,0xff,0xff, 2,0xff,6,0xff,7,0]) -test_heap(List(Union([U32(),U8()])), [{'0':[256]}, {'1':[42]}], [0,2], +test_heap(List(Union([U32(),U8()])), [{'0':256}, {'1':42}], [0,2], [0,0xff,0xff,0xff,0,1,0,0, 1,0xff,0xff,0xff,42,0xff,0xff,0xff]) test_heap(List(Tuple([Union([U8(),Tuple([U16(),U8()])]),U8()])), - [mk_tup({'1':[mk_tup(5,6)]},7),mk_tup({'0':[8]},9)], [0,2], + [mk_tup({'1':mk_tup(5,6)},7),mk_tup({'0':8},9)], [0,2], [1,0xff,5,0,6,0xff,7,0xff, 0,0xff,8,0xff,0xff,0xff,9,0xff]) -test_heap(List(Union([U8()])), [{'0':[6]},{'0':[7]},{'0':[8]}], [0,3], +test_heap(List(Union([U8()])), [{'0':6},{'0':7},{'0':8}], [0,3], [0,6, 0,7, 0,8]) t = List(Flags(['a','b'])) test_heap(t, [{'a':False,'b':False},{'a':False,'b':True},{'a':True,'b':True}], [0,3], @@ -364,6 +364,6 @@ def test_roundtrip(t, v): test_roundtrip(Tuple([U16(),U16()]), mk_tup(3,4)) test_roundtrip(List(String()), [mk_str("hello there")]) test_roundtrip(List(List(String())), [[mk_str("one"),mk_str("two")],[mk_str("three")]]) -test_roundtrip(List(Option(Tuple([String(),U16()]))), [{'some':[mk_tup(mk_str("answer"),42)]}]) +test_roundtrip(List(Option(Tuple([String(),U16()]))), [{'some':mk_tup(mk_str("answer"),42)}]) print("All tests passed") From 922436b46cc10e145fdbf10a2d60e28da4a357ab Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 18 Jul 2022 13:21:14 -0500 Subject: [PATCH 101/301] Fix typo in comment Co-authored-by: Dan Gohman --- design/mvp/WIT.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/WIT.md b/design/mvp/WIT.md index 6f28bfc..b76420b 100644 --- a/design/mvp/WIT.md +++ b/design/mvp/WIT.md @@ -59,7 +59,7 @@ comment ::= '//' character-that-isnt-a-newline* ``` There is a special type of comment called 
`documentation comment`. A -`doc-comment` is either a line comment preceded with `///` whichends at the next +`doc-comment` is either a line comment preceded with `///` which ends at the next newline (`\n`) character or it's a block comment which starts with `/**` and ends with `*/`. Note that block comments are allowed to be nested and their delimiters must be balanced From 6e35c8fa3adb8617ba070fbf94e6586f670f812b Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 18 Jul 2022 13:29:12 -0500 Subject: [PATCH 102/301] Rebase onto add-wit and update WIT.md --- design/mvp/WIT.md | 33 +++++++++++++++++++-------------- 1 file changed, 19 insertions(+), 14 deletions(-) diff --git a/design/mvp/WIT.md b/design/mvp/WIT.md index 6f28bfc..1bfe896 100644 --- a/design/mvp/WIT.md +++ b/design/mvp/WIT.md @@ -104,8 +104,7 @@ keyword ::= 'use' | 'string' | 'option' | 'list' - | 'expected' - | 'unit' + | 'result' | 'as' | 'from' | 'static' @@ -349,15 +348,15 @@ sleep: async func(ms: u64) Specifically functions have the structure: ```wit -func-item ::= id ':' 'async'? 'func' '(' func-args ')' func-ret +func-item ::= id ':' 'async'? 'func' func-tuple '->' func-tuple -func-args ::= func-arg - | func-arg ',' func-args? +func-tuple ::= ty + | '(' func-named-type-list ')' -func-arg ::= id ':' ty +func-named-type-list ::= nil + | func-named-type ( ',' func-named-type )* -func-ret ::= nil - | '->' ty +func-named-type ::= id ':' ty ``` ## Item: `resource` @@ -405,7 +404,7 @@ such as built-ins. 
For example: ```wit type number = u32 -type fallible-function-result = expected +type fallible-function-result = result type headers = list ``` @@ -418,11 +417,10 @@ ty ::= 'u8' | 'u16' | 'u32' | 'u64' | 'char' | 'bool' | 'string' - | 'unit' | tuple | list | option - | expected + | result | future | stream | id @@ -435,18 +433,25 @@ list ::= 'list' '<' ty '>' option ::= 'option' '<' ty '>' -expected ::= 'expected' '<' ty ',' ty '>' +result ::= 'result' '<' ty ',' ty '>' + | 'result' '<' '_' ',' ty '>' + | 'result' '<' ty '>' + | 'result' future ::= 'future' '<' ty '>' + | 'future' stream ::= 'stream' '<' ty ',' ty '>' + | 'stream' '<' '_' ',' ty '>' + | 'stream' '<' ty '>' + | 'stream' ``` The `tuple` type is semantically equivalent to a `record` with numerical fields, but it frequently can have language-specific meaning so it's provided as a first-class type. -Similarly the `option` and `expected` types are semantically equivalent to the +Similarly the `option` and `result` types are semantically equivalent to the variants: ```wit @@ -455,7 +460,7 @@ variant option { some(ty), } -variant expected { +variant result { ok(ok-ty) err(err-ty), } From 8f029a94451fedfdef1be5a6c5502290bd6ca9e4 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 18 Jul 2022 13:34:36 -0500 Subject: [PATCH 103/301] Restore the optionality of arrow in function types returning nothing --- design/mvp/WIT.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/WIT.md b/design/mvp/WIT.md index 1bfe896..bea069b 100644 --- a/design/mvp/WIT.md +++ b/design/mvp/WIT.md @@ -348,7 +348,7 @@ sleep: async func(ms: u64) Specifically functions have the structure: ```wit -func-item ::= id ':' 'async'? 'func' func-tuple '->' func-tuple +func-item ::= id ':' 'async'? 'func' func-tuple ( '->' func-tuple )? 
func-tuple ::= ty | '(' func-named-type-list ')' From 18e4642f9116f77cfdd35a795c6f16dbe18ba3cb Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 18 Jul 2022 14:33:57 -0500 Subject: [PATCH 104/301] Make arrow mandatory in function types --- design/mvp/WIT.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/design/mvp/WIT.md b/design/mvp/WIT.md index bea069b..884da5b 100644 --- a/design/mvp/WIT.md +++ b/design/mvp/WIT.md @@ -340,15 +340,15 @@ parameters, and results. Functions can optionally also be declared as `async` functions. ```wit -thunk: func() +thunk: func() -> () fibonacci: func(n: u32) -> u32 -sleep: async func(ms: u64) +sleep: async func(ms: u64) -> () ``` Specifically functions have the structure: ```wit -func-item ::= id ':' 'async'? 'func' func-tuple ( '->' func-tuple )? +func-item ::= id ':' 'async'? 'func' func-tuple '->' func-tuple func-tuple ::= ty | '(' func-named-type-list ')' @@ -481,9 +481,9 @@ by '-'s starts with a `XID_Start` scalar value with a zero Canonical Combining Class: ```wit -foo: func(bar: u32) +foo: func(bar: u32) -> () -red-green-blue: func(r: u32, g: u32, b: u32) +red-green-blue: func(r: u32, g: u32, b: u32) -> () ``` This form can't name identifiers which have the same name as wit keywords, so @@ -491,12 +491,12 @@ the second form is the same syntax with the same restrictions as the first, but prefixed with '%': ```wit -%foo: func(%bar: u32) +%foo: func(%bar: u32) -> () -%red-green-blue: func(%r: u32, %g: u32, %b: u32) +%red-green-blue: func(%r: u32, %g: u32, %b: u32) -> () // This form also supports identifiers that would otherwise be keywords. 
-%variant: func(%enum: s32) +%variant: func(%enum: s32) -> () ``` [kebab-case]: https://en.wikipedia.org/wiki/Letter_case#Kebab_case From e409eb6d886314411dbe12c3012b7a6b54170c3d Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 18 Jul 2022 18:20:51 -0500 Subject: [PATCH 105/301] Use 'funcvec' as common grammatical production name instead of 'prlist' or 'functuple' --- design/mvp/Binary.md | 4 ++-- design/mvp/WIT.md | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index ae79921..f1e659f 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -186,8 +186,8 @@ casetype ::= 0x00 => | 0x01 t: => t valtype ::= i: => i | pvt: => pvt -functype ::= 0x40 p*: r*: => (func (param p)* (result r)*) -prlist ::= 0x00 t: => [t] +functype ::= 0x40 p*: r*: => (func (param p)* (result r)*) +funcvec ::= 0x00 t: => [t] | 0x01 nt*:vec() => nt* componenttype ::= 0x41 cd*:vec() => (component cd*) instancetype ::= 0x42 id*:vec() => (instance id*) diff --git a/design/mvp/WIT.md b/design/mvp/WIT.md index 884da5b..0147ecb 100644 --- a/design/mvp/WIT.md +++ b/design/mvp/WIT.md @@ -348,10 +348,10 @@ sleep: async func(ms: u64) -> () Specifically functions have the structure: ```wit -func-item ::= id ':' 'async'? 'func' func-tuple '->' func-tuple +func-item ::= id ':' 'async'? 
'func' func-vec '->' func-vec -func-tuple ::= ty - | '(' func-named-type-list ')' +func-vec ::= ty + | '(' func-named-type-list ')' func-named-type-list ::= nil | func-named-type ( ',' func-named-type )* From dfbb7218e02f3716272129eec31c0b619dc81490 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 5 Jul 2022 12:54:49 -0500 Subject: [PATCH 106/301] Actually define a "Canonical" ABI --- design/mvp/CanonicalABI.md | 343 +++++++++++++++++++++--- design/mvp/canonical-abi/definitions.py | 264 ++++++++++++++++-- design/mvp/canonical-abi/run_tests.py | 127 +++++++-- 3 files changed, 660 insertions(+), 74 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index bd62b31..041bcbd 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1,8 +1,7 @@ # Canonical ABI Explainer -This explainer walks through the Canonical ABI used by [canonical definitions] -to convert between high-level Component Model values and low-level Core -WebAssembly values. +This document defines the Canonical ABI used to convert between +high-level Component Model values and low-level Core WebAssembly values. * [Supporting definitions](#supporting-definitions) * [Despecialization](#Despecialization) @@ -13,10 +12,14 @@ WebAssembly values. 
* [Flattening](#flattening) * [Flat Lifting](#flat-lifting) * [Flat Lowering](#flat-lowering) - * [Lifting and Lowering](#lifting-and-lowering) + * [Lifting and Lowering Values](#lifting-and-lowering-values) + * [Lifting and Lowering Functions](#lifting-and-lowering-functions) * [Canonical definitions](#canonical-definitions) - * [`lift`](#lift) - * [`lower`](#lower) + * [`canon lift`](#canon-lift) + * [`canon lower`](#canon-lower) +* [Canonical ABI](#canonical-abi) + * [Canonical Module Type](#canonical-module-type) + * [Lifting Canonical Modules](#lifting-canonical-modules) ## Supporting definitions @@ -211,8 +214,8 @@ analysis: class Opts: string_encoding: str memory: bytearray - realloc: types.FunctionType - post_return: types.FunctionType + realloc: Callable[[int,int,int,int],int] + post_return: Callable[[],None] def load(opts, ptr, t): assert(ptr == align_to(ptr, alignment(t))) @@ -754,12 +757,12 @@ Given all this, the top-level definition of `flatten` is: MAX_FLAT_PARAMS = 16 MAX_FLAT_RESULTS = 1 -def flatten(functype, context): - flat_params = flatten_types(functype.params) +def flatten_functype(ft, context): + flat_params = flatten_types(ft.param_types()) if len(flat_params) > MAX_FLAT_PARAMS: flat_params = ['i32'] - flat_results = flatten_types(functype.results) + flat_results = flatten_types(ft.result_types()) if len(flat_results) > MAX_FLAT_RESULTS: match context: case 'lift': @@ -768,7 +771,7 @@ def flatten(functype, context): flat_params += ['i32'] flat_results = [] - return { 'params': flat_params, 'results': flat_results } + return CoreFuncType(flat_params, flat_results) def flatten_types(ts): return [ft for t in ts for ft in flatten_type(t)] @@ -1065,13 +1068,13 @@ def lower_flat_flags(v, labels): return flat ``` -### Lifting and Lowering +### Lifting and Lowering Values -The `lift` function defines how to lift a list of at most `max_flat` core -parameters or results given by the `ValueIter` `vi` into a tuple of values with -types `ts`: +The 
`lift_values` function defines how to lift a list of at most `max_flat` +core parameters or results given by the `ValueIter` `vi` into a tuple of values +with types `ts`: ```python -def lift(opts, max_flat, vi, ts): +def lift_values(opts, max_flat, vi, ts): flat_types = flatten_types(ts) if len(flat_types) > max_flat: ptr = vi.next('i32') @@ -1082,16 +1085,16 @@ def lift(opts, max_flat, vi, ts): return [ lift_flat(opts, vi, t) for t in ts ] ``` -The `lower` function defines how to lower a list of component-level values `vs` -of types `ts` into a list of at most `max_flat` core values. As already -described for [`flatten`](#flattening) above, lowering handles the +The `lower_values` function defines how to lower a list of component-level +values `vs` of types `ts` into a list of at most `max_flat` core values. As +already described for [`flatten`](#flattening) above, lowering handles the greater-than-`max_flat` case by either allocating storage with `realloc` or accepting a caller-allocated buffer as an out-param: ```python -def lower(opts, max_flat, vs, ts, out_param = None): +def lower_values(opts, max_flat, vs, ts, out_param = None): flat_types = flatten_types(ts) if len(flat_types) > max_flat: - tuple_type = Tuple(functype.params) + tuple_type = Tuple(ts) tuple_value = {str(i): v for i,v in enumerate(vs)} if out_param is None: ptr = opts.realloc(0, 0, alignment(tuple_type), size(tuple_type)) @@ -1107,19 +1110,22 @@ def lower(opts, max_flat, vs, ts, out_param = None): return flat_vals ``` -## Canonical ABI built-ins +## Canonical Definitions Using the above supporting definitions, we can describe the static and dynamic -semantics of [`canon`], whose AST is defined in the main explainer as: +semantics of component-level [`canon`] definitions, which have the following +AST (copied from the [explainer][Canonical Definitions]): ``` canon ::= (canon lift * (func ?)) | (canon lower * (core func ?)) ``` -The following subsections define the static and dynamic semantics 
of each -case of `funcbody`. +The following subsections cover each of these cases (which will soon be +extended to include [async](https://docs.google.com/presentation/d/1MNVOZ8hdofO3tI0szg_i-Yoy0N2QPU2C--LzVuoGSlE/edit#slide=id.g13600a23b7f_16_0) +and [resource/handle](https://github.com/alexcrichton/interface-types/blob/40f157ad429772c2b6a8b66ce7b4df01e83ae76d/proposals/interface-types/CanonicalABI.md#handle-intrinsics) +built-ins). -### `lift` +### `canon lift` For a function: ``` @@ -1163,7 +1169,7 @@ the outside world through an export. Given the above closure arguments, `canon_lift` is defined: ```python -def canon_lift(callee_opts, callee_instance, callee, functype, args, called_as_export): +def canon_lift(callee_opts, callee_instance, callee, ft, args, called_as_export): if called_as_export: trap_if(not callee_instance.may_enter) callee_instance.may_enter = False @@ -1172,7 +1178,7 @@ def canon_lift(callee_opts, callee_instance, callee, functype, args, called_as_e assert(callee_instance.may_leave) callee_instance.may_leave = False - flat_args = lower(callee_opts, MAX_FLAT_PARAMS, args, functype.params) + flat_args = lower_values(callee_opts, MAX_FLAT_PARAMS, args, ft.param_types()) callee_instance.may_leave = True try: @@ -1180,7 +1186,7 @@ def canon_lift(callee_opts, callee_instance, callee, functype, args, called_as_e except CoreWebAssemblyException: trap() - results = lift(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), functype.results) + results = lift_values(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), ft.result_types()) def post_return(): if callee_opts.post_return is not None: callee_opts.post_return(flat_results) @@ -1212,7 +1218,7 @@ that the caller of `canon_lift` *must* call `post_return` right after lowering actions after the lowering is complete. 
-### `lower` +### `canon lower` For a function: ``` @@ -1231,16 +1237,16 @@ Thus, from the perspective of Core WebAssembly, `$f` is a [function instance] containing a `hostfunc` that closes over `$opts`, `$inst`, `$callee` and `$ft` and, when called from Core WebAssembly code, calls `canon_lower`, which is defined as: ```python -def canon_lower(caller_opts, caller_instance, callee, functype, flat_args): +def canon_lower(caller_opts, caller_instance, callee, ft, flat_args): trap_if(not caller_instance.may_leave) flat_args = ValueIter(flat_args) - args = lift(caller_opts, MAX_FLAT_PARAMS, flat_args, functype.params) + args = lift_values(caller_opts, MAX_FLAT_PARAMS, flat_args, ft.param_types()) results, post_return = callee(args) caller_instance.may_leave = False - flat_results = lower(caller_opts, MAX_FLAT_RESULTS, results, functype.results, flat_args) + flat_results = lower_values(caller_opts, MAX_FLAT_RESULTS, results, ft.result_types(), flat_args) caller_instance.may_leave = True post_return() @@ -1277,6 +1283,268 @@ the AOT compiler as requiring an intermediate copy to implement the above `lift`-then-`lower` semantics. +## Canonical ABI + +The above `canon` definitions are parameterized, giving each component a small +space of ABI options for interfacing with its contained core modules. Moreover, +each component can choose its ABI options independently of each other component, +with compiled adapter trampolines handling any conversions at cross-component +call boundaries. However, in some contexts, it is useful to fix a **single**, +"**canonical**" ABI that is fully determined by a given component type (which +itself is fully determined by a set of [`wit`](WIT.md) files). For example, +this allows existing Core WebAssembly toolchains to continue targeting [WASI] +by importing and exporting fixed Core Module function signatures, without +having to add any new component-model concepts. 
+ +To support these use cases, the following section defines two new mappings: +1. `canonical-module-type : componenttype -> core:moduletype` +2. `lift-canonical-module : core:module -> component` + +The `canonical-module-type` mapping defines the collection of core function +signatures that a core module must import and export to implement the given +component type via the Canonical ABI. + +The `lift-canonical-module` mapping defines the runtime behavior of a core +module that has successfully implemented `canonical-module-type` by fixing +a canonical set of ABI options that are passed to the above-defined `canon` +definitions. + +Together, these definitions are intended to satisfy the invariant: +``` +for all m : core:module, mt : core:moduletype, ct : componenttype: + m : mt and mt = canonical-module-type(ct) implies lift-canonical-module(m) : ct +``` +One consequence of this is that the canonical `core:moduletype` must encode +enough high-level type information for `lift-canonical-module` to be able to +reconstruct a working component. This is achieved using [name mangling]. Unlike +traditional C-family name mangling, which uses unreadable, space-efficient +mangling schemes to support millions of *internal* names, the Canonical ABI +only needs to mangle *external* names, of which there will only be a handful. +Therefore, squeezing out every byte is a lower concern and so, for simplicity +and readability, type information is mangled using a subset of the +[`wit`](WIT.md) syntax. + +One final point of note is that `lift-canonical-module` is only able to produce +a *subset* of all possible components (e.g., not covering nesting and +virtualization scenarios); to express the full variety of components, a +toolchain needs to emit proper components directly. 
Thus, the Canonical ABI +serves as an incremental adoption path to the full component model, allowing +existing Core WebAssembly toolchains to produce simple components simply by +emitting module imports and exports with the appropriate mangled names (e.g., +in LLVM using the [`import_name`] and [`export_name`] attributes). + + +### Canonical Module Type + +For the same reason that core module and component [binaries](Binary.md) +include a version number (that is intended to never change after it reaches +1.0), the Canonical ABI defines its own version that is explicitly declared by +a core module. Before reaching stable 1.0, the Canonical ABI is explicitly +allowed to make breaking changes, so this version also serves the purpose of +coordinating breaking changes in pre-1.0 tools and runtimes. +```python +CABI_VERSION = '0.1' +``` +Working top-down, a canonical module type is defined by the following mapping: +```python +def canonical_module_type(ct: ComponentType) -> ModuleType: + start_params, import_funcs = mangle_instances(ct.imports) + start_results, export_funcs = mangle_instances(ct.exports) + + imports = [] + for name,ft in import_funcs: + flat_ft = flatten_functype(ft, 'lower') + imports.append(CoreImportDecl('', mangle_funcname(name, ft), flat_ft)) + + exports = [] + exports.append(CoreExportDecl('_memory', CoreMemoryType(initial=0, maximum=None))) + exports.append(CoreExportDecl('_realloc', CoreFuncType(['i32','i32','i32','i32'], ['i32']))) + + start_ft = FuncType(start_params, start_results) + start_name = mangle_funcname('_start{cabi=' + CABI_VERSION + '}', start_ft) + exports.append(CoreExportDecl(start_name, flatten_functype(start_ft, 'lift'))) + + for name,ft in export_funcs: + flat_ft = flatten_functype(ft, 'lift') + exports.append(CoreExportDecl(mangle_funcname(name, ft), flat_ft)) + if any(contains_dynamic_allocation(t) for t in ft.results): + exports.append(CoreExportDecl('_post-' + name, CoreFuncType(flat_ft.results, []))) + + return 
ModuleType(imports, exports) + +def contains_dynamic_allocation(t): + match despecialize(t): + case String() : return True + case List(t) : return True + case Record(fields) : return any(contains_dynamic_allocation(f.t) for f in fields) + case Variant(cases) : return any(contains_dynamic_allocation(c.t) for c in cases) + case _ : return False +``` +This definition starts by mangling all nested instances into the names of the +leaf fields, so that instances can be subsequently ignored. Next, each +component-level function import/export is mapped to corresponding core function +import/export with the function type mangled into the name. Additionally, each +export whose return type implies possible dynamic allocation is given a +`post-return` function so that it can deallocate after the caller reads the +return value. Lastly, all value imports and exports are concatenated into a +synthetic `_start` function that is called immediately after instantiation. + +For imports (which in Core WebAssembly are [two-level]), the first-level name +is set to be a zero-length string so that the entire rest of the first-level +string space is available for [shared-everything linking]. + +For imports and exports, the Canonical ABI assumes that `_` is not a valid +first character in a component-level import/export (as is currently the case in +`wit` [identifiers](WIT.md#identifiers)) and thus can safely be used to prefix +auxiliary Canonical ABI-induced imports/exports. 
+ +Instance-mangling recursively builds a dotted path string (of instance names) +that is included in the mangled core import/export name: +```python +def mangle_instances(xs, path = ''): + values = [] + funcs = [] + for x in xs: + name = path + x.name + match x.t: + case ValueType(t): + values.append( (name, t) ) + case FuncType(params,results): + funcs.append( (name, x.t) ) + case InstanceType(exports): + vs,fs = mangle_instances(exports, name + '.') + values += vs + funcs += fs + case TypeType(bounds): + assert(False) # TODO: resource types + case ComponentType(imports, exports): + assert(False) # TODO: `canon instantiate` + case ModuleType(imports, exports): + assert(False) # TODO: canonical shared-everything linking + return (values, funcs) +``` +The three `TODO` cases are intended to be filled in by future PRs extending +the Canonical ABI. + +Function and value types are recursively mangled into +[`wit`](WIT.md)-compatible syntax: +```python +def mangle_funcname(name, ft): + return '{name}: func {params} -> {results}'.format( + name = name, + params = mangle_funcvec(ft.params), + results = mangle_funcvec(ft.results)) + +def mangle_funcvec(es): + if len(es) == 1 and isinstance(es[0], ValType): + return mangle_valtype(es[0]) + assert(all(type(e) == tuple and len(e) == 2 for e in es)) + mangled_elems = (e[0] + ':' + mangle_valtype(e[1]) for e in es) + return '(' + ','.join(mangled_elems) + ')' + +def mangle_valtype(t): + match t: + case Bool() : return 'bool' + case S8() : return 's8' + case U8() : return 'u8' + case S16() : return 's16' + case U16() : return 'u16' + case S32() : return 's32' + case U32() : return 'u32' + case S64() : return 's64' + case U64() : return 'u64' + case Float32() : return 'float32' + case Float64() : return 'float64' + case Char() : return 'char' + case String() : return 'string' + case List(t) : return 'list<' + mangle_valtype(t) + '>' + case Record(fields) : return mangle_recordtype(fields) + case Tuple(ts) : return 
mangle_tupletype(ts) + case Flags(labels) : return mangle_flags(labels) + case Variant(cases) : return mangle_varianttype(cases) + case Enum(labels) : return mangle_enumtype(labels) + case Union(ts) : return mangle_uniontype(ts) + case Option(t) : return mangle_optiontype(t) + case Result(ok,error) : return mangle_resulttype(ok,error) + +def mangle_recordtype(fields): + mangled_fields = (f.label + ':' + mangle_valtype(f.t) for f in fields) + return 'record{' + ','.join(mangled_fields) + '}' + +def mangle_tupletype(ts): + return 'tuple<' + ','.join(mangle_valtype(t) for t in ts) + '>' + +def mangle_flags(labels): + return 'flags{' + ','.join(labels) + '}' + +def mangle_varianttype(cases): + mangled_cases = (c.label + '(' + mangle_maybevaltype(c.t) + ')' for c in cases) + return 'variant{' + ','.join(mangled_cases) + '}' + +def mangle_enumtype(labels): + return 'enum{' + ','.join(labels) + '}' + +def mangle_uniontype(ts): + return 'union{' + ','.join(mangle_valtype(t) for t in ts) + '}' + +def mangle_optiontype(t): + return 'option<' + mangle_valtype(t) + '>' + +def mangle_resulttype(ok, error): + return 'result<' + mangle_maybevaltype(ok) + ',' + mangle_maybevaltype(error) + '>' + +def mangle_maybevaltype(t): + if t is None: + return '_' + return mangle_valtype(t) +``` +As an example, given a component type: +```wasm +(component + (import "foo" (func)) + (import "a" (instance + (export "bar" (func (param "x" u32) (param "y" u32) (result u32))) + )) + (import "v1" (value string)) + (export "baz" (func (result string))) + (export "v2" (value list>)) +) +``` +the `canonical_module_type` would be: +```wasm +(module + (import "" "foo: func () -> ()" (func)) + (import "" "a.bar: func (x:u32,y:u32) -> u32" (func (param i32 i32) (result i32))) + (export "_memory" (memory 0)) + (export "_realloc" (func (param i32 i32 i32 i32) (result i32))) + (export "_start{cabi=0.1}: func (v1:string) -> (v2:list>)" (func (param i32 i32) (result i32))) + (export "baz: func () -> string" 
(func
(result i32))) + (export "_post-baz" (func (param i32))) +) +``` + +### Lifting Canonical Modules + +TODO + +```python +class Module: + t: ModuleType + instantiate: Callable[typing.List[typing.Tuple[str,str,Value]], typing.List[typing.Tuple[str,Value]]] + +class Component: + t: ComponentType + instantiate: Callable[typing.List[typing.Tuple[str,any]], typing.List[typing.Tuple[str,any]]] + +def lift_canonical_module(module: Module) -> Component: + # TODO: define component.instantiate by: + # 1. creating canonical import adapters + # 2. creating a core module instance that imports (1) + # 3. creating canonical export adapters from the exports of (2) + pass +``` + + [Canonical Definitions]: Explainer.md#canonical-definitions [`canonopt`]: Explainer.md#canonical-definitions @@ -1285,13 +1553,16 @@ the AOT compiler as requiring an intermediate copy to implement the above [Component Invariants]: Explainer.md#component-invariants [JavaScript Embedding]: Explainer.md#JavaScript-embedding [Adapter Functions]: FutureFeatures.md#custom-abis-via-adapter-functions +[Shared-Everything Linking]: examples/SharedEverythingLinking.md [Administrative Instructions]: https://webassembly.github.io/spec/core/exec/runtime.html#syntax-instr-admin [Implementation Limits]: https://webassembly.github.io/spec/core/appendix/implementation.html [Function Instance]: https://webassembly.github.io/spec/core/exec/runtime.html#function-instances +[Two-level]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-import [Multi-value]: https://github.com/WebAssembly/multi-value/blob/master/proposals/multi-value/Overview.md [Exceptions]: https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md +[WASI]: https://github.com/webassembly/wasi [Alignment]: https://en.wikipedia.org/wiki/Data_structure_alignment [UTF-8]: https://en.wikipedia.org/wiki/UTF-8 @@ -1300,3 +1571,7 @@ the AOT compiler as requiring an intermediate copy to implement the above 
[Unicode Scalar Value]: https://unicode.org/glossary/#unicode_scalar_value [Unicode Code Point]: https://unicode.org/glossary/#code_point [Surrogate]: https://unicode.org/faq/utf_bom.html#utf16-2 +[Name Mangling]: https://en.wikipedia.org/wiki/Name_mangling + +[`import_name`]: https://clang.llvm.org/docs/AttributeReference.html#import-name +[`export_name`]: https://clang.llvm.org/docs/AttributeReference.html#export-name diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 6caf2e2..6b989f2 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -6,9 +6,10 @@ import math import struct -import types from dataclasses import dataclass +import typing from typing import Optional +from typing import Callable class Trap(BaseException): pass class CoreWebAssemblyException(BaseException): pass @@ -20,7 +21,80 @@ def trap_if(cond): if cond: raise Trap() -class ValType: pass +class Type: pass +class ValType(Type): pass +class ExternType(Type): pass +class CoreExternType(Type): pass + +@dataclass +class CoreImportDecl: + module: str + field: str + t: CoreExternType + +@dataclass +class CoreExportDecl: + name: str + t: CoreExternType + +@dataclass +class ModuleType(ExternType): + imports: [CoreImportDecl] + exports: [CoreExportDecl] + +@dataclass +class CoreFuncType(CoreExternType): + params: [str] + results: [str] + +@dataclass +class CoreMemoryType(CoreExternType): + initial: [int] + maximum: Optional[int] + +@dataclass +class ExternDecl: + name: str + t: ExternType + +@dataclass +class ComponentType(ExternType): + imports: [ExternDecl] + exports: [ExternDecl] + +@dataclass +class InstanceType(ExternType): + exports: [ExternDecl] + +@dataclass +class FuncType(ExternType): + params: [ValType|typing.Tuple[str,ValType]] + results: [ValType|typing.Tuple[str,ValType]] + def param_types(self): + return self.extract_types(self.params) + def result_types(self): + return 
self.extract_types(self.results) + def extract_types(self, vec): + if len(vec) == 0: + return [] + if isinstance(vec[0], ValType): + return vec + return [t for name,t in vec] + +@dataclass +class ValueType(ExternType): + t: ValType + +class Bounds: pass + +@dataclass +class Eq(Bounds): + t: Type + +@dataclass +class TypeType(ExternType): + bounds: Bounds + class Bool(ValType): pass class S8(ValType): pass class U8(ValType): pass @@ -83,11 +157,6 @@ class Result(ValType): ok: Optional[ValType] error: Optional[ValType] -@dataclass -class Func: - params: [ValType] - results: [ValType] - ### Despecialization def despecialize(t): @@ -204,8 +273,8 @@ def num_i32_flags(labels): class Opts: string_encoding: str memory: bytearray - realloc: types.FunctionType - post_return: types.FunctionType + realloc: Callable[[int,int,int,int],int] + post_return: Callable[[],None] def load(opts, ptr, t): assert(ptr == align_to(ptr, alignment(t))) @@ -607,12 +676,12 @@ def pack_flags_into_int(v, labels): MAX_FLAT_PARAMS = 16 MAX_FLAT_RESULTS = 1 -def flatten(functype, context): - flat_params = flatten_types(functype.params) +def flatten_functype(ft, context): + flat_params = flatten_types(ft.param_types()) if len(flat_params) > MAX_FLAT_PARAMS: flat_params = ['i32'] - flat_results = flatten_types(functype.results) + flat_results = flatten_types(ft.result_types()) if len(flat_results) > MAX_FLAT_RESULTS: match context: case 'lift': @@ -621,7 +690,7 @@ def flatten(functype, context): flat_params += ['i32'] flat_results = [] - return { 'params': flat_params, 'results': flat_results } + return CoreFuncType(flat_params, flat_results) def flatten_types(ts): return [ft for t in ts for ft in flatten_type(t)] @@ -861,9 +930,9 @@ def lower_flat_flags(v, labels): assert(i == 0) return flat -### Lifting and Lowering +### Lifting and Lowering Values -def lift(opts, max_flat, vi, ts): +def lift_values(opts, max_flat, vi, ts): flat_types = flatten_types(ts) if len(flat_types) > max_flat: ptr = 
vi.next('i32') @@ -875,10 +944,10 @@ def lift(opts, max_flat, vi, ts): # -def lower(opts, max_flat, vs, ts, out_param = None): +def lower_values(opts, max_flat, vs, ts, out_param = None): flat_types = flatten_types(ts) if len(flat_types) > max_flat: - tuple_type = Tuple(functype.params) + tuple_type = Tuple(ts) tuple_value = {str(i): v for i,v in enumerate(vs)} if out_param is None: ptr = opts.realloc(0, 0, alignment(tuple_type), size(tuple_type)) @@ -900,7 +969,7 @@ class Instance: may_enter = True # ... -def canon_lift(callee_opts, callee_instance, callee, functype, args, called_as_export): +def canon_lift(callee_opts, callee_instance, callee, ft, args, called_as_export): if called_as_export: trap_if(not callee_instance.may_enter) callee_instance.may_enter = False @@ -909,7 +978,7 @@ def canon_lift(callee_opts, callee_instance, callee, functype, args, called_as_e assert(callee_instance.may_leave) callee_instance.may_leave = False - flat_args = lower(callee_opts, MAX_FLAT_PARAMS, args, functype.params) + flat_args = lower_values(callee_opts, MAX_FLAT_PARAMS, args, ft.param_types()) callee_instance.may_leave = True try: @@ -917,7 +986,7 @@ def canon_lift(callee_opts, callee_instance, callee, functype, args, called_as_e except CoreWebAssemblyException: trap() - results = lift(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), functype.results) + results = lift_values(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), ft.result_types()) def post_return(): if callee_opts.post_return is not None: callee_opts.post_return(flat_results) @@ -928,18 +997,167 @@ def post_return(): ### `lower` -def canon_lower(caller_opts, caller_instance, callee, functype, flat_args): +def canon_lower(caller_opts, caller_instance, callee, ft, flat_args): trap_if(not caller_instance.may_leave) flat_args = ValueIter(flat_args) - args = lift(caller_opts, MAX_FLAT_PARAMS, flat_args, functype.params) + args = lift_values(caller_opts, MAX_FLAT_PARAMS, flat_args, ft.param_types()) 
results, post_return = callee(args) caller_instance.may_leave = False - flat_results = lower(caller_opts, MAX_FLAT_RESULTS, results, functype.results, flat_args) + flat_results = lower_values(caller_opts, MAX_FLAT_RESULTS, results, ft.result_types(), flat_args) caller_instance.may_leave = True post_return() return flat_results + +### Canonical Module Type + +CABI_VERSION = '0.1' + +# + +def canonical_module_type(ct: ComponentType) -> ModuleType: + start_params, import_funcs = mangle_instances(ct.imports) + start_results, export_funcs = mangle_instances(ct.exports) + + imports = [] + for name,ft in import_funcs: + flat_ft = flatten_functype(ft, 'lower') + imports.append(CoreImportDecl('', mangle_funcname(name, ft), flat_ft)) + + exports = [] + exports.append(CoreExportDecl('_memory', CoreMemoryType(initial=0, maximum=None))) + exports.append(CoreExportDecl('_realloc', CoreFuncType(['i32','i32','i32','i32'], ['i32']))) + + start_ft = FuncType(start_params, start_results) + start_name = mangle_funcname('_start{cabi=' + CABI_VERSION + '}', start_ft) + exports.append(CoreExportDecl(start_name, flatten_functype(start_ft, 'lift'))) + + for name,ft in export_funcs: + flat_ft = flatten_functype(ft, 'lift') + exports.append(CoreExportDecl(mangle_funcname(name, ft), flat_ft)) + if any(contains_dynamic_allocation(t) for t in ft.results): + exports.append(CoreExportDecl('_post-' + name, CoreFuncType(flat_ft.results, []))) + + return ModuleType(imports, exports) + +def contains_dynamic_allocation(t): + match despecialize(t): + case String() : return True + case List(t) : return True + case Record(fields) : return any(contains_dynamic_allocation(f.t) for f in fields) + case Variant(cases) : return any(contains_dynamic_allocation(c.t) for c in cases) + case _ : return False + +# + +def mangle_instances(xs, path = ''): + values = [] + funcs = [] + for x in xs: + name = path + x.name + match x.t: + case ValueType(t): + values.append( (name, t) ) + case FuncType(params,results): + 
funcs.append( (name, x.t) ) + case InstanceType(exports): + vs,fs = mangle_instances(exports, name + '.') + values += vs + funcs += fs + case TypeType(bounds): + assert(False) # TODO: resource types + case ComponentType(imports, exports): + assert(False) # TODO: `canon instantiate` + case ModuleType(imports, exports): + assert(False) # TODO: canonical shared-everything linking + return (values, funcs) + +# + +def mangle_funcname(name, ft): + return '{name}: func {params} -> {results}'.format( + name = name, + params = mangle_funcvec(ft.params), + results = mangle_funcvec(ft.results)) + +def mangle_funcvec(es): + if len(es) == 1 and isinstance(es[0], ValType): + return mangle_valtype(es[0]) + assert(all(type(e) == tuple and len(e) == 2 for e in es)) + mangled_elems = (e[0] + ':' + mangle_valtype(e[1]) for e in es) + return '(' + ','.join(mangled_elems) + ')' + +# + +def mangle_valtype(t): + match t: + case Bool() : return 'bool' + case S8() : return 's8' + case U8() : return 'u8' + case S16() : return 's16' + case U16() : return 'u16' + case S32() : return 's32' + case U32() : return 'u32' + case S64() : return 's64' + case U64() : return 'u64' + case Float32() : return 'float32' + case Float64() : return 'float64' + case Char() : return 'char' + case String() : return 'string' + case List(t) : return 'list<' + mangle_valtype(t) + '>' + case Record(fields) : return mangle_recordtype(fields) + case Tuple(ts) : return mangle_tupletype(ts) + case Flags(labels) : return mangle_flags(labels) + case Variant(cases) : return mangle_varianttype(cases) + case Enum(labels) : return mangle_enumtype(labels) + case Union(ts) : return mangle_uniontype(ts) + case Option(t) : return mangle_optiontype(t) + case Result(ok,error) : return mangle_resulttype(ok,error) + +def mangle_recordtype(fields): + mangled_fields = (f.label + ':' + mangle_valtype(f.t) for f in fields) + return 'record{' + ','.join(mangled_fields) + '}' + +def mangle_tupletype(ts): + return 'tuple<' + 
','.join(mangle_valtype(t) for t in ts) + '>' + +def mangle_flags(labels): + return 'flags{' + ','.join(labels) + '}' + +def mangle_varianttype(cases): + mangled_cases = (c.label + '(' + mangle_maybevaltype(c.t) + ')' for c in cases) + return 'variant{' + ','.join(mangled_cases) + '}' + +def mangle_enumtype(labels): + return 'enum{' + ','.join(labels) + '}' + +def mangle_uniontype(ts): + return 'union{' + ','.join(mangle_valtype(t) for t in ts) + '}' + +def mangle_optiontype(t): + return 'option<' + mangle_valtype(t) + '>' + +def mangle_resulttype(ok, error): + return 'result<' + mangle_maybevaltype(ok) + ',' + mangle_maybevaltype(error) + '>' + +def mangle_maybevaltype(t): + if t is None: + return '_' + return mangle_valtype(t) + +## Lifting Canonical Modules + +class Module: + t: ModuleType + instantiate: Callable[typing.List[typing.Tuple[str,str,Value]], typing.List[typing.Tuple[str,Value]]] + +class Component: + t: ComponentType + instantiate: Callable[typing.List[typing.Tuple[str,any]], typing.List[typing.Tuple[str,any]]] + +def lift_canonical_module(module: Module) -> Component: + pass # TODO diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index cc8d3a5..7a2b4a8 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -308,36 +308,36 @@ def test_heap(t, expect, args, byte_array): [0xff,0xff,0xff,0xff,0x3,0,0,0, 0,0,0,0,0,0,0,0]) def test_flatten(t, params, results): - expect = { 'params':params, 'results':results } + expect = CoreFuncType(params, results) if len(params) > definitions.MAX_FLAT_PARAMS: - expect['params'] = ['i32'] + expect.params = ['i32'] if len(results) > definitions.MAX_FLAT_RESULTS: - expect['results'] = ['i32'] - got = flatten(t, 'lift') + expect.results = ['i32'] + got = flatten_functype(t, 'lift') assert(got == expect) if len(results) > definitions.MAX_FLAT_RESULTS: - expect['params'] += ['i32'] - expect['results'] = [] - got = flatten(t, 'lower') + 
expect.params += ['i32'] + expect.results = [] + got = flatten_functype(t, 'lower') assert(got == expect) - -test_flatten(Func([U8(),Float32(),Float64()],[]), ['i32','f32','f64'], []) -test_flatten(Func([U8(),Float32(),Float64()],[Float32()]), ['i32','f32','f64'], ['f32']) -test_flatten(Func([U8(),Float32(),Float64()],[U8()]), ['i32','f32','f64'], ['i32']) -test_flatten(Func([U8(),Float32(),Float64()],[Tuple([Float32()])]), ['i32','f32','f64'], ['f32']) -test_flatten(Func([U8(),Float32(),Float64()],[Tuple([Float32(),Float32()])]), ['i32','f32','f64'], ['f32','f32']) -test_flatten(Func([U8(),Float32(),Float64()],[Float32(),Float32()]), ['i32','f32','f64'], ['f32','f32']) -test_flatten(Func([U8() for _ in range(17)],[]), ['i32' for _ in range(17)], []) -test_flatten(Func([U8() for _ in range(17)],[Tuple([U8(),U8()])]), ['i32' for _ in range(17)], ['i32','i32']) + +test_flatten(FuncType([U8(),Float32(),Float64()],[]), ['i32','f32','f64'], []) +test_flatten(FuncType([U8(),Float32(),Float64()],[Float32()]), ['i32','f32','f64'], ['f32']) +test_flatten(FuncType([U8(),Float32(),Float64()],[U8()]), ['i32','f32','f64'], ['i32']) +test_flatten(FuncType([U8(),Float32(),Float64()],[Tuple([Float32()])]), ['i32','f32','f64'], ['f32']) +test_flatten(FuncType([U8(),Float32(),Float64()],[Tuple([Float32(),Float32()])]), ['i32','f32','f64'], ['f32','f32']) +test_flatten(FuncType([U8(),Float32(),Float64()],[Float32(),Float32()]), ['i32','f32','f64'], ['f32','f32']) +test_flatten(FuncType([U8() for _ in range(17)],[]), ['i32' for _ in range(17)], []) +test_flatten(FuncType([U8() for _ in range(17)],[Tuple([U8(),U8()])]), ['i32' for _ in range(17)], ['i32','i32']) def test_roundtrip(t, v): before = definitions.MAX_FLAT_RESULTS definitions.MAX_FLAT_RESULTS = 16 - ft = Func([t],[t]) + ft = FuncType([t],[t]) callee_instance = Instance() callee = lambda x: x @@ -366,4 +366,97 @@ def test_roundtrip(t, v): test_roundtrip(List(List(String())), [[mk_str("one"),mk_str("two")],[mk_str("three")]]) 
test_roundtrip(List(Option(Tuple([String(),U16()]))), [{'some':mk_tup(mk_str("answer"),42)}]) +def test_mangle_functype(params, results, expect): + ft = FuncType(params, results) + got = mangle_funcname('x', ft) + expect = 'x: ' + expect + if got != expect: + fail("test_mangle_func({}) expected {}, got {}".format(ft, expect, got)) + +test_mangle_functype([U8()], [U8()], 'func u8 -> u8') +test_mangle_functype([U8()], [], 'func u8 -> ()') +test_mangle_functype([], [U8()], 'func () -> u8') +test_mangle_functype([('x',U8())], [('y',U8())], 'func (x:u8) -> (y:u8)') +test_mangle_functype([('a',Bool()),('b',U8()),('c',S16()),('d',U32()),('e',S64())], + [('a',S8()),('b',U16()),('c',S32()),('d',U64())], + 'func (a:bool,b:u8,c:s16,d:u32,e:s64) -> (a:s8,b:u16,c:s32,d:u64)') +test_mangle_functype([List(List(String()))], [], + 'func list<list<string>> -> ()') +test_mangle_functype([Record([Field('x',Record([Field('y',String())])),Field('z',U32())])], [], + 'func record{x:record{y:string},z:u32} -> ()') +test_mangle_functype([Tuple([U8()])], [Tuple([U8(),U8()])], + 'func tuple<u8> -> tuple<u8,u8>') +test_mangle_functype([Flags(['a','b'])], [Enum(['a','b'])], + 'func flags{a,b} -> enum{a,b}') +test_mangle_functype([Variant([Case('a',None),Case('b',U8())])], [Union([U8(),List(String())])], + 'func variant{a(_),b(u8)} -> union{u8,list<string>}') +test_mangle_functype([Option(Bool())],[Option(List(U8()))], + 'func option<bool> -> option<list<u8>>') +test_mangle_functype([], [('a',Result(None,None)),('b',Result(U8(),None)),('c',Result(None,U8()))], + 'func () -> (a:result<_,_>,b:result<u8,_>,c:result<_,u8>)') + +def test_cabi(ct, expect): + got = canonical_module_type(ct) + if got != expect: + fail("test_cabi() got:\n {}\nexpected:\n {}".format(got, expect)) + +test_cabi( + ComponentType( + [ExternDecl('a', FuncType([U8()],[U8()])), + ExternDecl('b', ValueType(String()))], + [ExternDecl('c', FuncType([S8()],[S8()])), + ExternDecl('d', ValueType(List(U8())))] + ), + ModuleType( + [CoreImportDecl('','a: func u8 -> u8', 
CoreFuncType(['i32'],['i32']))], + [CoreExportDecl('_memory', CoreMemoryType(0, None)), + CoreExportDecl('_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])), + CoreExportDecl('_start{cabi=0.1}: func (b:string) -> (d:list<u8>)', + CoreFuncType(['i32','i32'],['i32'])), + CoreExportDecl('c: func s8 -> s8', CoreFuncType(['i32'],['i32']))] + ) +) +test_cabi( + ComponentType( + [ExternDecl('a', InstanceType([ + ExternDecl('b', FuncType([U8()],[U8()])), + ExternDecl('c', ValueType(Float32())) + ]))], + [ExternDecl('d', InstanceType([ + ExternDecl('e', FuncType([], [List(String())])), + ExternDecl('f', ValueType(Float64())) + ]))] + ), + ModuleType( + [CoreImportDecl('','a.b: func u8 -> u8', CoreFuncType(['i32'],['i32']))], + [CoreExportDecl('_memory', CoreMemoryType(0, None)), + CoreExportDecl('_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])), + CoreExportDecl('_start{cabi=0.1}: func (a.c:float32) -> (d.f:float64)', + CoreFuncType(['f32'],['f64'])), + CoreExportDecl('d.e: func () -> list<string>', CoreFuncType([],['i32'])), + CoreExportDecl('_post-d.e', CoreFuncType(['i32'],[]))] + ) +) +test_cabi( # from CanonicalABI.md + ComponentType( + [ExternDecl('foo', FuncType([],[])), + ExternDecl('a', InstanceType([ + ExternDecl('bar', FuncType([('x', U32()),('y', U32())],[U32()])) + ])), + ExternDecl('v1', ValueType(String()))], + [ExternDecl('baz', FuncType([], [String()])), + ExternDecl('v2', ValueType(List(List(String()))))] + ), + ModuleType( + [CoreImportDecl('','foo: func () -> ()', CoreFuncType([],[])), + CoreImportDecl('','a.bar: func (x:u32,y:u32) -> u32', CoreFuncType(['i32','i32'],['i32']))], + [CoreExportDecl('_memory', CoreMemoryType(0, None)), + CoreExportDecl('_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])), + CoreExportDecl('_start{cabi=0.1}: func (v1:string) -> (v2:list<list<string>>)', + CoreFuncType(['i32','i32'],['i32'])), + CoreExportDecl('baz: func () -> string', CoreFuncType([],['i32'])), + CoreExportDecl('_post-baz', CoreFuncType(['i32'],[]))] + ) 
+) + print("All tests passed") From fb583bc54ed8476e46d6f3a3e424d9600d10c66c Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 25 Jul 2022 17:43:26 -0500 Subject: [PATCH 107/301] Remove extraneous word Co-authored-by: Dan Gohman --- design/mvp/CanonicalABI.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 041bcbd..c8e0309 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1,6 +1,6 @@ # Canonical ABI Explainer -This document walks defines the Canonical ABI used to convert between +This document defines the Canonical ABI used to convert between high-level Component Model values and low-level Core WebAssembly values. * [Supporting definitions](#supporting-definitions) From 6e58eb875c69297dc3e8c4dbda56dd67fd50f246 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 25 Jul 2022 18:45:59 -0500 Subject: [PATCH 108/301] Improve intro summary wording --- design/mvp/CanonicalABI.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index c8e0309..0e628d4 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1,7 +1,8 @@ # Canonical ABI Explainer -This document defines the Canonical ABI used to convert between -high-level Component Model values and low-level Core WebAssembly values. +This document defines the Canonical ABI used to convert between the values and +functions of components in the Component Model and the values and functions +of modules in Core WebAssembly. 
* [Supporting definitions](#supporting-definitions) * [Despecialization](#Despecialization) From 09947c7f7a70c35887079aa1c22fb234d9382165 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 25 Jul 2022 18:53:10 -0500 Subject: [PATCH 109/301] Clarify invariant text --- design/mvp/CanonicalABI.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 0e628d4..7542106 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1312,8 +1312,8 @@ definitions. Together, these definitions are intended to satisfy the invariant: ``` -for all m : core:module, mt : core:moduletype, ct : componenttype: - m : mt and mt = canonical-module-type(ct) implies lift-canonical-module(m) : ct +for all m : core:module and ct : componenttype: + module-type(m) = canonical-module-type(ct) implies ct = type-of(lift-canonical-module(m)) ``` One consequence of this is that the canonical `core:moduletype` must encode enough high-level type information for `lift-canonical-module` to be able to From 455851510c485c1dd1407be3d62f737942d1f419 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 25 Jul 2022 18:57:56 -0500 Subject: [PATCH 110/301] Improve name-mangling rationale text --- design/mvp/CanonicalABI.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 7542106..805ce24 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1318,9 +1318,10 @@ for all m : core:module and ct : componenttype: One consequence of this is that the canonical `core:moduletype` must encode enough high-level type information for `lift-canonical-module` to be able to reconstruct a working component. This is achieved using [name mangling]. 
Unlike -traditional C-family name mangling, which uses unreadable, space-efficient -mangling schemes to support millions of *internal* names, the Canonical ABI -only needs to mangle *external* names, of which there will only be a handful. +traditional C-family name mangling, which have a limited character set imposed +by linkers and aim to be space-efficient enough to support millions of +*internal* names, the Canonical ABI can use any valid UTF-8 string and only +needs to mangle *external* names, of which there will only be a handful. Therefore, squeezing out every byte is a lower concern and so, for simplicity and readability, type information is mangled using a subset of the [`wit`](WIT.md) syntax. From 0c4290675f4a1faa16402d8cceae9d8725a3d647 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 25 Jul 2022 19:03:50 -0500 Subject: [PATCH 111/301] Add space after comma in name mangling --- design/mvp/CanonicalABI.md | 18 +++++++++--------- design/mvp/canonical-abi/definitions.py | 18 ++++++++---------- design/mvp/canonical-abi/run_tests.py | 14 +++++++------- 3 files changed, 24 insertions(+), 26 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 805ce24..d69b099 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1442,7 +1442,7 @@ def mangle_funcvec(es): return mangle_valtype(es[0]) assert(all(type(e) == tuple and len(e) == 2 for e in es)) mangled_elems = (e[0] + ':' + mangle_valtype(e[1]) for e in es) - return '(' + ','.join(mangled_elems) + ')' + return '(' + ', '.join(mangled_elems) + ')' def mangle_valtype(t): match t: @@ -1471,29 +1471,29 @@ def mangle_valtype(t): def mangle_recordtype(fields): mangled_fields = (f.label + ':' + mangle_valtype(f.t) for f in fields) - return 'record{' + ','.join(mangled_fields) + '}' + return 'record{' + ', '.join(mangled_fields) + '}' def mangle_tupletype(ts): - return 'tuple<' + ','.join(mangle_valtype(t) for t in ts) + '>' + return 'tuple<' + ', 
'.join(mangle_valtype(t) for t in ts) + '>' def mangle_flags(labels): - return 'flags{' + ','.join(labels) + '}' + return 'flags{' + ', '.join(labels) + '}' def mangle_varianttype(cases): mangled_cases = (c.label + '(' + mangle_maybevaltype(c.t) + ')' for c in cases) - return 'variant{' + ','.join(mangled_cases) + '}' + return 'variant{' + ', '.join(mangled_cases) + '}' def mangle_enumtype(labels): - return 'enum{' + ','.join(labels) + '}' + return 'enum{' + ', '.join(labels) + '}' def mangle_uniontype(ts): - return 'union{' + ','.join(mangle_valtype(t) for t in ts) + '}' + return 'union{' + ', '.join(mangle_valtype(t) for t in ts) + '}' def mangle_optiontype(t): return 'option<' + mangle_valtype(t) + '>' def mangle_resulttype(ok, error): - return 'result<' + mangle_maybevaltype(ok) + ',' + mangle_maybevaltype(error) + '>' + return 'result<' + mangle_maybevaltype(ok) + ', ' + mangle_maybevaltype(error) + '>' def mangle_maybevaltype(t): if t is None: @@ -1516,7 +1516,7 @@ the `canonical_module_type` would be: ```wasm (module (import "" "foo: func () -> ()" (func)) - (import "" "a.bar: func (x:u32,y:u32) -> u32" (func param i32 i32) (result i32)) + (import "" "a.bar: func (x:u32, y:u32) -> u32" (func param i32 i32) (result i32)) (export "_memory" (memory 0)) (export "_realloc" (func (param i32 i32 i32 i32) (result i32))) (export "_start{cabi=0.1}: func (v1:string) -> (v2:list<list<string>>)" (func (param i32 i32) (result i32))) diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 6b989f2..c84ad6a 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -1089,9 +1089,7 @@ def mangle_funcvec(es): return mangle_valtype(es[0]) assert(all(type(e) == tuple and len(e) == 2 for e in es)) mangled_elems = (e[0] + ':' + mangle_valtype(e[1]) for e in es) - return '(' + ','.join(mangled_elems) + ')' - -# + return '(' + ', '.join(mangled_elems) + ')' def mangle_valtype(t): match t: @@ -1120,29 +1118,29 
@@ def mangle_valtype(t): def mangle_recordtype(fields): mangled_fields = (f.label + ':' + mangle_valtype(f.t) for f in fields) - return 'record{' + ','.join(mangled_fields) + '}' + return 'record{' + ', '.join(mangled_fields) + '}' def mangle_tupletype(ts): - return 'tuple<' + ','.join(mangle_valtype(t) for t in ts) + '>' + return 'tuple<' + ', '.join(mangle_valtype(t) for t in ts) + '>' def mangle_flags(labels): - return 'flags{' + ','.join(labels) + '}' + return 'flags{' + ', '.join(labels) + '}' def mangle_varianttype(cases): mangled_cases = (c.label + '(' + mangle_maybevaltype(c.t) + ')' for c in cases) - return 'variant{' + ','.join(mangled_cases) + '}' + return 'variant{' + ', '.join(mangled_cases) + '}' def mangle_enumtype(labels): - return 'enum{' + ','.join(labels) + '}' + return 'enum{' + ', '.join(labels) + '}' def mangle_uniontype(ts): - return 'union{' + ','.join(mangle_valtype(t) for t in ts) + '}' + return 'union{' + ', '.join(mangle_valtype(t) for t in ts) + '}' def mangle_optiontype(t): return 'option<' + mangle_valtype(t) + '>' def mangle_resulttype(ok, error): - return 'result<' + mangle_maybevaltype(ok) + ',' + mangle_maybevaltype(error) + '>' + return 'result<' + mangle_maybevaltype(ok) + ', ' + mangle_maybevaltype(error) + '>' def mangle_maybevaltype(t): if t is None: diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index 7a2b4a8..294cee1 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -379,21 +379,21 @@ def test_mangle_functype(params, results, expect): test_mangle_functype([('x',U8())], [('y',U8())], 'func (x:u8) -> (y:u8)') test_mangle_functype([('a',Bool()),('b',U8()),('c',S16()),('d',U32()),('e',S64())], [('a',S8()),('b',U16()),('c',S32()),('d',U64())], - 'func (a:bool,b:u8,c:s16,d:u32,e:s64) -> (a:s8,b:u16,c:s32,d:u64)') + 'func (a:bool, b:u8, c:s16, d:u32, e:s64) -> (a:s8, b:u16, c:s32, d:u64)') test_mangle_functype([List(List(String()))], [], 
'func list<list<string>> -> ()') test_mangle_functype([Record([Field('x',Record([Field('y',String())])),Field('z',U32())])], [], - 'func record{x:record{y:string},z:u32} -> ()') + 'func record{x:record{y:string}, z:u32} -> ()') test_mangle_functype([Tuple([U8()])], [Tuple([U8(),U8()])], - 'func tuple<u8> -> tuple<u8,u8>') + 'func tuple<u8> -> tuple<u8, u8>') test_mangle_functype([Flags(['a','b'])], [Enum(['a','b'])], - 'func flags{a,b} -> enum{a,b}') + 'func flags{a, b} -> enum{a, b}') test_mangle_functype([Variant([Case('a',None),Case('b',U8())])], [Union([U8(),List(String())])], - 'func variant{a(_),b(u8)} -> union{u8,list<string>}') + 'func variant{a(_), b(u8)} -> union{u8, list<string>}') test_mangle_functype([Option(Bool())],[Option(List(U8()))], 'func option<bool> -> option<list<u8>>') test_mangle_functype([], [('a',Result(None,None)),('b',Result(U8(),None)),('c',Result(None,U8()))], - 'func () -> (a:result<_,_>,b:result<u8,_>,c:result<_,u8>)') + 'func () -> (a:result<_, _>, b:result<u8, _>, c:result<_, u8>)') def test_cabi(ct, expect): got = canonical_module_type(ct) @@ -449,7 +449,7 @@ def test_cabi(ct, expect): ), ModuleType( [CoreImportDecl('','foo: func () -> ()', CoreFuncType([],[])), - CoreImportDecl('','a.bar: func (x:u32,y:u32) -> u32', CoreFuncType(['i32','i32'],['i32']))], + CoreImportDecl('','a.bar: func (x:u32, y:u32) -> u32', CoreFuncType(['i32','i32'],['i32']))], [CoreExportDecl('_memory', CoreMemoryType(0, None)), CoreExportDecl('_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])), CoreExportDecl('_start{cabi=0.1}: func (v1:string) -> (v2:list<list<string>>)', From 1442779d30525893aac8638bc8271a5202fb38db Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 25 Jul 2022 19:13:36 -0500 Subject: [PATCH 112/301] Use cabi_ prefix instead of _ and use _ instead if - in post-return function --- design/mvp/CanonicalABI.md | 14 +++++++------- design/mvp/canonical-abi/definitions.py | 8 ++++---- design/mvp/canonical-abi/run_tests.py | 22 +++++++++++----------- 3 files changed, 22 insertions(+), 22 deletions(-) diff --git 
a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index d69b099..7265cf9 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1359,18 +1359,18 @@ def canonical_module_type(ct: ComponentType) -> ModuleType: imports.append(CoreImportDecl('', mangle_funcname(name, ft), flat_ft)) exports = [] - exports.append(CoreExportDecl('_memory', CoreMemoryType(initial=0, maximum=None))) - exports.append(CoreExportDecl('_realloc', CoreFuncType(['i32','i32','i32','i32'], ['i32']))) + exports.append(CoreExportDecl('cabi_memory', CoreMemoryType(initial=0, maximum=None))) + exports.append(CoreExportDecl('cabi_realloc', CoreFuncType(['i32','i32','i32','i32'], ['i32']))) start_ft = FuncType(start_params, start_results) - start_name = mangle_funcname('_start{cabi=' + CABI_VERSION + '}', start_ft) + start_name = mangle_funcname('cabi_start{cabi=' + CABI_VERSION + '}', start_ft) exports.append(CoreExportDecl(start_name, flatten_functype(start_ft, 'lift'))) for name,ft in export_funcs: flat_ft = flatten_functype(ft, 'lift') exports.append(CoreExportDecl(mangle_funcname(name, ft), flat_ft)) if any(contains_dynamic_allocation(t) for t in ft.results): - exports.append(CoreExportDecl('_post-' + name, CoreFuncType(flat_ft.results, []))) + exports.append(CoreExportDecl('cabi_post_' + name, CoreFuncType(flat_ft.results, []))) return ModuleType(imports, exports) @@ -1389,15 +1389,15 @@ import/export with the function type mangled into the name. Additionally, each export whose return type implies possible dynamic allocation is given a `post-return` function so that it can deallocate after the caller reads the return value. Lastly, all value imports and exports are concatenated into a -synthetic `_start` function that is called immediately after instantiation. +synthetic `cabi_start` function that is called immediately after instantiation. 
For imports (which in Core WebAssembly are [two-level]), the first-level name is set to be a zero-length string so that the entire rest of the first-level string space is available for [shared-everything linking]. For imports and exports, the Canonical ABI assumes that `_` is not a valid -first character in a component-level import/export (as is currently the case in -`wit` [identifiers](WIT.md#identifiers)) and thus can safely be used to prefix +character in a component-level import/export (as is currently the case in `wit` +[identifiers](WIT.md#identifiers)) and thus can safely be used to prefix auxiliary Canonical ABI-induced imports/exports. Instance-mangling recursively builds a dotted path string (of instance names) diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index c84ad6a..b84b291 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -1029,18 +1029,18 @@ def canonical_module_type(ct: ComponentType) -> ModuleType: imports.append(CoreImportDecl('', mangle_funcname(name, ft), flat_ft)) exports = [] - exports.append(CoreExportDecl('_memory', CoreMemoryType(initial=0, maximum=None))) - exports.append(CoreExportDecl('_realloc', CoreFuncType(['i32','i32','i32','i32'], ['i32']))) + exports.append(CoreExportDecl('cabi_memory', CoreMemoryType(initial=0, maximum=None))) + exports.append(CoreExportDecl('cabi_realloc', CoreFuncType(['i32','i32','i32','i32'], ['i32']))) start_ft = FuncType(start_params, start_results) - start_name = mangle_funcname('_start{cabi=' + CABI_VERSION + '}', start_ft) + start_name = mangle_funcname('cabi_start{cabi=' + CABI_VERSION + '}', start_ft) exports.append(CoreExportDecl(start_name, flatten_functype(start_ft, 'lift'))) for name,ft in export_funcs: flat_ft = flatten_functype(ft, 'lift') exports.append(CoreExportDecl(mangle_funcname(name, ft), flat_ft)) if any(contains_dynamic_allocation(t) for t in ft.results): - 
exports.append(CoreExportDecl('_post-' + name, CoreFuncType(flat_ft.results, []))) + exports.append(CoreExportDecl('cabi_post_' + name, CoreFuncType(flat_ft.results, []))) return ModuleType(imports, exports) diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index 294cee1..46425b8 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -409,9 +409,9 @@ def test_cabi(ct, expect): ), ModuleType( [CoreImportDecl('','a: func u8 -> u8', CoreFuncType(['i32'],['i32']))], - [CoreExportDecl('_memory', CoreMemoryType(0, None)), - CoreExportDecl('_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])), - CoreExportDecl('_start{cabi=0.1}: func (b:string) -> (d:list<u8>)', + [CoreExportDecl('cabi_memory', CoreMemoryType(0, None)), + CoreExportDecl('cabi_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])), + CoreExportDecl('cabi_start{cabi=0.1}: func (b:string) -> (d:list<u8>)', CoreFuncType(['i32','i32'],['i32'])), CoreExportDecl('c: func s8 -> s8', CoreFuncType(['i32'],['i32']))] ) @@ -429,12 +429,12 @@ def test_cabi(ct, expect): ), ModuleType( [CoreImportDecl('','a.b: func u8 -> u8', CoreFuncType(['i32'],['i32']))], - [CoreExportDecl('_memory', CoreMemoryType(0, None)), - CoreExportDecl('_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])), - CoreExportDecl('_start{cabi=0.1}: func (a.c:float32) -> (d.f:float64)', + [CoreExportDecl('cabi_memory', CoreMemoryType(0, None)), + CoreExportDecl('cabi_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])), + CoreExportDecl('cabi_start{cabi=0.1}: func (a.c:float32) -> (d.f:float64)', CoreFuncType(['f32'],['f64'])), CoreExportDecl('d.e: func () -> list<string>', CoreFuncType([],['i32'])), - CoreExportDecl('_post-d.e', CoreFuncType(['i32'],[]))] + CoreExportDecl('cabi_post_d.e', CoreFuncType(['i32'],[]))] ) ) test_cabi( # from CanonicalABI.md @@ -450,12 +450,12 @@ def test_cabi(ct, expect): ModuleType( [CoreImportDecl('','foo: func () -> ()', 
CoreFuncType([],[])), CoreImportDecl('','a.bar: func (x:u32, y:u32) -> u32', CoreFuncType(['i32','i32'],['i32']))], - [CoreExportDecl('_memory', CoreMemoryType(0, None)), - CoreExportDecl('_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])), - CoreExportDecl('_start{cabi=0.1}: func (v1:string) -> (v2:list<list<string>>)', + [CoreExportDecl('cabi_memory', CoreMemoryType(0, None)), + CoreExportDecl('cabi_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])), + CoreExportDecl('cabi_start{cabi=0.1}: func (v1:string) -> (v2:list<list<string>>)', CoreFuncType(['i32','i32'],['i32'])), CoreExportDecl('baz: func () -> string', CoreFuncType([],['i32'])), - CoreExportDecl('_post-baz', CoreFuncType(['i32'],[]))] + CoreExportDecl('cabi_post_baz', CoreFuncType(['i32'],[]))] ) ) From 23d6e1ef9df8c566c55ee6c7878d9b3483cfc812 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 26 Jul 2022 20:02:15 -0500 Subject: [PATCH 113/301] Add 'outer' case to to match Binary.md Resolves #72 --- design/mvp/Binary.md | 3 ++- design/mvp/Explainer.md | 25 ++++++++++++++----------- 2 files changed, 16 insertions(+), 12 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index fdeefe7..97eb7e7 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -117,7 +117,8 @@ Notes: index in the `sort` index space of the `i`th enclosing component (counting outward, starting with `0` referring to the current component). * For `outer` aliases of `core:aliastarget`, validation restricts the `sort` to - `type`. + `type` and `ct` must be `0` (for a component-level definition; see also the + `core:alias` case of `core:moduledecl` below). * For `outer` aliases of `aliastarget`, validation restricts the `sort` to one of `type`, `module` or `component`. 
diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index c88074f..3b1129b 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -227,6 +227,7 @@ component): ``` core:alias ::= (alias <core:aliastarget> (<core:sort> <id>?)) core:aliastarget ::= export <core:instanceidx> <name> + | outer <u32> <u32> alias ::= (alias <aliastarget> (<sort> <id>?)) aliastarget ::= export <instanceidx> <name> @@ -242,17 +243,19 @@ In the case of `export` aliases, validation ensures `name` is an export in the target instance and has a matching sort. In the case of `outer` aliases, the `u32` pair serves as a [de Bruijn -index], with first `u32` being the number of enclosing components to skip -and the second `u32` being an index into the target component's sort's index -space. In particular, the first `u32` can be `0`, in which case the outer -alias refers to the current component. To maintain the acyclicity of module -instantiation, outer aliases are only allowed to refer to *preceding* outer -definitions. - -There is no `outer` option in `core:aliastarget` because it would only be able -to refer to enclosing *core* modules and module types and, until -module-linking, modules and module types can't nest. In a module-linking -future, outer aliases would be added, making `core:alias` symmetric to `alias`. +index], with first `u32` being the number of enclosing components/modules to +skip and the second `u32` being an index into the target's sort's index space. +In particular, the first `u32` can be `0`, in which case the outer alias refers +to the current component. To maintain the acyclicity of module instantiation, +outer aliases are only allowed to refer to *preceding* outer definitions. + +As with other core definitions, core aliases are only supposed to "see" other +core definitions (as-if they were defined by Core WebAssembly extended with +[module-linking]). Thus, core `outer` aliases must have a skip-count of `0` +when defined within a component, only allowing them to duplicate core +definitions in core index spaces. 
(Core `outer` aliases have a second use +described in the next section, which is why they are included in the grammar +at all.) Components containing outer aliases effectively produce a [closure] at instantiation time, including a copy of the outer-aliased definitions. Because From c7823c0058bf7822bcb3d566c279370b8986f2d3 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 26 Jul 2022 20:33:16 -0500 Subject: [PATCH 114/301] More spaces in the name mangling scheme --- design/mvp/CanonicalABI.md | 24 ++++++++++++------------ design/mvp/canonical-abi/definitions.py | 14 +++++++------- design/mvp/canonical-abi/run_tests.py | 22 +++++++++++----------- 3 files changed, 30 insertions(+), 30 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 7265cf9..b609f9e 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1441,7 +1441,7 @@ def mangle_funcvec(es): if len(es) == 1 and isinstance(es[0], ValType): return mangle_valtype(es[0]) assert(all(type(e) == tuple and len(e) == 2 for e in es)) - mangled_elems = (e[0] + ':' + mangle_valtype(e[1]) for e in es) + mangled_elems = (e[0] + ': ' + mangle_valtype(e[1]) for e in es) return '(' + ', '.join(mangled_elems) + ')' def mangle_valtype(t): @@ -1470,24 +1470,24 @@ def mangle_valtype(t): case Result(ok,error) : return mangle_resulttype(ok,error) def mangle_recordtype(fields): - mangled_fields = (f.label + ':' + mangle_valtype(f.t) for f in fields) - return 'record{' + ', '.join(mangled_fields) + '}' + mangled_fields = (f.label + ': ' + mangle_valtype(f.t) for f in fields) + return 'record { ' + ', '.join(mangled_fields) + ' }' def mangle_tupletype(ts): return 'tuple<' + ', '.join(mangle_valtype(t) for t in ts) + '>' def mangle_flags(labels): - return 'flags{' + ', '.join(labels) + '}' + return 'flags { ' + ', '.join(labels) + ' }' def mangle_varianttype(cases): mangled_cases = (c.label + '(' + mangle_maybevaltype(c.t) + ')' for c in cases) - return 'variant{' + ', 
'.join(mangled_cases) + '}' + return 'variant { ' + ', '.join(mangled_cases) + ' }' def mangle_enumtype(labels): - return 'enum{' + ', '.join(labels) + '}' + return 'enum { ' + ', '.join(labels) + ' }' def mangle_uniontype(ts): - return 'union{' + ', '.join(mangle_valtype(t) for t in ts) + '}' + return 'union { ' + ', '.join(mangle_valtype(t) for t in ts) + ' }' def mangle_optiontype(t): return 'option<' + mangle_valtype(t) + '>' @@ -1516,12 +1516,12 @@ the `canonical_module_type` would be: ```wasm (module (import "" "foo: func () -> ()" (func)) - (import "" "a.bar: func (x:u32, y:u32) -> u32" (func param i32 i32) (result i32)) - (export "_memory" (memory 0)) - (export "_realloc" (func (param i32 i32 i32 i32) (result i32))) - (export "_start{cabi=0.1}: func (v1:string) -> (v2:list<list<string>>)" (func (param i32 i32) (result i32))) + (import "" "a.bar: func (x: u32, y: u32) -> u32" (func param i32 i32) (result i32)) + (export "cabi_memory" (memory 0)) + (export "cabi_realloc" (func (param i32 i32 i32 i32) (result i32))) + (export "cabi_start{cabi=0.1}: func (v1: string) -> (v2: list<list<string>>)" (func (param i32 i32) (result i32))) (export "baz: func () -> string" (func (result i32))) - (export "_post-baz" (func (param i32))) + (export "cabi_post_baz" (func (param i32))) ) ``` diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index b84b291..22abc3c 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -1088,7 +1088,7 @@ def mangle_funcvec(es): if len(es) == 1 and isinstance(es[0], ValType): return mangle_valtype(es[0]) assert(all(type(e) == tuple and len(e) == 2 for e in es)) - mangled_elems = (e[0] + ':' + mangle_valtype(e[1]) for e in es) + mangled_elems = (e[0] + ': ' + mangle_valtype(e[1]) for e in es) return '(' + ', '.join(mangled_elems) + ')' def mangle_valtype(t): @@ -1117,24 +1117,24 @@ def mangle_valtype(t): case Result(ok,error) : return mangle_resulttype(ok,error) def
mangle_recordtype(fields): - mangled_fields = (f.label + ':' + mangle_valtype(f.t) for f in fields) - return 'record{' + ', '.join(mangled_fields) + '}' + mangled_fields = (f.label + ': ' + mangle_valtype(f.t) for f in fields) + return 'record { ' + ', '.join(mangled_fields) + ' }' def mangle_tupletype(ts): return 'tuple<' + ', '.join(mangle_valtype(t) for t in ts) + '>' def mangle_flags(labels): - return 'flags{' + ', '.join(labels) + '}' + return 'flags { ' + ', '.join(labels) + ' }' def mangle_varianttype(cases): mangled_cases = (c.label + '(' + mangle_maybevaltype(c.t) + ')' for c in cases) - return 'variant{' + ', '.join(mangled_cases) + '}' + return 'variant { ' + ', '.join(mangled_cases) + ' }' def mangle_enumtype(labels): - return 'enum{' + ', '.join(labels) + '}' + return 'enum { ' + ', '.join(labels) + ' }' def mangle_uniontype(ts): - return 'union{' + ', '.join(mangle_valtype(t) for t in ts) + '}' + return 'union { ' + ', '.join(mangle_valtype(t) for t in ts) + ' }' def mangle_optiontype(t): return 'option<' + mangle_valtype(t) + '>' diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index 46425b8..41bf487 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -371,29 +371,29 @@ def test_mangle_functype(params, results, expect): got = mangle_funcname('x', ft) expect = 'x: ' + expect if got != expect: - fail("test_mangle_func({}) expected {}, got {}".format(ft, expect, got)) + fail("test_mangle_func() got:\n {}\nexpected:\n {}".format(got, expect)) test_mangle_functype([U8()], [U8()], 'func u8 -> u8') test_mangle_functype([U8()], [], 'func u8 -> ()') test_mangle_functype([], [U8()], 'func () -> u8') -test_mangle_functype([('x',U8())], [('y',U8())], 'func (x:u8) -> (y:u8)') +test_mangle_functype([('x',U8())], [('y',U8())], 'func (x: u8) -> (y: u8)') test_mangle_functype([('a',Bool()),('b',U8()),('c',S16()),('d',U32()),('e',S64())], 
[('a',S8()),('b',U16()),('c',S32()),('d',U64())], - 'func (a:bool, b:u8, c:s16, d:u32, e:s64) -> (a:s8, b:u16, c:s32, d:u64)') + 'func (a: bool, b: u8, c: s16, d: u32, e: s64) -> (a: s8, b: u16, c: s32, d: u64)') test_mangle_functype([List(List(String()))], [], 'func list> -> ()') test_mangle_functype([Record([Field('x',Record([Field('y',String())])),Field('z',U32())])], [], - 'func record{x:record{y:string}, z:u32} -> ()') + 'func record { x: record { y: string }, z: u32 } -> ()') test_mangle_functype([Tuple([U8()])], [Tuple([U8(),U8()])], 'func tuple -> tuple') test_mangle_functype([Flags(['a','b'])], [Enum(['a','b'])], - 'func flags{a, b} -> enum{a, b}') + 'func flags { a, b } -> enum { a, b }') test_mangle_functype([Variant([Case('a',None),Case('b',U8())])], [Union([U8(),List(String())])], - 'func variant{a(_), b(u8)} -> union{u8, list}') + 'func variant { a(_), b(u8) } -> union { u8, list }') test_mangle_functype([Option(Bool())],[Option(List(U8()))], 'func option -> option>') test_mangle_functype([], [('a',Result(None,None)),('b',Result(U8(),None)),('c',Result(None,U8()))], - 'func () -> (a:result<_, _>, b:result, c:result<_, u8>)') + 'func () -> (a: result<_, _>, b: result, c: result<_, u8>)') def test_cabi(ct, expect): got = canonical_module_type(ct) @@ -411,7 +411,7 @@ def test_cabi(ct, expect): [CoreImportDecl('','a: func u8 -> u8', CoreFuncType(['i32'],['i32']))], [CoreExportDecl('cabi_memory', CoreMemoryType(0, None)), CoreExportDecl('cabi_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])), - CoreExportDecl('cabi_start{cabi=0.1}: func (b:string) -> (d:list)', + CoreExportDecl('cabi_start{cabi=0.1}: func (b: string) -> (d: list)', CoreFuncType(['i32','i32'],['i32'])), CoreExportDecl('c: func s8 -> s8', CoreFuncType(['i32'],['i32']))] ) @@ -431,7 +431,7 @@ def test_cabi(ct, expect): [CoreImportDecl('','a.b: func u8 -> u8', CoreFuncType(['i32'],['i32']))], [CoreExportDecl('cabi_memory', CoreMemoryType(0, None)), CoreExportDecl('cabi_realloc', 
CoreFuncType(['i32','i32','i32','i32'],['i32'])), - CoreExportDecl('cabi_start{cabi=0.1}: func (a.c:float32) -> (d.f:float64)', + CoreExportDecl('cabi_start{cabi=0.1}: func (a.c: float32) -> (d.f: float64)', CoreFuncType(['f32'],['f64'])), CoreExportDecl('d.e: func () -> list', CoreFuncType([],['i32'])), CoreExportDecl('cabi_post_d.e', CoreFuncType(['i32'],[]))] @@ -449,10 +449,10 @@ def test_cabi(ct, expect): ), ModuleType( [CoreImportDecl('','foo: func () -> ()', CoreFuncType([],[])), - CoreImportDecl('','a.bar: func (x:u32, y:u32) -> u32', CoreFuncType(['i32','i32'],['i32']))], + CoreImportDecl('','a.bar: func (x: u32, y: u32) -> u32', CoreFuncType(['i32','i32'],['i32']))], [CoreExportDecl('cabi_memory', CoreMemoryType(0, None)), CoreExportDecl('cabi_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])), - CoreExportDecl('cabi_start{cabi=0.1}: func (v1:string) -> (v2:list>)', + CoreExportDecl('cabi_start{cabi=0.1}: func (v1: string) -> (v2: list>)', CoreFuncType(['i32','i32'],['i32'])), CoreExportDecl('baz: func () -> string', CoreFuncType([],['i32'])), CoreExportDecl('cabi_post_baz', CoreFuncType(['i32'],[]))] From 723ce728a745e469e037b7b65307e68d5513941b Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Fri, 29 Jul 2022 10:25:36 -0500 Subject: [PATCH 115/301] Fix size_flags for the zero-flags case Resolves #75 --- design/mvp/CanonicalABI.md | 1 + design/mvp/canonical-abi/definitions.py | 1 + design/mvp/canonical-abi/run_tests.py | 8 ++++++++ 3 files changed, 10 insertions(+) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 16f3164..0f2648c 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -180,6 +180,7 @@ def size_variant(cases): def size_flags(labels): n = len(labels) + if n == 0: return 0 if n <= 8: return 1 if n <= 16: return 2 return 4 * num_i32_flags(labels) diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 3a90ec5..3d09ef2 100644 --- 
a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -183,6 +183,7 @@ def size_variant(cases): def size_flags(labels): n = len(labels) + if n == 0: return 0 if n <= 8: return 1 if n <= 16: return 2 return 4 * num_i32_flags(labels) diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index e658637..965c024 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -95,6 +95,8 @@ def test_name(): test(Unit(), [], {}) test(Record([Field('x',U8()), Field('y',U16()), Field('z',U32())]), [1,2,3], {'x':1,'y':2,'z':3}) test(Tuple([Tuple([U8(),U8()]),U8()]), [1,2,3], {'0':{'0':1,'1':2},'1':3}) +t = Flags([]) +test(t, [], {}) t = Flags(['a','b']) test(t, [0], {'a':False,'b':False}) test(t, [2], {'a':False,'b':True}) @@ -283,6 +285,12 @@ def test_heap(t, expect, args, byte_array): [1,0xff,5,0,6,0xff,7,0xff, 0,0xff,8,0xff,0xff,0xff,9,0xff]) test_heap(List(Union([U8()])), [{'0':6},{'0':7},{'0':8}], [0,3], [0,6, 0,7, 0,8]) +t = List(Flags([])) +test_heap(t, [{},{},{}], [0,3], + []) +t = List(Tuple([Flags([]), U8()])) +test_heap(t, [mk_tup({}, 42), mk_tup({}, 43), mk_tup({}, 44)], [0,3], + [42,43,44]) t = List(Flags(['a','b'])) test_heap(t, [{'a':False,'b':False},{'a':False,'b':True},{'a':True,'b':True}], [0,3], [0,2,3]) From 92173ef58e2fb9336a535aa7889614de885e52e6 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 2 Aug 2022 11:34:32 -0500 Subject: [PATCH 116/301] Add some missing trap cases --- design/mvp/CanonicalABI.md | 28 +++++++++++++++++++++++ design/mvp/canonical-abi/definitions.py | 30 +++++++++++++++++++++++-- 2 files changed, 56 insertions(+), 2 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 0f2648c..6d1a086 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -207,6 +207,7 @@ class Opts: def load(opts, ptr, t): assert(ptr == align_to(ptr, alignment(t))) + assert(ptr + size(t) <= 
len(opts.memory)) match despecialize(t): case Bool() : return convert_int_to_bool(load_int(opts, ptr, 1)) case U8() : return load_int(opts, ptr, 1) @@ -297,12 +298,15 @@ UTF16_TAG = 1 << 31 def load_string_from_range(opts, ptr, tagged_code_units): match opts.string_encoding: case 'utf8': + alignment = 1 byte_length = tagged_code_units encoding = 'utf-8' case 'utf16': + alignment = 2 byte_length = 2 * tagged_code_units encoding = 'utf-16-le' case 'latin1+utf16': + alignment = 2 if bool(tagged_code_units & UTF16_TAG): byte_length = 2 * (tagged_code_units ^ UTF16_TAG) encoding = 'utf-16-le' @@ -310,6 +314,7 @@ def load_string_from_range(opts, ptr, tagged_code_units): byte_length = tagged_code_units encoding = 'latin-1' + trap_if(ptr != align_to(ptr, alignment)) trap_if(ptr + byte_length > len(opts.memory)) try: s = opts.memory[ptr : ptr+byte_length].decode(encoding) @@ -403,6 +408,7 @@ The `store` function defines how to write a value `v` of a given value type ```python def store(opts, v, t, ptr): assert(ptr == align_to(ptr, alignment(t))) + assert(ptr + size(t) <= len(opts.memory)) match despecialize(t): case Bool() : store_int(opts, int(bool(v)), ptr, 1) case U8() : store_int(opts, v, ptr, 1) @@ -522,6 +528,8 @@ def store_string_copy(opts, src, src_code_units, dst_code_unit_size, dst_alignme dst_byte_length = dst_code_unit_size * src_code_units trap_if(dst_byte_length > MAX_STRING_BYTE_LENGTH) ptr = opts.realloc(0, 0, dst_alignment, dst_byte_length) + trap_if(ptr != align_to(ptr, dst_alignment)) + trap_if(ptr + dst_byte_length > len(opts.memory)) encoded = src.encode(dst_encoding) assert(dst_byte_length == len(encoded)) opts.memory[ptr : ptr+len(encoded)] = encoded @@ -546,15 +554,18 @@ def store_latin1_to_utf8(opts, src, src_code_units): def store_string_to_utf8(opts, src, src_code_units, worst_case_size): assert(src_code_units <= MAX_STRING_BYTE_LENGTH) ptr = opts.realloc(0, 0, 1, src_code_units) + trap_if(ptr + src_code_units > len(opts.memory)) encoded = 
src.encode('utf-8') assert(src_code_units <= len(encoded)) opts.memory[ptr : ptr+src_code_units] = encoded[0 : src_code_units] if src_code_units < len(encoded): trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) ptr = opts.realloc(ptr, src_code_units, 1, worst_case_size) + trap_if(ptr + worst_case_size > len(opts.memory)) opts.memory[ptr+src_code_units : ptr+len(encoded)] = encoded[src_code_units : ] if worst_case_size > len(encoded): ptr = opts.realloc(ptr, worst_case_size, 1, len(encoded)) + trap_if(ptr + len(encoded) > len(opts.memory)) return (ptr, len(encoded)) ``` @@ -567,10 +578,14 @@ def store_utf8_to_utf16(opts, src, src_code_units): worst_case_size = 2 * src_code_units trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) ptr = opts.realloc(0, 0, 2, worst_case_size) + trap_if(ptr != align_to(ptr, 2)) + trap_if(ptr + worst_case_size > len(opts.memory)) encoded = src.encode('utf-16-le') opts.memory[ptr : ptr+len(encoded)] = encoded if len(encoded) < worst_case_size: ptr = opts.realloc(ptr, worst_case_size, 2, len(encoded)) + trap_if(ptr != align_to(ptr, 2)) + trap_if(ptr + len(encoded) > len(opts.memory)) code_units = int(len(encoded) / 2) return (ptr, code_units) ``` @@ -587,6 +602,8 @@ bytes): def store_string_to_latin1_or_utf16(opts, src, src_code_units): assert(src_code_units <= MAX_STRING_BYTE_LENGTH) ptr = opts.realloc(0, 0, 2, src_code_units) + trap_if(ptr != align_to(ptr, 2)) + trap_if(ptr + src_code_units > len(opts.memory)) dst_byte_length = 0 for usv in src: if ord(usv) < (1 << 8): @@ -596,6 +613,8 @@ def store_string_to_latin1_or_utf16(opts, src, src_code_units): worst_case_size = 2 * src_code_units trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) ptr = opts.realloc(ptr, src_code_units, 2, worst_case_size) + trap_if(ptr != align_to(ptr, 2)) + trap_if(ptr + worst_case_size > len(opts.memory)) for j in range(dst_byte_length-1, -1, -1): opts.memory[ptr + 2*j] = opts.memory[ptr + j] opts.memory[ptr + 2*j + 1] = 0 @@ -603,10 +622,14 @@ def 
store_string_to_latin1_or_utf16(opts, src, src_code_units): opts.memory[ptr+2*dst_byte_length : ptr+len(encoded)] = encoded[2*dst_byte_length : ] if worst_case_size > len(encoded): ptr = opts.realloc(ptr, worst_case_size, 2, len(encoded)) + trap_if(ptr != align_to(ptr, 2)) + trap_if(ptr + len(encoded) > len(opts.memory)) tagged_code_units = int(len(encoded) / 2) | UTF16_TAG return (ptr, tagged_code_units) if dst_byte_length < src_code_units: ptr = opts.realloc(ptr, src_code_units, 2, dst_byte_length) + trap_if(ptr != align_to(ptr, 2)) + trap_if(ptr + dst_byte_length > len(opts.memory)) return (ptr, dst_byte_length) ``` @@ -625,6 +648,8 @@ def store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units): src_byte_length = 2 * src_code_units trap_if(src_byte_length > MAX_STRING_BYTE_LENGTH) ptr = opts.realloc(0, 0, 2, src_byte_length) + trap_if(ptr != align_to(ptr, 2)) + trap_if(ptr + src_byte_length > len(opts.memory)) encoded = src.encode('utf-16-le') opts.memory[ptr : ptr+len(encoded)] = encoded if any(ord(c) >= (1 << 8) for c in src): @@ -634,6 +659,7 @@ def store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units): for i in range(latin1_size): opts.memory[ptr + i] = opts.memory[ptr + 2*i] ptr = opts.realloc(ptr, src_byte_length, 1, latin1_size) + trap_if(ptr + latin1_size > len(opts.memory)) return (ptr, latin1_size) ``` @@ -1046,6 +1072,7 @@ def lift(opts, max_flat, vi, ts): ptr = vi.next('i32') tuple_type = Tuple(ts) trap_if(ptr != align_to(ptr, alignment(tuple_type))) + trap_if(ptr + size(tuple_type) > len(opts.memory)) return list(load(opts, ptr, tuple_type).values()) else: return [ lift_flat(opts, vi, t) for t in ts ] @@ -1067,6 +1094,7 @@ def lower(opts, max_flat, vs, ts, out_param = None): else: ptr = out_param.next('i32') trap_if(ptr != align_to(ptr, alignment(tuple_type))) + trap_if(ptr + size(tuple_type) > len(opts.memory)) store(opts, tuple_value, tuple_type, ptr) return [ Value('i32', ptr) ] else: diff --git 
a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 3d09ef2..4afc72e 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -201,6 +201,7 @@ class Opts: def load(opts, ptr, t): assert(ptr == align_to(ptr, alignment(t))) + assert(ptr + size(t) <= len(opts.memory)) match despecialize(t): case Bool() : return convert_int_to_bool(load_int(opts, ptr, 1)) case U8() : return load_int(opts, ptr, 1) @@ -223,7 +224,6 @@ def load(opts, ptr, t): # def load_int(opts, ptr, nbytes, signed = False): - trap_if(ptr + nbytes > len(opts.memory)) return int.from_bytes(opts.memory[ptr : ptr+nbytes], 'little', signed=signed) # @@ -272,12 +272,15 @@ def load_string(opts, ptr): def load_string_from_range(opts, ptr, tagged_code_units): match opts.string_encoding: case 'utf8': + alignment = 1 byte_length = tagged_code_units encoding = 'utf-8' case 'utf16': + alignment = 2 byte_length = 2 * tagged_code_units encoding = 'utf-16-le' case 'latin1+utf16': + alignment = 2 if bool(tagged_code_units & UTF16_TAG): byte_length = 2 * (tagged_code_units ^ UTF16_TAG) encoding = 'utf-16-le' @@ -285,6 +288,7 @@ def load_string_from_range(opts, ptr, tagged_code_units): byte_length = tagged_code_units encoding = 'latin-1' + trap_if(ptr != align_to(ptr, alignment)) trap_if(ptr + byte_length > len(opts.memory)) try: s = opts.memory[ptr : ptr+byte_length].decode(encoding) @@ -358,6 +362,7 @@ def unpack_flags_from_int(i, labels): def store(opts, v, t, ptr): assert(ptr == align_to(ptr, alignment(t))) + assert(ptr + size(t) <= len(opts.memory)) match despecialize(t): case Bool() : store_int(opts, int(bool(v)), ptr, 1) case U8() : store_int(opts, v, ptr, 1) @@ -380,7 +385,6 @@ def store(opts, v, t, ptr): # def store_int(opts, v, ptr, nbytes, signed = False): - trap_if(ptr + nbytes > len(opts.memory)) opts.memory[ptr : ptr+nbytes] = int.to_bytes(v, nbytes, 'little', signed=signed) # @@ -447,6 +451,8 @@ def store_string_copy(opts, 
src, src_code_units, dst_code_unit_size, dst_alignme dst_byte_length = dst_code_unit_size * src_code_units trap_if(dst_byte_length > MAX_STRING_BYTE_LENGTH) ptr = opts.realloc(0, 0, dst_alignment, dst_byte_length) + trap_if(ptr != align_to(ptr, dst_alignment)) + trap_if(ptr + dst_byte_length > len(opts.memory)) encoded = src.encode(dst_encoding) assert(dst_byte_length == len(encoded)) opts.memory[ptr : ptr+len(encoded)] = encoded @@ -465,15 +471,18 @@ def store_latin1_to_utf8(opts, src, src_code_units): def store_string_to_utf8(opts, src, src_code_units, worst_case_size): assert(src_code_units <= MAX_STRING_BYTE_LENGTH) ptr = opts.realloc(0, 0, 1, src_code_units) + trap_if(ptr + src_code_units > len(opts.memory)) encoded = src.encode('utf-8') assert(src_code_units <= len(encoded)) opts.memory[ptr : ptr+src_code_units] = encoded[0 : src_code_units] if src_code_units < len(encoded): trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) ptr = opts.realloc(ptr, src_code_units, 1, worst_case_size) + trap_if(ptr + worst_case_size > len(opts.memory)) opts.memory[ptr+src_code_units : ptr+len(encoded)] = encoded[src_code_units : ] if worst_case_size > len(encoded): ptr = opts.realloc(ptr, worst_case_size, 1, len(encoded)) + trap_if(ptr + len(encoded) > len(opts.memory)) return (ptr, len(encoded)) # @@ -482,10 +491,14 @@ def store_utf8_to_utf16(opts, src, src_code_units): worst_case_size = 2 * src_code_units trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) ptr = opts.realloc(0, 0, 2, worst_case_size) + trap_if(ptr != align_to(ptr, 2)) + trap_if(ptr + worst_case_size > len(opts.memory)) encoded = src.encode('utf-16-le') opts.memory[ptr : ptr+len(encoded)] = encoded if len(encoded) < worst_case_size: ptr = opts.realloc(ptr, worst_case_size, 2, len(encoded)) + trap_if(ptr != align_to(ptr, 2)) + trap_if(ptr + len(encoded) > len(opts.memory)) code_units = int(len(encoded) / 2) return (ptr, code_units) @@ -494,6 +507,8 @@ def store_utf8_to_utf16(opts, src, src_code_units): def 
store_string_to_latin1_or_utf16(opts, src, src_code_units): assert(src_code_units <= MAX_STRING_BYTE_LENGTH) ptr = opts.realloc(0, 0, 2, src_code_units) + trap_if(ptr != align_to(ptr, 2)) + trap_if(ptr + src_code_units > len(opts.memory)) dst_byte_length = 0 for usv in src: if ord(usv) < (1 << 8): @@ -503,6 +518,8 @@ def store_string_to_latin1_or_utf16(opts, src, src_code_units): worst_case_size = 2 * src_code_units trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) ptr = opts.realloc(ptr, src_code_units, 2, worst_case_size) + trap_if(ptr != align_to(ptr, 2)) + trap_if(ptr + worst_case_size > len(opts.memory)) for j in range(dst_byte_length-1, -1, -1): opts.memory[ptr + 2*j] = opts.memory[ptr + j] opts.memory[ptr + 2*j + 1] = 0 @@ -510,10 +527,14 @@ def store_string_to_latin1_or_utf16(opts, src, src_code_units): opts.memory[ptr+2*dst_byte_length : ptr+len(encoded)] = encoded[2*dst_byte_length : ] if worst_case_size > len(encoded): ptr = opts.realloc(ptr, worst_case_size, 2, len(encoded)) + trap_if(ptr != align_to(ptr, 2)) + trap_if(ptr + len(encoded) > len(opts.memory)) tagged_code_units = int(len(encoded) / 2) | UTF16_TAG return (ptr, tagged_code_units) if dst_byte_length < src_code_units: ptr = opts.realloc(ptr, src_code_units, 2, dst_byte_length) + trap_if(ptr != align_to(ptr, 2)) + trap_if(ptr + dst_byte_length > len(opts.memory)) return (ptr, dst_byte_length) # @@ -522,6 +543,8 @@ def store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units): src_byte_length = 2 * src_code_units trap_if(src_byte_length > MAX_STRING_BYTE_LENGTH) ptr = opts.realloc(0, 0, 2, src_byte_length) + trap_if(ptr != align_to(ptr, 2)) + trap_if(ptr + src_byte_length > len(opts.memory)) encoded = src.encode('utf-16-le') opts.memory[ptr : ptr+len(encoded)] = encoded if any(ord(c) >= (1 << 8) for c in src): @@ -531,6 +554,7 @@ def store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units): for i in range(latin1_size): opts.memory[ptr + i] = opts.memory[ptr + 2*i] ptr = 
opts.realloc(ptr, src_byte_length, 1, latin1_size) + trap_if(ptr + latin1_size > len(opts.memory)) return (ptr, latin1_size) # @@ -840,6 +864,7 @@ def lift(opts, max_flat, vi, ts): ptr = vi.next('i32') tuple_type = Tuple(ts) trap_if(ptr != align_to(ptr, alignment(tuple_type))) + trap_if(ptr + size(tuple_type) > len(opts.memory)) return list(load(opts, ptr, tuple_type).values()) else: return [ lift_flat(opts, vi, t) for t in ts ] @@ -856,6 +881,7 @@ def lower(opts, max_flat, vs, ts, out_param = None): else: ptr = out_param.next('i32') trap_if(ptr != align_to(ptr, alignment(tuple_type))) + trap_if(ptr + size(tuple_type) > len(opts.memory)) store(opts, tuple_value, tuple_type, ptr) return [ Value('i32', ptr) ] else: From 7b65e73833493fac980255d46781e0ffa5424b32 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 2 Aug 2022 18:26:15 -0500 Subject: [PATCH 117/301] But no space before func opening paren to match convention --- design/mvp/CanonicalABI.md | 18 +++++++++--------- design/mvp/canonical-abi/definitions.py | 10 +++++----- design/mvp/canonical-abi/run_tests.py | 24 ++++++++++++------------ 3 files changed, 26 insertions(+), 26 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index b609f9e..7e3367f 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1432,14 +1432,14 @@ Function and value types are recursively mangled into [`wit`](WIT.md)-compatible syntax: ```python def mangle_funcname(name, ft): - return '{name}: func {params} -> {results}'.format( + return '{name}: func{params} -> {results}'.format( name = name, - params = mangle_funcvec(ft.params), - results = mangle_funcvec(ft.results)) + params = mangle_funcvec(ft.params, pre_space = False), + results = mangle_funcvec(ft.results, pre_space = True)) -def mangle_funcvec(es): +def mangle_funcvec(es, pre_space): if len(es) == 1 and isinstance(es[0], ValType): - return mangle_valtype(es[0]) + return (' ' if not pre_space else '') + mangle_valtype(es[0]) 
assert(all(type(e) == tuple and len(e) == 2 for e in es)) mangled_elems = (e[0] + ': ' + mangle_valtype(e[1]) for e in es) return '(' + ', '.join(mangled_elems) + ')' @@ -1508,19 +1508,19 @@ As an example, given a component type: (export "bar" (func (param "x" u32) (param "y" u32) (result u32))) )) (import "v1" (value string)) - (export "baz" (func (result string))) + (export "baz" (func (param string) (result string))) (export "v2" (value list>)) ) ``` the `canonical_module_type` would be: ```wasm (module - (import "" "foo: func () -> ()" (func)) - (import "" "a.bar: func (x: u32, y: u32) -> u32" (func param i32 i32) (result i32)) + (import "" "foo: func() -> ()" (func)) + (import "" "a.bar: func(x: u32, y: u32) -> u32" (func param i32 i32) (result i32)) (export "cabi_memory" (memory 0)) (export "cabi_realloc" (func (param i32 i32 i32 i32) (result i32))) (export "cabi_start{cabi=0.1}: func (v1: string) -> (v2: list>)" (func (param i32 i32) (result i32))) - (export "baz: func () -> string" (func (result i32))) + (export "baz: func string -> string" (func (param i32 i32) (result i32))) (export "cabi_post_baz" (func (param i32))) ) ``` diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 22abc3c..e2392ec 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -1079,14 +1079,14 @@ def mangle_instances(xs, path = ''): # def mangle_funcname(name, ft): - return '{name}: func {params} -> {results}'.format( + return '{name}: func{params} -> {results}'.format( name = name, - params = mangle_funcvec(ft.params), - results = mangle_funcvec(ft.results)) + params = mangle_funcvec(ft.params, pre_space = False), + results = mangle_funcvec(ft.results, pre_space = True)) -def mangle_funcvec(es): +def mangle_funcvec(es, pre_space): if len(es) == 1 and isinstance(es[0], ValType): - return mangle_valtype(es[0]) + return (' ' if not pre_space else '') + mangle_valtype(es[0]) assert(all(type(e) 
== tuple and len(e) == 2 for e in es)) mangled_elems = (e[0] + ': ' + mangle_valtype(e[1]) for e in es) return '(' + ', '.join(mangled_elems) + ')' diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index 41bf487..a695be6 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -375,11 +375,11 @@ def test_mangle_functype(params, results, expect): test_mangle_functype([U8()], [U8()], 'func u8 -> u8') test_mangle_functype([U8()], [], 'func u8 -> ()') -test_mangle_functype([], [U8()], 'func () -> u8') -test_mangle_functype([('x',U8())], [('y',U8())], 'func (x: u8) -> (y: u8)') +test_mangle_functype([], [U8()], 'func() -> u8') +test_mangle_functype([('x',U8())], [('y',U8())], 'func(x: u8) -> (y: u8)') test_mangle_functype([('a',Bool()),('b',U8()),('c',S16()),('d',U32()),('e',S64())], [('a',S8()),('b',U16()),('c',S32()),('d',U64())], - 'func (a: bool, b: u8, c: s16, d: u32, e: s64) -> (a: s8, b: u16, c: s32, d: u64)') + 'func(a: bool, b: u8, c: s16, d: u32, e: s64) -> (a: s8, b: u16, c: s32, d: u64)') test_mangle_functype([List(List(String()))], [], 'func list> -> ()') test_mangle_functype([Record([Field('x',Record([Field('y',String())])),Field('z',U32())])], [], @@ -393,7 +393,7 @@ def test_mangle_functype(params, results, expect): test_mangle_functype([Option(Bool())],[Option(List(U8()))], 'func option -> option>') test_mangle_functype([], [('a',Result(None,None)),('b',Result(U8(),None)),('c',Result(None,U8()))], - 'func () -> (a: result<_, _>, b: result, c: result<_, u8>)') + 'func() -> (a: result<_, _>, b: result, c: result<_, u8>)') def test_cabi(ct, expect): got = canonical_module_type(ct) @@ -411,7 +411,7 @@ def test_cabi(ct, expect): [CoreImportDecl('','a: func u8 -> u8', CoreFuncType(['i32'],['i32']))], [CoreExportDecl('cabi_memory', CoreMemoryType(0, None)), CoreExportDecl('cabi_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])), - CoreExportDecl('cabi_start{cabi=0.1}: 
func (b: string) -> (d: list)', + CoreExportDecl('cabi_start{cabi=0.1}: func(b: string) -> (d: list)', CoreFuncType(['i32','i32'],['i32'])), CoreExportDecl('c: func s8 -> s8', CoreFuncType(['i32'],['i32']))] ) @@ -431,9 +431,9 @@ def test_cabi(ct, expect): [CoreImportDecl('','a.b: func u8 -> u8', CoreFuncType(['i32'],['i32']))], [CoreExportDecl('cabi_memory', CoreMemoryType(0, None)), CoreExportDecl('cabi_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])), - CoreExportDecl('cabi_start{cabi=0.1}: func (a.c: float32) -> (d.f: float64)', + CoreExportDecl('cabi_start{cabi=0.1}: func(a.c: float32) -> (d.f: float64)', CoreFuncType(['f32'],['f64'])), - CoreExportDecl('d.e: func () -> list', CoreFuncType([],['i32'])), + CoreExportDecl('d.e: func() -> list', CoreFuncType([],['i32'])), CoreExportDecl('cabi_post_d.e', CoreFuncType(['i32'],[]))] ) ) @@ -444,17 +444,17 @@ def test_cabi(ct, expect): ExternDecl('bar', FuncType([('x', U32()),('y', U32())],[U32()])) ])), ExternDecl('v1', ValueType(String()))], - [ExternDecl('baz', FuncType([], [String()])), + [ExternDecl('baz', FuncType([String()], [String()])), ExternDecl('v2', ValueType(List(List(String()))))] ), ModuleType( - [CoreImportDecl('','foo: func () -> ()', CoreFuncType([],[])), - CoreImportDecl('','a.bar: func (x: u32, y: u32) -> u32', CoreFuncType(['i32','i32'],['i32']))], + [CoreImportDecl('','foo: func() -> ()', CoreFuncType([],[])), + CoreImportDecl('','a.bar: func(x: u32, y: u32) -> u32', CoreFuncType(['i32','i32'],['i32']))], [CoreExportDecl('cabi_memory', CoreMemoryType(0, None)), CoreExportDecl('cabi_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])), - CoreExportDecl('cabi_start{cabi=0.1}: func (v1: string) -> (v2: list>)', + CoreExportDecl('cabi_start{cabi=0.1}: func(v1: string) -> (v2: list>)', CoreFuncType(['i32','i32'],['i32'])), - CoreExportDecl('baz: func () -> string', CoreFuncType([],['i32'])), + CoreExportDecl('baz: func string -> string', CoreFuncType(['i32','i32'],['i32'])), 
CoreExportDecl('cabi_post_baz', CoreFuncType(['i32'],[]))] ) ) From cd75cf97f2fc3231eed8f62f3c2218d0fe732af1 Mon Sep 17 00:00:00 2001 From: Liam Murphy Date: Wed, 10 Aug 2022 10:15:15 +1000 Subject: [PATCH 118/301] Respect `MAX_FLAT_RESULTS` in Explainer.md I noticed a couple spots in Explainer.md where canonical-ABI functions return multiple values. This fixes those spots to return a pointer to those values instead, properly following the canonical ABI. --- design/mvp/Explainer.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index b5e720b..c383942 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -726,7 +726,7 @@ takes a string, does some logging, then returns a string. (import "libc" "memory" (memory 1)) (import "libc" "realloc" (func (param i32 i32) (result i32))) (import "wasi:logging" "log" (func $log (param i32 i32))) - (func (export "run") (param i32 i32) (result i32 i32) + (func (export "run") (param i32 i32) (result i32) ... (call $log) ... ) ) @@ -786,7 +786,7 @@ exported string at instantiation time: (core instance $libc (instantiate $Libc)) (core module $Main (import "libc" ...) - (func (export "start") (param i32 i32) (result i32 i32) + (func (export "start") (param i32 i32) (result i32) ... general-purpose compute ) ) From 7caeccbc49f1cda78ce6796a4846150b577fdbca Mon Sep 17 00:00:00 2001 From: Lucy Maya Menon Date: Mon, 15 Aug 2022 14:26:38 -0400 Subject: [PATCH 119/301] Fix example which used one-level imports in a core module type --- design/mvp/Explainer.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index c383942..8c6132d 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -420,14 +420,14 @@ module types always start with an empty type index space. 
(component $C (core type $C1 (module (type (func (param i32) (result i32))) - (import "a" (func (type 0))) - (export "b" (func (type 0))) + (import "a" "b" (func (type 0))) + (export "c" (func (type 0))) )) (core type $F (func (param i32) (result i32))) (core type $C2 (module (alias outer 1 $F (type)) - (import "a" (func (type 0))) - (export "b" (func (type 0))) + (import "a" "b" (func (type 0))) + (export "c" (func (type 0))) )) ) ``` From 21666e2e61c9569294c1d208523636c1a4a4aab0 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 15 Aug 2022 19:58:49 -0500 Subject: [PATCH 120/301] Add explicit result arity to start definitions Resolves #84 --- design/mvp/Binary.md | 5 +++-- design/mvp/Explainer.md | 5 +++-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 5602a13..af1672b 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -266,10 +266,11 @@ Notes: (See [Start Definitions](Explainer.md#start-definitions) in the explainer.) ``` -start ::= f: arg*:vec() => (start f (value arg)*) +start ::= f: arg*:vec() r: => (start f (value arg)* (result (value))ʳ) ``` Notes: -* Validation requires `f` have `functype` with `param` arity and types matching `arg*`. +* Validation requires `f` have `functype` with `param` arity and types matching `arg*` + and `result` arity `r`. * Validation appends the `result` types of `f` to the value index space (making them available for reference by subsequent definitions). diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index c383942..2ce8be8 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -758,13 +758,14 @@ instantiation. Unlike modules, components can call start functions at multiple points during instantiation with each such call having parameters and results. Thus, `start` definitions in components look like function calls: ``` -start ::= (start (value )* (result (value ))?) 
+start ::= (start (value )* (result (value ?))*) ``` The `(value )*` list specifies the arguments passed to `funcidx` by indexing into the *value index space*. Value definitions (in the value index space) are like immutable `global` definitions in Core WebAssembly except that validation requires them to be consumed exactly once at instantiation-time -(i.e., they are [linear]). +(i.e., they are [linear]). The arity and types of the two value lists are +validated to match the signature of `funcidx`. As with all definition sorts, values may be imported and exported by components. As an example value import: From 49ce648d22322719881ad9ed6b4337b4e8b672c6 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 16 Aug 2022 10:15:26 -0500 Subject: [PATCH 121/301] Tweak variant/result mangling to produce valid wit Resolves #86 --- design/mvp/canonical-abi/definitions.py | 15 +++++++++------ design/mvp/canonical-abi/run_tests.py | 4 ++-- 2 files changed, 11 insertions(+), 8 deletions(-) diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index e76ea0d..4701db7 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -1154,7 +1154,10 @@ def mangle_flags(labels): return 'flags { ' + ', '.join(labels) + ' }' def mangle_varianttype(cases): - mangled_cases = (c.label + '(' + mangle_maybevaltype(c.t) + ')' for c in cases) + mangled_cases = ('{label}{payload}'.format( + label = c.label, + payload = '' if c.t is None else '(' + mangle_valtype(c.t) + ')') + for c in cases) return 'variant { ' + ', '.join(mangled_cases) + ' }' def mangle_enumtype(labels): @@ -1167,12 +1170,12 @@ def mangle_optiontype(t): return 'option<' + mangle_valtype(t) + '>' def mangle_resulttype(ok, error): - return 'result<' + mangle_maybevaltype(ok) + ', ' + mangle_maybevaltype(error) + '>' + match (ok, error): + case (None, None) : return 'result' + case (None, _) : return 'result<_, ' + mangle_valtype(error) + '>' + case 
(_, None) : return 'result<' + mangle_valtype(ok) + '>' + case (_, _) : return 'result<' + mangle_valtype(ok) + ', ' + mangle_valtype(error) + '>' -def mangle_maybevaltype(t): - if t is None: - return '_' - return mangle_valtype(t) ## Lifting Canonical Modules diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index 9128ce4..ea482f2 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -397,11 +397,11 @@ def test_mangle_functype(params, results, expect): test_mangle_functype([Flags(['a','b'])], [Enum(['a','b'])], 'func flags { a, b } -> enum { a, b }') test_mangle_functype([Variant([Case('a',None),Case('b',U8())])], [Union([U8(),List(String())])], - 'func variant { a(_), b(u8) } -> union { u8, list }') + 'func variant { a, b(u8) } -> union { u8, list }') test_mangle_functype([Option(Bool())],[Option(List(U8()))], 'func option -> option>') test_mangle_functype([], [('a',Result(None,None)),('b',Result(U8(),None)),('c',Result(None,U8()))], - 'func() -> (a: result<_, _>, b: result, c: result<_, u8>)') + 'func() -> (a: result, b: result, c: result<_, u8>)') def test_cabi(ct, expect): got = canonical_module_type(ct) From 0591a1c0b75a6683314c394b528a126e2b6d1297 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 16 Aug 2022 15:49:13 -0500 Subject: [PATCH 122/301] Sync CanonicalABI.md --- design/mvp/CanonicalABI.md | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 6fbe734..b8d4ee4 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1509,7 +1509,10 @@ def mangle_flags(labels): return 'flags { ' + ', '.join(labels) + ' }' def mangle_varianttype(cases): - mangled_cases = (c.label + '(' + mangle_maybevaltype(c.t) + ')' for c in cases) + mangled_cases = ('{label}{payload}'.format( + label = c.label, + payload = '' if c.t is None else '(' + mangle_valtype(c.t) + ')') + for c 
in cases) return 'variant { ' + ', '.join(mangled_cases) + ' }' def mangle_enumtype(labels): @@ -1522,12 +1525,11 @@ def mangle_optiontype(t): return 'option<' + mangle_valtype(t) + '>' def mangle_resulttype(ok, error): - return 'result<' + mangle_maybevaltype(ok) + ', ' + mangle_maybevaltype(error) + '>' - -def mangle_maybevaltype(t): - if t is None: - return '_' - return mangle_valtype(t) + match (ok, error): + case (None, None) : return 'result' + case (None, _) : return 'result<_, ' + mangle_valtype(error) + '>' + case (_, None) : return 'result<' + mangle_valtype(ok) + '>' + case (_, _) : return 'result<' + mangle_valtype(ok) + ', ' + mangle_valtype(error) + '>' ``` As an example, given a component type: ```wasm From da976796ebb36d6c88a0d9eafd8356e2d6d48597 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 16 Aug 2022 16:09:32 -0500 Subject: [PATCH 123/301] Fix inline aliases used in 'memory' canonopt Resolves #90 --- design/mvp/Explainer.md | 4 ++-- design/mvp/examples/SharedEverythingDynamicLinking.md | 10 +++++----- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 8c6132d..ce06b4a 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -736,7 +736,7 @@ takes a string, does some logging, then returns a string. 
)) (func $run (param string) (result string) (canon lift (core func $main "run") - (memory $libc "mem") (realloc (func $libc "realloc")) + (memory (core memory $libc "mem")) (realloc (func $libc "realloc")) )) (export "run" (func $run)) ) @@ -793,7 +793,7 @@ exported string at instantiation time: (core instance $main (instantiate $Main (with "libc" (instance $libc)))) (func $start (param string) (result string) (canon lift (core func $main "start") - (memory $libc "mem") (realloc (func $libc "realloc")) + (memory (core memory $libc "mem")) (realloc (func $libc "realloc")) )) (start $start (value $name) (result (value $greeting))) (export "greeting" (value $greeting)) diff --git a/design/mvp/examples/SharedEverythingDynamicLinking.md b/design/mvp/examples/SharedEverythingDynamicLinking.md index 552ee5e..d111e6e 100644 --- a/design/mvp/examples/SharedEverythingDynamicLinking.md +++ b/design/mvp/examples/SharedEverythingDynamicLinking.md @@ -159,7 +159,7 @@ would look like: )) (func $zip (param (list u8)) (result (list u8)) (canon lift (func $main "zip") - (memory $libc "memory") (realloc (func $libc "realloc")) + (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) (export "zip" (func $zip)) ) @@ -238,7 +238,7 @@ component-aware `clang`, the resulting component would look like: )) (func $transform (param (list u8)) (result (list u8)) (canon lift (func $main "transform") - (memory $libc "memory") (realloc (func $libc "realloc")) + (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) (export "transform" (func $transform)) ) @@ -285,11 +285,11 @@ components. 
The resulting component could look like: (instance $libc (instantiate (module $Libc))) (func $zip (canon lower (func $zipper "zip") - (memory $libc "memory") (realloc (func $libc "realloc")) + (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) (func $transform (canon lower (func $imgmgk "transform") - (memory $libc "memory") (realloc (func $libc "realloc")) + (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) (instance $main (instantiate (module $Main) (with "libc" (instance $libc)) @@ -298,7 +298,7 @@ components. The resulting component could look like: )) (func $run (param string) (result string) (canon lift (func $main "run") - (memory $libc "memory") (realloc (func $libc "realloc")) + (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) (export "run" (func $run)) ) From 6d5d8c0081f169b093f8570f9adb7d0bbc4bc1ae Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 16 Aug 2022 19:07:37 -0500 Subject: [PATCH 124/301] Add a bunch of missing 'core' prefixes to SharedEverythingDynamicLinking.md --- .../SharedEverythingDynamicLinking.md | 62 +++++++++---------- 1 file changed, 31 insertions(+), 31 deletions(-) diff --git a/design/mvp/examples/SharedEverythingDynamicLinking.md b/design/mvp/examples/SharedEverythingDynamicLinking.md index d111e6e..d0a9264 100644 --- a/design/mvp/examples/SharedEverythingDynamicLinking.md +++ b/design/mvp/examples/SharedEverythingDynamicLinking.md @@ -129,17 +129,17 @@ would look like: ```wasm ;; zipper.wat (component - (import "libc" (module $Libc + (import "libc" (core module $Libc (export "memory" (memory 1)) (export "malloc" (func (param i32) (result i32))) )) - (import "libzip" (module $Libzip + (import "libzip" (core module $Libzip (import "libc" "memory" (memory 1)) (import "libc" "malloc" (func (param i32) (result i32))) (export "zip" (func (param i32 i32 i32) (result i32))) )) - (module $Main + (core module $Main (import "libc" "memory" (memory 1)) (import "libc" 
"malloc" (func (param i32) (result i32))) (import "libzip" "zip" (func (param i32 i32 i32) (result i32))) @@ -149,16 +149,16 @@ would look like: ) ) - (instance $libc (instantiate (module $Libc))) - (instance $libzip (instantiate (module $Libzip)) + (core instance $libc (instantiate (module $Libc))) + (core instance $libzip (instantiate (module $Libzip)) (with "libc" (instance $libc)) )) - (instance $main (instantiate (module $Main) + (core instance $main (instantiate (module $Main) (with "libc" (instance $libc)) (with "libzip" (instance $libzip)) )) (func $zip (param (list u8)) (result (list u8)) (canon lift - (func $main "zip") + (core func $main "zip") (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) (export "zip" (func $zip)) @@ -210,11 +210,11 @@ component-aware `clang`, the resulting component would look like: ```wasm ;; imgmgk.wat (component $Imgmgk - (import "libc" (module $Libc ...)) - (import "libzip" (module $Libzip ...)) - (import "libimg" (module $Libimg ...)) + (import "libc" (core module $Libc ...)) + (import "libzip" (core module $Libzip ...)) + (import "libimg" (core module $Libimg ...)) - (module $Main + (core module $Main (import "libc" "memory" (memory 1)) (import "libc" "malloc" (func (param i32) (result i32))) (import "libimg" "compress" (func (param i32 i32 i32) (result i32))) @@ -224,20 +224,20 @@ component-aware `clang`, the resulting component would look like: ) ) - (instance $libc (instantiate (module $Libc))) - (instance $libzip (instantiate (module $Libzip) + (core instance $libc (instantiate (module $Libc))) + (core instance $libzip (instantiate (module $Libzip) (with "libc" (instance $libc)) )) - (instance $libimg (instantiate (module $Libimg) + (core instance $libimg (instantiate (module $Libimg) (with "libc" (instance $libc)) (with "libzip" (instance $libzip)) )) - (instance $main (instantiate (module $Main) + (core instance $main (instantiate (module $Main) (with "libc" (instance $libc)) (with "libimg" 
(instance $libimg)) )) (func $transform (param (list u8)) (result (list u8)) (canon lift - (func $main "transform") + (core func $main "transform") (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) (export "transform" (func $transform)) @@ -254,14 +254,14 @@ components. The resulting component could look like: ```wasm ;; app.wat (component - (import "libc" (module $Libc ...)) - (import "libzip" (module $Libzip ...)) - (import "libimg" (module $Libimg ...)) + (import "libc" (core module $Libc ...)) + (import "libzip" (core module $Libzip ...)) + (import "libimg" (core module $Libimg ...)) (import "zipper" (component $Zipper ...)) (import "imgmgk" (component $Imgmgk ...)) - (module $Main + (core module $Main (import "libc" "memory" (memory 1)) (import "libc" "malloc" (func (param i32) (result i32))) (import "zipper" "zip" (func (param i32 i32) (result i32 i32))) @@ -282,22 +282,22 @@ components. The resulting component could look like: (with "libimg" (module $Libimg)) )) - (instance $libc (instantiate (module $Libc))) - (func $zip (canon lower + (core instance $libc (instantiate (module $Libc))) + (core func $zip (canon lower (func $zipper "zip") (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) - (func $transform (canon lower + (core func $transform (canon lower (func $imgmgk "transform") (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) - (instance $main (instantiate (module $Main) + (core instance $main (instantiate (module $Main) (with "libc" (instance $libc)) (with "zipper" (instance (export "zip" (func $zipper "zip")))) (with "imgmgk" (instance (export "transform" (func $imgmgk "transform")))) )) (func $run (param string) (result string) (canon lift - (func $main "run") + (core func $main "run") (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) (export "run" (func $run)) @@ -358,17 +358,17 @@ a wrapper adapter module that supplies both `$A` and `$B` with a shared 
function table and `bar-index` mutable global. ```wat (component - (import "A" (module $A ...)) - (import "B" (module $B ...)) - (module $Linkage + (import "A" (core module $A ...)) + (import "B" (core module $B ...)) + (core module $Linkage (global (export "bar-index") (mut i32)) (table (export "table") funcref 1) ) - (instance $linkage (instantiate (module $Linkage))) - (instance $a (instantiate (module $A) + (core instance $linkage (instantiate (module $Linkage))) + (core instance $a (instantiate (module $A) (with "linkage" (instance $linkage)) )) - (instance $b (instantiate (module $B) + (core instance $b (instantiate (module $B) (import "a" (instance $a)) (with "linkage" (instance $linkage)) )) From 00b5d7ae8fa644a8f42d1d664091eae25abf33b0 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 16 Aug 2022 19:02:30 -0500 Subject: [PATCH 125/301] Remove component-level core aliases Resolves #89 --- design/mvp/Binary.md | 66 +++++++++++++++---------------- design/mvp/Explainer.md | 86 ++++++++++++++++++++--------------------- 2 files changed, 72 insertions(+), 80 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 5602a13..43b1323 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -25,16 +25,15 @@ layer ::= 0x01 0x00 section ::= section_0() => ϵ | m*:section_1() => [core-prefix(m)] | i*:section_2(vec()) => core-prefix(i)* - | a*:section_3(vec()) => core-prefix(a)* - | t*:section_4(vec()) => core-prefix(t)* - | c: section_5() => [c] - | i*:section_6(vec()) => i* - | a*:section_7(vec()) => a* - | t*:section_8(vec()) => t* - | c*:section_9(vec()) => c* - | s: section_10() => [s] - | i*:section_11(vec()) => i* - | e*:section_12(vec()) => e* + | t*:section_3(vec()) => core-prefix(t)* + | c: section_4() => [c] + | i*:section_5(vec()) => i* + | a*:section_6(vec()) => a* + | t*:section_7(vec()) => t* + | c*:section_8(vec()) => c* + | s: section_9() => [s] + | i*:section_10(vec()) => i* + | e*:section_11(vec()) => e* ``` Notes: * Reused 
Core binary rules: [`core:section`], [`core:custom`], [`core:module`] @@ -100,13 +99,10 @@ Notes: (See [Alias Definitions](Explainer.md#alias-definitions) in the explainer.) ``` -core:alias ::= sort: target: => (core alias target (sort)) -core:aliastarget ::= 0x00 i: n: => export i n - | 0x01 ct: idx: => outer ct idx - -alias ::= sort: target: => (alias target (sort)) -aliastarget ::= 0x00 i: n: => export i n - | 0x01 ct: idx: => outer ct idx +alias ::= s: t: => (alias t (s)) +aliastarget ::= 0x00 i: n: => export i n + | 0x01 i: n: => core export i n + | 0x02 ct: idx: => outer ct idx ``` Notes: * Reused Core binary rules: (variable-length encoded) [`core:u32`] @@ -116,10 +112,7 @@ Notes: of enclosing components and `i` is validated to be a valid index in the `sort` index space of the `i`th enclosing component (counting outward, starting with `0` referring to the current component). -* For `outer` aliases of `core:aliastarget`, validation restricts the `sort` to - `type` and `ct` must be `0` (for a component-level definition; see also the - `core:alias` case of `core:moduledecl` below). -* For `outer` aliases of `aliastarget`, validation restricts the `sort` to one +* For `outer` aliases, validation restricts the `sort` to one of `type`, `module` or `component`. @@ -127,18 +120,20 @@ Notes: (See [Type Definitions](Explainer.md#type-definitions) in the explainer.) 
``` -core:type ::= dt: => (type dt) (GC proposal) -core:deftype ::= ft: => ft (WebAssembly 1.0) - | st: => st (GC proposal) - | at: => at (GC proposal) - | mt: => mt -core:moduletype ::= 0x50 md*:vec() => (module md*) -core:moduledecl ::= 0x00 i: => i - | 0x01 t: => t - | 0x02 a: => a - | 0x03 e: => e -core:importdecl ::= i: => i -core:exportdecl ::= n: d: => (export n d) +core:type ::= dt: => (type dt) (GC proposal) +core:deftype ::= ft: => ft (WebAssembly 1.0) + | st: => st (GC proposal) + | at: => at (GC proposal) + | mt: => mt +core:moduletype ::= 0x50 md*:vec() => (module md*) +core:moduledecl ::= 0x00 i: => i + | 0x01 t: => t + | 0x02 a: => a + | 0x03 e: => e +core:alias ::= s: t: => (alias t (s)) +core:aliastarget ::= 0x01 ct: idx: => outer ct idx +core:importdecl ::= i: => i +core:exportdecl ::= n: d: => (export n d) ``` Notes: * Reused Core binary rules: [`core:import`], [`core:importdesc`], [`core:functype`] @@ -146,8 +141,9 @@ Notes: inside `type` declarators (i.e., nested core module types). * As described in the explainer, each module type is validated with an initially-empty type index space. -* Validation of `alias` declarators only allows `outer` `type` aliases. - Validation of these aliases cannot see beyond the enclosing core type index +* `alias` declarators currently only allow `outer` `type` aliases but + would add `export` aliases when core wasm adds type exports. +* Validation of `outer` aliases cannot see beyond the enclosing core type index space. Since core modules and core module types cannot nest in the MVP, this means that the maximum `ct` in an MVP `alias` declarator is `1`. diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index ce06b4a..5c8ab18 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -53,7 +53,6 @@ At the top-level, a `component` is a sequence of definitions of various kinds: component ::= (component ? 
*) definition ::= core-prefix() | core-prefix() - | core-prefix() | core-prefix() | | @@ -221,23 +220,17 @@ and export component functions. Alias definitions project definitions out of other components' index spaces and into the current component's index spaces. As represented in the AST below, -there are two kinds of "targets" for an alias: the `export` of an instance and -a definition in an index space of an `outer` component (containing the current -component): +there are three kinds of "targets" for an alias: the `export` of a component +instance, the `core export` of a core module instance and a definition of an +`outer` component (containing the current component): ``` -core:alias ::= (alias ( ?)) -core:aliastarget ::= export - | outer - alias ::= (alias ( ?)) aliastarget ::= export + | core export | outer ``` -The `core:sort`/`sort` immediate of the alias specifies which index space in -the target component is being read from and which index space of the containing -component is being added to. If present, the `id` of the alias is bound to the -new index added by the alias and can be used anywhere a normal `id` can be -used. +If present, the `id` of the alias is bound to the new index added by the alias +and can be used anywhere a normal `id` can be used. In the case of `export` aliases, validation ensures `name` is an export in the target instance and has a matching sort. @@ -249,14 +242,6 @@ In particular, the first `u32` can be `0`, in which case the outer alias refers to the current component. To maintain the acyclicity of module instantiation, outer aliases are only allowed to refer to *preceding* outer definitions. -As with other core definitions, core aliases are only supposed to "see" other -core definitions (as-if they were defined by Core WebAssembly extended with -[module-linking]). Thus, core `outer` aliases must have a skip-count of `0` -when defined within a component, only allowing them to duplicate core -definitions in core index spaces. 
(Core `outer` aliases have a second use -described in the next section, which is why they are included in the grammar -at all.) - Components containing outer aliases effectively produce a [closure] at instantiation time, including a copy of the outer-aliased definitions. Because of the prevalent assumption that components are immutable values, outer aliases @@ -268,10 +253,18 @@ via some kind of "`stateful`" type attribute.) Both kinds of aliases come with syntactic sugar for implicitly declaring them inline: -For `export` aliases, the inline sugar has the form `( +)` -and can be used in place of a `sortidx` or any sort-specific index (such as a -`typeidx` or `funcidx`). For example, the following snippet uses two inline -function aliases: +For `export` aliases, the inline sugar extends the definition of `sortidx` +and the various sort-specific indices: +``` +sortidx ::= ( ) ;; as above + | +Xidx ::= ;; as above + | +inlinealias ::= ( +) +``` +If `` refers to a ``, then the `` of `inlinealias` is a +``; otherwise it's an ``. For example, the +following snippet uses two inline function aliases: ```wasm (instance $j (instantiate $J (with "f" (func $i "f")))) (export "x" (func $j "g" "h")) @@ -310,9 +303,10 @@ is desugared into: Lastly, for symmetry with [imports][func-import-abbrev], aliases can be written in an inverted form that puts the sort first: ```wasm -(func $f (import "i" "f") ...type...) ≡ (import "i" "f" (func $f ...type...)) (WebAssembly 1.0) -(func $g (alias export $i "g1")) ≡ (alias export $i "g1" (func $g)) -(core func $g (alias export $i "g1")) ≡ (core alias export $i "g1" (func $g)) + (func $f (import "i" "f") ...type...) 
≡ (import "i" "f" (func $f ...type...)) (WebAssembly 1.0) + (func $f (alias export $i "f")) ≡ (alias export $i "f" (func $f)) + (core module $m (alias export $i "m")) ≡ (alias export $i "m" (core module $m)) +(core func $f (alias core export $i "f")) ≡ (alias core export $i "f" (core func $f)) ``` With what's defined so far, we're able to link modules with arbitrary renamings: @@ -328,17 +322,17 @@ With what's defined so far, we're able to link modules with arbitrary renamings: ) (core instance $a (instantiate $A)) (core instance $b1 (instantiate $B - (with "a" (instance $a)) ;; no renaming + (with "a" (instance $a)) ;; no renaming )) - (core func $a_two (alias export $a "two")) ;; ≡ (core alias export $a "two" (func $a_two)) + (core func $a_two (alias core export $a "two") ;; ≡ (alias core export $a "two" (core func $a_two)) (core instance $b2 (instantiate $B (with "a" (instance - (export "one" (func $a_two)) ;; renaming, using out-of-line alias + (export "one" (func $a_two)) ;; renaming, using out-of-line alias )) )) (core instance $b3 (instantiate $B (with "a" (instance - (export "one" (func $a "three")) ;; renaming, using inline alias sugar + (export "one" (func $a "three")) ;; renaming, using )) )) ) @@ -352,19 +346,21 @@ type and function definitions which are introduced in the next two sections. The syntax for defining core types extends the existing core type definition syntax, adding a `module` type constructor: ``` -core:type ::= (type ? ) (GC proposal) -core:deftype ::= (WebAssembly 1.0) - | (GC proposal) - | (GC proposal) - | -core:moduletype ::= (module *) -core:moduledecl ::= - | - | - | -core:importdecl ::= (import ) -core:exportdecl ::= (export ) -core:exportdesc ::= strip-id() +core:type ::= (type ? 
) (GC proposal) +core:deftype ::= (WebAssembly 1.0) + | (GC proposal) + | (GC proposal) + | +core:moduletype ::= (module *) +core:moduledecl ::= + | + | + | +core:alias ::= (alias ( ?)) +core:aliastarget ::= outer +core:importdecl ::= (import ) +core:exportdecl ::= (export ) +core:exportdesc ::= strip-id() where strip-id(X) parses '(' sort Y ')' when X parses '(' sort ? Y ')' ``` From 08f2ea6c5e8dc9ef46349574b0f84a2ff00021e3 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 17 Aug 2022 11:58:48 -0500 Subject: [PATCH 126/301] Add JS API notes about variants Resolve #25 --- design/mvp/Explainer.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 8c6132d..3053b11 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -967,7 +967,7 @@ At a high level, the additional coercions would be: | `float32`, `float64` | as a Number, mapping the canonical NaN to [JS NaN] | `ToNumber` mapping [JS NaN] to the canonical NaN | | `char` | same as [`USVString`] | same as [`USVString`], throw if the USV length is not 1 | | `record` | TBD: maybe a [JS Record]? | same as [`dictionary`] | -| `variant` | TBD | TBD | +| `variant` | see below | see below | | `list` | create a typed array copy for number types; otherwise produce a JS array (like [`sequence`]) | same as [`sequence`] | | `string` | same as [`USVString`] | same as [`USVString`] | | `tuple` | TBD: maybe a [JS Tuple]? | TBD | @@ -985,6 +985,14 @@ Notes: the return value is specified by `ToJSValue` above. Otherwise, the function result is wrapped into a JS object whose field names are taken from the result names and whose field values are specified by `ToJSValue` above. +* In lieu of an existing standard JS representation for `variant`, the JS API + would need to define its own custom binding built from objects. 
As a sketch, + the JS values accepted by `(variant (case "a" u32) (case "b" string))` could + include `{ a: 42 }` and `{ b: "hi" }`. +* For `union` and `option`, when Web IDL doesn't support particular type + combinations (e.g., `(option (option u32))`), the JS API would fall back to + the JS API of the unspecialized `variant` (e.g., + `(variant (case "some" (variant (case "some" u32) (case "none"))) (case "none"))`). * The forthcoming addition of [resource and handle types] would additionally allow coercion to and from the remaining Symbol and Object JavaScript value types. From 19167157b9b8a104fac05a3acdabb7c04653e61d Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 16 Aug 2022 16:09:32 -0500 Subject: [PATCH 127/301] Fix inline aliases used in 'memory' canonopt --- design/mvp/Explainer.md | 4 ++-- design/mvp/examples/SharedEverythingDynamicLinking.md | 10 +++++----- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 8c6132d..ce06b4a 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -736,7 +736,7 @@ takes a string, does some logging, then returns a string. 
)) (func $run (param string) (result string) (canon lift (core func $main "run") - (memory $libc "mem") (realloc (func $libc "realloc")) + (memory (core memory $libc "mem")) (realloc (func $libc "realloc")) )) (export "run" (func $run)) ) @@ -793,7 +793,7 @@ exported string at instantiation time: (core instance $main (instantiate $Main (with "libc" (instance $libc)))) (func $start (param string) (result string) (canon lift (core func $main "start") - (memory $libc "mem") (realloc (func $libc "realloc")) + (memory (core memory $libc "mem")) (realloc (func $libc "realloc")) )) (start $start (value $name) (result (value $greeting))) (export "greeting" (value $greeting)) diff --git a/design/mvp/examples/SharedEverythingDynamicLinking.md b/design/mvp/examples/SharedEverythingDynamicLinking.md index 552ee5e..d111e6e 100644 --- a/design/mvp/examples/SharedEverythingDynamicLinking.md +++ b/design/mvp/examples/SharedEverythingDynamicLinking.md @@ -159,7 +159,7 @@ would look like: )) (func $zip (param (list u8)) (result (list u8)) (canon lift (func $main "zip") - (memory $libc "memory") (realloc (func $libc "realloc")) + (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) (export "zip" (func $zip)) ) @@ -238,7 +238,7 @@ component-aware `clang`, the resulting component would look like: )) (func $transform (param (list u8)) (result (list u8)) (canon lift (func $main "transform") - (memory $libc "memory") (realloc (func $libc "realloc")) + (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) (export "transform" (func $transform)) ) @@ -285,11 +285,11 @@ components. 
The resulting component could look like: (instance $libc (instantiate (module $Libc))) (func $zip (canon lower (func $zipper "zip") - (memory $libc "memory") (realloc (func $libc "realloc")) + (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) (func $transform (canon lower (func $imgmgk "transform") - (memory $libc "memory") (realloc (func $libc "realloc")) + (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) (instance $main (instantiate (module $Main) (with "libc" (instance $libc)) @@ -298,7 +298,7 @@ components. The resulting component could look like: )) (func $run (param string) (result string) (canon lift (func $main "run") - (memory $libc "memory") (realloc (func $libc "realloc")) + (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) (export "run" (func $run)) ) From 4a098676ade5eb131f6f6bb65432242cb86be71a Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 16 Aug 2022 19:07:37 -0500 Subject: [PATCH 128/301] Add a bunch of missing 'core' prefixes to SharedEverythingDynamicLinking.md --- .../SharedEverythingDynamicLinking.md | 62 +++++++++---------- 1 file changed, 31 insertions(+), 31 deletions(-) diff --git a/design/mvp/examples/SharedEverythingDynamicLinking.md b/design/mvp/examples/SharedEverythingDynamicLinking.md index d111e6e..d0a9264 100644 --- a/design/mvp/examples/SharedEverythingDynamicLinking.md +++ b/design/mvp/examples/SharedEverythingDynamicLinking.md @@ -129,17 +129,17 @@ would look like: ```wasm ;; zipper.wat (component - (import "libc" (module $Libc + (import "libc" (core module $Libc (export "memory" (memory 1)) (export "malloc" (func (param i32) (result i32))) )) - (import "libzip" (module $Libzip + (import "libzip" (core module $Libzip (import "libc" "memory" (memory 1)) (import "libc" "malloc" (func (param i32) (result i32))) (export "zip" (func (param i32 i32 i32) (result i32))) )) - (module $Main + (core module $Main (import "libc" "memory" (memory 1)) (import "libc" 
"malloc" (func (param i32) (result i32))) (import "libzip" "zip" (func (param i32 i32 i32) (result i32))) @@ -149,16 +149,16 @@ would look like: ) ) - (instance $libc (instantiate (module $Libc))) - (instance $libzip (instantiate (module $Libzip)) + (core instance $libc (instantiate (module $Libc))) + (core instance $libzip (instantiate (module $Libzip)) (with "libc" (instance $libc)) )) - (instance $main (instantiate (module $Main) + (core instance $main (instantiate (module $Main) (with "libc" (instance $libc)) (with "libzip" (instance $libzip)) )) (func $zip (param (list u8)) (result (list u8)) (canon lift - (func $main "zip") + (core func $main "zip") (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) (export "zip" (func $zip)) @@ -210,11 +210,11 @@ component-aware `clang`, the resulting component would look like: ```wasm ;; imgmgk.wat (component $Imgmgk - (import "libc" (module $Libc ...)) - (import "libzip" (module $Libzip ...)) - (import "libimg" (module $Libimg ...)) + (import "libc" (core module $Libc ...)) + (import "libzip" (core module $Libzip ...)) + (import "libimg" (core module $Libimg ...)) - (module $Main + (core module $Main (import "libc" "memory" (memory 1)) (import "libc" "malloc" (func (param i32) (result i32))) (import "libimg" "compress" (func (param i32 i32 i32) (result i32))) @@ -224,20 +224,20 @@ component-aware `clang`, the resulting component would look like: ) ) - (instance $libc (instantiate (module $Libc))) - (instance $libzip (instantiate (module $Libzip) + (core instance $libc (instantiate (module $Libc))) + (core instance $libzip (instantiate (module $Libzip) (with "libc" (instance $libc)) )) - (instance $libimg (instantiate (module $Libimg) + (core instance $libimg (instantiate (module $Libimg) (with "libc" (instance $libc)) (with "libzip" (instance $libzip)) )) - (instance $main (instantiate (module $Main) + (core instance $main (instantiate (module $Main) (with "libc" (instance $libc)) (with "libimg" 
(instance $libimg)) )) (func $transform (param (list u8)) (result (list u8)) (canon lift - (func $main "transform") + (core func $main "transform") (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) (export "transform" (func $transform)) @@ -254,14 +254,14 @@ components. The resulting component could look like: ```wasm ;; app.wat (component - (import "libc" (module $Libc ...)) - (import "libzip" (module $Libzip ...)) - (import "libimg" (module $Libimg ...)) + (import "libc" (core module $Libc ...)) + (import "libzip" (core module $Libzip ...)) + (import "libimg" (core module $Libimg ...)) (import "zipper" (component $Zipper ...)) (import "imgmgk" (component $Imgmgk ...)) - (module $Main + (core module $Main (import "libc" "memory" (memory 1)) (import "libc" "malloc" (func (param i32) (result i32))) (import "zipper" "zip" (func (param i32 i32) (result i32 i32))) @@ -282,22 +282,22 @@ components. The resulting component could look like: (with "libimg" (module $Libimg)) )) - (instance $libc (instantiate (module $Libc))) - (func $zip (canon lower + (core instance $libc (instantiate (module $Libc))) + (core func $zip (canon lower (func $zipper "zip") (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) - (func $transform (canon lower + (core func $transform (canon lower (func $imgmgk "transform") (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) - (instance $main (instantiate (module $Main) + (core instance $main (instantiate (module $Main) (with "libc" (instance $libc)) (with "zipper" (instance (export "zip" (func $zipper "zip")))) (with "imgmgk" (instance (export "transform" (func $imgmgk "transform")))) )) (func $run (param string) (result string) (canon lift - (func $main "run") + (core func $main "run") (memory (core memory $libc "memory")) (realloc (func $libc "realloc")) )) (export "run" (func $run)) @@ -358,17 +358,17 @@ a wrapper adapter module that supplies both `$A` and `$B` with a shared 
function table and `bar-index` mutable global. ```wat (component - (import "A" (module $A ...)) - (import "B" (module $B ...)) - (module $Linkage + (import "A" (core module $A ...)) + (import "B" (core module $B ...)) + (core module $Linkage (global (export "bar-index") (mut i32)) (table (export "table") funcref 1) ) - (instance $linkage (instantiate (module $Linkage))) - (instance $a (instantiate (module $A) + (core instance $linkage (instantiate (module $Linkage))) + (core instance $a (instantiate (module $A) (with "linkage" (instance $linkage)) )) - (instance $b (instantiate (module $B) + (core instance $b (instantiate (module $B) (import "a" (instance $a)) (with "linkage" (instance $linkage)) )) From 7339630b272ebb74c0ce828eb05d832dd042fcf5 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 17 Aug 2022 19:54:34 -0500 Subject: [PATCH 129/301] Tweak variant JS value example to be more like TypeScript/wit-bindgen --- design/mvp/Explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 3053b11..e443b6b 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -988,7 +988,7 @@ Notes: * In lieu of an existing standard JS representation for `variant`, the JS API would need to define its own custom binding built from objects. As a sketch, the JS values accepted by `(variant (case "a" u32) (case "b" string))` could - include `{ a: 42 }` and `{ b: "hi" }`. + include `{ tag: 'a', value: 42 }` and `{ tag: 'b', value: "hi" }`. 
* For `union` and `option`, when Web IDL doesn't support particular type combinations (e.g., `(option (option u32))`), the JS API would fall back to the JS API of the unspecialized `variant` (e.g., From 05bc3d6151173cd952657c45d559d10340b71028 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 17 Aug 2022 19:56:38 -0500 Subject: [PATCH 130/301] Only despecialize problematic options/unions --- design/mvp/Explainer.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index e443b6b..08fd79d 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -992,7 +992,8 @@ Notes: * For `union` and `option`, when Web IDL doesn't support particular type combinations (e.g., `(option (option u32))`), the JS API would fall back to the JS API of the unspecialized `variant` (e.g., - `(variant (case "some" (variant (case "some" u32) (case "none"))) (case "none"))`). + `(variant (case "some" (option u32)) (case "none"))`, despecializing only + the problematic outer `option`). * The forthcoming addition of [resource and handle types] would additionally allow coercion to and from the remaining Symbol and Object JavaScript value types. From 11604e2ae7dd7389c926784995264487591559f6 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 23 Aug 2022 18:33:22 -0500 Subject: [PATCH 131/301] Add whitespace to reduce noise in next patch --- design/mvp/Binary.md | 110 +++++++++++++++++++++---------------------- 1 file changed, 55 insertions(+), 55 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index e08e94c..78f6a2c 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -148,61 +148,61 @@ Notes: means that the maximum `ct` in an MVP `alias` declarator is `1`. 
``` -type ::= dt: => (type dt) -deftype ::= dvt: => dvt - | ft: => ft - | ct: => ct - | it: => it -primvaltype ::= 0x7f => bool - | 0x7e => s8 - | 0x7d => u8 - | 0x7c => s16 - | 0x7b => u16 - | 0x7a => s32 - | 0x79 => u32 - | 0x78 => s64 - | 0x77 => u64 - | 0x76 => float32 - | 0x75 => float64 - | 0x74 => char - | 0x73 => string -defvaltype ::= pvt: => pvt - | 0x72 nt*:vec() => (record (field nt)*) - | 0x71 case*:vec() => (variant case*) - | 0x70 t: => (list t) - | 0x6f t*:vec() => (tuple t*) - | 0x6e n*:vec() => (flags n*) - | 0x6d n*:vec() => (enum n*) - | 0x6c t*:vec() => (union t*) - | 0x6b t: => (option t) - | 0x6a t?: u?: => (result t? (error u)?) -namedvaltype ::= n: t: => n t -case ::= n: t?: 0x0 => (case n t?) - | n: t?: 0x1 i: => (case n t? (refines case-label[i])) -casetype ::= 0x00 => - | 0x01 t: => t -valtype ::= i: => i - | pvt: => pvt -functype ::= 0x40 p*: r*: => (func (param p)* (result r)*) -funcvec ::= 0x00 t: => [t] - | 0x01 nt*:vec() => nt* -componenttype ::= 0x41 cd*:vec() => (component cd*) -instancetype ::= 0x42 id*:vec() => (instance id*) -componentdecl ::= 0x03 id: => id - | id: => id -instancedecl ::= 0x00 t: => t - | 0x01 t: => t - | 0x02 a: => a - | 0x04 ed: => ed -importdecl ::= n: ed: => (import n ed) -exportdecl ::= n: ed: => (export n ed) -externdesc ::= 0x00 0x11 i: => (core module (type i)) - | 0x01 i: => (func (type i)) - | 0x02 t: => (value t) - | 0x03 b: => (type b) - | 0x04 i: => (instance (type i)) - | 0x05 i: => (component (type i)) -typebound ::= 0x00 i: => (eq i) +type ::= dt: => (type dt) +deftype ::= dvt: => dvt + | ft: => ft + | ct: => ct + | it: => it +primvaltype ::= 0x7f => bool + | 0x7e => s8 + | 0x7d => u8 + | 0x7c => s16 + | 0x7b => u16 + | 0x7a => s32 + | 0x79 => u32 + | 0x78 => s64 + | 0x77 => u64 + | 0x76 => float32 + | 0x75 => float64 + | 0x74 => char + | 0x73 => string +defvaltype ::= pvt: => pvt + | 0x72 nt*:vec() => (record (field nt)*) + | 0x71 case*:vec() => (variant case*) + | 0x70 t: => (list t) + | 0x6f 
t*:vec() => (tuple t*) + | 0x6e n*:vec() => (flags n*) + | 0x6d n*:vec() => (enum n*) + | 0x6c t*:vec() => (union t*) + | 0x6b t: => (option t) + | 0x6a t?: u?: => (result t? (error u)?) +namedvaltype ::= n: t: => n t +case ::= n: t?: 0x0 => (case n t?) + | n: t?: 0x1 i: => (case n t? (refines case-label[i])) +casetype ::= 0x00 => + | 0x01 t: => t +valtype ::= i: => i + | pvt: => pvt +functype ::= 0x40 p*: r*: => (func (param p)* (result r)*) +funcvec ::= 0x00 t: => [t] + | 0x01 nt*:vec() => nt* +componenttype ::= 0x41 cd*:vec() => (component cd*) +instancetype ::= 0x42 id*:vec() => (instance id*) +componentdecl ::= 0x03 id: => id + | id: => id +instancedecl ::= 0x00 t: => t + | 0x01 t: => t + | 0x02 a: => a + | 0x04 ed: => ed +importdecl ::= n: ed: => (import n ed) +exportdecl ::= n: ed: => (export n ed) +externdesc ::= 0x00 0x11 i: => (core module (type i)) + | 0x01 i: => (func (type i)) + | 0x02 t: => (value t) + | 0x03 b: => (type b) + | 0x04 i: => (instance (type i)) + | 0x05 i: => (component (type i)) +typebound ::= 0x00 i: => (eq i) ``` Notes: * The type opcodes follow the same negative-SLEB128 scheme as Core WebAssembly, From 0caf4a06d071eda9ec5b4d08ac180853ad5259c5 Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Mon, 29 Aug 2022 11:27:15 -0700 Subject: [PATCH 132/301] Remove the `async` keyword from WIT.md. In the current [async proposal], there are no longer `async` functions; there are instead functions that return `future` or `stream`. 
[async proposal]: https://docs.google.com/presentation/d/1MNVOZ8hdofO3tI0szg_i-Yoy0N2QPU2C--LzVuoGSlE/edit#slide=id.g1270ef7d5b6_0_111 --- design/high-level/UseCases.md | 2 +- design/mvp/WIT.md | 9 +++------ 2 files changed, 4 insertions(+), 7 deletions(-) diff --git a/design/high-level/UseCases.md b/design/high-level/UseCases.md index 7aa2cc6..d0a7796 100644 --- a/design/high-level/UseCases.md +++ b/design/high-level/UseCases.md @@ -180,7 +180,7 @@ use cases that require additional features: can encapsulate `i32` pointers to linear memory allocations that need to be safely freed when the last handle goes away. 3. Developers import or export functions with signatures containing - concurrency-oriented types (e.g., async, future and stream) to address + concurrency-oriented types (e.g., future and stream) to address concurrency use cases like non-blocking I/O, early return and streaming. Both developers (the caller and callee) are able to use their respective languages' native concurrency support, if it exists, using the concurrency-oriented types diff --git a/design/mvp/WIT.md b/design/mvp/WIT.md index 276286e..29b8b6a 100644 --- a/design/mvp/WIT.md +++ b/design/mvp/WIT.md @@ -110,7 +110,6 @@ keyword ::= 'use' | 'static' | 'interface' | 'tuple' - | 'async' | 'future' | 'stream' ``` @@ -336,19 +335,17 @@ union-cases ::= ty, ## Item: `func` Functions can also be defined in a `wit` document. Functions have a name, -parameters, and results. Functions can optionally also be declared as `async` -functions. +parameters, and results. ```wit thunk: func() -> () fibonacci: func(n: u32) -> u32 -sleep: async func(ms: u64) -> () ``` Specifically functions have the structure: ```wit -func-item ::= id ':' 'async'? 
'func' func-vec '->' func-vec +func-item ::= id ':' 'func' func-vec '->' func-vec func-vec ::= ty | '(' func-named-type-list ')' @@ -377,7 +374,7 @@ resource file-descriptor resource request { static new: func() -> request - body: async func() -> list + body: func() -> future> headers: func() -> list } ``` From 064f320852dbe53b8e136df323b18cbf11c61425 Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Wed, 31 Aug 2022 16:06:30 -0700 Subject: [PATCH 133/301] Remove a space from a canonical ABI mangling example. Change `func (v1: string)` to `func(v1: string)` in an example to match the mangling of the canonical ABI. --- design/mvp/CanonicalABI.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index b8d4ee4..69f221d 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1550,7 +1550,7 @@ the `canonical_module_type` would be: (import "" "a.bar: func(x: u32, y: u32) -> u32" (func param i32 i32) (result i32)) (export "cabi_memory" (memory 0)) (export "cabi_realloc" (func (param i32 i32 i32 i32) (result i32))) - (export "cabi_start{cabi=0.1}: func (v1: string) -> (v2: list>)" (func (param i32 i32) (result i32))) + (export "cabi_start{cabi=0.1}: func(v1: string) -> (v2: list>)" (func (param i32 i32) (result i32))) (export "baz: func string -> string" (func (param i32 i32) (result i32))) (export "cabi_post_baz" (func (param i32))) ) From 472a87d8481e15c14c58b8599e4e4d20147ca20a Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Mon, 12 Sep 2022 06:33:22 -0700 Subject: [PATCH 134/301] Add subsection headers for the mangling documentation. This enables linking to specific subsections. 
--- design/mvp/CanonicalABI.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 69f221d..a4a17ad 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1429,6 +1429,8 @@ character in a component-level import/export (as is currently the case in `wit` [identifiers](WIT.md#identifiers)) and thus can safely be used to prefix auxiliary Canonical ABI-induced imports/exports. +#### Instance mangling + Instance-mangling recursively builds a dotted path string (of instance names) that is included in the mangled core import/export name: ```python @@ -1457,6 +1459,8 @@ def mangle_instances(xs, path = ''): The three `TODO` cases are intended to be filled in by future PRs extending the Canonical ABI. +#### Function and value type mangling + Function and value types are recursively mangled into [`wit`](WIT.md)-compatible syntax: ```python From b6c55cb3bf517e493b23fe1643568cc36ccb1e62 Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Tue, 13 Sep 2022 09:28:19 -0700 Subject: [PATCH 135/301] Update design/mvp/CanonicalABI.md Co-authored-by: Luke Wagner --- design/mvp/CanonicalABI.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index a4a17ad..a0f5e4d 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1429,7 +1429,7 @@ character in a component-level import/export (as is currently the case in `wit` [identifiers](WIT.md#identifiers)) and thus can safely be used to prefix auxiliary Canonical ABI-induced imports/exports. 
-#### Instance mangling +#### Instance type mangling Instance-mangling recursively builds a dotted path string (of instance names) that is included in the mangled core import/export name: From 20659af6f8973222ce88293c00cef21042fc3a0d Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Tue, 13 Sep 2022 09:28:30 -0700 Subject: [PATCH 136/301] Update design/mvp/CanonicalABI.md Co-authored-by: Luke Wagner --- design/mvp/CanonicalABI.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index a0f5e4d..7ae19b7 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1431,7 +1431,7 @@ auxiliary Canonical ABI-induced imports/exports. #### Instance type mangling -Instance-mangling recursively builds a dotted path string (of instance names) +Instance-type mangling recursively builds a dotted path string (of instance names) that is included in the mangled core import/export name: ```python def mangle_instances(xs, path = ''): From 60627f6d8cca51892e4ae0ddf90c00c9dbc31978 Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Tue, 13 Sep 2022 09:28:48 -0700 Subject: [PATCH 137/301] Update design/mvp/CanonicalABI.md Co-authored-by: Luke Wagner --- design/mvp/CanonicalABI.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 7ae19b7..c4024cf 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1459,7 +1459,7 @@ def mangle_instances(xs, path = ''): The three `TODO` cases are intended to be filled in by future PRs extending the Canonical ABI. -#### Function and value type mangling +#### Function type mangling Function and value types are recursively mangled into [`wit`](WIT.md)-compatible syntax: From 3e93c277c073d9097c48c4f108017b725b767ae2 Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Tue, 13 Sep 2022 09:30:51 -0700 Subject: [PATCH 138/301] Add a subsection header for value type mangling. 
--- design/mvp/CanonicalABI.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index c4024cf..d9810c1 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1461,7 +1461,7 @@ the Canonical ABI. #### Function type mangling -Function and value types are recursively mangled into +Function types are mangled into [`wit`](WIT.md)-compatible syntax: ```python def mangle_funcname(name, ft): @@ -1477,6 +1477,11 @@ def mangle_funcvec(es, pre_space): mangled_elems = (e[0] + ': ' + mangle_valtype(e[1]) for e in es) return '(' + ', '.join(mangled_elems) + ')' +#### Value type mangling + +Value types are similarly mangled into [`wit`](WIT.md)-compatible syntax, +recursively: + def mangle_valtype(t): match t: case Bool() : return 'bool' From 9bea0ad068cbbeaf8f1ff60bcdc27d5171921e9f Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Fri, 16 Sep 2022 14:47:40 -0500 Subject: [PATCH 139/301] Make all function parameters named Resolves #107 --- design/mvp/Binary.md | 7 ++-- design/mvp/CanonicalABI.md | 29 ++++++++-------- design/mvp/Explainer.md | 11 +++---- design/mvp/WIT.md | 14 ++++---- design/mvp/canonical-abi/definitions.py | 22 +++++++------ design/mvp/canonical-abi/run_tests.py | 44 ++++++++++++------------- 6 files changed, 66 insertions(+), 61 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 78f6a2c..5d15107 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -183,9 +183,10 @@ casetype ::= 0x00 => | 0x01 t: => t valtype ::= i: => i | pvt: => pvt -functype ::= 0x40 p*: r*: => (func (param p)* (result r)*) -funcvec ::= 0x00 t: => [t] - | 0x01 nt*:vec() => nt* +functype ::= 0x40 ps: rs: => (func ps rs) +paramlist ::= nt*:vec() => (param nt)* +resultlist ::= 0x00 t: => (result t) + | 0x01 nt*:vec() => (result nt)* componenttype ::= 0x41 cd*:vec() => (component cd*) instancetype ::= 0x42 id*:vec() => (instance id*) componentdecl ::= 0x03 id: => 
id diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index d9810c1..e4cc8d9 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1461,27 +1461,28 @@ the Canonical ABI. #### Function type mangling -Function types are mangled into -[`wit`](WIT.md)-compatible syntax: +Function types are mangled into [`wit`](WIT.md)-compatible syntax: ```python def mangle_funcname(name, ft): - return '{name}: func{params} -> {results}'.format( - name = name, - params = mangle_funcvec(ft.params, pre_space = False), - results = mangle_funcvec(ft.results, pre_space = True)) - -def mangle_funcvec(es, pre_space): - if len(es) == 1 and isinstance(es[0], ValType): - return (' ' if not pre_space else '') + mangle_valtype(es[0]) - assert(all(type(e) == tuple and len(e) == 2 for e in es)) - mangled_elems = (e[0] + ': ' + mangle_valtype(e[1]) for e in es) + params = mangle_named_types(ft.params) + if len(ft.results) == 1 and isinstance(ft.results[0], ValType): + results = mangle_valtype(ft.results[0]) + else: + results = mangle_named_types(ft.results) + return f'{name}: func{params} -> {results}' + +def mangle_named_types(nts): + assert(all(type(nt) == tuple and len(nt) == 2 for nt in nts)) + mangled_elems = (nt[0] + ': ' + mangle_valtype(nt[1]) for nt in nts) return '(' + ', '.join(mangled_elems) + ')' +``` #### Value type mangling Value types are similarly mangled into [`wit`](WIT.md)-compatible syntax, recursively: +``` def mangle_valtype(t): match t: case Bool() : return 'bool' @@ -1548,7 +1549,7 @@ As an example, given a component type: (export "bar" (func (param "x" u32) (param "y" u32) (result u32))) )) (import "v1" (value string)) - (export "baz" (func (param string) (result string))) + (export "baz" (func (param "s" string) (result string))) (export "v2" (value list>)) ) ``` @@ -1560,7 +1561,7 @@ the `canonical_module_type` would be: (export "cabi_memory" (memory 0)) (export "cabi_realloc" (func (param i32 i32 i32 i32) (result i32))) (export 
"cabi_start{cabi=0.1}: func(v1: string) -> (v2: list>)" (func (param i32 i32) (result i32))) - (export "baz: func string -> string" (func (param i32 i32) (result i32))) + (export "baz: func(s: string) -> string" (func (param i32 i32) (result i32))) (export "cabi_post_baz" (func (param i32))) ) ``` diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index ef543d9..e3a41fe 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -456,7 +456,6 @@ valtype ::= | functype ::= (func ) paramlist ::= (param )* - | (param ) resultlist ::= (result )* | (result ) componenttype ::= (component *) @@ -533,11 +532,11 @@ shared-nothing functions, components and component instances: The `func` type constructor describes a component-level function definition that takes and returns a list of `valtype`. In contrast to [`core:functype`], the parameters and results of `functype` can have associated names which -validation requires to be unique. If a name is not present, the name is taken -to be a special "empty" name and uniqueness still requires there to only be one -unnamed parameter/result. To avoid unnecessary complexity for language binding -generators, parameter and result lists are not allowed to contain both named -and unnamed parameters. +validation requires to be unique. To improve the ergonomics and performance of +the common case of single-value-returning functions, function types may +additionally have a single unnamed return type. For this special case, bindings +generators are naturally encouraged to return the single value directly without +wrapping it in any containing record/object/struct. The `instance` type constructor describes a list of named, typed definitions that can be imported or exported by a component. 
Informally, instance types diff --git a/design/mvp/WIT.md b/design/mvp/WIT.md index 29b8b6a..7f3a179 100644 --- a/design/mvp/WIT.md +++ b/design/mvp/WIT.md @@ -345,15 +345,17 @@ fibonacci: func(n: u32) -> u32 Specifically functions have the structure: ```wit -func-item ::= id ':' 'func' func-vec '->' func-vec +func-item ::= id ':' 'func' param-list '->' result-list -func-vec ::= ty - | '(' func-named-type-list ')' +param-list ::= '(' named-type-list ')' -func-named-type-list ::= nil - | func-named-type ( ',' func-named-type )* +result-list ::= ty + | '(' named-type-list ')' -func-named-type ::= id ':' ty +named-type-list ::= nil + | named-type ( ',' named-type )* + +named-type ::= id ':' ty ``` ## Item: `resource` diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 4701db7..cd2ff9b 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -1106,18 +1106,20 @@ def mangle_instances(xs, path = ''): # def mangle_funcname(name, ft): - return '{name}: func{params} -> {results}'.format( - name = name, - params = mangle_funcvec(ft.params, pre_space = False), - results = mangle_funcvec(ft.results, pre_space = True)) - -def mangle_funcvec(es, pre_space): - if len(es) == 1 and isinstance(es[0], ValType): - return (' ' if not pre_space else '') + mangle_valtype(es[0]) - assert(all(type(e) == tuple and len(e) == 2 for e in es)) - mangled_elems = (e[0] + ': ' + mangle_valtype(e[1]) for e in es) + params = mangle_named_types(ft.params) + if len(ft.results) == 1 and isinstance(ft.results[0], ValType): + results = mangle_valtype(ft.results[0]) + else: + results = mangle_named_types(ft.results) + return f'{name}: func{params} -> {results}' + +def mangle_named_types(nts): + assert(all(type(nt) == tuple and len(nt) == 2 for nt in nts)) + mangled_elems = (nt[0] + ': ' + mangle_valtype(nt[1]) for nt in nts) return '(' + ', '.join(mangled_elems) + ')' +# + def mangle_valtype(t): match t: case
Bool() : return 'bool' diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index ea482f2..fd81173 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -381,25 +381,25 @@ def test_mangle_functype(params, results, expect): if got != expect: fail("test_mangle_func() got:\n {}\nexpected:\n {}".format(got, expect)) -test_mangle_functype([U8()], [U8()], 'func u8 -> u8') -test_mangle_functype([U8()], [], 'func u8 -> ()') +test_mangle_functype([('x',U8())], [U8()], 'func(x: u8) -> u8') +test_mangle_functype([('x',U8())], [], 'func(x: u8) -> ()') test_mangle_functype([], [U8()], 'func() -> u8') test_mangle_functype([('x',U8())], [('y',U8())], 'func(x: u8) -> (y: u8)') test_mangle_functype([('a',Bool()),('b',U8()),('c',S16()),('d',U32()),('e',S64())], [('a',S8()),('b',U16()),('c',S32()),('d',U64())], 'func(a: bool, b: u8, c: s16, d: u32, e: s64) -> (a: s8, b: u16, c: s32, d: u64)') -test_mangle_functype([List(List(String()))], [], - 'func list> -> ()') -test_mangle_functype([Record([Field('x',Record([Field('y',String())])),Field('z',U32())])], [], - 'func record { x: record { y: string }, z: u32 } -> ()') -test_mangle_functype([Tuple([U8()])], [Tuple([U8(),U8()])], - 'func tuple -> tuple') -test_mangle_functype([Flags(['a','b'])], [Enum(['a','b'])], - 'func flags { a, b } -> enum { a, b }') -test_mangle_functype([Variant([Case('a',None),Case('b',U8())])], [Union([U8(),List(String())])], - 'func variant { a, b(u8) } -> union { u8, list }') -test_mangle_functype([Option(Bool())],[Option(List(U8()))], - 'func option -> option>') +test_mangle_functype([('l',List(List(String())))], [], + 'func(l: list>) -> ()') +test_mangle_functype([('r',Record([Field('x',Record([Field('y',String())])),Field('z',U32())]))], [], + 'func(r: record { x: record { y: string }, z: u32 }) -> ()') +test_mangle_functype([('t',Tuple([U8()]))], [Tuple([U8(),U8()])], + 'func(t: tuple) -> tuple') 
+test_mangle_functype([('f',Flags(['a','b']))], [Enum(['a','b'])], + 'func(f: flags { a, b }) -> enum { a, b }') +test_mangle_functype([('v',Variant([Case('a',None),Case('b',U8())]))], [Union([U8(),List(String())])], + 'func(v: variant { a, b(u8) }) -> union { u8, list }') +test_mangle_functype([('o',Option(Bool()))],[Option(List(U8()))], + 'func(o: option) -> option>') test_mangle_functype([], [('a',Result(None,None)),('b',Result(U8(),None)),('c',Result(None,U8()))], 'func() -> (a: result, b: result, c: result<_, u8>)') @@ -410,24 +410,24 @@ def test_cabi(ct, expect): test_cabi( ComponentType( - [ExternDecl('a', FuncType([U8()],[U8()])), + [ExternDecl('a', FuncType([('x',U8())],[U8()])), ExternDecl('b', ValueType(String()))], - [ExternDecl('c', FuncType([S8()],[S8()])), + [ExternDecl('c', FuncType([('x',S8())],[S8()])), ExternDecl('d', ValueType(List(U8())))] ), ModuleType( - [CoreImportDecl('','a: func u8 -> u8', CoreFuncType(['i32'],['i32']))], + [CoreImportDecl('','a: func(x: u8) -> u8', CoreFuncType(['i32'],['i32']))], [CoreExportDecl('cabi_memory', CoreMemoryType(0, None)), CoreExportDecl('cabi_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])), CoreExportDecl('cabi_start{cabi=0.1}: func(b: string) -> (d: list)', CoreFuncType(['i32','i32'],['i32'])), - CoreExportDecl('c: func s8 -> s8', CoreFuncType(['i32'],['i32']))] + CoreExportDecl('c: func(x: s8) -> s8', CoreFuncType(['i32'],['i32']))] ) ) test_cabi( ComponentType( [ExternDecl('a', InstanceType([ - ExternDecl('b', FuncType([U8()],[U8()])), + ExternDecl('b', FuncType([('x',U8())],[U8()])), ExternDecl('c', ValueType(Float32())) ]))], [ExternDecl('d', InstanceType([ @@ -436,7 +436,7 @@ def test_cabi(ct, expect): ]))] ), ModuleType( - [CoreImportDecl('','a.b: func u8 -> u8', CoreFuncType(['i32'],['i32']))], + [CoreImportDecl('','a.b: func(x: u8) -> u8', CoreFuncType(['i32'],['i32']))], [CoreExportDecl('cabi_memory', CoreMemoryType(0, None)), CoreExportDecl('cabi_realloc', 
CoreFuncType(['i32','i32','i32','i32'],['i32'])), CoreExportDecl('cabi_start{cabi=0.1}: func(a.c: float32) -> (d.f: float64)', @@ -452,7 +452,7 @@ def test_cabi(ct, expect): ExternDecl('bar', FuncType([('x', U32()),('y', U32())],[U32()])) ])), ExternDecl('v1', ValueType(String()))], - [ExternDecl('baz', FuncType([String()], [String()])), + [ExternDecl('baz', FuncType([('s',String())], [String()])), ExternDecl('v2', ValueType(List(List(String()))))] ), ModuleType( @@ -462,7 +462,7 @@ def test_cabi(ct, expect): CoreExportDecl('cabi_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])), CoreExportDecl('cabi_start{cabi=0.1}: func(v1: string) -> (v2: list>)', CoreFuncType(['i32','i32'],['i32'])), - CoreExportDecl('baz: func string -> string', CoreFuncType(['i32','i32'],['i32'])), + CoreExportDecl('baz: func(s: string) -> string', CoreFuncType(['i32','i32'],['i32'])), CoreExportDecl('cabi_post_baz', CoreFuncType(['i32'],[]))] ) ) From e7a22d817cd5338be5521fc9fa7de30e84e612d1 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Fri, 19 Aug 2022 14:57:48 -0500 Subject: [PATCH 140/301] Split import/export names into kebab-case name + optional URL fields --- design/mvp/Binary.md | 58 +++++---- design/mvp/Explainer.md | 260 +++++++++++++++++++++++++++++++--------- design/mvp/Subtyping.md | 1 + design/mvp/WIT.md | 8 +- 4 files changed, 243 insertions(+), 84 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 78f6a2c..ec8fcd2 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -55,8 +55,8 @@ Notes: ``` core:instance ::= ie: => (instance ie) core:instanceexpr ::= 0x00 m: arg*:vec() => (instantiate m arg*) - | 0x01 e*:vec() => e* -core:instantiatearg ::= n: 0x12 i: => (with n (instance i)) + | 0x01 e*:vec() => e* +core:instantiatearg ::= n: 0x12 i: => (with n (instance i)) core:sortidx ::= sort: idx: => (sort idx) core:sort ::= 0x00 => func | 0x01 => table @@ -65,11 +65,11 @@ core:sort ::= 0x00 => fu | 0x10 => type | 0x11 => module | 0x12 => 
instance -core:export ::= n: si: => (export n si) +core:inlineexport ::= n: si: => (export n si) instance ::= ie: => (instance ie) instanceexpr ::= 0x00 c: arg*:vec() => (instantiate c arg*) - | 0x01 e*:vec() => e* + | 0x01 e*:vec() => e* instantiatearg ::= n: si: => (with n si) sortidx ::= sort: idx: => (sort idx) sort ::= 0x00 cs: => core cs @@ -78,7 +78,12 @@ sort ::= 0x00 cs: => co | 0x03 => type | 0x04 => component | 0x05 => instance -export ::= n: si: => (export n si) +inlineexport ::= n: si: => (export n si) +name ::= len: n: => n (if len = |n|) +name-chars ::= w: => w + | n: 0x2d w: => n-w +word ::= w:[0x61-0x7a] x*:[0x30-0x39,0x61-0x7a]* => char(w)char(x)* + | W:[0x41-0x5a] X*:[0x30-0x39,0x41-0x5a]* => char(W)char(X)* ``` Notes: * Reused Core binary rules: [`core:name`], (variable-length encoded) [`core:u32`] @@ -92,6 +97,8 @@ Notes: for aliases (below). * Validation of `core:instantiatearg` initially only allows the `instance` sort, but would be extended to accept other sorts as core wasm is extended. +* Validation of `instantiate` requires that `name` is present in an + `externname` of `c` (with a matching type). * The indices in `sortidx` are validated according to their `sort`'s index spaces, which are built incrementally as each definition is validated. @@ -99,10 +106,10 @@ Notes: (See [Alias Definitions](Explainer.md#alias-definitions) in the explainer.) 
``` -alias ::= s: t: => (alias t (s)) -aliastarget ::= 0x00 i: n: => export i n - | 0x01 i: n: => core export i n - | 0x02 ct: idx: => outer ct idx +alias ::= s: t: => (alias t (s)) +aliastarget ::= 0x00 i: n: => export i n + | 0x01 i: n: => core export i n + | 0x02 ct: idx: => outer ct idx ``` Notes: * Reused Core binary rules: (variable-length encoded) [`core:u32`] @@ -133,7 +140,7 @@ core:moduledecl ::= 0x00 i: => i core:alias ::= s: t: => (alias t (s)) core:aliastarget ::= 0x01 ct: idx: => outer ct idx core:importdecl ::= i: => i -core:exportdecl ::= n: d: => (export n d) +core:exportdecl ::= n: d: => (export n d) ``` Notes: * Reused Core binary rules: [`core:import`], [`core:importdesc`], [`core:functype`] @@ -175,12 +182,11 @@ defvaltype ::= pvt: => pvt | 0x6d n*:vec() => (enum n*) | 0x6c t*:vec() => (union t*) | 0x6b t: => (option t) - | 0x6a t?: u?: => (result t? (error u)?) + | 0x6a t?:? u?:? => (result t? (error u)?) namedvaltype ::= n: t: => n t -case ::= n: t?: 0x0 => (case n t?) - | n: t?: 0x1 i: => (case n t? (refines case-label[i])) -casetype ::= 0x00 => - | 0x01 t: => t +case ::= n: t?:? r?:? => (case n t? (refines case-label[r])?) +? ::= 0x00 => + | 0x01 t: => t valtype ::= i: => i | pvt: => pvt functype ::= 0x40 p*: r*: => (func (param p)* (result r)*) @@ -194,8 +200,8 @@ instancedecl ::= 0x00 t: => t | 0x01 t: => t | 0x02 a: => a | 0x04 ed: => ed -importdecl ::= n: ed: => (import n ed) -exportdecl ::= n: ed: => (export n ed) +importdecl ::= en: ed: => (import en ed) +exportdecl ::= en: ed: => (export en ed) externdesc ::= 0x00 0x11 i: => (core module (type i)) | 0x01 i: => (func (type i)) | 0x02 t: => (value t) @@ -214,6 +220,8 @@ Notes: * As described in the explainer, each component and instance type is validated with an initially-empty type index space. Outer aliases can be used to pull in type definitions from containing components. 
+* The uniqueness validation rules for `externname` described below are also + applied at the instance- and component-type level. * Validation of `externdesc` requires the various `typeidx` type constructors to match the preceding `sort`. * Validation of function parameter and result names, record field names, @@ -285,13 +293,21 @@ flags are set. (See [Import and Export Definitions](Explainer.md#import-and-export-definitions) in the explainer.) ``` -import ::= n: ed: => (import n ed) -export ::= n: si: => (export n si) +import ::= en: ed: => (import en ed) +export ::= en: si: => (export en si) +externname ::= n: u?:? => n u? +URL ::= b*:vec(byte) => char(b)*, if char(b)* parses as a URL ``` Notes: -* Validation requires all import and export `name`s are unique. +* The "parses as a URL" condition is defined by executing the [basic URL + parser] with `char(b)*` as *input*, no optional parameters and non-fatal + validation errors (which coincides with the definition of `URL` in JS and `rust-url`). * Validation requires any exported `sortidx` to have a valid `externdesc` (which disallows core sorts other than `core module`). +* The `name` fields of `externname` must be unique among imports and exports, + respectively. The `URL` fields of `externname` (that are present) must + be independently unique among imports and exports, respectively. +* URLs are compared for equality by plain byte identity.
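
The kebab-case `name` grammar and the uniqueness rules in the notes above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not part of the patch or the specification: the helper names `is_kebab_name` and `check_externnames` are hypothetical, and the "parses as a URL" check is left out because it would defer to the basic URL parser.

```python
import re

# word ::= [a-z][0-9a-z]* | [A-Z][0-9A-Z]*
_WORD = r"(?:[a-z][0-9a-z]*|[A-Z][0-9A-Z]*)"
# name ::= word | name '-' word
_NAME = re.compile(rf"{_WORD}(?:-{_WORD})*")


def is_kebab_name(s: str) -> bool:
    """Return True if `s` matches the component-level `name` grammar."""
    return _NAME.fullmatch(s) is not None


def check_externnames(externnames):
    """Check one list of (name, url_or_None) pairs (hypothetical helper).

    Call it once for the imports and once for the exports: names must be
    unique, and the URLs that are present must be independently unique.
    URLs are compared by plain identity of their bytes/characters.
    """
    names, urls = set(), set()
    for name, url in externnames:
        if not is_kebab_name(name):
            raise ValueError(f"not a kebab-case name: {name!r}")
        if name in names:
            raise ValueError(f"duplicate name: {name!r}")
        names.add(name)
        if url is not None:
            if url in urls:
                raise ValueError(f"duplicate URL: {url!r}")
            urls.add(url)
```

Running the check separately over imports and exports mirrors the "respectively" wording: an import and an export may share a `name` or a `URL`, but two imports may not.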
[`core:u32`]: https://webassembly.github.io/spec/core/binary/values.html#integers @@ -306,3 +322,5 @@ Notes: [type-imports]: https://github.com/WebAssembly/proposal-type-imports/blob/master/proposals/type-imports/Overview.md [module-linking]: https://github.com/WebAssembly/module-linking/blob/main/proposals/module-linking/Explainer.md + +[Basic URL Parser]: https://url.spec.whatwg.org/#concept-basic-url-parser diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index ef543d9..bb71242 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -129,9 +129,9 @@ The syntax for defining a core module instance is: ``` core:instance ::= (instance ? ) core:instanceexpr ::= (instantiate *) - | * -core:instantiatearg ::= (with (instance )) - | (with (instance *)) + | * +core:instantiatearg ::= (with (instance )) + | (with (instance *)) core:sortidx ::= ( ) core:sort ::= func | table @@ -140,17 +140,17 @@ core:sort ::= func | type | module | instance -core:export ::= (export ) +core:inlineexport ::= (export ) ``` When instantiating a module via `instantiate`, the two-level imports of the core modules are resolved as follows: -1. The first `name` of the import is looked up in the named list of +1. The first `core:name` of the import is looked up in the named list of `core:instantiatearg` to select a core module instance. (In the future, other `core:sort`s could be allowed if core wasm adds single-level imports.) -2. The second `name` of the import is looked up in the named list of exports of - the core module instance found by the first step to select the imported - core definition. +2. The second `core:name` of the import is looked up in the named list of + exports of the core module instance found by the first step to select the + imported core definition. Each `core:sort` corresponds 1:1 with a distinct [index space] that contains only core definitions of that *sort*. 
The `u32` field of `core:sortidx` @@ -173,22 +173,23 @@ following component: To see examples of other sorts, we'll need `alias` definitions, which are introduced in the next section. -The `*` form of `core:instanceexpr` allows module instances to be -created by directly tupling together preceding definitions, without the need to -`instantiate` a helper module. The "inline" form of `*` inside -`(with ...)` is syntactic sugar that is expanded during text format parsing -into an out-of-line instance definition referenced by `with`. To show an -example of these, we'll also need the `alias` definitions introduced in the +The `*` form of `core:instanceexpr` allows module instances +to be created by directly tupling together preceding definitions, without the +need to `instantiate` a helper module. The `*` form of +`core:instantiatearg` is syntactic sugar that is expanded during text format +parsing into an out-of-line instance definition referenced by `with`. To show +an example of these, we'll also need the `alias` definitions introduced in the next section. The syntax for defining component instances is symmetric to core module -instances, but with an expanded component-level definition of `sort`: +instances, but with an expanded component-level definition of `sort` and +more restricted version of `name`: ``` instance ::= (instance ? ) instanceexpr ::= (instantiate *) - | * + | * instantiatearg ::= (with ) - | (with (instance *)) + | (with (instance *)) sortidx ::= ( ) sort ::= core | func @@ -196,7 +197,11 @@ sort ::= core | type | component | instance -export ::= (export ) +inlineexport ::= (export ) +name ::= + | - +word ::= [a-z][0-9a-z]* + | [A-Z][0-9A-Z]* ``` Because component-level function, type and instance definitions are different than core-level function, type and instance definitions, they are put into @@ -211,6 +216,21 @@ The `value` sort refers to a value that is provided and consumed during instantiation. 
How this works is described in the [start definitions](#start-definitions) section. +The component-level definition of `name` above corresponds to [kebab case]. The +reason for this particular form of casing is to unambiguously separate words +and acronyms (represented as all-caps words) so that source language bindings +can convert a `name` into the idiomatic casing of that language. (Indeed, +because hyphens are often invalid in identifiers, kebab case practically forces +language bindings to make such a conversion.) For example, the `name` `is-XML` +could be mapped to `isXML`, `IsXml` or `is_XML`, depending on the target +language. The highly-restricted character set ensures that capitalization is +trivial and does not require consulting Unicode tables. Having this structured +data encoded as a plain string provides a single canonical name for use in +tools and language-agnostic contexts, without requiring each to invent its own +custom interpretation. While the use of `name` above is mostly for internal +wiring, `name` is used in a number of productions below that are +developer-facing and imply bindings generation. + To see a non-trivial example of component instantiation, we'll first need to introduce a few other definitions below that allow components to import, define and export component functions. @@ -226,7 +246,7 @@ instance, the `core export` of a core module instance and a definition of an ``` alias ::= (alias ( ?)) aliastarget ::= export - | core export + | core export | outer ``` If present, the `id` of the alias is bound to the new index added by the alias @@ -358,8 +378,8 @@ core:moduledecl ::= | core:alias ::= (alias ( ?)) core:aliastarget ::= outer -core:importdecl ::= (import ) -core:exportdecl ::= (export ) +core:importdecl ::= (import ) +core:exportdecl ::= (export ) core:exportdesc ::= strip-id() where strip-id(X) parses '(' sort Y ')' when X parses '(' sort ? 
Y ')' @@ -467,8 +487,8 @@ instancedecl ::= core-prefix() | | | -importdecl ::= (import bind-id()) -exportdecl ::= (export ) +importdecl ::= (import bind-id()) +exportdecl ::= (export ) externdesc ::= ( (type ) ) | core-prefix() | @@ -560,10 +580,11 @@ core module declarators introduced above. As with core modules, `importdecl` and `exportdecl` classify component `import` and `export` definitions, with `importdecl` allowing an identifier to be -bound for use within the type. Following the precedent of [`core:typeuse`], the -text format allows both references to out-of-line type definitions (via -`(type )`) and inline type expressions that the text format desugars -into out-of-line type definitions. +bound for use within the type. The definition of `externname` is given in the +[imports and exports](#import-and-export-definitions) section below. Following +the precedent of [`core:typeuse`], the text format allows both references to +out-of-line type definitions (via `(type )`) and inline type +expressions that the text format desugars into out-of-line type definitions. The `value` case of `externdesc` describes a runtime value that is imported or exported at instantiation time as described in the @@ -803,41 +824,144 @@ of core linear memory. ### Import and Export Definitions -Lastly, imports and exports are defined in terms of the above as: +Lastly, imports and exports are defined as: ``` -import ::= -export ::= (export ) +import ::= (import bind-id()) +export ::= (export ) +externname ::= ? +``` +Components split the single externally-visible name of imports and exports into +two sub-fields: a kebab-case `name` (as defined [above](#instance-definitions)) +and a `URL` (defined by the [URL Standard], noting that, in this URL Standard, +the term "URL" subsumes what has historically been called a [URI], including +URLs that "identify" as opposed to "locate"). 
This subdivision of external +names allows component producers to represent a variety of intentions for how a +component is to be instantiated and executed so that a variety of hosts can +portably execute the component. + +The `name` field of `externname` is required to be unique. Thus, a single +`name` has been used in the preceding definitions of `with` and `alias` to +uniquely identify imports and exports. + +In guest source-code bindings, the `name` is meant to be translated to +source-language identifiers (applying case-conversion, as described +[above](#instance-definitions)) attached to whatever source-language constructs +represent the imports and exports (functions, globals, types, classes, etc). +For example, given an import in a component type: +``` +(import "one-two" (instance + (export "three-four" (func (param string) (result string))) +)) +``` +a Rust bindings generator for a component targeting this type could produce an +`extern crate one_two` containing the function `three_four`. Similarly, a +[JS Embedding](#js-embedding) could allow `import {threeFour} from 'one-two'` +to resolve to the imported function. Conversely, given an export in a component +type: +``` +(export "one-two" (instance + (export "three-four" (func (param string) (result string))) +)) +``` +a Rust bindings generator for a component with this export could produce a +trait `OneTwo` requiring a function `three_four` while the JS Embedding would +expect the JS module implementing this component type to export a function +`oneTwo` containing an object with a field `threeFour` containing a function. + +The `name` field can also be used by *host* source-code bindings, defining the +source-language identifiers that are to be used when instantiating a component +and accessing its exports. For example, the [JS API]'s +[`WebAssembly.instantiate()`] would use import `name`s in the [*read the +imports*] step and use export `name`s in the [*create an exports object*] step. 
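
The case conversions mentioned above (`one_two` and `OneTwo` for Rust, `oneTwo` for JS) could be derived mechanically from a kebab-case `name` along the following lines. This is a minimal sketch: the function names are hypothetical, and the treatment of all-caps acronym words (kept as-is in camel case, lowercased in snake case and in the leading position of lowerCamelCase) is an assumption, since the text leaves the exact mapping to each bindings generator.

```python
def _camel_word(word: str) -> str:
    # Assumption: acronym words (all caps) are kept verbatim; other
    # words get their first letter capitalized.
    return word if word.isupper() else word.capitalize()


def to_lower_camel(name: str) -> str:
    """'one-two' -> 'oneTwo' (e.g. for JS identifiers)."""
    first, *rest = name.split("-")
    return first.lower() + "".join(_camel_word(w) for w in rest)


def to_upper_camel(name: str) -> str:
    """'one-two' -> 'OneTwo' (e.g. for a Rust trait name)."""
    return "".join(_camel_word(w) for w in name.split("-"))


def to_snake(name: str) -> str:
    """'one-two' -> 'one_two' (e.g. for a Rust function or crate name)."""
    return name.replace("-", "_").lower()
```

Because the `name` character set is restricted to ASCII letters and digits, none of these conversions needs Unicode case tables.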
+ +The optional `URL` field of `externname` allows a component author to refer to +an *externally-defined* specification of what an import "wants" or what an +export has "implemented". One example is a URL naming a standard interface such +as `wasi:filesystem` (assuming that WASI registered the `wasi:` URI scheme with +IANA). Pre-standard, non-standard or proprietary interfaces could be referred +to by an `http:` URL in an interface registry. For imports, a URL could +alternatively refer to a *particular implementation* (e.g., at a hosted storage +location) or a *query* for a *set of possible implementations* (e.g., using the +query API of a public registry). Because of the wide variety of hosts executing +components, the Component Model doesn't specify how URLs are to be interpreted, +just that they are grammatically URLs. Even `http:` URLs may or may not be +literally fetched by the host (c.f. [import maps]). + +When present, `URL`s must *also* be unique (*in addition to* the abovementioned +uniqueness of `name`s). Thus, a `URL` can *also* be used to uniquely identify +the subset of imports or exports that have `URL`s. + +While the `name` field is meant for source-code bindings generators, the `URL` +field is meant for automated interpretation by hosts and toolchains. In +particular, hosts are expected to identify their host-implemented imports and +host-called exports by `URL`, not `name`. This allows hosts to implement a +wide collection of independently-developed interfaces where `name`s are chosen +for developer ergonomics (and name collisions are handled independently in +the binding generators, which is needed in any case) and `URL`s serve as +the invariant identifier that concretely links the guest to host.
If there was +only a `name`, interface authors would be forced to implicitly coordinate +across the ecosystem to avoid collisions (which, in general, isn't possible) +while if there was only a `URL`, the developer-friendly identifiers would have +to be specified manually by every developer or derived in an ad hoc fashion +from the `URL`, whose contents may vary widely. This dual-name scheme is thus +proposed to resolve these competing requirements. + +Inside the component model, this dual-name scheme shows up in [subtyping](Subtyping.md), +where the component subtyping simply ignores the `name` field when the `URL` +field is present. For example, the component: ``` -All import and export names within a component must be unique, respectively. - -With what's defined so far, we can write a component that imports, links and -exports other components: -```wasm (component - (import "c" (instance $c - (export "f" (func (result string))) - )) - (import "d" (component $D - (import "c" (instance $c - (export "f" (func (result string))) - )) - (export "g" (func (result string))) - )) - (instance $d1 (instantiate $D - (with "c" (instance $c)) - )) - (instance $d2 (instantiate $D - (with "c" (instance - (export "f" (func $d1 "g")) - )) + (import "fs" "wasi:filesystem" ...) +) +``` +can be supplied for the `x` import of the component: +``` +(component + (import "x" (component + (import "filesystem" "wasi:filesystem" ...) )) - (export "d2" (instance $d2)) ) ``` -Here, the imported component `d` is instantiated *twice*: first, with its -import satisfied by the imported instance `c`, and second, with its import -satisfied with the first instance of `d`. While this seems a little circular, -note that all definitions are acyclic as is the resulting instance graph. +because the `name`s are ignored and the `URL`s match.
This subtyping is +symmetric to what was described above for hosts, allowing components to +serve as the "host" of other components, enabling [virtualization](examples/LinkTimeVirtualization.md). + +Since the concrete artifacts defining the host/guest interface is a collection +of [Wit files](WIT.md), Wit must naturally allow interface authors to specify +both the `name` and `URL` of component imports and exports. While the syntax is +still very much [in flux](https://github.com/WebAssembly/component-model/pull/83), +a hypothetical simplified interface between a guest and host might look like: +``` +// wasi:cli/Command +default world Command { + import fs: "wasi:filesystem" + import console: "wasi:cli/console" + export main: "wasi:cli/main" +} +``` +where `wasi:filesystem`, `wasi:log` and `wasi:main` are separately defined +interfaces that map to instance types. This "World" definition then maps to the +following component type: +``` +(component $Command + (import "fs" "wasi:filesystem" (instance ... filesystem function exports ...)) + (import "console" "wasi:cli/console" (instance ... log function exports ...)) + (export "main" "wasi:cli/main" (instance (export "main" (func ...)))) +) +``` +A component *targeting* `wasi:cli/Command` would thus need to be a *subtype* of +`$Command` (importing a subset of these imports and exporting a superset of +these exports) while a host *implementing* `wasi:cli/Command` would need to be +a *supertype* of `$Command` (offering a superset of these imports and expecting +to call a subset of these exports). + +Importantly, this `wasi:cli/Command` World has been able to define the short +developer-facing names like `fs` and `console` without worrying if there are +any other Worlds that conflict with these names. 
If a host wants to implement +`wasi:cli/Command` and some other World that also happens to pick `fs`, either +the `URL` fields are the same, and so the two imports can be unified, or the +`URL` fields are different, and the host supplies two distinct imports, +identified by `URL`. ## Component Invariants @@ -910,6 +1034,10 @@ of `WebAssembly.instantiate(Streaming)` would inherit the compound behavior of the abovementioned functions (again, using the `layer` field to eagerly distinguish between modules and components). +TODO: describe how kebab-names are mapped to JS identifiers + +TODO: describe how the fields can accept either a name or a URL (which are disjoint sets of strings) + For example, the following component: ```wasm ;; a.wasm @@ -1006,10 +1134,15 @@ Notes: ### ESM-integration -Like the JS API, [esm-integration] can be extended to load components in all +Like the JS API, [ESM-integration] can be extended to load components in all the same places where modules can be loaded today, branching on the `layer` field in the binary format to determine whether to decode as a module or a -component. The main question is how to deal with component imports having a +component. + +TODO: explain how `URL` field is used as module specifier, if present, falling +back to the `name` field, which can be implemented by [import maps] + +The main question is how to deal with component imports having a single string as well as the new importable component, module and instance types. 
Going through these one by one: @@ -1087,6 +1220,7 @@ and will be added over the coming months to complete the MVP proposal: [Index Space]: https://webassembly.github.io/spec/core/syntax/modules.html#indices [Abbreviations]: https://webassembly.github.io/spec/core/text/conventions.html#abbreviations +[`core:name`]: https://webassembly.github.io/spec/core/syntax/values.html#syntax-name [`core:module`]: https://webassembly.github.io/spec/core/text/modules.html#text-module [`core:type`]: https://webassembly.github.io/spec/core/text/modules.html#types [`core:importdesc`]: https://webassembly.github.io/spec/core/text/modules.html#text-importdesc @@ -1097,8 +1231,11 @@ and will be added over the coming months to complete the MVP proposal: [func-import-abbrev]: https://webassembly.github.io/spec/core/text/modules.html#text-func-abbrev [`core:version`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-version +[`WebAssembly.instantiate()`]: https://developer.mozilla.org/en-US/docs/WebAssembly/JavaScript_interface/instantiate + [JS API]: https://webassembly.github.io/spec/js-api/index.html [*read the imports*]: https://webassembly.github.io/spec/js-api/index.html#read-the-imports +[*create the exports*]: https://webassembly.github.io/spec/js-api/index.html#create-an-exports-object [`ToJSValue`]: https://webassembly.github.io/spec/js-api/index.html#tojsvalue [`ToWebAssemblyValue`]: https://webassembly.github.io/spec/js-api/index.html#towebassemblyvalue [`USVString`]: https://webidl.spec.whatwg.org/#es-USVString @@ -1116,6 +1253,7 @@ and will be added over the coming months to complete the MVP proposal: [JS Tuple]: https://github.com/tc39/proposal-record-tuple [JS Record]: https://github.com/tc39/proposal-record-tuple +[Kebab Case]: https://en.wikipedia.org/wiki/Letter_case#Kebab_case [De Bruijn Index]: https://en.wikipedia.org/wiki/De_Bruijn_index [Closure]: https://en.wikipedia.org/wiki/Closure_(computer_programming) [Empty Type]: 
https://en.wikipedia.org/w/index.php?title=Empty_type @@ -1131,6 +1269,10 @@ and will be added over the coming months to complete the MVP proposal: [Linear]: https://en.wikipedia.org/wiki/Substructural_type_system#Linear_type_systems [Interface Definition Language]: https://en.wikipedia.org/wiki/Interface_description_language +[URL Standard]: https://url.spec.whatwg.org +[URI]: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier +[Import Maps]: https://wicg.github.io/import-maps/ + [module-linking]: https://github.com/WebAssembly/module-linking/blob/main/design/proposals/module-linking/Explainer.md [interface-types]: https://github.com/WebAssembly/interface-types/blob/main/proposals/interface-types/Explainer.md [type-imports]: https://github.com/WebAssembly/proposal-type-imports/blob/master/proposals/type-imports/Overview.md @@ -1142,6 +1284,8 @@ and will be added over the coming months to complete the MVP proposal: [Adapter Functions]: FutureFeatures.md#custom-abis-via-adapter-functions [Canonical ABI]: CanonicalABI.md [Shared-Nothing]: ../high-level/Choices.md +[Use Cases]: ../high-level/UseCases.md +[Host Embeddings]: ../high-level/UseCases.md#hosts-embedding-components [`wizer`]: https://github.com/bytecodealliance/wizer diff --git a/design/mvp/Subtyping.md b/design/mvp/Subtyping.md index e6f86f7..2bcd52b 100644 --- a/design/mvp/Subtyping.md +++ b/design/mvp/Subtyping.md @@ -18,6 +18,7 @@ But roughly speaking: | `expected` | `T <: (expected T _)` | | `union` | `T <: (union ... 
T ...)` | | `func` | parameter names must match in order; contravariant parameter subtyping; superfluous parameters can be ignored in the subtype; `option` parameters can be ignored in the supertype; covariant result subtyping | +| `component` | all imports in the subtype must be present in the supertype with matching types; all exports in the supertype must be present in the subtype; the `URL` is treated as the complete name, when present, ignoring the `name` field | The remaining specialized value types inherit their subtyping from their fundamental value types. diff --git a/design/mvp/WIT.md b/design/mvp/WIT.md index 276286e..96e794f 100644 --- a/design/mvp/WIT.md +++ b/design/mvp/WIT.md @@ -476,9 +476,8 @@ through a `use` statement or they can be defined locally. ## Identifiers Identifiers in `wit` can be defined with two different forms. The first is a -lower-case [stream-safe] [NFC] [kebab-case] identifier where each part delimited -by '-'s starts with a `XID_Start` scalar value with a zero Canonical Combining -Class: +[kebab-case] identifier defined by the [`name`](Explainer.md#instance-definitions) +production in the Component Model text format. 
```wit foo: func(bar: u32) -> () @@ -500,9 +499,6 @@ prefixed with '%': ``` [kebab-case]: https://en.wikipedia.org/wiki/Letter_case#Kebab_case -[Unicode identifier]: http://www.unicode.org/reports/tr31/ -[stream-safe]: https://unicode.org/reports/tr15/#Stream_Safe_Text_Format -[NFC]: https://unicode.org/reports/tr15/#Norm_Forms ## Name resolution From b1050353982754979f69e14b6fe3dbc2b608ff67 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 22 Sep 2022 16:25:16 -0500 Subject: [PATCH 141/301] Fill in some JS Embedding details --- design/mvp/Explainer.md | 29 ++++++++++++++++++++--------- 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index bb71242..1386981 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -1022,22 +1022,30 @@ a 16-bit `layer` field with `0` for modules and `1` for components). Once compiled, a `WebAssembly.Component` could be instantiated using the existing JS API `WebAssembly.instantiate(Streaming)`. Since components have the -same basic import/export structure as modules, this mostly just means extending -the [*read the imports*] logic to support single-level imports as well as +same basic import/export structure as modules, this means extending the [*read +the imports*] logic to support single-level imports (of kebab-case component +import names converted to lowerCamelCase JavaScript identifiers) as well as imports of modules, components and instances. Since the results of instantiating a component is a record of JavaScript values, just like an instantiated module, `WebAssembly.instantiate` would always produce a -`WebAssembly.Instance` object for both module and component arguments. +`WebAssembly.Instance` object for both module and component arguments +(again, with kebab-case component export names converted to lowerCamelCase). 
+ +Since the JavaScript embedding is generic, loading all component types, it +needs to allow the JS client to refer to either of the `name` or `URL` fields +of component `externname`s. On the import side, this means that, when a `URL` +is present, *read the imports* will first attempt to [`Get`] the `URL` and, on +failure, `Get` the `name`. On the export side, this means that *both* the +`name` and `URL` are exposed as exports in the export object (both holding the +same value). Since `name` and `URL` are necessarily disjoint sets of strings +(in particular, `URL`s must contain a `:`, `name` must not), there should not +be any conflicts in either of these cases. Lastly, when given a component binary, the compile-then-instantiate overloads of `WebAssembly.instantiate(Streaming)` would inherit the compound behavior of the abovementioned functions (again, using the `layer` field to eagerly distinguish between modules and components). -TODO: describe how kebab-names are mapped to JS identifiers - -TODO: describe how the fields can accept either a name or a URL (which are disjoint sets of strings) - For example, the following component: ```wasm ;; a.wasm @@ -1139,8 +1147,10 @@ the same places where modules can be loaded today, branching on the `layer` field in the binary format to determine whether to decode as a module or a component. -TODO: explain how `URL` field is used as module specifier, if present, falling -back to the `name` field, which can be implemented by [import maps] +When the `URL` field of an imported `externname` is present, the `URL` is +used as the module specifier, using the same resolution path as JS module. +Otherwise, the `name` field is used as the module specifier, which requires +[Import Maps] support to resolve to a `URL`. 
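
The lookup rules described above (try the `URL` first and fall back to the `name` for imports, expose both for exports, and prefer the `URL` as the ESM module specifier) can be modelled with plain dictionaries. This is a hedged sketch rather than the JS API itself: the function names are hypothetical, a Python `dict` stands in for a JS object, a missing key stands in for a failed `Get`, and the `name` arguments are assumed to have already been converted to whatever identifier casing the embedding uses.

```python
def read_import(imports_object: dict, name: str, url=None):
    """Model of the import lookup: URL first, then name."""
    if url is not None and url in imports_object:
        return imports_object[url]
    # Raises KeyError when neither key is present.
    return imports_object[name]


def create_exports(exports):
    """Model of the exports object built from (name, url_or_None, value).

    Both the name and the URL (when present) map to the same value.
    Names never contain ':' and URLs always do, so the keys cannot clash.
    """
    exports_object = {}
    for name, url, value in exports:
        exports_object[name] = value
        if url is not None:
            exports_object[url] = value
    return exports_object


def module_specifier(name: str, url=None) -> str:
    """ESM-integration sketch: use the URL if present, else the name."""
    return url if url is not None else name
```

In the name-only case the returned specifier is a bare string, which is why the text notes that Import Maps support is needed to resolve it to a URL.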
The main question is how to deal with component imports having a single string as well as the new importable component, module and instance @@ -1244,6 +1254,7 @@ and will be added over the coming months to complete the MVP proposal: [`enum`]: https://webidl.spec.whatwg.org/#es-enumeration [`T?`]: https://webidl.spec.whatwg.org/#es-nullable-type [`union`]: https://webidl.spec.whatwg.org/#es-union +[`Get`]: https://tc39.es/ecma262/#sec-get-o-p [JS NaN]: https://tc39.es/ecma262/#sec-ecmascript-language-types-number-type [Import Reflection]: https://github.com/tc39-transfer/proposal-import-reflection [Module Record]: https://tc39.es/ecma262/#sec-abstract-module-records From 50c0f0f58a377b2782ede04c229ebb7d1ab46e86 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 26 Sep 2022 12:26:41 -0500 Subject: [PATCH 142/301] Fix typo Co-authored-by: Liam Murphy --- design/mvp/Explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 1386981..c36d807 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -865,7 +865,7 @@ type: ``` a Rust bindings generator for a component with this export could produce a trait `OneTwo` requiring a function `three_four` while the JS Embedding would -expect the JS module implementing this component type to export a function +expect the JS module implementing this component type to export a variable `oneTwo` containing an object with a field `threeFour` containing a function. 
The `name` field can also be used by *host* source-code bindings, defining the From 1f7d00964f4e02795afbf10f71982a23e5c80a1d Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 26 Sep 2022 12:29:19 -0500 Subject: [PATCH 143/301] Sync prose with code snippet --- design/mvp/Explainer.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index c36d807..0b8afe3 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -939,9 +939,9 @@ default world Command { export main: "wasi:cli/main" } ``` -where `wasi:filesystem`, `wasi:log` and `wasi:main` are separately defined -interfaces that map to instance types. This "World" definition then maps to the -following component type: +where `wasi:filesystem`, `wasi:cli/console` and `wasi:cli/main` are separately +defined interfaces that map to instance types. This "World" definition then +maps to the following component type: ``` (component $Command (import "fs" "wasi:filesystem" (instance ... 
filesystem function exports ...)) From eafc45f25049c16423fe7d9b06f8f86165307dd5 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 26 Sep 2022 16:37:35 -0500 Subject: [PATCH 144/301] Change 'implementing' to 'supporting' as word for what a host does with a world --- design/mvp/Explainer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 0b8afe3..d042f53 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -951,7 +951,7 @@ maps to the following component type: ``` A component *targeting* `wasi:cli/Command` would thus need to be a *subtype* of `$Command` (importing a subset of these imports and exporting a superset of -these exports) while a host *implementing* `wasi:cli/Command` would need to be +these exports) while a host *supporting* `wasi:cli/Command` would need to be a *supertype* of `$Command` (offering a superset of these imports and expecting to call a subset of these exports). From cbae8d783467fef8fc295d94fbcf8c6cb07e7a9c Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 27 Sep 2022 14:17:35 -0500 Subject: [PATCH 145/301] Fix RLBox link --- design/high-level/UseCases.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/high-level/UseCases.md b/design/high-level/UseCases.md index d0a7796..857e7cd 100644 --- a/design/high-level/UseCases.md +++ b/design/high-level/UseCases.md @@ -325,7 +325,7 @@ to call imports, which could break other components' single-threaded assumptions the imported function to have been explicitly `shared` and thus callable from any `fork`ed thread. 
-[RLBox]: https://docs.rlbox.dev/ +[RLBox]: https://rlbox.dev/ [Principle of Least Authority]: https://en.wikipedia.org/wiki/Principle_of_least_privilege [Modular Programming]: https://en.wikipedia.org/wiki/Modular_programming [start function]: https://webassembly.github.io/spec/core/intro/overview.html#semantic-phases From f70a67fd3f100d385c21d18032fa9fdf9dfe67c1 Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Wed, 28 Sep 2022 07:26:16 -0700 Subject: [PATCH 146/301] Fix links to SharedEverythingDynamicLinking.md. Fix the link to SharedEverythingDynamicLinking.md, which was previously missing the word "dynamic". --- design/mvp/CanonicalABI.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index e4cc8d9..b04ef8a 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1422,7 +1422,7 @@ synthetic `cabi_start` function that is called immediately after instantiation. For imports (which in Core WebAssembly are [two-level]), the first-level name is set to be a zero-length string so that the entire rest of the first-level -string space is available for [shared-everything linking]. +string space is available for [shared-everything dynamic linking]. 
For imports and exports, the Canonical ABI assumes that `_` is not a valid character in a component-level import/export (as is currently the case in `wit` @@ -1596,7 +1596,7 @@ def lift_canonical_module(module: Module) -> Component: [Component Invariants]: Explainer.md#component-invariants [JavaScript Embedding]: Explainer.md#JavaScript-embedding [Adapter Functions]: FutureFeatures.md#custom-abis-via-adapter-functions -[Shared-Everything Linking]: examples/SharedEverythingLinking.md +[Shared-Everything Dynamic Linking]: examples/SharedEverythingDynamicLinking.md [Administrative Instructions]: https://webassembly.github.io/spec/core/exec/runtime.html#syntax-instr-admin [Implementation Limits]: https://webassembly.github.io/spec/core/appendix/implementation.html From cc637249c6e0a6e2bcc69d1ea52d013e7a1daa7a Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 29 Sep 2022 16:07:37 -0500 Subject: [PATCH 147/301] Use 'https:' not 'http:' in example --- design/mvp/Explainer.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index d042f53..21f47fb 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -879,12 +879,12 @@ an *externally-defined* specification of what an import "wants" or what an export has "implemented". One example is a URL naming a standard interface such as `wasi:filesystem` (assuming that WASI registered the `wasi:` URI scheme with IANA). Pre-standard, non-standard or proprietary interfaces could be referred -to by an `http:` URL in an interface registry. For imports, a URL could +to by an `https:` URL in an interface registry. For imports, a URL could alternatively refer to a *particular implementation* (e.g., at a hosted storage location) or a *query* for a *set of possible implementations* (e.g., using the query API of a public registry). 
Because of the wide variety of hosts executing components, the Component Model doesn't specify how URLs are to be interpreted, -just that they are grammatically URLs. Even `http:` URLs may or may not be +just that they are grammatically URLs. Even `https:` URLs may or may not be literally fetched by the host (c.f. [import maps]). When present, `URL`s must *also* be unique (*in addition* the abovementioned From e0cf9522693486759cc8ee5d1b0316b85da5e154 Mon Sep 17 00:00:00 2001 From: George Kulakowski Date: Mon, 3 Oct 2022 12:10:40 -0700 Subject: [PATCH 148/301] Remove `handle` as a keyword Owned/unique handles to resources are now just directly named, and this keyword is unused. --- design/mvp/WIT.md | 1 - 1 file changed, 1 deletion(-) diff --git a/design/mvp/WIT.md b/design/mvp/WIT.md index 7f3a179..e344cf9 100644 --- a/design/mvp/WIT.md +++ b/design/mvp/WIT.md @@ -94,7 +94,6 @@ keyword ::= 'use' | 's8' | 's16' | 's32' | 's64' | 'float32' | 'float64' | 'char' - | 'handle' | 'record' | 'enum' | 'flags' From f0994872bf1134ae51bc0db5efe0c8d0cf1159ae Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 6 Oct 2022 10:47:06 -0500 Subject: [PATCH 149/301] Back out Canonical ABI mangling scheme in preparation for future alternative --- design/mvp/CanonicalABI.md | 275 ------------------------ design/mvp/canonical-abi/definitions.py | 151 ------------- design/mvp/canonical-abi/run_tests.py | 93 -------- 3 files changed, 519 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index b04ef8a..330ded8 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1313,281 +1313,6 @@ the AOT compiler as requiring an intermediate copy to implement the above `lift`-then-`lower` semantics. -## Canonical ABI - -The above `canon` definitions are parameterized, giving each component a small -space of ABI options for interfacing with its contained core modules. 
Moreover, -each component can choose its ABI options independently of each other component, -with compiled adapter trampolines handling any conversions at cross-component -call boundaries. However, in some contexts, it is useful to fix a **single**, -"**canonical**" ABI that is fully determined by a given component type (which -itself is fully determined by a set of [`wit`](WIT.md) files). For example, -this allows existing Core WebAssembly toolchains to continue targeting [WASI] -by importing and exporting fixed Core Module functions signatures, without -having to add any new component-model concepts. - -To support these use cases, the following section defines two new mappings: -1. `canonical-module-type : componenttype -> core:moduletype` -2. `lift-canonical-module : core:module -> component` - -The `canonical-module-type` mapping defines the collection of core function -signatures that a core module must import and export to implement the given -component type via the Canonical ABI. - -The `lift-canonical-module` mapping defines the runtime behavior of a core -module that has successfully implemented `canonical-module-type` by fixing -a canonical set of ABI options that are passed to the above-defined `canon` -definitions. - -Together, these definitions are intended to satisfy the invariant: -``` -for all m : core:module and ct : componenttype: - module-type(m) = canonical-module-type(ct) implies ct = type-of(lift-canonical-module(m)) -``` -One consequence of this is that the canonical `core:moduletype` must encode -enough high-level type information for `lift-canonical-module` to be able to -reconstruct a working component. This is achieved using [name mangling]. 
Unlike -traditional C-family name mangling, which have a limited character set imposed -by linkers and aim to be space-efficient enough to support millions of -*internal* names, the Canonical ABI can use any valid UTF-8 string and only -needs to mangle *external* names, of which there will only be a handful. -Therefore, squeezing out every byte is a lower concern and so, for simplicity -and readability, type information is mangled using a subset of the -[`wit`](WIT.md) syntax. - -One final point of note is that `lift-canonical-module` is only able to produce -a *subset* of all possible components (e.g., not covering nesting and -virtualization scenarios); to express the full variety of components, a -toolchain needs to emit proper components directly. Thus, the Canonical ABI -serves as an incremental adoption path to the full component model, allowing -existing Core WebAssembly toolchains to produce simple components simply by -emitting module imports and exports with the appropriate mangled names (e.g., -in LLVM using the [`import_name`] and [`export_name`] attributes). - - -### Canonical Module Type - -For the same reason that core module and component [binaries](Binary.md) -include a version number (that is intended to never change after it reaches -1.0), the Canonical ABI defines its own version that is explicitly declared by -a core module. Before reaching stable 1.0, the Canonical ABI is explicitly -allowed to make breaking changes, so this version also serves the purpose of -coordinating breaking changes in pre-1.0 tools and runtimes. 
-```python -CABI_VERSION = '0.1' -``` -Working top-down, a canonical module type is defined by the following mapping: -```python -def canonical_module_type(ct: ComponentType) -> ModuleType: - start_params, import_funcs = mangle_instances(ct.imports) - start_results, export_funcs = mangle_instances(ct.exports) - - imports = [] - for name,ft in import_funcs: - flat_ft = flatten_functype(ft, 'lower') - imports.append(CoreImportDecl('', mangle_funcname(name, ft), flat_ft)) - - exports = [] - exports.append(CoreExportDecl('cabi_memory', CoreMemoryType(initial=0, maximum=None))) - exports.append(CoreExportDecl('cabi_realloc', CoreFuncType(['i32','i32','i32','i32'], ['i32']))) - - start_ft = FuncType(start_params, start_results) - start_name = mangle_funcname('cabi_start{cabi=' + CABI_VERSION + '}', start_ft) - exports.append(CoreExportDecl(start_name, flatten_functype(start_ft, 'lift'))) - - for name,ft in export_funcs: - flat_ft = flatten_functype(ft, 'lift') - exports.append(CoreExportDecl(mangle_funcname(name, ft), flat_ft)) - if any(contains_dynamic_allocation(t) for t in ft.results): - exports.append(CoreExportDecl('cabi_post_' + name, CoreFuncType(flat_ft.results, []))) - - return ModuleType(imports, exports) - -def contains_dynamic_allocation(t): - match despecialize(t): - case String() : return True - case List(t) : return True - case Record(fields) : return any(contains_dynamic_allocation(f.t) for f in fields) - case Variant(cases) : return any(contains_dynamic_allocation(c.t) for c in cases) - case _ : return False -``` -This definition starts by mangling all nested instances into the names of the -leaf fields, so that instances can be subsequently ignored. Next, each -component-level function import/export is mapped to corresponding core function -import/export with the function type mangled into the name. 
Additionally, each -export whose return type implies possible dynamic allocation is given a -`post-return` function so that it can deallocate after the caller reads the -return value. Lastly, all value imports and exports are concatenated into a -synthetic `cabi_start` function that is called immediately after instantiation. - -For imports (which in Core WebAssembly are [two-level]), the first-level name -is set to be a zero-length string so that the entire rest of the first-level -string space is available for [shared-everything dynamic linking]. - -For imports and exports, the Canonical ABI assumes that `_` is not a valid -character in a component-level import/export (as is currently the case in `wit` -[identifiers](WIT.md#identifiers)) and thus can safely be used to prefix -auxiliary Canonical ABI-induced imports/exports. - -#### Instance type mangling - -Instance-type mangling recursively builds a dotted path string (of instance names) -that is included in the mangled core import/export name: -```python -def mangle_instances(xs, path = ''): - values = [] - funcs = [] - for x in xs: - name = path + x.name - match x.t: - case ValueType(t): - values.append( (name, t) ) - case FuncType(params,results): - funcs.append( (name, x.t) ) - case InstanceType(exports): - vs,fs = mangle_instances(exports, name + '.') - values += vs - funcs += fs - case TypeType(bounds): - assert(False) # TODO: resource types - case ComponentType(imports, exports): - assert(False) # TODO: `canon instantiate` - case ModuleType(imports, exports): - assert(False) # TODO: canonical shared-everything linking - return (values, funcs) -``` -The three `TODO` cases are intended to be filled in by future PRs extending -the Canonical ABI. 
- -#### Function type mangling - -Function types are mangled into [`wit`](WIT.md)-compatible syntax: -```python -def mangle_funcname(name, ft): - params = mangle_named_types(ft.params) - if len(ft.results) == 1 and isinstance(ft.results[0], ValType): - results = mangle_valtype(ft.results[0]) - else: - results = mangle_named_types(ft.results) - return f'{name}: func{params} -> {results}' - -def mangle_named_types(nts): - assert(all(type(nt) == tuple and len(nt) == 2 for nt in nts)) - mangled_elems = (nt[0] + ': ' + mangle_valtype(nt[1]) for nt in nts) - return '(' + ', '.join(mangled_elems) + ')' -``` - -#### Value type mangling - -Value types are similarly mangled into [`wit`](WIT.md)-compatible syntax, -recursively: - -``` -def mangle_valtype(t): - match t: - case Bool() : return 'bool' - case S8() : return 's8' - case U8() : return 'u8' - case S16() : return 's16' - case U16() : return 'u16' - case S32() : return 's32' - case U32() : return 'u32' - case S64() : return 's64' - case U64() : return 'u64' - case Float32() : return 'float32' - case Float64() : return 'float64' - case Char() : return 'char' - case String() : return 'string' - case List(t) : return 'list<' + mangle_valtype(t) + '>' - case Record(fields) : return mangle_recordtype(fields) - case Tuple(ts) : return mangle_tupletype(ts) - case Flags(labels) : return mangle_flags(labels) - case Variant(cases) : return mangle_varianttype(cases) - case Enum(labels) : return mangle_enumtype(labels) - case Union(ts) : return mangle_uniontype(ts) - case Option(t) : return mangle_optiontype(t) - case Result(ok,error) : return mangle_resulttype(ok,error) - -def mangle_recordtype(fields): - mangled_fields = (f.label + ': ' + mangle_valtype(f.t) for f in fields) - return 'record { ' + ', '.join(mangled_fields) + ' }' - -def mangle_tupletype(ts): - return 'tuple<' + ', '.join(mangle_valtype(t) for t in ts) + '>' - -def mangle_flags(labels): - return 'flags { ' + ', '.join(labels) + ' }' - -def 
mangle_varianttype(cases):
-  mangled_cases = ('{label}{payload}'.format(
-                     label = c.label,
-                     payload = '' if c.t is None else '(' + mangle_valtype(c.t) + ')')
-                   for c in cases)
-  return 'variant { ' + ', '.join(mangled_cases) + ' }'
-
-def mangle_enumtype(labels):
-  return 'enum { ' + ', '.join(labels) + ' }'
-
-def mangle_uniontype(ts):
-  return 'union { ' + ', '.join(mangle_valtype(t) for t in ts) + ' }'
-
-def mangle_optiontype(t):
-  return 'option<' + mangle_valtype(t) + '>'
-
-def mangle_resulttype(ok, error):
-  match (ok, error):
-    case (None, None) : return 'result'
-    case (None, _) : return 'result<_, ' + mangle_valtype(error) + '>'
-    case (_, None) : return 'result<' + mangle_valtype(ok) + '>'
-    case (_, _) : return 'result<' + mangle_valtype(ok) + ', ' + mangle_valtype(error) + '>'
-```
-As an example, given a component type:
-```wasm
-(component
-  (import "foo" (func))
-  (import "a" (instance
-    (export "bar" (func (param "x" u32) (param "y" u32) (result u32)))
-  ))
-  (import "v1" (value string))
-  (export "baz" (func (param "s" string) (result string)))
-  (export "v2" (value list<list<string>>))
-)
-```
-the `canonical_module_type` would be:
-```wasm
-(module
-  (import "" "foo: func() -> ()" (func))
-  (import "" "a.bar: func(x: u32, y: u32) -> u32" (func param i32 i32) (result i32))
-  (export "cabi_memory" (memory 0))
-  (export "cabi_realloc" (func (param i32 i32 i32 i32) (result i32)))
-  (export "cabi_start{cabi=0.1}: func(v1: string) -> (v2: list<list<string>>)" (func (param i32 i32) (result i32)))
-  (export "baz: func(s: string) -> string" (func (param i32 i32) (result i32)))
-  (export "cabi_post_baz" (func (param i32)))
-)
-```
-
-### Lifting Canonical Modules
-
-TODO
-
-```python
-class Module:
-  t: ModuleType
-  instantiate: Callable[typing.List[typing.Tuple[str,str,Value]], typing.List[typing.Tuple[str,Value]]]
-
-class Component:
-  t: ComponentType
-  instantiate: Callable[typing.List[typing.Tuple[str,any]], typing.List[typing.Tuple[str,any]]]
-
-def
lift_canonical_module(module: Module) -> Component: - # TODO: define component.instantiate by: - # 1. creating canonical import adapters - # 2. creating a core module instance that imports (1) - # 3. creating canonical export adapters from the exports of (2) - pass -``` - - [Canonical Definitions]: Explainer.md#canonical-definitions [`canonopt`]: Explainer.md#canonical-definitions diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index cd2ff9b..805ad33 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -1040,154 +1040,3 @@ def canon_lower(caller_opts, caller_instance, callee, ft, flat_args): return flat_results -### Canonical Module Type - -CABI_VERSION = '0.1' - -# - -def canonical_module_type(ct: ComponentType) -> ModuleType: - start_params, import_funcs = mangle_instances(ct.imports) - start_results, export_funcs = mangle_instances(ct.exports) - - imports = [] - for name,ft in import_funcs: - flat_ft = flatten_functype(ft, 'lower') - imports.append(CoreImportDecl('', mangle_funcname(name, ft), flat_ft)) - - exports = [] - exports.append(CoreExportDecl('cabi_memory', CoreMemoryType(initial=0, maximum=None))) - exports.append(CoreExportDecl('cabi_realloc', CoreFuncType(['i32','i32','i32','i32'], ['i32']))) - - start_ft = FuncType(start_params, start_results) - start_name = mangle_funcname('cabi_start{cabi=' + CABI_VERSION + '}', start_ft) - exports.append(CoreExportDecl(start_name, flatten_functype(start_ft, 'lift'))) - - for name,ft in export_funcs: - flat_ft = flatten_functype(ft, 'lift') - exports.append(CoreExportDecl(mangle_funcname(name, ft), flat_ft)) - if any(contains_dynamic_allocation(t) for t in ft.results): - exports.append(CoreExportDecl('cabi_post_' + name, CoreFuncType(flat_ft.results, []))) - - return ModuleType(imports, exports) - -def contains_dynamic_allocation(t): - match despecialize(t): - case String() : return True - case List(t) : return True - 
case Record(fields) : return any(contains_dynamic_allocation(f.t) for f in fields) - case Variant(cases) : return any(contains_dynamic_allocation(c.t) for c in cases) - case _ : return False - -# - -def mangle_instances(xs, path = ''): - values = [] - funcs = [] - for x in xs: - name = path + x.name - match x.t: - case ValueType(t): - values.append( (name, t) ) - case FuncType(params,results): - funcs.append( (name, x.t) ) - case InstanceType(exports): - vs,fs = mangle_instances(exports, name + '.') - values += vs - funcs += fs - case TypeType(bounds): - assert(False) # TODO: resource types - case ComponentType(imports, exports): - assert(False) # TODO: `canon instantiate` - case ModuleType(imports, exports): - assert(False) # TODO: canonical shared-everything linking - return (values, funcs) - -# - -def mangle_funcname(name, ft): - params = mangle_named_types(ft.params) - if len(ft.results) == 1 and isinstance(ft.results[0], ValType): - results = mangle_valtype(ft.results[0]) - else: - results = mangle_named_types(ft.results) - return f'{name}: func{params} -> {results}' - -def mangle_named_types(nts): - assert(all(type(nt) == tuple and len(nt) == 2 for nt in nts)) - mangled_elems = (nt[0] + ': ' + mangle_valtype(nt[1]) for nt in nts) - return '(' + ', '.join(mangled_elems) + ')' - -# - -def mangle_valtype(t): - match t: - case Bool() : return 'bool' - case S8() : return 's8' - case U8() : return 'u8' - case S16() : return 's16' - case U16() : return 'u16' - case S32() : return 's32' - case U32() : return 'u32' - case S64() : return 's64' - case U64() : return 'u64' - case Float32() : return 'float32' - case Float64() : return 'float64' - case Char() : return 'char' - case String() : return 'string' - case List(t) : return 'list<' + mangle_valtype(t) + '>' - case Record(fields) : return mangle_recordtype(fields) - case Tuple(ts) : return mangle_tupletype(ts) - case Flags(labels) : return mangle_flags(labels) - case Variant(cases) : return mangle_varianttype(cases) 
- case Enum(labels) : return mangle_enumtype(labels) - case Union(ts) : return mangle_uniontype(ts) - case Option(t) : return mangle_optiontype(t) - case Result(ok,error) : return mangle_resulttype(ok,error) - -def mangle_recordtype(fields): - mangled_fields = (f.label + ': ' + mangle_valtype(f.t) for f in fields) - return 'record { ' + ', '.join(mangled_fields) + ' }' - -def mangle_tupletype(ts): - return 'tuple<' + ', '.join(mangle_valtype(t) for t in ts) + '>' - -def mangle_flags(labels): - return 'flags { ' + ', '.join(labels) + ' }' - -def mangle_varianttype(cases): - mangled_cases = ('{label}{payload}'.format( - label = c.label, - payload = '' if c.t is None else '(' + mangle_valtype(c.t) + ')') - for c in cases) - return 'variant { ' + ', '.join(mangled_cases) + ' }' - -def mangle_enumtype(labels): - return 'enum { ' + ', '.join(labels) + ' }' - -def mangle_uniontype(ts): - return 'union { ' + ', '.join(mangle_valtype(t) for t in ts) + ' }' - -def mangle_optiontype(t): - return 'option<' + mangle_valtype(t) + '>' - -def mangle_resulttype(ok, error): - match (ok, error): - case (None, None) : return 'result' - case (None, _) : return 'result<_, ' + mangle_valtype(error) + '>' - case (_, None) : return 'result<' + mangle_valtype(ok) + '>' - case (_, _) : return 'result<' + mangle_valtype(ok) + ', ' + mangle_valtype(error) + '>' - - -## Lifting Canonical Modules - -class Module: - t: ModuleType - instantiate: Callable[typing.List[typing.Tuple[str,str,Value]], typing.List[typing.Tuple[str,Value]]] - -class Component: - t: ComponentType - instantiate: Callable[typing.List[typing.Tuple[str,any]], typing.List[typing.Tuple[str,any]]] - -def lift_canonical_module(module: Module) -> Component: - pass # TODO diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index fd81173..ce351fb 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -374,97 +374,4 @@ def test_roundtrip(t, v): 
 test_roundtrip(List(List(String())), [[mk_str("one"),mk_str("two")],[mk_str("three")]])
 test_roundtrip(List(Option(Tuple([String(),U16()]))), [{'some':mk_tup(mk_str("answer"),42)}])

-def test_mangle_functype(params, results, expect):
-  ft = FuncType(params, results)
-  got = mangle_funcname('x', ft)
-  expect = 'x: ' + expect
-  if got != expect:
-    fail("test_mangle_func() got:\n {}\nexpected:\n {}".format(got, expect))
-
-test_mangle_functype([('x',U8())], [U8()], 'func(x: u8) -> u8')
-test_mangle_functype([('x',U8())], [], 'func(x: u8) -> ()')
-test_mangle_functype([], [U8()], 'func() -> u8')
-test_mangle_functype([('x',U8())], [('y',U8())], 'func(x: u8) -> (y: u8)')
-test_mangle_functype([('a',Bool()),('b',U8()),('c',S16()),('d',U32()),('e',S64())],
-                     [('a',S8()),('b',U16()),('c',S32()),('d',U64())],
-                     'func(a: bool, b: u8, c: s16, d: u32, e: s64) -> (a: s8, b: u16, c: s32, d: u64)')
-test_mangle_functype([('l',List(List(String())))], [],
-                     'func(l: list<list<string>>) -> ()')
-test_mangle_functype([('r',Record([Field('x',Record([Field('y',String())])),Field('z',U32())]))], [],
-                     'func(r: record { x: record { y: string }, z: u32 }) -> ()')
-test_mangle_functype([('t',Tuple([U8()]))], [Tuple([U8(),U8()])],
-                     'func(t: tuple<u8>) -> tuple<u8, u8>')
-test_mangle_functype([('f',Flags(['a','b']))], [Enum(['a','b'])],
-                     'func(f: flags { a, b }) -> enum { a, b }')
-test_mangle_functype([('v',Variant([Case('a',None),Case('b',U8())]))], [Union([U8(),List(String())])],
-                     'func(v: variant { a, b(u8) }) -> union { u8, list<string> }')
-test_mangle_functype([('o',Option(Bool()))],[Option(List(U8()))],
-                     'func(o: option<bool>) -> option<list<u8>>')
-test_mangle_functype([], [('a',Result(None,None)),('b',Result(U8(),None)),('c',Result(None,U8()))],
-                     'func() -> (a: result, b: result<u8>, c: result<_, u8>)')
-
-def test_cabi(ct, expect):
-  got = canonical_module_type(ct)
-  if got != expect:
-    fail("test_cabi() got:\n {}\nexpected:\n {}".format(got, expect))
-
-test_cabi(
-  ComponentType(
-    [ExternDecl('a', FuncType([('x',U8())],[U8()])),
-     ExternDecl('b', ValueType(String()))],
-    [ExternDecl('c', FuncType([('x',S8())],[S8()])),
-     ExternDecl('d', ValueType(List(U8())))]
-  ),
-  ModuleType(
-    [CoreImportDecl('','a: func(x: u8) -> u8', CoreFuncType(['i32'],['i32']))],
-    [CoreExportDecl('cabi_memory', CoreMemoryType(0, None)),
-     CoreExportDecl('cabi_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])),
-     CoreExportDecl('cabi_start{cabi=0.1}: func(b: string) -> (d: list<u8>)',
-                    CoreFuncType(['i32','i32'],['i32'])),
-     CoreExportDecl('c: func(x: s8) -> s8', CoreFuncType(['i32'],['i32']))]
-  )
-)
-test_cabi(
-  ComponentType(
-    [ExternDecl('a', InstanceType([
-      ExternDecl('b', FuncType([('x',U8())],[U8()])),
-      ExternDecl('c', ValueType(Float32()))
-    ]))],
-    [ExternDecl('d', InstanceType([
-      ExternDecl('e', FuncType([], [List(String())])),
-      ExternDecl('f', ValueType(Float64()))
-    ]))]
-  ),
-  ModuleType(
-    [CoreImportDecl('','a.b: func(x: u8) -> u8', CoreFuncType(['i32'],['i32']))],
-    [CoreExportDecl('cabi_memory', CoreMemoryType(0, None)),
-     CoreExportDecl('cabi_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])),
-     CoreExportDecl('cabi_start{cabi=0.1}: func(a.c: float32) -> (d.f: float64)',
-                    CoreFuncType(['f32'],['f64'])),
-     CoreExportDecl('d.e: func() -> list<string>', CoreFuncType([],['i32'])),
-     CoreExportDecl('cabi_post_d.e', CoreFuncType(['i32'],[]))]
-  )
-)
-test_cabi( # from CanonicalABI.md
-  ComponentType(
-    [ExternDecl('foo', FuncType([],[])),
-     ExternDecl('a', InstanceType([
-      ExternDecl('bar', FuncType([('x', U32()),('y', U32())],[U32()]))
-     ])),
-     ExternDecl('v1', ValueType(String()))],
-    [ExternDecl('baz', FuncType([('s',String())], [String()])),
-     ExternDecl('v2', ValueType(List(List(String()))))]
-  ),
-  ModuleType(
-    [CoreImportDecl('','foo: func() -> ()', CoreFuncType([],[])),
-     CoreImportDecl('','a.bar: func(x: u32, y: u32) -> u32', CoreFuncType(['i32','i32'],['i32']))],
-    [CoreExportDecl('cabi_memory', CoreMemoryType(0, None)),
-     CoreExportDecl('cabi_realloc', CoreFuncType(['i32','i32','i32','i32'],['i32'])),
-     CoreExportDecl('cabi_start{cabi=0.1}: func(v1: string) -> (v2: list<list<string>>)',
-                    CoreFuncType(['i32','i32'],['i32'])),
-     CoreExportDecl('baz: func(s: string) -> string', CoreFuncType(['i32','i32'],['i32'])),
-     CoreExportDecl('cabi_post_baz', CoreFuncType(['i32'],[]))]
-  )
-)
-
 print("All tests passed")

From 30720ab704199bc829ddf7782dcfed30f909f9d1 Mon Sep 17 00:00:00 2001
From: Brian H
Date: Mon, 10 Oct 2022 13:27:26 -0400
Subject: [PATCH 150/301] WIT Syntax: interface

Signed-off-by: Brian H
---
 design/mvp/WIT.md | 71 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 45 insertions(+), 26 deletions(-)

diff --git a/design/mvp/WIT.md b/design/mvp/WIT.md
index 9906aed..4582dee 100644
--- a/design/mvp/WIT.md
+++ b/design/mvp/WIT.md
@@ -32,6 +32,7 @@ token ::= whitespace
         | operator
         | keyword
         | identifier
+        | strlit
 ```

 Whitespace and comments are ignored when parsing structures defined elsewhere
@@ -119,6 +120,50 @@ A `wit` document is a sequence of items specified at the top level. These items
 come one after another and it's recommended to separate them with newlines for
 readability but this isn't required.

+Concretely, the structure of a `wit` document is:
+```
+wit-document ::= interface-item*
+```
+
+## Item: `interface`
+
+Interfaces can be defined in a `wit` document. Interfaces have a name and a sequence of items and functions.
+
+```wit
+interface example {
+    thunk: func() -> ()
+    fibonacci: func(n: u32) -> u32
+}
+```
+
+Specifically interfaces have the structure:
+
+```wit
+interface-item ::= 'interface' id strlit?
'{' interface-items* '}' + +interface-items ::= resource-item + | variant-items + | record-item + | union-items + | flags-items + | enum-items + | type-item + | use-item + | func-item + +func-item ::= id ':' 'func' param-list '->' result-list + +param-list ::= '(' named-type-list ')' + +result-list ::= ty + | '(' named-type-list ') + +named-type-list ::= nil + | named-type ( ',' named-type )* + +named-type ::= id ':' ty +``` + ## Item: `use` A `use` statement enables importing type or resource definitions from other @@ -331,32 +376,6 @@ union-cases ::= ty, | ty ',' union-cases? ``` -## Item: `func` - -Functions can also be defined in a `wit` document. Functions have a name, -parameters, and results. - -```wit -thunk: func() -> () -fibonacci: func(n: u32) -> u32 -``` - -Specifically functions have the structure: - -```wit -func-item ::= id ':' 'func' param-list '->' result-list - -param-list ::= '(' named-type-list ')' - -result-list ::= ty - | '(' named-type-list ') - -named-type-list ::= nil - | named-type ( ',' named-type )* - -named-type ::= id ':' ty -``` - ## Item: `resource` Resources represent a value that has a hidden representation not known to the From 07308915a03a5e04220284da7b5b39d2d99c1b73 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 31 Oct 2022 15:07:36 -0500 Subject: [PATCH 151/301] Add extra Canonical ABI note and variant clarification Resolves #119 --- design/mvp/CanonicalABI.md | 21 +++++++++++---------- design/mvp/Explainer.md | 7 +++++++ 2 files changed, 18 insertions(+), 10 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 330ded8..adc73d9 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -116,9 +116,10 @@ def alignment_record(fields): ``` As an optimization, `variant` discriminants are represented by the smallest integer -covering the number of cases in the variant. Depending on the payload type, -this can allow more compact representations of variants in memory. 
This smallest -integer type is selected by the following function, used above and below: +covering the number of cases in the variant (with cases numbered in order from +`0` to `len(cases)-1`). Depending on the payload type, this can allow more +compact representations of variants in memory. This smallest integer type is +selected by the following function, used above and below: ```python def alignment_variant(cases): return max(alignment(discriminant_type(cases)), max_case_alignment(cases)) @@ -366,13 +367,13 @@ guaranteed to be a no-op on the first iteration because the record as a whole starts out aligned (as asserted at the top of `load`). Variants are loaded using the order of the cases in the type to determine the -case index. To support the subtyping allowed by `refines`, a lifted variant -value semantically includes a full ordered list of its `refines` case -labels so that the lowering code (defined below) can search this list to find a -case label it knows about. While the code below appears to perform case-label -lookup at runtime, a normal implementation can build the appropriate index -tables at compile-time so that variant-passing is always O(1) and not involving -string operations. +case index, assigning `0` to the first case, `1` to the next case, etc. To +support the subtyping allowed by `refines`, a lifted variant value semantically +includes a full ordered list of its `refines` case labels so that the lowering +code (defined below) can search this list to find a case label it knows about. +While the code below appears to perform case-label lookup at runtime, a normal +implementation can build the appropriate index tables at compile-time so that +variant-passing is always O(1) and not involving string operations. 
```python def load_variant(opts, ptr, cases): disc_size = size(discriminant_type(cases)) diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index ca7ad7f..359261e 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -530,6 +530,13 @@ subtyping. In particular, a `variant` subtype can contain a `case` not present in the supertype if the subtype's `case` `refines` (directly or transitively) some `case` in the supertype. +How these abstract values are produced and consumed from Core WebAssembly +values and linear memory is configured by the component via *canonical lifting +and lowering definitions*, which are introduced [below](#canonical-definitions). +For example, while abstract `variant`s contain a list of `case`s labelled by +name, canonical lifting and lowering map each case to an `i32` value starting +at `0`. + The sets of values allowed for the remaining *specialized value types* are defined by the following mapping: ``` From 9d0a345d8e041162efbc06ec6c0046752dae9170 Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Tue, 8 Nov 2022 08:01:54 -0800 Subject: [PATCH 152/301] Add a custom `name` section specification This is intended to assist with debugging/reading/writing the text format of components by avoiding the need to have everything be numbers and allowing tools to annotate names of items optionally (or have the names preserved from the text format). 
Closes #14 --- design/mvp/Binary.md | 48 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 48 insertions(+) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index ccf22f9..a93de32 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -325,3 +325,51 @@ Notes: [module-linking]: https://github.com/WebAssembly/module-linking/blob/main/proposals/module-linking/Explainer.md [Basic URL Parser]: https://url.spec.whatwg.org/#concept-basic-url-parser + +## Name Section + +Like the core wasm [name +section](https://webassembly.github.io/spec/core/appendix/custom.html#name-section) +a similar `name` custom section is specified here for components to be able to +name all the declarations that can happen within a component. Similarly like its +core wasm counterpart validity of this custom section is not required and +engines should not reject components which have an invalid `name` section. + +``` +namesec ::= section_0(namedata) +namedata ::= n: (if n = 'name') + sections*:* +subsection ::= 0x00 0x00 funcs: + 0x00 0x01 tables: + 0x00 0x02 memories: + 0x00 0x03 globals: + 0x00 0x10 types: + 0x00 0x11 modules: + 0x00 0x12 instances: + 0x01 funcs: + 0x02 values: + 0x03 types: + 0x04 components: + 0x05 instances: + +corefuncsubsec ::= map: +coretablesubsec ::= map: +corememorysubsec ::= map: +coreglobalsubsec ::= map: +coretypesubsec ::= map: +coremodulesubsec ::= map: +coreinstancesubsec ::= map: + +funcsubsec ::= map: +valuesubsec ::= map: +typesubsec ::= map: +componentsubsec ::= map: +instancesubsec ::= map: + +namemap ::= names:vec() +nameassoc ::= idx: name: +``` + +where `namemap` is the same as for core wasm. A particular `sort` should only +appear once within a `name` section, for example component instances can only be +named once. 
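Reviewer note on the patch above: it says `namemap` is the same as for core wasm, so a minimal sketch of that encoding may help when reading the grammar. It assumes the core wasm binary conventions: a `namemap` is a vector of `nameassoc` entries, each an unsigned-LEB128 `u32` index followed by a length-prefixed UTF-8 name, with indices unique and in increasing order. The helper names (`encode_u32`, `encode_name`, `encode_namemap`, `decode_namemap`) are hypothetical and are not part of the patch or of any tool. The sort prefix and subsection framing are left out on purpose, because later patches in this series change them.

```python
def encode_u32(value):
    """Encode an unsigned 32-bit integer as unsigned LEB128."""
    assert 0 <= value < (1 << 32)
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_name(name):
    """Encode a name as a byte-length prefix followed by its UTF-8 bytes."""
    data = name.encode('utf-8')
    return encode_u32(len(data)) + data


def encode_namemap(names):
    """Encode {index: name} as vec(nameassoc), sorted by increasing index."""
    out = bytearray(encode_u32(len(names)))
    for idx in sorted(names):
        out += encode_u32(idx) + encode_name(names[idx])
    return bytes(out)


def decode_u32(buf, pos):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def decode_namemap(buf):
    """Decode a namemap produced by encode_namemap back into a dict."""
    count, pos = decode_u32(buf, 0)
    names = {}
    for _ in range(count):
        idx, pos = decode_u32(buf, pos)
        length, pos = decode_u32(buf, pos)
        names[idx] = bytes(buf[pos:pos + length]).decode('utf-8')
        pos += length
    return names
```

For example, `encode_namemap({0: "a", 1: "bc"})` yields the count `0x02` followed by the two `(idx, name)` pairs, and `decode_namemap` recovers the original dictionary. This sketch does no validation, in keeping with the patch's statement that engines should not reject components with an invalid `name` section.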
From 8e94d38581d767416453b4af39fb386fbb5aede9 Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Tue, 8 Nov 2022 08:04:28 -0800 Subject: [PATCH 153/301] Simplify with just a `sort:` --- design/mvp/Binary.md | 27 +-------------------------- 1 file changed, 1 insertion(+), 26 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index a93de32..e766962 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -339,32 +339,7 @@ engines should not reject components which have an invalid `name` section. namesec ::= section_0(namedata) namedata ::= n: (if n = 'name') sections*:* -subsection ::= 0x00 0x00 funcs: - 0x00 0x01 tables: - 0x00 0x02 memories: - 0x00 0x03 globals: - 0x00 0x10 types: - 0x00 0x11 modules: - 0x00 0x12 instances: - 0x01 funcs: - 0x02 values: - 0x03 types: - 0x04 components: - 0x05 instances: - -corefuncsubsec ::= map: -coretablesubsec ::= map: -corememorysubsec ::= map: -coreglobalsubsec ::= map: -coretypesubsec ::= map: -coremodulesubsec ::= map: -coreinstancesubsec ::= map: - -funcsubsec ::= map: -valuesubsec ::= map: -typesubsec ::= map: -componentsubsec ::= map: -instancesubsec ::= map: +subsection ::= sort: names: namemap ::= names:vec() nameassoc ::= idx: name: From 2200307653a2816eb0b2799650417fa5e56b43f1 Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Tue, 8 Nov 2022 08:06:33 -0800 Subject: [PATCH 154/301] Have the size of each section listed --- design/mvp/Binary.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index e766962..1421788 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -339,7 +339,8 @@ engines should not reject components which have an invalid `name` section. 
namesec ::= section_0(namedata) namedata ::= n: (if n = 'name') sections*:* -subsection ::= sort: names: +subsection ::= sort: namesubsection() +namesubsection(B) ::= size: B (if size == |B|) namemap ::= names:vec() nameassoc ::= idx: name: From 85166b93e94c8e23117d97e3e4fcb862009b9c91 Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Tue, 8 Nov 2022 08:12:34 -0800 Subject: [PATCH 155/301] Add a component name subsection --- design/mvp/Binary.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 1421788..280d3ba 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -338,9 +338,13 @@ engines should not reject components which have an invalid `name` section. ``` namesec ::= section_0(namedata) namedata ::= n: (if n = 'name') - sections*:* -subsection ::= sort: namesubsection() -namesubsection(B) ::= size: B (if size == |B|) + name:? + decls*:* +namesubsection_N(B) ::= N: byte size: B (if size == |B|) + +componentnamesubsec ::= namesubsection_0() +declnamesubsec ::= namesubsection_1() +declnames ::= sort: names: namemap ::= names:vec() nameassoc ::= idx: name: From 7b9c9a9ae354c044cfe3b6ae3d63e57a1d450cf8 Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Tue, 8 Nov 2022 08:22:41 -0800 Subject: [PATCH 156/301] Give components a unique custom section name Avoid confusion with the core encoding. --- design/mvp/Binary.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 280d3ba..c25d86a 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -337,7 +337,7 @@ engines should not reject components which have an invalid `name` section. ``` namesec ::= section_0(namedata) -namedata ::= n: (if n = 'name') +namedata ::= n: (if n = 'component-name') name:? 
decls*:* namesubsection_N(B) ::= N: byte size: B (if size == |B|) From f281506cd905c50c7df7c75691dc820844a8ce52 Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Tue, 8 Nov 2022 08:26:51 -0800 Subject: [PATCH 157/301] Fix some formatting --- design/mvp/Binary.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index c25d86a..c940fb3 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -340,7 +340,7 @@ namesec ::= section_0(namedata) namedata ::= n: (if n = 'component-name') name:? decls*:* -namesubsection_N(B) ::= N: byte size: B (if size == |B|) +namesubsection_N(B) ::= N: size: B (if size == |B|) componentnamesubsec ::= namesubsection_0() declnamesubsec ::= namesubsection_1() From 983f01c87d84561c2e05900b1791b9ee54e88a34 Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Tue, 8 Nov 2022 10:46:20 -0800 Subject: [PATCH 158/301] Review comments --- design/mvp/Binary.md | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index c940fb3..0fc460d 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -310,22 +310,6 @@ Notes: independently unique among imports and exports, respectively. * URLs are compared for equality by plain byte identity. 
- -[`core:u32`]: https://webassembly.github.io/spec/core/binary/values.html#integers -[`core:section`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-section -[`core:custom`]: https://webassembly.github.io/spec/core/binary/modules.html#custom-section -[`core:module`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-module -[`core:version`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-version -[`core:name`]: https://webassembly.github.io/spec/core/binary/values.html#binary-name -[`core:import`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-import -[`core:importdesc`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-importdesc -[`core:functype`]: https://webassembly.github.io/spec/core/binary/types.html#binary-functype - -[type-imports]: https://github.com/WebAssembly/proposal-type-imports/blob/master/proposals/type-imports/Overview.md -[module-linking]: https://github.com/WebAssembly/module-linking/blob/main/proposals/module-linking/Explainer.md - -[Basic URL Parser]: https://url.spec.whatwg.org/#concept-basic-url-parser - ## Name Section Like the core wasm [name @@ -339,12 +323,12 @@ engines should not reject components which have an invalid `name` section. namesec ::= section_0(namedata) namedata ::= n: (if n = 'component-name') name:? - decls*:* + sortnames*:* namesubsection_N(B) ::= N: size: B (if size == |B|) componentnamesubsec ::= namesubsection_0() -declnamesubsec ::= namesubsection_1() -declnames ::= sort: names: +sortnamesubsec ::= namesubsection_1() +sortnames ::= sort: names: namemap ::= names:vec() nameassoc ::= idx: name: @@ -353,3 +337,19 @@ nameassoc ::= idx: name: where `namemap` is the same as for core wasm. A particular `sort` should only appear once within a `name` section, for example component instances can only be named once. 
+ + +[`core:u32`]: https://webassembly.github.io/spec/core/binary/values.html#integers +[`core:section`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-section +[`core:custom`]: https://webassembly.github.io/spec/core/binary/modules.html#custom-section +[`core:module`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-module +[`core:version`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-version +[`core:name`]: https://webassembly.github.io/spec/core/binary/values.html#binary-name +[`core:import`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-import +[`core:importdesc`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-importdesc +[`core:functype`]: https://webassembly.github.io/spec/core/binary/types.html#binary-functype + +[type-imports]: https://github.com/WebAssembly/proposal-type-imports/blob/master/proposals/type-imports/Overview.md +[module-linking]: https://github.com/WebAssembly/module-linking/blob/main/proposals/module-linking/Explainer.md + +[Basic URL Parser]: https://url.spec.whatwg.org/#concept-basic-url-parser From 8a5669d4b6e534f3cd70a4b90f482ab94e4e891d Mon Sep 17 00:00:00 2001 From: Brian H Date: Thu, 4 Aug 2022 14:07:13 -0400 Subject: [PATCH 159/301] Document proposed `WIT` to enable component Worlds This PR sketches out the initial format for WebAssembly profiles, i.e. world files. Signed-off-by: Brian H --- design/mvp/WIT.md | 33 +++++++++++++++++++++++++-------- 1 file changed, 25 insertions(+), 8 deletions(-) diff --git a/design/mvp/WIT.md b/design/mvp/WIT.md index 4582dee..667ced5 100644 --- a/design/mvp/WIT.md +++ b/design/mvp/WIT.md @@ -112,6 +112,9 @@ keyword ::= 'use' | 'tuple' | 'future' | 'stream' + | 'world' + | 'import' + | 'export' ``` ## Top-level items @@ -122,20 +125,32 @@ readability but this isn't required. 
Concretely, the structure of a `wit` document is: ``` -wit-document ::= interface-item* +wit-document ::= (interface-item | world-item)* ``` -## Item: `interface` +## Item: `world` -Interfaces can be defined in a `wit` document. Interfaces have a name and a sequence of items and functions. +Worlds define a [componenttype](https://github.com/WebAssembly/component-model/blob/main/design/mvp/Explainer.md#type-definitions) as a collection of imports and exports. + +Concretely, the structure of a world is: ```wit -interface example { - thunk: func() -> () - fibonacci: func(n: u32) -> u32 -} +world-item ::= 'world' id '{' world-items* '}' + +world-items ::= export-item | import-item + +export-item ::= 'export' id ':' extern-type +import-item ::= 'import' id ':' extern-type + +extern-type ::= ty | func-type | interface-type + +interface-type ::= 'interface' '{' interface-items* '}' ``` +## Item: `interface` + +Interfaces can be defined in a `wit` document. Interfaces have a name and a sequence of items and functions. 
+ Specifically interfaces have the structure: ```wit @@ -151,7 +166,9 @@ interface-items ::= resource-item | use-item | func-item -func-item ::= id ':' 'func' param-list '->' result-list +func-item ::= id ':' func-type + +func-type ::= 'func' param-list '->' result-list param-list ::= '(' named-type-list ')' From a9f4989112923eefcf94c6ad206252e89adfe555 Mon Sep 17 00:00:00 2001 From: Patrick Huber <182398+patrickhuber@users.noreply.github.com> Date: Tue, 22 Nov 2022 12:18:30 -0500 Subject: [PATCH 160/301] Update WIT.md adds closing single quote --- design/mvp/WIT.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design/mvp/WIT.md b/design/mvp/WIT.md index 4582dee..d6a9564 100644 --- a/design/mvp/WIT.md +++ b/design/mvp/WIT.md @@ -156,7 +156,7 @@ func-item ::= id ':' 'func' param-list '->' result-list param-list ::= '(' named-type-list ')' result-list ::= ty - | '(' named-type-list ') + | '(' named-type-list ')' named-type-list ::= nil | named-type ( ',' named-type )* From 7005516668a46a5faa05c3661b4f601d248bffc5 Mon Sep 17 00:00:00 2001 From: Patrick Huber <182398+patrickhuber@users.noreply.github.com> Date: Tue, 22 Nov 2022 16:20:49 -0500 Subject: [PATCH 161/301] Update WIT.md see issue #132 --- design/mvp/WIT.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/design/mvp/WIT.md b/design/mvp/WIT.md index d6a9564..d7f09dd 100644 --- a/design/mvp/WIT.md +++ b/design/mvp/WIT.md @@ -273,7 +273,7 @@ Specifically the structure of this is: ```wit flags-items ::= 'flags' id '{' flags-fields '}' -flags-fields ::= id, +flags-fields ::= id | id ',' flags-fields? ``` @@ -302,7 +302,7 @@ Specifically the structure of this is: ```wit variant-items ::= 'variant' id '{' variant-cases '}' -variant-cases ::= variant-case, +variant-cases ::= variant-case | variant-case ',' variant-cases? 
variant-case ::= id @@ -341,7 +341,7 @@ Specifically the structure of this is: ```wit enum-items ::= 'enum' id '{' enum-cases '}' -enum-cases ::= id, +enum-cases ::= id | id ',' enum-cases? ``` @@ -372,7 +372,7 @@ Specifically the structure of this is: ```wit union-items ::= 'union' id '{' union-cases '}' -union-cases ::= ty, +union-cases ::= ty | ty ',' union-cases? ``` From 6417d2c933cd37dd263beba0bc69c060b67bcf01 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Wed, 30 Nov 2022 18:32:55 -0600 Subject: [PATCH 162/301] Wrap the 'opts' param in a 'cx' param in preparation for resource types additions --- design/mvp/CanonicalABI.md | 417 ++++++++++++------------ design/mvp/canonical-abi/definitions.py | 381 +++++++++++----------- design/mvp/canonical-abi/run_tests.py | 67 ++-- 3 files changed, 437 insertions(+), 428 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index adc73d9..2c1d6dd 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -204,51 +204,63 @@ def num_i32_flags(labels): return math.ceil(len(labels) / 32) ``` +### Context -### Loading - -The `load` function defines how to read a value of a given value type `t` -out of linear memory starting at offset `ptr`, returning the value represented -as a Python value. The `Opts`/`opts` class/parameter contains the -[`canonopt`] immediates supplied as part of `canon lift`/`canon lower`. 
-Presenting the definition of `load` piecewise, we start with the top-level case -analysis: +The subsequent definitions of loading and storing a value from linear memory +require additional context, which is threaded through most subsequent +definitions via the `cx` parameter: ```python -class Opts: - string_encoding: str +class CanonicalOptions: memory: bytearray + string_encoding: str realloc: Callable[[int,int,int,int],int] post_return: Callable[[],None] -def load(opts, ptr, t): +class Context: + opts: CanonicalOptions +``` +Going through the fields of `Context`: + +The `opts` field represents the [`canonopt`] values supplied to +currently-executing `canon lift` or `canon lower`. + +(Others will be added shortly.) + +### Loading + +The `load` function defines how to read a value of a given value type `t` +out of linear memory starting at offset `ptr`, returning the value represented +as a Python value. Presenting the definition of `load` piecewise, we start with +the top-level case analysis: +```python +def load(cx, ptr, t): assert(ptr == align_to(ptr, alignment(t))) - assert(ptr + size(t) <= len(opts.memory)) + assert(ptr + size(t) <= len(cx.opts.memory)) match despecialize(t): - case Bool() : return convert_int_to_bool(load_int(opts, ptr, 1)) - case U8() : return load_int(opts, ptr, 1) - case U16() : return load_int(opts, ptr, 2) - case U32() : return load_int(opts, ptr, 4) - case U64() : return load_int(opts, ptr, 8) - case S8() : return load_int(opts, ptr, 1, signed=True) - case S16() : return load_int(opts, ptr, 2, signed=True) - case S32() : return load_int(opts, ptr, 4, signed=True) - case S64() : return load_int(opts, ptr, 8, signed=True) - case Float32() : return canonicalize32(reinterpret_i32_as_float(load_int(opts, ptr, 4))) - case Float64() : return canonicalize64(reinterpret_i64_as_float(load_int(opts, ptr, 8))) - case Char() : return i32_to_char(opts, load_int(opts, ptr, 4)) - case String() : return load_string(opts, ptr) - case List(t) : return 
load_list(opts, ptr, t) - case Record(fields) : return load_record(opts, ptr, fields) - case Variant(cases) : return load_variant(opts, ptr, cases) - case Flags(labels) : return load_flags(opts, ptr, labels) + case Bool() : return convert_int_to_bool(load_int(cx, ptr, 1)) + case U8() : return load_int(cx, ptr, 1) + case U16() : return load_int(cx, ptr, 2) + case U32() : return load_int(cx, ptr, 4) + case U64() : return load_int(cx, ptr, 8) + case S8() : return load_int(cx, ptr, 1, signed=True) + case S16() : return load_int(cx, ptr, 2, signed=True) + case S32() : return load_int(cx, ptr, 4, signed=True) + case S64() : return load_int(cx, ptr, 8, signed=True) + case Float32() : return canonicalize32(reinterpret_i32_as_float(load_int(cx, ptr, 4))) + case Float64() : return canonicalize64(reinterpret_i64_as_float(load_int(cx, ptr, 8))) + case Char() : return i32_to_char(cx, load_int(cx, ptr, 4)) + case String() : return load_string(cx, ptr) + case List(t) : return load_list(cx, ptr, t) + case Record(fields) : return load_record(cx, ptr, fields) + case Variant(cases) : return load_variant(cx, ptr, cases) + case Flags(labels) : return load_flags(cx, ptr, labels) ``` Integers are loaded directly from memory, with their high-order bit interpreted according to the signedness of the type. 
```python -def load_int(opts, ptr, nbytes, signed = False): - trap_if(ptr + nbytes > len(opts.memory)) - return int.from_bytes(opts.memory[ptr : ptr+nbytes], 'little', signed=signed) +def load_int(cx, ptr, nbytes, signed = False): + return int.from_bytes(cx.opts.memory[ptr : ptr+nbytes], 'little', signed=signed) ``` Integer-to-boolean conversions treats `0` as `false` and all other bit-patterns @@ -287,7 +299,7 @@ An `i32` is converted to a `char` (a [Unicode Scalar Value]) by dynamically testing that its unsigned integral value is in the valid [Unicode Code Point] range and not a [Surrogate]: ```python -def i32_to_char(opts, i): +def i32_to_char(cx, i): trap_if(i >= 0x110000) trap_if(0xD800 <= i <= 0xDFFF) return chr(i) @@ -303,15 +315,15 @@ allocation size choices in many cases. Thus, the value produced by `load_string` isn't simply a Python `str`, but a *tuple* containing a `str`, the original encoding and the original byte length. ```python -def load_string(opts, ptr): - begin = load_int(opts, ptr, 4) - tagged_code_units = load_int(opts, ptr + 4, 4) - return load_string_from_range(opts, begin, tagged_code_units) +def load_string(cx, ptr): + begin = load_int(cx, ptr, 4) + tagged_code_units = load_int(cx, ptr + 4, 4) + return load_string_from_range(cx, begin, tagged_code_units) UTF16_TAG = 1 << 31 -def load_string_from_range(opts, ptr, tagged_code_units): - match opts.string_encoding: +def load_string_from_range(cx, ptr, tagged_code_units): + match cx.opts.string_encoding: case 'utf8': alignment = 1 byte_length = tagged_code_units @@ -330,35 +342,35 @@ def load_string_from_range(opts, ptr, tagged_code_units): encoding = 'latin-1' trap_if(ptr != align_to(ptr, alignment)) - trap_if(ptr + byte_length > len(opts.memory)) + trap_if(ptr + byte_length > len(cx.opts.memory)) try: - s = opts.memory[ptr : ptr+byte_length].decode(encoding) + s = cx.opts.memory[ptr : ptr+byte_length].decode(encoding) except UnicodeError: trap() - return (s, opts.string_encoding, 
tagged_code_units) + return (s, cx.opts.string_encoding, tagged_code_units) ``` Lists and records are loaded by recursively loading their elements/fields: ```python -def load_list(opts, ptr, elem_type): - begin = load_int(opts, ptr, 4) - length = load_int(opts, ptr + 4, 4) - return load_list_from_range(opts, begin, length, elem_type) +def load_list(cx, ptr, elem_type): + begin = load_int(cx, ptr, 4) + length = load_int(cx, ptr + 4, 4) + return load_list_from_range(cx, begin, length, elem_type) -def load_list_from_range(opts, ptr, length, elem_type): +def load_list_from_range(cx, ptr, length, elem_type): trap_if(ptr != align_to(ptr, alignment(elem_type))) - trap_if(ptr + length * size(elem_type) > len(opts.memory)) + trap_if(ptr + length * size(elem_type) > len(cx.opts.memory)) a = [] for i in range(length): - a.append(load(opts, ptr + i * size(elem_type), elem_type)) + a.append(load(cx, ptr + i * size(elem_type), elem_type)) return a -def load_record(opts, ptr, fields): +def load_record(cx, ptr, fields): record = {} for field in fields: ptr = align_to(ptr, alignment(field.t)) - record[field.label] = load(opts, ptr, field.t) + record[field.label] = load(cx, ptr, field.t) ptr += size(field.t) return record ``` @@ -375,9 +387,9 @@ While the code below appears to perform case-label lookup at runtime, a normal implementation can build the appropriate index tables at compile-time so that variant-passing is always O(1) and not involving string operations. 
```python -def load_variant(opts, ptr, cases): +def load_variant(cx, ptr, cases): disc_size = size(discriminant_type(cases)) - case_index = load_int(opts, ptr, disc_size) + case_index = load_int(cx, ptr, disc_size) ptr += disc_size trap_if(case_index >= len(cases)) c = cases[case_index] @@ -385,7 +397,7 @@ def load_variant(opts, ptr, cases): case_label = case_label_with_refinements(c, cases) if c.t is None: return { case_label: None } - return { case_label: load(opts, ptr, c.t) } + return { case_label: load(cx, ptr, c.t) } def case_label_with_refinements(c, cases): label = c.label @@ -406,8 +418,8 @@ Finally, flags are converted from a bit-vector to a dictionary whose keys are derived from the ordered labels of the `flags` type. The code here takes advantage of Python's support for integers of arbitrary width. ```python -def load_flags(opts, ptr, labels): - i = load_int(opts, ptr, size_flags(labels)) +def load_flags(cx, ptr, labels): + i = load_int(cx, ptr, size_flags(labels)) return unpack_flags_from_int(i, labels) def unpack_flags_from_int(i, labels): @@ -424,27 +436,27 @@ The `store` function defines how to write a value `v` of a given value type `t` into linear memory starting at offset `ptr`. 
Presenting the definition of `store` piecewise, we start with the top-level case analysis: ```python -def store(opts, v, t, ptr): +def store(cx, v, t, ptr): assert(ptr == align_to(ptr, alignment(t))) - assert(ptr + size(t) <= len(opts.memory)) + assert(ptr + size(t) <= len(cx.opts.memory)) match despecialize(t): - case Bool() : store_int(opts, int(bool(v)), ptr, 1) - case U8() : store_int(opts, v, ptr, 1) - case U16() : store_int(opts, v, ptr, 2) - case U32() : store_int(opts, v, ptr, 4) - case U64() : store_int(opts, v, ptr, 8) - case S8() : store_int(opts, v, ptr, 1, signed=True) - case S16() : store_int(opts, v, ptr, 2, signed=True) - case S32() : store_int(opts, v, ptr, 4, signed=True) - case S64() : store_int(opts, v, ptr, 8, signed=True) - case Float32() : store_int(opts, reinterpret_float_as_i32(canonicalize32(v)), ptr, 4) - case Float64() : store_int(opts, reinterpret_float_as_i64(canonicalize64(v)), ptr, 8) - case Char() : store_int(opts, char_to_i32(v), ptr, 4) - case String() : store_string(opts, v, ptr) - case List(t) : store_list(opts, v, ptr, t) - case Record(fields) : store_record(opts, v, ptr, fields) - case Variant(cases) : store_variant(opts, v, ptr, cases) - case Flags(labels) : store_flags(opts, v, ptr, labels) + case Bool() : store_int(cx, int(bool(v)), ptr, 1) + case U8() : store_int(cx, v, ptr, 1) + case U16() : store_int(cx, v, ptr, 2) + case U32() : store_int(cx, v, ptr, 4) + case U64() : store_int(cx, v, ptr, 8) + case S8() : store_int(cx, v, ptr, 1, signed=True) + case S16() : store_int(cx, v, ptr, 2, signed=True) + case S32() : store_int(cx, v, ptr, 4, signed=True) + case S64() : store_int(cx, v, ptr, 8, signed=True) + case Float32() : store_int(cx, reinterpret_float_as_i32(canonicalize32(v)), ptr, 4) + case Float64() : store_int(cx, reinterpret_float_as_i64(canonicalize64(v)), ptr, 8) + case Char() : store_int(cx, char_to_i32(v), ptr, 4) + case String() : store_string(cx, v, ptr) + case List(t) : store_list(cx, v, ptr, t) + case 
Record(fields) : store_record(cx, v, ptr, fields) + case Variant(cases) : store_variant(cx, v, ptr, cases) + case Flags(labels) : store_flags(cx, v, ptr, labels) ``` Integers are stored directly into memory. Because the input domain is exactly @@ -452,9 +464,8 @@ the integers in range for the given type, no extra range checks are necessary; the `signed` parameter is only present to ensure that the internal range checks of `int.to_bytes` are satisfied. ```python -def store_int(opts, v, ptr, nbytes, signed = False): - trap_if(ptr + nbytes > len(opts.memory)) - opts.memory[ptr : ptr+nbytes] = int.to_bytes(v, nbytes, 'little', signed=signed) +def store_int(cx, v, ptr, nbytes, signed = False): + cx.opts.memory[ptr : ptr+nbytes] = int.to_bytes(v, nbytes, 'little', signed=signed) ``` Floats are stored directly into memory (in the case of NaNs, using the @@ -496,12 +507,12 @@ We start with a case analysis to enumerate all the meaningful encoding combinations, subdividing the `latin1+utf16` encoding into either `latin1` or `utf16` based on the `UTF16_BIT` flag set by `load_string`: ```python -def store_string(opts, v, ptr): - begin, tagged_code_units = store_string_into_range(opts, v) - store_int(opts, begin, ptr, 4) - store_int(opts, tagged_code_units, ptr + 4, 4) +def store_string(cx, v, ptr): + begin, tagged_code_units = store_string_into_range(cx, v) + store_int(cx, begin, ptr, 4) + store_int(cx, tagged_code_units, ptr + 4, 4) -def store_string_into_range(opts, v): +def store_string_into_range(cx, v): src, src_encoding, src_tagged_code_units = v if src_encoding == 'latin1+utf16': @@ -515,25 +526,25 @@ def store_string_into_range(opts, v): src_simple_encoding = src_encoding src_code_units = src_tagged_code_units - match opts.string_encoding: + match cx.opts.string_encoding: case 'utf8': match src_simple_encoding: - case 'utf8' : return store_string_copy(opts, src, src_code_units, 1, 1, 'utf-8') - case 'utf16' : return store_utf16_to_utf8(opts, src, src_code_units) - case 
'latin1' : return store_latin1_to_utf8(opts, src, src_code_units) + case 'utf8' : return store_string_copy(cx, src, src_code_units, 1, 1, 'utf-8') + case 'utf16' : return store_utf16_to_utf8(cx, src, src_code_units) + case 'latin1' : return store_latin1_to_utf8(cx, src, src_code_units) case 'utf16': match src_simple_encoding: - case 'utf8' : return store_utf8_to_utf16(opts, src, src_code_units) - case 'utf16' : return store_string_copy(opts, src, src_code_units, 2, 2, 'utf-16-le') - case 'latin1' : return store_string_copy(opts, src, src_code_units, 2, 2, 'utf-16-le') + case 'utf8' : return store_utf8_to_utf16(cx, src, src_code_units) + case 'utf16' : return store_string_copy(cx, src, src_code_units, 2, 2, 'utf-16-le') + case 'latin1' : return store_string_copy(cx, src, src_code_units, 2, 2, 'utf-16-le') case 'latin1+utf16': match src_encoding: - case 'utf8' : return store_string_to_latin1_or_utf16(opts, src, src_code_units) - case 'utf16' : return store_string_to_latin1_or_utf16(opts, src, src_code_units) + case 'utf8' : return store_string_to_latin1_or_utf16(cx, src, src_code_units) + case 'utf16' : return store_string_to_latin1_or_utf16(cx, src, src_code_units) case 'latin1+utf16' : match src_simple_encoding: - case 'latin1' : return store_string_copy(opts, src, src_code_units, 1, 2, 'latin-1') - case 'utf16' : return store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units) + case 'latin1' : return store_string_copy(cx, src, src_code_units, 1, 2, 'latin-1') + case 'utf16' : return store_probably_utf16_to_latin1_or_utf16(cx, src, src_code_units) ``` The simplest 4 cases above can compute the exact destination size and then copy @@ -542,15 +553,15 @@ byte after every Latin-1 byte). 
```python MAX_STRING_BYTE_LENGTH = (1 << 31) - 1 -def store_string_copy(opts, src, src_code_units, dst_code_unit_size, dst_alignment, dst_encoding): +def store_string_copy(cx, src, src_code_units, dst_code_unit_size, dst_alignment, dst_encoding): dst_byte_length = dst_code_unit_size * src_code_units trap_if(dst_byte_length > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, dst_alignment, dst_byte_length) + ptr = cx.opts.realloc(0, 0, dst_alignment, dst_byte_length) trap_if(ptr != align_to(ptr, dst_alignment)) - trap_if(ptr + dst_byte_length > len(opts.memory)) + trap_if(ptr + dst_byte_length > len(cx.opts.memory)) encoded = src.encode(dst_encoding) assert(dst_byte_length == len(encoded)) - opts.memory[ptr : ptr+len(encoded)] = encoded + cx.opts.memory[ptr : ptr+len(encoded)] = encoded return (ptr, src_code_units) ``` The choice of `MAX_STRING_BYTE_LENGTH` constant ensures that the high bit of a @@ -561,29 +572,29 @@ optimistically assuming that each code unit of the source string fits in a single UTF-8 byte and then, failing that, reallocates to a worst-case size, finishes the copy, and then finishes with a shrinking reallocation. 
```python -def store_utf16_to_utf8(opts, src, src_code_units): +def store_utf16_to_utf8(cx, src, src_code_units): worst_case_size = src_code_units * 3 - return store_string_to_utf8(opts, src, src_code_units, worst_case_size) + return store_string_to_utf8(cx, src, src_code_units, worst_case_size) -def store_latin1_to_utf8(opts, src, src_code_units): +def store_latin1_to_utf8(cx, src, src_code_units): worst_case_size = src_code_units * 2 - return store_string_to_utf8(opts, src, src_code_units, worst_case_size) + return store_string_to_utf8(cx, src, src_code_units, worst_case_size) -def store_string_to_utf8(opts, src, src_code_units, worst_case_size): +def store_string_to_utf8(cx, src, src_code_units, worst_case_size): assert(src_code_units <= MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 1, src_code_units) - trap_if(ptr + src_code_units > len(opts.memory)) + ptr = cx.opts.realloc(0, 0, 1, src_code_units) + trap_if(ptr + src_code_units > len(cx.opts.memory)) encoded = src.encode('utf-8') assert(src_code_units <= len(encoded)) - opts.memory[ptr : ptr+src_code_units] = encoded[0 : src_code_units] + cx.opts.memory[ptr : ptr+src_code_units] = encoded[0 : src_code_units] if src_code_units < len(encoded): trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(ptr, src_code_units, 1, worst_case_size) - trap_if(ptr + worst_case_size > len(opts.memory)) - opts.memory[ptr+src_code_units : ptr+len(encoded)] = encoded[src_code_units : ] + ptr = cx.opts.realloc(ptr, src_code_units, 1, worst_case_size) + trap_if(ptr + worst_case_size > len(cx.opts.memory)) + cx.opts.memory[ptr+src_code_units : ptr+len(encoded)] = encoded[src_code_units : ] if worst_case_size > len(encoded): - ptr = opts.realloc(ptr, worst_case_size, 1, len(encoded)) - trap_if(ptr + len(encoded) > len(opts.memory)) + ptr = cx.opts.realloc(ptr, worst_case_size, 1, len(encoded)) + trap_if(ptr + len(encoded) > len(cx.opts.memory)) return (ptr, len(encoded)) ``` @@ -592,18 +603,18 @@ Converting from 
UTF-8 to UTF-16 performs an initial worst-case size allocation two-byte UTF-16 code unit) and then does a shrinking reallocation at the end if multiple UTF-8 bytes were collapsed into a single 2-byte UTF-16 code unit: ```python -def store_utf8_to_utf16(opts, src, src_code_units): +def store_utf8_to_utf16(cx, src, src_code_units): worst_case_size = 2 * src_code_units trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 2, worst_case_size) + ptr = cx.opts.realloc(0, 0, 2, worst_case_size) trap_if(ptr != align_to(ptr, 2)) - trap_if(ptr + worst_case_size > len(opts.memory)) + trap_if(ptr + worst_case_size > len(cx.opts.memory)) encoded = src.encode('utf-16-le') - opts.memory[ptr : ptr+len(encoded)] = encoded + cx.opts.memory[ptr : ptr+len(encoded)] = encoded if len(encoded) < worst_case_size: - ptr = opts.realloc(ptr, worst_case_size, 2, len(encoded)) + ptr = cx.opts.realloc(ptr, worst_case_size, 2, len(encoded)) trap_if(ptr != align_to(ptr, 2)) - trap_if(ptr + len(encoded) > len(opts.memory)) + trap_if(ptr + len(encoded) > len(cx.opts.memory)) code_units = int(len(encoded) / 2) return (ptr, code_units) ``` @@ -617,37 +628,37 @@ previously-copied Latin-1 bytes are inflated *in place*, inserting a 0 byte after every Latin-1 byte (iterating in reverse to avoid clobbering later bytes): ```python -def store_string_to_latin1_or_utf16(opts, src, src_code_units): +def store_string_to_latin1_or_utf16(cx, src, src_code_units): assert(src_code_units <= MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 2, src_code_units) + ptr = cx.opts.realloc(0, 0, 2, src_code_units) trap_if(ptr != align_to(ptr, 2)) - trap_if(ptr + src_code_units > len(opts.memory)) + trap_if(ptr + src_code_units > len(cx.opts.memory)) dst_byte_length = 0 for usv in src: if ord(usv) < (1 << 8): - opts.memory[ptr + dst_byte_length] = ord(usv) + cx.opts.memory[ptr + dst_byte_length] = ord(usv) dst_byte_length += 1 else: worst_case_size = 2 * src_code_units trap_if(worst_case_size > 
MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(ptr, src_code_units, 2, worst_case_size) + ptr = cx.opts.realloc(ptr, src_code_units, 2, worst_case_size) trap_if(ptr != align_to(ptr, 2)) - trap_if(ptr + worst_case_size > len(opts.memory)) + trap_if(ptr + worst_case_size > len(cx.opts.memory)) for j in range(dst_byte_length-1, -1, -1): - opts.memory[ptr + 2*j] = opts.memory[ptr + j] - opts.memory[ptr + 2*j + 1] = 0 + cx.opts.memory[ptr + 2*j] = cx.opts.memory[ptr + j] + cx.opts.memory[ptr + 2*j + 1] = 0 encoded = src.encode('utf-16-le') - opts.memory[ptr+2*dst_byte_length : ptr+len(encoded)] = encoded[2*dst_byte_length : ] + cx.opts.memory[ptr+2*dst_byte_length : ptr+len(encoded)] = encoded[2*dst_byte_length : ] if worst_case_size > len(encoded): - ptr = opts.realloc(ptr, worst_case_size, 2, len(encoded)) + ptr = cx.opts.realloc(ptr, worst_case_size, 2, len(encoded)) trap_if(ptr != align_to(ptr, 2)) - trap_if(ptr + len(encoded) > len(opts.memory)) + trap_if(ptr + len(encoded) > len(cx.opts.memory)) tagged_code_units = int(len(encoded) / 2) | UTF16_TAG return (ptr, tagged_code_units) if dst_byte_length < src_code_units: - ptr = opts.realloc(ptr, src_code_units, 2, dst_byte_length) + ptr = cx.opts.realloc(ptr, src_code_units, 2, dst_byte_length) trap_if(ptr != align_to(ptr, 2)) - trap_if(ptr + dst_byte_length > len(opts.memory)) + trap_if(ptr + dst_byte_length > len(cx.opts.memory)) return (ptr, dst_byte_length) ``` @@ -662,22 +673,22 @@ are all using `latin1+utf16` and *one* component over-uses UTF-16, other components can recover the Latin-1 compression. (The Latin-1 check can be inexpensively fused with the UTF-16 validate+copy loop.) 
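The Latin-1 check and the in-place deflation described above can be sketched in isolation. This is only an illustration: it assumes a plain `bytearray` stands in for linear memory, it leaves out `realloc` and trapping entirely, and the helper name `deflate_if_latin1` is hypothetical rather than part of the definitions in this patch.

```python
def deflate_if_latin1(buf: bytearray, code_units: int) -> tuple[str, int]:
    """Hypothetical helper: `buf` holds `code_units` UTF-16-LE code units.

    Returns ('latin1', byte_length) after compacting in place when every
    code unit is below 0x100, otherwise ('utf16', code_units) untouched.
    """
    # The Latin-1 check: a little-endian code unit is < 0x100 exactly when
    # its high (odd-indexed) byte is zero.
    for i in range(code_units):
        if buf[2*i + 1] != 0:
            return ('utf16', code_units)
    # Deflate front to back: the write index i never passes the read index
    # 2*i, so no unread byte is clobbered.
    for i in range(code_units):
        buf[i] = buf[2*i]
    return ('latin1', code_units)

buf = bytearray('café'.encode('utf-16-le'))
kind, n = deflate_if_latin1(buf, 4)
latin1_text = bytes(buf[:n]).decode('latin-1')

wide = bytearray('€'.encode('utf-16-le'))
wide_kind, wide_n = deflate_if_latin1(wide, 1)
```

Deflation walks forward while the inflation in `store_string_to_latin1_or_utf16` walks backward; in both cases the direction is chosen so that writes never overtake unread bytes.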
```python -def store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units): +def store_probably_utf16_to_latin1_or_utf16(cx, src, src_code_units): src_byte_length = 2 * src_code_units trap_if(src_byte_length > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 2, src_byte_length) + ptr = cx.opts.realloc(0, 0, 2, src_byte_length) trap_if(ptr != align_to(ptr, 2)) - trap_if(ptr + src_byte_length > len(opts.memory)) + trap_if(ptr + src_byte_length > len(cx.opts.memory)) encoded = src.encode('utf-16-le') - opts.memory[ptr : ptr+len(encoded)] = encoded + cx.opts.memory[ptr : ptr+len(encoded)] = encoded if any(ord(c) >= (1 << 8) for c in src): tagged_code_units = int(len(encoded) / 2) | UTF16_TAG return (ptr, tagged_code_units) latin1_size = int(len(encoded) / 2) for i in range(latin1_size): - opts.memory[ptr + i] = opts.memory[ptr + 2*i] - ptr = opts.realloc(ptr, src_byte_length, 1, latin1_size) - trap_if(ptr + latin1_size > len(opts.memory)) + cx.opts.memory[ptr + i] = cx.opts.memory[ptr + 2*i] + ptr = cx.opts.realloc(ptr, src_byte_length, 1, latin1_size) + trap_if(ptr + latin1_size > len(cx.opts.memory)) return (ptr, latin1_size) ``` @@ -686,25 +697,25 @@ are symmetric to the loading functions. Unlike strings, lists can simply allocate based on the up-front knowledge of length and static element size. 
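To illustrate why a list needs only a single allocation, here is a rough sketch of the size computation. It assumes a hypothetical table `ELEM_SIZE` in place of the `size` function, covers only a few fixed-size element types, and raises a Python exception where the real definitions would trap.

```python
# Hypothetical stand-in for size(elem_type), limited to a few fixed-size types.
ELEM_SIZE = {'u8': 1, 'u16': 2, 'u32': 4, 'u64': 8}

def list_byte_length(length: int, elem_type: str) -> int:
    """Bytes needed to store `length` elements of `elem_type` contiguously."""
    byte_length = length * ELEM_SIZE[elem_type]
    # Mirrors the bound checked in store_list_into_range: the total size
    # must be representable in 32 bits.
    if byte_length >= (1 << 32):
        raise OverflowError('list too large for a 32-bit linear memory')
    return byte_length

def element_offset(index: int, elem_type: str) -> int:
    """Offset of element `index` from the start of the allocation."""
    return index * ELEM_SIZE[elem_type]

needed = list_byte_length(3, 'u32')
third = element_offset(2, 'u32')
```

Because both values follow from the length and the static element size alone, no growing or shrinking reallocation is needed, unlike the string transcoding paths.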
```python -def store_list(opts, v, ptr, elem_type): - begin, length = store_list_into_range(opts, v, elem_type) - store_int(opts, begin, ptr, 4) - store_int(opts, length, ptr + 4, 4) +def store_list(cx, v, ptr, elem_type): + begin, length = store_list_into_range(cx, v, elem_type) + store_int(cx, begin, ptr, 4) + store_int(cx, length, ptr + 4, 4) -def store_list_into_range(opts, v, elem_type): +def store_list_into_range(cx, v, elem_type): byte_length = len(v) * size(elem_type) trap_if(byte_length >= (1 << 32)) - ptr = opts.realloc(0, 0, alignment(elem_type), byte_length) + ptr = cx.opts.realloc(0, 0, alignment(elem_type), byte_length) trap_if(ptr != align_to(ptr, alignment(elem_type))) - trap_if(ptr + byte_length > len(opts.memory)) + trap_if(ptr + byte_length > len(cx.opts.memory)) for i,e in enumerate(v): - store(opts, e, elem_type, ptr + i * size(elem_type)) + store(cx, e, elem_type, ptr + i * size(elem_type)) return (ptr, len(v)) -def store_record(opts, v, ptr, fields): +def store_record(cx, v, ptr, fields): for f in fields: ptr = align_to(ptr, alignment(f.t)) - store(opts, v[f.label], f.t, ptr) + store(cx, v[f.label], f.t, ptr) ptr += size(f.t) ``` @@ -715,15 +726,15 @@ matching, a normal implementation can statically fuse `store_variant` with its matching `load_variant` to ultimately build a dense array that maps producer's case indices to the consumer's case indices. 
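The dense array mentioned above could be precomputed roughly as follows. This is a sketch under simplifying assumptions: cases are given as plain lists of labels, matching is by exact label only (refinements, as handled by `case_label_with_refinements`, are left out), and every producer label is assumed to exist on the consumer side. The name `build_case_map` is hypothetical.

```python
def build_case_map(producer_labels: list[str], consumer_labels: list[str]) -> list[int]:
    """Map each producer case index to the consumer's index for the same label."""
    consumer_index = {label: i for i, label in enumerate(consumer_labels)}
    # Raises KeyError if a producer label is missing on the consumer side;
    # this sketch does not model refinements or fallback cases.
    return [consumer_index[label] for label in producer_labels]

case_map = build_case_map(['ok', 'err', 'pending'], ['pending', 'ok', 'err'])
# At run time, translating a discriminant is a single array lookup.
translated = case_map[1]
```

With such a table built once per producer/consumer pairing, no label strings need to be compared while values are being copied.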
```python -def store_variant(opts, v, ptr, cases): +def store_variant(cx, v, ptr, cases): case_index, case_value = match_case(v, cases) disc_size = size(discriminant_type(cases)) - store_int(opts, case_index, ptr, disc_size) + store_int(cx, case_index, ptr, disc_size) ptr += disc_size ptr = align_to(ptr, max_case_alignment(cases)) c = cases[case_index] if c.t is not None: - store(opts, case_value, c.t, ptr) + store(cx, case_value, c.t, ptr) def match_case(v, cases): assert(len(v.keys()) == 1) @@ -742,9 +753,9 @@ statically fused into array/integer operations (with a simple byte copy when the case lists are the same) to avoid any string operations in a similar manner to variants. ```python -def store_flags(opts, v, ptr, labels): +def store_flags(cx, v, ptr, labels): i = pack_flags_into_int(v, labels) - store_int(opts, i, ptr, size_flags(labels)) + store_int(cx, i, ptr, size_flags(labels)) def pack_flags_into_int(v, labels): i = 0 @@ -882,7 +893,7 @@ class ValueIter: assert(v.t == t) return v.v -def lift_flat(opts, vi, t): +def lift_flat(cx, vi, t): match despecialize(t): case Bool() : return convert_int_to_bool(vi.next('i32')) case U8() : return lift_flat_unsigned(vi, 32, 8) @@ -895,11 +906,11 @@ def lift_flat(opts, vi, t): case S64() : return lift_flat_signed(vi, 64, 64) case Float32() : return canonicalize32(vi.next('f32')) case Float64() : return canonicalize64(vi.next('f64')) - case Char() : return i32_to_char(opts, vi.next('i32')) - case String() : return lift_flat_string(opts, vi) - case List(t) : return lift_flat_list(opts, vi, t) - case Record(fields) : return lift_flat_record(opts, vi, fields) - case Variant(cases) : return lift_flat_variant(opts, vi, cases) + case Char() : return i32_to_char(cx, vi.next('i32')) + case String() : return lift_flat_string(cx, vi) + case List(t) : return lift_flat_list(cx, vi, t) + case Record(fields) : return lift_flat_record(cx, vi, fields) + case Variant(cases) : return lift_flat_variant(cx, vi, cases) case Flags(labels) : 
return lift_flat_flags(vi, labels) ``` @@ -929,23 +940,23 @@ types is essentially the same as loading them from memory; the only difference is that the pointer and length come from `i32` values instead of from linear memory: ```python -def lift_flat_string(opts, vi): +def lift_flat_string(cx, vi): ptr = vi.next('i32') packed_length = vi.next('i32') - return load_string_from_range(opts, ptr, packed_length) + return load_string_from_range(cx, ptr, packed_length) -def lift_flat_list(opts, vi, elem_type): +def lift_flat_list(cx, vi, elem_type): ptr = vi.next('i32') length = vi.next('i32') - return load_list_from_range(opts, ptr, length, elem_type) + return load_list_from_range(cx, ptr, length, elem_type) ``` Records are lifted by recursively lifting their fields: ```python -def lift_flat_record(opts, vi, fields): +def lift_flat_record(cx, vi, fields): record = {} for f in fields: - record[f.label] = lift_flat(opts, vi, f.t) + record[f.label] = lift_flat(cx, vi, f.t) return record ``` @@ -956,7 +967,7 @@ performed by `flatten_variant`, we need a more-permissive value iterator that reinterprets between the different types appropriately and also traps if the high bits of an `i64` are set for a 32-bit type: ```python -def lift_flat_variant(opts, vi, cases): +def lift_flat_variant(cx, vi, cases): flat_types = flatten_variant(cases) assert(flat_types.pop(0) == 'i32') case_index = vi.next('i32') @@ -975,7 +986,7 @@ def lift_flat_variant(opts, vi, cases): if c.t is None: v = None else: - v = lift_flat(opts, CoerceValueIter(), c.t) + v = lift_flat(cx, CoerceValueIter(), c.t) for have in flat_types: _ = vi.next(have) return { case_label_with_refinements(c, cases): v } @@ -1004,7 +1015,7 @@ The `lower_flat` function defines how to convert a value `v` of a given type `t` into zero or more core values. 
Presenting the definition of `lower_flat` piecewise, we start with the top-level case analysis: ```python -def lower_flat(opts, v, t): +def lower_flat(cx, v, t): match despecialize(t): case Bool() : return [Value('i32', int(v))] case U8() : return [Value('i32', v)] @@ -1018,10 +1029,10 @@ def lower_flat(opts, v, t): case Float32() : return [Value('f32', canonicalize32(v))] case Float64() : return [Value('f64', canonicalize64(v))] case Char() : return [Value('i32', char_to_i32(v))] - case String() : return lower_flat_string(opts, v) - case List(t) : return lower_flat_list(opts, v, t) - case Record(fields) : return lower_flat_record(opts, v, fields) - case Variant(cases) : return lower_flat_variant(opts, v, cases) + case String() : return lower_flat_string(cx, v) + case List(t) : return lower_flat_list(cx, v, t) + case Record(fields) : return lower_flat_record(cx, v, fields) + case Variant(cases) : return lower_flat_variant(cx, v, cases) case Flags(labels) : return lower_flat_flags(v, labels) ``` @@ -1041,21 +1052,21 @@ Since strings and lists are stored in linear memory, lifting can reuse the previous definitions; only the resulting pointers are returned differently (as `i32` values instead of as a pair in linear memory): ```python -def lower_flat_string(opts, v): - ptr, packed_length = store_string_into_range(opts, v) +def lower_flat_string(cx, v): + ptr, packed_length = store_string_into_range(cx, v) return [Value('i32', ptr), Value('i32', packed_length)] -def lower_flat_list(opts, v, elem_type): - (ptr, length) = store_list_into_range(opts, v, elem_type) +def lower_flat_list(cx, v, elem_type): + (ptr, length) = store_list_into_range(cx, v, elem_type) return [Value('i32', ptr), Value('i32', length)] ``` Records are lowered by recursively lowering their fields: ```python -def lower_flat_record(opts, v, fields): +def lower_flat_record(cx, v, fields): flat = [] for f in fields: - flat += lower_flat(opts, v[f.label], f.t) + flat += lower_flat(cx, v[f.label], f.t) 
return flat ``` @@ -1063,7 +1074,7 @@ Variants are also lowered recursively. Symmetric to `lift_flat_variant` above, `lower_flat_variant` must consume all flattened types of `flatten_variant`, manually coercing the otherwise-incompatible type pairings allowed by `join`: ```python -def lower_flat_variant(opts, v, cases): +def lower_flat_variant(cx, v, cases): case_index, case_value = match_case(v, cases) flat_types = flatten_variant(cases) assert(flat_types.pop(0) == 'i32') @@ -1071,7 +1082,7 @@ def lower_flat_variant(opts, v, cases): if c.t is None: payload = [] else: - payload = lower_flat(opts, case_value, c.t) + payload = lower_flat(cx, case_value, c.t) for i,have in enumerate(payload): want = flat_types.pop(0) match (have.t, want): @@ -1103,16 +1114,16 @@ The `lift_values` function defines how to lift a list of at most `max_flat` core parameters or results given by the `ValueIter` `vi` into a tuple of values with types `ts`: ```python -def lift_values(opts, max_flat, vi, ts): +def lift_values(cx, max_flat, vi, ts): flat_types = flatten_types(ts) if len(flat_types) > max_flat: ptr = vi.next('i32') tuple_type = Tuple(ts) trap_if(ptr != align_to(ptr, alignment(tuple_type))) - trap_if(ptr + size(tuple_type) > len(opts.memory)) - return list(load(opts, ptr, tuple_type).values()) + trap_if(ptr + size(tuple_type) > len(cx.opts.memory)) + return list(load(cx, ptr, tuple_type).values()) else: - return [ lift_flat(opts, vi, t) for t in ts ] + return [ lift_flat(cx, vi, t) for t in ts ] ``` The `lower_values` function defines how to lower a list of component-level @@ -1121,46 +1132,40 @@ already described for [`flatten`](#flattening) above, lowering handles the greater-than-`max_flat` case by either allocating storage with `realloc` or accepting a caller-allocated buffer as an out-param: ```python -def lower_values(opts, max_flat, vs, ts, out_param = None): +def lower_values(cx, max_flat, vs, ts, out_param = None): flat_types = flatten_types(ts) if len(flat_types) > 
max_flat: tuple_type = Tuple(ts) tuple_value = {str(i): v for i,v in enumerate(vs)} if out_param is None: - ptr = opts.realloc(0, 0, alignment(tuple_type), size(tuple_type)) + ptr = cx.opts.realloc(0, 0, alignment(tuple_type), size(tuple_type)) else: ptr = out_param.next('i32') trap_if(ptr != align_to(ptr, alignment(tuple_type))) - trap_if(ptr + size(tuple_type) > len(opts.memory)) - store(opts, tuple_value, tuple_type, ptr) + trap_if(ptr + size(tuple_type) > len(cx.opts.memory)) + store(cx, tuple_value, tuple_type, ptr) return [ Value('i32', ptr) ] else: flat_vals = [] for i in range(len(vs)): - flat_vals += lower_flat(opts, vs[i], ts[i]) + flat_vals += lower_flat(cx, vs[i], ts[i]) return flat_vals ``` ## Canonical Definitions Using the above supporting definitions, we can describe the static and dynamic -semantics of component-level [`canon`] definitions, which have the following -AST (copied from the [explainer][Canonical Definitions]): -``` -canon ::= (canon lift * (func ?)) - | (canon lower * (core func ?)) -``` -The following subsections cover each of these cases (which will soon be +semantics of component-level [`canon`] definitions. The following subsections +cover each of these `canon` cases (which will soon be extended to include [async](https://docs.google.com/presentation/d/1MNVOZ8hdofO3tI0szg_i-Yoy0N2QPU2C--LzVuoGSlE/edit#slide=id.g13600a23b7f_16_0) and [resource/handle](https://github.com/alexcrichton/interface-types/blob/40f157ad429772c2b6a8b66ce7b4df01e83ae76d/proposals/interface-types/CanonicalABI.md#handle-intrinsics) built-ins). - ### `canon lift` For a function: ``` -(canon lift $ft: $opts:* $callee: (func $f)) +(canon lift $callee: $opts:* (func $f (type $ft))) ``` validation specifies: * `$callee` must have type `flatten($ft, 'lift')` @@ -1200,7 +1205,7 @@ the outside world through an export. 
Given the above closure arguments, `canon_lift` is defined: ```python -def canon_lift(callee_opts, callee_instance, callee, ft, args, called_as_export): +def canon_lift(callee_cx, callee_instance, callee, ft, args, called_as_export): if called_as_export: trap_if(not callee_instance.may_enter) callee_instance.may_enter = False @@ -1209,7 +1214,7 @@ def canon_lift(callee_opts, callee_instance, callee, ft, args, called_as_export) assert(callee_instance.may_leave) callee_instance.may_leave = False - flat_args = lower_values(callee_opts, MAX_FLAT_PARAMS, args, ft.param_types()) + flat_args = lower_values(callee_cx, MAX_FLAT_PARAMS, args, ft.param_types()) callee_instance.may_leave = True try: @@ -1217,10 +1222,10 @@ def canon_lift(callee_opts, callee_instance, callee, ft, args, called_as_export) except CoreWebAssemblyException: trap() - results = lift_values(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), ft.result_types()) + results = lift_values(callee_cx, MAX_FLAT_RESULTS, ValueIter(flat_results), ft.result_types()) def post_return(): - if callee_opts.post_return is not None: - callee_opts.post_return(flat_results) + if callee_cx.opts.post_return is not None: + callee_cx.opts.post_return(flat_results) if called_as_export: callee_instance.may_enter = True @@ -1253,7 +1258,7 @@ actions after the lowering is complete. 
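The ordering between lifting, lowering and `post_return` can be seen in a stripped-down model. The sketch below assumes hypothetical stand-ins (`make_lifted`, `call_lowered` and an `events` log) and omits all real lifting, lowering and instance flags; it only shows that the callee's cleanup runs after the caller has finished consuming the results.

```python
events = []

def make_lifted(core_callee, cleanup):
    """Hypothetical model of a lifted function: returns (results, post_return)."""
    def lifted(args):
        flat_results = core_callee(args)
        events.append('lift results')
        def post_return():
            # Runs only once the caller signals it is done with the results.
            cleanup(flat_results)
            events.append('post_return')
        return (flat_results, post_return)
    return lifted

def call_lowered(callee, args):
    """Hypothetical model of the lowered side: lower results, then clean up."""
    results, post_return = callee(args)
    lowered = list(results)  # stands in for lower_values
    events.append('lower results')
    post_return()
    return lowered

lifted = make_lifted(lambda args: [a * 2 for a in args], lambda flat: None)
out = call_lowered(lifted, [1, 2])
```

Returning the cleanup as a closure, rather than running it inside the lifted function, is what lets the caller copy the results out before the callee releases them.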
For a function: ``` -(canon lower $opts:* $callee: (core func $f)) +(canon lower $callee: $opts:* (core func $f)) ``` where `$callee` has type `$ft`, validation specifies: * `$f` is given type `flatten($ft, 'lower')` @@ -1268,16 +1273,16 @@ Thus, from the perspective of Core WebAssembly, `$f` is a [function instance] containing a `hostfunc` that closes over `$opts`, `$inst`, `$callee` and `$ft` and, when called from Core WebAssembly code, calls `canon_lower`, which is defined as: ```python -def canon_lower(caller_opts, caller_instance, callee, ft, flat_args): +def canon_lower(caller_cx, caller_instance, callee, ft, flat_args): trap_if(not caller_instance.may_leave) flat_args = ValueIter(flat_args) - args = lift_values(caller_opts, MAX_FLAT_PARAMS, flat_args, ft.param_types()) + args = lift_values(caller_cx, MAX_FLAT_PARAMS, flat_args, ft.param_types()) results, post_return = callee(args) caller_instance.may_leave = False - flat_results = lower_values(caller_opts, MAX_FLAT_RESULTS, results, ft.result_types(), flat_args) + flat_results = lower_values(caller_cx, MAX_FLAT_RESULTS, results, ft.result_types(), flat_args) caller_instance.may_leave = True post_return() diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 805ad33..47390c7 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -269,40 +269,45 @@ def size_flags(labels): def num_i32_flags(labels): return math.ceil(len(labels) / 32) -### Loading +### Context -class Opts: - string_encoding: str +class CanonicalOptions: memory: bytearray + string_encoding: str realloc: Callable[[int,int,int,int],int] post_return: Callable[[],None] -def load(opts, ptr, t): +class Context: + opts: CanonicalOptions + +### Loading + +def load(cx, ptr, t): assert(ptr == align_to(ptr, alignment(t))) - assert(ptr + size(t) <= len(opts.memory)) + assert(ptr + size(t) <= len(cx.opts.memory)) match despecialize(t): - case Bool() : return 
convert_int_to_bool(load_int(opts, ptr, 1)) - case U8() : return load_int(opts, ptr, 1) - case U16() : return load_int(opts, ptr, 2) - case U32() : return load_int(opts, ptr, 4) - case U64() : return load_int(opts, ptr, 8) - case S8() : return load_int(opts, ptr, 1, signed=True) - case S16() : return load_int(opts, ptr, 2, signed=True) - case S32() : return load_int(opts, ptr, 4, signed=True) - case S64() : return load_int(opts, ptr, 8, signed=True) - case Float32() : return canonicalize32(reinterpret_i32_as_float(load_int(opts, ptr, 4))) - case Float64() : return canonicalize64(reinterpret_i64_as_float(load_int(opts, ptr, 8))) - case Char() : return i32_to_char(opts, load_int(opts, ptr, 4)) - case String() : return load_string(opts, ptr) - case List(t) : return load_list(opts, ptr, t) - case Record(fields) : return load_record(opts, ptr, fields) - case Variant(cases) : return load_variant(opts, ptr, cases) - case Flags(labels) : return load_flags(opts, ptr, labels) + case Bool() : return convert_int_to_bool(load_int(cx, ptr, 1)) + case U8() : return load_int(cx, ptr, 1) + case U16() : return load_int(cx, ptr, 2) + case U32() : return load_int(cx, ptr, 4) + case U64() : return load_int(cx, ptr, 8) + case S8() : return load_int(cx, ptr, 1, signed=True) + case S16() : return load_int(cx, ptr, 2, signed=True) + case S32() : return load_int(cx, ptr, 4, signed=True) + case S64() : return load_int(cx, ptr, 8, signed=True) + case Float32() : return canonicalize32(reinterpret_i32_as_float(load_int(cx, ptr, 4))) + case Float64() : return canonicalize64(reinterpret_i64_as_float(load_int(cx, ptr, 8))) + case Char() : return i32_to_char(cx, load_int(cx, ptr, 4)) + case String() : return load_string(cx, ptr) + case List(t) : return load_list(cx, ptr, t) + case Record(fields) : return load_record(cx, ptr, fields) + case Variant(cases) : return load_variant(cx, ptr, cases) + case Flags(labels) : return load_flags(cx, ptr, labels) # -def load_int(opts, ptr, nbytes, signed = 
False): - return int.from_bytes(opts.memory[ptr : ptr+nbytes], 'little', signed=signed) +def load_int(cx, ptr, nbytes, signed = False): + return int.from_bytes(cx.opts.memory[ptr : ptr+nbytes], 'little', signed=signed) # @@ -333,22 +338,22 @@ def canonicalize64(f): # -def i32_to_char(opts, i): +def i32_to_char(cx, i): trap_if(i >= 0x110000) trap_if(0xD800 <= i <= 0xDFFF) return chr(i) # -def load_string(opts, ptr): - begin = load_int(opts, ptr, 4) - tagged_code_units = load_int(opts, ptr + 4, 4) - return load_string_from_range(opts, begin, tagged_code_units) +def load_string(cx, ptr): + begin = load_int(cx, ptr, 4) + tagged_code_units = load_int(cx, ptr + 4, 4) + return load_string_from_range(cx, begin, tagged_code_units) UTF16_TAG = 1 << 31 -def load_string_from_range(opts, ptr, tagged_code_units): - match opts.string_encoding: +def load_string_from_range(cx, ptr, tagged_code_units): + match cx.opts.string_encoding: case 'utf8': alignment = 1 byte_length = tagged_code_units @@ -367,42 +372,42 @@ def load_string_from_range(opts, ptr, tagged_code_units): encoding = 'latin-1' trap_if(ptr != align_to(ptr, alignment)) - trap_if(ptr + byte_length > len(opts.memory)) + trap_if(ptr + byte_length > len(cx.opts.memory)) try: - s = opts.memory[ptr : ptr+byte_length].decode(encoding) + s = cx.opts.memory[ptr : ptr+byte_length].decode(encoding) except UnicodeError: trap() - return (s, opts.string_encoding, tagged_code_units) + return (s, cx.opts.string_encoding, tagged_code_units) # -def load_list(opts, ptr, elem_type): - begin = load_int(opts, ptr, 4) - length = load_int(opts, ptr + 4, 4) - return load_list_from_range(opts, begin, length, elem_type) +def load_list(cx, ptr, elem_type): + begin = load_int(cx, ptr, 4) + length = load_int(cx, ptr + 4, 4) + return load_list_from_range(cx, begin, length, elem_type) -def load_list_from_range(opts, ptr, length, elem_type): +def load_list_from_range(cx, ptr, length, elem_type): trap_if(ptr != align_to(ptr, alignment(elem_type))) - 
trap_if(ptr + length * size(elem_type) > len(opts.memory)) + trap_if(ptr + length * size(elem_type) > len(cx.opts.memory)) a = [] for i in range(length): - a.append(load(opts, ptr + i * size(elem_type), elem_type)) + a.append(load(cx, ptr + i * size(elem_type), elem_type)) return a -def load_record(opts, ptr, fields): +def load_record(cx, ptr, fields): record = {} for field in fields: ptr = align_to(ptr, alignment(field.t)) - record[field.label] = load(opts, ptr, field.t) + record[field.label] = load(cx, ptr, field.t) ptr += size(field.t) return record # -def load_variant(opts, ptr, cases): +def load_variant(cx, ptr, cases): disc_size = size(discriminant_type(cases)) - case_index = load_int(opts, ptr, disc_size) + case_index = load_int(cx, ptr, disc_size) ptr += disc_size trap_if(case_index >= len(cases)) c = cases[case_index] @@ -410,7 +415,7 @@ def load_variant(opts, ptr, cases): case_label = case_label_with_refinements(c, cases) if c.t is None: return { case_label: None } - return { case_label: load(opts, ptr, c.t) } + return { case_label: load(cx, ptr, c.t) } def case_label_with_refinements(c, cases): label = c.label @@ -428,8 +433,8 @@ def find_case(label, cases): # -def load_flags(opts, ptr, labels): - i = load_int(opts, ptr, size_flags(labels)) +def load_flags(cx, ptr, labels): + i = load_int(cx, ptr, size_flags(labels)) return unpack_flags_from_int(i, labels) def unpack_flags_from_int(i, labels): @@ -441,32 +446,32 @@ def unpack_flags_from_int(i, labels): ### Storing -def store(opts, v, t, ptr): +def store(cx, v, t, ptr): assert(ptr == align_to(ptr, alignment(t))) - assert(ptr + size(t) <= len(opts.memory)) + assert(ptr + size(t) <= len(cx.opts.memory)) match despecialize(t): - case Bool() : store_int(opts, int(bool(v)), ptr, 1) - case U8() : store_int(opts, v, ptr, 1) - case U16() : store_int(opts, v, ptr, 2) - case U32() : store_int(opts, v, ptr, 4) - case U64() : store_int(opts, v, ptr, 8) - case S8() : store_int(opts, v, ptr, 1, signed=True) - case 
S16() : store_int(opts, v, ptr, 2, signed=True) - case S32() : store_int(opts, v, ptr, 4, signed=True) - case S64() : store_int(opts, v, ptr, 8, signed=True) - case Float32() : store_int(opts, reinterpret_float_as_i32(canonicalize32(v)), ptr, 4) - case Float64() : store_int(opts, reinterpret_float_as_i64(canonicalize64(v)), ptr, 8) - case Char() : store_int(opts, char_to_i32(v), ptr, 4) - case String() : store_string(opts, v, ptr) - case List(t) : store_list(opts, v, ptr, t) - case Record(fields) : store_record(opts, v, ptr, fields) - case Variant(cases) : store_variant(opts, v, ptr, cases) - case Flags(labels) : store_flags(opts, v, ptr, labels) + case Bool() : store_int(cx, int(bool(v)), ptr, 1) + case U8() : store_int(cx, v, ptr, 1) + case U16() : store_int(cx, v, ptr, 2) + case U32() : store_int(cx, v, ptr, 4) + case U64() : store_int(cx, v, ptr, 8) + case S8() : store_int(cx, v, ptr, 1, signed=True) + case S16() : store_int(cx, v, ptr, 2, signed=True) + case S32() : store_int(cx, v, ptr, 4, signed=True) + case S64() : store_int(cx, v, ptr, 8, signed=True) + case Float32() : store_int(cx, reinterpret_float_as_i32(canonicalize32(v)), ptr, 4) + case Float64() : store_int(cx, reinterpret_float_as_i64(canonicalize64(v)), ptr, 8) + case Char() : store_int(cx, char_to_i32(v), ptr, 4) + case String() : store_string(cx, v, ptr) + case List(t) : store_list(cx, v, ptr, t) + case Record(fields) : store_record(cx, v, ptr, fields) + case Variant(cases) : store_variant(cx, v, ptr, cases) + case Flags(labels) : store_flags(cx, v, ptr, labels) # -def store_int(opts, v, ptr, nbytes, signed = False): - opts.memory[ptr : ptr+nbytes] = int.to_bytes(v, nbytes, 'little', signed=signed) +def store_int(cx, v, ptr, nbytes, signed = False): + cx.opts.memory[ptr : ptr+nbytes] = int.to_bytes(v, nbytes, 'little', signed=signed) # @@ -485,12 +490,12 @@ def char_to_i32(c): # -def store_string(opts, v, ptr): - begin, tagged_code_units = store_string_into_range(opts, v) - store_int(opts, 
begin, ptr, 4) - store_int(opts, tagged_code_units, ptr + 4, 4) +def store_string(cx, v, ptr): + begin, tagged_code_units = store_string_into_range(cx, v) + store_int(cx, begin, ptr, 4) + store_int(cx, tagged_code_units, ptr + 4, 4) -def store_string_into_range(opts, v): +def store_string_into_range(cx, v): src, src_encoding, src_tagged_code_units = v if src_encoding == 'latin1+utf16': @@ -504,174 +509,174 @@ def store_string_into_range(opts, v): src_simple_encoding = src_encoding src_code_units = src_tagged_code_units - match opts.string_encoding: + match cx.opts.string_encoding: case 'utf8': match src_simple_encoding: - case 'utf8' : return store_string_copy(opts, src, src_code_units, 1, 1, 'utf-8') - case 'utf16' : return store_utf16_to_utf8(opts, src, src_code_units) - case 'latin1' : return store_latin1_to_utf8(opts, src, src_code_units) + case 'utf8' : return store_string_copy(cx, src, src_code_units, 1, 1, 'utf-8') + case 'utf16' : return store_utf16_to_utf8(cx, src, src_code_units) + case 'latin1' : return store_latin1_to_utf8(cx, src, src_code_units) case 'utf16': match src_simple_encoding: - case 'utf8' : return store_utf8_to_utf16(opts, src, src_code_units) - case 'utf16' : return store_string_copy(opts, src, src_code_units, 2, 2, 'utf-16-le') - case 'latin1' : return store_string_copy(opts, src, src_code_units, 2, 2, 'utf-16-le') + case 'utf8' : return store_utf8_to_utf16(cx, src, src_code_units) + case 'utf16' : return store_string_copy(cx, src, src_code_units, 2, 2, 'utf-16-le') + case 'latin1' : return store_string_copy(cx, src, src_code_units, 2, 2, 'utf-16-le') case 'latin1+utf16': match src_encoding: - case 'utf8' : return store_string_to_latin1_or_utf16(opts, src, src_code_units) - case 'utf16' : return store_string_to_latin1_or_utf16(opts, src, src_code_units) + case 'utf8' : return store_string_to_latin1_or_utf16(cx, src, src_code_units) + case 'utf16' : return store_string_to_latin1_or_utf16(cx, src, src_code_units) case 'latin1+utf16' : match 
src_simple_encoding: - case 'latin1' : return store_string_copy(opts, src, src_code_units, 1, 2, 'latin-1') - case 'utf16' : return store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units) + case 'latin1' : return store_string_copy(cx, src, src_code_units, 1, 2, 'latin-1') + case 'utf16' : return store_probably_utf16_to_latin1_or_utf16(cx, src, src_code_units) # MAX_STRING_BYTE_LENGTH = (1 << 31) - 1 -def store_string_copy(opts, src, src_code_units, dst_code_unit_size, dst_alignment, dst_encoding): +def store_string_copy(cx, src, src_code_units, dst_code_unit_size, dst_alignment, dst_encoding): dst_byte_length = dst_code_unit_size * src_code_units trap_if(dst_byte_length > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, dst_alignment, dst_byte_length) + ptr = cx.opts.realloc(0, 0, dst_alignment, dst_byte_length) trap_if(ptr != align_to(ptr, dst_alignment)) - trap_if(ptr + dst_byte_length > len(opts.memory)) + trap_if(ptr + dst_byte_length > len(cx.opts.memory)) encoded = src.encode(dst_encoding) assert(dst_byte_length == len(encoded)) - opts.memory[ptr : ptr+len(encoded)] = encoded + cx.opts.memory[ptr : ptr+len(encoded)] = encoded return (ptr, src_code_units) # -def store_utf16_to_utf8(opts, src, src_code_units): +def store_utf16_to_utf8(cx, src, src_code_units): worst_case_size = src_code_units * 3 - return store_string_to_utf8(opts, src, src_code_units, worst_case_size) + return store_string_to_utf8(cx, src, src_code_units, worst_case_size) -def store_latin1_to_utf8(opts, src, src_code_units): +def store_latin1_to_utf8(cx, src, src_code_units): worst_case_size = src_code_units * 2 - return store_string_to_utf8(opts, src, src_code_units, worst_case_size) + return store_string_to_utf8(cx, src, src_code_units, worst_case_size) -def store_string_to_utf8(opts, src, src_code_units, worst_case_size): +def store_string_to_utf8(cx, src, src_code_units, worst_case_size): assert(src_code_units <= MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 1, 
src_code_units) - trap_if(ptr + src_code_units > len(opts.memory)) + ptr = cx.opts.realloc(0, 0, 1, src_code_units) + trap_if(ptr + src_code_units > len(cx.opts.memory)) encoded = src.encode('utf-8') assert(src_code_units <= len(encoded)) - opts.memory[ptr : ptr+src_code_units] = encoded[0 : src_code_units] + cx.opts.memory[ptr : ptr+src_code_units] = encoded[0 : src_code_units] if src_code_units < len(encoded): trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(ptr, src_code_units, 1, worst_case_size) - trap_if(ptr + worst_case_size > len(opts.memory)) - opts.memory[ptr+src_code_units : ptr+len(encoded)] = encoded[src_code_units : ] + ptr = cx.opts.realloc(ptr, src_code_units, 1, worst_case_size) + trap_if(ptr + worst_case_size > len(cx.opts.memory)) + cx.opts.memory[ptr+src_code_units : ptr+len(encoded)] = encoded[src_code_units : ] if worst_case_size > len(encoded): - ptr = opts.realloc(ptr, worst_case_size, 1, len(encoded)) - trap_if(ptr + len(encoded) > len(opts.memory)) + ptr = cx.opts.realloc(ptr, worst_case_size, 1, len(encoded)) + trap_if(ptr + len(encoded) > len(cx.opts.memory)) return (ptr, len(encoded)) # -def store_utf8_to_utf16(opts, src, src_code_units): +def store_utf8_to_utf16(cx, src, src_code_units): worst_case_size = 2 * src_code_units trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 2, worst_case_size) + ptr = cx.opts.realloc(0, 0, 2, worst_case_size) trap_if(ptr != align_to(ptr, 2)) - trap_if(ptr + worst_case_size > len(opts.memory)) + trap_if(ptr + worst_case_size > len(cx.opts.memory)) encoded = src.encode('utf-16-le') - opts.memory[ptr : ptr+len(encoded)] = encoded + cx.opts.memory[ptr : ptr+len(encoded)] = encoded if len(encoded) < worst_case_size: - ptr = opts.realloc(ptr, worst_case_size, 2, len(encoded)) + ptr = cx.opts.realloc(ptr, worst_case_size, 2, len(encoded)) trap_if(ptr != align_to(ptr, 2)) - trap_if(ptr + len(encoded) > len(opts.memory)) + trap_if(ptr + len(encoded) > 
len(cx.opts.memory)) code_units = int(len(encoded) / 2) return (ptr, code_units) # -def store_string_to_latin1_or_utf16(opts, src, src_code_units): +def store_string_to_latin1_or_utf16(cx, src, src_code_units): assert(src_code_units <= MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 2, src_code_units) + ptr = cx.opts.realloc(0, 0, 2, src_code_units) trap_if(ptr != align_to(ptr, 2)) - trap_if(ptr + src_code_units > len(opts.memory)) + trap_if(ptr + src_code_units > len(cx.opts.memory)) dst_byte_length = 0 for usv in src: if ord(usv) < (1 << 8): - opts.memory[ptr + dst_byte_length] = ord(usv) + cx.opts.memory[ptr + dst_byte_length] = ord(usv) dst_byte_length += 1 else: worst_case_size = 2 * src_code_units trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(ptr, src_code_units, 2, worst_case_size) + ptr = cx.opts.realloc(ptr, src_code_units, 2, worst_case_size) trap_if(ptr != align_to(ptr, 2)) - trap_if(ptr + worst_case_size > len(opts.memory)) + trap_if(ptr + worst_case_size > len(cx.opts.memory)) for j in range(dst_byte_length-1, -1, -1): - opts.memory[ptr + 2*j] = opts.memory[ptr + j] - opts.memory[ptr + 2*j + 1] = 0 + cx.opts.memory[ptr + 2*j] = cx.opts.memory[ptr + j] + cx.opts.memory[ptr + 2*j + 1] = 0 encoded = src.encode('utf-16-le') - opts.memory[ptr+2*dst_byte_length : ptr+len(encoded)] = encoded[2*dst_byte_length : ] + cx.opts.memory[ptr+2*dst_byte_length : ptr+len(encoded)] = encoded[2*dst_byte_length : ] if worst_case_size > len(encoded): - ptr = opts.realloc(ptr, worst_case_size, 2, len(encoded)) + ptr = cx.opts.realloc(ptr, worst_case_size, 2, len(encoded)) trap_if(ptr != align_to(ptr, 2)) - trap_if(ptr + len(encoded) > len(opts.memory)) + trap_if(ptr + len(encoded) > len(cx.opts.memory)) tagged_code_units = int(len(encoded) / 2) | UTF16_TAG return (ptr, tagged_code_units) if dst_byte_length < src_code_units: - ptr = opts.realloc(ptr, src_code_units, 2, dst_byte_length) + ptr = cx.opts.realloc(ptr, src_code_units, 2, 
dst_byte_length) trap_if(ptr != align_to(ptr, 2)) - trap_if(ptr + dst_byte_length > len(opts.memory)) + trap_if(ptr + dst_byte_length > len(cx.opts.memory)) return (ptr, dst_byte_length) # -def store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units): +def store_probably_utf16_to_latin1_or_utf16(cx, src, src_code_units): src_byte_length = 2 * src_code_units trap_if(src_byte_length > MAX_STRING_BYTE_LENGTH) - ptr = opts.realloc(0, 0, 2, src_byte_length) + ptr = cx.opts.realloc(0, 0, 2, src_byte_length) trap_if(ptr != align_to(ptr, 2)) - trap_if(ptr + src_byte_length > len(opts.memory)) + trap_if(ptr + src_byte_length > len(cx.opts.memory)) encoded = src.encode('utf-16-le') - opts.memory[ptr : ptr+len(encoded)] = encoded + cx.opts.memory[ptr : ptr+len(encoded)] = encoded if any(ord(c) >= (1 << 8) for c in src): tagged_code_units = int(len(encoded) / 2) | UTF16_TAG return (ptr, tagged_code_units) latin1_size = int(len(encoded) / 2) for i in range(latin1_size): - opts.memory[ptr + i] = opts.memory[ptr + 2*i] - ptr = opts.realloc(ptr, src_byte_length, 1, latin1_size) - trap_if(ptr + latin1_size > len(opts.memory)) + cx.opts.memory[ptr + i] = cx.opts.memory[ptr + 2*i] + ptr = cx.opts.realloc(ptr, src_byte_length, 1, latin1_size) + trap_if(ptr + latin1_size > len(cx.opts.memory)) return (ptr, latin1_size) # -def store_list(opts, v, ptr, elem_type): - begin, length = store_list_into_range(opts, v, elem_type) - store_int(opts, begin, ptr, 4) - store_int(opts, length, ptr + 4, 4) +def store_list(cx, v, ptr, elem_type): + begin, length = store_list_into_range(cx, v, elem_type) + store_int(cx, begin, ptr, 4) + store_int(cx, length, ptr + 4, 4) -def store_list_into_range(opts, v, elem_type): +def store_list_into_range(cx, v, elem_type): byte_length = len(v) * size(elem_type) trap_if(byte_length >= (1 << 32)) - ptr = opts.realloc(0, 0, alignment(elem_type), byte_length) + ptr = cx.opts.realloc(0, 0, alignment(elem_type), byte_length) trap_if(ptr != align_to(ptr, 
alignment(elem_type))) - trap_if(ptr + byte_length > len(opts.memory)) + trap_if(ptr + byte_length > len(cx.opts.memory)) for i,e in enumerate(v): - store(opts, e, elem_type, ptr + i * size(elem_type)) + store(cx, e, elem_type, ptr + i * size(elem_type)) return (ptr, len(v)) -def store_record(opts, v, ptr, fields): +def store_record(cx, v, ptr, fields): for f in fields: ptr = align_to(ptr, alignment(f.t)) - store(opts, v[f.label], f.t, ptr) + store(cx, v[f.label], f.t, ptr) ptr += size(f.t) # -def store_variant(opts, v, ptr, cases): +def store_variant(cx, v, ptr, cases): case_index, case_value = match_case(v, cases) disc_size = size(discriminant_type(cases)) - store_int(opts, case_index, ptr, disc_size) + store_int(cx, case_index, ptr, disc_size) ptr += disc_size ptr = align_to(ptr, max_case_alignment(cases)) c = cases[case_index] if c.t is not None: - store(opts, case_value, c.t, ptr) + store(cx, case_value, c.t, ptr) def match_case(v, cases): assert(len(v.keys()) == 1) @@ -684,9 +689,9 @@ def match_case(v, cases): # -def store_flags(opts, v, ptr, labels): +def store_flags(cx, v, ptr, labels): i = pack_flags_into_int(v, labels) - store_int(opts, i, ptr, size_flags(labels)) + store_int(cx, i, ptr, size_flags(labels)) def pack_flags_into_int(v, labels): i = 0 @@ -779,7 +784,7 @@ def next(self, t): assert(v.t == t) return v.v -def lift_flat(opts, vi, t): +def lift_flat(cx, vi, t): match despecialize(t): case Bool() : return convert_int_to_bool(vi.next('i32')) case U8() : return lift_flat_unsigned(vi, 32, 8) @@ -792,11 +797,11 @@ def lift_flat(opts, vi, t): case S64() : return lift_flat_signed(vi, 64, 64) case Float32() : return canonicalize32(vi.next('f32')) case Float64() : return canonicalize64(vi.next('f64')) - case Char() : return i32_to_char(opts, vi.next('i32')) - case String() : return lift_flat_string(opts, vi) - case List(t) : return lift_flat_list(opts, vi, t) - case Record(fields) : return lift_flat_record(opts, vi, fields) - case Variant(cases) : return 
lift_flat_variant(opts, vi, cases) + case Char() : return i32_to_char(cx, vi.next('i32')) + case String() : return lift_flat_string(cx, vi) + case List(t) : return lift_flat_list(cx, vi, t) + case Record(fields) : return lift_flat_record(cx, vi, fields) + case Variant(cases) : return lift_flat_variant(cx, vi, cases) case Flags(labels) : return lift_flat_flags(vi, labels) # @@ -816,27 +821,27 @@ def lift_flat_signed(vi, core_width, t_width): # -def lift_flat_string(opts, vi): +def lift_flat_string(cx, vi): ptr = vi.next('i32') packed_length = vi.next('i32') - return load_string_from_range(opts, ptr, packed_length) + return load_string_from_range(cx, ptr, packed_length) -def lift_flat_list(opts, vi, elem_type): +def lift_flat_list(cx, vi, elem_type): ptr = vi.next('i32') length = vi.next('i32') - return load_list_from_range(opts, ptr, length, elem_type) + return load_list_from_range(cx, ptr, length, elem_type) # -def lift_flat_record(opts, vi, fields): +def lift_flat_record(cx, vi, fields): record = {} for f in fields: - record[f.label] = lift_flat(opts, vi, f.t) + record[f.label] = lift_flat(cx, vi, f.t) return record # -def lift_flat_variant(opts, vi, cases): +def lift_flat_variant(cx, vi, cases): flat_types = flatten_variant(cases) assert(flat_types.pop(0) == 'i32') case_index = vi.next('i32') @@ -855,7 +860,7 @@ def next(self, want): if c.t is None: v = None else: - v = lift_flat(opts, CoerceValueIter(), c.t) + v = lift_flat(cx, CoerceValueIter(), c.t) for have in flat_types: _ = vi.next(have) return { case_label_with_refinements(c, cases): v } @@ -876,7 +881,7 @@ def lift_flat_flags(vi, labels): ### Flat Lowering -def lower_flat(opts, v, t): +def lower_flat(cx, v, t): match despecialize(t): case Bool() : return [Value('i32', int(v))] case U8() : return [Value('i32', v)] @@ -890,10 +895,10 @@ def lower_flat(opts, v, t): case Float32() : return [Value('f32', canonicalize32(v))] case Float64() : return [Value('f64', canonicalize64(v))] case Char() : return 
[Value('i32', char_to_i32(v))] - case String() : return lower_flat_string(opts, v) - case List(t) : return lower_flat_list(opts, v, t) - case Record(fields) : return lower_flat_record(opts, v, fields) - case Variant(cases) : return lower_flat_variant(opts, v, cases) + case String() : return lower_flat_string(cx, v) + case List(t) : return lower_flat_list(cx, v, t) + case Record(fields) : return lower_flat_record(cx, v, fields) + case Variant(cases) : return lower_flat_variant(cx, v, cases) case Flags(labels) : return lower_flat_flags(v, labels) # @@ -905,25 +910,25 @@ def lower_flat_signed(i, core_bits): # -def lower_flat_string(opts, v): - ptr, packed_length = store_string_into_range(opts, v) +def lower_flat_string(cx, v): + ptr, packed_length = store_string_into_range(cx, v) return [Value('i32', ptr), Value('i32', packed_length)] -def lower_flat_list(opts, v, elem_type): - (ptr, length) = store_list_into_range(opts, v, elem_type) +def lower_flat_list(cx, v, elem_type): + (ptr, length) = store_list_into_range(cx, v, elem_type) return [Value('i32', ptr), Value('i32', length)] # -def lower_flat_record(opts, v, fields): +def lower_flat_record(cx, v, fields): flat = [] for f in fields: - flat += lower_flat(opts, v[f.label], f.t) + flat += lower_flat(cx, v[f.label], f.t) return flat # -def lower_flat_variant(opts, v, cases): +def lower_flat_variant(cx, v, cases): case_index, case_value = match_case(v, cases) flat_types = flatten_variant(cases) assert(flat_types.pop(0) == 'i32') @@ -931,7 +936,7 @@ def lower_flat_variant(opts, v, cases): if c.t is None: payload = [] else: - payload = lower_flat(opts, case_value, c.t) + payload = lower_flat(cx, case_value, c.t) for i,have in enumerate(payload): want = flat_types.pop(0) match (have.t, want): @@ -957,36 +962,36 @@ def lower_flat_flags(v, labels): ### Lifting and Lowering Values -def lift_values(opts, max_flat, vi, ts): +def lift_values(cx, max_flat, vi, ts): flat_types = flatten_types(ts) if len(flat_types) > max_flat: ptr 
= vi.next('i32') tuple_type = Tuple(ts) trap_if(ptr != align_to(ptr, alignment(tuple_type))) - trap_if(ptr + size(tuple_type) > len(opts.memory)) - return list(load(opts, ptr, tuple_type).values()) + trap_if(ptr + size(tuple_type) > len(cx.opts.memory)) + return list(load(cx, ptr, tuple_type).values()) else: - return [ lift_flat(opts, vi, t) for t in ts ] + return [ lift_flat(cx, vi, t) for t in ts ] # -def lower_values(opts, max_flat, vs, ts, out_param = None): +def lower_values(cx, max_flat, vs, ts, out_param = None): flat_types = flatten_types(ts) if len(flat_types) > max_flat: tuple_type = Tuple(ts) tuple_value = {str(i): v for i,v in enumerate(vs)} if out_param is None: - ptr = opts.realloc(0, 0, alignment(tuple_type), size(tuple_type)) + ptr = cx.opts.realloc(0, 0, alignment(tuple_type), size(tuple_type)) else: ptr = out_param.next('i32') trap_if(ptr != align_to(ptr, alignment(tuple_type))) - trap_if(ptr + size(tuple_type) > len(opts.memory)) - store(opts, tuple_value, tuple_type, ptr) + trap_if(ptr + size(tuple_type) > len(cx.opts.memory)) + store(cx, tuple_value, tuple_type, ptr) return [ Value('i32', ptr) ] else: flat_vals = [] for i in range(len(vs)): - flat_vals += lower_flat(opts, vs[i], ts[i]) + flat_vals += lower_flat(cx, vs[i], ts[i]) return flat_vals ### `lift` @@ -996,7 +1001,9 @@ class Instance: may_enter = True # ... 
-def canon_lift(callee_opts, callee_instance, callee, ft, args, called_as_export): +# + +def canon_lift(callee_cx, callee_instance, callee, ft, args, called_as_export): if called_as_export: trap_if(not callee_instance.may_enter) callee_instance.may_enter = False @@ -1005,7 +1012,7 @@ def canon_lift(callee_opts, callee_instance, callee, ft, args, called_as_export) assert(callee_instance.may_leave) callee_instance.may_leave = False - flat_args = lower_values(callee_opts, MAX_FLAT_PARAMS, args, ft.param_types()) + flat_args = lower_values(callee_cx, MAX_FLAT_PARAMS, args, ft.param_types()) callee_instance.may_leave = True try: @@ -1013,10 +1020,10 @@ def canon_lift(callee_opts, callee_instance, callee, ft, args, called_as_export) except CoreWebAssemblyException: trap() - results = lift_values(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), ft.result_types()) + results = lift_values(callee_cx, MAX_FLAT_RESULTS, ValueIter(flat_results), ft.result_types()) def post_return(): - if callee_opts.post_return is not None: - callee_opts.post_return(flat_results) + if callee_cx.opts.post_return is not None: + callee_cx.opts.post_return(flat_results) if called_as_export: callee_instance.may_enter = True @@ -1024,16 +1031,16 @@ def post_return(): ### `lower` -def canon_lower(caller_opts, caller_instance, callee, ft, flat_args): +def canon_lower(caller_cx, caller_instance, callee, ft, flat_args): trap_if(not caller_instance.may_leave) flat_args = ValueIter(flat_args) - args = lift_values(caller_opts, MAX_FLAT_PARAMS, flat_args, ft.param_types()) + args = lift_values(caller_cx, MAX_FLAT_PARAMS, flat_args, ft.param_types()) results, post_return = callee(args) caller_instance.may_leave = False - flat_results = lower_values(caller_opts, MAX_FLAT_RESULTS, results, ft.result_types(), flat_args) + flat_results = lower_values(caller_cx, MAX_FLAT_RESULTS, results, ft.result_types(), flat_args) caller_instance.may_leave = True post_return() diff --git 
a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index ce351fb..ff7c357 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -32,13 +32,14 @@ def realloc(self, original_ptr, original_size, alignment, new_size): self.memory[ret : ret + original_size] = self.memory[original_ptr : original_ptr + original_size] return ret -def mk_opts(memory, encoding, realloc, post_return): - opts = Opts() - opts.memory = memory - opts.string_encoding = encoding - opts.realloc = realloc - opts.post_return = post_return - return opts +def mk_cx(memory, encoding = None, realloc = None, post_return = None): + cx = Context() + cx.opts = CanonicalOptions() + cx.opts.memory = memory + cx.opts.string_encoding = encoding + cx.opts.realloc = realloc + cx.opts.post_return = post_return + return cx def mk_str(s): return (s, 'utf8', len(s.encode('utf-8'))) @@ -54,7 +55,7 @@ def fail(msg): raise BaseException(msg) def test(t, vals_to_lift, v, - opts = mk_opts(bytearray(), 'utf8', None, None), + cx = mk_cx(bytearray(), 'utf8', None, None), dst_encoding = None, lower_t = None, lower_v = None): @@ -65,12 +66,12 @@ def test_name(): if v is None: try: - got = lift_flat(opts, vi, t) + got = lift_flat(cx, vi, t) fail("{} expected trap, but got {}".format(test_name(), got)) except Trap: return - got = lift_flat(opts, vi, t) + got = lift_flat(cx, vi, t) assert(vi.i == len(vi.values)) if got != v: fail("{} initial lift_flat() expected {} but got {}".format(test_name(), v, got)) @@ -80,15 +81,15 @@ def test_name(): if lower_v is None: lower_v = v - heap = Heap(5*len(opts.memory)) + heap = Heap(5*len(cx.opts.memory)) if dst_encoding is None: - dst_encoding = opts.string_encoding - opts = mk_opts(heap.memory, dst_encoding, heap.realloc, None) - lowered_vals = lower_flat(opts, v, lower_t) + dst_encoding = cx.opts.string_encoding + cx = mk_cx(heap.memory, dst_encoding, heap.realloc, None) + lowered_vals = lower_flat(cx, v, lower_t) 
assert(flatten_type(lower_t) == list(map(lambda v: v.t, lowered_vals))) vi = ValueIter(lowered_vals) - got = lift_flat(opts, vi, lower_t) + got = lift_flat(cx, vi, lower_t) if not equal_modulo_string_encoding(got, lower_v): fail("{} re-lift expected {} but got {}".format(test_name(), lower_v, got)) @@ -167,21 +168,17 @@ def test_pairs(t, pairs): test_pairs(Enum(['a','b']), [(0,{'a':None}), (1,{'b':None}), (2,None)]) def test_nan32(inbits, outbits): - f = lift_flat(Opts(), ValueIter([Value('f32', reinterpret_i32_as_float(inbits))]), Float32()) + f = lift_flat(Context(), ValueIter([Value('f32', reinterpret_i32_as_float(inbits))]), Float32()) assert(reinterpret_float_as_i32(f) == outbits) - load_opts = Opts() - load_opts.memory = bytearray(4) - load_opts.memory = int.to_bytes(inbits, 4, 'little') - f = load(load_opts, 0, Float32()) + cx = mk_cx(int.to_bytes(inbits, 4, 'little')) + f = load(cx, 0, Float32()) assert(reinterpret_float_as_i32(f) == outbits) def test_nan64(inbits, outbits): - f = lift_flat(Opts(), ValueIter([Value('f64', reinterpret_i64_as_float(inbits))]), Float64()) + f = lift_flat(Context(), ValueIter([Value('f64', reinterpret_i64_as_float(inbits))]), Float64()) assert(reinterpret_float_as_i64(f) == outbits) - load_opts = Opts() - load_opts.memory = bytearray(8) - load_opts.memory = int.to_bytes(inbits, 8, 'little') - f = load(load_opts, 0, Float64()) + cx = mk_cx(int.to_bytes(inbits, 8, 'little')) + f = load(cx, 0, Float64()) assert(reinterpret_float_as_i64(f) == outbits) test_nan32(0x7fc00000, CANONICAL_FLOAT32_NAN) @@ -202,9 +199,9 @@ def test_nan64(inbits, outbits): def test_string_internal(src_encoding, dst_encoding, s, encoded, tagged_code_units): heap = Heap(len(encoded)) heap.memory[:] = encoded[:] - opts = mk_opts(heap.memory, src_encoding, None, None) + cx = mk_cx(heap.memory, src_encoding, None, None) v = (s, src_encoding, tagged_code_units) - test(String(), [0, tagged_code_units], v, opts, dst_encoding) + test(String(), [0, 
tagged_code_units], v, cx, dst_encoding) def test_string(src_encoding, dst_encoding, s): if src_encoding == 'utf8': @@ -239,8 +236,8 @@ def test_string(src_encoding, dst_encoding, s): def test_heap(t, expect, args, byte_array): heap = Heap(byte_array) - opts = mk_opts(heap.memory, 'utf8', None, None) - test(t, args, expect, opts) + cx = mk_cx(heap.memory, 'utf8', None, None) + test(t, args, expect, cx) test_heap(List(Record([])), [{},{},{}], [0,3], []) test_heap(List(Bool()), [True,False,True], [0,3], [1,0,1]) @@ -350,16 +347,16 @@ def test_roundtrip(t, v): callee = lambda x: x callee_heap = Heap(1000) - callee_opts = mk_opts(callee_heap.memory, 'utf8', callee_heap.realloc, lambda x: () ) - lifted_callee = lambda args: canon_lift(callee_opts, callee_instance, callee, ft, args, True) + callee_cx = mk_cx(callee_heap.memory, 'utf8', callee_heap.realloc, lambda x: () ) + lifted_callee = lambda args: canon_lift(callee_cx, callee_instance, callee, ft, args, True) caller_heap = Heap(1000) caller_instance = Instance() - caller_opts = mk_opts(caller_heap.memory, 'utf8', caller_heap.realloc, None) + caller_cx = mk_cx(caller_heap.memory, 'utf8', caller_heap.realloc, None) - flat_args = lower_flat(caller_opts, v, t) - flat_results = canon_lower(caller_opts, caller_instance, lifted_callee, ft, flat_args) - got = lift_flat(caller_opts, ValueIter(flat_results), t) + flat_args = lower_flat(caller_cx, v, t) + flat_results = canon_lower(caller_cx, caller_instance, lifted_callee, ft, flat_args) + got = lift_flat(caller_cx, ValueIter(flat_results), t) if got != v: fail("test_roundtrip({},{},{}) got {}".format(t, v, caller_args, got)) From 8d8907840b98d03a95dcb47abfbce721fb4bdbf3 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Sat, 3 Dec 2022 12:24:55 -0600 Subject: [PATCH 163/301] Move the component instance into the context --- design/mvp/CanonicalABI.md | 49 ++++++++++++------------- design/mvp/canonical-abi/definitions.py | 37 +++++++++---------- 
design/mvp/canonical-abi/run_tests.py | 11 +++--- 3 files changed, 47 insertions(+), 50 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 2c1d6dd..dd2141b 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -216,15 +216,26 @@ class CanonicalOptions: realloc: Callable[[int,int,int,int],int] post_return: Callable[[],None] +class ComponentInstance: + may_leave = True + may_enter = True + # ... + class Context: opts: CanonicalOptions + inst: ComponentInstance ``` Going through the fields of `Context`: The `opts` field represents the [`canonopt`] values supplied to currently-executing `canon lift` or `canon lower`. -(Others will be added shortly.) +The `inst` field represents the component instance that the currently-executing +canonical definition is closed over. The `may_enter` and `may_leave` fields are +used to enforce the [component invariants]: `may_leave` indicates whether the +instance may call out to an import and the `may_enter` state indicates whether +the instance may be called from the outside world through an export. + ### Loading @@ -1191,31 +1202,19 @@ export. In any case, `canon lift` specifies how these variously-produced values are consumed as parameters (and produced as results) by a *single host-agnostic component*. -The `$inst` captured above is assumed to have at least the following two fields, -which are used to implement the [component invariants]: -```python -class Instance: - may_leave = True - may_enter = True - # ... -``` -The `may_leave` state indicates whether the instance may call out to an import -and the `may_enter` state indicates whether the instance may be called from -the outside world through an export. 
- Given the above closure arguments, `canon_lift` is defined: ```python -def canon_lift(callee_cx, callee_instance, callee, ft, args, called_as_export): +def canon_lift(callee_cx, callee, ft, args, called_as_export): if called_as_export: - trap_if(not callee_instance.may_enter) - callee_instance.may_enter = False + trap_if(not callee_cx.inst.may_enter) + callee_cx.inst.may_enter = False else: - assert(not callee_instance.may_enter) + assert(not callee_cx.inst.may_enter) - assert(callee_instance.may_leave) - callee_instance.may_leave = False + assert(callee_cx.inst.may_leave) + callee_cx.inst.may_leave = False flat_args = lower_values(callee_cx, MAX_FLAT_PARAMS, args, ft.param_types()) - callee_instance.may_leave = True + callee_cx.inst.may_leave = True try: flat_results = callee(flat_args) @@ -1227,7 +1226,7 @@ def canon_lift(callee_cx, callee_instance, callee, ft, args, called_as_export): if callee_cx.opts.post_return is not None: callee_cx.opts.post_return(flat_results) if called_as_export: - callee_instance.may_enter = True + callee_cx.inst.may_enter = True return (results, post_return) ``` @@ -1273,17 +1272,17 @@ Thus, from the perspective of Core WebAssembly, `$f` is a [function instance] containing a `hostfunc` that closes over `$opts`, `$inst`, `$callee` and `$ft` and, when called from Core WebAssembly code, calls `canon_lower`, which is defined as: ```python -def canon_lower(caller_cx, caller_instance, callee, ft, flat_args): - trap_if(not caller_instance.may_leave) +def canon_lower(caller_cx, callee, ft, flat_args): + trap_if(not caller_cx.inst.may_leave) flat_args = ValueIter(flat_args) args = lift_values(caller_cx, MAX_FLAT_PARAMS, flat_args, ft.param_types()) results, post_return = callee(args) - caller_instance.may_leave = False + caller_cx.inst.may_leave = False flat_results = lower_values(caller_cx, MAX_FLAT_RESULTS, results, ft.result_types(), flat_args) - caller_instance.may_leave = True + caller_cx.inst.may_leave = True post_return() diff --git 
a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 47390c7..4cf18af 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -277,8 +277,14 @@ class CanonicalOptions: realloc: Callable[[int,int,int,int],int] post_return: Callable[[],None] +class ComponentInstance: + may_leave = True + may_enter = True + # ... + class Context: opts: CanonicalOptions + inst: ComponentInstance ### Loading @@ -996,24 +1002,17 @@ def lower_values(cx, max_flat, vs, ts, out_param = None): ### `lift` -class Instance: - may_leave = True - may_enter = True - # ... - -# - -def canon_lift(callee_cx, callee_instance, callee, ft, args, called_as_export): +def canon_lift(callee_cx, callee, ft, args, called_as_export): if called_as_export: - trap_if(not callee_instance.may_enter) - callee_instance.may_enter = False + trap_if(not callee_cx.inst.may_enter) + callee_cx.inst.may_enter = False else: - assert(not callee_instance.may_enter) + assert(not callee_cx.inst.may_enter) - assert(callee_instance.may_leave) - callee_instance.may_leave = False + assert(callee_cx.inst.may_leave) + callee_cx.inst.may_leave = False flat_args = lower_values(callee_cx, MAX_FLAT_PARAMS, args, ft.param_types()) - callee_instance.may_leave = True + callee_cx.inst.may_leave = True try: flat_results = callee(flat_args) @@ -1025,23 +1024,23 @@ def post_return(): if callee_cx.opts.post_return is not None: callee_cx.opts.post_return(flat_results) if called_as_export: - callee_instance.may_enter = True + callee_cx.inst.may_enter = True return (results, post_return) ### `lower` -def canon_lower(caller_cx, caller_instance, callee, ft, flat_args): - trap_if(not caller_instance.may_leave) +def canon_lower(caller_cx, callee, ft, flat_args): + trap_if(not caller_cx.inst.may_leave) flat_args = ValueIter(flat_args) args = lift_values(caller_cx, MAX_FLAT_PARAMS, flat_args, ft.param_types()) results, post_return = callee(args) - caller_instance.may_leave = 
False + caller_cx.inst.may_leave = False flat_results = lower_values(caller_cx, MAX_FLAT_RESULTS, results, ft.result_types(), flat_args) - caller_instance.may_leave = True + caller_cx.inst.may_leave = True post_return() diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index ff7c357..51fddf2 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -39,6 +39,7 @@ def mk_cx(memory, encoding = None, realloc = None, post_return = None): cx.opts.string_encoding = encoding cx.opts.realloc = realloc cx.opts.post_return = post_return + cx.inst = ComponentInstance() return cx def mk_str(s): @@ -343,26 +344,24 @@ def test_roundtrip(t, v): definitions.MAX_FLAT_RESULTS = 16 ft = FuncType([t],[t]) - callee_instance = Instance() callee = lambda x: x callee_heap = Heap(1000) callee_cx = mk_cx(callee_heap.memory, 'utf8', callee_heap.realloc, lambda x: () ) - lifted_callee = lambda args: canon_lift(callee_cx, callee_instance, callee, ft, args, True) + lifted_callee = lambda args: canon_lift(callee_cx, callee, ft, args, True) caller_heap = Heap(1000) - caller_instance = Instance() caller_cx = mk_cx(caller_heap.memory, 'utf8', caller_heap.realloc, None) flat_args = lower_flat(caller_cx, v, t) - flat_results = canon_lower(caller_cx, caller_instance, lifted_callee, ft, flat_args) + flat_results = canon_lower(caller_cx, lifted_callee, ft, flat_args) got = lift_flat(caller_cx, ValueIter(flat_results), t) if got != v: fail("test_roundtrip({},{},{}) got {}".format(t, v, caller_args, got)) - assert(caller_instance.may_leave and caller_instance.may_enter) - assert(callee_instance.may_leave and callee_instance.may_enter) + assert(caller_cx.inst.may_leave and caller_cx.inst.may_enter) + assert(callee_cx.inst.may_leave and callee_cx.inst.may_enter) definitions.MAX_FLAT_RESULTS = before test_roundtrip(S8(), -1) From c489282281cdb5703c038341393341df13aeb2a9 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Sat, 3 Dec 
2022 16:48:26 -0600 Subject: [PATCH 164/301] Reorder Context definitions to present top-down --- design/mvp/CanonicalABI.md | 30 +++++++++++++------------ design/mvp/canonical-abi/definitions.py | 13 +++++++---- 2 files changed, 25 insertions(+), 18 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index dd2141b..26dacf3 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -210,31 +210,33 @@ The subsequent definitions of loading and storing a value from linear memory require additional context, which is threaded through most subsequent definitions via the `cx` parameter: ```python +class Context: + opts: CanonicalOptions + inst: ComponentInstance +``` + +The `opts` field represents the [`canonopt`] values supplied to +currently-executing `canon lift` or `canon lower`: +```python class CanonicalOptions: memory: bytearray string_encoding: str realloc: Callable[[int,int,int,int],int] post_return: Callable[[],None] +``` +The `inst` field represents the component instance that the currently-executing +canonical definition is defined to execute inside. The `may_enter` and +`may_leave` fields are used to enforce the [component invariants]: `may_leave` +indicates whether the instance may call out to an import and the `may_enter` +state indicates whether the instance may be called from the outside world +through an export. +```python class ComponentInstance: may_leave = True may_enter = True # ... - -class Context: - opts: CanonicalOptions - inst: ComponentInstance ``` -Going through the fields of `Context`: - -The `opts` field represents the [`canonopt`] values supplied to -currently-executing `canon lift` or `canon lower`. - -The `inst` field represents the component instance that the currently-executing -canonical definition is closed over. 
The `may_enter` and `may_leave` fields are -used to enforce the [component invariants]: `may_leave` indicates whether the -instance may call out to an import and the `may_enter` state indicates whether -the instance may be called from the outside world through an export. ### Loading diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 4cf18af..84dbab4 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -4,6 +4,7 @@ ### Boilerplate +from __future__ import annotations import math import struct from dataclasses import dataclass @@ -271,21 +272,25 @@ def num_i32_flags(labels): ### Context +class Context: + opts: CanonicalOptions + inst: ComponentInstance + +# + class CanonicalOptions: memory: bytearray string_encoding: str realloc: Callable[[int,int,int,int],int] post_return: Callable[[],None] +# + class ComponentInstance: may_leave = True may_enter = True # ... -class Context: - opts: CanonicalOptions - inst: ComponentInstance - ### Loading def load(cx, ptr, t): From e80a1dee1745ab2771103a0829712bd301a1de77 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Sat, 3 Dec 2022 16:48:26 -0600 Subject: [PATCH 165/301] Move called_as_export into Context --- design/mvp/CanonicalABI.md | 62 +++++++++++++------------ design/mvp/canonical-abi/definitions.py | 41 ++++++++-------- design/mvp/canonical-abi/run_tests.py | 15 +++--- 3 files changed, 62 insertions(+), 56 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 26dacf3..acd88ea 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -213,6 +213,7 @@ definitions via the `cx` parameter: class Context: opts: CanonicalOptions inst: ComponentInstance + called_as_export: bool ``` The `opts` field represents the [`canonopt`] values supplied to @@ -238,6 +239,11 @@ class ComponentInstance: # ... 
``` +Lastly, the `called_as_export` field indicates whether the lifted function is +being called through a component export or whether this is an internal call, +(for example, when a child component calls an import that is defined by its +parent component). + ### Loading @@ -1206,29 +1212,29 @@ component*. Given the above closure arguments, `canon_lift` is defined: ```python -def canon_lift(callee_cx, callee, ft, args, called_as_export): - if called_as_export: - trap_if(not callee_cx.inst.may_enter) - callee_cx.inst.may_enter = False +def canon_lift(cx, callee, ft, args): + if cx.called_as_export: + trap_if(not cx.inst.may_enter) + cx.inst.may_enter = False else: - assert(not callee_cx.inst.may_enter) + assert(not cx.inst.may_enter) - assert(callee_cx.inst.may_leave) - callee_cx.inst.may_leave = False - flat_args = lower_values(callee_cx, MAX_FLAT_PARAMS, args, ft.param_types()) - callee_cx.inst.may_leave = True + assert(cx.inst.may_leave) + cx.inst.may_leave = False + flat_args = lower_values(cx, MAX_FLAT_PARAMS, args, ft.param_types()) + cx.inst.may_leave = True try: flat_results = callee(flat_args) except CoreWebAssemblyException: trap() - results = lift_values(callee_cx, MAX_FLAT_RESULTS, ValueIter(flat_results), ft.result_types()) + results = lift_values(cx, MAX_FLAT_RESULTS, ValueIter(flat_results), ft.result_types()) def post_return(): - if callee_cx.opts.post_return is not None: - callee_cx.opts.post_return(flat_results) - if called_as_export: - callee_cx.inst.may_enter = True + if cx.opts.post_return is not None: + cx.opts.post_return(flat_results) + if cx.called_as_export: + cx.inst.may_enter = True return (results, post_return) ``` @@ -1239,15 +1245,13 @@ boundaries. Thus, if a component wishes to signal an error, it must use some sort of explicit type such as `result` (whose `error` case particular language bindings may choose to map to and from exceptions). 
-The `called_as_export` parameter indicates whether `canon_lift` is being called -as part of a component export or whether this `canon_lift` is being called -internally (for example, by a child component instance). By clearing -`may_enter` for the duration of `canon_lift` when called as an export, the -dynamic traps ensure that components cannot be reentered, which is a [component -invariant]. Furthermore, because `may_enter` is not cleared on the exceptional -exit path taken by `trap()`, if there is a trap during Core WebAssembly -execution or lifting/lowering, the component is left permanently un-enterable, -ensuring the lockdown-after-trap [component invariant]. +By clearing `may_enter` for the duration of `canon_lift` when the function is +called as an export, the dynamic traps ensure that components cannot be +reentered, ensuring the non-reentrance [component invariant]. Furthermore, +because `may_enter` is not cleared on the exceptional exit path taken by +`trap()`, if there is a trap during Core WebAssembly execution of lifting or +lowering, the component is left permanently un-enterable, ensuring the +lockdown-after-trap [component invariant]. 
The contract assumed by `canon_lift` (and ensured by `canon_lower` below) is that the caller of `canon_lift` *must* call `post_return` right after lowering @@ -1274,17 +1278,17 @@ Thus, from the perspective of Core WebAssembly, `$f` is a [function instance] containing a `hostfunc` that closes over `$opts`, `$inst`, `$callee` and `$ft` and, when called from Core WebAssembly code, calls `canon_lower`, which is defined as: ```python -def canon_lower(caller_cx, callee, ft, flat_args): - trap_if(not caller_cx.inst.may_leave) +def canon_lower(cx, callee, ft, flat_args): + trap_if(not cx.inst.may_leave) flat_args = ValueIter(flat_args) - args = lift_values(caller_cx, MAX_FLAT_PARAMS, flat_args, ft.param_types()) + args = lift_values(cx, MAX_FLAT_PARAMS, flat_args, ft.param_types()) results, post_return = callee(args) - caller_cx.inst.may_leave = False - flat_results = lower_values(caller_cx, MAX_FLAT_RESULTS, results, ft.result_types(), flat_args) - caller_cx.inst.may_leave = True + cx.inst.may_leave = False + flat_results = lower_values(cx, MAX_FLAT_RESULTS, results, ft.result_types(), flat_args) + cx.inst.may_leave = True post_return() diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py index 84dbab4..b9ab43e 100644 --- a/design/mvp/canonical-abi/definitions.py +++ b/design/mvp/canonical-abi/definitions.py @@ -275,6 +275,7 @@ def num_i32_flags(labels): class Context: opts: CanonicalOptions inst: ComponentInstance + called_as_export: bool # @@ -1007,45 +1008,45 @@ def lower_values(cx, max_flat, vs, ts, out_param = None): ### `lift` -def canon_lift(callee_cx, callee, ft, args, called_as_export): - if called_as_export: - trap_if(not callee_cx.inst.may_enter) - callee_cx.inst.may_enter = False +def canon_lift(cx, callee, ft, args): + if cx.called_as_export: + trap_if(not cx.inst.may_enter) + cx.inst.may_enter = False else: - assert(not callee_cx.inst.may_enter) + assert(not cx.inst.may_enter) - assert(callee_cx.inst.may_leave) - 
callee_cx.inst.may_leave = False - flat_args = lower_values(callee_cx, MAX_FLAT_PARAMS, args, ft.param_types()) - callee_cx.inst.may_leave = True + assert(cx.inst.may_leave) + cx.inst.may_leave = False + flat_args = lower_values(cx, MAX_FLAT_PARAMS, args, ft.param_types()) + cx.inst.may_leave = True try: flat_results = callee(flat_args) except CoreWebAssemblyException: trap() - results = lift_values(callee_cx, MAX_FLAT_RESULTS, ValueIter(flat_results), ft.result_types()) + results = lift_values(cx, MAX_FLAT_RESULTS, ValueIter(flat_results), ft.result_types()) def post_return(): - if callee_cx.opts.post_return is not None: - callee_cx.opts.post_return(flat_results) - if called_as_export: - callee_cx.inst.may_enter = True + if cx.opts.post_return is not None: + cx.opts.post_return(flat_results) + if cx.called_as_export: + cx.inst.may_enter = True return (results, post_return) ### `lower` -def canon_lower(caller_cx, callee, ft, flat_args): - trap_if(not caller_cx.inst.may_leave) +def canon_lower(cx, callee, ft, flat_args): + trap_if(not cx.inst.may_leave) flat_args = ValueIter(flat_args) - args = lift_values(caller_cx, MAX_FLAT_PARAMS, flat_args, ft.param_types()) + args = lift_values(cx, MAX_FLAT_PARAMS, flat_args, ft.param_types()) results, post_return = callee(args) - caller_cx.inst.may_leave = False - flat_results = lower_values(caller_cx, MAX_FLAT_RESULTS, results, ft.result_types(), flat_args) - caller_cx.inst.may_leave = True + cx.inst.may_leave = False + flat_results = lower_values(cx, MAX_FLAT_RESULTS, results, ft.result_types(), flat_args) + cx.inst.may_leave = True post_return() diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py index 51fddf2..1500943 100644 --- a/design/mvp/canonical-abi/run_tests.py +++ b/design/mvp/canonical-abi/run_tests.py @@ -32,7 +32,7 @@ def realloc(self, original_ptr, original_size, alignment, new_size): self.memory[ret : ret + original_size] = self.memory[original_ptr : original_ptr + 
original_size] return ret -def mk_cx(memory, encoding = None, realloc = None, post_return = None): +def mk_cx(memory = bytearray(), encoding = 'utf8', realloc = None, post_return = None): cx = Context() cx.opts = CanonicalOptions() cx.opts.memory = memory @@ -40,6 +40,7 @@ def mk_cx(memory, encoding = None, realloc = None, post_return = None): cx.opts.realloc = realloc cx.opts.post_return = post_return cx.inst = ComponentInstance() + cx.called_as_export = True return cx def mk_str(s): @@ -56,7 +57,7 @@ def fail(msg): raise BaseException(msg) def test(t, vals_to_lift, v, - cx = mk_cx(bytearray(), 'utf8', None, None), + cx = mk_cx(), dst_encoding = None, lower_t = None, lower_v = None): @@ -85,7 +86,7 @@ def test_name(): heap = Heap(5*len(cx.opts.memory)) if dst_encoding is None: dst_encoding = cx.opts.string_encoding - cx = mk_cx(heap.memory, dst_encoding, heap.realloc, None) + cx = mk_cx(heap.memory, dst_encoding, heap.realloc) lowered_vals = lower_flat(cx, v, lower_t) assert(flatten_type(lower_t) == list(map(lambda v: v.t, lowered_vals))) @@ -200,7 +201,7 @@ def test_nan64(inbits, outbits): def test_string_internal(src_encoding, dst_encoding, s, encoded, tagged_code_units): heap = Heap(len(encoded)) heap.memory[:] = encoded[:] - cx = mk_cx(heap.memory, src_encoding, None, None) + cx = mk_cx(heap.memory, src_encoding) v = (s, src_encoding, tagged_code_units) test(String(), [0, tagged_code_units], v, cx, dst_encoding) @@ -237,7 +238,7 @@ def test_string(src_encoding, dst_encoding, s): def test_heap(t, expect, args, byte_array): heap = Heap(byte_array) - cx = mk_cx(heap.memory, 'utf8', None, None) + cx = mk_cx(heap.memory) test(t, args, expect, cx) test_heap(List(Record([])), [{},{},{}], [0,3], []) @@ -348,10 +349,10 @@ def test_roundtrip(t, v): callee_heap = Heap(1000) callee_cx = mk_cx(callee_heap.memory, 'utf8', callee_heap.realloc, lambda x: () ) - lifted_callee = lambda args: canon_lift(callee_cx, callee, ft, args, True) + lifted_callee = lambda args: 
canon_lift(callee_cx, callee, ft, args) caller_heap = Heap(1000) - caller_cx = mk_cx(caller_heap.memory, 'utf8', caller_heap.realloc, None) + caller_cx = mk_cx(caller_heap.memory, 'utf8', caller_heap.realloc) flat_args = lower_flat(caller_cx, v, t) flat_results = canon_lower(caller_cx, lifted_callee, ft, flat_args) From 0d96673b3cb2d55b04c7d885b0ba00bb24137334 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Mon, 5 Dec 2022 19:31:16 -0600 Subject: [PATCH 166/301] Fix typo --- design/mvp/CanonicalABI.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index acd88ea..2c2dd16 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1194,7 +1194,7 @@ validation specifies: * if a `post-return` is present, it has type `(func (param flatten($ft)['results']))` When instantiating component instance `$inst`: -* Define `$f` to be the closure `lambda args: canon_lift($opts, $inst, $callee, $ft, args)` +* Define `$f` to be the closure `lambda args: canon_lift(Context($opts, $inst), $callee, $ft, args)` Thus, `$f` captures `$opts`, `$inst`, `$callee` and `$ft` in a closure which can be subsequently exported or passed into a child instance (via `with`). 
If @@ -1272,7 +1272,7 @@ where `$callee` has type `$ft`, validation specifies: * there is no `post-return` in `$opts` When instantiating component instance `$inst`: -* Define `$f` to be the closure: `lambda args: canon_lower($opts, $inst, $callee, $ft, args)` +* Define `$f` to be the closure: `lambda args: canon_lower(Context($opts, $inst), $callee, $ft, args)` Thus, from the perspective of Core WebAssembly, `$f` is a [function instance] containing a `hostfunc` that closes over `$opts`, `$inst`, `$callee` and `$ft` From 52c6a0feb1c6f070f71f70d324ff753b1841be4e Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Tue, 6 Dec 2022 16:20:52 -0600 Subject: [PATCH 167/301] Tweak wording, align bullets --- design/mvp/CanonicalABI.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 2c2dd16..38cb55b 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1182,16 +1182,16 @@ built-ins). 
### `canon lift` -For a function: +For a canonical definition: ``` (canon lift $callee: $opts:* (func $f (type $ft))) ``` validation specifies: - * `$callee` must have type `flatten($ft, 'lift')` - * `$f` is given type `$ft` - * a `memory` is present if required by lifting and is a subtype of `(memory 1)` - * a `realloc` is present if required by lifting and has type `(func (param i32 i32 i32 i32) (result i32))` - * if a `post-return` is present, it has type `(func (param flatten($ft)['results']))` +* `$callee` must have type `flatten($ft, 'lift')` +* `$f` is given type `$ft` +* a `memory` is present if required by lifting and is a subtype of `(memory 1)` +* a `realloc` is present if required by lifting and has type `(func (param i32 i32 i32 i32) (result i32))` +* if a `post-return` is present, it has type `(func (param flatten($ft)['results']))` When instantiating component instance `$inst`: * Define `$f` to be the closure `lambda args: canon_lift(Context($opts, $inst), $callee, $ft, args)` @@ -1261,15 +1261,15 @@ actions after the lowering is complete. 
### `canon lower` -For a function: +For a canonical definition: ``` (canon lower $callee: $opts:* (core func $f)) ``` where `$callee` has type `$ft`, validation specifies: * `$f` is given type `flatten($ft, 'lower')` - * a `memory` is present if required by lifting and is a subtype of `(memory 1)` - * a `realloc` is present if required by lifting and has type `(func (param i32 i32 i32 i32) (result i32))` - * there is no `post-return` in `$opts` +* a `memory` is present if required by lifting and is a subtype of `(memory 1)` +* a `realloc` is present if required by lifting and has type `(func (param i32 i32 i32 i32) (result i32))` +* there is no `post-return` in `$opts` When instantiating component instance `$inst`: * Define `$f` to be the closure: `lambda args: canon_lower(Context($opts, $inst), $callee, $ft, args)` From a26ee9969cf1952e3d438bba3c9e2f72e6983106 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 17 Nov 2022 16:27:38 -0600 Subject: [PATCH 168/301] Add resource and initial handle types --- design/mvp/Binary.md | 88 +++- design/mvp/CanonicalABI.md | 342 ++++++++++++- design/mvp/Explainer.md | 625 +++++++++++++++++++++--- design/mvp/Subtyping.md | 1 + design/mvp/WIT.md | 36 ++ design/mvp/canonical-abi/definitions.py | 195 +++++++- design/mvp/canonical-abi/run_tests.py | 77 ++- 7 files changed, 1255 insertions(+), 109 deletions(-) diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 0fc460d..82fdde5 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -80,8 +80,12 @@ sort ::= 0x00 cs: => co | 0x05 => instance inlineexport ::= n: si: => (export n si) name ::= len: n: => n (if len = |n|) -name-chars ::= w: => w - | n: 0x2d w: => n-w +name-chars ::= l: