Consider adding second initialization phase after start
#146
Comments
I suppose it isn't feasible to rename the original |
Unfortunately, I think we've missed our window to rename the Core WebAssembly |
I've expected as much, unfortunately. However, an argument could be made that tooling will need to be thoroughly updated to make good use of the component model specification anyway, so perhaps a breaking change like that could be justified, especially since it's a relatively simple change for tool authors compared to everything else they'll have to take care of. As for alternative naming instead of renaming the original function, I'm afraid I don't have any ideas as of right now. It's worth noting, of course, that the name of this function isn't all that important, and progress definitely shouldn't be halted over the naming, as most developers will seldom ever write webassembly by hand, and instead, appropriate tooling will generate it, and for that purpose, |
Do I understand correctly, the only difference between start2 and a regular function export would be that other clients of the instance cannot invoke it? If so, then perhaps a more regular and general version would be some kind of privileged ("protected"?) export that only the surrounding scope has access to? For one, that would allow the client to pass arguments whose construction depends on the instance's exports, which may be a relevant use case. |
Clients not being able to invoke the Allowing the client to pass arguments is an interesting generalization to consider. One reason I didn't suggest this is that this would make the |
Unless I misunderstand the use-case, wouldn't it be possible to do something similar to Rust? |
Agreed that allowing the host to deallocate host-side startup arguments is a valuable optimization, although I don't think |
Sorry, my point was not about the optimization, that's just a bonus. My point was that it's probably more future-proof to have a |
Ah, that's an interesting point too, thanks. |
The Wasm ecosystem is currently not consistent in how "constructors" such as C++ static initializers and similar features in other languages are implemented, and the result is users reporting constructors running multiple times, and other users reporting constructors not getting run when they should.

WASI has [defined a convention] using an exported function named `_initialize`; however, not all users are using WASI conventions. In particular, users of what is sometimes called "wasm32-unknown-unknown" are not expecting to follow WASI conventions. However, they still have a need for constructors working in a reliable way.

To address this, I propose moving this out of WASI and defining this as a toolchain-independent ABI, here in tool-conventions. This would recognize the `_initialize` function as the toolchain-independent way to ensure that constructors are properly called before other exports are accessed.

Related activities
------------------

In the component model, there is a proposal to add a [second initialization phase]. If that's done, then component-model toolchains could arrange for this `_initialize` function to be called automatically by this second initialization mechanism.

Considered alternatives
-----------------------

It is tempting to use the [Wasm start function] for C++ constructors; this has been [extensively discussed], and the short answer is, the Wasm start function is often called at a time when the outside environment can't access the module's exports, and C++ constructors can run arbitrary user code which may generate calls to things that need to access the module's exports.

It's also tempting to propose defining a second initialization phase in core Wasm. I'm not opposed to this, but it is more complex at the core Wasm level than at the component-model level. For example, in Emscripten, Wasm modules depend on JS code being able to run after the exports are available but before the initialization function is called, which wouldn't be possible if we simply call the initialization function as part of the linking step.

Wasm-ld has a [`__wasm_call_ctors` function], and in theory we could use that name instead of `_initialize`, but wasm-ld already does insert some initialization in addition to just constructors, so I think it makes sense to use `_initialize` as the exported function, which may call `__wasm_call_ctors` in its body.

Process
-------

We don't have a formal process defined for tool-convention proposals, but because this proposal has potentially wide-ranging impacts, I propose to follow the following process:

- I'm starting by posting this PR here, and people can comment on it. If a better alternative emerges, I'll close this PR.
- After discussion here settles, if a better alternative hasn't emerged, I plan to request a CG meeting agenda item to present this topic to the CG, and seek feedback there, to ensure that it has CG-level visibility.
- If the CG is in favor of it, then I'd propose we merge this PR.

[defined a convention]: https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md
[second initialization phase]: WebAssembly/component-model#146
[Wasm start function]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-start
[extensively discussed]: WebAssembly/design#1160
[`__wasm_call_ctors` function]: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#start-section
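To make the ordering in this proposal concrete, here is a minimal Python sketch of a toy embedder. It is not any real runtime's API: `instantiate`, `host_setup`, `square`, and the `events` log are hypothetical names used only for illustration. It assumes the embedder first makes the exports available, then lets host-side glue run (the Emscripten-style case mentioned above), then calls `_initialize` once, and only afterwards uses other exports.

```python
from typing import Callable, Dict, List, Optional

Exports = Dict[str, Callable]


def instantiate(exports: Exports,
                host_setup: Optional[Callable[[Exports], None]] = None) -> Exports:
    """Toy embedder: exports exist first, then host setup, then `_initialize`."""
    if host_setup is not None:
        host_setup(exports)  # host glue that needs the exports to be available
    init = exports.get("_initialize")
    if init is not None:
        init()  # called once, before any other export is used
    return exports


events: List[str] = []
state = {"table": None}


def _initialize() -> None:
    # In a real module this body might call `__wasm_call_ctors`.
    events.append("_initialize")
    state["table"] = [n * n for n in range(4)]


def square(n: int) -> int:
    # Relies on state that only exists after the constructors have run.
    return state["table"][n]


exports = instantiate(
    {"_initialize": _initialize, "square": square},
    host_setup=lambda ex: events.append("host_setup"),
)
result = exports["square"](3)
```

The point of the sketch is only the sequencing: an initializer run from the Wasm start function would execute before `host_setup` could see the exports, which is the situation the proposal tries to avoid.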
…203)

* Add a toolchain-independent ABI document, and propose `_initialize`
* Rename to "Basic Module ABI".
* Update BasicModuleABI.md
* Explain when we can and can't use the wasm start function.

Co-authored-by: Derek Schuff
This issue captures the motivation, summary and sketch of an idea for improving how snapshots work in the component model.
Motivation
There are a number of scenarios where we'd like to reduce component initialization time by capturing a "snapshot" of component state after some deterministic interval of execution so that starting from the snapshot is semantically equivalent to starting from the beginning. For example, a snapshot can capture the result of:
One way to do this is with wizer, which is an impressive tool that is widely used for this purpose already. However:
An alternative and complementary approach is to do snapshotting at "deployment time" as part of the process of AOT-compiling a component (the same step that is already used for fusing canonical adapters into core wasm and generating machine code). Because of the component invariant that functions executed during the `start` phase cannot call imports, when wasm is executed in deterministic mode, a component's state at the end of the `start` phase is fully determined by its `value` imports. Thus, as a pure optimization, a component AOT compiler could locally instantiate the root component being deployed with its expected value imports and include a snapshot of the post-`start` execution state in the final compiled representation of the component.

This snapshot-as-deployment-time-optimization approach has a number of advantages:

`wizer` (which is a bit tricky).

However, there's a significant limitation with this approach: not being able to call imports during the `start` phase means that `start` functions won't be able to do much other than purely component-internal initialization. This limitation shows up when we try to execute guest initialization code (like top-level script execution or C++ global constructors) that may or may not call imports before the snapshot. If we run this code during the `start` phase, we'll trap if an import is called. If we can't run the code during `start`, our only other option is to run it lazily when the first export is called (which is definitely not included in the snapshot). Thus, our only two options are either overly-restrictive or overly-unoptimized.

One motivating observation is that calls to imports during `start` may actually be deterministic in practice if:

Simply relaxing the trapping rules to allow these cases would be anti-composable and anti-virtualizable, since now the same component may or may not trap depending on subtle host details and how it is linked, none of which is reflected in the component's signature. So instead...
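As a rough illustration of the deployment-time snapshot idea and its limitation, the following Python toy model may help. Everything in it is an assumption made for the example (the `Component` class, the `state` dict standing in for memories and globals, and the `table_size` and `clock` imports); it is not how any real AOT compiler is implemented. It shows that a `start` function depending only on value imports yields a reproducible post-`start` state, while one that calls an import traps.

```python
import copy
from typing import Callable, Dict


class Trap(Exception):
    """Stands in for a wasm trap in this toy model."""


class Component:
    """Hypothetical component instance; `state` stands in for memories and globals."""

    def __init__(self, value_imports: Dict[str, int],
                 func_imports: Dict[str, Callable]):
        self.value_imports = value_imports
        self.func_imports = func_imports
        self.state: Dict[str, object] = {}
        self.in_start = True

    def call_import(self, name: str, *args):
        if self.in_start:
            raise Trap(f"import {name!r} called during start phase")
        return self.func_imports[name](*args)


def instantiate(start, value_imports, func_imports) -> Component:
    inst = Component(value_imports, func_imports)
    start(inst)            # start phase: imports are off limits
    inst.in_start = False  # after start, imports may be called
    return inst


def snapshot(inst: Component) -> dict:
    """Copy of the post-start state that a compiler could store and reuse."""
    return copy.deepcopy(inst.state)


def pure_start(inst: Component) -> None:
    # Depends only on value imports, so the result is reproducible.
    size = inst.value_imports["table_size"]
    inst.state["table"] = [n * n for n in range(size)]


def impure_start(inst: Component) -> None:
    # Calls an import during start, which the invariant forbids.
    inst.state["now"] = inst.call_import("clock")


clock = {"clock": lambda: 12345}
snap_a = snapshot(instantiate(pure_start, {"table_size": 4}, clock))
snap_b = snapshot(instantiate(pure_start, {"table_size": 4}, clock))

try:
    instantiate(impure_start, {"table_size": 4}, clock)
    trapped = False
except Trap:
    trapped = True
```

In this model `snap_a` and `snap_b` are equal because nothing outside the value imports influenced them, which is what makes starting from the snapshot equivalent to re-running `start`.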
Feature summary
The basic idea (which is an old idea originating in core wasm) is to have a second phase of initialization that is allowed to call imports and that runs after the `start` phase and before the first export is called.

As for what to call this second phase: based on discussion in this issue, calling the second phase "init" sounds like it will confuse at least some people (because "init" sounds like it goes before "start"). So to avoid that, as a strawperson, I'll just call this second phase of initialization `start2`.

Just like `start` sections in the component model, there can be multiple `start2` sections/functions in a component and they are run in order. The component model would ensure that all `start` functions have finished before the first `start2` function runs and that all `start2` functions complete before the first export is allowed to be called. Thus, there is a `start` phase followed by a `start2` phase that precedes general calls to exports. Because `start2` functions can call imports, `start2` will be the default place for language toolchains to execute arbitrary up-front/run-once/top-level/global-constructor user code that takes no arguments and produces no results.

Parent components get to choose when to execute their child instances' `start2` phases. If the parent knows that a child component will not or cannot call the parent's own imports (which the parent is in a position to know, as the parent completely determines the child's imports), the parent may execute the child component's `start2` phase during the parent's `start` phase, thus including the child's post-`start2` execution state in the root component's post-`start` snapshot. However, the parent can always execute a child's `start2` phase later, e.g., during the parent's own `start2` phase. Because component instances form a tree, each parent going up the tree to the root has the option to run an entire child subtree's `start2` phase during the parent's own `start` phase, thereby including it in the final root snapshot.

An AOT compiler can also be more aggressive and execute the root component's `start2` phase speculatively and capture a snapshot if `start2` returns without calling an import (silently discarding the `start2` execution on trap, which will by design not be externally observable). If the AOT compiler additionally has knowledge of the host's implementation of imports, the AOT compiler can be even more aggressive and allow-list host imports under various conditions. In the limit, an AOT compiler could capture a snapshot at the first point of non-determinism. Ultimately, this is all in the realm of pure runtime optimization and can be configured and improved over time.

Sketch
Here's a sketch that seems like it could maybe work:
- A `start2` section would be added that can call a component-level function (just like the `start` function). The `start2` section can call lifted core functions that execute the component's top-level core code.
- A `(canon child.start2 <instanceidx> (func $f))` canon built-in would be added for creating a component-level function that, when called, executes the `start2` phase of the given instance. This function `$f` can be called eagerly via a `(start2 $f)` section or lazily by `canon lower`ing and calling arbitrarily later from core wasm.
- An instance's `start2` phase is executed exactly once before its first export is called. This allows lazy initialization right before the first use. In the eager-`start2` case, the dynamic traps could be trivially eliminated.
- `start2` functions would need to be able to return a `result` (with empty success and error payloads). `start2` sections simply propagate failure. Lowered calls to `start2` from core wasm can potentially handle and recover from failures.
- `start2` functions could additionally return a `future<result>`. This async-ness would need to somehow be reflected in the component's type so that its clients know that `child.start2` returns a `future<result>`.
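The eager and lazy uses of a child's `start2` function can be modeled with a small Python sketch. This is a toy under stated assumptions, not the proposed semantics: the `Instance` class, `make_child`, and the `greet` export are hypothetical, a boolean stands in for the empty-payload `result`, and the model assumes that running `start2` a second time, or calling an export before `start2` has succeeded, traps.

```python
from typing import Callable, Dict, List


class Trap(Exception):
    """Stands in for a wasm trap in this toy model."""


class Instance:
    """Hypothetical model of a child instance with a `start2` phase."""

    def __init__(self, start2_fns: List[Callable[[], bool]],
                 exports: Dict[str, Callable]):
        self._start2_fns = list(start2_fns)
        self._exports = exports
        self._state = "pending"  # pending -> done | failed

    def start2(self) -> bool:
        """Models the function made by `canon child.start2`: True = ok, False = error."""
        if self._state != "pending":
            raise Trap("start2 phase may run only once")
        self._state = "failed"  # stays failed unless every function succeeds
        for fn in self._start2_fns:  # start2 functions run in order
            if not fn():
                return False
        self._state = "done"
        return True

    def call_export(self, name: str, *args):
        if self._state != "done":
            raise Trap("export called before start2 phase completed")
        return self._exports[name](*args)


def make_child(log: List[str]) -> Instance:
    def ctor_a() -> bool:
        log.append("ctor_a")
        return True

    def ctor_b() -> bool:
        log.append("ctor_b")
        return True

    return Instance([ctor_a, ctor_b], {"greet": lambda who: f"hello {who}"})


# Eager: the parent runs the child's start2 up front.
eager_log: List[str] = []
eager = make_child(eager_log)
eager_ok = eager.start2()
eager_result = eager.call_export("greet", "eager")

# Lazy: the parent defers start2 until right before the first use.
lazy_log: List[str] = []
lazy = make_child(lazy_log)
lazy_log.append("parent work")
lazy_ok = lazy.start2()
lazy_result = lazy.call_export("greet", "lazy")
```

In the eager case the check in `call_export` can never fail, which mirrors the remark that the dynamic traps could be eliminated there. A caller that receives `False` from `start2` could also choose to recover instead of propagating the failure.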