Add a toolchain-independent ABI document, and propose _initialize
The Wasm ecosystem is currently not consistent in how "constructors" such
as C++ static initializers and similar features in other languages are
implemented. As a result, some users report constructors running multiple
times, and other users report constructors not getting run when they
should.

WASI has [defined a convention] using an exported function named
`_initialize`, but not all users follow WASI conventions. In
particular, users of what is sometimes called "wasm32-unknown-unknown"
do not expect to follow WASI conventions, yet they still need
constructors to work reliably.

To address this, I propose moving this convention out of WASI and
defining it as a toolchain-independent ABI, here in tool-conventions.
This would recognize the `_initialize` function as the
toolchain-independent way to ensure that constructors are properly called
before other exports are accessed.

In the component model, there is a proposal to add a
[second initialization phase]. If that's done, then component-model
toolchains could arrange for this `_initialize` function to be called
automatically by this second initialization mechanism.

It is tempting to use the [Wasm start function] for C++ constructors;
this has been [extensively discussed], and the short answer is, the Wasm
start function is often called at a time when the outside environment
can't access the module's exports, and C++ constructors can run
arbitrary user code which may generate calls to things that need to
access the module's exports.
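
As a rough illustration of the timing problem, here is a minimal sketch using the JS API. The hand-assembled module, the `env.hook` import, the export `f`, and the helper name `demonstrateStartTiming` are all made up for this example. The start function calls an imported hook, and the hook records whether the host has the instance (and so its exports) in hand yet.

```typescript
// Hand-assembled bytes for this hypothetical module:
//   (module
//     (import "env" "hook" (func $hook))
//     (func $f (export "f") call $hook)
//     (start $f))
const startDemoBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic, version 1
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,             // type 0: () -> ()
  0x02, 0x0c, 0x01, 0x03, 0x65, 0x6e, 0x76,       // import "env" ...
  0x04, 0x68, 0x6f, 0x6f, 0x6b, 0x00, 0x00,       // ... "hook", func, type 0
  0x03, 0x02, 0x01, 0x00,                         // func 1 has type 0
  0x07, 0x05, 0x01, 0x01, 0x66, 0x00, 0x01,       // export "f" = func 1
  0x08, 0x01, 0x01,                               // start = func 1
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x10, 0x00, 0x0b, // body of func 1: call 0; end
]);

// Returns, for each call to the hook, whether the host already had the
// instance at that moment.
function demonstrateStartTiming(): boolean[] {
  const seen: boolean[] = [];
  let instance: WebAssembly.Instance | undefined;
  const imports = {
    env: {
      hook: () => {
        seen.push(instance !== undefined);
      },
    },
  };
  // The start function runs inside this constructor, before `instance`
  // is assigned, so the first hook call cannot reach any exports.
  instance = new WebAssembly.Instance(
    new WebAssembly.Module(startDemoBytes),
    imports,
  );
  // A later, explicit call happens after the exports are available.
  (instance.exports.f as () => void)();
  return seen;
}
```

An explicitly called `_initialize` export behaves like the second call here: it runs only once the host already holds the instance.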

It's also tempting to propose defining a second initialization phase in
core Wasm. I'm not opposed to this, but it is more complex at the core
Wasm level than at the component-model level. For example, in Emscripten,
Wasm modules depend on JS code being able to run after the exports are
available but before the initialization function is called, which
wouldn't be possible if we simply call the initialization function as
part of the linking step.

Wasm-ld has a [`__wasm_call_ctors` function], and in theory we could use
that name instead of `_initialize`, but wasm-ld already does insert some
initialization in addition to just constructors, so I think it makes
sense to use `_initialize` as the exported function, which may call
`__wasm_call_ctors` in its body.

We don't have a formal process defined for tool-convention proposals,
but because this proposal has potentially wide-ranging impacts, I
propose the following process:

 - I'm starting by posting this PR here, and people can comment on it.
   If a better alternative emerges, I'll close this PR.

 - After discussion here settles, if a better alternative hasn't emerged,
   I plan to request a CG meeting agenda item to present this topic to the
   CG, and seek feedback there, to ensure that it has CG-level visibility.

 - If the CG is in favor of it, then I'd propose we merge this PR.

[defined a convention]: https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md
[second initialization phase]: WebAssembly/component-model#146
[Wasm start function]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-start
[extensively discussed]: WebAssembly/design#1160
[`__wasm_call_ctors` function]: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#start-section
sunfishcode committed Mar 6, 2023
1 parent 9b80cd2 commit d1bf8fc
Showing 1 changed file with 16 additions and 0 deletions.
16 changes: 16 additions & 0 deletions ToolchainIndependentABI.md
@@ -0,0 +1,16 @@
Toolchain-independent ABI
=========================

There are many different ways to use Wasm modules, and many different
conventions and toolchain-specific ABIs. This document describes ABI features
intended to be common across all ABIs.

## The `_initialize` function

If a module exports a function named `_initialize` with no arguments and no
return values, and does not export a function named `_start`, the toolchain
that produced the module may assume that on any instance of the module, this `_initialize`
function is called before any other exports are accessed.

This is intended to support language features such as C++ static constructors,
as well as "top-level scripts" in many scripting languages.
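
For illustration only, an embedder might honor this convention roughly as follows. This is a sketch, not part of the ABI: the helper name `initializeIfNeeded` and the `ModuleExports` type are hypothetical, and the sketch does not verify that `_initialize` has no arguments and no return values.

```typescript
// Hypothetical shape for a module's exports; `instance.exports` from the
// JS API can be passed here, as can any other name-to-value mapping.
type ModuleExports = Record<string, unknown>;

// Calls `_initialize` if the module exports it as a function and does not
// also export `_start`. Returns true if `_initialize` was called.
function initializeIfNeeded(moduleExports: ModuleExports): boolean {
  const init = moduleExports["_initialize"];
  if (typeof init !== "function") {
    return false; // nothing to initialize
  }
  if ("_start" in moduleExports) {
    return false; // the rule only applies when `_start` is not exported
  }
  (init as () => void)();
  return true;
}
```

A host would call `initializeIfNeeded(instance.exports)` once per instance, right after instantiation and before touching any other export.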
