Limitations of start function with exported memory #1160

sbc100 · 2017-11-30T01:09:51Z

We've been experimenting with using the start section in the C/C++/clang/lld world:
https://reviews.llvm.org/D40559

I've run into an issue with exported memory and static initializes (or any code in the start function). Basically no JS function that requires access to memory can be run during start because the exported memory is not available until after the start function completes.

We can switch to importing memory but that doesn't seem like a reasonable requirement.

Am I missing something?

jfbastien · 2017-11-30T06:57:41Z

Reading the linked review, it isn't clear to me what you're trying to execute in start. Is it everything before main, but excluding main, or up to and including main? FWIW I don't think start should call main, or that was the original idea for start anyways.

Regardless, it seems like segments can do all of the C static global initializers just fine, but the non-trivial C++ global initializers can't generally be done because they might execute arbitrary user code including syscalls. That's indeed troublesome!

Do you think the problem is only embedder calls trying to use an exported memory, or do you think these calls might also try to re-enter WebAssembly? I think it's a simple enough thing to do if the fix is to export memory before start, then export everything else after. If we instead export everything before start then I'm not sure...

At the same time, start is meant to model what ES6 module's "evaluate" does. In a world where WebAssembly can participate in ES6 modules, what should "evaluate" / start do w.r.t. exports?

NWilson · 2017-11-30T09:20:04Z

What do you mean by "is not available" during start? Presumably the Memory that is created for import could (in principle) be accessed via a closure while start is executing. Or, is there some annoying check that means you can't read/write to the Memory until it's attached to a fully-instantiated Instance?

I agree that it makes syscall implementation awkward - I thought of this problem too while I was trying to implement some syscalls for upstream Musl. I wasn't too bothered by the Memory issue, but the problem of imports seems thornier - until the Instance is created (at the end of start) there really isn't any way for imported functions to call back into Wasm. In practice, it's OK, but it could cause problems for someone.

The ideal thing would have been that "this" refers to the partially-instantiated Instance if imports are invoked during start, so that an imported function can call exports during start via "this.exports.blah". But that opens up a whole can of safety issues, while the current approach is safe albeit a bit limiting.

For syscalls, it shouldn't matter much. By definition, they are in "kernel-mode" and thus don't need to call anything in "user-mode" (Wasm). Most of the syscalls ought to be able to their thing without touching the Module/Instance.

@jfbastien - in the LLVM review, there's no particular opinion on what runs during start. Currently LLD's default entrypoint is _start (ie run main as well). I agree that's not ideal, I'm hoping that we could change the default to something else like _start_wasm to run everything before main. It's up to whoever runs the compiler which entrypoint you set, and regarding Sam's issue here I don't think it matters whether main is run or not.

lukewagner · 2017-11-30T15:21:31Z

@sbc100 Yes, that does seem problematic in the near term. ES Modules do allow cyclic imports and so, if we had ES Module integration, the JS called by wasm could be an ES module that imported the calling wasm module to get the Memory.

sbc100 · 2017-11-30T22:39:17Z

From my POV the problem would be solved by giving the embedder access to the newly created memory before start is called. @jfbastien its not clear to me how you would propose doing this. Can you explain?

In terms of what we will use the start section for, the jury is still out on that. What I've been looking at is moving the C/C++ static init function there, and leaving main to be run explicitly by the embedder. But this issue exists whatever we choose to run there.

jfbastien · 2017-11-30T23:10:51Z

From my POV the problem would be solved by giving the embedder access to the newly created memory before start is called. @jfbastien its not clear to me how you would propose doing this. Can you explain?

I think that what we end up doing should match what ES6 modules do: first they link, then evaluate. This can be circular! In our implementation we use the ES6 module machinery, we first link here, resolving exports and setting the corresponding symbol table. After all the ES6 modules are linked (there's only one for WebAssembly at the moment) there's a call to evaluate which sets up element and segment, and at the very end we call the start function. That's the same as running an ES6 module's global code just before returning to the import call (and this is done in an order that's determined by link IIUC).

Given this, what I think needs to happen is that we make export availability match what ES6 modules do for their own exports during evaluate. I'd consult with @domenic or @Constellation about this.

I think this also happens to fix what you're trying to do.

sbc100 · 2017-11-30T23:43:52Z

But from the JS API POV, how could we allow the developer to get a handle to the new memory before start is called? I was under the impression that the instance would need to be returned in order to get the memory handle, but that the start function is guaranteed to be called before then.

As a workaround I could switch all the test cases to import their memory.. which is probably not a bad idea, but obviously others could still run into this issue.

rossberg · 2017-12-05T16:06:57Z

I'm a bit confused. If you need two-phase initialisation where some init function has to be invoked after the instance has been fully constructed and handed out, then what keeps you from using an ordinary function for that? Or asking the other way round, how would the semantics of a start function whose execution you'd have to initiate explicitly be different from just invoking a normal function?

sbc100 · 2017-12-05T17:18:28Z

@rossberg using a normal function invocation is the alternative option and its what we currently do. Its just that that means our tooling conventions (and clang/lld generated code) won't be able to take advantage of the wasm start section. I don't really feel strongly about using the wasm start section, but I just wanted to raise the issue here in case people see it as problematic that C/C++ won't be using it.

sbc100 · 2018-05-01T16:06:10Z

This came up again in the llmv bug tracker recently: https://bugs.llvm.org/show_bug.cgi?id=37198

sbc100 · 2019-05-17T23:54:22Z

This is coming up again with WASI: WebAssembly/WASI#19

devsnek · 2019-05-18T00:08:20Z

this isn't just a start function problem. if you declare a function in js or wasm, and wasm imports it (esm), that function has no way to get access to the memory of that wasm module.

sbc100 · 2019-05-20T09:49:40Z

This issue relates to the case when the wasm memory is exported from the module. In the case when the wasm memory is imported into the module then is seems reasonable to assume that the embedder has access to the memory since it would have needed it in order to instantiate the module in the first place.

There is third case where the memory is neither imported nor exported but of course there is no way to share the memory in that case.

sbc100 · 2019-05-20T09:50:17Z

However I agree this issue should probably be renamed since it relates not just to memory but to anything that wasm module might want to export.

rossberg · 2019-05-20T10:31:54Z

In retrospect we should have named the start function "init function", to indicate that its purpose is module initialisation, i.e., that the module isn't intended to be accessible before that initialisation is complete. Allowing reentrancy during initialisation would bring about the well-known issues with exposing uninitialised state, and thereby breaking encapsulation guarantees (which is an important feature of Wasm's module system).

That said, I believe that all discussed cases can be handled by having the module export an explicit init function. The start function isn't for these cases, but is that a problem?

sbc100 · 2019-05-20T10:52:30Z

The only problem I see is that often seems to confuse people.

I don't feel strongly about it but perhaps since the name is part of the binary format, we can still rename it to something else like "init" in the text format and in the spec text?

rossberg · 2019-05-20T11:47:05Z

I'd be fine with renaming in the spec. Likewise for the text format, but that might be a bit more controversial. We could allow both keywords.

devsnek · 2019-05-20T13:39:37Z

the start function is specified to run at normal evaluation time in the esm proposal. additionally, exporting something explicit doesn't work for esm at all because nothing is there to import and run it. I'm really confused at this point. is the esm integration intended to exist at all?

This issue relates to the case when the wasm memory is exported from the module.

even if you do that, random functions a wasm module imports can't magically access that export.

rossberg · 2019-05-20T15:33:49Z

The ESM integration gives the illusion of cyclic linking by wrapping JS stubs around Wasm exports that don't yet exist. But they throw if invoked too early. Only once the Wasm module is instantiated underneath these stubs become usable. So even with ESM, Wasm module initialisation is not reentrant or cyclic. That's a feature. :)

Why would importing a custom initialisation function and explicitly invoking it after linking not work?

devsnek · 2019-05-20T15:51:51Z

@rossberg

Why would importing a custom initialisation function and explicitly invoking it after linking not work?

are you suggesting something like this?

let id = 0;
const memories = {};
export function register_memory(mem) {
  const i = id++;
  memories[i] = mem;
  return i;
}

export function fd_write(id, fd, ptr, len) {
  const mem = memories[id];
  // ...
}

you'd have to make memory a first class value, and at that point you can just do this:

export function fd_write(mem, fd, ptr, len) {
  // ...
}

littledan · 2019-05-20T17:01:48Z

The ESM integration gives the illusion of cyclic linking by wrapping JS stubs around Wasm exports that don't yet exist.

We thought about doing this, but when it came down to details, that ended up being complex and unworkable. So these stubs do not exist in the current proposal.

lukewagner · 2019-05-20T17:44:13Z

@devsnek Thinking about WASI in particular (which based on the code samples, I think you are?): I think we need to separate the "current unstable" and "future (with reftypes and multi-memory)" cases:

For the current unstable case, all offsets are relative to the 0th memory, so you just need the instance to export its default memory (just with a plain memory export statement). IIRC, a wasm module's live exports are filled in before the start function is executed, so if you have a mini-cycle between JS glue code imported and called by wasm that (cyclicly) imports that wasm's memory, then by the time the wasm start function runs and calls the JS glue, the wasm's exported memory's live binding has been filled in. But maybe I'm out of date; @littledan correct me if I'm wrong.
For the future case, I think reads/writes from/to linear memory should be expressed in terms of Web IDL Bindings (or something analogous), where ranges of bytes would be passed via some abstract "buffer"/"slice" data type (used in the interface) that get created with the view/copy binding operators (which select their memory via a memory index immediate in the binding expression). Then the receiver of the buffer/slice doesn't have to worry about finding the right memory to use.

devsnek · 2019-05-20T19:12:13Z

@lukewagner its not possible to get access to any of the memory using esm, whether or not its the 0th one. i'm using wasi as a motivating example, but the issue extends much further.

rossberg · 2019-05-21T07:54:16Z

@littledan:

that ended up being complex and unworkable. So these stubs do not exist in the current proposal.

Ah, okay, is it just uninitialised "live" bindings then? Either way, the effect should be almost the same.

@lukewagner:

IIRC, a wasm module's live exports are filled in before the start function is executed

The exports of a module are filled in after the start function has completed. Before you only have the bindings, but they are yet undefined. But I believe that's good enough for those use cases, because there's no need to access the memory yet.

lukewagner · 2019-05-21T18:00:35Z

@devsnek It is, if the module exports it.

@rossberg

The exports of a module are filled in after the start function has completed. Before you only have the bindings, but they are yet undefined

By the time the start function executes, the instance has to be fully live/constructed, so I don't see any technical reason that the exports couldn't be written eagerly into their live bindings. I actually vaguely recall discussing this in the past which is why I thought that's what was already specified.

But I believe that's good enough for those use cases, because there's no need to access the memory yet.

It really depends what people want to do during initialization, but it's not too hard to imagine the need to call out to JS for some glue utility and for that JS glue to need the exported memory to implement its functionality.

devsnek · 2019-05-21T19:29:03Z

@lukewagner if I export a function intended to be consumed by wasm (from wasm or js), how does that function access the memory of the calling wasm?

lukewagner · 2019-05-21T20:44:45Z

@devsnek The same module exporting the function should also export its memory and the same module importing the function should also import the memory. It's not ideal, of course, but as explained above this isn't the goal state.

devsnek · 2019-05-21T20:58:16Z

@lukewagner the file with the function doesn't know which wasm module is consuming it, it's not a thing esm provides. (it also doesn't know the name of the exported memory, but that's less relevant)

rossberg · 2019-05-22T16:38:20Z

by the time the start function is called, module elems have been written into imported tables.

True, but only if the modules chooses to initialise an external table that way. This is like handing out this to an external function from your constructor -- you're on your own if you do so. Fortunately, the bulk proposal provides better alternatives.

By the time the start function is called, the instance must be a fully operational battle station!

But that is an impossible requirement in general, since you cannot even do the most trivial computation before the start function, like, adding two numbers.

But if the wasm start function code calls JS in a syscall capacity, and that JS syscall needs the wasm's (exported) memory to do what it needs to do, then we're not actually allowing this.

I agree. But why is it necessary to use the start function in such cases?

lukewagner · 2019-05-22T21:59:59Z

True, but only if the modules chooses to initialise an external table that way.

Yes, but the same argument applies here: if the module chooses to call an import during the start function, it's on that module to make sure that's a valid thing to do.

But that is an impossible requirement in general, since you cannot even do the most trivial computation before the start function, like, adding two numbers.

I'm saying that the spec/engine-created instance has to be fully functional, not that the application-level state has to be fully initialized.

I agree. But why is it necessary to use the start function in such cases?

As an implementation detail of the wasm code that implements the start function. Consider the polyfill of what would otherwise be a builtin-module (implemented by the host): there are plenty of reasons why the wasm start function could need to use builtin functionality, and thus plenty of reasons to call the polyfill.

rossberg · 2019-05-23T17:09:27Z

Yes, but the same argument applies here: if the module chooses to call an import during the start function, it's on that module to make sure that's a valid thing to do.

I think these are quite different categories. Making a call is something that is unavoidable in general. Using an active segment is entirely avoidable, at least now that the bulk proposal fixes that.

As in OO: calling a function from a constructor is the norm, passing out this or similar is usually advised against (and even made impossible in some languages).

I'm saying that the spec/engine-created instance has to be fully functional, not that the application-level state has to be fully initialized.

Now I'm confused. So far I thought we share the goal that modules should enable proper encapsulation of application-level state. Being able to initialise that state reliably is an essential part of this. What alternative are you suggesting?

As an implementation detail of the wasm code that implements the start function. Consider the polyfill of what would otherwise be a builtin-module (implemented by the host)

Not sure I see any difference between built-in and regular modules, they should have the same access.
But regardless, you are describing something that I call a two-phase initialisation protocol. That is not an implementation detail anymore, because it involves state changes on the other side (at the minimum, storing some references). I agree that the start function alone isn't enough to express that, but I don't think that is a problem either.

If we conclude that it is, then we should enable passing out references explicitly for modules that desire to do so. Breaking the encapsulation guarantees for all modules that merely need to invoke some function is not a good alternative. It would be a breaking change as well. (Or we introduce a second form of start function that runs afterwards. But that brings us back to the question why an export isn't enough.)

devsnek · 2019-05-23T17:18:37Z

I think the difference here is that we're seeing start function as being able to do a lot more than just set up internal state, but it seems that's just not the intention?

perhaps we can rename start to init and then add a new start entry that runs after exports? ideally we could just allow it to be reordered and keep one... I don't think it's insane to require a module to keep its own encapsulation after it starts running arbitrary code.

lukewagner · 2019-05-23T20:31:14Z

@rossberg Ok, I think I'm seeing your point more now. I was viewing the choice of a.wasm to import and call b.js during a.wasm's start function as an explicit choice to treat b.js as an extension of a.wasm's implementation. But actually, answering the below question helped me see that while that is sometimes the case (viz., generated JS glue code), it's not always the case and thus we can't really consider this "explicit".

Also, my initial thinking was: since the general ESM design fills in live bindings super-early, the precedent is already established and we should do likewise with wasm ESM-integration. However, I now see a counterpoint: it would be cool if (with the right bindings...) a single .wasm could run just as well in an ESM-loading host as any other host; if we allow exports to go live pre-start in the ESM world, that will create hazards for interoperable tooling. Case and point, WASI adopted an explicit exported _start function (to sidestep all these issues).

(Or we introduce a second form of start function that runs afterwards. But that brings us back to the question why an export isn't enough.)

In the context of ESM-integration, the start function is analogous to a JS module's top-level script and thus if you want to write an ESM in, say, Rust, and run a particular Rust function when a JS module's top-level script would run (which is a reasonable thing to want to do if you are trying to interoperate cleanly with the ESM ecosystem), you must (with the current design) run that function during the start function; if you put that code in an export, you have no control over when it's run.

Combining these two requires, I like this idea (which actually matches WASI) of designating a special export as being automatically called by ESM-integration after instantiation (and thus the start function) completes and exports are made live.

So the question is how to designate this function. One option is a custom section (validated and semantically meaningful on hosts that support it, similar to what's proposed for bindings) that designates the export by export field name. (This would make it quite easy to polyfill in terms of the current JS API without any binary rewriting.)

Another option is to pick a magic export name in the subset of Unicode that is not valid as an ESM export identifier (because all export fields are valid in the JS API, this would also be polyfillable), but maybe that is future-hazardous so the former is better.

Either way, WASI should use the same way to designate its _start function so that WASI modules could do the Right Thing(tm) in an ESM-loading host.

guybedford · 2019-05-25T04:09:36Z

Have the conventions been worked out for how WASI will be statically linked? Surely static linking of a WASI library as part of a larger application would not want to run the start function of that imported WASI file? Or would it? From a JS integration perspective statically including WASI libraries seems like a really important use case to work out with these considerations, where you wouldn't necessarily want a "run to completion" process model on import, and you want to still maintain the benefits of memory isolation by using the ES module integration.

lukewagner · 2019-05-28T15:54:49Z

@guybedford I don't quite follow what you mean by "statically linking WASI". WASI is an interface, defined as a built-in module. Thus, at static link time, uses of WASI by a wasm module show up as module imports (of a WASI built-in module) that are expected to be resolved by the host, somehow, at module instantiation time. Now the host may choose to implement WASI with some JS or wasm module (e.g., on the Web, using an import-map), but that's a host-specific, instantiation-time detail.

lukewagner · 2019-05-29T15:33:50Z

@littledan What do you think of the above proposed addition to ESM-integration? This might also be a good in-person CG discussion topic during the ESM-integration slot.

littledan · 2019-06-11T18:11:18Z

Apologies for my delay.

The above idea sounds good to me: we have both the current start function, called before exports are visible, and another function which is called after exports are exposed. Seems like this would meet all of the goals described above. I like how this change is purely local, not requiring a second traversal over the module graph (in addition to not requiring binary rewriting).

Bikeshedding: We could call this *start*, which is quite invalid as an ES identifier, while being typeable on many keyboards easily.

I will include this in my presentation to the Wasm F2F and get a spec PR out ASAP. Shout-out to current implementers in Rollup, Webpack and Node.js that this change is under consideration: @sokra @guybedford @MylesBorins (I intend to follow up with details in your repositories once we make a decision).

axic · 2019-06-12T12:11:33Z

In the Ethereum use case (ewasm) we have the same problem and our way of solving this is having an exported function named main. We'd be very much in favour of following a standard agreed here and not inventing our own version.

Would it be possible extending the StartSection to include two start functions:

pre-finalisation (aka pre-exports are loaded) – this is the current "start" function
post-finalisation – this would be the "new start" function

littledan · 2019-06-21T19:37:37Z

Studing this problem a bit more, I can see that start functions are more complex than what I assumed in #1160 (comment), since:

There may need to be some embedder-specific logic to run before calling the secondary start function, e.g., in this proposed Node.js/WASI integration https://github.com/nodejs/node/pull/27850/files#diff-ab9c666b48467c7030bd93aac6d06eb2R184
Any one particular name that we choose might overlap with something in some other environment, so best not to hard-code one

From here, I think we need to do a more careful study of how various toolchains solve this problem, rather than assuming that a simple solution would as proposed above would work. If you could give pointers to your system's module initialization path, or ideally, summarize more details, in WebAssembly/esm-integration#32, it would be really helpful.

sbc100 · 2019-08-07T01:07:41Z

This continues to confuse folks: https://bugs.llvm.org/show_bug.cgi?id=42713

lukewagner · 2020-03-05T18:10:33Z

(Returning to this because I think it would be quite valuable to standardize in wasm so as to avoid the current ad hoc embedding conventions.)

One way to generalize the Node.js issue @littledan pointed out is: what should the JS API do? If the point of this start_ (better name TBD) function is to be called immediately after exports are visible and WebAssembly.instantiate() or new WebAssembly.Instance() simply calls start_ right after start and before returning the new instance to the caller, the exports are not in fact yet visible.

One idea is that these instantiation functions can take a callback that is passed the just-created exports object and called before the start_ function. So, looking at the Node.js example, you could write:

new WebAssembly.Instance(compiled, reflect.imports, { expose(exports) {
  for (const expt of Object.keys(exports))
    reflect.exports[expt].set(exports[expt]);

  if (usesWasi && exports.memory)
    wasi.setMemory(exports.memory);
});

Thinking about this from an engine impl pov, this expose callback is actually symmetric to how we'd implement ESM-integration: the engine's wasm::Module::instantiate routine would take a native callback that was called between start and start_ and the ESM loader would use this callback to make the exports live in their respective ESM bindings. So really, we'd just be exposing this ESM-integration callback to JS when the lower-level JS API was used.

@littledan @binji WDYT?

devsnek · 2020-03-05T18:26:57Z

i suggested this callback idea somewhere ~~(i honestly have no idea where, it was a while ago) and reasons were presented that it was suboptiomal. i'm trying to find it but i haven't yet.~~ (found it, WebAssembly/WASI#82, on the subject of memory init and resizing, ran into issues with threading, apparently not as relevant as i remembered :P)

more recently, i was under the impression that the plan to fix this was to support passing live slices of memory (fs_read(fd, buf) instead of fs_read(fd, u32IndexToBuf)), or whatever other things are possible to be imported/exported from wasm

lukewagner · 2020-03-05T18:44:29Z

Great question: passing live slices would indeed avoid the need for the wasi.setMemory(). And native wasm ESM-integration would remove the rest of it, I expect. But I think the general problem of "I'm using the JS API to instantiate but I need to do something to make the exports "live" (for whatever my definition of "live" is) before the _start function is called" remains.

littledan · 2020-04-27T11:14:07Z

I like this line of investigation, and it seems like that could solve this particular issue with WASI but I don't really feel confident answering it in the air. I think, to proceed to make this a standard rather than a WASI-specific loader, it'd be best if we could do a survey of Wasm toolchains to figure out their initialization patterns, and see if this would solve their problems. I don't have time to do this sort of survey right now, personally.

verbessern · 2020-04-27T12:32:23Z

The start function may accept an argument: what to be started. Then the function name will remain. The default argument would be as the specification is defined now: lets say "START_DEFAULT" = "INIT+COMPLETE". Then if you want to break the start to few stages, have "START_INIT", "START_COMPLETE" and so on called separately.

Else I don't see any limitation of the start function as it is now, when its idea is to only initialize the basic functionality of the module for itself. Then, however, practically there is no function that will 'run' the module after its initialization is fully complete in a standard way. One run/main function could be added to facilitate that.

The Wasm ecosystem is currently not consistent in how "constructors" such as C++ static initializers and similar features in other languages are implemented, and the result is users reporting constructs running multiple times, and other users reporting constructors not getting run when they should. WASI has [defined a convention] using an exported function named `_initialize`, however not all users are using WASI conventions. In particular, users of what is sometimes called "wasm32-unknown-unknown" are not expecting to follow WASI conventions. However, they still have a need for constructors working in a reliable way. To address this, I propose moving this out of WASI and defining this as a toolchain-independent ABI, here in tool-conventions. This would recognize the `_initialize` function as the toolchain-independent way to ensure that constructors are properly called before other exports are accessed. In the component model, there is a proposal to add a [second initialization phase]. If that's done, then component-model toolchains could arrange for this `_initialize` function to be called automatically by this second initialization mechanism. It is tempting to use the [Wasm start function] for C++ constructors; this has been [extensively discussed], and the short answer is, the Wasm start function is often called at a time when the outside environment can't access the module's exports, and C++ constructors can run arbitrary user code which may generate calls to things that need to access the module's exports. It's also tempting to propose defining a second initialization phase in core Wasm. I'm not opposed to this, but it is more complex at the core Wasm level than at the component-model level. For example, in Emscripten, Wasm modules depend on JS code being able to run after the exports are available but before the initialization function is called, which wouldn't be possible if we simply call the initilaization function as part of the linking step. Wasm-ld has a [`__wasm_call_ctors` function], and in theory we could use that name instead of `_initialize`, but wasm-ld already does insert some initialization in addition to just constructors, so I think it makes sense to use `_initialize` as the exported function, which may call `__wasm_call_ctors` in its body. We don't have a formal process defined for tool-convention proposals, but because this is proposal has potentially wide-ranging impacts, I propose to follow the following process: - I'm starting by posting this PR here, and people can comment on it. If a better alternative emerges, I'll close this PR. - After discussion here settles, if a better alternative hasn't emerged, I plan to request a CG meeting agenda item to present this topic to the CG, and seek feedback there, to ensure that it has CG-level visibility. - If the CG is in favor of it, then I'd propose we merge this PR. [defined a convention]: https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md [second initialization phase]: WebAssembly/component-model#146 [Wasm start function]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-start [extensively discussed]: WebAssembly/design#1160 [`__wasm_call_ctors` function]: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#start-section

The Wasm ecosystem is currently not consistent in how "constructors" such as C++ static initializers and similar features in other languages are implemented, and the result is users reporting constructs running multiple times, and other users reporting constructors not getting run when they should. WASI has [defined a convention] using an exported function named `_initialize`, however not all users are using WASI conventions. In particular, users of what is sometimes called "wasm32-unknown-unknown" are not expecting to follow WASI conventions. However, they still have a need for constructors working in a reliable way. To address this, I propose moving this out of WASI and defining this as a toolchain-independent ABI, here in tool-conventions. This would recognize the `_initialize` function as the toolchain-independent way to ensure that constructors are properly called before other exports are accessed. \#### Related activities In the component model, there is a proposal to add a [second initialization phase]. If that's done, then component-model toolchains could arrange for this `_initialize` function to be called automatically by this second initialization mechanism. \#### Considered alternatives It is tempting to use the [Wasm start function] for C++ constructors; this has been [extensively discussed], and the short answer is, the Wasm start function is often called at a time when the outside environment can't access the module's exports, and C++ constructors can run arbitrary user code which may generate calls to things that need to access the module's exports. It's also tempting to propose defining a second initialization phase in core Wasm. I'm not opposed to this, but it is more complex at the core Wasm level than at the component-model level. For example, in Emscripten, Wasm modules depend on JS code being able to run after the exports are available but before the initialization function is called, which wouldn't be possible if we simply call the initilaization function as part of the linking step. Wasm-ld has a [`__wasm_call_ctors` function], and in theory we could use that name instead of `_initialize`, but wasm-ld already does insert some initialization in addition to just constructors, so I think it makes sense to use `_initialize` as the exported function, which may call `__wasm_call_ctors` in its body. \#### Process We don't have a formal process defined for tool-convention proposals, but because this is proposal has potentially wide-ranging impacts, I propose to follow the following process: - I'm starting by posting this PR here, and people can comment on it. If a better alternative emerges, I'll close this PR. - After discussion here settles, if a better alternative hasn't emerged, I plan to request a CG meeting agenda item to present this topic to the CG, and seek feedback there, to ensure that it has CG-level visibility. - If the CG is in favor of it, then I'd propose we merge this PR. [defined a convention]: https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md [second initialization phase]: WebAssembly/component-model#146 [Wasm start function]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-start [extensively discussed]: WebAssembly/design#1160 [`__wasm_call_ctors` function]: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#start-section

The Wasm ecosystem is currently not consistent in how "constructors" such as C++ static initializers and similar features in other languages are implemented, and the result is users reporting constructs running multiple times, and other users reporting constructors not getting run when they should. WASI has [defined a convention] using an exported function named `_initialize`, however not all users are using WASI conventions. In particular, users of what is sometimes called "wasm32-unknown-unknown" are not expecting to follow WASI conventions. However, they still have a need for constructors working in a reliable way. To address this, I propose moving this out of WASI and defining this as a toolchain-independent ABI, here in tool-conventions. This would recognize the `_initialize` function as the toolchain-independent way to ensure that constructors are properly called before other exports are accessed. Related activities ------------------ In the component model, there is a proposal to add a [second initialization phase]. If that's done, then component-model toolchains could arrange for this `_initialize` function to be called automatically by this second initialization mechanism. Considered alternatives ----------------------- It is tempting to use the [Wasm start function] for C++ constructors; this has been [extensively discussed], and the short answer is, the Wasm start function is often called at a time when the outside environment can't access the module's exports, and C++ constructors can run arbitrary user code which may generate calls to things that need to access the module's exports. It's also tempting to propose defining a second initialization phase in core Wasm. I'm not opposed to this, but it is more complex at the core Wasm level than at the component-model level. For example, in Emscripten, Wasm modules depend on JS code being able to run after the exports are available but before the initialization function is called, which wouldn't be possible if we simply call the initilaization function as part of the linking step. Wasm-ld has a [`__wasm_call_ctors` function], and in theory we could use that name instead of `_initialize`, but wasm-ld already does insert some initialization in addition to just constructors, so I think it makes sense to use `_initialize` as the exported function, which may call `__wasm_call_ctors` in its body. Process ------- We don't have a formal process defined for tool-convention proposals, but because this is proposal has potentially wide-ranging impacts, I propose to follow the following process: - I'm starting by posting this PR here, and people can comment on it. If a better alternative emerges, I'll close this PR. - After discussion here settles, if a better alternative hasn't emerged, I plan to request a CG meeting agenda item to present this topic to the CG, and seek feedback there, to ensure that it has CG-level visibility. - If the CG is in favor of it, then I'd propose we merge this PR. [defined a convention]: https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md [second initialization phase]: WebAssembly/component-model#146 [Wasm start function]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-start [extensively discussed]: WebAssembly/design#1160 [`__wasm_call_ctors` function]: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#start-section

…203) * Add a toolchain-independent ABI document, and propose `_initialize` The Wasm ecosystem is currently not consistent in how "constructors" such as C++ static initializers and similar features in other languages are implemented, and the result is users reporting constructs running multiple times, and other users reporting constructors not getting run when they should. WASI has [defined a convention] using an exported function named `_initialize`, however not all users are using WASI conventions. In particular, users of what is sometimes called "wasm32-unknown-unknown" are not expecting to follow WASI conventions. However, they still have a need for constructors working in a reliable way. To address this, I propose moving this out of WASI and defining this as a toolchain-independent ABI, here in tool-conventions. This would recognize the `_initialize` function as the toolchain-independent way to ensure that constructors are properly called before other exports are accessed. Related activities ------------------ In the component model, there is a proposal to add a [second initialization phase]. If that's done, then component-model toolchains could arrange for this `_initialize` function to be called automatically by this second initialization mechanism. Considered alternatives ----------------------- It is tempting to use the [Wasm start function] for C++ constructors; this has been [extensively discussed], and the short answer is, the Wasm start function is often called at a time when the outside environment can't access the module's exports, and C++ constructors can run arbitrary user code which may generate calls to things that need to access the module's exports. It's also tempting to propose defining a second initialization phase in core Wasm. I'm not opposed to this, but it is more complex at the core Wasm level than at the component-model level. For example, in Emscripten, Wasm modules depend on JS code being able to run after the exports are available but before the initialization function is called, which wouldn't be possible if we simply call the initilaization function as part of the linking step. Wasm-ld has a [`__wasm_call_ctors` function], and in theory we could use that name instead of `_initialize`, but wasm-ld already does insert some initialization in addition to just constructors, so I think it makes sense to use `_initialize` as the exported function, which may call `__wasm_call_ctors` in its body. Process ------- We don't have a formal process defined for tool-convention proposals, but because this is proposal has potentially wide-ranging impacts, I propose to follow the following process: - I'm starting by posting this PR here, and people can comment on it. If a better alternative emerges, I'll close this PR. - After discussion here settles, if a better alternative hasn't emerged, I plan to request a CG meeting agenda item to present this topic to the CG, and seek feedback there, to ensure that it has CG-level visibility. - If the CG is in favor of it, then I'd propose we merge this PR. [defined a convention]: https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md [second initialization phase]: WebAssembly/component-model#146 [Wasm start function]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-start [extensively discussed]: WebAssembly/design#1160 [`__wasm_call_ctors` function]: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#start-section * Rename to "Basic Module ABI". * Update BasicModuleABI.md Co-authored-by: Derek Schuff <[email protected]> * Explain when we can and can't use the wasm start function. --------- Co-authored-by: Derek Schuff <[email protected]>

sunfishcode mentioned this issue Dec 4, 2017

Use wasm start-function linking and __cxa_atexit WebAssembly/tool-conventions#25

Closed

sunfishcode mentioned this issue Jan 30, 2019

Support static constructors (ctors) rustwasm/wasm-bindgen#1216

Open

sbc100 mentioned this issue May 17, 2019

memory access for js callees #1276

Closed

lukewagner mentioned this issue May 29, 2019

June 2019 Meeting Agenda WebAssembly/meetings#398

Closed

littledan mentioned this issue Jun 21, 2019

Wasm/ESM integration and additional start/main functions WebAssembly/esm-integration#32

Open

rossberg mentioned this issue Sep 19, 2019

[spec] Rename start function WebAssembly/spec#1073

Closed

lukewagner mentioned this issue Jul 22, 2020

allow parent functions to be passed to nested instances? WebAssembly/module-linking#12

Closed

lukewagner mentioned this issue Dec 20, 2022

Consider adding second initialization phase after start WebAssembly/component-model#146

Open

sbc100 mentioned this issue Jan 9, 2023

Don't use sbrk(0) to determine the initial heap size WebAssembly/wasi-libc#377

Merged

sunfishcode mentioned this issue Mar 6, 2023

Add a toolchain-independent ABI document, and propose _initialize WebAssembly/tool-conventions#203

Merged

sbc100 mentioned this issue Aug 9, 2024

wasi calls from wasm start function WebAssembly/WASI#616

Open

sunfishcode added the imports/exports label Dec 4, 2024

Limitations of start function with exported memory #1160

Limitations of start function with exported memory #1160

Comments

sbc100 commented Nov 30, 2017

jfbastien commented Nov 30, 2017

NWilson commented Nov 30, 2017 • edited Loading

lukewagner commented Nov 30, 2017

sbc100 commented Nov 30, 2017

jfbastien commented Nov 30, 2017

sbc100 commented Nov 30, 2017

rossberg commented Dec 5, 2017

sbc100 commented Dec 5, 2017

sbc100 commented May 1, 2018

sbc100 commented May 17, 2019

devsnek commented May 18, 2019 • edited Loading

sbc100 commented May 20, 2019

sbc100 commented May 20, 2019

rossberg commented May 20, 2019 • edited Loading

sbc100 commented May 20, 2019

rossberg commented May 20, 2019 • edited Loading

devsnek commented May 20, 2019 • edited Loading

rossberg commented May 20, 2019

devsnek commented May 20, 2019 • edited Loading

littledan commented May 20, 2019

lukewagner commented May 20, 2019

devsnek commented May 20, 2019

rossberg commented May 21, 2019

lukewagner commented May 21, 2019

devsnek commented May 21, 2019 • edited Loading

lukewagner commented May 21, 2019

devsnek commented May 21, 2019

rossberg commented May 22, 2019

lukewagner commented May 22, 2019

rossberg commented May 23, 2019

devsnek commented May 23, 2019 • edited Loading

lukewagner commented May 23, 2019

guybedford commented May 25, 2019

lukewagner commented May 28, 2019

lukewagner commented May 29, 2019 • edited Loading

littledan commented Jun 11, 2019 • edited Loading

axic commented Jun 12, 2019 • edited Loading

littledan commented Jun 21, 2019

sbc100 commented Aug 7, 2019

lukewagner commented Mar 5, 2020

devsnek commented Mar 5, 2020 • edited Loading

lukewagner commented Mar 5, 2020

littledan commented Apr 27, 2020

verbessern commented Apr 27, 2020 • edited Loading

NWilson commented Nov 30, 2017 •

edited

Loading

devsnek commented May 18, 2019 •

edited

Loading

rossberg commented May 20, 2019 •

edited

Loading

rossberg commented May 20, 2019 •

edited

Loading

devsnek commented May 20, 2019 •

edited

Loading

devsnek commented May 20, 2019 •

edited

Loading

devsnek commented May 21, 2019 •

edited

Loading

devsnek commented May 23, 2019 •

edited

Loading

lukewagner commented May 29, 2019 •

edited

Loading

littledan commented Jun 11, 2019 •

edited

Loading

axic commented Jun 12, 2019 •

edited

Loading

devsnek commented Mar 5, 2020 •

edited

Loading

verbessern commented Apr 27, 2020 •

edited

Loading