Pondering the Stack and Globals #88

ncbray · 2015-05-28T17:35:07Z

There's no direct question here, just pondering how things fit together.

For the most part, variables will live in the "shadow stack" (the JS stack, for many implementations). Manipulation of this stack is entirely implicit. Do a function call? New frame on the shadow stack. There will be some cases where the shadow stack is insufficient, however: for example, when the address of a stack-local variable is taken. To support this, a "user space" stack is needed that lives in the visible heap. OK... so how is this user space stack implemented? Is it explicitly compiled into the program? Or is it supported by the system, with special opcodes?

If the user stack is an explicit part of the program, then how do you implement it? I believe Emscripten uses a global variable: STACKTOP. OK, that's kind of weird if you think about it... it's sort of a "shadow global". Can't take the address of it. Spiritually similar to the stack pointer on CPUs. A virtual register that is not confined to a particular stack frame? OK, how do those virtual registers get initialized? For example, when you launch a thread? I suppose they could be parameters passed in to thread initialization. (Although this raises the question of what the system/user interface for thread creation looks like.) Alternatively, you could just store STACKTOP in the user visible heap. Much simpler, doesn't require a separate concept. (The concept of shadow globals may be desirable. It isn't necessary, however.)
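To make the "explicit part of the program" option concrete, here is a minimal sketch (not Emscripten's actual output) of a compiled function managing its own user stack. It assumes a `bytearray` standing in for the visible heap, a stack that grows downward from the top of the heap, and hypothetical helpers `store_i32` / `load_i32`; only the name STACKTOP comes from the discussion above.

```python
import struct

HEAP_SIZE = 64 * 1024
heap = bytearray(HEAP_SIZE)   # stand-in for the user-visible heap

# STACKTOP lives outside `heap`, so no heap address can alias it.
# Assumption of this sketch: the user stack starts at the top of the
# heap and grows downward.
STACKTOP = HEAP_SIZE


def store_i32(addr, value):
    """Write a little-endian 32-bit integer into the heap."""
    struct.pack_into("<i", heap, addr, value)


def load_i32(addr):
    """Read a little-endian 32-bit integer from the heap."""
    return struct.unpack_from("<i", heap, addr)[0]


def increment(ptr):
    """Callee that receives the address of a caller's local."""
    store_i32(ptr, load_i32(ptr) + 1)


def caller():
    global STACKTOP
    saved = STACKTOP          # prologue: remember the old stack top
    STACKTOP -= 4             # reserve 4 bytes for the address-taken local
    x_addr = STACKTOP
    store_i32(x_addr, 41)
    increment(x_addr)         # pass "&x" as a plain heap address
    result = load_i32(x_addr)
    STACKTOP = saved          # epilogue: release the frame
    return result
```

Only the local whose address escapes needs a slot in the heap; `saved` and `result` stay in the implicit stack, which is why the user stack is expected to be used relatively rarely.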

On the other hand - where do you store the thread-local STACKTOP global in user space? Some implementations may lower shadow globals into the heap, anyways, so the question of where the globals are allocated is relevant in multiple situations. If memory allocation is an explicit part of the program... how does the allocator know what parts of memory are safe to use vs. grabbed by some lower-level part of the system? Does there need to be a system-level "page allocation" API that user-level memory allocators build on top of?
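One way to picture the "page allocation" layering is the sketch below. Everything in it is hypothetical (`System`, `alloc_pages`, `BumpAllocator`, the 4096-byte page size); it only illustrates a user-level allocator that never touches memory the system has not explicitly handed to it.

```python
PAGE_SIZE = 4096  # hypothetical page size for this sketch


class System:
    """Hypothetical system layer that owns the heap and hands out whole pages."""

    def __init__(self, reserved_pages):
        # Pages below this index are assumed to be taken by lower-level
        # parts of the system (for example, globals lowered into the heap).
        self.next_free_page = reserved_pages

    def alloc_pages(self, count):
        """Return the base address of `count` fresh, contiguous pages."""
        base = self.next_free_page * PAGE_SIZE
        self.next_free_page += count
        return base


class BumpAllocator:
    """User-level allocator that only uses pages it was given."""

    def __init__(self, system):
        self.system = system
        self.cursor = 0
        self.limit = 0

    def malloc(self, size):
        size = (size + 7) & ~7             # round up to 8-byte alignment
        if self.cursor + size > self.limit:
            pages = -(-size // PAGE_SIZE)  # ceiling division
            self.cursor = self.system.alloc_pages(pages)
            self.limit = self.cursor + pages * PAGE_SIZE
        addr = self.cursor
        self.cursor += size
        return addr
```

With two pages reserved by the system, the first `malloc` call returns an address at or above `2 * PAGE_SIZE`, so the allocator cannot collide with whatever the lower layers placed there.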

JF mentioned an example of split stacks being added to LLVM: http://reviews.llvm.org/D6095

kripken · 2015-05-28T17:55:56Z

In general I think a reasonable approach is indeed to let the compiled program manage its own "user stack". Global variables in a wasm module are indeed kind of special: they cannot alias the rest of the heap. I feel like that's a nice feature. And if web workers are "threads", then those variables are basically a form of thread-local storage, and they are initialized when the module is initialized. Otherwise, memory usage should be normal as on other platforms (malloc must be threadsafe, etc.).

There might be better approaches, though. I've worried that a user-handled stack like that might have overhead over "normal" native compilation, but I've never had an idea as to how to measure that. My hope though is that usage of that stack should be fairly rare, as scalarrepl should eliminate stack vars in most cases.

lukewagner · 2015-05-28T18:20:28Z

Just to go into a bit more detail:

wasm v.1 has globals, so in v.1 STACKTOP would just be an ordinary global, not specially recognized by wasm semantics.
When threads are added (right after v.1), globals could be declared thread-local, so STACKTOP would just be a normal thread-local global.
In various C/C++ cases (varargs, complicated object arguments), the user stack would effectively end up being part of an ABI. This was discussed in "specifying the C/C++ ABI (but maybe not in v.1)" (#67), and the consensus seems to be that it would be beneficial to standardize this ABI as part of the spec when dynamic linking is added to wasm.
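As a rough illustration of the thread-local variant, the sketch below uses Python's `threading.local` as a stand-in for thread-local globals. The stack size, the `thread_main` entry point, and the way each thread receives its stack base are all assumptions made for the example, not a proposed wasm interface.

```python
import threading

STACK_SIZE = 1024  # hypothetical per-thread user stack size

tls = threading.local()  # stands in for thread-local globals


def thread_main(stack_base, results, index):
    # Initialised once per thread, like a thread-local global's initial value.
    tls.STACKTOP = stack_base + STACK_SIZE
    tls.STACKTOP -= 16   # reserve a frame; other threads are unaffected
    results[index] = tls.STACKTOP


def run(num_threads):
    """Start threads with disjoint stack regions and collect each STACKTOP."""
    results = [None] * num_threads
    threads = [
        threading.Thread(target=thread_main, args=(i * STACK_SIZE, results, i))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each thread sees its own STACKTOP, so the compiled code can stay identical across threads; what differs is only the initial value supplied at thread creation.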

An alternative is to specifically incorporate the user-defined stack into semantics (e.g., by specially recognizing STACKTOP), but I haven't yet seen a strong argument for the performance win this would allow. Certainly open to discussing more, though.

ncbray · 2015-05-28T20:03:05Z

OK, let's get weird. What happens when dynamic linking is a thing?

Assuming the status quo, shadow globals would be only visible to a specific combination of thread and module. So... does this mean a user stack needs to be allocated for every thread for every module? How does that happen? NxM madness. Or is there a user-level ABI convention where the user stack pointer gets passed across module boundaries? Wrapper functions to hide this? Or is there a way to share shadow globals between modules? (This would require a somewhat de-optimized JS implementation?)

If STACKTOP is not a shadow global but lives in user space... how does a shared library find it? (We're back to per-thread initialization for each shared library?)

What happens when a shared library wants to create a thread? This means that any user-level thread APIs would also need to be in a shared library? How do shadow globals get initialized on thread creation?

There seem to be 3 workable solutions:

1. Add a standard user stack pointer register to the VM.
2. Let shadow globals be (selectively) shared between modules and leave decisions to the ABI.
3. Use the user stack pointer as a cross-module calling convention.

Not taking a position at this point.
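For a feel of what the calling-convention option might look like, here is a small sketch in which the stack pointer is an ordinary argument passed across a module boundary. The two "modules" are just Python functions sharing one `bytearray`, and all names (`module_a_entry`, `module_b_sum_squares`, `sp`) are hypothetical.

```python
import struct

heap = bytearray(4096)  # linear memory visible to both "modules"


# "Module B": needs scratch stack space but owns no stack pointer of its own.
def module_b_sum_squares(sp, n):
    sp -= 4 * n  # carve a frame below the caller's sp
    for i in range(n):
        struct.pack_into("<i", heap, sp + 4 * i, i * i)
    total = sum(struct.unpack_from("<i", heap, sp + 4 * i)[0] for i in range(n))
    return total  # the caller's own sp value is untouched


# "Module A": owns the stack pointer and threads it through the call.
def module_a_entry():
    sp = len(heap)  # assumption: the user stack grows downward
    sp -= 8         # module A's own frame
    return module_b_sum_squares(sp, 4)
```

Because `sp` is passed by value, module B needs no per-thread or per-module state, which sidesteps the NxM allocation problem at the cost of an extra argument on cross-module calls.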

I will say that shadow globals are weird. If they are thread and module local, using them seems to cause complications for num_thread > 1 && num_module > 1?

A random thought: how much size could be shaved off by eliminating user stack setup and teardown operations? In theory, building things in can generally reduce size, with the obvious downsides.

jfbastien · 2015-05-28T20:17:46Z

We also have to design something that'll make it possible for wasm to eventually support:

Threading.
Dynamic linking.
Coroutines (for languages such as Go, and whatever C++ will eventually have).
User-mode lightweight / cooperative threading.
Precise GC through stack inspection.
Stack unwinding for exception handling.
Stack inspection for crash information.

FWIW I think the design will end up having a safe shadow stack and an untrusted stack. The details will be complicated!

lukewagner · 2015-05-28T20:27:14Z

Based on past discussions, the expectation was to do (2) (dynamically-linked modules can import/export functions and globals, thread-local or shared).

On a side note, I'm not sure "shadow" is the best adjective to describe globals or stacks. At least in my VM experience, a "shadow stack" was a stack maintained in parallel with the native stack to hold, e.g., just the GC pointers. But here you're calling the native stack the shadow stack. Perhaps we could have the "trusted" stack and the "user" (or "heap" or "aliased") stack? For the same reason, "shadow global" doesn't quite make sense; globals aren't aliasable, but they're not a shadow of anything else.

sunfishcode · 2015-07-28T03:57:49Z

Are there questions left that need answers here that aren't covered by #104, #126, and #154?

ncbray · 2015-07-28T19:27:46Z

If there are lingering questions, they'll bubble back up in later discussions.

sunfishcode added the question label May 29, 2015

titzer referenced this issue Jul 9, 2015

Clarify that linear memory sizes are limited by available resources.

57bdfbe

ncbray closed this as completed Jul 28, 2015

yzkuang mentioned this issue Jan 21, 2019

Crash because hxcpp's GC fail to mark a local variable HaxeFoundation/hxcpp#760

Open
