-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling BSS data (can we remove DataSize from linking metadata?) #33
Comments
The memory is guaranteed to be initialized to zero by the wasm spec. The only risk is if the memory is imported and the loader has loaded something else in that space already, so in dynamic linking this needs some care. But in the simple case, avoiding emitting zeros can shrink the binary quite a bit, and e.g. binaryen optimizes that. |
Yes, thats exactly what i mean. dll's being the obvious use case, but who knows that the host code might have done with the memory before passing to wasm. Basically the tooling probably shouldn't rely on the the cleanness of the memory at instantiate time. |
It would be a shame not to, though, since it means bigger binaries? The only cost is the loader zeroing reused memory. |
We'll likely have the ability to programmatically zero memory soon, either via the bulk memory operations proposal or conditional segment initialization. |
I agree we don't want to be shipping zeros in the final binary. Perhaps the linker can have flag for this until there is better runtime support. Removing DATA_SIZE is mostly concerted with the intermediate object file format here (the files that have the linking metadata) rather than the final output binary. |
Funnily enough I opened an LLVM bug for this exact issue just a few days ago: I had reasoned that Wasm memory always starts off at zero, so zero sections could be omitted in the output. I think it's safe to reason that if someone imports a Memory object, it's up to the user to make sure that the Memory is zero'd, as it would be if it were not imported. |
It's not clear to me why hosts should be permitted to instantiate modules with unclean memory. Modules declare how much memory they want, and which bits of it have initializers, so any remaining bits are clearly intended to be zero. Is there more to it? |
If an instance imports memory and shares memory with another instance then it isn't clear which bits should be zeroed and which bits shouldn't be touched. Are we assuming that it is the responsibility of the loader to know which bits should be zeroed? That seems weird to me. |
For dynamic linking we defined it like this:
The library is told where its memory is, and it declares how much it needs, so the loader knows what to zero. For static linking maybe it's weird, though... |
I guess I like the idea that modules can be more independent from the host. Since they know exactly how much memory should be zeroed, they can just do it themselves. It also means that in the shared library case, the loader won't unnecessarily zero memory that will immediately be overwritten by data in a module's data segment. That said, it also means that in the common case (1 instance, 1 memory) the instance will unnecessarily zero memory that is already zero. |
I think we want to optimize for small binaries and fast loading. That means not shipping zeros in binaries, and not redundantly zeroing memory. For instantiating static binaries, the engine already guarantees clean memory (and having everything zeroed goes along with the determinism goals). Effectively the zeroing is part of the VM's ABI contract. In that case I can't think of any reason why we would want to do extra zeroing. For dynamic linking, I think we can be consistent with that behavior, and just declare that it's part of the linking ABI that a module be instantiated into clean memory. Then, in the common case of pre-run loading, the loader (assuming it's not just our previously-discussed |
Yes, looking back at my comments, I'm not sure what I was concerned about. :-) Probably conflating this with multi-threaded shared memory dynamic linking or something. |
So it seems like the status quo is OK with people? i.e. the wasm binary specifies the its BSS size storeing the DataSize in the linking metadata section.. anything between the data segments and DataSize can be assumed by the code to be zero on startup. |
Yes - but the LLVM issue is still open to actually take advantage of the zeros and not emit them in the Wasm files. |
We currently proposed that BSS be encoded by simply leaving gaps, in the memory not filled by data segments. Trailing bss would then be handled by a cap at the end and specified using the DataSize field in the linking metadata.
However, we llvm doesn't' currently implement it this way and always encodes bss as actual zeros in a data segment.
I think that for wasm we have can't rely on non-initialized data to represent bss, becasue we can't really host providing clean memory. So for now I propose that we just handle bss like any other data (this is what llvm does today anyway), and in the future we could optimize by adding a flag to a data segment to specify that its zero initialized, or some other approach (perhaps explicitly using the proposed bulk memory operations).
As a result of this I propose that we remove the DataData field from the linking metadata:
https://reviews.llvm.org/D41366
The text was updated successfully, but these errors were encountered: