Handling BSS data (can we remove DataSize from linking metadata?) #33

sbc100 · 2017-12-18T20:51:13Z

We currently propose that BSS be encoded by simply leaving gaps in the memory not filled by data segments. Trailing BSS would then be handled by a gap at the end, specified using the DataSize field in the linking metadata.
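To make the gap encoding concrete, here is a minimal sketch of how a tool could recover the implicit BSS ranges from a list of data segments plus DataSize. The `(offset, payload)` segment representation, the function name `bss_ranges`, and the convention that offsets are relative to the start of the module's data region are all assumptions for illustration, not the object file format itself.

```python
def bss_ranges(segments, data_size):
    """Return the [start, end) ranges below data_size that no data segment covers.

    segments:  list of (offset, payload) pairs (hypothetical representation).
    data_size: total size of the data region, i.e. the DataSize value.
    """
    gaps = []
    cursor = 0  # first byte not yet covered by a segment
    for offset, payload in sorted(segments, key=lambda s: s[0]):
        if offset > cursor:
            # Nothing initializes [cursor, offset): this is implicit BSS.
            gaps.append((cursor, offset))
        cursor = max(cursor, offset + len(payload))
    if data_size > cursor:
        # Trailing BSS: the gap between the last segment and DataSize.
        gaps.append((cursor, data_size))
    return gaps


segments = [(0, b"abcd"), (16, b"\x01\x02")]
print(bss_ranges(segments, 32))  # [(4, 16), (18, 32)]
```

Without DataSize, the trailing range `(18, 32)` could not be recovered from the segments alone, which is why the field matters for this encoding.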

However, LLVM doesn't currently implement it this way and always encodes BSS as actual zeros in a data segment.

I think that for wasm we can't rely on non-initialized data to represent BSS, because we can't really rely on the host providing clean memory. So for now I propose that we just handle BSS like any other data (this is what LLVM does today anyway), and in the future we could optimize by adding a flag to a data segment to specify that it's zero-initialized, or some other approach (perhaps explicitly using the proposed bulk memory operations).

As a result of this I propose that we remove the DataSize field from the linking metadata:
https://reviews.llvm.org/D41366

kripken · 2017-12-18T22:00:36Z

The memory is guaranteed to be initialized to zero by the wasm spec. The only risk is if the memory is imported and the loader has loaded something else in that space already, so in dynamic linking this needs some care. But in the simple case, avoiding emitting zeros can shrink the binary quite a bit, and e.g. binaryen optimizes that.

sbc100 · 2017-12-18T22:12:00Z

Yes, that's exactly what I mean. DLLs are the obvious use case, but who knows what the host code might have done with the memory before passing it to wasm. Basically, the tooling probably shouldn't rely on the cleanness of the memory at instantiation time.

kripken · 2017-12-18T22:20:26Z

It would be a shame not to, though, since it means bigger binaries? The only cost is the loader zeroing reused memory.

binji · 2017-12-18T22:30:26Z

We'll likely have the ability to programmatically zero memory soon, either via the bulk memory operations proposal or conditional segment initialization.

sbc100 · 2017-12-18T23:26:08Z

I agree we don't want to be shipping zeros in the final binary. Perhaps the linker can have a flag for this until there is better runtime support. Removing DATA_SIZE is mostly concerned with the intermediate object file format here (the files that have the linking metadata) rather than the final output binary.

NWilson · 2017-12-19T21:37:39Z

Funnily enough I opened an LLVM bug for this exact issue just a few days ago:
https://bugs.llvm.org/show_bug.cgi?id=35621

I had reasoned that Wasm memory always starts off at zero, so zero sections could be omitted in the output. I think it's safe to reason that if someone imports a Memory object, it's up to the user to make sure that the Memory is zeroed, as it would be if it were not imported.

sunfishcode · 2017-12-19T22:52:22Z

It's not clear to me why hosts should be permitted to instantiate modules with unclean memory. Modules declare how much memory they want, and which bits of it have initializers, so any remaining bits are clearly intended to be zero. Is there more to it?

binji · 2017-12-19T23:18:46Z

If an instance imports memory and shares memory with another instance then it isn't clear which bits should be zeroed and which bits shouldn't be touched. Are we assuming that it is the responsibility of the loader to know which bits should be zeroed? That seems weird to me.

kripken · 2017-12-19T23:35:08Z

For dynamic linking we defined it like this:

If the dynamic library has memorysize > 0 then the loader will reserve room in memory of that size and initialize it to zero (note: can be larger than the memory segments in the module, if the dynamic library wants additional space)

The library is told where its memory is, and it declares how much it needs, so the loader knows what to zero.

For static linking maybe it's weird, though...
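The dynamic-linking rule quoted above can be sketched as follows. This is only an illustration under stated assumptions: linear memory is modelled as a Python `bytearray`, and `load_library`, `base`, and the `(offset, payload)` segment pairs are hypothetical names, not part of any real loader API.

```python
def load_library(memory, base, memory_size, segments):
    """Reserve memory_size bytes at base, zero them, then apply data segments.

    memory:      bytearray standing in for the shared linear memory.
    base:        address the loader assigns to the library (hypothetical).
    memory_size: size the library declares it needs; may exceed its segments.
    segments:    list of (offset, payload) pairs relative to base.
    Returns the first address after the reserved region.
    """
    end = base + memory_size
    if end > len(memory):
        raise ValueError("not enough linear memory for the library")
    # The loader knows exactly which range to clear because the library
    # declared its size, even if the memory was used for something else before.
    memory[base:end] = bytes(memory_size)
    for offset, payload in segments:
        if offset + len(payload) > memory_size:
            raise ValueError("segment extends past the declared memory size")
        memory[base + offset:base + offset + len(payload)] = payload
    return end


memory = bytearray(b"\xff" * 64)  # deliberately "dirty" memory
load_library(memory, base=16, memory_size=32, segments=[(4, b"hi")])
print(memory[16:24])  # bytearray(b'\x00\x00\x00\x00hi\x00\x00')
```

Note that the bytes covered by the segment are written twice here (zeroed, then overwritten). A loader could avoid that by zeroing only the uncovered ranges.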

binji · 2017-12-20T00:03:22Z

I guess I like the idea that modules can be more independent from the host. Since they know exactly how much memory should be zeroed, they can just do it themselves. It also means that in the shared library case, the loader won't unnecessarily zero memory that will immediately be overwritten by data in a module's data segment.

That said, it also means that in the common case (1 instance, 1 memory) the instance will unnecessarily zero memory that is already zero.

dschuff · 2018-01-09T17:45:55Z

I think we want to optimize for small binaries and fast loading. That means not shipping zeros in binaries, and not redundantly zeroing memory. For instantiating static binaries, the engine already guarantees clean memory (and having everything zeroed goes along with the determinism goals). Effectively the zeroing is part of the VM's ABI contract. In that case I can't think of any reason why we would want to do extra zeroing. For dynamic linking, I think we can be consistent with that behavior, and just declare that it's part of the linking ABI that a module be instantiated into clean memory. Then, in the common case of pre-run loading, the loader (assuming it's not just our previously-discussed instantiateGroup API) can just directly instantiate the modules without doing any extra zeroing at all. This way everything is consistent, and the one implicit assumption (the ABI requirement that the memory be cleared) is common to static and dynamic loading (and other platforms IIUC).

binji · 2018-01-09T20:29:30Z

Yes, looking back at my comments, I'm not sure what I was concerned about. :-) Probably conflating this with multi-threaded shared memory dynamic linking or something.

sbc100 · 2018-01-20T00:50:33Z

So it seems like the status quo is OK with people? I.e. the wasm binary specifies its BSS size by storing the DataSize in the linking metadata section. Anything between the data segments and DataSize can be assumed by the code to be zero on startup.

NWilson · 2018-01-20T00:54:56Z

Yes - but the LLVM issue is still open to actually take advantage of the zeros and not emit them in the Wasm files.

sbc100 assigned sbc100, binji and sunfishcode Dec 18, 2017

sbc100 mentioned this issue Dec 18, 2017

Encoding data segment alignment and BSS size #12

Closed

sbc100 closed this as completed Jan 20, 2018

pepyakin mentioned this issue Apr 8, 2020

wasmtime: fix the memory zeroing for imported memories paritytech/substrate#5036

Closed
