Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling BSS data (can we remove DataSize from linking metadata?) #33

Closed
sbc100 opened this issue Dec 18, 2017 · 14 comments
Closed

Handling BSS data (can we remove DataSize from linking metadata?) #33

sbc100 opened this issue Dec 18, 2017 · 14 comments
Assignees

Comments

@sbc100
Copy link
Member

sbc100 commented Dec 18, 2017

We currently proposed that BSS be encoded by simply leaving gaps, in the memory not filled by data segments. Trailing bss would then be handled by a cap at the end and specified using the DataSize field in the linking metadata.

However, we llvm doesn't' currently implement it this way and always encodes bss as actual zeros in a data segment.

I think that for wasm we have can't rely on non-initialized data to represent bss, becasue we can't really host providing clean memory. So for now I propose that we just handle bss like any other data (this is what llvm does today anyway), and in the future we could optimize by adding a flag to a data segment to specify that its zero initialized, or some other approach (perhaps explicitly using the proposed bulk memory operations).

As a result of this I propose that we remove the DataData field from the linking metadata:
https://reviews.llvm.org/D41366

@kripken
Copy link
Member

kripken commented Dec 18, 2017

The memory is guaranteed to be initialized to zero by the wasm spec. The only risk is if the memory is imported and the loader has loaded something else in that space already, so in dynamic linking this needs some care. But in the simple case, avoiding emitting zeros can shrink the binary quite a bit, and e.g. binaryen optimizes that.

@sbc100
Copy link
Member Author

sbc100 commented Dec 18, 2017

Yes, thats exactly what i mean. dll's being the obvious use case, but who knows that the host code might have done with the memory before passing to wasm. Basically the tooling probably shouldn't rely on the the cleanness of the memory at instantiate time.

@kripken
Copy link
Member

kripken commented Dec 18, 2017

It would be a shame not to, though, since it means bigger binaries? The only cost is the loader zeroing reused memory.

@binji
Copy link
Member

binji commented Dec 18, 2017

We'll likely have the ability to programmatically zero memory soon, either via the bulk memory operations proposal or conditional segment initialization.

@sbc100
Copy link
Member Author

sbc100 commented Dec 18, 2017

I agree we don't want to be shipping zeros in the final binary. Perhaps the linker can have flag for this until there is better runtime support. Removing DATA_SIZE is mostly concerted with the intermediate object file format here (the files that have the linking metadata) rather than the final output binary.

@NWilson
Copy link
Contributor

NWilson commented Dec 19, 2017

Funnily enough I opened an LLVM bug for this exact issue just a few days ago:
https://bugs.llvm.org/show_bug.cgi?id=35621

I had reasoned that Wasm memory always starts off at zero, so zero sections could be omitted in the output. I think it's safe to reason that if someone imports a Memory object, it's up to the user to make sure that the Memory is zero'd, as it would be if it were not imported.

@sunfishcode
Copy link
Member

It's not clear to me why hosts should be permitted to instantiate modules with unclean memory. Modules declare how much memory they want, and which bits of it have initializers, so any remaining bits are clearly intended to be zero. Is there more to it?

@binji
Copy link
Member

binji commented Dec 19, 2017

If an instance imports memory and shares memory with another instance then it isn't clear which bits should be zeroed and which bits shouldn't be touched. Are we assuming that it is the responsibility of the loader to know which bits should be zeroed? That seems weird to me.

@kripken
Copy link
Member

kripken commented Dec 19, 2017

For dynamic linking we defined it like this:

If the dynamic library has memorysize > 0 then the loader will reserve room in memory of that size and initialize it to zero (note: can be larger than the memory segments in the module, if the dynamic library wants additional space)

The library is told where its memory is, and it declares how much it needs, so the loader knows what to zero.

For static linking maybe it's weird, though...

@binji
Copy link
Member

binji commented Dec 20, 2017

I guess I like the idea that modules can be more independent from the host. Since they know exactly how much memory should be zeroed, they can just do it themselves. It also means that in the shared library case, the loader won't unnecessarily zero memory that will immediately be overwritten by data in a module's data segment.

That said, it also means that in the common case (1 instance, 1 memory) the instance will unnecessarily zero memory that is already zero.

@dschuff
Copy link
Member

dschuff commented Jan 9, 2018

I think we want to optimize for small binaries and fast loading. That means not shipping zeros in binaries, and not redundantly zeroing memory. For instantiating static binaries, the engine already guarantees clean memory (and having everything zeroed goes along with the determinism goals). Effectively the zeroing is part of the VM's ABI contract. In that case I can't think of any reason why we would want to do extra zeroing. For dynamic linking, I think we can be consistent with that behavior, and just declare that it's part of the linking ABI that a module be instantiated into clean memory. Then, in the common case of pre-run loading, the loader (assuming it's not just our previously-discussed instantiateGroup API) can just directly instantiate the modules without doing any extra zeroing at at all. This way everything is consistent, and the one implicit assumption (the ABI requirement the memory be cleared) is common to static and dynamic loading (and other platforms IIUC).

@binji
Copy link
Member

binji commented Jan 9, 2018

Yes, looking back at my comments, I'm not sure what I was concerned about. :-) Probably conflating this with multi-threaded shared memory dynamic linking or something.

@sbc100
Copy link
Member Author

sbc100 commented Jan 20, 2018

So it seems like the status quo is OK with people? i.e. the wasm binary specifies the its BSS size storeing the DataSize in the linking metadata section.. anything between the data segments and DataSize can be assumed by the code to be zero on startup.

@sbc100 sbc100 closed this as completed Jan 20, 2018
@NWilson
Copy link
Contributor

NWilson commented Jan 20, 2018

Yes - but the LLVM issue is still open to actually take advantage of the zeros and not emit them in the Wasm files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants