This repository has been archived by the owner on Nov 24, 2022. It is now read-only.

Avoid loading all object files at link-time #665

Open
TerrorJack opened this issue May 27, 2020 · 0 comments · May be fixed by #666

TerrorJack commented May 27, 2020

Is your feature request related to a problem? Please describe.
We've been bothered by the excessive memory usage of the linker for a long time. There have been various tricks (e.g. lazy serialization of Map values) and efforts (e.g. #605) to improve it. In this ticket, we seek a deeper refactoring that addresses a piece of technical debt: needing to load all objects at link-time.

Until now, the linker codebase has mostly worked with the AsteriusModule type, which consists of several symbol-indexed immutable hash maps. This keeps the implementation simple, but retains a lot of heap data. This is especially true for the input to the gc-sections pass: most input entities will be thrown away, yet they still have to be resident in memory for some time.

In the past, we used laziness to reduce memory consumption here. The hash maps in AsteriusModule have lazy values; when deserializing, each lazy value is a thunk which will decode to the actual entity, and each thunk retains a reference to the whole ByteString holding the object file content. This takes a lot less memory and time than strictly deserializing and keeping every decoded entity in the heap. Still, we have to perform this kind of lazy loading for all object files, and this requires at least as much memory as the combined file sizes of the input archives and objects.
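To make the retention issue concrete, here is a minimal sketch of the lazy-value trick, not the actual Asterius code. `Entity`, `Slice`, `decodeEntity` and `lazyModule` are hypothetical names, and the "decoder" is a toy that just unpacks the bytes.

```haskell
import qualified Data.ByteString.Char8 as BS
import qualified Data.Map.Lazy as M

-- Hypothetical stand-in for a decoded entity (function, data segment, ...).
newtype Entity = Entity String
  deriving (Eq, Show)

-- Hypothetical index entry: where an entity's bytes live inside the blob.
data Slice = Slice {sliceOffset :: Int, sliceLength :: Int}

-- Toy decoder; a real one would parse a binary encoding.
decodeEntity :: BS.ByteString -> Entity
decodeEntity = Entity . BS.unpack

-- Build a map whose values are unevaluated thunks. Each thunk closes over
-- the whole blob, so the blob stays alive until every thunk has been
-- forced or dropped.
lazyModule :: BS.ByteString -> M.Map String Slice -> M.Map String Entity
lazyModule blob = M.map decodeAt
  where
    decodeAt (Slice off len) = decodeEntity (BS.take len (BS.drop off blob))
```

Looking up one symbol decodes only that entity, but as long as any other value in the map is still an unforced thunk, the full `blob` cannot be collected.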

Production linkers like lld don't load all object files at link-time. It's about time we did the same and replaced the previous "lazy loading" trick with a more principled solution.

Describe the solution you'd like

  • Refactor the AsteriusCachedModule type and serialization logic. The new logic should support:
    • Only reading the symbol dependency map, without touching the rest of the file
    • Given an entity's symbol, deserialize it starting from a certain offset of the file, without reading the whole file.
  • The gc sections pass is split into smaller passes:
    • Given the input (archives, objects, in-memory AsteriusModule), scan the files and build up the symbol dependency map. After this completes, there shouldn't be any heap-resident blobs of input file contents. It's also possible to include a separate archive-level symbol index file in the archive, since we have our own ahc-ar now; but directly scanning the archive entries should have the same effect.
    • Run gc sections using the symbol dependency map
    • We now have a list of symbols, and also the origins of these symbols (which object file and what offset into that file). Deserialize the entities indexed by those symbols and form the AsteriusModule as the gc-sections output. Until we introduce proper streaming logic into the rest of the linker, the entities should still be lazily decoded; each thunk will read its own object file and perform decoding at its offset.
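The middle pass only needs the symbol dependency map, which can be sketched as a plain reachability traversal. This is an illustration under assumptions: `DepMap` and `reachable` are hypothetical names, symbols are modelled as `String`, and the real map may carry more information.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type Symbol = String

-- Hypothetical shape of the symbol dependency map: each symbol maps to the
-- symbols its entity references.
type DepMap = M.Map Symbol (S.Set Symbol)

-- Worklist traversal from the root symbols. Symbols missing from the map are
-- still reported as reachable, so a later step can treat them as unresolved.
reachable :: DepMap -> S.Set Symbol -> S.Set Symbol
reachable deps = go S.empty . S.toList
  where
    go seen [] = seen
    go seen (s : rest)
      | s `S.member` seen = go seen rest
      | otherwise =
          let next = S.toList (M.findWithDefault S.empty s deps)
           in go (S.insert s seen) (next ++ rest)
```

The result is just a set of symbols; no entity has to be decoded or kept in memory to compute it, which is the point of splitting the pass.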

This should further reduce linker memory usage by avoiding retaining object file content blobs in memory.
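As a rough sketch of the "read at an offset" step, the following reads only one entity's bytes from an object file. `Origin` and `readEntityBytes` are hypothetical names, and it assumes the index records a byte length alongside the offset, which the proposal above does not specify.

```haskell
import qualified Data.ByteString as BS
import System.IO (IOMode (ReadMode), SeekMode (AbsoluteSeek), hSeek, withBinaryFile)

-- Hypothetical record of where a symbol's entity lives on disk.
data Origin = Origin
  { originFile :: FilePath,
    originOffset :: Integer,
    originLength :: Int -- assumed to be stored in the index
  }

-- Read only the bytes of one entity. BS.hGet is strict, so the result is
-- fully read before withBinaryFile closes the handle, and nothing else
-- from the file is retained.
readEntityBytes :: Origin -> IO BS.ByteString
readEntityBytes (Origin path off len) =
  withBinaryFile path ReadMode $ \h -> do
    hSeek h AbsoluteSeek off
    BS.hGet h len
```

How a lazily decoded map value would defer this IO until it is forced is left open here; the sketch only shows that the read itself never needs the whole file in memory.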

Describe alternatives you've considered
Another potential option we've considered: using lld for linking, and adapting to the wasm linking convention as described here. This means we would need to emit object files which are themselves valid .wasm modules, containing additional custom sections that describe the linking-related metadata.

A major problem with using lld is that we have different atoms (entities) compared to vanilla wasm. lld knows about data segments and functions, but we have JSFFI imports/exports that are also indexed by symbols and need to be garbage collected. It is not yet clear how to make lld garbage-collect such custom entities and report the result.

Another feature which lld lacks is graceful handling of unresolved symbols. We have the Barf primitive which is capable of producing a meaningful runtime error message for unresolved symbols, but lld will only emit unreachable.

And finally: we may still want to perform link-time rewriting on the tree-formed IR in certain cases. We would need to store the serialized tree IR as custom sections, and it's also unclear how to make lld handle those.
