-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create an explicit symbol table #38
Comments
Changing the scheme where Symbols are assembled from imports/exports would be quite a major refactor. After my changes, every Symbol is modelled with exactly one import or export, which makes it fairly neat. Overall though I am fairly positive about the idea of a "symtab" section. Pros:
Cons:
I could be convinced either way. Having just written a series of patches it would nice to see them merged, since I feel they are useful incremental improvements to the current situation. I can see some advantages to the new idea though. Are there any direct benefits to the "symtab" idea? I can't see that it actually enables anything new/better. It wouldn't even add to robustness, it's just a cleaner way of representing the same data in the Wasm object, by moving it one-to-one from one place to another in the file. (This isn't a disagreement, just to check I haven't missed a benefit.) |
Another thought: we may not have much time remaining to make this change easily! I've noticed that binaryen and the Rust toolchain have experimental support for using the current linking metadata. It's not simply an internal contract between Clang and LLD, so doing a big change to the format will cause problems down the line for dependent projects (eg Emscripten), and possibly require coordination with them? (The intention with my "wasm index space vs symbol index space" changes was to preserve indexes; the indexes will only change for function aliases, and other indexes will all stay the same. So disruption to downstream tools should be rather minimal.) |
Its actually the fact that the proposed symbol index space is almost by not quite the same as the existing index space that I find a little disconcerting. If we are going to use different index space I'd rather be explicit about it and not have it match or be mistaken in any way for the existing index space. This is one reason for proposing that clean break and explicit symbol table. You are right about the format though. We do want to minimize churn and breaking changes. I would be food to reach out to our major consumers so we can coordinate with them or at least let them know when we place to make a breaking change. My hope is that we will soon reach a point were we will commit to more stability. Do you have contact with any of the non-emscripten external users? We should try to make sure they join the conversation, or are at least monitoring this repo. |
@NWilson Hm, not sure what do you mean by rust having experimental support for using the current linking metadata |
Hi @pepyakin, do you work on the rust wasm stuff? I've been meaning to dig into it and see how it currently works. Even if rust is currently using the linking metadata I imagine we will see use cases where the intermediate object files are being read by tools other than lld. This is why it have the Linking.md "spec" so that others can build on it. I did hear that the rust people (you?) wrote some kind of compressor (GC) for wam files, and I'm hoping that functionality will be subsumed soon by lld's gc-sections (which will be the default I hope). |
@sbc100 I think you might mean https://github.com/alexcrichton/wasm-gc ? cc @alexcrichton |
I'd say I'm just hanging around rust and wasm stuff :)
You probably mean https://github.com/alexcrichton/wasm-gc |
Ah if you're curious feel free to ask any questions! At a high level though it currently looks like this:
and that's it! Lots of hacks, lots of room for improvement. Very much want to use |
Ok, I'm just spreading FUD then! Sounds like there aren't any current users of the Linking Data except LLD. I'm not too worried about the compat implications of my changes - after all, you merged the alt index stuff quite rapidly, and that's just as damaging to any people who might be implementing the currently documenting model, since any downstream consumers again need to match the change you made in LLD. |
I've had another fresh thought on this... What about shared objects (".so" style Wasm)? I haven't followed all the latest planning, but it seems to me like relocatable Wasm files currently aren't far off from working like that: we have a Wasm import/export for each function symbol, which makes the modules quite usable from another one. And, the globals (that you don't like above) represent the relocatable memory addresses - you said "it's poor modelling to use globals for constants which are never read using a get_global instruction" - but perhaps for code compiled with It does bother me vaguely to take a useful idea like "Symbol", which runtime code (dynamic linker) might want to interact with, and move that to be modelled by a custom section that's not so easily available at runtime. For functions at least, using imports+exports is elegant, and that's how the symbols are exposed to JS, so it is nice to use the matching idea for sharing the same symbols between Clang and LLD. After all, LLD's symbols (that it exports) are precisely the same symbols that Clang is trying to export; the frontend and backend don't have an particularly different notion "callable symbol". (Finally - if you want a clean break for compatibility reasons, changing the section name to I really hope I'm not becoming attached to the way things just because of the tunnel vision of working on it! If a decision is made to create an explicit symtab section, I'd be happy to help implement it or test it. |
Dynamic libraries are definitely something we want to support. We haven't done much design work on that at all yet. And as you noted, it would really benefit from a more stable ABI. I'm not really worried about breaking ABIs just yet, since we are still getting everything working (and also I think the static linking ABI is a much lower bar for breaks than e.g. the C ABI or dyamic linking ABI), but it will be more important as we get more users and more things work, which we are very close to. Hopefully we've put enough public caveats and our current users are following us closely enough that they won't be too blindsided by changes. |
After some chat in IRC, I'm convinced! I'm working on a spec of what the symtab section might look like; see https://github.com/NWilson/tool-conventions/pull/1/files for the ideas so far. Then my various incremental patches might be merged, and Sam or I will come up with a prototype of the new symtab section. |
I have code in cib which deals heavily in this. Right now it's also tied to D41472. I knew major changes were coming when I wrote it, so I plan to update it once all this settles down. |
What is "cib"? If its open source do you have a link to the code. I'm thinking it might be good idea to list all the current users/implementations in Linking.md so we can be responsible about breaking changes. |
Clang In Browser. I haven't picked a license yet; I'm leaning toward the liberal ones. https://github.com/tbfleming/cib It uses the linking info in 2 places right now:
|
Can close this, I think. |
This change modified lld to in response the llvm change which moved to a more explicit symbol table in the object format. Based on patches by Nicholas Wilson: 1. https://reviews.llvm.org/D41955 2. https://reviews.llvm.org/D42585 The primary difference that we see in the test output is that for relocatable (-r) output we now have symbol table which replaces exports/imports and globals. See: WebAssembly/tool-conventions#38 Differential Revision: https://reviews.llvm.org/D43264 llvm-svn=325861
In the current spec and llvm implementation we are modeling the symbol table based on imports and exports of functions and globals.
While this has worked fairly well it has caused us a few headaches. Specifically around weak symbols which we currently model as both and import and an export.
Also, the modeling of data symbols as wasm globals doesn't make much sense. We are using wasm globals here only to link names to addresses (not for use in set_global/get_global instructions which is their purpose).
We then add extra metadata via the "syminfo" subsection of the "linking" section.
Rather than trying to continue to build an implicit symbol table from imports and exports I propose that we bite the bullet and just be explicit about it. This would be things clearer and allow us to completely remove the imports and exports of globals which we currently use to model data addresses.
My proposal would be to add a new "symtab" section (or subsection?) that would replace the current "syminfo" subsection. Every symbol in the object would require an entry in this table. For data symbols the name would be included directly in the table. For function symbols the name could refer to function import or export so avoid duplicating the name. The symbol index would then be explicit and all relocations would then refer to symbols in this table. The exception would be type relocations for which there is no name or symbol yet.. but we could consider making symbols for types too.
The text was updated successfully, but these errors were encountered: