-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rename argc/argv main
to __main_argc_argv
#134
Conversation
Change the ABI for the user entry point to rename `main` to `__main_argc_argv` when it has argc/argv. This is needed because wasm requires caller and callee signatures to match, so the traditional trick of passing main arguments even if it doesn't expect them doesn't work. LLVM and related tools have been using the name `__original_main` for a similar purpose, but that's a confusing name. There's also a change here in that this is renaming the argc/argv form rather than renaming the nullary form. The choice is somewhat arbitrary, but I think it's slightly nicer to bias the aesthetics toward the no-argument form, because that's the smaller and simpler form.
main
to __nullary_main
.main
to __main_argc_argv
As a heads-up, as I was writing this up, it occurred to me that renaming the argc/argv form would be a little nicer than renaming the no-arg form, but I forgot to adjust the title before submitting the pull request. I've now corrected the title. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, you are proposing that we replace
- main with no params is renamed to
__original_main
. - the exported name is
main
regardless
to
- main with no params stays main
- main with params is called
__main_argc_argv
- the exported name is one of the above two
? Or have I misunderstood?
Yes, that's right. It simplifies things to say that we just rename I also plan to submit a clang patch to do this rename in the frontend, instead of in the LLVM backend, to eliminate the wrapper, which has turned out to be a nuisance in practice. |
In that case I'm not that happy about the part where we export a different name from the wasm module. That's more complicated than the current single export, and is user-visible and perhaps confusing. |
I've now posted a clang patch which implements the It's a name mangling rule, which means:
It's very simple. There's no generated wrapper function, linking works the same way for LTO and non-LTO, and libc code can easily detect whether to link in the command-line argument initialization code by defining its own __attribute__((weak))
int main(void) {
int argc = ...;
char **argv = ...;
return __main_argc_argv(argc, argv);
} It also isn't that different from the |
Sorry @sunfishcode , I'm not sure whether your answer addresses or does not the topic of a different export name? |
It doesn't change the |
I apologize, maybe I wasn't clear before. I wasn't talking about |
Ah; in wasi-sdk, It is possible for a symbol name to differ from the wasm export name -- @sbc100's export attribute patch allows this, for example. So one possible option is to export |
Yes, in non-standalone wasm, emscripten exports I'm not sure what "entrypoint" means, in the text of this PR, if it isn't an export? Did we define it somewhere? |
Clarify the "entrypoint" terminology, and mention that export names may differ from ABI symbol names.
I've now rewritten the wording to give a definition of "entrypoint" and mention that export names may differ from ABI symbol names, and to hopefully be more clear overall. Let me know what you think! |
Right now emscripten has two entry-point modes. The primary mode is where "main" is exported and called directly from JS after the constructors have been called. In this mode there is no The second mode we what we call "standalone wasm mode" where I'm not sure how this change will effect the first/primary mode that emscripten uses, but in general it looks like a good change. We currently export main by passing If we can move emscripten over to always use |
@sbc100 Emscripten changing from The current wording looks mostly reasonable to me, but I don't know the linking process well enough to say if there is any downside to this over the current |
To support the status quo in emscripten we would need way to export a symbol called |
I'm not a fan of the current |
As a possible path forward, I've now updated the clang patch which implements this: https://reviews.llvm.org/D70700 |
I don't quite understand what that patch does (not familiar with that part of clang), but if it doesn't affect emscripten then I have no objection to it myself. (That seems separate from the discussion in this PR, but I don't think we have a clear answer here?) |
Also note that C symbol names are distinct from WebAssembly export | ||
names, which are outside the scope of the C ABI. Toolchains which export | ||
the user entrypoint may chose to export it as the name `main`, even when | ||
the C ABI symbol name is `__main_argc_argv`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But can the toolchain support exporting main? Won't this a toolchain that want to do something like --export=main
? We need to some way to export the user entry point and it seems like with this change it because hard to name the user entry point because it has two possible names, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that __attribute__((export_name("main")))
does work with this patch, with and without argv/argc. I get an export named "main" regardless of the internal symbol name.
However, -Wl,--export=main
indeed doesn't work with this patch in the case of main
with argc/argv. One way to fix that would be to introduce a -mexport=main flag to clang and have people use that instead. Clang could translate that into inserting the appropriate export_name
directive. Clang flags are nicer for end users than -Wl,
flags anyway, and it would handle mangled names better in general. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For emscripten can't ask users to decorate the main symbol in the source code.
A new -mexport=main
cflag would have to be applied to every single compilation unit. Seems kind of heavy weight and would require more plumbing to land in llvm.
Actually I think may be able to just pass -Wl,--export=main
and -Wl,--export=__main_argc_argv
since this flag doesn't warn or error if the symbol is missing, and then we can have binaryen take care of exporting __main_argc_argv as main perhaps? Will have to see how it goes.
low-level detail that most code doesn't interact with. | ||
|
||
The program entrypoint is out of scope for the wasm C ABI. It may depend | ||
on what environment the program will be run in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it out of scope? Were do we document _start
and __wasm_call_ctors
if not here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__wasm_call_ctors
is a convention between the linker and crt1.o (or something else serving the purpose of crt1.o), making it appropriate for Linking.md, where it is. Anyone calling __wasm_call_ctors
has to know that the module was generated from a linker following Linking.md.
I see _start
as being a convention between a wasm module and the outside which doesn't know anything about the tools that produced it. It may not use BasicCABI.md or Linking.md. Eventually it'll be replaced by something we define in interface types, but for now it's just a simple "_start" convention.
In the LLVM patch, this change is disabled when the target triple OS is Emscripten. Also, @sbc100's comments above suggest there might be a reasonable path for implementing this in Emscripten when it's ready as well. With that, are there any remaining concerns about landing this and proceeding with the changes to LLVM, wasi-libc? |
I support this change, but I'd be sad if can't find a way to make this work in emscripten too. I fairly certain that we can but I've not invested any time in it yet. I can you give me a little more time to prototype. Maybe by EOD Wednesday? If don't find any fundamental issues by then we can go ahead and land? |
@sbc100 Sounds good to me. |
It's now EOD on Wednesday; is there an update? |
Did you verify that the debug name in the dwarf section is correct? I couldn't see the functions names in the output of I did confirm that the name section does have the mangled name in it, despite then name section supposedly containing the demangled names. This I believe is because the name section is generated by the linker by demanging the mangled names and I guess doesn't (yet) know how to demangle |
The llvm-dwarfdump --all output contains the name "main" as the Confirmed about the name section; I propose the following patch, which fixes it: diff --git a/lld/wasm/Symbols.cpp b/lld/wasm/Symbols.cpp
index 88e6f217c0d..0d4f7f648d5 100644
--- a/lld/wasm/Symbols.cpp
+++ b/lld/wasm/Symbols.cpp
@@ -29,6 +29,8 @@ std::string toString(const wasm::Symbol &sym) {
}
std::string maybeDemangleSymbol(StringRef name) {
+ if (name == "__main_argc_argv")
+ return "main";
if (wasm::config->demangle)
return demangleItanium(name);
return std::string(name); |
OK, I've played with the patch in emscripten and I think we have a way forward. I'm going ahead with this and the llvm patch. I have provisional patch for binaryen and emscripten so we should be able remove the emscripten exclusion from the llvm patch (either before or after landing it). |
WebAssembly enforces a rule that caller and callee signatures must match. This means that the traditional technique of passing `main` `argc` and `argv` even when it doesn't need them doesn't work. Currently the backend renames `main` to `__original_main`, however this doesn't interact well with LTO'ing libc, and the name isn't intuitive. This patch allows us to transition to `__main_argc_argv` instead. This implements the proposal in WebAssembly/tool-conventions#134 with a flag to disable it when targeting Emscripten, though this is expected to be temporary, as discussed in the proposal comments. Differential Revision: https://reviews.llvm.org/D70700
WebAssembly enforces a rule that caller and callee signatures must match. This means that the traditional technique of passing `main` `argc` and `argv` even when it doesn't need them doesn't work. Currently the backend renames `main` to `__original_main`, however this doesn't interact well with LTO'ing libc, and the name isn't intuitive. This patch allows us to transition to `__main_argc_argv` instead. This implements the proposal in WebAssembly/tool-conventions#134 with a flag to disable it when targeting Emscripten, though this is expected to be temporary, as discussed in the proposal comments. Differential Revision: https://reviews.llvm.org/D70700
Change the ABI for the user entry point to rename
main
to__main_argc_argv
when it has argc/argv. This is needed becausewasm requires caller and callee signatures to match, so the
traditional trick of passing main arguments even if it doesn't
expect them doesn't work.
LLVM and related tools have been using the name
__original_main
for a similar purpose, but that's a confusing name.
There's also a change here in that this is renaming the argc/argv
form rather than renaming the nullary form. The choice is somewhat
arbitrary, but I think it's slightly nicer to bias the aesthetics
toward the no-argument form, because that's the smaller and
simpler form.