Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename argc/argv main to __main_argc_argv #134

Merged
merged 3 commits into from
Feb 27, 2020
Merged

Rename argc/argv main to __main_argc_argv #134

merged 3 commits into from
Feb 27, 2020

Conversation

sunfishcode
Copy link
Member

Change the ABI for the user entry point to rename main to
__main_argc_argv when it has argc/argv. This is needed because
wasm requires caller and callee signatures to match, so the
traditional trick of passing main arguments even if it doesn't
expect them doesn't work.

LLVM and related tools have been using the name __original_main
for a similar purpose, but that's a confusing name.

There's also a change here in that this is renaming the argc/argv
form rather than renaming the nullary form. The choice is somewhat
arbitrary, but I think it's slightly nicer to bias the aesthetics
toward the no-argument form, because that's the smaller and
simpler form.

Change the ABI for the user entry point to rename `main` to
`__main_argc_argv` when it has argc/argv. This is needed because
wasm requires caller and callee signatures to match, so the
traditional trick of passing main arguments even if it doesn't
expect them doesn't work.

LLVM and related tools have been using the name `__original_main`
for a similar purpose, but that's a confusing name.

There's also a change here in that this is renaming the argc/argv
form rather than renaming the nullary form. The choice is somewhat
arbitrary, but I think it's slightly nicer to bias the aesthetics
toward the no-argument form, because that's the smaller and
simpler form.
@sunfishcode sunfishcode changed the title Rename no-argument main to __nullary_main. Rename argc/argv main to __main_argc_argv Nov 25, 2019
@sunfishcode
Copy link
Member Author

As a heads-up, as I was writing this up, it occurred to me that renaming the argc/argv form would be a little nicer than renaming the no-arg form, but I forgot to adjust the title before submitting the pull request. I've now corrected the title.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, you are proposing that we replace

  • main with no params is renamed to __original_main.
  • the exported name is main regardless

to

  • main with no params stays main
  • main with params is called __main_argc_argv
  • the exported name is one of the above two

? Or have I misunderstood?

BasicCABI.md Outdated Show resolved Hide resolved
@sunfishcode
Copy link
Member Author

Yes, that's right. It simplifies things to say that we just rename main, rather than that we keep main around and add a wrapper.

I also plan to submit a clang patch to do this rename in the frontend, instead of in the LLVM backend, to eliminate the wrapper, which has turned out to be a nuisance in practice.

@kripken
Copy link
Member

kripken commented Nov 25, 2019

In that case I'm not that happy about the part where we export a different name from the wasm module. That's more complicated than the current single export, and is user-visible and perhaps confusing.

@sunfishcode
Copy link
Member Author

I've now posted a clang patch which implements the __main_argc_argv proposal here.

It's a name mangling rule, which means:

  • If users call main from other translation units, those callsites are also renamed to match.
  • When debug info is emitted, it uses the user's name "main".

It's very simple. There's no generated wrapper function, linking works the same way for LTO and non-LTO, and libc code can easily detect whether to link in the command-line argument initialization code by defining its own main function in libc.a:

__attribute__((weak))
int main(void) {
    int argc = ...;
    char **argv = ...;
    return __main_argc_argv(argc, argv);
}

It also isn't that different from the __original_main situation we have right now, where Binaryen is special-case inlining it. If you inline __main_argc_argv into main here, you'll end up with a function called main containing the user main code, so backtraces etc. will look like you want them to.

@kripken
Copy link
Member

kripken commented Nov 26, 2019

Sorry @sunfishcode , I'm not sure whether your answer addresses or does not the topic of a different export name?

@sunfishcode
Copy link
Member Author

It doesn't change the _start export. I'd ideally like to rename _start also, but that's a bigger change we can talk about separately.

@kripken
Copy link
Member

kripken commented Nov 26, 2019

I apologize, maybe I wasn't clear before. I wasn't talking about _start which I agree is a separate issue. What I'm asking about is what I asked about earlier too, that this proposal makes the wasm module export either main or __main_argc_argv. I wasn't sure if your last comment refers to this (as you mention other callsites etc. which I think may involve exports?).

@sunfishcode
Copy link
Member Author

Ah; in wasi-sdk, main isn't exported from the final wasm binary. I guess in Emscripten it is exported, or it can be?

It is possible for a symbol name to differ from the wasm export name -- @sbc100's export attribute patch allows this, for example. So one possible option is to export __main_argc_argv as main at the wasm module level.

@kripken
Copy link
Member

kripken commented Nov 26, 2019

Yes, in non-standalone wasm, emscripten exports main from the wasm binary. I wouldn't want to have a convention that where the export name can differ, so yeah, keeping the same export name as always sounds best.

I'm not sure what "entrypoint" means, in the text of this PR, if it isn't an export? Did we define it somewhere?

Clarify the "entrypoint" terminology, and mention that export names
may differ from ABI symbol names.
@sunfishcode
Copy link
Member Author

I've now rewritten the wording to give a definition of "entrypoint" and mention that export names may differ from ABI symbol names, and to hopefully be more clear overall. Let me know what you think!

@sbc100
Copy link
Member

sbc100 commented Nov 26, 2019

Right now emscripten has two entry-point modes. The primary mode is where "main" is exported and called directly from JS after the constructors have been called. In this mode there is no _start, only main.

The second mode we what we call "standalone wasm mode" where _start is exported and main is not.

I'm not sure how this change will effect the first/primary mode that emscripten uses, but in general it looks like a good change. We currently export main by passing --export=main which is a linker flag (and which will work with wask-sdk too btw). I think this change as proposed might require some emscripten-side changes. Since we assume we can pass args to main from JS and due to the JS calling conventions that works for 0-arg and 2-arg main both.

If we can move emscripten over to always use _start then this problem would go away. Do you know of any reasons why that might be a bad idea @kripken ?

@kripken
Copy link
Member

kripken commented Nov 26, 2019

@sbc100 Emscripten changing from main to _start for the export to be called might be disruptive for existing users, that call main themselves directly. (Also, I do think main is the better name!) So I'd prefer to not do that, but it looks like this PR isn't?

The current wording looks mostly reasonable to me, but I don't know the linking process well enough to say if there is any downside to this over the current __original_main. @sbc100 do you think it would have problems with our standalone support in emscripten?

@sbc100
Copy link
Member

sbc100 commented Nov 27, 2019

To support the status quo in emscripten we would need way to export a symbol called main which takes args. I sounds like under this proposal --export=main would result in the zero arg main being exported.

@sbc100
Copy link
Member

sbc100 commented Nov 27, 2019

I'm not a fan of the current __original_main mangling or the way its found its way into our crt1.c code in both wasi and emscripten, so if we can move away from that I would be happy.

@sunfishcode
Copy link
Member Author

As a possible path forward, I've now updated the clang patch which implements this: https://reviews.llvm.org/D70700
so that it checks the target, and doesn't do the mangling on Emscripten. That way, Emscripten is unaffected by this change.

@sunfishcode
Copy link
Member Author

@sbc100 @kripken Any update here? I need some solution to this problem in order to enable LTO builds of libc, and right now, compatibility with Emscripten appears to be the only thing holding this back.

@kripken
Copy link
Member

kripken commented Dec 20, 2019

I don't quite understand what that patch does (not familiar with that part of clang), but if it doesn't affect emscripten then I have no objection to it myself. (That seems separate from the discussion in this PR, but I don't think we have a clear answer here?)

Also note that C symbol names are distinct from WebAssembly export
names, which are outside the scope of the C ABI. Toolchains which export
the user entrypoint may chose to export it as the name `main`, even when
the C ABI symbol name is `__main_argc_argv`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But can the toolchain support exporting main? Won't this a toolchain that want to do something like --export=main? We need to some way to export the user entry point and it seems like with this change it because hard to name the user entry point because it has two possible names, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm that __attribute__((export_name("main"))) does work with this patch, with and without argv/argc. I get an export named "main" regardless of the internal symbol name.

However, -Wl,--export=main indeed doesn't work with this patch in the case of main with argc/argv. One way to fix that would be to introduce a -mexport=main flag to clang and have people use that instead. Clang could translate that into inserting the appropriate export_name directive. Clang flags are nicer for end users than -Wl, flags anyway, and it would handle mangled names better in general. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For emscripten can't ask users to decorate the main symbol in the source code.

A new -mexport=main cflag would have to be applied to every single compilation unit. Seems kind of heavy weight and would require more plumbing to land in llvm.

Actually I think may be able to just pass -Wl,--export=main and -Wl,--export=__main_argc_argv since this flag doesn't warn or error if the symbol is missing, and then we can have binaryen take care of exporting __main_argc_argv as main perhaps? Will have to see how it goes.

low-level detail that most code doesn't interact with.

The program entrypoint is out of scope for the wasm C ABI. It may depend
on what environment the program will be run in.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it out of scope? Were do we document _start and __wasm_call_ctors if not here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__wasm_call_ctors is a convention between the linker and crt1.o (or something else serving the purpose of crt1.o), making it appropriate for Linking.md, where it is. Anyone calling __wasm_call_ctors has to know that the module was generated from a linker following Linking.md.

I see _start as being a convention between a wasm module and the outside which doesn't know anything about the tools that produced it. It may not use BasicCABI.md or Linking.md. Eventually it'll be replaced by something we define in interface types, but for now it's just a simple "_start" convention.

@sunfishcode
Copy link
Member Author

In the LLVM patch, this change is disabled when the target triple OS is Emscripten. Also, @sbc100's comments above suggest there might be a reasonable path for implementing this in Emscripten when it's ready as well. With that, are there any remaining concerns about landing this and proceeding with the changes to LLVM, wasi-libc?

@sbc100
Copy link
Member

sbc100 commented Feb 24, 2020

I support this change, but I'd be sad if can't find a way to make this work in emscripten too. I fairly certain that we can but I've not invested any time in it yet. I can you give me a little more time to prototype. Maybe by EOD Wednesday? If don't find any fundamental issues by then we can go ahead and land?

@sunfishcode
Copy link
Member Author

@sbc100 Sounds good to me.

@sunfishcode
Copy link
Member Author

It's now EOD on Wednesday; is there an update?

@sbc100
Copy link
Member

sbc100 commented Feb 27, 2020

Did you verify that the debug name in the dwarf section is correct? I couldn't see the functions names in the output of llvm-dwarfdump --all which means I'm probably doing something wrong.

I did confirm that the name section does have the mangled name in it, despite then name section supposedly containing the demangled names. This I believe is because the name section is generated by the linker by demanging the mangled names and I guess doesn't (yet) know how to demangle __main_argv_argv back into main.

@sunfishcode
Copy link
Member Author

The llvm-dwarfdump --all output contains the name "main" as the DW_AT_name field for me (and "__main_argc_argv" in the DW_AT_linkage_name, as expected).

Confirmed about the name section; I propose the following patch, which fixes it:

diff --git a/lld/wasm/Symbols.cpp b/lld/wasm/Symbols.cpp
index 88e6f217c0d..0d4f7f648d5 100644
--- a/lld/wasm/Symbols.cpp
+++ b/lld/wasm/Symbols.cpp
@@ -29,6 +29,8 @@ std::string toString(const wasm::Symbol &sym) {
 }

 std::string maybeDemangleSymbol(StringRef name) {
+  if (name == "__main_argc_argv")
+    return "main";
   if (wasm::config->demangle)
     return demangleItanium(name);
   return std::string(name);

@sbc100
Copy link
Member

sbc100 commented Feb 27, 2020

OK, I've played with the patch in emscripten and I think we have a way forward. I'm going ahead with this and the llvm patch.

I have provisional patch for binaryen and emscripten so we should be able remove the emscripten exclusion from the llvm patch (either before or after landing it).

@sunfishcode sunfishcode merged commit 473c089 into master Feb 27, 2020
@sunfishcode sunfishcode deleted the rename-main branch February 27, 2020 14:34
sunfishcode added a commit to llvm/llvm-project that referenced this pull request Feb 27, 2020
WebAssembly enforces a rule that caller and callee signatures must
match. This means that the traditional technique of passing `main`
`argc` and `argv` even when it doesn't need them doesn't work.

Currently the backend renames `main` to `__original_main`, however this
doesn't interact well with LTO'ing libc, and the name isn't intuitive.
This patch allows us to transition to `__main_argc_argv` instead.

This implements the proposal in
WebAssembly/tool-conventions#134
with a flag to disable it when targeting Emscripten, though this is
expected to be temporary, as discussed in the proposal comments.

Differential Revision: https://reviews.llvm.org/D70700
arichardson pushed a commit to arichardson/llvm-project that referenced this pull request Apr 2, 2020
WebAssembly enforces a rule that caller and callee signatures must
match. This means that the traditional technique of passing `main`
`argc` and `argv` even when it doesn't need them doesn't work.

Currently the backend renames `main` to `__original_main`, however this
doesn't interact well with LTO'ing libc, and the name isn't intuitive.
This patch allows us to transition to `__main_argc_argv` instead.

This implements the proposal in
WebAssembly/tool-conventions#134
with a flag to disable it when targeting Emscripten, though this is
expected to be temporary, as discussed in the proposal comments.

Differential Revision: https://reviews.llvm.org/D70700
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants