Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wasm: strip debug info and compress relocations #1994

Merged
merged 1 commit into from
Jul 31, 2021
Merged

Conversation

aykevl
Copy link
Member

@aykevl aykevl commented Jul 11, 2021

This patch reduces code size significantly by stripping debug
information at the link stage and additionally compressing relocations
(which is only possible when debug information has been stripped).

For example, when compiling testdata/alias.go with -target=wasi, I get
the following results:

Previously, without -no-debug: 25326 bytes
Previously, with -no-debug:     5591 bytes
With this patch:                4762 bytes
With this patch and -no-debug:  4747 bytes

In other words, by stripping debug information and compressing
relocations by default, the resulting binary is smaller than it would
otherwise be even with -no-debug. Adding -no-debug may give a small
extra reduction in code size, but after this patch it's not very
significant.

This patch currently does not provide a way to opt out of this behavior.
If there is a need, an option can be added at a later time to change
this behavior.

@dkegel-fastly @fgsch because you've been rather involved in WebAssembly lately: what do you think of this change? Is it a good change of defaults?

@dkegel-fastly
Copy link
Contributor

Ideally tinygo might follow go's lead here, and aim for a good debugging experience for optimized programs ( https://blog.golang.org/debug-opt ). Failing that, make sure it's easy to debug by adding an option.

Removing debug info, with no option to bring it back, feels... strange. So, maybe make it its own option that can be turned on and off? And/or potentially make it -opt 3 ? I dunno much about tinygo optimization. Bigger compilers have both lots of little optimizations which can be controlled individually, and are controlled grossly in groups with -O1, -O2, -O3, or -Os; it would be natural for tinygo to move a bit in that direction.

@aykevl
Copy link
Member Author

aykevl commented Jul 11, 2021

I agree that debug information should be available for most targets. However, WebAssembly is a bit different. Here are my considerations:

  • Compressing relocations is only possible when debug information is stripped because of limitations in DWARF emission (there is no technical reason why this isn't possible, RISC-V manages to do it). This compression does save around 8% of code size (or around 3% after compression) so it's a worthwhile optimization in my opinion.
  • WebAssembly is still hard to debug with DWARF, I believe only Chrome implements it.
  • Most people will want small binaries by default.

That said, you've got me convinced that it should be an option (that is probably off by default on WebAssembly for the above reasons). This makes small binaries the default while leaving the option to debug those binaries if needed.

Ideally tinygo might follow go's lead here, and aim for a good debugging experience for optimized programs ( https://blog.golang.org/debug-opt ).

This is actually somewhat unrelated. The blog post describes how to improve debug information that's already present. In short, there is no reason why only unoptimized builds can be debugged (even though that's what many people think). While optimizations will make debugging harder, DWARF has been catching up and adding more and more features that allow it to better debug optimized programs - basically to catch up with modern optimizations.

Luckily TinyGo doesn't have to deal with this, mostly. LLVM is responsible for most optimizations and thus also responsible for preserving debug information. The choice that TinyGo can make is whether to emit it at all.

@aykevl aykevl marked this pull request as draft July 11, 2021 12:17
@dkegel-fastly
Copy link
Contributor

The blog post said Google wanted to make production binaries work well with debuggers. That's on topic.

That said, there's lots of history for debugging defaulting to off back in the old days, so you wouldn't be going out on a huge limb.

FWIW I suspect server side wasm would be ok with larger files, and might usually prefer to keep the debug info around to get those sweet, sweet backtraces.

I found https://thenewstack.io/the-pain-of-debugging-webassembly/ interesting. I do hope the situation gets better.

@aykevl
Copy link
Member Author

aykevl commented Jul 11, 2021

FWIW I suspect server side wasm would be ok with larger files, and might usually prefer to keep the debug info around to get those sweet, sweet backtraces.

Are there any WebAssembly runtimes that support DWARF other than Chrome? I'm not aware of any but I could be wrong.

Note that there is also a "names" section that provides some symbol names and can thus give a rough backtrace (but without accounting for inlining unfortunately). If I remember correctly, Chrome has support for this section.

From the blog post you mention:

Cool. Except a lot of WASM tools don’t account for things like byte offsets. For good reason, because WASM actually has two formats: a binary format and WAST, which is a text representation. Within DWARF, these two formats are no longer interchangeable in the presence of debug information, as otherwise the offsets would require rewriting. Bummer.

I'm actually considering doing something like this, for improved GC and scheduler support. While they kind of work at the moment, you can see from bugs like #1966 that they're not working very well. The reason is that WebAssembly is very limited compared to other instruction sets: it doesn't allow direct access to the stack (for security). While there are proposals that would allow TinyGo to work well here (with stack switching and using the host GC), both are still proposals that see very little activity. By modifying WebAssembly after linking, both of these might be possible to implement in a more robust way.
That would unfortunately mean that debugging wouldn't be possible if these are used (requiring workarounds like -gc=leaking and -scheduler=none) until these features are implemented in WebAssembly runtimes.
I wish all of that wasn't necessary, but the current state of WebAssembly isn't great for languages that aren't C like (C/C++/Rust).

@fgsch
Copy link
Contributor

fgsch commented Jul 12, 2021

I agree with @dkegel-fastly; if anything this should be optional.

To illustrate this, trying the code in #1981, before this change:

panic: runtime error: nil pointer dereference
Error: failed to run main module `bug2.wasm`

Caused by:
    0: failed to invoke command default
    1: wasm trap: unreachable
       wasm backtrace:
           0:  0x6bf - runtime.abort
                           at /Users/fgsch/foss/tinygo/src/runtime/runtime_tinygowasm.go:55:6
                     - runtime.runtimePanic
                           at /Users/fgsch/foss/tinygo/src/runtime/panic.go:20:7
           1:  0x826 - runtime.nilPanic
                           at /Users/fgsch/foss/tinygo/src/runtime/panic.go:32:14
           2:  0xcd0 - (*flag.FlagSet).Var
                           at /usr/local/Cellar/go/1.16.5/libexec/src/flag/flag.go:862:23
           3:  0xc83 - <unknown>!runtime.initAll
           4:  0xbfd - runtime.run$1
                           at /Users/fgsch/foss/tinygo/src/runtime/scheduler_any.go:19:10
           5:  0xbf3 - runtime.run
                           at /Users/fgsch/foss/tinygo/src/runtime/scheduler_any.go:18:2
           6:  0xbe3 - _start
                           at /Users/fgsch/foss/tinygo/src/runtime/runtime_wasm_wasi.go:21:5                      

After this change:

panic: runtime error: nil pointer dereference
Error: failed to run main module `bug2.wasm`

Caused by:
    0: failed to invoke command default
    1: wasm trap: unreachable
       wasm backtrace:
           0:  0x5f1 - <unknown>!<wasm function 10>
           1:  0x723 - <unknown>!<wasm function 18>
           2:  0xaf2 - <unknown>!<wasm function 34>
           3:  0xab5 - <unknown>!<wasm function 33>
           4:  0xa4e - <unknown>!<wasm function 32>
           5:  0xa48 - <unknown>!<wasm function 31>
           6:  0xa40 - <unknown>!<wasm function 30>

@aykevl
Copy link
Member Author

aykevl commented Jul 12, 2021

Sweet! Which runtime is this? Is it public? It shows the source locations (and not just names) so it must be reading the DWARF sections.

FWIW I suspect server side wasm would be ok with larger files, and might usually prefer to keep the debug info around to get those sweet, sweet backtraces.

Out of interest, what's the reason for looking into TinyGo here? Is it the reduced code size, WASI support, or something else?

@fgsch
Copy link
Contributor

fgsch commented Jul 12, 2021

@aykevl this is wasmtime with WASMTIME_BACKTRACE_DETAILS=1.

@aykevl
Copy link
Member Author

aykevl commented Jul 12, 2021

Cool! I just found out right before I saw your comment. I had to update wasmtime from v0.20.0 to v0.28.0 for it to work, though.

@aykevl
Copy link
Member Author

aykevl commented Jul 12, 2021

Interestingly, the --strip-debug flag also strips the name section which stores the names of functions. That kind of makes sense, but previously the names would be retained with -no-debug (and tools like wasmtime could at least show a limited backtrace).

@aykevl
Copy link
Member Author

aykevl commented Jul 12, 2021

I have an improved branch at dev...debug-flag. It strips debug information at link time instead of at compile time and also replaces the -no-debug flag with -debug=true/false. The debug flag defaults to false for wasm, true for everything else.
Now I'm thinking: is this the right solution? With wasmtime supporting DWARF, maybe it makes sense to keep emitting debug information even on WebAssembly. What do you think?

@dkegel-fastly
Copy link
Contributor

Special cases may be confusing, seems like users who want the extra smallness will turn it on.

Keeping the debug info around by default will be handy once more runtimes support backtraces.

So I'm in the "debuggable by default" camp, I guess.

@dkegel-fastly
Copy link
Contributor

Also, I'd just leave the -no-debug flag the way it is; it's shorter to type than -debug=false :-)

@mathetake
Copy link
Contributor

Sorry to weigh in here. As a user, I would give +1 to @dkegel-fastly and I would like to keep debug info by default. At least, we in Envoy/Proxy-Wasm project already have used name sections for emitting backtraces, so I definitely would love to keep name section at least. Anyway thanks for your efforts here to reduce the binary size, that would be awesome for everyone.

FWIW, we haven't support DWARF in Envoy proxy yet proxy-wasm/proxy-wasm-cpp-host#149

@aykevl aykevl marked this pull request as ready for review July 13, 2021 14:13
@aykevl
Copy link
Member Author

aykevl commented Jul 13, 2021

Ok, you got me convinced. I'll keep the defaults as-is, with debug information. I've force-pushed an update.

Also, I'd just leave the -no-debug flag the way it is; it's shorter to type than -debug=false :-)

It was necessary when the default depends on the target (for example, you can't re-enable debug information with -no-debug if it's disabled for wasm). But it's not necessary anymore if everything always emits debug information.

Sorry to weigh in here.

I'm always interested in feedback especially from people affected by the change :)

@dkegel-fastly
Copy link
Contributor

That's more like it :-)

@deadprogram
Copy link
Member

@aykevl some merge conflicts are needing to be resolved, please.

@deadprogram
Copy link
Member

Reminder to @aykevl to please resolve merge conflict here. Thanks!

Stripping debug information at link time also allows relocation
compression (aka linker relaxations). Keeping debug information at
compile time and optionally stripping it at link time has some
advantages:

  * Automatic stack sizes on Cortex-M rely on the presence of debug
    information.
  * Some parts of the compiler now rely on the presence of debug
    information for proper diagnostics.
  * It works better with the cache: there is no distinction between
    debug and no-debug builds.
  * It makes it easier (or possible at all) to enable debug information
    in the wasi-libc library without big downsides.
@aykevl
Copy link
Member Author

aykevl commented Jul 31, 2021

Rebased.

@deadprogram
Copy link
Member

Thanks @aykevl for working on this and thanks to everyone who helped provide feedback and guidance.

Now merging.

@deadprogram deadprogram merged commit 7434e5a into dev Jul 31, 2021
@deadprogram deadprogram deleted the wasm-strip branch July 31, 2021 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants