Implement DWARF parser for better stack traces #149

Open
mathetake opened this issue Apr 6, 2021 · 11 comments


@mathetake
Contributor

mathetake commented Apr 6, 2021

Luckily, all of our SDK languages except AssemblyScript are LLVM-based, so the .debug_line custom section is available as long as modules are compiled with debug info.

I think it would be really useful for users if we had a minimal parser for it and used it for better stack traces (demangled symbols with file names) when the section is available.

This may come with some computational cost, so it would be a trade-off between mangled-yet-efficient stack traces built from the name custom section and demangled-yet-expensive ones.

I have WIP code locally, but I won't be able to finish it until I have more cycles to work on this. Maybe we could use LLVM's full-fledged DWARF parser, which llvm-dwarfdump uses internally, but I think that would be too much just for parsing debug_line.

Edited: if we want to get function names, then we have to parse debug_info as well. I wonder if V8 already has API(s) for getting DWARF information.

@PiotrSikora
Member

Do you have a before/after example?

@mathetake
Contributor Author

For example, we currently get the following stack trace for a Rust binary:

Proxy-Wasm plugin in-VM backtrace:
  0:  0x42deb - __rust_start_panic
  1:  0x42c0c - rust_panic
  2:  0x42882 - _ZN3std9panicking20rust_panic_with_hook17h072472ae3822b936E
  3:  0x32914 - _ZN3std9panicking11begin_panic28_$u7b$$u7b$closure$u7d$$u7d$17hed88036b12f483dfE
  4:  0x34891 - _ZN3std10sys_common9backtrace26__rust_end_short_backtrace17h9133fcc3e85035deE
  5:  0x32810 - _ZN3std9panicking11begin_panic17he6f6e918174263cfE
  6:  0x39eb - _ZN77_$LT$http_headers..HttpHeaders$u20$as$u20$proxy_wasm..traits..HttpContext$GT$6on_log17hde90e85ea16e616eE
  7:  0x2ae53 - _ZN10proxy_wasm10dispatcher10Dispatcher6on_log17hc6cd4fb35c538b86E
  8:  0x2d3dd - _ZN10proxy_wasm10dispatcher12proxy_on_log28_$u7b$$u7b$closure$u7d$$u7d$17h3f864ec735f41e70E
  9:  0x311bd - _ZN3std6thread5local17LocalKey$LT$T$GT$8try_with17hc87d8e9cf2d2494cE

Here all the symbols are mangled by rustc and are not human-readable. This is because the function names in the name custom section are already mangled.

If you look at the Rust binary's section headers, you will find the .debug_line custom section:

$ wasm-objdump http_headers.wasm -h
...
   Custom start=0x000f9cae end=0x00145fef (size=0x0004c341) ".debug_line"
   Custom start=0x00145ff3 end=0x001d0bb2 (size=0x0008abbf) ".debug_str"
   Custom start=0x001d0bb6 end=0x001f638f (size=0x000257d9) ".debug_pubnames"
   Custom start=0x001f6393 end=0x00213ab8 (size=0x0001d725) "name"
   Custom start=0x00213aba end=0x00213b14 (size=0x0000005a) "producers"

You can also see unmangled file and directory names in the .debug_line section (more precisely, they are not stored verbatim but in DWARF's debug_line-specific compressed format):

$ llvm-dwarfdump http_headers.wasm --debug-line | grep -E "include|.rs"

include_directories[ 38] = "library/std/src/sys_common/condvar"
include_directories[ 39] = "library/std/src/../../backtrace/src/backtrace"
include_directories[ 40] = "library/std/src/../../stdarch/crates/std_detect/src/detect"
include_directories[ 41] = "library/std/src/sys/wasi/ext"
           name: "vec.rs"
           name: "raw_vec.rs"
           name: "uint_macros.rs"
           name: "macros.rs"
           name: "option.rs"
           name: "cmp.rs"
           name: "alloc.rs"
           name: "result.rs"
           name: "mut_ptr.rs"

The section basically lets us look up the file and directory name by address. The address used in the section is calculated from the wasm-c-api's trace->module_offset and the offset of the code section in the Wasm binary.
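The address calculation described above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function names `readVarUint32`, `findCodeSectionPayloadOffset`, and `toDwarfAddress` are hypothetical, and it assumes that the frame's module offset is a byte offset from the start of the Wasm file and that DWARF code addresses are relative to the start of the Code section's contents.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Decodes an unsigned LEB128 value at `pos` and advances `pos` past it.
// Returns std::nullopt on truncated or over-long input.
std::optional<uint32_t> readVarUint32(const std::vector<uint8_t> &bytes, size_t &pos) {
  uint32_t result = 0;
  for (unsigned shift = 0; shift < 35; shift += 7) {
    if (pos >= bytes.size()) return std::nullopt;
    const uint8_t byte = bytes[pos++];
    result |= static_cast<uint32_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) return result;
  }
  return std::nullopt;
}

// Walks the top-level sections and returns the file offset at which the
// contents of the Code section (section id 10) begin, if there is one.
std::optional<size_t> findCodeSectionPayloadOffset(const std::vector<uint8_t> &module) {
  constexpr size_t kHeaderSize = 8;        // "\0asm" magic + 4-byte version
  constexpr uint8_t kCodeSectionId = 10;
  size_t pos = kHeaderSize;
  while (pos < module.size()) {
    const uint8_t id = module[pos++];
    const auto size = readVarUint32(module, pos);
    if (!size) return std::nullopt;
    if (id == kCodeSectionId) return pos;  // pos now points at the contents
    if (*size > module.size() - pos) return std::nullopt;
    pos += *size;                          // skip this section's contents
  }
  return std::nullopt;
}

// Assumption: module_offset counts bytes from the start of the Wasm file,
// and .debug_line addresses count bytes from the start of the Code contents.
std::optional<uint64_t> toDwarfAddress(size_t module_offset, size_t code_payload_offset) {
  if (module_offset < code_payload_offset) return std::nullopt;
  return module_offset - code_payload_offset;
}
```

The section walk only reads section ids and sizes, so it does not need to understand the contents of any section it skips.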

@mathetake
Contributor Author

mathetake commented Apr 7, 2021

Plus the line number corresponding to the address.
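Once the line-number program in .debug_line has been decoded into rows, the address-to-file-and-line lookup is essentially a binary search. The sketch below assumes a hypothetical `LineRow` type and a hypothetical `lookupLine` helper, and it ignores details such as end-of-sequence markers that a real parser would have to respect.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified row of a decoded .debug_line table.
struct LineRow {
  uint64_t address;  // DWARF address at which this row starts
  std::string file;  // file name, already resolved from the file table
  uint32_t line;     // source line number
};

// Returns the last row whose address is <= `address`.
// `rows` must be sorted by ascending address.
std::optional<LineRow> lookupLine(const std::vector<LineRow> &rows, uint64_t address) {
  const auto it = std::upper_bound(
      rows.begin(), rows.end(), address,
      [](uint64_t addr, const LineRow &row) { return addr < row.address; });
  if (it == rows.begin()) return std::nullopt;  // address precedes every row
  return *(it - 1);
}
```

Keeping the rows sorted makes each frame lookup logarithmic in the table size, which matters if the table is only consulted when a trace is actually printed.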

@mathetake
Contributor Author

I realized that we need to parse debug_info too in order to translate addresses to function names.

@mathetake changed the title from "Implement DWARF debug_line section parser for better stack traces" to "Implement DWARF parser for better stack traces" on Apr 7, 2021
@PiotrSikora
Member

I think it's fine to use .debug_info if it exists. However, I suspect that most of the modules will ship without it, since it's a huge overhead (in Rust SDK examples, it's ~110 KiB for stripped modules vs ~300 KiB for modules with .debug_info).

I think we should try to demangle the symbols instead, since that works even without .debug_info, but I don't know if the mangling rules are the same for all languages.

@mathetake
Contributor Author

The mangling scheme is language-specific (more precisely, compiler-frontend-specific). For example, TinyGo does not mangle symbols and uses ${package_name}.${function_name} style symbols, which are much more readable as they are than Rust's. So we would need to implement language-specific demanglers for both C++ and Rust (not sure about AssemblyScript).

The question is whether we want line and file information or not. If demangled symbols carry enough information, we should try implementing a demangler for each language.

Either way, this will take a lot of cycles to complete, but I would like to provide a better debugging experience to users in the long run 🙂
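To give a sense of what a language-specific demangler involves, here is a rough sketch for the legacy Rust scheme seen in the trace above (`_ZN`, length-prefixed path segments, a trailing hash segment, then `E`). The names `decodeSegment` and `demangleLegacyRust` are hypothetical, only ASCII `$uXX$` escapes are handled, and the newer "v0" scheme is not covered at all.

```cpp
#include <cctype>
#include <cstdlib>
#include <optional>
#include <string>

// Decodes the `$...$` escapes and `..` separators inside one path segment.
std::string decodeSegment(const std::string &seg) {
  static const struct { const char *from; char to; } kEscapes[] = {
      {"$LT$", '<'}, {"$GT$", '>'}, {"$LP$", '('}, {"$RP$", ')'},
      {"$C$", ','},  {"$RF$", '&'}, {"$BP$", '*'}, {"$SP$", '@'}};
  std::string out;
  size_t i = (seg.rfind("_$", 0) == 0) ? 1 : 0;  // drop the guard underscore
  while (i < seg.size()) {
    if (seg.compare(i, 2, "..") == 0) { out += "::"; i += 2; continue; }
    if (seg[i] == '$') {
      bool matched = false;
      for (const auto &e : kEscapes) {
        const std::string from = e.from;
        if (seg.compare(i, from.size(), from) == 0) {
          out += e.to; i += from.size(); matched = true; break;
        }
      }
      if (matched) continue;
      // `$uXX$` is a hexadecimal code point; this sketch handles ASCII only.
      const size_t end = seg.find('$', i + 1);
      if (seg.compare(i, 2, "$u") == 0 && end != std::string::npos && end > i + 2) {
        const std::string hex = seg.substr(i + 2, end - i - 2);
        char *stop = nullptr;
        const unsigned long cp = std::strtoul(hex.c_str(), &stop, 16);
        if (*stop == '\0' && cp < 0x80) {
          out += static_cast<char>(cp); i = end + 1; continue;
        }
      }
    }
    out += seg[i++];  // ordinary character, or an escape we do not recognize
  }
  return out;
}

// Returns the demangled path, or std::nullopt if `symbol` does not look like
// a legacy-mangled Rust symbol.
std::optional<std::string> demangleLegacyRust(const std::string &symbol) {
  if (symbol.rfind("_ZN", 0) != 0) return std::nullopt;
  std::string out;
  size_t pos = 3;
  while (pos < symbol.size() && symbol[pos] != 'E') {
    if (!std::isdigit(static_cast<unsigned char>(symbol[pos]))) return std::nullopt;
    size_t len = 0;
    while (pos < symbol.size() && std::isdigit(static_cast<unsigned char>(symbol[pos]))) {
      len = len * 10 + static_cast<size_t>(symbol[pos++] - '0');
    }
    if (len == 0 || len > symbol.size() - pos) return std::nullopt;
    const std::string seg = symbol.substr(pos, len);
    pos += len;
    // A final `h<16 hex digits>` segment is a disambiguating hash; skip it.
    const bool is_hash = seg.size() == 17 && seg[0] == 'h' &&
                         seg.find_first_not_of("0123456789abcdef", 1) == std::string::npos &&
                         pos < symbol.size() && symbol[pos] == 'E';
    if (is_hash) continue;
    if (!out.empty()) out += "::";
    out += decodeSegment(seg);
  }
  if (pos >= symbol.size() || symbol[pos] != 'E' || out.empty()) return std::nullopt;
  return out;
}
```

With this sketch, frame 7 of the trace above, `_ZN10proxy_wasm10dispatcher10Dispatcher6on_log17hc6cd4fb35c538b86E`, becomes `proxy_wasm::dispatcher::Dispatcher::on_log`, while an unmangled TinyGo-style symbol is rejected and can be printed as is.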

@mathetake
Contributor Author

I found the Rust demangler at https://github.com/alexcrichton/rustc-demangle/blob/master/src/v0.rs, so maybe we could port it to C++ here to provide better stack traces for Rust.

@PiotrSikora
Member

We can link rustc-demangle via C API, no need to port anything.

@mathetake
Contributor Author

OK, that would be nice.

Just curious: do we have any external dependency criteria in the Proxy-Wasm project, like Envoy does?

@PiotrSikora
Member

We don't have such criteria yet, and that dependency looks trivial... but we'd need to convince Envoy to approve it as a dependency, which might be a bit painful, considering its recent use of OSSF Scorecard, etc.

@mathetake
Contributor Author

I found that a Rust binary somehow comes with both C/C++-style mangled symbols (starting with _Z) and Rust ones (starting with _R), even if we enable the "v0" mangling scheme in nightly rustc. I'm still not sure, but maybe this is because of linking against wasi-libc.

So maybe demangling symbols is much harder than I thought 😞
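Since a single binary can mix schemes, a demangler would probably have to dispatch per symbol rather than per module. A minimal sketch of that dispatch, with a hypothetical `Mangling` enum and `classifySymbol` helper, might look like this. Note that the `_Z` prefix alone cannot distinguish C++ (Itanium ABI) symbols from legacy-mangled Rust symbols, because both use it.

```cpp
#include <string>

// Hypothetical classification of a symbol by its mangling prefix.
enum class Mangling {
  RustV0,       // starts with "_R"
  ItaniumLike,  // starts with "_Z": C++ (Itanium ABI) or legacy Rust
  None,         // no known prefix, e.g. TinyGo's package.function style
};

// Picks a demangling strategy for one symbol from the name custom section.
Mangling classifySymbol(const std::string &symbol) {
  if (symbol.rfind("_R", 0) == 0) return Mangling::RustV0;
  if (symbol.rfind("_Z", 0) == 0) return Mangling::ItaniumLike;
  return Mangling::None;
}
```

Symbols classified as `None` can be printed unchanged, and the other two cases can be handed to whichever demangler is linked in.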
