-
-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(debuginfo): don't strip leading underscores from PDB symbols #642
Conversation
Leaving the symbol name in its original form allows consumers to detect "C decorated" functions correctly. According to the Microsoft documentation [1], there are four forms of "C decorated" names: ``` _<name> (cdecl) _<name>@<paramsize> (stdcall) @<name>@<paramsize> (fastcall) <name>@@<paramsize> (vectorcall) ``` According to the same docs, the first three are only used on 32 bit. [2] dump_syms parses these forms and decodes the parameter size from them. It needs the parameter size on 32 bit for correct stack unwinding in some cases. Before this patch, symbolic-debuginfo was stripping leading underscores from these symbols. This made it impossible for dump_syms to detect the second form (stdcall, `_<name>@<paramsize>`), so it couldn't decode the parameter size from it and would leave the @<paramsize> appendix on the function name. Here's an example from basic-opt32.pdb in the dump_syms repo. The original symbol names here are "_MultiByteToWideChar@24" and "_WideCharToMultiByte@32". Bad: ``` PUBLIC 53011 0 MultiByteToWideChar@24 PUBLIC 53017 0 WideCharToMultiByte@32 ``` Good: ``` PUBLIC 53011 18 MultiByteToWideChar PUBLIC 53017 20 WideCharToMultiByte ``` This patch preserves the underscore. This allows dump_syms to detect the parameter size correctly. It may lead to more underscores appearing in stack traces, though. If these extra underscores are deemed undesirable, removing them during demangling is probably a better approach. At the moment, demangling does not seem to strip the underscore in this case. Example: ``` $ cargo run --release -p symcache_debug -- -d ./symbolic-testutils/fixtures/windows/crash.pdb --lookup 0x3756 _malloc at <unknown file > ``` [1] https://docs.microsoft.com/en-us/cpp/build/reference/decorated-names?view=msvc-170#FormatC [2] I don't completely understand this; I can see plenty of functions with leading underscores in 64 bit PDBs, too. If those underscores are not cdecl decorations, what are they?
Here's one way to strip the underscore during demangling, if desired: diff --git a/symbolic-demangle/src/lib.rs b/symbolic-demangle/src/lib.rs
index a1a4275..2e4ff24 100644
--- a/symbolic-demangle/src/lib.rs
+++ b/symbolic-demangle/src/lib.rs
@@ -446,7 +446,10 @@ impl<'a> Demangle for Name<'a> {
Language::Rust => try_demangle_rust(self.as_str(), opts),
Language::Cpp => try_demangle_cpp(self.as_str(), opts),
Language::Swift => try_demangle_swift(self.as_str(), opts),
- _ => None,
+ _ => {
+ // Opportunistically strip a single leading underscore.
+ self.as_str().strip_prefix('_').map(ToOwned::to_owned)
+ }
}
}
|
Codecov Report
@@ Coverage Diff @@
## master #642 +/- ##
==========================================
+ Coverage 71.20% 71.32% +0.11%
==========================================
Files 96 96
Lines 18408 18405 -3
==========================================
+ Hits 13108 13127 +19
+ Misses 5300 5278 -22 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably do this in demangling indeed. We now have symbols like 3738 __CxxThrowException@8
shortening that to _CxxThrowException
in demangling may be a good idea.
#642 removed stripping the underscores from C-decorated Windows symbols. The symbolic-debuginfo abstraction now outputs raw names (except for mach-o, where it still strips underscores). As a followup to that change, this will now undecorate these symbols when writing a symcache.
Leaving the symbol name in its original form allows consumers to detect
"C decorated" functions correctly.
According to the Microsoft documentation [1], there are four forms of
"C decorated" names:
According to the same docs, the first three are only used on 32 bit. [2]
dump_syms parses these forms and decodes the parameter size from them.
It needs the parameter size on 32 bit for correct stack unwinding in
some cases.
Before this patch, symbolic-debuginfo was stripping leading underscores
from these symbols. This made it impossible for dump_syms to detect the
second form (stdcall,
_<name>@<paramsize>
), so it couldn't decode theparameter size from it and would leave the @ appendix on the
function name.
Here's an example from basic-opt32.pdb in the dump_syms repo. The
original symbol names here are "_MultiByteToWideChar@24" and
"_WideCharToMultiByte@32".
Bad:
Good:
This patch preserves the underscore. This allows dump_syms to detect the
parameter size correctly. It may lead to more underscores appearing in
stack traces, though. If these extra underscores are deemed undesirable,
removing them during demangling is probably a better approach.
At the moment, demangling does not seem to strip the underscore in this
case. Example:
[1] https://docs.microsoft.com/en-us/cpp/build/reference/decorated-names?view=msvc-170#FormatC
[2] I don't completely understand this; I can see plenty of functions
with leading underscores in 64 bit PDBs, too. If those underscores
are not cdecl decorations, what are they?