This repository has been archived by the owner on Mar 18, 2020. It is now read-only.

Dump extern "C" functions without parameters in signature #9

Open
luser opened this issue Sep 26, 2014 · 7 comments

Comments

@luser
Owner

luser commented Sep 26, 2014

For compatibility with Breakpad's dump_syms, we should figure out how to dump extern "C" functions without parameters in the function signature. I have an attempt at doing this in the check-linkage branch (71add3e), but it doesn't work properly. The PDB file I'm testing with has multiple records in the globals stream with the same address, and I don't know how to disambiguate them.

@jon-turney
Contributor

Do you have an example of a PDB which has this problem?

@luser
Owner Author

luser commented Oct 31, 2014

Yeah, I've been testing against a simple test program:
http://people.mozilla.com/~tmielczarek/TestApp.pdb.gz

@luser
Owner Author

luser commented Oct 31, 2014

That program has, for example:

extern "C" void testC(int arg1, short arg2)

and all the other test* functions are normal C++ linkage. I've been comparing against the output of the stock Breakpad dump_syms to try to match it.

@luser
Owner Author

luser commented Oct 31, 2014

I rebased the check-linkage branch: https://github.com/luser/dump_syms/tree/check-linkage

@jon-turney
Contributor

Hmm, yes, very strange.

$ ./dump_syms tests/TestApp.pdb 2>&1 | egrep "(wmain|test6|test7|testC)"
leaftype 110e, symbol type 2, address 000115a0 (offset 000005a0, segment 0002), name _wmain
leaftype 110e, symbol type 2, address 00011d00 (offset 00000d00, segment 0002), name _wmainCRTStartup
leaftype 110e, symbol type 2, address 000115a0 (offset 000005a0, segment 0002), name ?test6@@YAXMNO@Z
leaftype 110e, symbol type 2, address 00011bc0 (offset 00000bc0, segment 0002), name _wmain
leaftype 110e, symbol type 2, address 00011bc0 (offset 00000bc0, segment 0002), name ?test7@@YAXC_WPA_WPAPAD@Z
leaftype 110e, symbol type 2, address 00011bf0 (offset 00000bf0, segment 0002), name _wmain
leaftype 110e, symbol type 2, address 00011bf0 (offset 00000bf0, segment 0002), name _testC
leaftype 110e, symbol type 2, address 000124c0 (offset 000014c0, segment 0002), name _wmain
FUNC 115a0 25 14 test6(float,double,double)
FUNC 11bc0 25 10 test7(signed char,wchar_t,wchar_t *,char * *)
FUNC 11bf0 49 8 testC(int,short)
FUNC 11d00 f 0 wmainCRTStartup()
FUNC 124c0 46 8 wmain(int,wchar_t * *)

I'm not sure what it means that different symbols are apparently at the same address. Have test6, test7 and testC been inlined into wmain?

Anyhow, it seems that the assumption that symbols can be looked up by address alone is invalid.

I had an attempt at implementing looking them up by name instead, see 3bb706e, which seems to produce the right output for this test, and also addresses #11, but it needs to be improved to do the name lookup in a sensible way.

$ ./dump_syms tests/TestApp.pdb | egrep "(wmain|test6|test7|testC)"
FUNC 115a0 25 14 test6(float,double,double)
FUNC 11bc0 25 10 test7(signed char,wchar_t,wchar_t *,char * *)
FUNC 11bf0 49 8 testC
FUNC 11d00 f 0 wmainCRTStartup
FUNC 124c0 46 8 wmain

But now it occurs to me that this isn't right either: the same function name could occur multiple times, mangled with different sets of parameters and also unmangled, so perhaps the lookup needs to be on both offset and function name.
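
A lookup keyed on both offset and name could be sketched roughly as below. This is only an illustration in Python, not the code in 3bb706e: the `GLOBALS` list is copied by hand from the debug output above, and `base_name`, `build_index` and `func_signature` are hypothetical helpers. It assumes 32-bit MSVC decoration, where C++ names start with `?` and `__cdecl` C-linkage names get a leading underscore, and it ignores namespaces and class scopes.

```python
# (address, decorated name) pairs copied from the debug output above.
GLOBALS = [
    (0x115A0, "_wmain"),
    (0x11D00, "_wmainCRTStartup"),
    (0x115A0, "?test6@@YAXMNO@Z"),
    (0x11BC0, "_wmain"),
    (0x11BC0, "?test7@@YAXC_WPA_WPAPAD@Z"),
    (0x11BF0, "_wmain"),
    (0x11BF0, "_testC"),
    (0x124C0, "_wmain"),
]


def base_name(decorated):
    """Return (base name, is_cpp_mangled) for a decorated symbol name."""
    if decorated.startswith("?"):
        # C++ mangling: '?' + name + '@...'; keep only the first component.
        return decorated[1:].split("@", 1)[0], True
    # Assumed x86 __cdecl C linkage: strip one leading underscore.
    if decorated.startswith("_"):
        return decorated[1:], False
    return decorated, False


def build_index(records):
    """Map (address, base name) -> True if the record is C++-mangled."""
    index = {}
    for address, decorated in records:
        name, is_cpp = base_name(decorated)
        index[(address, name)] = is_cpp
    return index


def func_signature(index, address, name, params):
    """Format a FUNC name, dropping parameters for C-linkage symbols."""
    # If no record matches, keep the parameters (the current behaviour).
    is_cpp = index.get((address, name), True)
    return f"{name}({params})" if is_cpp else name


index = build_index(GLOBALS)
print(func_signature(index, 0x11BF0, "testC", "int,short"))
print(func_signature(index, 0x115A0, "test6", "float,double,double"))
```

Because the key includes the name, the stray `_wmain` records at 115a0, 11bc0 and 11bf0 never match the FUNC records for test6, test7 and testC, so they are simply ignored.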

@luser
Copy link
Owner Author

luser commented Nov 3, 2014

I don't think they're inlined. If you look at the FUNC records (from either version of dump_syms), we get distinct addresses for testC and wmain (you can see this in your output above). I just can't figure out how those correspond to the entries in the globals stream.

@jon-turney
Copy link
Contributor

Checking the set of PDBs from the MS symbol server that I have, I didn't find any other examples of this (duplicate symbols in the global symbol table with different addresses).

Looking at the data above, it seems a simple heuristic which would give the expected data would be to use the last definition of each symbol (so _wmain = 000124c0 and the other definitions are ignored), but it's hard to know if this is the correct way to interpret things.
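
As a rough sketch of that heuristic (Python, purely illustrative): `RECORDS` is typed in from the debug output earlier in this thread, and `last_definition_wins` is a hypothetical helper, not something in dump_syms. Whether record order in the globals stream actually carries this meaning is exactly the open question.

```python
# (address, decorated name) pairs in the order they appeared in the
# debug output earlier in this thread.
RECORDS = [
    (0x115A0, "_wmain"),
    (0x11D00, "_wmainCRTStartup"),
    (0x115A0, "?test6@@YAXMNO@Z"),
    (0x11BC0, "_wmain"),
    (0x11BC0, "?test7@@YAXC_WPA_WPAPAD@Z"),
    (0x11BF0, "_wmain"),
    (0x11BF0, "_testC"),
    (0x124C0, "_wmain"),
]


def last_definition_wins(records):
    """Keep only the last address seen for each name, then invert."""
    latest = {}
    for address, name in records:
        latest[name] = address  # a later record overwrites an earlier one
    # Invert to address -> name; for this data every address is now unique.
    return {address: name for name, address in latest.items()}


by_address = last_definition_wins(RECORDS)
for address in sorted(by_address):
    print(f"{address:x} {by_address[address]}")
```

For this test PDB the result has one name per address and lines up with the FUNC records, but the inversion would silently drop a symbol if two different names ended up sharing a final address.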
