Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#5926 libdw: Switch drsyms to use libdw from elfutils #6534

Merged
merged 26 commits into from
Jan 5, 2024

Conversation

derekbruening
Copy link
Contributor

@derekbruening derekbruening commented Jan 2, 2024

Switches from elftoolchain's libdwarf, which is no longer being
updated, to elfutil's libdw, only for Linux for now. The interfaces
are slightly different but generally similar. We also switch to the
libelf from elfutil to ensure its interactions with libdw work
properly.

Adds a new third_party/elfutils submodule.

Adds the LGPL3 elfutils license to our license docs, stressing that we
are providing drsyms as a library distinct from the rest of DR.

Adds build rules to build the 4 elfutils libraries we need (libdw,
libelf, plus helpers libdwelf and libebl) as static PIC libraries.
As elfutils uses autoconf, we check in a config.h file from a local
configure believed to be general enough for our supported toolchains
to feed to our CMake build.

Adds 4 patches needed for elfutils. These are stored in the parent
dynamorio repository and applied at CMake configure time to create new
source files in the build directory. This keeps the submodule
pristine. The patches are:

  1. Fix a bug in elf_memory() where an ignored "command" enum was used.
  2. Shrink the stack usage of read_srclines() called by dwarf_getsrclines() to
    fit on DR's small stacks.
  3. Redirect zlib heap allocation.
  4. Pre-allocate a destination instead of realpath(xxx, NULL).

Adds a requirement that zlib be available on Linux to build drysms, as
elfutils requires zlib for compressed debug sections.

Adds -Dmalloc=__wrap_malloc, etc. to redirect heap usage by the
elfutils static libraries to DR.

Adds a new tool.drcacheoff.check-malloc test to better check for
malloc use in the drmemtrace client. The new test explicitly invokes
drsyms via the drmemtrace function tracing feature, while disabling
lz4 compression (as it turns off the import check). This test only
passes with the -Dmalloc=__wrap_malloc defines and the zlib heap
redirect patch.

Adds a new drsyms_dw.c file, partly mirroring drsyms_dwarf.c. The
libdw and libdwarf interfaces are different enough it seemed best to
have a separate file, though the general code is similar. The
drsyms_elf.c differences are in-file via ifdef.

Removes the -gdwarf-4 workaround from the docs.

Adds new -gdwarf-4 and -gdwarf-5 drsyms tests.

The libelftc demangler is kept via the checked-in static library,
since libdw does not have its own demangler. We do not expect this
part of libelftc to need updates.

Fixes #5926

Adds a new API routine dr_running_under_dynamorio() which is similar
to the application interface dr_app_running_under_dynamorio() but is
callable from a client.  The use case is code meant for us both in
managed mode under DR and in standalone library mode.  Sometimes the
behavior for the two differs significantly, such as whether certain
library routines are safe when linked statically.

Adds sanity checks to some existing tests.

The driving use case is in drsyms's support for libdw.  The actual use
of this will come in that libdw commit.

Issue: #5926
Switches from elftoolchain's libdwarf, which is no longer being
updated, to elfutil's libdw, only for Linux for now.  The interfaces
are slightly different but generally similar.  We also switch to the
libelf from elfutil to ensure its interactions with libdw work
properly.

Adds a new third_party/elfutils submodule.

Adds the LGPL3 elfutils license to our license docs, stressing that we
are providing drsyms as a library distinct from the rest of DR.

Adds build rules to build the 4 elfutils libraries we need (libdw,
libelf, plus helpers libdwelf and libebl) as static PIC libraries.
As elfutils uses autoconf, we check in a config.h file from a local
configure believed to be general enough for our supported toolchains
to feed to our CMake build.

Adds 3 patches needed for elfutils.  These are stored in the parent
dynamorio repository and applied at CMake configure time to create new
source files in the build directory.  This keeps the submodule
pristine.  The patches are:

  1) Fix a bug in elf_memory() where an ignored "command" enum was used.
  2) Shrink the stack usage of read_srclines() called by dwarf_getsrclines() to
     fit on DR's small stacks.
  3) Redirect zlib heap allocation.

Adds a requirement that zlib be available on Linux to build drysms, as
elfutils requires zlib for compressed debug sections.

Adds -Dmalloc=__wrap_malloc, etc. to redirect heap usage by the
elfutils static libraries to DR.

Adds a new tool.drcacheoff.check-malloc test to better check for
malloc use in the drmemtrace client.  The new test explicitly invokes
drsyms via the drmemtrace function tracing feature, while disabling
lz4 compression (as it turns off the import check).  This test only
passes with the -Dmalloc=__wrap_malloc defines and the zlib heap
redirect patch.

Adds a new drsyms_dw.c file, partly mirroring drsyms_dwarf.c.  The
libdw and libdwarf interfaces are different enough it seemed best to
have a separate file, though the general code is similar.  The
drsyms_elf.c differences are in-file via ifdef.

Removes the -gdwarf-4 workaround from the docs.

Adds new -gdwarf-4 and -gdwarf-5 drsyms tests.

The libelftc demangler is kept via the checked-in static library,
since libdw does not have its own demangler.  We do not expect this
part of libelftc to need updates.

Fixes #5926
…do so for Android (once it works will copy to ci-package.yml)
Base automatically changed from i5926-running-under-api to master January 2, 2024 21:11
@derekbruening
Copy link
Contributor Author

Getting zlib installed for 32-bit Android is the blocker here. I will spend a little time on it...but Android is not really a high priority (someone who is interested in reviving Android full support should speak up...); at some point I would just drop drsyms support on Android, or more likely go back to libelftc on Android and not support dwarf-5 there.

@derekbruening
Copy link
Contributor Author

drcache{sim,off}.simple 32-bit ubuntu22 are failing every time, but I suspect it is unrelated to libdw and was instead triggered by installing libz for our 32-bit testing which was not done before. I will see if I can repro.

@derekbruening
Copy link
Contributor Author

The unresolved failures are all an assert core/synch.c:261 res. I still have not found a machine where they reproduce. Since the diff is large and there may be discussion on the overall approach I will start the review process now while trying to figure out this synch.c issue which as mentioned may be due to zlib and not libdw.

@derekbruening
Copy link
Contributor Author

The 32-bit AMD GA CI also has a bunch of those synch.c asserts. But I can't repro any of them on local AMD machines. Still think it's related to the particular zlib installed here, which is done in a different way than locally b/c of the different repo on GA CI. Plus, when I put in the try-except, the failures change from the synch.c to reported DR crashes. I am separating out the try-except under #6541 as it seems a separate useful change.

@derekbruening
Copy link
Contributor Author

It is indeed from the 32-bit zlib now installed. DR's private loader crashes trying to relocate a zlib symbol "deflateResetKeep" in drsyms.so (presumably zlib is statically linked into drsyms here):

Program received signal SIGSEGV, Segmentation fault.
0xf7c8dce5 in module_relocate_symbol (rel=0xf7f13520, pd=0x433edd64, is_rela=false) at /home/runner/work/dynamorio/dynamorio/core/unix/module_elf.c:1585
1585            *(uint *)r_addr = (uint)(reg_t)res;
(gdb) bt
#0  0xf7c8dce5 in module_relocate_symbol (rel=0xf7f13520, pd=0x433edd64, is_rela=false) at /home/runner/work/dynamorio/dynamorio/core/unix/module_elf.c:1585
#1  0xf7c8ddbf in module_relocate_rel (modbase=0xf7f0f000 "\177ELF\001\001\001", pd=0x433edd64, start=0xf7f12c30, end=0xf7f135c0)
    at /home/runner/work/dynamorio/dynamorio/core/unix/module_elf.c:1614
#2  0xf7c83e75 in privload_relocate_os_privmod_data (opd=0x433edd64, mod_base=0xf7f0f000 "\177ELF\001\001\001") at /home/runner/work/dynamorio/dynamorio/core/unix/loader.c:1254
#3  0xf7c84188 in privload_relocate_mod (mod=0x433edb44) at /home/runner/work/dynamorio/dynamorio/core/unix/loader.c:1304
#4  0xf7c820e4 in privload_process_imports (mod=0x433edb44) at /home/runner/work/dynamorio/dynamorio/core/unix/loader.c:597
#5  0xf7b8b6f6 in privload_load_process (privmod=0x433edb44) at /home/runner/work/dynamorio/dynamorio/core/loader_shared.c:811
#6  0xf7b8b002 in privload_load (filename=0xffffa760 "/home/runner/work/dynamorio/dynamorio/build_debug-internal-32/ext/lib32/debug/libdrsyms.so", dependent=0x433594f8, 
    client=true) at /home/runner/work/dynamorio/dynamorio/core/loader_shared.c:683
#7  0xf7c82a9f in privload_locate_and_load (impname=0xf7f6b6c8 "libdrsyms.so", dependent=0x433594f8, reachable=true)
    at /home/runner/work/dynamorio/dynamorio/core/unix/loader.c:776
#8  0xf7c8204e in privload_process_imports (mod=0x433594f8) at /home/runner/work/dynamorio/dynamorio/core/unix/loader.c:571
#9  0xf7b8b6f6 in privload_load_process (privmod=0x433594f8) at /home/runner/work/dynamorio/dynamorio/core/loader_shared.c:811
#10 0xf7b89918 in privload_process_early_mods () at /home/runner/work/dynamorio/dynamorio/core/loader_shared.c:139
#11 0xf7b89af8 in loader_init_epilogue (dcontext=0x43360b40) at /home/runner/work/dynamorio/dynamorio/core/loader_shared.c:203
#12 0xf7a3052f in dynamorio_app_init_part_two_finalize () at /home/runner/work/dynamorio/dynamorio/core/dynamo.c:675
#13 0xf7c85e38 in privload_early_inject (sp=0xffffbf80, old_libdr_base=0x0, old_libdr_size=0) at /home/runner/work/dynamorio/dynamorio/core/unix/loader.c:2264
#14 0xf7c39a47 in reloaded_xfer () at /home/runner/work/dynamorio/dynamorio/core/arch/x86/x86.asm:1187
(gdb) info local
r_addr = 0xf7f34507
r_type = 2
r_sym = 259
sym = 0xf7f11244
res = 0xfffff059 <error: Cannot access memory at address 0xfffff059>
addend = 0
name = 0xf7f1243b "deflateResetKeep"
resolved = false

@derekbruening
Copy link
Contributor Author

The difference seems to be that on my local machine libz is not statically linked in:

derek@smaug:~/dr/git/build_x86_dbg_tests$ ldd ext/lib32/debug/libdrsyms.so
	linux-gate.so.1 (0xf7f49000)
	libz.so.1 => /lib/i386-linux-gnu/libz.so.1 (0xf7ed6000)
	libdynamorio.so => not found
	libgcc_s.so.1 => /lib/i386-linux-gnu/libgcc_s.so.1 (0xf7eaf000)
	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf7c00000)
	/lib/ld-linux.so.2 (0xf7f4b000)

ext/drsyms/drsyms_dw.c Outdated Show resolved Hide resolved
@derekbruening
Copy link
Contributor Author

I should file a separate issue on zlib as this would be hit without libdw since we want to use zlib with drmemtrace, we just never tried to install it on these GA CI VMs. The relocation itself looks ok; the issue seems to be that it's relocating .text which is mapped as +rx and is not writable. I'm trying to understand how other text relocation ever worked...could it be that regular PIC libraries never have text relocs? There are security policies banning text relocs. So fundamentally the problem is not really in our loader if we only support PIC, it's that the static libz.a chosen here is not PIC? Or would ld.so go mark .txt +w temporarily?

@derekbruening
Copy link
Contributor Author

That seems to have resolved everything: the only failures left look like the #6417 32-bit ones (before that job had extra failures; now it is back to the baseline).

@derekbruening
Copy link
Contributor Author

Filed #6543 as Won't Fix to document the text relocs from non-PIC .a

@derekbruening
Copy link
Contributor Author

Well it got redder after adding the one-line release note but it's all flakes: win64 replaceall #5412, x64 detach_test #6536, x64 ub22 rseq #6185

@derekbruening derekbruening merged commit 0a6c057 into master Jan 5, 2024
11 of 15 checks passed
@derekbruening derekbruening deleted the i5926-libdw-plus-api branch January 5, 2024 21:09
.gitmodules Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tool.drcov.fib failing in local runs due to lack of DWARF5 support in libelftc used by drsyms
3 participants