CRASH loading private glibc 2.39-4 from relocation error #7120

Closed
derekbruening opened this issue Dec 10, 2024 · 0 comments · Fixed by #7121

After a machine update, every client that uses libc crashes up front during loading:

$ bin64/drrun -t drmemtrace -offline -- suite/tests/bin/simple_app
<Starting application mydir/build_x64_dbg_tests/suite/tests/bin/simple_app (846630)>
<Initial options = -no_dynamic_options -client_lib 'mydir/build_x64_dbg_tests/bin64/../clients/lib64/debug/libdrmemtrace.so;0;-offline' -client_lib64 'mydir/build_x64_dbg_tests/bin64/../clients/lib64/debug/libdrmemtrace.so;0;-offline' -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Application mydir/build_x64_dbg_tests/suite/tests/bin/simple_app (846630).  DynamoRIO internal crash at PC 0x00007f498ffd4148.  Please report this at http://dynamorio.org/issues/.  Program aborted.
Received SIGSEGV at pc 0x00007f498ffd4148 in thread 846630

This is the libc version:

$ ldd --version
ldd (Debian GLIBC 2.39-4+gl0) 2.39

I thought we had already gotten 2.39 to work as part of #5437, and that part does seem to be correct here.
The crash comes just a few instructions later, when __libc_early_init calls __pthread_tunables_init, which tries to call __tunable_get_val in ld.so through the PLT:

(gdb)
0x00007ffff76c7e97      56      in ./nptl/pthread_mutex_conf.c
2: x/i $pc
=> 0x7ffff76c7e97 <__pthread_tunables_init+39>: call   0x7ffff7661330
(gdb)
0x00007ffff7661330 in ?? ()
2: x/i $pc
=> 0x7ffff7661330:      jmp    *0x1b19f2(%rip)        # 0x7ffff7812d28
(gdb)
0x00007ffff7fa0140 in crc32_z ()
2: x/i $pc
=> 0x7ffff7fa0140 <crc32_z+1072>:       cmp    $0x5,%esp
(gdb) x/30 $pc
=> 0x7ffff7fa0140 <crc32_z+1072>:       cmp    $0x5,%esp
   0x7ffff7fa0143 <crc32_z+1075>:       je     0x7ffff7fa0175 <crc32_z+1125>
   0x7ffff7fa0145 <crc32_z+1077>:       mov    %rdi,%rsi
   0x7ffff7fa0148 <crc32_z+1080>:       xor    0x5(%rcx),%dil

(gdb) disas

   0x00007ffff7fa013f <+1071>:  cmp    $0x5,%r12
   0x00007ffff7fa0143 <+1075>:  je     0x7ffff7fa0175 <crc32_z+1125>
   0x00007ffff7fa0145 <+1077>:  mov    %rdi,%rsi
   0x00007ffff7fa0148 <+1080>:  xor    0x5(%rcx),%dil

(gdb) x/2gx 0x7ffff7812d28
0x7ffff7812d28: 0x00007ffff7fa0140      0x00007ffff77a8480

Without DR:

(gdb) disas __pthread_tunables_init
Dump of assembler code for function __pthread_tunables_init:
   0x00007ffff7e50e70 <+0>:     push   %rbx
   0x00007ffff7e50e71 <+1>:     lea    -0x38(%rip),%rdx        # 0x7ffff7e50e40 <_dl_tunable_set_mutex_spin_count>
   0x00007ffff7e50e78 <+8>:     mov    $0x3,%edi
   0x00007ffff7e50e7d <+13>:    sub    $0x10,%rsp
   0x00007ffff7e50e81 <+17>:    mov    %fs:0x28,%rax
   0x00007ffff7e50e8a <+26>:    mov    %rax,0x8(%rsp)
   0x00007ffff7e50e8f <+31>:    xor    %eax,%eax
   0x00007ffff7e50e91 <+33>:    mov    %rsp,%rbx
   0x00007ffff7e50e94 <+36>:    mov    %rbx,%rsi
   0x00007ffff7e50e97 <+39>:    call   0x7ffff7dea330 <__tunable_get_val+0x28330@plt>
...
(gdb) x/2i 0x7ffff7dea330
   0x7ffff7dea330 <__tunable_get_val+0x28330@plt>:      jmp    *0x1b19f2(%rip)        # 0x7ffff7f9bd28 <__tunable_get_val@got.plt>
   0x7ffff7dea336 <__tunable_get_val+0x28330@plt+6>:    push   $0xb
(gdb) x/4gx 0x7ffff7f9bd28
0x7ffff7f9bd28 <__tunable_get_val@got.plt>:     0x00007ffff7fdbe10      0x00007ffff7f31480
0x7ffff7f9bd38 <*ABS*@got.plt>: 0x00007ffff7f2b5c0      0x00007ffff7f32700
(gdb) x/4i 0x00007ffff7fdbe10
   0x7ffff7fdbe10 <__GI___tunable_get_val>:     mov    %edi,%edi
   0x7ffff7fdbe12 <__GI___tunable_get_val+2>:   lea    0x1f607(%rip),%rcx        # 0x7ffff7ffb420 <tunable_list>
   0x7ffff7fdbe19 <__GI___tunable_get_val+9>:   mov    %rdi,%rax
   0x7ffff7fdbe1c <__GI___tunable_get_val+12>:  shl    $0x4,%rax

Isn't __tunable_get_val@got.plt just a regular import?
We find it:

module_relocate_symbol: reloc @ 0x00007f2819212d28 type=7
sym lookup for __tunable_get_val from libc.so.6
sym lookup for __tunable_get_val from libdrsyms.so = mydir/build_x64_dbg_tests/ext/lib64/debug/libdrsyms.so
sym lookup for __tunable_get_val from libz.so.1 = /lib/x86_64-linux-gnu/libz.so.1
sym lookup for __tunable_get_val from libc.so.6 = /lib/x86_64-linux-gnu/libc.so.6
sym lookup for __tunable_get_val from ld-linux-x86-64.so.2 = /usr/lib64/ld-linux-x86-64.so.2
elf_sym_matches: considering type=2 __tunable_get_val
symbol lookup for __tunable_get_val 0x00007f281995ee10

Our code:

#define R_X86_64_JUMP_SLOT	7	/* Create PLT entry */
#        define ELF_R_JUMP_SLOT R_X86_64_JUMP_SLOT /* PLT entry */

    case ELF_R_JUMP_SLOT: *r_addr = (reg_t)res + addend; break;

More diagnostics:

module_relocate_symbol: reloc @ 0x00007fdb7f812d28 type=7 is_rela=1 addend=0x28330
sym lookup for __tunable_get_val from libc.so.6
sym lookup for __tunable_get_val from libdrsyms.so = mydir/build_x64_dbg_tests/ext/lib64/debug/libdrsyms.so
sym lookup for __tunable_get_val from libz.so.1 = /lib/x86_64-linux-gnu/libz.so.1
sym lookup for __tunable_get_val from libc.so.6 = /lib/x86_64-linux-gnu/libc.so.6
sym lookup for __tunable_get_val from ld-linux-x86-64.so.2 = /usr/lib64/ld-linux-x86-64.so.2
elf_sym_matches: considering type=2 __tunable_get_val
symbol lookup for __tunable_get_val 0x00007fdb80000e10
just wrote 0x00007fdb80029140 to 0x00007fdb7f812d28

So the culprit is the addend that comes with a "rela" relocation: we wrote
the symbol address plus 0x28330. Had the addend been 0, we would have
done the right thing.
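The arithmetic behind that diagnosis can be checked directly from the log lines above. This is a minimal sketch, not DR's code: the two helper names are made up for illustration, and the constants in the comments are copied from the diagnostics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the value the quoted relocation code stores,
 * i.e. resolved symbol address plus the rela addend. */
static uint64_t
jump_slot_with_addend(uint64_t sym_addr, int64_t addend)
{
    return sym_addr + (uint64_t)addend;
}

/* Hypothetical helper: the value stored if the addend is ignored,
 * i.e. just the resolved symbol address. */
static uint64_t
jump_slot_without_addend(uint64_t sym_addr)
{
    return sym_addr;
}

/* From the log: symbol lookup gave 0x00007fdb80000e10, addend=0x28330,
 * and the value written to the GOT slot was 0x00007fdb80029140. */
```

Feeding in the logged symbol address and addend reproduces the logged "just wrote" value, while ignoring the addend leaves the slot pointing at __tunable_get_val itself.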

$ readelf -r /lib/x86_64-linux-gnu/libc.so.6
<...>
Relocation section '.rela.plt' at offset 0x26cc8 contains 61 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
0000001d9ba0  064400000007 R_X86_64_JUMP_SLO 000000000009c8e0 realloc@@GLIBC_2.2.5 + 28020
0000001d9bc0  09bb00000007 R_X86_64_JUMP_SLO 000000000009cfe0 calloc@@GLIBC_2.2.5 + 28060
0000001d9c08  000200000007 R_X86_64_JUMP_SLO 0000000000000000 _dl_find_dso_for_[...]@GLIBC_PRIVATE + 280f0
0000001d9c40  000400000007 R_X86_64_JUMP_SLO 0000000000000000 _dl_deallocate_tls@GLIBC_PRIVATE + 28160
0000001d9c48  000500000007 R_X86_64_JUMP_SLO 0000000000000000 __tls_get_addr@GLIBC_2.3 + 28170
0000001d9c70  000800000007 R_X86_64_JUMP_SLO 0000000000000000 _dl_signal_error@GLIBC_PRIVATE + 281c0
0000001d9ca8  000900000007 R_X86_64_JUMP_SLO 0000000000000000 _dl_signal_exception@GLIBC_PRIVATE + 28230
0000001d9cc0  000a00000007 R_X86_64_JUMP_SLO 0000000000000000 _dl_audit_symbind_alt@GLIBC_PRIVATE + 28260
0000001d9ce0  000b00000007 R_X86_64_JUMP_SLO 0000000000000000 __tunable_is_init[...]@GLIBC_PRIVATE + 282a0
0000001d9d00  000c00000007 R_X86_64_JUMP_SLO 0000000000000000 _dl_rtld_di_serinfo@GLIBC_PRIVATE + 282e0
0000001d9d18  000d00000007 R_X86_64_JUMP_SLO 0000000000000000 _dl_allocate_tls@GLIBC_PRIVATE + 28310
0000001d9d28  000e00000007 R_X86_64_JUMP_SLO 0000000000000000 __tunable_get_val@GLIBC_PRIVATE + 28330
0000001d9d48  000f00000007 R_X86_64_JUMP_SLO 0000000000000000 _dl_catch_exception@GLIBC_PRIVATE + 28370
0000001d9d58  001000000007 R_X86_64_JUMP_SLO 0000000000000000 _dl_allocate_tls_init@GLIBC_PRIVATE + 28390
0000001d9d60  001200000007 R_X86_64_JUMP_SLO 0000000000000000 __nptl_change_sta[...]@GLIBC_PRIVATE + 283a0
0000001d9d70  001300000007 R_X86_64_JUMP_SLO 0000000000000000 _dl_audit_preinit@GLIBC_PRIVATE + 283c0
0000001d9d78  000000000025 R_X86_64_IRELATIV                    a3150
<...>

They all have non-zero addends. Why?
From above, 0x28330 is listed with the symbol, but how should it be interpreted?
It reads as __tunable_get_val plus 0x28330 rather than saying that the PLT slot
is at 0x28330, yet the PLT stub address ending in 330 can't be a coincidence.
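One way to test the "can't be a coincidence" hunch is to work the numbers from the gdb sessions and the readelf output. The sketch below assumes that the readelf Offset column is the load-base-relative address of the GOT slot and that both gdb sessions loaded this same libc.so.6. The helper names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* ELF64 r_info layout: high 32 bits are the symbol index,
 * low 32 bits are the relocation type. */
static uint32_t
rela_sym_index(uint64_t r_info)
{
    return (uint32_t)(r_info >> 32);
}

static uint32_t
rela_type(uint64_t r_info)
{
    return (uint32_t)(r_info & 0xffffffffu);
}

/* Load base implied by a runtime address and its base-relative offset
 * (assumption: r_offset is relative to the module's load base). */
static uint64_t
implied_load_base(uint64_t runtime_addr, uint64_t r_offset)
{
    return runtime_addr - r_offset;
}

/* Base-relative offset of the PLT stub, given the GOT slot's runtime
 * address, the slot's r_offset, and the stub's runtime address. */
static uint64_t
plt_stub_offset(uint64_t got_addr, uint64_t r_offset, uint64_t plt_addr)
{
    return plt_addr - implied_load_base(got_addr, r_offset);
}
```

With the values from the session under DR (GOT slot 0x7ffff7812d28, r_offset 0x1d9d28, PLT stub 0x7ffff7661330) the stub offset comes out as 0x28330, the same number as the addend. The native session (GOT slot 0x7ffff7f9bd28, PLT stub 0x7ffff7dea330) gives the same offset. So the addend does appear to hold the PLT stub's offset within libc.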

In glibc's elf_machine_rela() for x86_64, r_addend is added for other
relocation types but not for R_X86_64_JUMP_SLOT or R_X86_64_GLOB_DAT.
So why do those use rela entries? Does some other code need the PLT
offset, so that it is hackily stored in r_addend?

For aarch64: r_addend is added for GLOB_DAT and JUMP_SLOT.

For riscv: r_addend is not added for JUMP_SLOT.

No r_addend reference for 32-bit arm or x86.
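The x86_64 rule noted above can be sketched as a small relocation helper. This only illustrates the behavior as read from glibc and is not DynamoRIO's actual fix. The function name, the struct-free signature, and the locally defined constants (which mirror the standard R_X86_64_* numbering) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the standard x86_64 relocation type numbers, so the
 * sketch does not depend on <elf.h>. */
enum {
    SKETCH_R_X86_64_64 = 1,
    SKETCH_R_X86_64_GLOB_DAT = 6,
    SKETCH_R_X86_64_JUMP_SLOT = 7,
    SKETCH_R_X86_64_RELATIVE = 8,
};

/* Hypothetical helper: computes the value to store for one rela entry.
 * Returns 1 and fills *out for the handled types, 0 otherwise. */
static int
sketch_rela_value_x86_64(uint32_t type, uint64_t sym_addr, uint64_t load_base,
                         int64_t addend, uint64_t *out)
{
    switch (type) {
    case SKETCH_R_X86_64_JUMP_SLOT:
    case SKETCH_R_X86_64_GLOB_DAT:
        /* The addend is deliberately ignored for these two types. */
        *out = sym_addr;
        return 1;
    case SKETCH_R_X86_64_64:
        /* Absolute 64-bit reference: symbol plus addend. */
        *out = sym_addr + (uint64_t)addend;
        return 1;
    case SKETCH_R_X86_64_RELATIVE:
        /* No symbol involved: load base plus addend. */
        *out = load_base + (uint64_t)addend;
        return 1;
    default:
        return 0; /* Types not covered by this sketch. */
    }
}
```

For the failing entry (type 7, addend 0x28330, symbol at 0x00007fdb80000e10) this stores the symbol address unchanged, which is the value the native loader left in the GOT slot.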

derekbruening self-assigned this Dec 10, 2024
derekbruening added a commit that referenced this issue Dec 10, 2024
Despite "rela" relocations having an explicit addend value and it
being set to non-0 in a new Debian glibc, the addend is assumed to
*not* be added to the symbol value when relocating on x86_64 and
aarch64 (it does seem to be added on RISCV).
This is not obvious and not well documented; we have to just behave like
existing loaders behave from experimentation/examination.
(Yet another reason to possibly invert the private loader and let
the private copy of ld.so do all the loading and relocating: #5437).

Tested on a machine where nearly every client test in our suite was
crashing after a glibc update: now they pass.  Unfortunately it's not
simple to make automated tests for this: we don't have an existing
framework for relocation testing and it would take non-trivial effort
to construct that, beyond the scope of this fix.

Fixes #7120
derekbruening added a commit that referenced this issue Dec 16, 2024
Update the version number to 11.1 for a bugfix release primarily for
the #7120 bug fix.

Issue: #7120