How to resolve source lines from a user crash using gdb
We have this gdb output from a user:
(user-gdb)
Thread 1 (Thread 0x7fb303fff700 (LWP 56461)):
#0 0x00007fb3966e178a in rd_kafka_txn_handle_TxnOffsetCommit () from librdkafka.so.1
#1 0x00007fb39666b4fb in rd_kafka_buf_callback () from librdkafka.so.1
#2 0x00007fb396675a2b in rd_kafka_op_handle_std () from librdkafka.so.1
#3 0x00007fb396675aa8 in rd_kafka_op_handle () from librdkafka.so.1
#4 0x00007fb39666fa63 in rd_kafka_q_serve () from librdkafka.so.1
#5 0x00007fb3966395dc in rd_kafka_thread_main () from librdkafka.so.1
#6 0x00007fb3966affd7 in _thrd_wrapper_function () from librdkafka.so.1
#7 0x00007fb38af0bdd5 in start_thread () from /usr/lib64/libpthread.so.0
#8 0x00007fb389cdd02d in clone () from /usr/lib64/libc.so.6
There are, however, no source line references, since their librdkafka.so.1 comes from a confluent-platform librdkafka1...rpm package, which is stripped of debug symbols.
We do know that they are using the CP librdkafka v1.5.3 RPM packages, though, and each such release also has a librdkafka-debuginfo..rpm package with the debug symbols.
$ ls
librdkafka1-1.5.3_confluent6.1.0-0.1.0.el7.x86_64.rpm
librdkafka-1.5.3_confluent6.1.0-0.1.0.el7.src.rpm
librdkafka-debuginfo-1.5.3_confluent6.1.0-0.1.0.el7.x86_64.rpm
librdkafka-devel-1.5.3_confluent6.1.0-0.1.0.el7.x86_64.rpm
Extract librdkafka1..*rpm:
$ rpm2cpio librdkafka1-*rpm | cpio -idmv
Extract librdkafka-debuginfo..*rpm:
$ rpm2cpio librdkafka-debuginfo*rpm | cpio -idmv
We now have the stripped librdkafka.so.1 and its unstripped counterpart as well as the librdkafka source code:
$ find . -name 'librdkafka.so.1*' -or -name rdkafka.h
./usr/src/debug/librdkafka-1.5.3_confluent6.1.0/src/rdkafka.h
./usr/lib/debug/usr/lib64/librdkafka.so.1.debug
./usr/lib64/librdkafka.so.1
Start up gdb using the debuginfo lib:
$ gdb ./usr/lib/debug/usr/lib64/librdkafka.so.1.debug
..
(gdb)
Verify that the library is loaded by looking up the function that crashed:
(gdb) x/i rd_kafka_txn_handle_TxnOffsetCommit
0xca740 <rd_kafka_txn_handle_TxnOffsetCommit>: add %al,(%rax)
Since we're running gdb on the shared library itself rather than on an application, the library is loaded at address 0x0. We can see this with:
(gdb) info target
Symbols from "./usr/lib/debug/usr/lib64/librdkafka.so.1.debug".
Local exec file:
`./usr/lib/debug/usr/lib64/librdkafka.so.1.debug', file type elf64-x86-64.
warning: Cannot find section for the entry point of ./usr/lib/debug/usr/lib64/librdkafka.so.1.debug.
Entry point: 0x13510
0x0000000000000200 - 0x0000000000000224 is .note.gnu.build-id
0x0000000000000228 - 0x0000000000000bd8 is .gnu.hash
0x0000000000000bd8 - 0x0000000000003f20 is .dynsym
0x0000000000003f20 - 0x0000000000006d42 is .dynstr
0x0000000000006d42 - 0x0000000000007188 is .gnu.version
0x0000000000007188 - 0x0000000000007318 is .gnu.version_r
0x0000000000007318 - 0x0000000000010210 is .rela.dyn
0x0000000000010210 - 0x0000000000012088 is .rela.plt
0x0000000000012088 - 0x00000000000120a2 is .init
0x00000000000120b0 - 0x0000000000013510 is .plt
0x0000000000013510 - 0x00000000001884ab is .text
0x00000000001884ac - 0x00000000001884b5 is .fini
0x00000000001884c0 - 0x00000000001be800 is .rodata
0x00000000001be800 - 0x00000000001c3274 is .eh_frame_hdr
0x00000000001c3278 - 0x00000000001e07dc is .eh_frame
0x00000000003e0d70 - 0x00000000003e0dc0 is .tdata
0x00000000003e0dc0 - 0x00000000003e4538 is .tbss
0x00000000003e0dc0 - 0x00000000003e0dc8 is .init_array
0x00000000003e0dc8 - 0x00000000003e0dd0 is .fini_array
0x00000000003e0dd0 - 0x00000000003e0dd8 is .jcr
0x00000000003e0de0 - 0x00000000003fccb0 is .data.rel.ro
0x00000000003fccb0 - 0x00000000003fcf10 is .dynamic
0x00000000003fcf10 - 0x00000000003fcff8 is .got
0x00000000003fd000 - 0x00000000003fda40 is .got.plt
0x00000000003fda40 - 0x00000000003fde30 is .data
0x00000000003fde40 - 0x0000000000401f30 is .bss
We see that the .text section (which is where the function code resides) is at 0x13510, which is also the entry point.
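If the `info target` output has been saved to a file or pasted into a ticket, the .text start can also be picked out with a short script. This is only a minimal Python sketch: the name `section_start` is made up for this example, and it assumes the section lines use the `START - END is NAME` layout shown above.

```python
import re

# Matches lines such as "0x0000000000013510 - 0x00000000001884ab is .text".
SECTION_RE = re.compile(r"(0x[0-9a-fA-F]+)\s*-\s*(0x[0-9a-fA-F]+)\s+is\s+(\S+)")

def section_start(info_target_output, name=".text"):
    """Return the start address of the named section, or None if not listed."""
    for line in info_target_output.splitlines():
        m = SECTION_RE.search(line)
        if m and m.group(3) == name:
            return int(m.group(1), 16)
    return None

# A few lines copied from the `info target` output above.
sample = """\
0x00000000000120b0 - 0x0000000000013510 is .plt
0x0000000000013510 - 0x00000000001884ab is .text
0x00000000001884ac - 0x00000000001884b5 is .fini
"""
print(hex(section_start(sample)))  # 0x13510
```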
So now we have the shared library's relative offset to the first instruction in rd_kafka_txn_handle_TxnOffsetCommit(): 0xca740.
And we have the absolute address of the crash location from the user's gdb output: 0x00007fb3966e178a.
But we don't know at what address librdkafka.so.1 was loaded/mapped in the user's application.
There are two ways to find out:
Have the user issue info shared in gdb; it will print the load address of all shared libraries. This is by far the simplest.
(user-gdb) info shared
From To Syms Read Shared Object Library
...lots of other stuff...
0x00007fb39662a510 0x00007fb39679f4ab Yes (*) librdkafka.so.1
...
Perfect, librdkafka's .text section was loaded at 0x00007fb39662a510. If we subtract that load address from the crash address 0x00007fb3966e178a, we get:
(gdb) x/a 0x00007fb3966e178a-0x00007fb39662a510
0xb727a <rd_kafka_CreateTopicsResponse_parse+4810>: 0x0
But that's not where the crash is: it's supposed to be in rd_kafka_txn_handle_TxnOffsetCommit(), not rd_kafka_CreateTopicsResponse_parse().
We need to add the .text section's offset, since the load address from info shared refers to the start of .text rather than the start of the library:
(gdb) x/a 0x00007fb3966e178a-0x00007fb39662a510+0x13510
0xca78a <rd_kafka_txn_handle_TxnOffsetCommit+74>: 0x0
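The same arithmetic can be checked outside gdb. Below is a small Python sketch using only the numbers quoted in this page; the function and constant names are invented for the example.

```python
def library_relative_address(crash_addr, text_load_addr, text_offset):
    """Map a runtime address to an address in the library opened at base 0."""
    # `info shared` reports where .text was mapped, so subtracting the
    # .text offset gives the address where the library itself starts.
    load_base = text_load_addr - text_offset
    return crash_addr - load_base

CRASH_ADDR = 0x00007fb3966e178a      # frame #0 in the user's backtrace
TEXT_LOAD_ADDR = 0x00007fb39662a510  # "From" column of `info shared`
TEXT_OFFSET = 0x13510                # .text start from `info target`
FUNC_START = 0xca740                 # rd_kafka_txn_handle_TxnOffsetCommit

addr = library_relative_address(CRASH_ADDR, TEXT_LOAD_ADDR, TEXT_OFFSET)
print(hex(addr), addr - FUNC_START)  # 0xca78a 74
```

The second printed value is the offset into the function, which matches the +74 that gdb reports.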
That looks much better. Now let's get the source line for that address: skip the next part, which describes the second way, and jump to inspecting the source.
The second way: by looking at the backtraces we can compare relative offsets between known functions in the user's gdb output and our debuginfo gdb and derive a base offset where the library was probably loaded.
TBD.
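Since this part is still TBD, the following is not the author's procedure, only one possible sketch of the idea in Python. It assumes that the load base is page-aligned (4 KiB pages) and that the start and size of each function are known from the debuginfo library, for example from its symbol table. Every frame then bounds the base, because the runtime address must fall inside that function. The function size 0x200 used below is a hypothetical placeholder, not a measured value.

```python
PAGE = 0x1000  # assumed page size; the load base is assumed page-aligned

def candidate_bases(frames, page=PAGE):
    """Return the page-aligned load bases consistent with every frame.

    frames: list of (runtime_addr, func_start, func_size) tuples, where
    func_start and func_size come from the library opened at base 0.
    """
    lo, hi = 0, None
    for addr, start, size in frames:
        # addr must lie in [base + start, base + start + size).
        frame_hi = addr - start
        frame_lo = addr - start - size + 1
        lo = max(lo, frame_lo)
        hi = frame_hi if hi is None else min(hi, frame_hi)
    if hi is None:
        return []
    first = -(-lo // page) * page  # round lo up to a page boundary
    return list(range(first, hi + 1, page))

# Frame #0 from the user's backtrace; 0x200 is a HYPOTHETICAL function size.
frames = [(0x00007fb3966e178a, 0xca740, 0x200)]
print([hex(b) for b in candidate_bases(frames)])  # ['0x7fb396617000']
```

With more frames the range narrows further. A single remaining candidate is the probable base, and subtracting it from the crash address gives the library-relative address directly.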
Now that we have the address of the crash in our local librdkafka gdb session, we can look at the source line for the crash:
(gdb) list *(0x00007fb3966e178a-0x00007fb39662a510+0x13510)
0xca78a is in rd_kafka_txn_handle_TxnOffsetCommit (rdkafka_txnmgr.c:1382).
1377 rd_kafka_topic_partition_list_t *partitions = NULL;
1378 char errstr[512];
1379
1380 *errstr = '\0';
1381
1382 if (err != RD_KAFKA_RESP_ERR__DESTROY &&
1383 !rd_kafka_q_ready(rko->rko_replyq.q))
1384 err = RD_KAFKA_RESP_ERR__OUTDATED;
1385
1386 if (err)
It crashed at line 1382. Since the build was with optimization (not -O0), line numbers don't exactly match, but we know that it is that if-statement. We also know that the err != .. check can't crash, so it is either the rko->.. dereferencing or something in rd_kafka_q_ready(), if that function is inlined (which it is).
Since we don't have access to the core file itself, this is as far as we can go: we can't inspect the memory of rko.
Also verify that the source files are indeed loaded from the debuginfo rpm package we extracted, so we know they match the address:
(gdb) info source
Current source file is rdkafka_txnmgr.c
Compilation directory is /usr/src/debug/librdkafka-1.5.3_confluent6.1.0/src
Located in /home/me/Downloads/dd/usr/src/debug/librdkafka-1.5.3_confluent6.1.0/src/rdkafka_txnmgr.c
Perfect! That's where we extracted the debuginfo rpm.
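To repeat the source lookup without an interactive session, for instance when handling several crash addresses, gdb can be run in batch mode. The Python sketch below only builds the command line; the helper name `gdb_list_command` is made up for this example, and the path is the one extracted earlier.

```python
import shlex

def gdb_list_command(debug_file, address):
    """Build a gdb invocation that lists the source around an address."""
    # -batch makes gdb exit after running the commands given with -ex.
    return ["gdb", "-batch", "-ex", f"list *{address:#x}", debug_file]

cmd = gdb_list_command(
    "./usr/lib/debug/usr/lib64/librdkafka.so.1.debug", 0xca78a)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

The list can be passed to `subprocess.run(cmd)` from the directory where the rpms were extracted.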