memory_prefaulter leaves a zombie pthread, confuses gdb during coredump debugging #2623

michoecho · 2025-01-18T02:34:21Z

After the memory_prefaulter threads do their job and exit, their handles aren't join()ed until smp::~smp runs.
This means that they leave a zombie entry in pthread's thread lists.

As it turns out, this confuses gdb. When looking for threads and their TLS segments, gdb uses td_ta_thr_iter (from libthread-db/nplt-db) to iterate over all threads:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/linux-thread-db.c;h=9d84187a9ad0897ced25c7639e92ebb2e1e96746;hb=refs/heads/master#l1540

Then, it uses the ti_lid field (which is supposed to contain the PID of the thread) returned by libthread-db as an identifier for the thread:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/linux-thread-db.c;h=9d84187a9ad0897ced25c7639e92ebb2e1e96746;hb=refs/heads/master#l1513

The issue is: for those zombie entries, libthread-db reports the PID of the process (which in Seastar is equal to the PID of reactor-0), not the PID of the thread:
https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl_db/td_thr_get_info.c;h=7a64ef4c63614cf86714df9fc56f235b04f52253;hb=refs/heads/master#l108

So what happens is that gdb sees the handle of the zombie memory_prefaulter thread, thinks it's the entry of reactor-0, and uses this entry to serve queries about thread-local variables of reactor-0. The real entry of reactor-0 is ignored, since an entry for this PID has already been recorded by the time the real entry is seen.

(gdb) maintenance check libthread-db 

Running libthread_db integrity checks:
  Got thread 0x7f3920bff6c0 => 5938; can't map_lwp2thr ... OK <------------ memory_prefaulter
  Got thread 0x7f39215ff6c0 => 5952; can't map_lwp2thr ... OK
  Got thread 0x7f39223ff6c0 => 5951; can't map_lwp2thr ... OK
  Got thread 0x7f3922dff6c0 => 5950; can't map_lwp2thr ... OK
  Got thread 0x7f39233ff6c0 => 5949; can't map_lwp2thr ... OK
  Got thread 0x7f39239ff6c0 => 5948; can't map_lwp2thr ... OK
  Got thread 0x7f39285ff6c0 => 5947; can't map_lwp2thr ... OK
  Got thread 0x7f3928fff6c0 => 5946; can't map_lwp2thr ... OK
  Got thread 0x7f39295ff6c0 => 5945; can't map_lwp2thr ... OK
  Got thread 0x7f3929bff6c0 => 5944; can't map_lwp2thr ... OK
  Got thread 0x7f392a1ff6c0 => 5943; can't map_lwp2thr ... OK
  Got thread 0x7f392a7ff6c0 => 5942; can't map_lwp2thr ... OK
  Got thread 0x7f392adff6c0 => 5941; can't map_lwp2thr ... OK
  Got thread 0x7f392b3ff6c0 => 5940; can't map_lwp2thr ... OK
  Got thread 0x7f392be64140 => 5938; can't map_lwp2thr ... OK <------------ reactor-0
  Got thread 0x7f392be216c0 => 5939; can't map_lwp2thr ... OK

(gdb) thread
[Current thread is 1 (Thread 0x7f3920bff6c0 (LWP 5938))]
(gdb) p/x $fs_base
$1 = 0x7f392be64140
(gdb) p &seastar::local_engine
$2 = (seastar::reactor **) 0x7f3920be51a8
(gdb) p seastar::local_engine
$3 = (seastar::reactor *) 0x0

Refs scylladb/scylladb#15665 (comment)
Refs scylladb/scylladb#19110 (comment)
Refs scylladb/scylladb#19110 (comment)
Refs scylladb/scylladb#22245 (comment)

The text was updated successfully, but these errors were encountered:

michoecho · 2025-01-18T03:18:05Z

Arguably a bug in gdb? find_new_threads_callback should probably ignore threads with ti_state == TD_THR_ZOMBIE?

avikivity · 2025-01-22T19:12:32Z

It would be better to tell the reactor (perhaps via seastar::alien) to collect the thread. However, it could be tricky since the infrastructure around cleanup is very dated.

Maybe it should be hooked into app_template.

michoecho · 2025-01-22T19:22:50Z

I realized that I didn't explicitly say what's the effect of this bug:
when you use gdb to read some thread-local variable on shard 0, it will instead read the corresponding slot for this thread-local variable from the memory_prefaulter thread. Since those slots are at their initial values (the memory prefaulter thread doesn't do anything interesting with them), usually it means that thread-local variables on shard 0 seem to be zeroed out mysteriously.

This has caused us some problems in Scylla over the last year, since several times we needed coredumps to debug an issue, but they seemed to be corrupted, and we couldn't use them. (I linked some affected Scylla issues in the opening post, but there were more). But AFAIK, until now nobody tried to understand the source of this "corruption".

This issue can be worked around by manually editing the core so that the pthread handle of memory_prefaulter points to the TLS of shard 0. (Or by copying the TLS of shard 0 over the TLS of memory prefaulter. Or any other equivalent. Or by forking gdb and ignoring zombie threads in find_new_threads_callback).

avikivity · 2025-01-22T19:57:09Z

Interesting. I wanted to collect the thread while writing the prefaulter, but only from a dislike of leaving garbage around. This is better motivation.

avikivity · 2025-01-22T19:59:12Z

Could we fix up gdb to use a real reactor thread to read these thread local variables? I guess the problem will appear even if the thread isn't a zombie (and it's quite likely to happen if we crash while prefaulting, due to some startup bug).

michoecho · 2025-01-22T20:29:28Z

Could we fix up gdb to use a real reactor thread to read these thread local variables?

Well, as I said, it seems to me that the right solution to all of this would be to teach gdb (specifically find_new_threads_callback) to ignore zombie threads (i.e. with ti_state == TD_THR_ZOMBIE). But I'm not familiar with the topic enough to understand who actually is breaking the contract here: gdb or libthread_db.

I guess the problem will appear even if the thread isn't a zombie (and it's quite likely to happen if we crash while prefaulting, due to some startup bug).

No. When the thread is alive, then its tid field (in struct pthread) contains it's ID. Gdb will use this real ID to identify the thread, and all is fine.

Only after the thread becomes a zombie, its tid field will be reset to 0 (IIUC the kernel does that on thread exit, because glibc passes CLONE_CHILD_CLEARTID to the kernel), and libthread_db will fill the ti_lid field passed to gdb with the PID of the entire process, as a placeholder ID. Which, down the line, causes gdb to confuse this thread with the live thread that really has this ID (i.e. shard 0, who is the initial thread of the process, so its TID is equal to the PID).

michoecho mentioned this issue Jan 18, 2025

SCT/jenkins coredumps are corrupt scylladb/scylladb#22367

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

memory_prefaulter leaves a zombie pthread, confuses gdb during coredump debugging #2623

memory_prefaulter leaves a zombie pthread, confuses gdb during coredump debugging #2623

michoecho commented Jan 18, 2025 •

edited

Loading

michoecho commented Jan 18, 2025

avikivity commented Jan 22, 2025

michoecho commented Jan 22, 2025 •

edited

Loading

avikivity commented Jan 22, 2025

avikivity commented Jan 22, 2025

michoecho commented Jan 22, 2025 •

edited

Loading

memory_prefaulter leaves a zombie pthread, confuses gdb during coredump debugging #2623

memory_prefaulter leaves a zombie pthread, confuses gdb during coredump debugging #2623

Comments

michoecho commented Jan 18, 2025 • edited Loading

michoecho commented Jan 18, 2025

avikivity commented Jan 22, 2025

michoecho commented Jan 22, 2025 • edited Loading

avikivity commented Jan 22, 2025

avikivity commented Jan 22, 2025

michoecho commented Jan 22, 2025 • edited Loading

michoecho commented Jan 18, 2025 •

edited

Loading

michoecho commented Jan 22, 2025 •

edited

Loading

michoecho commented Jan 22, 2025 •

edited

Loading