Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memory_prefaulter leaves a zombie pthread, confuses gdb during coredump debugging #2623

Open
michoecho opened this issue Jan 18, 2025 · 6 comments

Comments

@michoecho
Copy link
Contributor

michoecho commented Jan 18, 2025

After the memory_prefaulter threads do their job and exit, their handles aren't join()ed until smp::~smp runs.
This means that they leave a zombie entry in pthread's thread lists.

As it turns out, this confuses gdb. When looking for threads and their TLS segments, gdb uses td_ta_thr_iter (from libthread-db/nplt-db) to iterate over all threads:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/linux-thread-db.c;h=9d84187a9ad0897ced25c7639e92ebb2e1e96746;hb=refs/heads/master#l1540

Then, it uses the ti_lid field (which is supposed to contain the PID of the thread) returned by libthread-db as an identifier for the thread:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/linux-thread-db.c;h=9d84187a9ad0897ced25c7639e92ebb2e1e96746;hb=refs/heads/master#l1513

The issue is: for those zombie entries, libthread-db reports the PID of the process (which in Seastar is equal to the PID of reactor-0), not the PID of the thread:
https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl_db/td_thr_get_info.c;h=7a64ef4c63614cf86714df9fc56f235b04f52253;hb=refs/heads/master#l108

So what happens is that gdb sees the handle of the zombie memory_prefaulter thread, thinks it's the entry of reactor-0, and uses this entry to serve queries about thread-local variables of reactor-0. The real entry of reactor-0 is ignored, since an entry for this PID has already been recorded by the time the real entry is seen.

(gdb) maintenance check libthread-db 

Running libthread_db integrity checks:
  Got thread 0x7f3920bff6c0 => 5938; can't map_lwp2thr ... OK <------------ memory_prefaulter
  Got thread 0x7f39215ff6c0 => 5952; can't map_lwp2thr ... OK
  Got thread 0x7f39223ff6c0 => 5951; can't map_lwp2thr ... OK
  Got thread 0x7f3922dff6c0 => 5950; can't map_lwp2thr ... OK
  Got thread 0x7f39233ff6c0 => 5949; can't map_lwp2thr ... OK
  Got thread 0x7f39239ff6c0 => 5948; can't map_lwp2thr ... OK
  Got thread 0x7f39285ff6c0 => 5947; can't map_lwp2thr ... OK
  Got thread 0x7f3928fff6c0 => 5946; can't map_lwp2thr ... OK
  Got thread 0x7f39295ff6c0 => 5945; can't map_lwp2thr ... OK
  Got thread 0x7f3929bff6c0 => 5944; can't map_lwp2thr ... OK
  Got thread 0x7f392a1ff6c0 => 5943; can't map_lwp2thr ... OK
  Got thread 0x7f392a7ff6c0 => 5942; can't map_lwp2thr ... OK
  Got thread 0x7f392adff6c0 => 5941; can't map_lwp2thr ... OK
  Got thread 0x7f392b3ff6c0 => 5940; can't map_lwp2thr ... OK
  Got thread 0x7f392be64140 => 5938; can't map_lwp2thr ... OK <------------ reactor-0
  Got thread 0x7f392be216c0 => 5939; can't map_lwp2thr ... OK

(gdb) thread
[Current thread is 1 (Thread 0x7f3920bff6c0 (LWP 5938))]
(gdb) p/x $fs_base
$1 = 0x7f392be64140
(gdb) p &seastar::local_engine
$2 = (seastar::reactor **) 0x7f3920be51a8
(gdb) p seastar::local_engine
$3 = (seastar::reactor *) 0x0

Refs scylladb/scylladb#15665 (comment)
Refs scylladb/scylladb#19110 (comment)
Refs scylladb/scylladb#19110 (comment)
Refs scylladb/scylladb#22245 (comment)

@michoecho
Copy link
Contributor Author

Arguably a bug in gdb? find_new_threads_callback should probably ignore threads with ti_state == TD_THR_ZOMBIE?

@avikivity
Copy link
Member

It would be better to tell the reactor (perhaps via seastar::alien) to collect the thread. However, it could be tricky since the infrastructure around cleanup is very dated.

Maybe it should be hooked into app_template.

@michoecho
Copy link
Contributor Author

michoecho commented Jan 22, 2025

I realized that I didn't explicitly say what's the effect of this bug:
when you use gdb to read some thread-local variable on shard 0, it will instead read the corresponding slot for this thread-local variable from the memory_prefaulter thread. Since those slots are at their initial values (the memory prefaulter thread doesn't do anything interesting with them), usually it means that thread-local variables on shard 0 seem to be zeroed out mysteriously.

This has caused us some problems in Scylla over the last year, since several times we needed coredumps to debug an issue, but they seemed to be corrupted, and we couldn't use them. (I linked some affected Scylla issues in the opening post, but there were more). But AFAIK, until now nobody tried to understand the source of this "corruption".

This issue can be worked around by manually editing the core so that the pthread handle of memory_prefaulter points to the TLS of shard 0. (Or by copying the TLS of shard 0 over the TLS of memory prefaulter. Or any other equivalent. Or by forking gdb and ignoring zombie threads in find_new_threads_callback).

@avikivity
Copy link
Member

Interesting. I wanted to collect the thread while writing the prefaulter, but only from a dislike of leaving garbage around. This is better motivation.

@avikivity
Copy link
Member

Could we fix up gdb to use a real reactor thread to read these thread local variables? I guess the problem will appear even if the thread isn't a zombie (and it's quite likely to happen if we crash while prefaulting, due to some startup bug).

@michoecho
Copy link
Contributor Author

michoecho commented Jan 22, 2025

Could we fix up gdb to use a real reactor thread to read these thread local variables?

Well, as I said, it seems to me that the right solution to all of this would be to teach gdb (specifically find_new_threads_callback) to ignore zombie threads (i.e. with ti_state == TD_THR_ZOMBIE). But I'm not familiar with the topic enough to understand who actually is breaking the contract here: gdb or libthread_db.

I guess the problem will appear even if the thread isn't a zombie (and it's quite likely to happen if we crash while prefaulting, due to some startup bug).

No. When the thread is alive, then its tid field (in struct pthread) contains it's ID. Gdb will use this real ID to identify the thread, and all is fine.

Only after the thread becomes a zombie, its tid field will be reset to 0 (IIUC the kernel does that on thread exit, because glibc passes CLONE_CHILD_CLEARTID to the kernel), and libthread_db will fill the ti_lid field passed to gdb with the PID of the entire process, as a placeholder ID. Which, down the line, causes gdb to confuse this thread with the live thread that really has this ID (i.e. shard 0, who is the initial thread of the process, so its TID is equal to the PID).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants