Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault/aborts calling readline.set_completer_delims in threads in a free-threaded build #126895

Open
devdanzin opened this issue Nov 16, 2024 · 17 comments
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir topic-free-threading type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@devdanzin
Copy link
Contributor

devdanzin commented Nov 16, 2024

Crash report

What happened?

Calling difflib._test in threads in a free-threaded build (with PYTHON_GIL=0) will result in aborts or segfaults, apparently related to memory issues:

from threading import Thread
import difflib

for x in range(100):
    Thread(target=difflib._test, args=()).start()

Segfault backtrace:

Thread 92 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffeca7e4640 (LWP 631239)]
0x00007ffff7d3f743 in unlink_chunk (p=p@entry=0x555555f56c00, av=0x7ffff7eb8c80 <main_arena>) at ./malloc/malloc.c:1634
1634    ./malloc/malloc.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7d3f743 in unlink_chunk (p=p@entry=0x555555f56c00, av=0x7ffff7eb8c80 <main_arena>)
    at ./malloc/malloc.c:1634
#1  0x00007ffff7d40d2b in _int_free (av=0x7ffff7eb8c80 <main_arena>, p=0x555555f528a0,
    have_lock=<optimized out>) at ./malloc/malloc.c:4616
#2  0x00007ffff7d43453 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#3  0x000055555570ed2c in _PyMem_RawFree (_unused_ctx=<optimized out>, ptr=<optimized out>)
    at Objects/obmalloc.c:90
#4  0x0000555555713955 in _PyMem_DebugRawFree (ctx=0x555555cdb8f8 <_PyRuntime+824>, p=0x555555f528c0)
    at Objects/obmalloc.c:2767
#5  0x000055555572c462 in PyMem_RawFree (ptr=ptr@entry=0x555555f528c0) at Objects/obmalloc.c:971
#6  0x00005555559527fc in free_threadstate (tstate=tstate@entry=0x555555f528c0) at Python/pystate.c:1411
#7  0x00005555559549c0 in _PyThreadState_DeleteCurrent (tstate=tstate@entry=0x555555f528c0)
    at Python/pystate.c:1834
#8  0x0000555555a11b74 in thread_run (boot_raw=boot_raw@entry=0x555555f52870)
    at ./Modules/_threadmodule.c:355
#9  0x0000555555974fdb in pythread_wrapper (arg=<optimized out>) at Python/thread_pthread.h:242
#10 0x00007ffff7d32ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#11 0x00007ffff7dc4850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Abort 1 backtrace:

Thread 78 "python" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffed17f2640 (LWP 633651)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140732413191744) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140732413191744) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140732413191744) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140732413191744, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7ce0476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7cc67f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7d27676 in __libc_message (action=action@entry=do_abort,
    fmt=fmt@entry=0x7ffff7e79b77 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007ffff7d3ecfc in malloc_printerr (
    str=str@entry=0x7ffff7e7c6c0 "free(): unaligned chunk detected in tcache 2") at ./malloc/malloc.c:5664
#7  0x00007ffff7d40e3f in _int_free (av=0x7ffff7eb8c80 <main_arena>, p=0x555555dc79b0, have_lock=0)
    at ./malloc/malloc.c:4471
#8  0x00007ffff7d43453 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#9  0x00007ffff41d4a2b in readline_set_completer_delims (module=<optimized out>, string=<optimized out>)
    at ./Modules/readline.c:593
#10 0x00005555557021ef in cfunction_vectorcall_O (
    func=<built-in method set_completer_delims of module object at remote 0x2000297b920>,
    args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/methodobject.c:523
#11 0x000055555567cc55 in _PyObject_VectorcallTstate (tstate=0x555555f15bc0,
    callable=<built-in method set_completer_delims of module object at remote 0x2000297b920>,
    args=0x7ffed17f15a8, nargsf=9223372036854775809, kwnames=0x0) at ./Include/internal/pycore_call.h:167
#12 0x000055555567cd74 in PyObject_Vectorcall (
    callable=callable@entry=<built-in method set_completer_delims of module object at remote 0x2000297b920>,
 args=args@entry=0x7ffed17f15a8, nargsf=<optimized out>, kwnames=kwnames@entry=0x0) at Objects/call.c:327
#13 0x00005555558441f7 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555f15bc0,
    frame=<optimized out>, throwflag=throwflag@entry=0) at Python/generated_cases.c.h:955
#14 0x0000555555872978 in _PyEval_EvalFrame (throwflag=0, frame=<optimized out>, tstate=0x555555f15bc0)
    at ./Include/internal/pycore_ceval.h:116

Abort 2 backtrace:

Thread 43 "python" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff54ff9640 (LWP 636245)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140734619424320) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140734619424320) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140734619424320) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140734619424320, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7ce0476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7cc67f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7d27676 in __libc_message (action=action@entry=do_abort,
    fmt=fmt@entry=0x7ffff7e79b77 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007ffff7d3ecfc in malloc_printerr (
    str=str@entry=0x7ffff7e7cf20 "tcache_thread_shutdown(): unaligned tcache chunk detected")
    at ./malloc/malloc.c:5664
#7  0x00007ffff7d436c4 in tcache_thread_shutdown () at ./malloc/malloc.c:3224
#8  __malloc_arena_thread_freeres () at ./malloc/arena.c:1003
#9  0x00007ffff7d461ca in __libc_thread_freeres () at ./malloc/thread-freeres.c:44
#10 0x00007ffff7d3294f in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:456
#11 0x00007ffff7dc4850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Found using fusil by @vstinner.

CPython versions tested on:

3.13, CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.14.0a1+ experimental free-threading build (heads/main-dirty:612ac283b81, Nov 16 2024, 01:37:56) [GCC 11.4.0]

@devdanzin devdanzin added the type-crash A hard crash of the interpreter, possibly with a core dump label Nov 16, 2024
@picnixz picnixz added stdlib Python modules in the Lib dir topic-free-threading extension-modules C modules in the Modules dir 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Nov 16, 2024
@picnixz picnixz self-assigned this Nov 16, 2024
@picnixz
Copy link
Contributor

picnixz commented Nov 16, 2024

The _test function is only doing:

def _test():
    import doctest, difflib
    return doctest.testmod(difflib)

I think the issue is either with doctest or the imports, not with difflib itself.

cc @ZeroIntensity

@picnixz picnixz removed their assignment Nov 16, 2024
@devdanzin
Copy link
Contributor Author

devdanzin commented Nov 16, 2024

I believe this is due to doctest using pdb, which seems to cause all sorts of segfaults and aborts when called from threads. Comment the marked lines out for a different crash.

import pdb
import sys
from threading import Thread

stdin = open("/dev/null")
sys.stdin = stdin

def f():
    for x in range(100):
        p = pdb.Pdb()
        p.reset()
        try:               # Comment
            p.set_trace()  # these
        except:            # lines
            pass           # out.
        p.reset()
        p.set_continue()

for x in range(100):
    Thread(target=f, args=()).start()

So maybe the issue is just that pdb isn't free-threading ready?

Edit: here's a collection of errors this causes:

    2908176786326028288 bytes originally requested
    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd):
        at p-7: 0x5b *** OUCH
        at p-6: 0xf0 *** OUCH
        at p-5: 0xf8 *** OUCH
        at p-4: 0xfe *** OUCH
        at p-3: 0x53 *** OUCH
        at p-2: 0x1e *** OUCH
        at p-1: 0x99 *** OUCH
    Because memory is corrupted at the start, the count of bytes requested
       may be bogus, and checking the trailing pad bytes may segfault.
mimalloc: error: thread 0x7fa0154d4640: buffer overflow in heap block 0x2000a010f30 of size 40: write after 40 bytes
    The 8 pad bytes at tail=0x285c417fbab28370 are Aborted
free(): unaligned chunk detected in tcache 2
Aborted
tcache_thread_shutdown(): unaligned tcache chunk detected
Aborted
free(): invalid pointer
Aborted
Segmentation fault
double free or corruption (fasttop)
Aborted
Debug memory block at address p=0x5572ce2f7580: API '�'
    14 bytes originally requested
    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd):
        at p-7: 0xdb *** OUCH
        at p-6: 0x4c *** OUCH
        at p-5: 0x84 *** OUCH
        at p-4: 0x07 *** OUCH
        at p-2: 0xc2 *** OUCH
        at p-1: 0x3a *** OUCH
    Because memory is corrupted at the start, the count of bytes requested
       may be bogus, and checking the trailing pad bytes may segfault.
    The 8 pad bytes at tail=0x5572ce2f758e are not all FORBIDDENBYTE (0xfd):
        at tail+0: 0xdd *** OUCH
        at tail+1: 0xdd *** OUCH
        at tail+2: 0xdd *** OUCH
        at tail+3: 0xdd *** OUCH
        at tail+4: 0xdd *** OUCH
        at tail+5: 0xdd *** OUCH
        at tail+6: 0xdd *** OUCH
        at tail+7: 0xdd *** OUCH
    Data at p: dd dd dd dd dd dd dd dd dd dd dd dd dd dd

Enable tracemalloc to get the memory block allocation traceback

Fatal Python error: _PyMem_DebugRawFree: bad ID: Allocated using API '�', verified using API 'r'
Python runtime state: initialized
Aborted

@picnixz
Copy link
Contributor

picnixz commented Nov 16, 2024

@gaogaotiantian Is there a plan to make pdb free-threaded friendly or not? if so, how can we help, if not what should we do for this issue?

@ZeroIntensity
Copy link
Member

I'll wait for Tian's input, but I'm happy to spend a few hours covering pdb in locks if we need to.

@gaogaotiantian
Copy link
Member

pdb is a pure Python module, without any black magic (ctypes, changing the code object, etc.). It should not be possible for it to crash (segfault) CPython, free-threaded build or not. If pdb crashes free-threaded build, that means free-threaded build has a bug, not pdb.

From the code above, the conclusion I can get is pdb triggered a crash, and we don't know why. Protecting pdb with lock might hide the crash, but it does not solve the problem. We should not be able to crash CPython with pure Python code.

Also, pdb does not support multi-threading debugging at this point. It's really slow to move it forward for any relatively new features. That being said, I don't think it's a good time to try to protect it with locks because we need to deal with multithreading in the future.

@ZeroIntensity
Copy link
Member

Does it use some weird frames APIs (e.g. sys._getframe) by any chance? Those aren't thread safe IIRC.

@devdanzin
Copy link
Contributor Author

The culprit seems to be readline, which pdb uses:

from threading import Thread
import readline

def f():
    for x in range(100):
        readline.get_completer_delims()
        readline.set_completer_delims(' \t\n`@#%^&*()=+[{]}\\|;:\'",<>?')
        readline.set_completer_delims(' \t\n`@#%^&*()=+[{]}\\|;:\'",<>?')
        readline.get_completer_delims()

for x in range(100):
    Thread(target=f, args=()).start()

Seems to fail with the same errors as the original code.

@ZeroIntensity
Copy link
Member

It's probably the system that's not thread safe in that case.

@devdanzin
Copy link
Contributor Author

The error appears to come from CPython code, line 593 seems to be affecting module global data, right?

cpython/Modules/readline.c

Lines 578 to 594 in 2313f84

static PyObject *
readline_set_completer_delims(PyObject *module, PyObject *string)
/*[clinic end generated code: output=4305b266106c4f1f input=ae945337ebd01e20]*/
{
char *break_chars;
PyObject *encoded = encode(string);
if (encoded == NULL) {
return NULL;
}
/* Keep a reference to the allocated memory in the module state in case
some other module modifies rl_completer_word_break_characters
(see issue #17289). */
break_chars = strdup(PyBytes_AS_STRING(encoded));
Py_DECREF(encoded);
if (break_chars) {
free(completer_word_break_characters);
completer_word_break_characters = break_chars;

@gaogaotiantian
Copy link
Member

I won't be surprised if readline is not thread safe. Actually at this point, I won't be surprised if anything crashes free-threaded CPython. It's just not ready yet. That's not saying we should ignore this, but I think we have an issue (#116738) to track the thread-safety for all internal C modules, and it seems like readline is under the category that's not inspected. This is just some work we need to do in the future, definitely not a bug in pdb.

@ZeroIntensity
Copy link
Member

That issue looks a little out of date, but yeah, we're still working on the thread safety. Thanks for your input, Tian!

@devdanzin, would you like to author a PR for fixing the global state in readline? (I'm assuming we need to either make it thread-local, move it to the module state, or just put an ugly lock around it.) If not, I can do it.

@devdanzin
Copy link
Contributor Author

I wouldn't know how, please take it if you can :)

@ZeroIntensity
Copy link
Member

Ok, I'm busy with a few other issues right now, so I'll leave this open to someone else for a bit. If nobody decides they want to do it, I'll get to it :)

@gaogaotiantian
Copy link
Member

gaogaotiantian commented Nov 16, 2024

I would be a little bit more careful when dealing with readline, because it relies on some global variables of the readline library, so we can't simply put everything thread or module local. The variables starting with rl_ are the extern global variables exposed by readline itself. Also, we support libedit which has a slightly different interface. The module is not the most stable one in our stdlib so tread carefully :)

@ZeroIntensity
Copy link
Member

If it has global state, I doubt the functions themself are thread safe. In that case, we probably should just mark it as needing the GIL.

@devdanzin devdanzin changed the title Segfault/aborts calling difflib._test in threads in a free-threaded build Segfault/aborts calling readline.set_completer_delims in threads in a free-threaded build Nov 17, 2024
@vstinner
Copy link
Member

In that case, we probably should just mark it as needing the GIL.

Or add @critical_section or another lock on functions which are known to not be thread-safe.

@ZeroIntensity
Copy link
Member

That wouldn't work for subinterpreters, would it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir topic-free-threading type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

No branches or pull requests

5 participants