Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An interpreter's initial thread can be accessed while finalizing #126914

Closed
ZeroIntensity opened this issue Nov 16, 2024 · 1 comment
Closed

An interpreter's initial thread can be accessed while finalizing #126914

ZeroIntensity opened this issue Nov 16, 2024 · 1 comment
Assignees
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-subinterpreters type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@ZeroIntensity
Copy link
Member

ZeroIntensity commented Nov 16, 2024

Crash report

What happened?

While working on gh-126644, I came across an elusive bug that seems to occur when a lot of threads are trying to create a thread state for the same interpreter. Initially, I thought it was an issue with free-threading, but it occurs on the default build as well.

Here's a small reproducer:

from threading import Thread
import _interpreters

interp = _interpreters.create()

def run():
    this_interp = _interpreters.create()
    _interpreters.run_string(this_interp, f"import _interpreters; _interpreters.run_string({interp}, '1')")

threads = [Thread(target=run) for _ in range(1000)]
for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

As it turns out, this is because of this check: https://github.com/python/cpython/blob/main/Python/pystate.c#L1531

Inside tstate_delete_common, the runtime lock is released after the threads.head has already been set to NULL, so other threads think erroneously think it's OK to try and use _initial_thread while it's still getting finalized. Possibly related, free_threadstate tries to reset _initial_thread back to the default settings without the runtime lock, which is possibly racy.

There's a few ways to fix this, but I think the easiest (and has the least amount of risk for backporting) is to just hold the runtime lock for all of thread state deletion. This will introduce a very slight slowdown due to lock contention, but that should get better once #114940 lands. I have a PR ready.

CPython versions tested on:

3.12, 3.13, 3.14, CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

No response

Linked PRs

@ZeroIntensity ZeroIntensity added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump topic-subinterpreters 3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Nov 16, 2024
@ZeroIntensity ZeroIntensity self-assigned this Nov 16, 2024
@ZeroIntensity
Copy link
Member Author

I think the easiest (and has the least amount of risk for backporting) is to just hold the runtime lock for all of thread state deletion.

Apparently, this is problematic for the free-threaded build, because QSBR and thread-local refcounting initialization cause a lock-ordering deadlock in some cases. So, instead, my solution is to just add a new field to the interpreter state that indicates when it's safe to access _initial_thread, and that field is only used atomically (unlike something like tstate->_status). I'm hoping that should fix it without issues.

ericsnowcurrently added a commit that referenced this issue Nov 19, 2024
…preterState Field (gh-126989)

This approach eliminates the originally reported race. It also gets rid of the deadlock reported in gh-96071, so we can remove the workaround added then.
ericsnowcurrently added a commit to ericsnowcurrently/cpython that referenced this issue Nov 21, 2024
…yInterpreterState Field (pythongh-126989)

This approach eliminates the originally reported race. It also gets rid of the deadlock reported in pythongh-96071, so we can remove the workaround added then.
ericsnowcurrently added a commit that referenced this issue Dec 2, 2024
…PyInterpreterState Field (gh-127114)

This approach eliminates the originally reported race.  It also gets rid of the deadlock reported in gh-96071, so we can remove the workaround added then.

This is mostly a cherry-pick of 1c0a104 (AKA gh-126989).  The difference is we add PyInterpreterState.threads_preallocated at the end of PyInterpreterState, instead of adding PyInterpreterState.threads.preallocated.  That avoids ABI disruption.
@github-project-automation github-project-automation bot moved this from Todo to Done in Subinterpreters Dec 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-subinterpreters type-crash A hard crash of the interpreter, possibly with a core dump
Projects
Status: Done
Development

No branches or pull requests

2 participants