Runtime finalization assumes all other threads have exited. #80657

ericsnowcurrently · 2019-03-29T20:38:26Z

BPO	36476
Nosy	@tim-one, @ncoghlan, @pitrou, @vstinner, @ericsnowcurrently, @phmc, @pablogsal, @nanjekyejoannah

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2019-03-29.20:38:26.289>
labels = ['expert-subinterpreters', 'interpreter-core', 'type-bug', '3.8', '3.9']
title = 'Runtime finalization assumes all other threads have exited.'
updated_at = <Date 2020-06-03.16:42:03.692>
user = 'https://github.com/ericsnowcurrently'

bugs.python.org fields:

activity = <Date 2020-06-03.16:42:03.692>
actor = 'vstinner'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core', 'Subinterpreters']
creation = <Date 2019-03-29.20:38:26.289>
creator = 'eric.snow'
dependencies = []
files = []
hgrepos = []
issue_num = 36476
keywords = []
message_count = 10.0
messages = ['339143', '339145', '358745', '358747', '358749', '358750', '358751', '358782', '359101', '359103']
nosy_count = 8.0
nosy_names = ['tim.peters', 'ncoghlan', 'pitrou', 'vstinner', 'eric.snow', 'pconnell', 'pablogsal', 'nanjekyejoannah']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue36476'
versions = ['Python 3.8', 'Python 3.9']

ericsnowcurrently · 2019-03-29T20:38:26Z

Among the first 3 things that happen in Py_FinalizeEx() are, in order:

wait for all non-daemon threads (of the main interpreter) to finish
call registered atexit funcs
mark the runtime as finalizing

At that point the only remaining Python threads are:

the main thread (where finalization is happening)
daemon threads
non-daemon threads created in atexit functions
any threads belonging to subinterpreters

The next time any of those threads (aside from main) acquire the GIL, we expect that they will exit via a call to PyThread_exit_thread() (caveat: issue bpo-36475). However, we have no guarantee on when that will happen, if ever. Such lingering threads can cause problems, including crashes and deadlock (see issue bpo-36469).

I don't know what else we can do, beyond what we're already doing. Any ideas?

ericsnowcurrently · 2019-03-29T20:40:12Z

FYI, I've opened bpo-36477 to deal with the subinterpreters case.

ericsnowcurrently · 2019-12-20T23:30:08Z

Adding to the list:

any OS threads created by an extension module or embedding application

ericsnowcurrently · 2019-12-20T23:40:58Z

Problems with lingering threads during/after runtime finalization continue to be a problem. I'm going to use this issue as the focal point for efforts to resolve this.

Related issues:

bpo-36479 "Exit threads when interpreter is finalizing rather than runtime."
bpo-24770 "Py_Finalize() doesn't stop daemon threads"
bpo-23592 "SIGSEGV on interpreter shutdown, with daemon threads running wild"
bpo-37127 "Handling pending calls during runtime finalization may cause problems."
bpo-33608 "Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed."
bpo-36818 "Add PyInterpreterState.runtime."
bpo-36724 "Clear _PyRuntime at exit"
bpo-14073 "allow per-thread atexit()"
bpo-1596321 "KeyError at exit after 'import threading' in other thread"
bpo-37266 "Daemon threads must be forbidden in subinterpreters"
bpo-31517 "MainThread association logic is fragile"

ericsnowcurrently · 2019-12-20T23:48:58Z

Analysis by @pconnell:

tl;dr daemon threads and external C-API access during/after runtime finalization are causing crashes.

ericsnowcurrently · 2019-12-20T23:54:02Z

To put it another way:

(from bpo-33608#msg358748)

The docs [1] aren't super clear about it, but there are some fundamental
assumptions we make about runtime finalization:

no use of the C-API while Py_FinalizeEx() is executing (except for a
few helpers like Py_Initialized)

only a small portion of the C-API is available afterward (at least
until Py_Initialize() is run)

I guess the real question is what to do about this?

[1] https://docs.python.org/3/c-api/init.html#c.Py_FinalizeEx

Adding to that list:

no other Python threads are running once we start finalizing the runtime (not far into Py_FinalizeEx())

ericsnowcurrently · 2019-12-21T00:07:29Z

So I see 3 things to address here:

Python daemon threads
Python threads created in atexit handlers
non-Python threads accessing the C-API

Possible solutions (starting point for discussion):

stop them at the point we stop waiting for non-daemon threads (at the beginning of finalization)
disallow them? do one more pass of wait-for-threads?
cause all (external) attempts to access the C-API to fail once finalization begins

Regarding daemon threads, the docs already say "Daemon threads are abruptly stopped at shutdown." [1] So let's force them to stop. Can we do that? If we *can* simply kill the threads, can we do so without leaking resources? Regardless, the mechanism we currently use (check for finalizing each(?) time through the eval loop) mostly works fine. The problem is when C code called from Python in a daemon thread blocks long enough that it makes C-API calls (or even the eval loop) *after* we've started cleaning up the runtime state. So if there was a way to interrupt that blocking code, that would probably be good enough.

The other two possible solutions are, I suppose, a bit more drastic. What are the alternatives?

[1] https://docs.python.org/3/library/threading.html#thread-objects

pitrou · 2019-12-21T20:15:14Z

Python daemon threads

I think the answer is to document a bit more clearly that they can pose all kinds of problems. Perhaps we could even display a visible warning when people create daemon threads.

Python threads created in atexit handlers

We could run the "join non-daemon threads" routine a *second time* after atexit handlers have been called. It probably can't hurt (unless people do silly things?).

non-Python threads accessing the C-API

This one I don't know how to handle. By construction, a non-Python thread can do anything it wants, and we cannot add guards against this at the beginning of each C API function. I think that when someone calls the C API, we're clearly in the realm of "consenting adults".

ncoghlan · 2019-12-31T05:50:56Z

Perhaps we need a threading.throw() API, similar to the one we have for generators and coroutines?

If we had that, then Py_FinalizeEx() could gain a few new features:

throw SystemExit into all daemon threads and then give them a chance to terminate before calling any atexit handlers (printing a warning if some of the threads don't exit)
throw SystemExit into all daemon and non-daemon threads after running atexit handlers (printing a warning if any such threads exist at all, along with another warning if some of the threads don't exit)

Adding that would require an enhancement to the PendingCall machinery, though, since most pending calls are only processed in the main thread (there's no way to route them to specific child threads).

A simpler alternative would be to have an atomic "terminate_threads" counter in the ceval runtime state that was incremented to 1 to request that SystemExit be raised in daemon threads, and then to 2 to request that SystemExit be raised in all still running threads. When a thread received that request to exit, it would set a new flag in the thread state to indicate it was terminating, and then raise SystemExit. (The thread running Py_FinalizeEx would set that flag in advance so it wasn't affected, and other threads would use it to ensure they only raised SystemExit once). The runtime cost of this would just be another _Py_atomic_load_relaxed call in the eval_breaker branch. (Probably inside make_pending_calls, so it gets triggered both by the eval_breaker logic, and by explicit calls to Py_MakePendingCalls).

ncoghlan · 2019-12-31T06:30:46Z

Thinking about that idea further, I don't think that change would help much, since the relevant operations should already be checking for thread termination when they attempt to reacquire the GIL.

That means what we're missing is:

When daemon threads still exist after the non-daemon threads terminate, deliberately giving them additional time to run (and hence terminate)
Explicitly attempting to kick daemon threads out of blocking system calls by sending them signals to provoke EINTR (I have no idea if there's a windows equivalent for this, but we should be able to use pthread_kill on POSIX systems. However, choosing *which* wakeup signal to send could be fraught with compatibility problems)

gpshead · 2023-05-26T22:42:48Z

related issue: #104690 "Threads started in exit handler are still running after their thread states are destroyed"
and a related PR: #104826 "Disallow thread creation and fork at interpreter finalization"

gpshead · 2023-05-26T22:46:46Z

If we had that, then Py_FinalizeEx() could gain a few new features:

throw SystemExit into all daemon threads and then give them a chance to terminate before calling any atexit handlers (printing a warning if some of the threads don't exit)

related to the above linked issues and ongoing thoughts about how horrible daemon threads are and wishing they didn't exist... I was pondering an if we could inject the equivalent of an uncatchable exception (meaning even bare except: would not handle it) into all daemon and non-python threads via their tstate so that their code would uncerimoniously exit no matter what. That's as close to a SIGKILL for a thread can we can cleanly get. Perhaps this only makes sense to do after Nick's idea of a SystemExit in the threads and waiting a little while.

ericsnowcurrently · 2023-05-27T02:26:25Z

You mean in addition to the trick we do in ceval_gil.c, with tstate_must_exit() and PyThread_exit_thread()?

gpshead · 2023-05-28T00:05:22Z

You mean in addition to the trick we do in ceval_gil.c, with tstate_must_exit() and PyThread_exit_thread()?

PyThread_exit_thread() is being deprecated by #28525 as it is impossible to safely end a thread abruptly. So take_gil will turn into an intentional hang in that situation.

I was thinking of if we could do something before we get to the point of destroying the thread states - the only safe exit from a thread is for them to actually return all the way up their call stack on their own.

ericsnowcurrently added 3.7 (EOL) end of life 3.8 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Mar 29, 2019

ericsnowcurrently added 3.9 only security fixes and removed 3.7 (EOL) end of life labels Dec 20, 2019

vstinner added the topic-subinterpreters label Jun 3, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022

ezio-melotti added this to Subinterpreters Apr 15, 2022

ezio-melotti moved this to Todo in Subinterpreters Apr 15, 2022

jorisvandenbossche mentioned this issue Nov 8, 2023

[Python] Segmentation fault when pyarrow is imported in exit handler apache/arrow#38626

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Runtime finalization assumes all other threads have exited. #80657

Runtime finalization assumes all other threads have exited. #80657

ericsnowcurrently commented Mar 29, 2019

ericsnowcurrently commented Mar 29, 2019

ericsnowcurrently commented Mar 29, 2019

ericsnowcurrently commented Dec 20, 2019

ericsnowcurrently commented Dec 20, 2019

ericsnowcurrently commented Dec 20, 2019

ericsnowcurrently commented Dec 20, 2019

ericsnowcurrently commented Dec 21, 2019

pitrou commented Dec 21, 2019

ncoghlan commented Dec 31, 2019

ncoghlan commented Dec 31, 2019

gpshead commented May 26, 2023

gpshead commented May 26, 2023

ericsnowcurrently commented May 27, 2023

gpshead commented May 28, 2023

Runtime finalization assumes all other threads have exited. #80657

Runtime finalization assumes all other threads have exited. #80657

Comments

ericsnowcurrently commented Mar 29, 2019

ericsnowcurrently commented Mar 29, 2019

ericsnowcurrently commented Mar 29, 2019

ericsnowcurrently commented Dec 20, 2019

ericsnowcurrently commented Dec 20, 2019

ericsnowcurrently commented Dec 20, 2019

ericsnowcurrently commented Dec 20, 2019

ericsnowcurrently commented Dec 21, 2019

pitrou commented Dec 21, 2019

ncoghlan commented Dec 31, 2019

ncoghlan commented Dec 31, 2019

gpshead commented May 26, 2023

gpshead commented May 26, 2023

ericsnowcurrently commented May 27, 2023

gpshead commented May 28, 2023