
PyEval_AcquireThread can crash process if called from another thread at interpreter shutdown #117762

Open
ezyang opened this issue Jan 18, 2024 · 4 comments
Labels
module: python frontend For issues relating to PyTorch's Python frontend triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@ezyang
Contributor

ezyang commented Jan 18, 2024

🐛 Describe the bug

Meta xref: https://fb.workplace.com/groups/1405155842844877/posts/7910584775635252/?comment_id=7912862145407515&reply_comment_id=7917182454975484&notif_id=1705598818428473&notif_t=work_group_comment_mention

Quote:

Attempted to use _Py_IsFinalizing() to guard against finalization, but it always returns false (even though the coredump indicates the interpreter thread is within Py_FinalizeEx() and should have set the finalize attribute on the runtime...but the whole thing is racy): D52866385
Also tried just catching the exception with try {...} catch (...) {} but the program still aborts with FATAL: exception not rethrown.

The exact line that fails is where we create a new PyThreadState and then try to swap it in and take the GIL with PyEval_AcquireThread in the pybind11 code.

The docs for PyEval_AcquireThread suggest that the calling thread will be terminated if the interpreter is finalizing, which matches the behavior we get. https://docs.python.org/3/c-api/init.html...

The problem is I don't know how to avoid that. I tried _Py_IsFinalizing(), but it returns false everywhere. I attempted to use Py_AtExit to schedule a callback where I could implement my own synchronization mechanism, but it only gets scheduled half the time.
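
To make the failure mode concrete, here is a minimal hypothetical reduction (our own sketch, not the actual pybind11 or PyTorch code) of the pattern described above: a non-main thread creating a fresh PyThreadState and acquiring the GIL while the main thread may already be inside Py_FinalizeEx().

#include <Python.h>

void worker(PyInterpreterState *interp) {
    // The racy guard described above: _Py_IsFinalizing() can still return
    // 0 even though the main thread has already entered Py_FinalizeEx().
    if (_Py_IsFinalizing()) {
        return;  // too late to touch the interpreter
    }
    PyThreadState *tstate = PyThreadState_New(interp);
    // If finalization began between the check and this call, CPython does
    // not return an error here: take_gil() terminates the calling thread
    // via PyThread_exit_thread(), which forcibly unwinds the stack. A
    // catch (...) block without a rethrow then aborts the process with
    // "FATAL: exception not rethrown", matching the behavior above.
    PyEval_AcquireThread(tstate);
    // ... use the C-API under the GIL ...
    PyThreadState_Clear(tstate);
    PyThreadState_DeleteCurrent();  // releases the GIL and frees tstate
}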

Versions

main

cc @albanD

@tringwald
Collaborator

tringwald commented Jan 19, 2024

Currently, there is actually a meaningful space of time between when Py_Finalize() is called and when Py_IsFinalizing() returns true. In that time we do a number of things, like wait for all non-daemon threads to stop and run all global atexit hooks. Essentially, Py_IsFinalizing() returns true only at the point in finalization where other threads (i.e. not the current/main thread) should no longer rely on a stable runtime or C-API. (Perhaps the function should be called something like Py_AreThreadsAllowed() instead.)

Relevant information from python/cpython#110490

Would it be possible to use the following as a guard? It seems to be set before Py_IsFinalizing() is true.

PyThreadState *tstate = PyThreadState_GET();
if (tstate->interp->finalizing) {
    ...
}

@peterdelevoryas

Would it be possible to use the following as a guard? It seems to be set before Py_IsFinalizing() is true.

PyThreadState *tstate = PyThreadState_GET();
if (tstate->interp->finalizing) {
    ...
}

I don't think so; PyInterpreterState doesn't have any public members (it's an opaque type in the public C-API):

error: member access into incomplete type 'PyInterpreterState' (aka '_is')
            printf("\033[32m%s:%d PyEval_AcquireThread IsFinalizing=%d tstate->interp->finalizing=%d \033[0m\n", __FILE__, __LINE__, _Py_IsFinalizing(), tstate->interp->finalizing);

I mean, I could try hacking something together here, because I have the address of the PyInterpreterState object...

--- a/third-party/pybind11/2.10.4/include/pybind11/gil.h
+++ b/third-party/pybind11/2.10.4/include/pybind11/gil.h
@@ -50,6 +50,25 @@
  * in this case).
  */

+typedef void* PyThread_type_lock;
+
+struct PyInterpreterState_ {
+    struct _is *next;
+    struct _ts *tstate_head;
+
+    /* Reference to the _PyRuntime global variable. This field exists
+       to not have to pass runtime in addition to tstate to a function.
+       Get runtime from tstate: tstate->interp->runtime. */
+    struct pyruntimestate *runtime;
+
+    int64_t id;
+    int64_t id_refcount;
+    int requires_idref;
+    PyThread_type_lock id_mutex;
+
+    int finalizing;
+};
+
 class gil_scoped_acquire {
 public:
     PYBIND11_NOINLINE gil_scoped_acquire() {
@@ -83,7 +102,7 @@
         }

         if (release) {
-            printf("\033[32m%s:%d PyEval_AcquireThread IsFinalizing=%d \033[0m\n", __FILE__, __LINE__, _Py_IsFinalizing());
+            printf("\033[32m%s:%d PyEval_AcquireThread _Py_IsFinalizing()=%d PyThreadState_GET()->interp->finalizing=%d \033[0m\n", __FILE__, __LINE__, _Py_IsFinalizing(), ((PyInterpreterState_*)PyThreadState_GET()->interp)->finalizing);
             PyEval_AcquireThread(tstate);
         }

It still doesn't seem to work here:

caffe2/torch/csrc/PyInterpreter.cpp:218 decref()
pybind11/gil.h:78 gil_scoped_acquire(): PyThread_tss_get()=(nil)
pybind11/gil.h:87 gil_scoped_acquire(): PyGILState_GetThisThreadState()=(nil)
pybind11/gil.h:92 gil_scoped_acquire(): PyThreadState_New()=0x7f2fcce223c0
pybind11/gil.h:105 PyEval_AcquireThread _Py_IsFinalizing()=0 PyThreadState_GET()->interp->finalizing=0
terminate called without an active exception

I even attempted wrapping finalizing w/ std::atomic<int> and doing a .load(std::memory_order_acquire), but it doesn't seem to make a difference.

To be clear, the backtrace for the Python interpreter thread in the coredump indicates Py_FinalizeEx() was entered:

warning: Section `.reg-xstate/481759' in core file too small.
#0  __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:45
45      pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x7f379fa00000 (LWP 481759))]
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f381d3ee000 (LWP 471381))]
warning: Section `.reg-xstate/471381' in core file too small.
#0  _ZL13gc_reset_refsP9PyGC_Headl.__uniq.137703359133237359249323886361199135396 (g=0x7f37d497e5b0, refs=1) at ./third-party/python/3.10/Modules/gcmodule.c:105
105     ./third-party/python/3.10/Modules/gcmodule.c: No such file or directory.
(gdb) bt
#0  _ZL13gc_reset_refsP9PyGC_Headl.__uniq.137703359133237359249323886361199135396 (g=0x7f37d497e5b0, refs=1) at ./third-party/python/3.10/Modules/gcmodule.c:105
#1  _ZL11update_refsP9PyGC_Head.__uniq.137703359133237359249323886361199135396 (containers=0x7f37e9d092b0) at ./third-party/python/3.10/Modules/gcmodule.c:427
#2  _ZL18deduce_unreachableP9PyGC_HeadS0_.__uniq.137703359133237359249323886361199135396 (base=0x7f37e9d092b0, unreachable=0x7ffecc464d58) at ./third-party/python/3.10/Modules/gcmodule.c:1104
#3  _ZL15gc_collect_mainP3_tsiPlS1_i.__uniq.137703359133237359249323886361199135396 (tstate=0x7f380c49b040, generation=2, n_collected=0x7ffecc464da0, n_uncollectable=0x7ffecc464da8, nofail=0)
    at ./third-party/python/3.10/Modules/gcmodule.c:1239
#4  0x00007f380f32dd9a in _ZL24gc_collect_with_callbackP3_tsi.__uniq.137703359133237359249323886361199135396 (tstate=0x7f380c49b040, generation=2) at ./third-party/python/3.10/Modules/gcmodule.c:1413
#5  0x00007f380f32dce4 in PyGC_Collect () at ./third-party/python/3.10/Modules/gcmodule.c:2099
#6  0x00007f380f32d77c in Py_FinalizeEx () at ./third-party/python/3.10/Python/pylifecycle.c:1781
#7  0x00007f380f334842 in Py_RunMain () at ./third-party/python/3.10/Modules/main.c:668
#8  0x0000555e2a1c44ca in main ()

Edit: Oh, but this probably makes sense. I just looked at the exact version of Python I'm using: 3.10's Py_FinalizeEx() doesn't set interp->finalizing = 1:

int
Py_FinalizeEx(void)
{
    int status = 0;

    _PyRuntimeState *runtime = &_PyRuntime;
    if (!runtime->initialized) {
        return status;
    }

    /* Get current thread state and interpreter pointer */
    PyThreadState *tstate = _PyRuntimeState_GetThreadState(runtime);

    // Wrap up existing "threading"-module-created, non-daemon threads.
    wait_for_thread_shutdown(tstate);

    // Make any remaining pending calls.
    _Py_FinishPendingCalls(tstate);

    /* The interpreter is still entirely intact at this point, and the
     * exit funcs may be relying on that.  In particular, if some thread
     * or exit func is still waiting to do an import, the import machinery
     * expects Py_IsInitialized() to return true.  So don't say the
     * runtime is uninitialized until after the exit funcs have run.
     * Note that Threading.py uses an exit func to do a join on all the
     * threads created thru it, so this also protects pending imports in
     * the threads created via Threading.
     */

    _PyAtExit_Call(tstate->interp);
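
For context (quoting the neighborhood of the same 3.10 Py_FinalizeEx() from memory; worth double-checking against the exact patch level): the runtime-level flag that _Py_IsFinalizing() reads is only set further down, after the atexit hooks have run.

    /* Remaining daemon threads will automatically exit
       when they attempt to take the GIL (ex: PyEval_RestoreThread()). */
    _PyRuntimeState_SetFinalizing(runtime, tstate);
    runtime->initialized = 0;

So on 3.10 there is no early per-interpreter flag at all, and everything Py_FinalizeEx() does before that point runs while _Py_IsFinalizing() still returns 0.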

@cpuhrsch cpuhrsch added module: python frontend For issues relating to PyTorch's Python frontend triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Jan 20, 2024
@Skylion007
Collaborator

I think we tried to patch this in CPython for pybind11; alas, we had trouble getting it merged: python/cpython#28525

@albanD
Collaborator

albanD commented Jan 24, 2024

Would a solution on our end be to mimic exactly, for each version of CPython, the function that checks whether GIL acquisition will terminate the thread (https://github.com/python/cpython/blob/main/Python/pystate.c#L2749 in the latest version, for example, and https://github.com/python/cpython/blob/3.10/Python/ceval_gil.h#L204 for 3.10), and use that to guard the GIL acquisition?
In particular, for the latest version we need a two-step check to detect whether we're in the first or second phase of finalization. For older versions it is simpler, as only one check exists.

There is still the possibility of a race if finalization starts between our check and the GIL acquisition (solving that would need the CPython PR above).

I'm not very optimistic it will solve @peterdelevoryas's issue, though: on 3.10 we only have the one flag, which we already guard on. So we're hitting exactly the worst-case scenario, where cleanup early in finalization makes the thread destroy objects, and that is very likely to race with the flag being set just afterwards.
The benefit of the change above, though, is that on 3.12 (where we have the extra early-termination flag) it should solve the issue, so we can at least confirm that, when the workload migrates to 3.12 (which I guess will be EOY in this case), it will work properly.
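
A sketch of the shape that guard could take (gil_acquire_would_exit_thread is a hypothetical helper of ours, not a CPython or pybind11 API; a faithful mimic of the newer two-phase check would also need the per-interpreter flag from the internal headers, per the pystate.c link above):

#include <Python.h>

// Hypothetical helper (ours): if this returns true for a non-main thread,
// PyEval_AcquireThread() would not return; take_gil() would terminate the
// thread via PyThread_exit_thread() instead.
static bool gil_acquire_would_exit_thread() {
    // _Py_IsFinalizing() reads the runtime-level flag take_gil() consults.
    // On 3.10 it is the only flag available and is set late in
    // Py_FinalizeEx(), so this check stays racy, as noted above.
    return _Py_IsFinalizing() != 0;
}

// Usage sketch: prefer skipping the acquisition (leaking the thread state)
// over letting CPython kill the thread mid-destructor.
//
//     if (gil_acquire_would_exit_thread()) {
//         return;
//     }
//     PyEval_AcquireThread(tstate);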
