Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition when importing collections.abc #125245

Closed
ngoldbaum opened this issue Oct 10, 2024 · 8 comments
Closed

Race condition when importing collections.abc #125245

ngoldbaum opened this issue Oct 10, 2024 · 8 comments
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@ngoldbaum
Copy link
Contributor

ngoldbaum commented Oct 10, 2024

Bug report

Bug description:

Discovered alongside #125243 with similar steps to reproduce. I don't have a simpler way to trigger this than "run the PyO3 tests in a loop" because I think it requires many threads accessing the python runtime simulatenously.

To trigger it, have a rust toolchain and Python installed, clone the PyO3 repo and execute the following command:

while RUST_BACKTRACE=1 UNSAFE_PYO3_BUILD_FREE_THREADED=1 cargo test --lib --features=full -- --test-threads=100; do :; done

You may also hit some other test failures related to ZoneInfo, see the other issue I opened about that.

You will eventually see a test failure with the following text:

---- exceptions::socket::tests::gaierror stdout ----
thread 'exceptions::socket::tests::gaierror' panicked at src/impl_/exceptions.rs:26:17:
failed to import exception socket.gaierror: ImportError: cannot import name 'Mapping' from 'collections.abc' (/Users/goldbaum/.pyenv/versions/3.13.0t/lib/python3.13t/collections/abc.py)

I slightly modified PyO3 to get a traceback as well (hidden because it's a distractingly long diff):

diff --git a/src/impl_/exceptions.rs b/src/impl_/exceptions.rs
index 15b6f53b..de63ad59 100644
--- a/src/impl_/exceptions.rs
+++ b/src/impl_/exceptions.rs
@@ -1,4 +1,8 @@
-use crate::{sync::GILOnceCell, types::PyType, Bound, Py, Python};
+use crate::{
+    sync::GILOnceCell,
+    types::{PyTracebackMethods, PyType},
+    Bound, Py, Python,
+};

 pub struct ImportedExceptionTypeObject {
     imported_value: GILOnceCell<Py<PyType>>,
@@ -20,8 +24,11 @@ impl ImportedExceptionTypeObject {
             .import(py, self.module, self.name)
             .unwrap_or_else(|e| {
                 panic!(
-                    "failed to import exception {}.{}: {}",
-                    self.module, self.name, e
+                    "failed to import exception {}.{}: {}\n{}",
+                    self.module,
+                    self.name,
+                    e,
+                    e.traceback(py).unwrap().format().unwrap(),
                 )
             })
     }

And the traceback is:

Traceback (most recent call last):
  File "/Users/goldbaum/.pyenv/versions/3.13.0t/lib/python3.13t/socket.py", line 55, in <module>
    import os, sys, io, selectors
  File "/Users/goldbaum/.pyenv/versions/3.13.0t/lib/python3.13t/selectors.py", line 10, in <module>
    from collections.abc import Mapping

So somehow during setup of the socket module, Mapping isn't available yet, but only if many threads are simultaneously touching the Python runtime.

(ping @davidhewitt, we probably want to disable the socket error tests on the free-threaded build as well)

CPython versions tested on:

3.13

Operating systems tested on:

macOS

Linked PRs

@ngoldbaum ngoldbaum added the type-bug An unexpected behavior, bug, or error label Oct 10, 2024
colesbury added a commit to colesbury/cpython that referenced this issue Oct 10, 2024
Lock `ZoneInfoType` to protect accesses to `ZONEINFO_STRONG_CACHE`.
Refactor the `tp_new` handler to use Argument Clinic so that we can just
use `@critical_section` annotations on the relevant functions.

Also use `PyDict_SetDefaultRef` instead of `PyDict_SetDefault` when
inserting into the `TIMEDELTA_CACHE`.
@colesbury
Copy link
Contributor

colesbury commented Oct 10, 2024

EDIT: Commented on wrong issue

@colesbury
Copy link
Contributor

colesbury commented Oct 11, 2024

There's a race condition that I believe was introduced in #123613. The sys.modules[__name__] = _collections_abc is not safe with the longstanding importlib optimization that avoids _ModuleLockManager if the module is already initialized:

# Optimization: we avoid unneeded module locking if the module
# already exists in sys.modules and is fully initialized.
module = sys.modules.get(name, _NEEDS_LOADING)
if (module is _NEEDS_LOADING or
getattr(getattr(module, "__spec__", None), "_initializing", False)):

I think this may affect the default (with GIL) build as well depending on when the GIL-switch happens, but it definitely will happen more often on the free threading build.

The problematic execution when two threads (T1, T2) run import collections.abc:

  1. T1: Starts to import Lib/collections/abc.py, sets sys.modules["collections.abc"]
  2. T2: Loads module = sys.modules["collections.abc"] (line 1353 above). This points to the Lib/collections/abc.py module.
  3. T1: Finishes importing collections/abc.py, sys.modules["collections.abc"] now points to the Lib/_collections_abc.py module.
  4. T2: Checks module. __spec__. _initializing (line 1355) on the . This module is finished initializing, so it's returned. Note that at this point sys.modules["collections.abc"] points to the correct _collections_abc.py module, but the local variable module still points to the collections/abc.py module.

T2 now incorrectly has the Python collections.abc module.

cc @serhiy-storchaka

EDIT: Lib/_collections_abc.py is also a Python module

@colesbury colesbury added 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Oct 11, 2024
@colesbury
Copy link
Contributor

I think we can do something like:

--- a/Lib/collections/__init__.py
+++ b/Lib/collections/__init__.py
@@ -29,6 +29,9 @@
 import _collections_abc
 import sys as _sys

+_sys.modules['collections.abc'] = _collections_abc
+abc = _collections_abc
+
 from itertools import chain as _chain
 from itertools import repeat as _repeat
 from itertools import starmap as _starmap
--- a/Lib/collections/abc.py
+++ /dev/null
@@ -1,3 +0,0 @@
-import _collections_abc
-import sys
-sys.modules[__name__] = _collections_abc

@colesbury colesbury changed the title Importing the socket module fails on free-threaded build Race condition when importing collections.abc Oct 13, 2024
colesbury added a commit to colesbury/cpython that referenced this issue Oct 13, 2024
If multiple threads concurrently imported `collections.abc`, some of the
threads might incorrectly see the "shim" `Lib/collections/abc.py` module
instead of the correct `Lib/_collections_abc.py` module.  This was
observed in the free threading build, but could, in theory, occur in the
default GIL-enabled build as well.
@hroncok
Copy link
Contributor

hroncok commented Oct 16, 2024

I think this may affect the default (with GIL) build as well depending on when the GIL-switch happens, but it definitely will happen more often on the free threading build.

Indeed. We see this in Fedora with the default (with GIL) build as well.

@ngoldbaum
Copy link
Contributor Author

This is fixed now.

@colesbury
Copy link
Contributor

There's an open PR, but it's not merged yet. I don't think the underlying issue is fixed.

I'm waiting on more feedback regarding the issue that Serhiy raised on the PR:

#125415 (review)

@colesbury colesbury reopened this Oct 24, 2024
@ngoldbaum
Copy link
Contributor Author

Oops, I'm sorry!

colesbury added a commit that referenced this issue Oct 24, 2024
If multiple threads concurrently imported `collections.abc`, some of the
threads might incorrectly see the "shim" `Lib/collections/abc.py` module
instead of the correct `Lib/_collections_abc.py` module.  This affected
both the free threading build and the default GIL-enabled build.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 24, 2024
…ythonGH-125415)

If multiple threads concurrently imported `collections.abc`, some of the
threads might incorrectly see the "shim" `Lib/collections/abc.py` module
instead of the correct `Lib/_collections_abc.py` module.  This affected
both the free threading build and the default GIL-enabled build.
(cherry picked from commit fed501d)

Co-authored-by: Sam Gross <[email protected]>
colesbury added a commit that referenced this issue Oct 24, 2024
…GH-125415) (GH-125944)

If multiple threads concurrently imported `collections.abc`, some of the
threads might incorrectly see the "shim" `Lib/collections/abc.py` module
instead of the correct `Lib/_collections_abc.py` module.  This affected
both the free threading build and the default GIL-enabled build.
(cherry picked from commit fed501d)

Co-authored-by: Sam Gross <[email protected]>
@colesbury
Copy link
Contributor

The PRs are merged now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes topic-free-threading type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants