Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-126914: Store the Preallocated Thread State's Pointer in a PyInterpreterState Field #126989

Merged

Conversation

ericsnowcurrently
Copy link
Member

@ericsnowcurrently ericsnowcurrently commented Nov 19, 2024

This approach eliminates the originally reported race. It also gets rid of the deadlock reported in gh-96071, so we can remove the workaround added then.

@ZeroIntensity
Copy link
Member

I'll get to reviewing in an hour or so. At a glance:

  • Should we backport this? I'm concerned we might make debuggers unhappy on 3.13
  • We should consider a lock-free algorithm for the freelist to help ease the contention if a lot of subintepreters are trying to get created.
  • If I understand correctly, this only applies to the preallocated initial thread. Why not make all thread states participate in the freelist?

Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments. I'll yield to you on whether or not it's worth trying to do this locklessly :)

Python/pystate.c Show resolved Hide resolved
Python/pystate.c Outdated Show resolved Hide resolved
Python/pystate.c Outdated Show resolved Hide resolved
Python/pystate.c Show resolved Hide resolved
Lib/test/test_interpreters/test_stress.py Show resolved Hide resolved
@ericsnowcurrently
Copy link
Member Author

  • Should we backport this? I'm concerned we might make debuggers unhappy on 3.13

Let's circle back to this later. My initial thought is that backporting would make sense. What do you mean about making debuggers unhappy?

  • We should consider a lock-free algorithm for the freelist to help ease the contention if a lot of subintepreters are trying to get created.

Perhaps. I don't anticipate there being enough simultaneous attempts that the overhead of any contention would matter.

  • If I understand correctly, this only applies to the preallocated initial thread. Why not make all thread states participate in the freelist?

I wanted to start off with this. I'm not convinced that an actual freelist of thread states is worth it. Conceptually it made sense as a solution for the pre-allocated thread state. I debated on having a dedicated field for just that one vs. do the freelist thing. Clearly I went with the latter, but now I'm thinking the illusion of a freelist isn't the best thing. I may switch it back.

@ZeroIntensity
Copy link
Member

What do you mean about making debuggers unhappy?

Some low-level debuggers and profilers (something like PyStack, py-spy, and probably some others that I don't know about) might get a nasty surprise if we change where the main thread is stored. I don't think it's that important, as they're already aware of the maintenance burden that comes with relying on private implementation details, but it's not too convincing for a backport. (It's probably easier for downstream if we backport GH-126915, and then just put this on main for going forward, but I'm definitely biased 😄)

Perhaps. I don't anticipate there being enough simultaneous attempts that the overhead of any contention would matter.

I'm not too worried about the subinterpreter case, but instead for just normal multithreading under one interpreter. This will add additional overhead to calling something like PyGILState_Ensure with a lot of threads, and users might not be happy with how it scales.

Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it looks like you've removed the freelist now. (I would change the title for the commit message.)

This looks pretty good, and I think this is small enough to not worry about giving debuggers too much of a headache. I'll do one final pass a little later today once you've figured out what you want to do about resets and then let's merge :)

@ericsnowcurrently
Copy link
Member Author

might get a nasty surprise if we change where the main thread is stored.

It is still found at PyInterpreterState._initial_thread, so it shouldn't be a problem.

This will add additional overhead to calling something like PyGILState_Ensure with a lot of threads, and users might not be happy with how it scales.

I'd be surprised if lock/atomic contention here were more than insignificant relative to any of the other operations at play. Also keep in mind that PyMutex is mostly a user-space lock, so some of the overhead of contention is reduced (relative to a kernel-provided lock).

@ericsnowcurrently ericsnowcurrently changed the title gh-126914: Store the Preallocated Thread State in a Freelist gh-126914: Store the Preallocated Thread State Pointer in a PyInterpreterState Field Nov 19, 2024
@ericsnowcurrently ericsnowcurrently changed the title gh-126914: Store the Preallocated Thread State Pointer in a PyInterpreterState Field gh-126914: Store the Preallocated Thread State's Pointer in a PyInterpreterState Field Nov 19, 2024
@ZeroIntensity
Copy link
Member

It is still found at PyInterpreterState._initial_thread, so it shouldn't be a problem.

I noticed! I like this fix better than mine with the extra field, it's clever. In fact, we could reuse the preallocated field as a makeshift freelist for later; heap-allocated thread states could also get stored in that field, and then alloc_threadstate will use them just fine.

Also keep in mind that PyMutex is mostly a user-space lock, so some of the overhead of contention is reduced (relative to a kernel-provided lock).

Yeah, contention on those functions is probably not great right now anyways because of the HEAD_LOCK. I'm more talking generally, and trying to think ahead about what we should do in the future.

Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ZeroIntensity ZeroIntensity added the needs backport to 3.13 bugs and security fixes label Nov 19, 2024
@ericsnowcurrently ericsnowcurrently removed the needs backport to 3.13 bugs and security fixes label Nov 19, 2024
@ericsnowcurrently
Copy link
Member Author

I'll add the backport to 3.13 label once gh-126995 is merged. Otherwise sorting out the ABI data file becomes a much bigger pain.

@ericsnowcurrently ericsnowcurrently merged commit 1c0a104 into python:main Nov 19, 2024
41 checks passed
@ericsnowcurrently ericsnowcurrently deleted the thread-state-freelist branch November 19, 2024 19:59
@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot aarch64 Android 3.x has failed when building commit 1c0a104.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1594/builds/622) and take a look at the build logs.
  4. Check if the failure is related to this commit (1c0a104) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1594/builds/622

Failed tests:

  • test_os

Failed subtests:

  • test_timerfd_select - test.test_os.TimerfdTests.test_timerfd_select
  • test_timerfd_ns_select - test.test_os.TimerfdTests.test_timerfd_ns_select

Summary of the results of the build (if available):

==

Click to see traceback logs
Traceback (most recent call last):
  File "/data/user/0/org.python.testbed/files/python/lib/python3.14/test/test_os.py", line 4499, in test_timerfd_ns_select
    self.assertEqual(self.read_count_signaled(fd), 1)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 2 != 1


Traceback (most recent call last):
  File "/data/user/0/org.python.testbed/files/python/lib/python3.14/test/test_os.py", line 4334, in test_timerfd_select
    self.assertEqual(self.read_count_signaled(fd), 1)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 2 != 1

@ZeroIntensity
Copy link
Member

In hindsight, we probably should have ran buildbots before merging. I'm not sure that's related though.

@ericsnowcurrently
Copy link
Member Author

I'm fairly sure it's unrelated.

ericsnowcurrently added a commit to ericsnowcurrently/cpython that referenced this pull request Nov 21, 2024
…yInterpreterState Field (pythongh-126989)

This approach eliminates the originally reported race. It also gets rid of the deadlock reported in pythongh-96071, so we can remove the workaround added then.
ericsnowcurrently added a commit that referenced this pull request Dec 2, 2024
…PyInterpreterState Field (gh-127114)

This approach eliminates the originally reported race.  It also gets rid of the deadlock reported in gh-96071, so we can remove the workaround added then.

This is mostly a cherry-pick of 1c0a104 (AKA gh-126989).  The difference is we add PyInterpreterState.threads_preallocated at the end of PyInterpreterState, instead of adding PyInterpreterState.threads.preallocated.  That avoids ABI disruption.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants