Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tuples should be immutable and safe in C, as well as in Python. #127058

Open
markshannon opened this issue Nov 20, 2024 · 8 comments
Open

Tuples should be immutable and safe in C, as well as in Python. #127058

markshannon opened this issue Nov 20, 2024 · 8 comments
Labels
3.14 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) triaged The issue has been accepted as valid by a triager. type-bug An unexpected behavior, bug, or error

Comments

@markshannon
Copy link
Member

markshannon commented Nov 20, 2024

Bug report

Bug description:

[Apologies if this sounds a bit like a rant. I'm not blaming anyone. Just because something is the wrong choice now, doesn't mean it wasn't the right choice historically]

Tuples are immutable in Python, but we play all sorts of games in C with tuples, filling them will NULLs, mutating them and reusing them.

We do this in the mistaken belief that it improves performance.
But it doesn't. It makes the code base more complicated and fragile as we need to work around tuples that misbehave and do strange things. Any local performance gain is overwhelmed by slowdowns caused by the extra complexity in tuple code, the garbage collector and a few other places.

So let's fix this.

We need to:

  • Provide a new C API PyTuple_MakePair(). Pairs are by far the most common type of tuple that we play games with. By providing a fast way to create pairs, we can provide an upgrade path for C code that creates tuples in unsafe ways to do so safely and quickly.
  • Deprecate PyTuple_New. I don't know when we'll be able to remove it, but we should deprecate it ASAP.
  • Change PyTuple_New to fill the tuple with pointers to None instead of NULL. This doesn't fix the mutability issue, but it at least means the GC will only see valid objects. (This might break too much code, so we might just have to clearly document that tuples should be fully initialized in one go, before the tuple escapes the function it was created in)
  • Fix our own code to not use PyTuple_New() or perform tuple shenanigans. We can't reasonably expect third-party package authors to follow the rules if we don't.

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Linked PRs

@markshannon markshannon added type-bug An unexpected behavior, bug, or error interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.14 new features, bugs and security fixes labels Nov 20, 2024
@markshannon
Copy link
Member Author

I'm filing this as a bug rather than an enhancement as it does cause bugs. Most notably: https://github.com/python/cpython/blob/main/Lib/test/crashers/gc_inspection.py

@markshannon
Copy link
Member Author

#59313

@markshannon
Copy link
Member Author

PyTuple_SetItem and PyTuple_SET_ITEM need to go too.

@colesbury
Copy link
Contributor

This requires a massive change to C extensions, but doesn't actually address the problem that "tuples should be immutable."

Change PyTuple_New to fill the tuple with pointers to None instead of NULL

This would break backwards compatibility with C extensions. See some examples: https://github.com/search?q=%2FPyTuple_GET_ITEM.*%3D%3D.*NULL%2F&type=code

gc.get_referrers has a lot of issues, not just with tuples. As the linked files writes:

Note that this is only an example. There are many ways to crash Python
by using gc.get_referrers(), as well as many extension modules (even
when they are using perfectly documented patterns to build objects).

@markshannon
Copy link
Member Author

markshannon commented Nov 20, 2024

It might take a while for people to stop using PyTuple_New, but that doesn't mean we shouldn't deprecate it.
I suspect we won't be removing it for a long time.

I think you need to refine your search for PyTuple_GET_ITEM and NULL. A lot of the results look like foo(PyTuple_GET_ITEM(tuple, index)) == NULL which is fine.

gc.get_referrers has a lot of issues, not just with tuples.

Such as?

Partially created tuples may not be the only culprit, but they are the main one, I think.

@colesbury
Copy link
Contributor

It might take a while for people to stop using PyTuple_New, but that doesn't mean we shouldn't deprecate it

Deprecating commonly used APIs imposes a cost to C API extension authors even when we don't remove the existing API. Many extension authors will change their code in order to adopt the new best practices. If we don't have a foreseeable path to actually removing PyTuple_New() then it's unlikely that we'll ever reap the benefits of the deprecation. We'll have created churn for extension authors without delivering any benefits.

I think you need to refine your search...

My point is that it's a breaking change. The search does not capture all the uses that would break either.

gc.get_referrers has a lot of issues, not just with tuples. Such as?

As Armin wrote in #39117: "Expecting an object not to be seen before you first hand it out is extremely common, and get_referrers() breaks that assumption."

PyType_GenericAlloc is the most common way to create extension objects, and it returns an object that is already tracked, but not yet initialized (other than zero initialization).


  • In addition to deprecating PyTuple_New(), won't this also require deprecating PyTuple_SET_ITEM and PyTuple_SetItem for initializing tuples?
  • Other than the proposed PyTuple_MakePair, what are the intended replacements for PyTuple_New()? PyTuple_Pack()? How is someone supposed to create a tuple with a dynamic number of arguments?
  • Where is the extra complexity that this would improve? There's not a whole lot of tuple-specific code in the GC, and I don't see any GC code that is dedicated to handling tuple mutability.

@markshannon
Copy link
Member Author

Other than the proposed PyTuple_MakePair, what are the intended replacements for PyTuple_New()? PyTuple_Pack()? How is someone supposed to create a tuple with a dynamic number of arguments?

We already have PyTuple_FromArray which is an efficient way to create a tuple. We could expose PyTuple_FromArraySteal which is even more efficient if you don't need the references to the values in the array.

For creating tuples of unknown size, the best way is to create a list and then convert it to a tuple with PyList_AsTuple
If the extra overhead of clearing the list is a concern we could add PyList_AsTupleAndClear which would clear the list at the same time as creating the tuple, saving the cost of modifying reference counts.

Looking at the uses of PyTuple_New in CPython, prepending an object to a tuple is surprising common. So we could add a new function PyTuple_Prepend(PyObject *first_item, PyObject *tuple). PyTupleConcat is another possibility.

Where is the extra complexity that this would improve?

Not just the GC, but in the optimizer and in memory management. Knowing that tuples are genuinely immutable allows some useful savings and a few tricks.

Mainly it allows us to make reasoned improvements. For example, when untracking tuples in the GC we should be able to assume that, thanks to immutability, a tuple must be younger than the objects it contains. Therefore a simple oldest-first scan should find all tuples that can be untracked. Sadly, this isn't the case. Likewise, we would like to know that tuples cannot contain themselves.

@picnixz picnixz added type-feature A feature request or enhancement type-bug An unexpected behavior, bug, or error 3.14 new features, bugs and security fixes triaged The issue has been accepted as valid by a triager. and removed type-bug An unexpected behavior, bug, or error 3.14 new features, bugs and security fixes type-feature A feature request or enhancement labels Dec 9, 2024
@picnixz
Copy link
Contributor

picnixz commented Dec 9, 2024

(sorry I missed the comment on triaged)

markshannon added a commit that referenced this issue Dec 11, 2024
* Use a small buffer, then list when constructing a tuple from an arbitrary sequence.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.14 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) triaged The issue has been accepted as valid by a triager. type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants