The C API is weakly typed #31
Comments
Note that C type safety does nothing for interop with other languages, so if we push for this we should also invest in that area -- e.g. publish enough parseable info to make safe wrappers easy to generate (#7). |
Violating type rules is unsafe. Wrappers don't make things safer, just more convenient. |
Yup, C wrappers are useless. Let me try to rephrase. Languages without a C compiler only see exported DLL symbols (and whatever they manage to parse from C headers). |
I would prefer to do the opposite: move to HPy-like opaque handles to fix issue #22. |
One way of looking at this is that we still do not want to expose the memory layout, and still want opaque pointers, but we can use C types to provide a somewhat better user experience and some level of type checking, not a full type system that would, for example, guarantee that only handles pointing to tuples ever reach tuple-specific functions. We did consider this in HPy and I think it would be useful, but we need to iron out the API and do more research into use-cases. For example, this would also be useful for user-defined types, since those are also opaque. When porting NumPy we noticed this: replacing specific struct pointer types with a single generic handle type loses information that readers of a signature rely on (an example appears in a later comment). |
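To make the idea concrete, here is a minimal sketch of "typed but opaque" handles; every name below is hypothetical and is not part of any existing API:
#include <Python.h>

/* Hypothetical sketch: opaque struct tags give the compiler distinct pointer
   types without exposing any memory layout. The structs are never defined
   in public headers. */
typedef struct _Py_opaque_tuple PyTupleRef;
typedef struct _Py_opaque_dict  PyDictRef;

/* Only API functions can produce or consume these handles, so the compiler
   rejects passing a dict handle where a tuple handle is expected. */
PyTupleRef *PyTupleRef_FromObject(PyObject *obj);             /* type-checks at runtime */
PyObject   *PyTupleRef_GetItem(PyTupleRef *t, Py_ssize_t i);
PyObject   *PyDictRef_GetItem(PyDictRef *d, PyObject *key);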
See also issue #37. |
For me, the question here is also how much the Python release build should help developers when they misuse the C API. Provide a nicely formatted exception? Kill the process with SIGABRT on an assertion error? Silently ignore the error and attempt to provide a behavior/result which prevents a crash? A practical problem is that a debug build of Python is not widely available: see issue #36. I'm in favor of removing assertions in a release build and requiring developers to use a debug build, but there are practical issues with that. |
I would prefer to move the API towards type-agnostic functions like PyObject_GetItem() instead of PyDict_GetItem(). PyObject functions have a good API: they return a strong reference and raise the expected exception in case of error. So far, I'm not convinced that outside CPython it's really worth it to use specialized functions like PyTuple_GetItem() or PyTuple_GET_ITEM() instead of the generic PySequence_GetItem(). Did anyone run a (micro or macro) benchmark to see the benefits of specialized functions? Are they really a bottleneck in terms of performance? The other problem is that the C API has no clear tradeoff between performance and stability. "It depends" on the function, on the type, on the context. Some types have specialized functions. Some types have macros. Some others don't. How can users discover what the best practice is? |
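For illustration, the two styles being compared look like this (standard CPython calls; the wrapper function names are made up):
#include <Python.h>

/* Generic and safe: works for any sequence, returns a strong reference,
   raises TypeError/IndexError on misuse. */
static PyObject *first_element(PyObject *seq)
{
    return PySequence_GetItem(seq, 0);
}

/* Specialized and fast: tuple-only, borrowed reference from the macro,
   no type or bounds checking in release builds. */
static PyObject *first_element_fast(PyObject *tuple)
{
    return Py_NewRef(PyTuple_GET_ITEM(tuple, 0));
}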
If "nogil" is accepted, at least internally we may have |
It is worth noting that debug builds should still do the type checks, so we expect any type errors that slip past the compiler to be caught in testing.
uintptr_t PyTuple_GetSize(PyTupleObject *t)
{
    assert(PyTuple_Check(t));
    return Py_SIZE(t);
} |
It makes sense to have both, and to use the appropriate one.
PyObject *obj, *key;
PyDictObject *dict;
PyObject_GetItem(obj, key);               // 👍
PyDict_GetItem(dict, key);                // 👍
PyObject_GetItem((PyObject *)dict, key);  // Pointlessly inefficient
PyDict_GetItem((PyDictObject *)obj, key); // Unsafe |
IMO, we should try to come up with a guideline regarding this problem before adding new APIs. I don't have a strong preference for strongly or weakly typed APIs; there are pros and cons for each variant as I see it. I think @markshannon's proposal in #31 (comment) of having both weakly typed and strongly typed APIs makes sense, though. |
I created python/devguide#1127 to discuss a solution (that is: guidelines for new APIs) |
The […] |
Yeah, that's a better term; thanks. |
Let me elaborate: I would prefer that the PyObject, PyDictObject and PyTupleObject members not be part of the public C API: enforce the usage of getter and setter functions. If we move towards "opaque handles", technically these members would have to be hidden anyway. The problem is that a big part of the existing C API currently uses macros, for example:
#define PyObject_New(type, typeobj) ((type *)_PyObject_New(typeobj))
If possible I would prefer to move away from macros (see PEP 670) :-( |
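For context, this is the direction CPython has already taken for a few object header fields; a small sketch using existing accessors (the function name is made up, and obj is assumed to be a variable-sized object such as a list):
#include <Python.h>

static void touch_header_fields(PyObject *obj)
{
    PyTypeObject *tp = Py_TYPE(obj);   /* instead of obj->ob_type */
    Py_ssize_t len = Py_SIZE(obj);     /* instead of ((PyVarObject *)obj)->ob_size */
    Py_SET_TYPE(obj, tp);              /* instead of obj->ob_type = tp */
    Py_SET_SIZE(obj, len);             /* instead of assigning ob_size directly */
}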
Example of worst case: "downgrade" the type to be able to call the generic Py_NewRef() function, which expects PyObject*:
PyTypeObject *base = type->tp_base;
// !!! 2 casts are required in a single line of code :-( !!!
type->tp_base = (PyTypeObject*)Py_NewRef((PyObject*)base);
Maybe the cast of the Py_NewRef() argument to PyObject* could be done by the macro itself. Maybe we could have a macro calling Py_NewRef() and casting the result to an expected type. Would it be better than doing the cast explicitly? I'm not sure. |
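A minimal sketch of such a macro; Py_NewRefTyped is a hypothetical name, not an existing CPython API:
/* Hypothetical helper: call Py_NewRef() and cast the result back to the
   caller's pointer type, so neither cast appears at the call site. */
#define Py_NewRefTyped(TYPE, op)  ((TYPE *)Py_NewRef((PyObject *)(op)))

/* The earlier example then becomes a single line without visible casts: */
type->tp_base = Py_NewRefTyped(PyTypeObject, type->tp_base);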
In PR #106005, I'm adding this function, and @markshannon asked me to use a specific PyDictObject* type for its first parameter instead:
PyAPI_FUNC(int) PyDict_GetItemRef(PyObject *mp, PyObject *key, PyObject **result);
My concern is that it would make the API inconsistent: all PyDict functions create and expect PyObject*:
PyAPI_FUNC(PyObject *) PyDict_New(void);
PyAPI_FUNC(PyObject *) PyDict_GetItem(PyObject *mp, PyObject *key);
PyAPI_FUNC(void) PyDict_Clear(PyObject *mp);
PyAPI_FUNC(PyObject *) PyDict_Keys(PyObject *mp);
PyAPI_FUNC(Py_ssize_t) PyDict_Size(PyObject *mp);
PyAPI_FUNC(PyObject *) PyDict_Copy(PyObject *mp);
PyAPI_FUNC(int) PyDict_Contains(PyObject *mp, PyObject *key);
If a single function expects PyDictObject* while all the others expect PyObject*, users of that one function will have to add casts. Would it be possible to have a "switch" to opt in to strongly typed signatures? I suppose that such a "switch" could take two forms, for example an opt-in preprocessor define or a separate set of headers. I'm also open to considering an incremental approach: use specific types, but only for newly added functions. |
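A sketch of what such an opt-in "switch" could look like in a header; Py_STRONGLY_TYPED_API is a hypothetical define, not an existing CPython mechanism:
/* Hypothetical header sketch: the ABI stays PyObject*-based either way; the
   opt-in define only changes the prototypes that the compiler checks. */
#ifdef Py_STRONGLY_TYPED_API
PyAPI_FUNC(int) PyDict_GetItemRef(PyDictObject *mp, PyObject *key, PyObject **result);
#else
PyAPI_FUNC(int) PyDict_GetItemRef(PyObject *mp, PyObject *key, PyObject **result);
#endif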
Can HPy provide a type which can be used to define types which are aliases to HPy? The problem is the addition of the pointer: […] |
I'm proposing to add PyDict_GetItemRef() which is basically the same (just the name is different): PR #106005. |
Why is […]? Except that […]. You claim this would make the API inconsistent because […]. Having some functions produce and take a […] |
I suspect one barrier against returning […] |
I wrote private macros to cast a pointer to PyObject* or to a concrete type such as PyTupleObject*:
#define _PyObject_CAST(op) _Py_CAST(PyObject*, (op))
#define _PyTuple_CAST(op) \
    (assert(PyTuple_Check(op)), _Py_CAST(PyTupleObject*, (op)))
When a generic […]. These macros might implement more advanced checks tomorrow if needed: I recommend using them :-) By the way, I tried but failed to fix […] |
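For illustration, a simplified sketch of how such a cast macro is typically used inside CPython (the function name is made up; it relies on the macros shown above and on Python.h):
/* A public entry point still takes PyObject*; the cast macro documents the
   expected concrete type and asserts it in debug builds. */
static Py_ssize_t
example_tuple_size(PyObject *op)
{
    PyTupleObject *tuple = _PyTuple_CAST(op);   /* asserts PyTuple_Check(op) */
    return Py_SIZE(tuple);
}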
I corrected myself: using […] |
I mean that the current PyDict_New() function returns PyObject*, not PyDictObject*. |
Even in programming languages like C++ and Java there is a problem of non-homogeneous containers (generics and templates solve it only partially). For example, suppose you have a list of dicts. |
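To illustrate the point with the current CPython API (a sketch; list_of_dicts and key are assumed to exist inside a function returning PyObject*): even if the list is known to hold dicts, each item comes back as a plain PyObject* and must be checked at runtime before any dict function can be used on it.
PyObject *item = PyList_GetItem(list_of_dicts, 0);   /* borrowed reference, typed PyObject* */
if (item == NULL) {
    return NULL;                                     /* e.g. IndexError */
}
if (!PyDict_Check(item)) {                           /* the C type system cannot prove this */
    PyErr_SetString(PyExc_TypeError, "expected a dict");
    return NULL;
}
PyObject *value = PyDict_GetItemWithError(item, key);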
Most Python developers and Python C API extension maintainers never have a debug build of CPython, so those asserts do not help them. They're always developing against release builds of CPython. I know this because we do our testing builds at Google with assertions enabled by default (what we find and fix tends to get pushed upstream, in a randomly delayed fashion, as we're usually not on the latest versions of things, but let's not assume we can rely on the goodwill of one unusually large user eventually cleaning up everyone else's undetected, already-shipped messes). |
IMNSHO, for the entire history of CPython our C API has used the opaque PyObject* everywhere. I understand the theoretical desires here, but the type of an object coming out of a Python C API is generally opaque, and we never guarantee that there even is a C type corresponding to a given Python type. If and when there is one, that is supposed to be a hidden implementation detail.

ie: Start returning specific C types from our constructors and everyone will then need to pull their hair out adding typecasts everywhere as they pass the results into other API calls. There is no reason to encourage the use of specific C pointer types. The only thing that'll lead to is users needing to litter their code with typecasts that they rarely need today, because we're a dynamic language. Every typecast is an opportunity to be wrong in a manner that reads to glazed-over eyes as if it were correct to future maintainers of the code.

Put another way: what specific C-API-use bugs will exploding our C API, and littering people's code with typecasts, actually prevent? Weigh that against the long-term bugs it'll cause any time a typecast is wrong.

I wouldn't mind if we had strongly typed C APIs for use by the CPython internals, for our own purposes where we control everything, but I think it'd be a mistake to clutter our public C API with these. |
fwiw the opening statement of this issue conflates two problems: C types for Python internal things, and existing C APIs that have awful behavior. those are unrelated concepts. we should seek to provide non-awful behavior for our C APIs that have such behaviors. But that has nothing to do with weak C types. |
Long-term: […] |
TBH, this comment made me quite angry. I am not conflating internal things with the C API. The title of this issue is "The C API is weakly typed". The project is "CAPI workgroup". This is about the C API. |
While it is true that almost all C API functions accept PyObject*, not all of them work for any object. Functions that are strongly typed, taking a pointer to a specific struct, are checked by the compiler. It is functions like PyDict_GetItem(), which accept any PyObject* but only work correctly for one concrete type, that cause the problem. Let me give an example (skipping error handling and refcounts for brevity):
PyObject *t = PyTuple_Pack(...);
PyObject *i = PyLong_FromLong(...);
PyObject *v = PyObject_GetItem(t, i);
which is fine; it all works as expected. Whereas
PyObject *t = PyTuple_Pack(...);
PyObject *i = PyLong_FromLong(...);
PyObject *v = PyDict_GetItem(t, i); is obviously broken, but is just fine as far as the C compiler is concerned. The two examples type check the same.
This feels like a strawman argument. You are proposing a bad solution to a problem (littering the code with typecasts) and using that to claim that there is no problem.
While downcasts are an opportunity to be wrong, upcasts are always safe. The distinction is important. I don't want to propose a solution here, but here are some ideas on how to handle casts: an alternative to explicit casts would be to make all API functions dynamically typed, removing the type-specific variants from the public API. Whatever the solution, weak typing in the C API is a problem. |
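One possible shape for the upcast/downcast distinction, as a sketch only; PyDict_AS_OBJECT and PyDict_CheckedCast are hypothetical names, not existing APIs:
/* Upcast: always safe, so it can be a trivial (or implicit) conversion. */
#define PyDict_AS_OBJECT(d)  ((PyObject *)(d))

/* Downcast: needs a runtime check, and fails with a TypeError instead of
   silently misbehaving. */
static PyDictObject *
PyDict_CheckedCast(PyObject *op)
{
    if (!PyDict_Check(op)) {
        PyErr_Format(PyExc_TypeError, "expected dict, got %s",
                     Py_TYPE(op)->tp_name);
        return NULL;
    }
    return (PyDictObject *)op;
}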
(edit: I was wrong!) |
They are different, […] |
I would argue that this is not true with custom types. Take a look at the signature of this function from NumPy: […] With the universal ABI and opaque handles it becomes […], and I would argue that this is a bit less readable and can lead to confusion about what kind of object each argument actually is. We actually have experience with this from porting NumPy to HPy, where you get the same problem, and we ran into a few bugs due to this "type" confusion. This GitHub issue is about builtin types, but I wanted to point this out, because if this is to be solved for custom types, then the solution for builtin types (if any) should be consistent or ideally the same. |
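Since the actual NumPy signature is not quoted above, here is a hypothetical example of the same effect; resize_array is an invented name, while PyArrayObject, PyArray_Descr and HPy are the real NumPy/HPy types (the two declarations are alternatives, not meant to coexist in one file):
/* With specific struct pointer types, the role of each argument is visible
   in the signature itself: */
int resize_array(PyArrayObject *arr, PyArray_Descr *dtype, int copy);

/* With a single universal handle type, every argument looks the same and the
   reader has to rely on parameter names alone: */
int resize_array(HPy arr, HPy dtype, int copy);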
I feel that people are talking past each other. @markshannon clearly wants to use types other than PyObject*. @gpshead is talking about the current C API. He explains that it's rare that people have access to a Python built with assertions, so we must keep the runtime checks. He also explained that specific pointer types would force users to add casts, which are themselves error-prone. For now, I would like to ask: for additions to the current existing C API, should we continue the trend of using PyObject*? As I wrote previously, if we switch to specific types like PyDictObject*, the API becomes inconsistent and users have to add casts. |
Please stop putting words in my mouth. All this issue says is that the current C API is weakly typed, and that is a problem, both in terms of performance and ease of use. What I, or anyone else, wants to do about it is out of scope for this repo. |
Also, please stop with the straw man arguments about casts. Casts undermine the type safety that C provides; we all know that. |
An example of why I bring up casts: if […]. This is a practical example of existing C API code people routinely write today. Code goes in both directions, from knowing the specific type of something to the generic object, and from a generic object to claiming it is a specific type, in existing C API use all the time.

Maybe those don't have to be written as direct C and C++ cast syntax; they could be hidden behind APIs such as static inlines or macros that do the type laundering job (@vstinner's example), including possible assertions at a level that end users compiling their own code, regardless of CPython's own build mode, might see. I'll call this a "type transition API" to avoid the word cast (even though it probably does one internally; the point is that the API user isn't writing a cast).

From a C API evolution standpoint, we could declare user code doing casting to be a bad idea, with a long-term goal of requiring the use of specific APIs for all object<->known_type transitions. Those would replace any C or C++ casts that exist today, or would need to exist if we changed the pointer types.

Channelling @encukou, we could change this today on an opt-in basis: generate flavors of our C API headers, or use preprocessor defines to select whether specific pointer types are desired as inputs and outputs of type-specific APIs. (At the ABI level nothing changes; binary pointers are generic without a type.)

But we shouldn't let the possibility of such an overall change hold us back on adding new needed C APIs today using our existing generic PyObject* practice, or encourage us to start using specific types on new APIs instead of generic objects. Adding specific types on new APIs before we've decided on and provided type transition APIs is harmful. If we do manage to provide such type transition APIs before a newly added C API has shipped, we can go ahead and upgrade it to require the specific type from the start, and point users of it to the similarly new type transition API.

The transition period for people's C code, if we do this for our existing APIs, will be long. Most PyPI extension authors need to support compilation against the oldest widely used version of CPython (assume 3.8 today). So they'd be unlikely to opt in to the more-C-typed API themselves until the version we ship it in has become "oldest". Otherwise their code would be full of ifdefs or C/C++ casts in order to span compilation across all versions. Changing types on parameters in public .h files tends to be quite painful for existing code. A recent practical example of a mistake we made in doing this: 3.7 shipped a change to add […] |
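To illustrate the ifdef problem for extension authors (the version number and the typed signature are hypothetical; PY_VERSION_HEX is the real CPython version macro):
/* If some future CPython changed PyDict_GetItem() to take PyDictObject*,
   code that must also compile against older releases would look like this: */
#if PY_VERSION_HEX >= 0x030E0000          /* hypothetical future version */
    value = PyDict_GetItem(dict, key);            /* dict is PyDictObject* */
#else
    value = PyDict_GetItem((PyObject *)dict, key);
#endif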
To a large extent the C API emulates pure Python APIs, which are dynamically typed, so the C API is also dynamically typed. In many cases this is good for discoverability. We should think hard about whether we want that for a new C API. |
To summarize this issue so far: the two viewpoints expressed here, that weak typing is bad and that dynamic typing is good, are not in contradiction. Much of the API that appears more statically typed is in fact weakly typed. |
Amen. |
As I wrote, IMO two groups are talking past each other because they are talking about two different things. I created issue #61: No clear separation between "fast API" (unsafe) and "safe API".
I don't think that "good vs bad" is a good summary. Each solution has advantages and disadvantages. Depending on the use cases, some advantages become disadvantages and the opposite. Debating is part of the process to list use cases, advantages and disadvantages.
I would prefer to say that this GitHub project is not a place to take decisions, but to collect use cases, issues, and maybe solutions. It's hard not to mention solutions when listing their advantages and disadvantages. The issue title is "The C API is weakly typed". For me, it's more a solution ("the C API should use dynamic typing") than a problem. Apparently, the root problem is: "The error handling of these functions is awful." Sadly, I don't think that "awful" is helpful here, but the problem is elaborated later: the wrong exception type is raised, or no exception at all. From what I understood, the advantages of dynamic typing are that no casts are needed and that the API stays consistent with the existing PyObject* functions. The disadvantages are that type errors are only caught at runtime and that the runtime checks have a performance cost.
IMO the root underlying question is: should the C API only be consumed by third-party C extensions that want the best performance? Should it only be used by CPython itself? Or should the C API be designed for any C extension, and so remain (as currently) very forgiving towards developers who don't read the documentation? Some people would prefer to advertise a safe HPy API for everyone, and have a faster low-level API without error checking. For me, it means that a single API cannot fit all use cases. Currently, the problem is made of multiple sub-problems. For example, it's not really possible to choose between a "universal API" working on most Python versions and implementations and an "unstable API" which has the best performance but is likely to break between Python versions and implementations. |
@markshannon your use the phrases "please stop", "please no more", "strawman arguments", "this repo is not the place for", "out of scope for this repo" and claims that I misrepresented you... are all indications to me that you don't want me here and don't appear interested in listening to what I had to say. I did not find that respectful. I'm done with this github repo/project. You've driven me away (intended or not). See y'all over on discuss.python.org. /unsub |
Issue for proposed guidelines: capi-workgroup/api-evolution#29. IMO, we can do better on a lower layer; see capi-workgroup/api-revolution#1 (comment). |
The C type system is weak, compared to Java or Rust, but we should use it where possible.

The function PyTuple_Size() should be infallible; all tuples have a size. Yet it can fail if passed a non-tuple, which can only happen because it is weakly typed.

Like PyTuple_Size(PyObject *), many API functions take PyObject * when they should take more specific types. Here are a few examples:
PyTuple_Size()
PyList_Append()
PyDict_GetItem()

The error handling of these functions is awful. PyTuple_Size() and PyList_Append() raise a SystemError, not a TypeError, if passed the wrong type. PyDict_GetItem() just acts as if the item is not in the container, not raising at all.
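A short demonstration of the behavior described above, using the current CPython API (the function name is made up; both calls compile without any warning):
#include <Python.h>

static void demo_misuse(PyObject *some_dict, PyObject *some_key)
{
    /* Wrong type: fails, but with SystemError ("bad argument to internal
       function") rather than TypeError, and returns -1. */
    Py_ssize_t n = PyTuple_Size(some_dict);

    /* Arguments swapped by mistake: no exception at all; it simply returns
       NULL as if the key were not present. */
    PyObject *v = PyDict_GetItem(some_key, some_dict);

    (void)n;
    (void)v;
}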