No clear separation between "fast API" (unsafe) and "safe API" #61

vstinner · 2023-07-17T15:40:25Z

For historical reasons, the Python C API is a mixture of various kinds of functions:

Internal functions, exposed simply because it was the only way to be able to use it in CPython itself
Private functions, supposed to be not be used by 3rd party code, but sometimes added on purpose since we couldn't agree if it should be added or not
Clean and safe public API which is designed to be stable accross Python versions

In Python 3.7, a new Include/internal/ subdirectory was added: internal API which must not be used outside CPython have been moved slowly (one by one) there. Most of these functions are no longer exported: so technically it's no longer possible to use them outside Python. Moreover, the Py_BUILD_CORE macro must be defined to access it: it's an opt-in option (similar to issue #54 idea) to clearly point out that: hey, you're in the danger zone!

Recently, in Python 3.12, the concept of "Unstable API" was added: PEP 689 – Unstable C API tier. This one is a middleground for function that we want to expose as "public", but are known to be unstable between Python versions, like PyCore_New() which became: **PyUnstable_**Code_New().

There is still a gray area between "fast API" and "safe API".

Examples of unsafe fast APIs vs safe APIs:

PyTuple_GET_ITEM() vs PyTuple_GetItem() (runtime type check, check index bounds)
_PyThreadState_UncheckedGet() (can return NULL) vs PyThreadState_Get() (fatal error instead of returning NULL)

Differences between "fast API" and "safe API":

Fast API usually avoid runtime type check and avoid index bounds check, assertions may be used. Safe API implement runtime checks.
Fast API can sometimes implemented as macro or static inline function (ex: access structure members). Safe API is usually implemente as opaque function call.

It's more common for the "fast API" to rely on the compiler to produce efficient code (ex: inline code). Sadly, it may cause more portability issues (compiler not fully implementing a C standard or a specific function).

The text was updated successfully, but these errors were encountered:

vstinner · 2023-07-17T15:42:02Z

See issue #31 "The C API is weakly typed" which is related.

vstinner · 2023-07-17T15:43:08Z

One practical problem is the lack of a real "debug mode" to debug bugs or crashes in the unsafe fast API: see issue #36 "Debug mode that can be easily activated without recompilation" for example.

vstinner · 2023-07-17T16:22:31Z

Issue #31 asks also to raise TypeError rather than SystemError in functions like PyTuple_Size(), PyList_Append() or PyDict_GetItemWithError() if the first parameter has an unexpected type (these functions expect respectively: tuple, list, dict; or subtypes of these types).

That's another difference: passing the wrong type to a fast API is treated as a bug in the C extension by calling PyErr_BadInternalCall() which raises a *SystemError.

Whereas generic safe API like PyObject_GetItem() raises TypeError if the argument is not a mapping (nor a sequence). That's not "a bug" but the "normal expected behavior". In Python, you get the same exception when using obj["key"] if obj is not a mapping (nor a sequence).

vstinner · 2023-07-17T17:00:32Z

CPython, Cython and pybind11 would be clear consumers of a fast API: they implement their own efficient checks in the calling site, so calling unsafe API is perfectly fine. Cython is used to use private APIs to get best performance; it even uses the internal API in some places.

For example, CPython uses such fast-path for the dict type:

            if (PyDict_CheckExact(ns))
                err = PyDict_SetItem(ns, name, v);
            else
                err = PyObject_SetItem(ns, name, v);

This code cannot fail because the first parameter has the wrong type, since the caller does check the type.

Consumers of the "Safe API" are all C extensions which use directly the C API, without using higher level API like Cython. This code is sometimes called "hand written" code. Using directly the C API is hard: bugs are common, especially related to reference counting. Any help to reduce the error rate is welcomed. It can be an opt-in debug mode to run a test suite in slower mode to catch some common known bugs.

Since there is no clear separation between the "Fast API" and the "Safe API", this category of users sometimes pick the "Fast API" "by mistake": either because they "believe" that it can make their extension way faster (it's worth it!), or because they didn't notice that a safer API is available.

When I write code with the Python C API, honestly, it's common that I just copy/paste code around without fully understanding its purpose. Since this code "just works", I'm happy that my new code also works. Sometimes, I make tiny changes, and oops, suddenly it does crash or leak a reference :-)

pitrou · 2023-07-24T09:55:40Z

Just a note: if you call it "fast API", everyone will instinctively try to use it. If you call it "intrusive API", people will be more cautious.

You could also call it "unsafe API", but it's only unsafe if you're using it wrong, of course.

It's more common for the "fast API" to rely on the compiler to produce efficient code (ex: inline code). Sadly, it may cause more portability issues (compiler not fully implementing a C standard or a specific function).

Do we care about non-conforming compilers?

vstinner · 2023-07-24T12:50:01Z

About the naming, IMO Unsafe is more a better name than "Fast" or "Unchecked". For example, I don't think that PyUnicodeFastCopyCharacters() was a clever choice. The other existing name is "Unchecked": PyThreadStateUncheckedGet().

"Unsafe" means that "it is unsafe unless you check assumptions in the caller". In many cases, it's easy to check these assumptions. For example, PyDict_GetItem() makes the assumption that you pass a dictionary. (Well, this one currently that the argument type is a dict.) It's common that function creates a dict or store a dict, and so know the argument type.

I prefer to mark unsafe functions as "Unsafe" (in their name) to help people discovering the API that: oh, maybe they should avoid these functions until they fully understand their usage.

The "PyUnstable" API seems to have a different purpose.

vstinner · 2023-07-24T12:51:30Z

Do we care about non-conforming compilers?

I don't think that you should pretend that putting "Fast" in a function name makes it "faster" :-) I'm not convinced that removing a few pre-conditions and avoiding error handling in general has a significant performance difference. Maybe avoiding indirections by using PyDict_GetItem() instead of PyObject_GetItem() makes a larger difference. I don't know. People just have believes and nobody runs benchmarks :-)

pitrou · 2023-07-24T13:31:18Z

Well, if you're accessing all (or most) elements in a list, avoiding checks every time probably makes a difference. If you're doing a single access, then it probably doesn't matter.

vstinner · 2024-09-19T14:30:51Z

Follow-up issue: capi-workgroup/api-evolution#45. I close this issue since there is no concrete action which can be triggered to enhance the situation. If I manage to design a more concrete action, I will open a new issue.

vstinner mentioned this issue Jul 17, 2023

The C API is weakly typed #31

Open

iritkatriel added the v label Jul 20, 2023

iritkatriel removed the v label Oct 23, 2023

vstinner closed this as not planned Won't fix, can't repro, duplicate, stale Sep 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

No clear separation between "fast API" (unsafe) and "safe API" #61

No clear separation between "fast API" (unsafe) and "safe API" #61

vstinner commented Jul 17, 2023 •

edited

Loading

vstinner commented Jul 17, 2023

vstinner commented Jul 17, 2023

vstinner commented Jul 17, 2023

vstinner commented Jul 17, 2023

pitrou commented Jul 24, 2023

vstinner commented Jul 24, 2023

vstinner commented Jul 24, 2023

pitrou commented Jul 24, 2023

vstinner commented Sep 19, 2024

No clear separation between "fast API" (unsafe) and "safe API" #61

No clear separation between "fast API" (unsafe) and "safe API" #61

Comments

vstinner commented Jul 17, 2023 • edited Loading

vstinner commented Jul 17, 2023

vstinner commented Jul 17, 2023

vstinner commented Jul 17, 2023

vstinner commented Jul 17, 2023

pitrou commented Jul 24, 2023

vstinner commented Jul 24, 2023

vstinner commented Jul 24, 2023

pitrou commented Jul 24, 2023

vstinner commented Sep 19, 2024

vstinner commented Jul 17, 2023 •

edited

Loading