Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No clear separation between "fast API" (unsafe) and "safe API" #61

Closed
vstinner opened this issue Jul 17, 2023 · 9 comments
Closed

No clear separation between "fast API" (unsafe) and "safe API" #61

vstinner opened this issue Jul 17, 2023 · 9 comments

Comments

@vstinner
Copy link
Contributor

vstinner commented Jul 17, 2023

For historical reasons, the Python C API is a mixture of various kinds of functions:

  • Internal functions, exposed simply because it was the only way to be able to use it in CPython itself
  • Private functions, supposed to be not be used by 3rd party code, but sometimes added on purpose since we couldn't agree if it should be added or not
  • Clean and safe public API which is designed to be stable accross Python versions

In Python 3.7, a new Include/internal/ subdirectory was added: internal API which must not be used outside CPython have been moved slowly (one by one) there. Most of these functions are no longer exported: so technically it's no longer possible to use them outside Python. Moreover, the Py_BUILD_CORE macro must be defined to access it: it's an opt-in option (similar to issue #54 idea) to clearly point out that: hey, you're in the danger zone!

Recently, in Python 3.12, the concept of "Unstable API" was added: PEP 689 – Unstable C API tier. This one is a middleground for function that we want to expose as "public", but are known to be unstable between Python versions, like PyCore_New() which became: **PyUnstable_**Code_New().

There is still a gray area between "fast API" and "safe API".

Examples of unsafe fast APIs vs safe APIs:

  • PyTuple_GET_ITEM() vs PyTuple_GetItem() (runtime type check, check index bounds)
  • _PyThreadState_UncheckedGet() (can return NULL) vs PyThreadState_Get() (fatal error instead of returning NULL)

Differences between "fast API" and "safe API":

  • Fast API usually avoid runtime type check and avoid index bounds check, assertions may be used. Safe API implement runtime checks.
  • Fast API can sometimes implemented as macro or static inline function (ex: access structure members). Safe API is usually implemente as opaque function call.

It's more common for the "fast API" to rely on the compiler to produce efficient code (ex: inline code). Sadly, it may cause more portability issues (compiler not fully implementing a C standard or a specific function).

@vstinner
Copy link
Contributor Author

See issue #31 "The C API is weakly typed" which is related.

@vstinner
Copy link
Contributor Author

One practical problem is the lack of a real "debug mode" to debug bugs or crashes in the unsafe fast API: see issue #36 "Debug mode that can be easily activated without recompilation" for example.

@vstinner
Copy link
Contributor Author

Issue #31 asks also to raise TypeError rather than SystemError in functions like PyTuple_Size(), PyList_Append() or PyDict_GetItemWithError() if the first parameter has an unexpected type (these functions expect respectively: tuple, list, dict; or subtypes of these types).

That's another difference: passing the wrong type to a fast API is treated as a bug in the C extension by calling PyErr_BadInternalCall() which raises a *SystemError.

Whereas generic safe API like PyObject_GetItem() raises TypeError if the argument is not a mapping (nor a sequence). That's not "a bug" but the "normal expected behavior". In Python, you get the same exception when using obj["key"] if obj is not a mapping (nor a sequence).

@vstinner
Copy link
Contributor Author

CPython, Cython and pybind11 would be clear consumers of a fast API: they implement their own efficient checks in the calling site, so calling unsafe API is perfectly fine. Cython is used to use private APIs to get best performance; it even uses the internal API in some places.

For example, CPython uses such fast-path for the dict type:

            if (PyDict_CheckExact(ns))
                err = PyDict_SetItem(ns, name, v);
            else
                err = PyObject_SetItem(ns, name, v);

This code cannot fail because the first parameter has the wrong type, since the caller does check the type.


Consumers of the "Safe API" are all C extensions which use directly the C API, without using higher level API like Cython. This code is sometimes called "hand written" code. Using directly the C API is hard: bugs are common, especially related to reference counting. Any help to reduce the error rate is welcomed. It can be an opt-in debug mode to run a test suite in slower mode to catch some common known bugs.

Since there is no clear separation between the "Fast API" and the "Safe API", this category of users sometimes pick the "Fast API" "by mistake": either because they "believe" that it can make their extension way faster (it's worth it!), or because they didn't notice that a safer API is available.

When I write code with the Python C API, honestly, it's common that I just copy/paste code around without fully understanding its purpose. Since this code "just works", I'm happy that my new code also works. Sometimes, I make tiny changes, and oops, suddenly it does crash or leak a reference :-)

@iritkatriel iritkatriel added the v label Jul 20, 2023
@pitrou
Copy link

pitrou commented Jul 24, 2023

Just a note: if you call it "fast API", everyone will instinctively try to use it. If you call it "intrusive API", people will be more cautious.

You could also call it "unsafe API", but it's only unsafe if you're using it wrong, of course.

It's more common for the "fast API" to rely on the compiler to produce efficient code (ex: inline code). Sadly, it may cause more portability issues (compiler not fully implementing a C standard or a specific function).

Do we care about non-conforming compilers?

@vstinner
Copy link
Contributor Author

About the naming, IMO Unsafe is more a better name than "Fast" or "Unchecked". For example, I don't think that PyUnicodeFastCopyCharacters() was a clever choice. The other existing name is "Unchecked": PyThreadStateUncheckedGet().

"Unsafe" means that "it is unsafe unless you check assumptions in the caller". In many cases, it's easy to check these assumptions. For example, PyDict_GetItem() makes the assumption that you pass a dictionary. (Well, this one currently that the argument type is a dict.) It's common that function creates a dict or store a dict, and so know the argument type.

I prefer to mark unsafe functions as "Unsafe" (in their name) to help people discovering the API that: oh, maybe they should avoid these functions until they fully understand their usage.

The "PyUnstable" API seems to have a different purpose.

@vstinner
Copy link
Contributor Author

Do we care about non-conforming compilers?

I don't think that you should pretend that putting "Fast" in a function name makes it "faster" :-) I'm not convinced that removing a few pre-conditions and avoiding error handling in general has a significant performance difference. Maybe avoiding indirections by using PyDict_GetItem() instead of PyObject_GetItem() makes a larger difference. I don't know. People just have believes and nobody runs benchmarks :-)

@pitrou
Copy link

pitrou commented Jul 24, 2023

Well, if you're accessing all (or most) elements in a list, avoiding checks every time probably makes a difference. If you're doing a single access, then it probably doesn't matter.

@iritkatriel iritkatriel removed the v label Oct 23, 2023
@vstinner
Copy link
Contributor Author

Follow-up issue: capi-workgroup/api-evolution#45. I close this issue since there is no concrete action which can be triggered to enhance the situation. If I manage to design a more concrete action, I will open a new issue.

@vstinner vstinner closed this as not planned Won't fix, can't repro, duplicate, stale Sep 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants