Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduce trampolines for methods #2705

Merged
merged 6 commits into from
Oct 25, 2022
Merged

Conversation

davidhewitt
Copy link
Member

This is a refactoring to the generated #[pyfunction] and #[pymethods] with the intention of reducing compile time and binary size.

The motivation for the change is that cargo llvm-lines currently reports many monomorphizations of std::panicking::try::do_call and friends being instantiated - in fact, it's one per #[pyfunction] and every method in #[pymethods], which is quite a lot of instantiations.

The change follows a similar idea to #2478 changing extract_argument to make use of "dynamic dispatch". Here, we create "trampoline" wrappers for all possible C-API function pointers we expose to Python. These wrappers take a function pointer to the real implementation of the function in question, and call it after starting the std::panic::catch_unwind machiner and creating a GILPool.

Much like in #2704 I suspect LLVM can do a pretty good job at optimizing out the dynamic dispatch. I saw no change at all in benchmarks for the pytests crate. So overall I believe this should have a compile-time win with little-to-no impact on runtime performance.

For our pytests crate I get a 10% reduction in cargo llvm-lines count, and for real-world use cases potentially with many small function wrappers (e.g. getters / setters) I suspect the reduction could be even greater.

Detail-wise, for some method MyClass::foo, the generated code change from a single C-ABI symbol:

unsafe extern "C" fn __pymethod_foo__(
    slf: *mut ffi::PyObject,
    args: *const *mut ffi::PyObject,
    nargs: ffi::Py_ssize_t,
    kwnames: *mut ffi::PyObject,
) -> *mut ffi::PyObject {
    let gil = GILPool::new();
    let py = gil.python();
    pyo3::callback::panic_result_into_callback_output(
        py,
        panic::catch_unwind(move || -> PyResult<_> {
            /* implementation */
        }),
    )
}

to an C-ABI symbol which just calls the trampoline wrapper with a pointer to the actual implementation.

unsafe extern "C" fn trampoline(
    slf: *mut ffi::PyObject,
    args: *const *mut ffi::PyObject,
    nargs: ffi::Py_ssize_t,
    kwnames: *mut ffi::PyObject,
) -> *mut ffi::PyObject
{
    impl_::trampoline::fastcall_with_keywords(
        slf,
        args,
        nargs,
        kwnames,
        MyClass::__pymethod_foo__,
    )
}
unsafe fn __pymethod_foo__<'py>(
    py: Python<'py>,
    slf: *mut ffi::PyObject,
    args: *const *mut ffi::PyObject,
    nargs: ffi::Py_ssize_t,
    kwnames: *mut ffi::PyObject,
) -> PyResult<*mut ffi::PyObject> {
    /* implementation */
}

src/impl_/pyclass.rs Outdated Show resolved Hide resolved
@adamreichold
Copy link
Member

It is not really related, but I had a look while reviewing: Doesn't panic_result_into_callback_output need a PanicTrap as well since PanicException::from_panic_payload drops the payload which could make a panic in its Drop impl reach the FFI boundary?

Copy link
Member

@adamreichold adamreichold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great code bloat reduction! A nit and an unrelated question, but this looks good to me.

@davidhewitt
Copy link
Member Author

It is not really related, but I had a look while reviewing: Doesn't panic_result_into_callback_output need a PanicTrap as well since PanicException::from_panic_payload drops the payload which could make a panic in its Drop impl reach the FFI boundary?

Ahh, yes, when working on PanicTrap was one of the moments when I was thinking a refactor like this could be in order... Then I forgot to install a PanicTrap here. Let me push 😆

@@ -571,21 +558,66 @@ impl<'a> FnSpec<'a> {
CallingConvention::Noargs => quote! {
_pyo3::impl_::pymethods::PyMethodDef::noargs(
#python_name,
_pyo3::impl_::pymethods::PyCFunction(#wrapper),
_pyo3::impl_::pymethods::PyCFunction({
unsafe extern "C" fn trampoline(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be cool not to have to repeat the (almost) identical signature here and in the trampoline fn. But the required macro trickery isn't worth it, likely?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this, however I'd like to defer it to a follow-up PR sometime (there's a fair bit of refactoring which could improve the macro code, and I'd like it not to distract readers of this PR from the behavioural change).

@davidhewitt davidhewitt merged commit 747d791 into PyO3:main Oct 25, 2022
@davidhewitt davidhewitt deleted the trampoline branch October 25, 2022 06:22
bors bot added a commit that referenced this pull request May 9, 2023
3029: use dynamic trampoline for all getters and setters r=adamreichold a=davidhewitt

This is an extension to the "trampoline" changes made in #2705 to re-use a single trampoline for all `#[getter]`s (and similar for all `#[setters]`). It works by setting the currently-unused `closure` member of the `ffi::PyGetSetDef` structure to point at a new `struct GetSetDefClosure` which contains function pointers to the `getter` / `setter` implementations.

A universal trampoline for all `getter`, for example, then works by reading the actual getter implementation out of the `GetSetDefClosure`.

Advantages of doing this:
- Very minimal simplification to the macro code / generated code size. It made a 4.4% reduction to `test_getter_setter` generated size, which is an exaggerated result as most code will probably have lots of bulk that isn't just the macro code.

Disadvantages:
- Additional level of complexity in the `getter` and `setter` trampolines and accompanying code.
- To keep the `GetSetDefClosure` objects alive, I've added them to the static `LazyTypeObject` inner.
- Very slight performance overhead at runtime (shouldn't be more than an additional pointer read). It's so slight I couldn't measure it.

Overall I'm happy to either merge or close this based on what reviewers think!

Co-authored-by: David Hewitt <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants