Performance of FromPyObject::extract #2968

hombit · 2023-02-20T20:31:32Z

I'm working on a Python package which uses numpy arrays as its primary input/output objects. I need to support generic numpy float arrays, so I implemented a simple enum:

#[derive(FromPyObject)]
pub enum GenericFloatArray1<'a> {
    #[pyo3(transparent, annotation = "np.ndarray[float32]")]
    Float32(Arr<'a, f32>),
    #[pyo3(transparent, annotation = "np.ndarray[float64]")]
    Float64(Arr<'a, f64>),
}

https://github.com/light-curve/light-curve-python/blob/abaa7d0970ce83dd955c0992a1bafddb768629b8/light-curve/src/np_array.rs#L7-L13
https://github.com/hombit/pyo3-enum-frompyobject-bench/blob/76f279e1d4782dc55770bdd252294f198c1801d4/src/lib.rs#L7-L13

However, I've found that the inter-op overhead of my Python package is pretty large. To debug this issue, I created a repository where I benchmark #[derive(FromPyObject)] against other extraction strategies. There I have one more enum for testing:

#[derive(FromPyObject)]
pub enum Collection<'a> {
    #[pyo3(transparent, annotation = "list")]
    List(&'a PyList),
    #[pyo3(transparent, annotation = "tuple")]
    Tuple(&'a PyTuple),
    #[pyo3(transparent, annotation = "set")]
    Set(&'a PySet),
}

https://github.com/hombit/pyo3-enum-frompyobject-bench/blob/76f279e1d4782dc55770bdd252294f198c1801d4/src/lib.rs#L15-L23

In both cases I've found that ::extract on these enums is very slow when the input object doesn't correspond to the first enum variant. The performance differs by a factor of ~100! For instance:

// x is a PyObject

// x is a list
let y: &PyList = x.downcast(py)?; // 961.09 ps
let y: Collection = x.extract(py)?; // 5.1340 ns

// x is a set
let y: &PySet = x.downcast(py)?; // 1.2817 ns
let y: Collection = x.extract(py)?; // 650.84 ns
let y = if let Ok(x) = x.extract(py) {
    Collection::List(x)
} else if let Ok(x) = x.extract(py) {
    Collection::Tuple(x)
} else if let Ok(x) = x.extract(py) {
    Collection::Set(x)
} else {
    return Err(PyErr::new::<PyTypeError, _>(
        "Expected a list, tuple or set",
    ));
}; // 48.850 ns

See the full code in the benchmark repository linked above; you can run it with cargo bench from the root of the repo.

As you can see, even a manual implementation of the dynamic dispatch for this Collection object is ~10 times faster than the derived one.

I've found a related discussion in #2278, but it is still unclear what I can do to make my code faster.

Answered by adamreichold

adamreichold · 2023-02-20T20:46:14Z

Maintainer

This is a known footgun of how the FromPyObject trait is defined, insofar as it has to return a PyResult whereas downcast returns a Result<_, PyDowncastError>. Wrapping the PyDowncastError into a PyErr is the cost you are seeing.

The best workaround at the moment is to manually call downcast in if-let-else-if-let chains. Note that PyReadonlyArray is not a valid target for downcast though, i.e. you would need to downcast into PyArray first and then manually call readonly on it.
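To make the cost argument concrete, here is a toy, std-only sketch of the two dispatch strategies. It is not PyO3 code: `Object`, `CheapError`, `HeavyError`, `extract_eager` and `extract_lazy` are hypothetical names invented for illustration, and the "eager" function is only a loose simplification of what a derived implementation might do, not its actual generated code. The assumption is that a failed downcast is cheap while turning it into a full exception-like error is expensive, so the sketch simply counts how many expensive errors each strategy builds.

```rust
// Toy, std-only model. Every name is hypothetical; none of this is PyO3 API.

/// Stand-in for a dynamically typed Python object.
#[derive(Debug)]
pub enum Object {
    List(Vec<i64>),
    Tuple(Vec<i64>),
    Set(Vec<i64>),
    Other,
}

/// Cheap failure (role of the downcast error): a static string, no allocation.
#[derive(Debug)]
pub struct CheapError {
    expected: &'static str,
}

/// Costly failure (role of a full exception object): owns a heap-allocated message.
#[derive(Debug)]
pub struct HeavyError {
    message: String,
}

impl HeavyError {
    /// Builds a heavy error and bumps `built` so callers can count constructions.
    fn new(message: String, built: &mut usize) -> Self {
        *built += 1;
        HeavyError { message }
    }
}

#[derive(Debug, PartialEq)]
pub enum Collection<'a> {
    List(&'a [i64]),
    Tuple(&'a [i64]),
    Set(&'a [i64]),
}

fn as_list(ob: &Object) -> Result<&[i64], CheapError> {
    match ob {
        Object::List(items) => Ok(items.as_slice()),
        _ => Err(CheapError { expected: "list" }),
    }
}

fn as_tuple(ob: &Object) -> Result<&[i64], CheapError> {
    match ob {
        Object::Tuple(items) => Ok(items.as_slice()),
        _ => Err(CheapError { expected: "tuple" }),
    }
}

fn as_set(ob: &Object) -> Result<&[i64], CheapError> {
    match ob {
        Object::Set(items) => Ok(items.as_slice()),
        _ => Err(CheapError { expected: "set" }),
    }
}

/// Eager strategy: every failed attempt is immediately wrapped into a HeavyError,
/// even if a later variant succeeds and the error is thrown away.
pub fn extract_eager<'a>(ob: &'a Object, built: &mut usize) -> Result<Collection<'a>, HeavyError> {
    let mut failures: Vec<HeavyError> = Vec::new();
    match as_list(ob) {
        Ok(items) => return Ok(Collection::List(items)),
        Err(e) => failures.push(HeavyError::new(format!("expected {}", e.expected), built)),
    }
    match as_tuple(ob) {
        Ok(items) => return Ok(Collection::Tuple(items)),
        Err(e) => failures.push(HeavyError::new(format!("expected {}", e.expected), built)),
    }
    match as_set(ob) {
        Ok(items) => return Ok(Collection::Set(items)),
        Err(e) => failures.push(HeavyError::new(format!("expected {}", e.expected), built)),
    }
    // All variants failed: combine the individual messages into one final error.
    let joined = failures.iter().map(|f| f.message.as_str()).collect::<Vec<_>>().join("; ");
    Err(HeavyError::new(joined, built))
}

/// Lazy strategy (if-let-else-if-let chain): cheap errors are dropped, and a
/// HeavyError is built only once, when every variant has failed.
pub fn extract_lazy<'a>(ob: &'a Object, built: &mut usize) -> Result<Collection<'a>, HeavyError> {
    if let Ok(items) = as_list(ob) {
        Ok(Collection::List(items))
    } else if let Ok(items) = as_tuple(ob) {
        Ok(Collection::Tuple(items))
    } else if let Ok(items) = as_set(ob) {
        Ok(Collection::Set(items))
    } else {
        Err(HeavyError::new("expected a list, tuple or set".to_string(), built))
    }
}
```

In this model, an `Object::Set` input makes `extract_eager` build two heavy errors before it succeeds, while `extract_lazy` builds none. That mirrors the benchmark only qualitatively: inputs matching the first variant are fast either way, and later variants pay for every discarded error.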

6 replies

hombit Feb 20, 2023
Author

Ah, right, I cannot accept PyArray in my function because I don't know the concrete dtype...

adamreichold Feb 20, 2023
Maintainer

I would expect your #[pyfunction]s to accept &PyAny which you then downcast to e.g. &PyArray1<f32> and then call .readonly() on this.

If you only need to accept a single type of value, you can of course also have your #[pyfunction] take &PyArray1<f32> directly. But you would not have used a #[derive(FromPyObject)] enum in that case anyway, so I am not sure this is relevant. (And there is no way to accept something like PyArray1<f32_or_f64>.)

hombit Feb 20, 2023
Author

Yes, you are right, I need polymorphism for the dtype.

davidhewitt Feb 21, 2023
Maintainer

Indeed, I think there's a possible opportunity for something like PyAnyArray which has no generic parameters (neither T nor dimensions). IIRC pybind11 supports this (might have just called it py::array, and had py::array_t<T> for the typed case).

hombit Feb 21, 2023
Author

The idea of a generic array is great and useful for many purposes. However, it would still be great to have a faster implementation of #[derive(FromPyObject)] that doesn't create expensive Python exception objects which are never used.
