Performance issue with custom_vec_iter_impl! NodeIndices #1090

Comments
In general there is a tradeoff we make with the custom return types. Basically, when you return one of them the data stays on the Rust side and only a thin wrapper object is created, instead of eagerly converting every element into a Python object. Now, all that being said, those custom return types are something we added years ago, and PyO3 was much less mature back then. It is entirely possible that the overhead of creating the Python objects directly is no longer what it was when we made that choice.
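To make the tradeoff concrete, here is a rough sketch of the two access patterns from the Python side (the graph constructor, size, and node choice below are illustrative assumptions, not taken from the issue):

```python
# Hedged illustration of the tradeoff; graph size and node choice are assumptions.
import rustworkx as rx

g = rx.generators.directed_path_graph(100_000)

# predecessor_indices returns a NodeIndices wrapper: cheap to create because
# the indices stay in a Rust Vec behind the wrapper.
idx = g.predecessor_indices(50_000)

# predecessors returns a plain Python list of node payloads: every element is
# converted to a Python object up front, which costs more at return time.
payloads = g.predecessors(50_000)

# The wrapper defers the conversion cost to access time: each idx[i] (and each
# iteration step) crosses the Rust/Python boundary again.
first = idx[0]
as_list = list(idx)  # pay the conversion once, then index the Python list
```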
I have a pretty good test environment set up, so I can do some checks on just a single iteration of the indices for a large number of items.
I ran a non-scientific experiment on my machine and I believe the optimizations Matthew introduced still hold true. Here is the code for getting the times:
Attempt 1

I left the original return type in place.
The results were:
Attempt 2

I modified the function to return a plain Python list instead.
The results were:
Corollary

Returning a Python list from Rust is slow, so returning the Rust object to make the function call itself return faster makes sense. However, accessing a specific element of the Rust object is very slow compared to accessing a specific element of a Python list.
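As a hedged, minimal sketch of the kind of comparison described (not the actual snippets from this experiment; the graph constructor and size are assumptions):

```python
# Sketch of the return-cost comparison described above; not the original benchmark.
import timeit

setup = "import rustworkx as rx; g = rx.generators.directed_path_graph(100_000)"

# Cost of just returning the result: thin wrapper vs. eager conversion to a list
print("return wrapper:", timeit.timeit("g.node_indices()", setup=setup, number=1_000))
print("return as list:", timeit.timeit("list(g.node_indices())", setup=setup, number=1_000))
```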
I think we should definitely profile the code referenced here (line 540 in 52a4d05). We need to see where it spends time and maybe work with the maintainers of PyO3 to see if we can make it faster. The current overhead is very high.
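As a first data point before breaking out a native profiler, the per-element overhead of that path can be measured from Python; the graph constructor and sizes below are assumptions, not from the issue:

```python
# Hedged sketch: isolate the per-element cost of indexing the Rust-backed
# NodeIndices wrapper versus a plain Python list with the same contents.
import timeit

setup = """
import rustworkx as rx
g = rx.generators.directed_path_graph(100_000)
idx = g.node_indices()   # Rust-backed NodeIndices
lst = list(idx)          # plain Python list with the same values
"""

print("NodeIndices[i]:", timeit.timeit("for i in range(100_000): idx[i]", setup=setup, number=10))
print("list[i]:       ", timeit.timeit("for i in range(100_000): lst[i]", setup=setup, number=10))
```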
It turns out that having

```rust
#[derive(FromPyObject)]
enum SliceOrInt {
    Slice(PySlice),
    Int(...),
}
```

means that PyO3 attempts to downcast to a slice first on every `__getitem__` call, even when the argument is a plain integer index. Flipping the order of the variants in the enum switched which path is the fast one for me. I think another change to potentially consider is adding custom, direct iterator types, e.g.:
```rust
#[pyclass]
struct NodeIndicesIter {
    base: Py<NodeIndices>,
    index: usize,
}

#[pymethods]
impl NodeIndicesIter {
    fn __next__(&mut self, py: Python) -> PyResult<Py<PyAny>> {
        // ...
    }

    fn __iter__(slf: PyRef<'_, Self>) -> PyRef<'_, Self> {
        slf
    }
}
```

which avoids any need to convert Python object arguments on each call. I haven't timed that, but I could do so if it's something you'd want to consider.
We have support like that with the mapping type macro right now (to get keys, values, and items iterators), so I think it'd be a good idea to add it to the vec ones too. I expect that will be a bit of a speed boost for a lot of cases since we avoid more Python/PyO3 overhead compared to going through `__getitem__` for each element. My unrelated idea to also speed things up here was maybe to cache the output list after the first conversion.
While working on `DAGDependencyV2` in Qiskit, I was using `digraph.predecessor_indices` and `digraph.predecessors` and noticed `predecessors` was significantly faster than `predecessor_indices`. Looking at the `digraph` code, it appeared `predecessor_indices` was simpler and therefore should have been faster.

I noticed that `predecessors` returned a `Vec` and `predecessor_indices` returned `NodeIndices`. I modified the latter function to return `Vec<usize>`, and the tests I was running for `DAGDependencyV2` went from 74 sec with `NodeIndices` to 12 sec with `Vec<usize>` when constructing an equivalent `DAGDependencyV2`.

Would appreciate any comments on why this is happening and whether this might be a wider problem.
Main code
Modified code