Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add custom iterator class for BFS successors return #185

Merged
merged 9 commits into from
Nov 11, 2020

Conversation

mtreinish
Copy link
Member

@mtreinish mtreinish commented Oct 28, 2020

This commit changes the return type of the bfs_successors function to be
a custom class BFSSuccessors. This new return class implements both the
sequence protocol and iterator protocol. This means that aside from
explicit type checking it should be backwards compatible with the list
being previously returned. It can be used with either index based access
or iterated over.

This should be more efficient for large graphs because instead of doing
the copy and type conversion and iterating over the entire nested Vec of
results it instead does it per access (either via __getitem__ or
__next__). It does add a small amount of overhead for smaller graphs but
it is minimal since the function returns in microseconds in such cases
so a 10-20% overhead is not a big dea).

It's worth noting while this defers the type conversion, it does not
defer execution like most python iterators normally do. When
bfs_successors is called it will still always fully traverse the graph.
However, in practice the bottleneck for the bfs_successor function
wasn't actually the graph traversal, but instead the type conversion.

Related to #71

This commit changes the return type of the bfs_successors function to be
a custom class BFSSuccessors. This new return class implements both the
sequence protocol and iterator protocol. This means that aside from
explicit type checking it should be backwards compatible with the list
being previously returned. It can be used with either index based access
or iterated over.

This should be more efficient for large graphs because instead of doing
the copy and type conversion and iterating over the entire nested Vec of
results it instead does it per access (either via __getitem__ or
__next__). It does add a small amount of overhead for smaller graphs but
it is minimal since the function returns in microseconds in such cases
so a 10-20% overhead is not a big deal.

It's worth noting while this defers the type conversion, it does not
defer execution like most python iterators normally do. When
bfs_successors is called it will still always fully traverse the graph.
However, in practice the bottleneck for the bfs_successor function
wasn't actually the graph traversal, but instead the type conversion.

Related to Qiskit#71
@coveralls
Copy link

coveralls commented Oct 28, 2020

Pull Request Test Coverage Report for Build 358619898

  • 18 of 18 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.03%) to 94.46%

Totals Coverage Status
Change from base Build 356454191: 0.03%
Covered Lines: 2660
Relevant Lines: 2816

💛 - Coveralls

@mtreinish mtreinish added this to the 0.6.0 milestone Oct 28, 2020
@mtreinish mtreinish mentioned this pull request Oct 28, 2020
Copy link
Member

@kdk kdk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Out of curiosity, would it be more natural to implement either the sequence protocol (keeping the lazy copy and type conversion) or the iterator protocol? Implementing both seems better for backwards compatibility, but it seems like the differences in the two protocols might lead non-obvious behavior (see one comment below.)

tests/test_pred_succ.py Show resolved Hide resolved
src/iterators.rs Show resolved Hide resolved
@mtreinish
Copy link
Member Author

This looks good. Out of curiosity, would it be more natural to implement either the sequence protocol (keeping the lazy copy and type conversion) or the iterator protocol? Implementing both seems better for backwards compatibility, but it seems like the differences in the two protocols might lead non-obvious behavior (see one comment below.)

That was my first thought on this PR too, and I started with just that. But IIRC the issue wass that just the sequence protocol doesn't actually work with iter(BFSSuccessors). But I was trying a couple of different things at once, and I might be misremembering. Let me try dropping the iter protocol and see what happens.

@mtreinish
Copy link
Member Author

Ok, yeah I took a look at this locally and yeah we can remove the iter protocol implementation and it will still work. You'll need to call iter() around the return to use it as an iterator (which is probably where I got confused before). With that it works fine. I'll make this change in a sec (just need to clean up the docs).

Using the sequence protocol we can still get an implicit iterator by
just casting it on the python side. This will still get us the lazy type
conversion but simplify the api and also make the behavior more
consistent.

At the same time to ensure we're handling negative indices correctly a
test method is added to verify that a negative index access to the
sequence works as expected.
src/lib.rs Outdated Show resolved Hide resolved
Co-authored-by: Kevin Krsulich <[email protected]>
@mtreinish mtreinish merged commit 635bf2e into Qiskit:master Nov 11, 2020
@mtreinish mtreinish deleted the test-bfs_iterator branch November 11, 2020 22:55
mtreinish added a commit to mtreinish/retworkx that referenced this pull request Nov 16, 2020
This commit adds a new return type NodeIndices that is used as the
return type for functions that return Vec<usize> when its used as the
return type to return a list of node indices. This custom type implements
the Python sequence protocol and can be used as the list which was
previously returned except for where explicit type checking was used.
Just as in Qiskit#185 the advantage here is that for large lists of node
indices the type conversion from a Vec<usize> to list(int) becomes a
large bottleneck. This avoids that conversion and only does a usize to
python int conversion on access.

Related to Qiskit#71
mtreinish added a commit that referenced this pull request Nov 19, 2020
* Add NodeIndices return type

This commit adds a new return type NodeIndices that is used as the
return type for functions that return Vec<usize> when its used as the
return type to return a list of node indices. This custom type implements
the Python sequence protocol and can be used as the list which was
previously returned except for where explicit type checking was used.
Just as in #185 the advantage here is that for large lists of node
indices the type conversion from a Vec<usize> to list(int) becomes a
large bottleneck. This avoids that conversion and only does a usize to
python int conversion on access.

Related to #71

* Add tests for comparison

* Apply suggestions from code review

Co-authored-by: Lauren Capelluto <[email protected]>
mtreinish added a commit to mtreinish/retworkx that referenced this pull request Nov 19, 2020
This commit adds 2 new return types EdgeList and WeightedEdgeList
that are used as the return type for functions that return
Vec<(usize, usize)> and Vec<(usize, usize, PyObject)> respectively.
This custom type implements the Python sequence protocol and can be
used as the list which was previously returned except for where
explicit type checking was used. Just as in Qiskit#185 and Qiskit#198 the
advantage here is that for large edge lists the conversion from
Rust to a Python type becomes a large bottleneck. This avoids that
conversion and only does it per element on access.
mtreinish added a commit that referenced this pull request Dec 2, 2020
* Add EdgeList and WeightedEdgeList return types

This commit adds 2 new return types EdgeList and WeightedEdgeList
that are used as the return type for functions that return
Vec<(usize, usize)> and Vec<(usize, usize, PyObject)> respectively.
This custom type implements the Python sequence protocol and can be
used as the list which was previously returned except for where
explicit type checking was used. Just as in #185 and #198 the
advantage here is that for large edge lists the conversion from
Rust to a Python type becomes a large bottleneck. This avoids that
conversion and only does it per element on access.

* Add new return types to docs

* Apply suggestions from code review

Co-authored-by: Lauren Capelluto <[email protected]>

* Fix docstring mistakes

* Update in_edges and out_edges to use WeightedEdgeList
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants