Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add manual iterator implementations for custom vec iterables #1107

Merged
merged 6 commits into from
Feb 21, 2024

Conversation

jakelishman
Copy link
Member

@jakelishman jakelishman commented Feb 21, 2024

This adds manual Iterator implementors for the custom vec Iterable pyclasses, which were previously using the Python default iterator implementation that attempts calls to __getitem__ with successive integers until an IndexError is raised.

Doing this means that we no longer have any overhead from converting Python-space integer indices to Rust-space ones, because we just store a pointer to the backing vec internally and yield from that using an index we track ourselves.


Using the same setup code as discussed in #1096, with a manual iteration through afterwards:

import rustworkx as rx

N = 1_000_000
g = rx.PyGraph()
g.add_nodes_from(list(range(N)))
nodes = g.node_indexes()

%timeit for _ in nodes: pass

On main (which includes #1096), the timeit measured the iteration at ~78ms, and with this PR it dropped to ~48ms on my machine.

I didn't add all the new iterator structs to the typing information concretely, instead just referencing them opaquely as Iterator[T], because these objects only implement magic protocol methods and are not intended to be constructable / picklable / nameable from Python space at all (similar to how list_iterator isn't as a builtin). Open to changing that if preferred.

I'll mark this as:

but feel free to edit this to remove that link if you think there's more places to look for performance here.

This adds manual `Iterator` implementors for the custom vec `Iterable`
pyclasses, which were previously using the Python default iterator
implementation that attempts calls to `__getitem__` with successive
integers until an `IndexError` is raised.

Doing this means that we no longer have any overhead from converting
Python-space integer indices to Rust-space ones, because we just store
a pointer to the backing vec internally and yield from that using an
index we track ourselves.
@mtreinish mtreinish added this to the 0.15.0 milestone Feb 21, 2024
@coveralls
Copy link

coveralls commented Feb 21, 2024

Pull Request Test Coverage Report for Build 7995769315

Details

  • -18 of 93 (80.65%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.09%) to 96.517%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/iterators.rs 75 93 80.65%
Totals Coverage Status
Change from base Build 7982100393: -0.09%
Covered Lines: 16821
Relevant Lines: 17428

💛 - Coveralls

Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me, it's basically following what we did for mapping structs already with keys, values, and items but does it for the vec structs too. The performance improvements are really nice here too.

My only 2 thoughts are do you want to add a test using iter() explicitly in https://github.com/Qiskit/rustworkx/blob/main/tests/test_custom_return_types.py and also maybe a release not documenting the new feature?

@jakelishman
Copy link
Member Author

Sure, I can add those. Flicking quickly through the tests now, I suppose there's a question of whether we want to implement more efficient reversed iterators as well. That's generated separately to the sequence-protocol iterator, so what I've done here doesn't affect reversed(g.node_indices()) nor reversed(iter(g.node_indices()) (which continues to not work, which is valid behaviour).

I can throw in a reversed iterator for each of these if you like as well, but it equally might not be worth whatever increase in compilation times it comes with?

@jakelishman
Copy link
Member Author

jakelishman commented Feb 21, 2024

Actually: what kind of iter tests were you thinking? I deliberately made the actual iterator type unnameable from Python, so I can't really isinstance check it or anything like that, and there shouldn't be any real testable behaviour that differs from the implicit sequence-protocol iterator. Any test case that iterates through one of these nodes (which many existing tests do) is already calling iter on them, it's just implicit within CPython.

@jakelishman
Copy link
Member Author

Release node added in 8f9a991.

This matches the Python behaviour for the implicit sequence-protocol
iterators; they have only `__length_hint__` and not `__len__`.
@mtreinish
Copy link
Member

mtreinish commented Feb 21, 2024

Actually: what kind of iter tests were you thinking? I deliberately made the actual iterator type unnameable from Python, so I can't really isinstance check it or anything like that, and there shouldn't be any real testable behaviour that differs from the implicit sequence-protocol iterator. Any test case that iterates through one of these nodes (which many existing tests do) is already calling iter on them, it's just implicit within CPython.

I was just thinking of something that explicitly exercises it, mostly to say we did. I wasn't expecting a type check, I think the existing tests we have for this on mappings do something like:

test = iter(object)
result = list(test)
self.assertEqual(result, expected)

It might just be duplicate coverage as that implicitly happens in other tests, but I didn't see the harm in at least validating it explicitly

@mtreinish
Copy link
Member

I can throw in a reversed iterator for each of these if you like as well, but it equally might not be worth whatever increase in compilation times it comes with?

That seems worth it to me, personally I'm not too worried about rustworkx compile times. It might slow down development and CI a little bit, but we're not really bottlenecked on that (if it came down to it we could leverage debug mode a lot more and tweak things to optimize more for build speed). But, making sure the library is higher performance for python users is in general a high priority, so having faster reversed() iterator implementations makes sense to me.

@jakelishman
Copy link
Member Author

Ok, I've added the reversed iterators in 5f60d9e and explicit tests of both iter and reversed in b68aa5f. I tweaked the iter tests slightly so they compare list(x) == list(iter(x)), which doubles it as a test that the iterable is multiply consumable.

The timings of the reversed iterators are the same as the timings as the forwards iterators: this PR took the million-node NodeIndices iteration (with a reversed() around it) from about 76ms to about 47ms this time when I ran them.

Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for doing this.

.borrow(py)
.$data
.len()
.saturating_sub(self.index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's a good trick to comply with __length_hint__ not returning a negative. Although as a usize I guess without this in release mode we would just overflow which wouldn't result in a ValueError although the hint would be wildly inaccurate (although PEP424 says the return is "not required to be accurate") for an empty or consumed iterator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually wrote this initially for __len__ (which is required to be accurate) and reused it for __length_hint__, but then I remembered / realised that iterators don't typically implement __len__ so just renamed the method.

@mtreinish mtreinish merged commit bf3f703 into Qiskit:main Feb 21, 2024
25 checks passed
@jakelishman jakelishman deleted the custom-iter-manual branch February 21, 2024 22:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance issue with custom_vec_iter_impl! NodeIndices
3 participants