Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use retworkx #3810

Closed
wants to merge 44 commits into from
Closed

Use retworkx #3810

wants to merge 44 commits into from

Conversation

mtreinish
Copy link
Member

Summary

This PR switches the graph library used for the DAGCircuit class to use retworkx[1] instead of networkx. This provides a significant speed up over networkx because retworkx is written in rust instead of pure python.

[1] https://github.com/mtreinish/retworkx

Details and comments

Co-Authored-By: Lauren Capelluto [email protected]

mtreinish and others added 16 commits February 6, 2020 11:13
This commit is the start of the conversion from using networkx as the
internal structure of the dagcircuit to use retworkx instead.
The to_networkx() method is used to return a networkx graph object
representing the internal dag structure. Since the previous commit
switched this to retwork's PyDAG class this function was broken. To
enable the continued function of networkx based utilities (mainly the
dag drawer) this commit adds an implementation that iterates over PyDAG
twice (so it's not super efficient at O(2n)) to add the nodes and edges.
While this isn't the most effecient algorithm it works. To speed it up
in the future a function can be added to retworkx to either directly
generate the networkx object or alternatively generate the parameters so
it can be passed to networkx in a single call.
The predecessors and successors methods of the DAGCircuit class were
expecting an iterator returned not a list. Thist commit corrects the
return type by wrapping the list in iter().
The output of retworkx.dag_longest_path() returns the node indexes, not
the nodes themselves. This commit updates the DAGCircuit.longest_path()
method to make sure it's returning DAGNode objects, not the indexes.
The apply operation back conditional tests were using in_edges and
out_edges to check the structure of the dag. However with retworkx the
return type from those functions change to return the node id instead of
the node instance. This was causing the tests to fail because it was
attempting to compare a DAGNode object with an int. This commit fixes
the issue by making comparing DAGNode._node_id to the value of
in_edges() and out_edges().
Using python to deduplicate the output from retworkx in the dagcircuit
class gets increasinly inefficient as the number of nodes increases.
Relying on retworkx to deduplicate moving forward will be more
efficient.
… Update hardcoded test cases to be zero indexed to match retworkx
@mtreinish mtreinish changed the title [WIP] Use retworkx Use retworkx Feb 11, 2020
@ajavadia
Copy link
Member

I can see #2860 uses NetworkX max_weight_matching and floyd_warshall algorithms. Do you have any sense of how easy these are to implement?

#3762 (which is a restricted version of the above) also uses NetworkX's dfs_edges, floyd_warshall_numpy, find_cycle. Plus a bunch of other methods for building simple graphs for testing. Can it continue to use NetworkX (potentially taking the performance hit)?

@mtreinish
Copy link
Member Author

mtreinish commented Feb 11, 2020

I'll have to look into those algorithms to see how they work, but none of the algorithms I've looked at so far have been difficult to implement. We can open issues about those on the retworkx repo. dfs_edges should be very easy to implement since there is builtin traverser in the dag library used in retworkx: https://docs.rs/petgraph/0.5.0/petgraph/visit/struct.Dfs.html . find_cycle doesn't really apply to the PyDAG class because at least for right now it's checking that we don't introduce a cycle whenever an edge is added.

You can continue to use networkx with this pr, if you want it from the dagcircuit class you can do to_networkx() and it will return a networkx MultiDiGraph object like what was there before using retworkx: https://github.com/Qiskit/qiskit-terra/pull/3810/files#diff-feb434f18758792899804d6e18c57707R85-R94 I added this for compatibility with the dag drawer (which uses networkx). It's a one way transformation right now so DAGCircuit->networkx.MultiDiGraph but there is no reverse method for going from networkx to DAGCircuit here.

Copy link
Contributor

@itoko itoko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speed up without API change is great.
Looks good to me mostly. Just a few small comments.

qiskit/dagcircuit/dagcircuit.py Outdated Show resolved Hide resolved
qiskit/dagcircuit/dagcircuit.py Show resolved Hide resolved

def quantum_successors(self, node):
"""Returns iterator of the successors of a node that are
connected by a quantum edge as DAGNodes."""
for successor in self.successors(node):
for successor in reversed(list(self.successors(node))):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why reversed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was needed from the test results, I think the traversal order was different or something. But, there was a test case expecting the output in an order that was backwards from what retworkx was doing

Copy link
Member

@kdk kdk Feb 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the tests are wrong (e.g. asserting an order on an unordered output), it'd be better to fix the tests (and ideally validate that they still pass on networkx) rather than introduce workarounds in the code. If there are too many to tackle at once (which seems likely), can we collect them in an issue to address later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just removed this and reran the tests locally only one test is failing:

test.python.test_dagcircuit.TestDagOperations.test_quantum_successors
---------------------------------------------------------------------

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/home/mtreinish/git/qiskit/qiskit-core/test/python/test_dagcircuit.py", line 378, in test_quantum_successors
        self.assertEqual(next(successor_cnot).type, 'out')
      File "/usr/lib64/python3.8/unittest/case.py", line 912, in assertEqual
        assertion_func(first, second, msg=msg)
      File "/usr/lib64/python3.8/unittest/case.py", line 1292, in assertMultiLineEqual
        self.fail(self._formatMessage(msg, standardMsg))
      File "/usr/lib64/python3.8/unittest/case.py", line 753, in fail
        raise self.failureException(msg)
    AssertionError: 'op' != 'out'
    - op
    + out

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the failure here is related to @itoko's next comment on the using out_edges vs in_edges for the definition of edges.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I tried both updating the edges function and removing the reversed here individually and both introduced failures by themselves. But let me give fixing that func and removing this together and see if I can get that working

qiskit/dagcircuit/dagcircuit.py Outdated Show resolved Hide resolved
@ajavadia
Copy link
Member

Ok #3762 is going to merge soon, can you rebase after that? Hopefully not much needs changing. Then I'll take another look

@mtreinish
Copy link
Member Author

Ok #3762 is going to merge soon, can you rebase after that? Hopefully not much needs changing. Then I'll take another look

Taking a quick look at #3762 I don't think there will be any changes needed, since it only interacts with the dagcircuit api which doesn't change here, but I'll rebase it after it merges and keep an eye on it.

@mtreinish
Copy link
Member Author

I just ran the tests locally after rebasing and everything worked fine without any changes

test/python/test_dagcircuit.py Outdated Show resolved Hide resolved
As @itoko pointed out in review the edges() function in networkx was
using the outgoing edges from the graph not the incoming edges for it's
edges() function. This doesn't normally matter for the case of returning
edges for the entire dagcircuit because the list will be the same given
the structure. But, there was a possible edge case when edges() was
called with an output node or input node that would have behaved
differently. This was not caught in the tests because there are no
places in the code that currently do this, but there are potentially
users of the function relying on this behavior currently so it's better
to mirror the functionality. To ensure this remains consistent a test
case is added to verify input and output nodes behave as expected when
passed into edges().
There was as stary iter() wrapper around the output of
topological_nodes() in the tests. This was leftover from an earlier
iteration of the branch where topological_nodes() returned a list. This
has been fixed (to maintain compatibility with the current api) so the
iter is no longer needed.
@mtreinish
Copy link
Member Author

After our discussion earlier today I've been thinking more about how we can provide a fallback for the next release in case something unforeseen comes up. Besides merging this sooner so we have time to test and debug any issues that arise, I was thinking maybe adding a copy of the existing networkx DAGCircuit class and freezing changes to DAGCircuit's api for the rest of the release. Then when 0.13.0 is released we delete that fallback copy. So instead of having to keep up with changes and having a dual maintenance problem we just agree to not change the api until after the release (or an alternative abstraction layer on top of dagcircuit). Then if a user encounters an unforeseen issue with the new library they can just monkey patch circuit_to_dag and dag_to_circuit to use the fallback class until we get a bugfix release out. Thoughts @kdk @1ucian0 @ajavadia @ewinston ?

@kdk
Copy link
Member

kdk commented Feb 26, 2020

I'm on board with this idea. Two comments:

  • It would be better to have the networkx/retworkx toggle set by say an environment variable. Monkey patching the converters might not be entirely straightforward for users, and requires restarting a session. (We would have to check that the env variable propagates through e.g. parallel_map across platforms.)

  • For the rollout, I'd vote for having 1) announce retworkx as opt-in, 2) default retworkx+deprecate networkx, and 3) remove networkx as separate steps, at one release each. This puts the process relatively on par with our standard deprecation policy and allows some time for issues to arise and be reported.

For the dual maintenance issue, I don't expect it will be a huge problem, DAGCircuit doesn't generally see a ton of activity: https://github.com/Qiskit/qiskit-terra/commits/master/qiskit/dagcircuit/dagcircuit.py . CI coverage would be more difficult though. Maybe convert current CI jobs to retworkx and add one compatibility job for networkx.

@mtreinish
Copy link
Member Author

mtreinish commented Feb 26, 2020

I'm on board with this idea. Two comments:

* It would be better to have the networkx/retworkx toggle set by say an environment variable. Monkey patching the converters might not be entirely straightforward for users, and requires restarting a session. (We would have to check that the env variable propagates through e.g.  `parallel_map` across platforms.)

Sure we can do this, monkey patching just seemed the easiest to me if we documented it in the release note (since it wouldn't require adding anything to the code).

* For the rollout, I'd vote for having 1) announce retworkx as opt-in, 2) default retworkx+deprecate networkx, and 3) remove networkx as separate steps, at one release each. This puts the process relatively on par with our standard deprecation policy and allows some time for issues to arise and be reported.

I still think making it opt-out is a better approach here. This isn't really a deprecation it's a new feature (significantly improved performance) and we don't normally go through such a conservative process for new features. There will always be bugs (and often are in new features), but I agree that since this is such a core piece of terra being a bit more conservative makes sense. So I'd rather we give users a relief valve in case we're missing something and they encounter a bug. So we don't push out a release that breaks someone with no alternative. If we make it opt-in I doubt most users will ever see it until we flip the switch for real and then we're just delaying finding any potential bugs we're missing until that future date.

For the dual maintenance issue, I don't expect it will be a huge problem, DAGCircuit doesn't generally see a ton of activity: https://github.com/Qiskit/qiskit-terra/commits/master/qiskit/dagcircuit/dagcircuit.py . CI coverage would be more difficult though. Maybe convert current CI jobs to retworkx and add one compatibility job for networkx.

The reason I'm concerned about dual maintenance is that there are at least 2 or 3 PRs open right now besides this one which touch the DAGCircuit API: #3772, #3503, and #3860 (or docs in the case of this one). It's true in general there aren't many changes but I'd rather just have us make sure we're agreeing to minimize changes during the period of having multiple copies. Otherwise we're basically making ourselves duplicate all the work

@kdk
Copy link
Member

kdk commented Feb 27, 2020

* For the rollout, I'd vote for having 1) announce retworkx as opt-in, 2) default retworkx+deprecate networkx, and 3) remove networkx as separate steps, at one release each. This puts the process relatively on par with our standard deprecation policy and allows some time for issues to arise and be reported.

I still think making it opt-out is a better approach here. This isn't really a deprecation it's a new feature (significantly improved performance) and we don't normally go through such a conservative process for new features. There will always be bugs (and often are in new features), but I agree that since this is such a core piece of terra being a bit more conservative makes sense.

I think that's the key concern. Unlike adding a new feature, this is a restructuring that affects nearly every qiskit user.

So I'd rather we give users a relief valve in case we're missing something and they encounter a bug. So we don't push out a release that breaks someone with no alternative. If we make it opt-in I doubt most users will ever see it until we flip the switch for real and then we're just delaying finding any potential bugs we're missing until that future date.

The goal of a release of opt-in is increasing exposure beyond the terra development team, to the broader community and those with an active performance problem that could be addressed by retworkx. True there will always be bugs, but a rollout helps us gauge rough scale and severity, and more importantly, to learn early in the transition if there are users or use cases for which the new structure is difficult or not viable. From all the work on retworkx, we don't expect it will be, but there will always be unforeseen issues.

For the dual maintenance issue, I don't expect it will be a huge problem, DAGCircuit doesn't generally see a ton of activity: https://github.com/Qiskit/qiskit-terra/commits/master/qiskit/dagcircuit/dagcircuit.py . CI coverage would be more difficult though. Maybe convert current CI jobs to retworkx and add one compatibility job for networkx.

The reason I'm concerned about dual maintenance is that there are at least 2 or 3 PRs open right now besides this one which touch the DAGCircuit API: #3772, #3503, and #3860 (or docs in the case of this one). It's true in general there aren't many changes but I'd rather just have us make sure we're agreeing to minimize changes during the period of having multiple copies. Otherwise we're basically making ourselves duplicate all the work

There are a few ways we can manage dual implementations, but I wouldn't want to forgo a rollout to ease this process. (None of those three PRs are urgent.)

mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Mar 3, 2020
This commit defines the __slots__ property for the base Bit class.
__slots__ is used to explicitly declare the members of a class. Which
means that internally the class no longer is using __dict__ and reduces
both memory and speed overhead associated with that. [1] After
switching to retworkx in Qiskit#3810 bit.__eq__ becomes the biggest
bottleneck in performance. This will improve the runtime performance of
that.

[1] https://docs.python.org/3/reference/datamodel.html?#slots
mergify bot pushed a commit that referenced this pull request Mar 6, 2020
* Define __slots__ for bit class

This commit defines the __slots__ property for the base Bit class.
__slots__ is used to explicitly declare the members of a class. Which
means that internally the class no longer is using __dict__ and reduces
both memory and speed overhead associated with that. [1] After
switching to retworkx in #3810 bit.__eq__ becomes the biggest
bottleneck in performance. This will improve the runtime performance of
that.

[1] https://docs.python.org/3/reference/datamodel.html?#slots

* Regenerate pickles for new class format
@mtreinish
Copy link
Member Author

This has been superseded by #3916 which makes the DAGCircuit switchable and defaults to retworkx. This will give users an option to fallback to networkx for the 0.13.0 release. So I'm closing this I won't delete the branch though to keep this as a reference.

@mtreinish mtreinish closed this Mar 9, 2020
faisaldebouni pushed a commit to faisaldebouni/qiskit-terra that referenced this pull request Aug 5, 2020
* Define __slots__ for bit class

This commit defines the __slots__ property for the base Bit class.
__slots__ is used to explicitly declare the members of a class. Which
means that internally the class no longer is using __dict__ and reduces
both memory and speed overhead associated with that. [1] After
switching to retworkx in Qiskit#3810 bit.__eq__ becomes the biggest
bottleneck in performance. This will improve the runtime performance of
that.

[1] https://docs.python.org/3/reference/datamodel.html?#slots

* Regenerate pickles for new class format
@mtreinish mtreinish deleted the use-retworkx branch August 11, 2020 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants