Remove recursion from ConstrainedReschedule pass #10051
The ConstrainedReschedule pass previously used a recursive depth-first traversal to push back overlapping gates after aligning operations. For a sufficiently large circuit, however, this could fail because the recursion depth could exceed Python's maximum stack depth. To address this, this commit rewrites the depth-first traversal to be iterative instead of recursive. This removes the stack depth limitation and should let the pass run on circuits of any size. However, the performance of this pass is still poor for large circuits. One thing we can look at to speed it up is rustworkx's dfs_search() function, which would let us shift the traversal to Rust and call back to Python to do the timing offsets. If this is insufficient we'll have to investigate a different algorithm for adjusting the time that doesn't require multiple iterations like the current approach. Fixes Qiskit#10049
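For readers unfamiliar with the transformation, here is a minimal sketch of turning a recursive push-back into a stack-based one. It is a toy model, not the code from this PR: the `succ`, `start`, and `duration` dictionaries and both function names are assumptions made up for illustration, and the real pass works on DAG nodes with separate qubit and clbit timing.

```python
def push_back_recursive(succ, start, duration, node, earliest):
    """Delay `node` to `earliest`, then delay any successor it now overlaps."""
    if start[node] >= earliest:
        return  # no overlap, nothing to push
    start[node] = earliest
    end = earliest + duration[node]
    for nxt in succ.get(node, ()):
        push_back_recursive(succ, start, duration, nxt, end)


def push_back_iterative(succ, start, duration, node, earliest):
    """Same traversal, using an explicit stack instead of the call stack."""
    stack = [(node, earliest)]
    while stack:
        cur, cur_earliest = stack.pop()
        if start[cur] >= cur_earliest:
            continue  # checked at pop time, so earlier shifts are respected
        start[cur] = cur_earliest
        end = cur_earliest + duration[cur]
        # Push in reverse so successors are popped in their original order.
        for nxt in reversed(succ.get(cur, ())):
            stack.append((nxt, end))


# Diamond: a -> b -> d and a -> c -> d (hypothetical timing data).
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
duration = {"a": 2, "b": 3, "c": 1, "d": 1}
initial = {"a": 0, "b": 2, "c": 2, "d": 5}

rec = dict(initial)
push_back_recursive(succ, rec, duration, "a", 4)
it = dict(initial)
push_back_iterative(succ, it, duration, "a", 4)

# A chain far deeper than Python's default recursion limit.
n = 50_000
chain_succ = {i: [i + 1] for i in range(n - 1)}
chain_start = {i: i for i in range(n)}
chain_duration = {i: 1 for i in range(n)}
push_back_iterative(chain_succ, chain_start, chain_duration, 0, 10)
```

In this model both versions give the same start times on the diamond, and only the iterative one is safe to call on the long chain. Note that the overlap is checked when an entry is popped rather than when it is pushed; computing the shift at push time could apply a stale value to a node that another path has already moved.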
Thank you for opening a new pull request. Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient. While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone. One or more of the following people are requested to review this:
So far I have only taken a quick look at the old and the new code and was about to comment that I did not see "visited node tracking" in the old code, and it's already removed from the new code. Modulo that, it seems quite a straightforward way to convert a recursive approach into a stack-based approach (but I would like to look at the code more carefully). Hmm, because there is no visited node tracking, it seems that some nodes might be examined multiple times, leading to a potentially quadratic complexity. Does this make sense? That is, we don't have cycles but we still have reconvergent paths. Would a more BFS-like approach make the complexity linear?
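To make the reconvergent-path concern concrete, here is a small sketch that counts how often nodes are processed by a plain DFS with and without visited tracking. The graph builder and function names are hypothetical, and the traversal is unconditional: it does not model the overlap checks in the pass, which may cut repeated visits short, so the count below is a worst case for this toy rather than a measurement of the pass.

```python
def count_visits(succ, root, track_visited):
    """Iterative DFS from `root`; returns how many times a node is processed."""
    visits = 0
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if track_visited:
            if node in seen:
                continue
            seen.add(node)
        visits += 1
        stack.extend(succ.get(node, ()))
    return visits


def diamond_chain(k):
    """Build k diamonds in series; each bottom node is the next diamond's top."""
    succ = {}
    for i in range(k):
        top, left, right, bottom = f"t{i}", f"l{i}", f"r{i}", f"t{i + 1}"
        succ[top] = [left, right]
        succ[left] = [bottom]
        succ[right] = [bottom]
    return succ


graph = diamond_chain(3)  # 10 distinct nodes, no cycles
with_tracking = count_visits(graph, "t0", track_visited=True)
without_tracking = count_visits(graph, "t0", track_visited=False)
```

With tracking, each of the 10 nodes is processed once. Without it, every path into a node triggers another visit, so the three-diamond graph already needs 29 visits, and each extra diamond roughly doubles that.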
Yeah, I'm not sure about the visited node handling. I've been oscillating on whether to include it or not. I removed it earlier today because the unit tests didn't cover the lines so I briefly thought it wasn't necessary. But then I wasn't sure and added it back. As for using a BFS approach instead, I wasn't sure: I assumed the original pass used a DFS for a reason, so I tried to replicate it as closely as I could and just remove the recursion. But maybe @nkanazawa1989 has more insight into the rationale behind the implementation of the pass.
This commit rewrites the pass to use rustworkx's dfs_search function, which has rustworkx traverse the graph depth first and provides hook points to execute code at different named stages of the DFS. Using this function should speed up the search because the actual graph traversal is performed in Rust.
This made performance of the pass worse so reverting this for now. We can investigate this at a later date. This reverts commit bd3cbb2.
This seems okay for practical cases, e.g. a T1 experiment with variable delay, but maybe we should consider some edge cases. A node may be shifted once due to a qreg overlap with one node, and then shifted again due to a creg overlap with a different node. I cannot write a good test case immediately, but I'm curious whether the visited node logic works as expected in this situation.
I have experimented a bit with both implementations, and so far could not find a single example where Note that the main loop in
I think that simply removing the "visited" logic should be equivalent to the old code (including the possibility that the same node might appear in |
Ok, since my goal for this PR was to keep the behavior the same and just remove the recursion, I'll remove the visited logic again to keep it exactly the same. I fully agree we need to revisit the logic in this pass for performance and scaling because it's quite slow: the reproducer I wrote in #10049 takes ~20 min to run this pass with this PR (which is better than a recursion error but still not great). Given the release crunch, we can do that for 0.25.0.
This commit removes the visited node check and skip logic from the DFS traversal. The recursive version before this PR had no such check, so the logic is removed to ensure the new code behaves identically to it.
Ok, I removed the visited node check in d45555d.
Thanks, LGTM!
* Remove recursion from ConstrainedReschedule pass (Fixes #10049)
* Use rustworkx's dfs_search instead of manual dfs implementation
* Revert "Use rustworkx's dfs_search instead of manual dfs implementation" (reverts commit bd3cbb2)
* Remove visited node check from DFS
(cherry picked from commit 112bd6e)
Co-authored-by: Matthew Treinish
@mtreinish, Ouch! I was completely convinced that this PR keeps the behavior of Here is one such example:
The difference comes from when the As far as I can judge, there is a simple fix, since both for the original code and the new code we can update the node_start_time of a node before (and not after) the recursion from this node (e.g. nothing in the recursion downstream of B can modify B's starting time). That is, we can move the lines
to just before of
and completely remove the |
I think this makes sense. When I wrote this I was too fixated on maintaining the exact runtime behavior, and since the update happened after the recursion in the recursive version I added the shift stack loop. But I think you're correct: there isn't anything in the child nodes that can impact the start time of a parent, so we should shift before looping over the successors. Can you push a quick PR to fix this so we can try to get it in before 0.24 goes out today?
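As a rough illustration of why the update can move ahead of the descent, the sketch below applies a shift either before or after visiting the successors and compares the results. It is a simplified stand-in, not the pass itself: `push_back`, the `update_first` flag, and the timing dictionaries are all assumptions introduced for this example.

```python
def push_back(succ, start, duration, node, shift, update_first):
    """Shift `node` by `shift` and push back any successor it now overlaps."""
    new_start = start[node] + shift
    if update_first:
        start[node] = new_start  # write before descending
    end = new_start + duration[node]
    for nxt in succ.get(node, ()):
        overlap = end - start[nxt]
        if overlap > 0:
            push_back(succ, start, duration, nxt, overlap, update_first)
    if not update_first:
        start[node] = new_start  # write after descending


# Diamond: a -> b -> d and a -> c -> d (hypothetical timing data).
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
duration = {"a": 2, "b": 3, "c": 1, "d": 1}
initial = {"a": 0, "b": 2, "c": 2, "d": 5}

before = dict(initial)
push_back(succ, before, duration, "a", 4, update_first=True)
after = dict(initial)
push_back(succ, after, duration, "a", 4, update_first=False)
```

In this model the two orderings end with identical start times, because a node's descendants only ever read and write times of nodes further downstream. With an explicit stack, writing the new start time up front means no separate deferred-update step is needed.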
Summary
The ConstrainedReschedule pass previously used a recursive depth-first traversal to push back overlapping gates after aligning operations. For a sufficiently large circuit, however, this could fail because the recursion depth could exceed Python's maximum stack depth. To address this, this commit rewrites the depth-first traversal to be iterative instead of recursive. This removes the stack depth limitation and should let the pass run on circuits of any size.
However, the performance of this pass is still poor for large circuits. One thing we could look at to speed it up is rustworkx's dfs_search() function, which would let us shift the traversal to Rust and call back to Python to do the timing offsets. I tried this in bd3cbb2 and it was significantly slower. We'll have to investigate a different algorithmic approach for adjusting the time that doesn't require multiple iterations like the current approach.
Details and comments
Fixes #10049