-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Use VF2 to find a partial layout for seeding SabreLayout #9174
Conversation
This commit modifies the SabreLayout pass when run without the routing_pass argument to run primarily in Rust. This builds on top of the rust version of SabreSwap previously added in Qiskit#7977, Qiskit#8388, and Qiskit#8572. Internally, when the routing_pass argument is not set SabreLayout will perform the full sabre algorithm both layout selection and final swap mapping in rust and return the selected initial layout, the final layout, the toplogical sorting used to traverse the circuit, and a SwapMap for any swaps inserted. This is then used to build the output circuit in place of running separate layout and routing passes. The preset pass managers are updated to handle the new combined layout and routing mode of operation for SabreLayout. The routing stage to the preset pass managers remains intact, it will just operate as if a perfect layout was selected and skip SabreSwap because the circuit is already matching the connectivity constraints. Besides just operating more quickly because the heavy lifting of the algorithm operates more efficiently in a compiled language, doing this in rust also lets change our parallelization model for running multiple seed in Sabre. Just as in Qiskit#8572 we added support for SabreSwap to run multiple parallel trials with different seeds this commit adds a layout_trials argument to SabreLayout to try multiple seeds in parallel. When this is used it parallelizes at the outer layer for each layout/routing combination and the total minimal swap count seed is used. So for example if you set swap_trials=5 and layout_trails=5 that will run 5 tasks in the threadpool with 5 different seeds for the outer layout run. Inside that every time sabre swap is run (which will be multiple times as part of layout plus the final routing run) it tries 5 different seeds for each execution serially inside that parallel task. This should hopefully further improve the quality of the transpiler output and better match expectations for users who were previously calling transpile() multiple times to emulate this behavior. Implements Qiskit#9090
Previously this PR was using copy() to copy the coupling map before we mutated it to be symmetric (a requirement for the sabre algorithm). However, this modification of the object was leaking out causing test failures. This commit switches it to a deepcopy to ensure there are no shared references (and a comment added to explain it's needed).
This PR branch modifies the default behavior of the SabreLayout pass so it is now a transformation pass that computes a layout, applies it, and then performs routing. This means when using sabre layout in a custom pass manager we no longer need to embed a layout after computing the layout. The failing unitary synthesis tests were using a custom pass manager and trying to apply the layout again after SabreLayout already did. This commit just removes this now unecessary steps from the test code.
Now that the routing stage is integrated into the SabreLayout pass we should be running the BarrierBeforeMeasurement pass prior to layout in the preset pass managers instead of before routing. The goal of the pass is to prevent the routing algorithms for accidentally reusing a qubit after a final measurement which would be invalid by inserting a barrier before the measurements to ensure all qubits are swap mapped prior to adding the measurements during routing. While this might not strictly be necessary (it didn't affect any test output) it feels like best practice to ensure we're doing this prior to potentially routing to prevent issues.
For reproducible results with a fixed seed this commit sets a fixed number of layout_trials for the SabreLayout pass in the preset pass managers. If we did not set a fixed value than the output of the transpiler with a fixed seed will vary based on the number of physical cores that is running the compilation. To start optimization levels 0 and 1 use 5, level 2 uses 10, and level 3 uses 20 which matches the swap_trials argument we used. This is just a starting point, we can adjust these values later if needed.
This commit updates the tests which are checking exact layouts with a fixed seed when running SabreLayout. The changes to SabreLayout breaks exact seed reproducibility from the earlier version of the pass. So we need to update these tests for their new layout assignment from the improved pass. One exception is a test which was trying to assert that transpile() preserves a swap if it's in the basis set. However, the new layout and routing output from SabreLayout for that test was resulting in all the swaps getting optimized away at optimization level 3 (resulting in 13 cx gates instead of ~4 cx gates and 5 swaps before, which would be more efficient on real hardware). So the test was removed and only run at lower optimziation levels.
The dedicated tests for SabreLayout were not running a fixed number of trials. This was causing a different layout to be returned in tests when run across multiple systems as the number of trials defaults to the number of physical CPUs. This commit fixes the trial count to the number of cores on the local system where the layout was updated. This should fix the non-determinism in the tests causing failures in CI and on different local systems.
If there is only a single layout trial being run we don't have to worry about trying to do too much work in parallel at once by parallelizing the inner sabre swap execution. This commit updates the threading logic to enable running the inner sabre swap trials in parallel if there is only a single layout trial.
This commit corrects a bug in the PR branch that was caused by applying the selected initial layout in a trial to the swapped order node list. This was causing unexpected results when applying the circuit because the intent was to apply it only to the original input not the reversed input.
In the case we're evaluating the layout trials serially instead of in a parallel iterator we don't need to clone the dag nodes list. This is because nothing will be modifying it in parallel, so we don't need a thread local copy. Each call to layout_trial() will keep the dag nodes vector intact (see previous commit for fixing this) so it can just be passed by reference if there are no parallel threads involved.
This commit fixes an issue prevent seed randomization when no seed is specified. On subsequent uses of a pass SabreLayout would not randomize the seed between runs because it was setting the seed to instance state. This commit fixes this issue by relying on initializing the RNG from entropy each time run() is called if no user specified seed is provided.
This commit fixes the routing run to run from a trivial layout instead of the initial layout. By the time we do final routing for a trial we've already applied the selected initial layout to the SabreDAG. So the correct layout to use for running final swap mapping is a trivial layout where logical bit 0 is at physical bit 0. Using initial layout twice means we end up mapping more than is needed resulting in incorrect results.
This change was incorrect, the output was already in the correct order and this was causing the behavior it strived to fix. This commit reverts the addition of the extra mem::swap() call to fix things. This reverts commit d98ef6c.
This commit deduplicates the trivial layout generation for the NLayout class. Previously there were a few places both in rust and python that sabre layout was manually generating a trivial NLayout object. THis commit adds a static method to the NLayout class that allows both Python and Rust to easily create a new trivial NLayout object instead of manually creating the object.
Since more recent commits fixed a few bugs in the behavior of the SabreLayout pass, the previously updated fixed layout tests were no longer correct. This commit updates the tests which were now failing because the layout changed again after fixing bugs in the new pass code.
Looking at profiles for running the new SabreLayout pass, as expected the runtime of the rust SabreSwap routines is dominating. This is because we've basically serialized the sabre swap routines and are running multiple seed trials. As an experiment this commit sets the inner SabreSwap routines to run in parallel too. Since the rayon algorithm uses a work stealing algorithm this hopefully shouldn't cause too much extra overhead, especially because the layout trials are quite fast. This ideally means we're just scheduling each sabre swap trial in a big parallel work queue and rayon does the rest of the magic to figure out how to execute things. Initial testing is showing an improvement for large circuits and a more modest improvement for more modest circuits.
This commit adds a new argument, skip_routing, to the SabreLayout constructor. The intent of this new option is to enable mixing custom routing_method user arguments with SabreLayout in it's new accelerated mode of operation. In the earlier commits no matter what users specified the preset pass manager construction would use sabreswap for routing as it was run internally as part of layout. This meant doing something like: transpile(qc, backend, routing_method='stochastic') would really run SabreSwap which is clearly not the user intent. To provide the layout benefits with multiple seed trials this new argument allows disabling the application of the routing found. This comes with a runtime penalty because effectively we end up running routing twice and only using one of the results. But for custom user provided methods or plugins this seems like a reasonable tradeoff.
In Qiskit#9132 we updated the random seed parameters in the rust code for sabre swap to make the seed optional and default to initializing from entropy if it's not specified. This commit updates the usage to account for this change on main.
Thank you for opening a new pull request. Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient. While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone. One or more of the the following people are requested to review this:
|
From some quick initial testing I'm not convinced this is any better performing than just a random starting layout, it seems to be marginally slower (because of the vf2 part) and also the quality seems to be worse in general (but not always) as the output returns more swaps on average than just with #9116 . I'll continue to play with this some more and see if I can find something I'm doing wrong or other ways to improve this, but if it's not any better we can just abandon this approach. |
This commit builds on the VF2PartialLayout pass which was an experiment available as an external plugin here: https://github.com/mtreinish/vf2_partial_layout That pass used the vf2 algorithm in rustworkx to find the deepest partial interaction graph of a circuit which is isomorphic with the coupling graph and uses that mapping to apply an initial layout. The issue with the performance of that pass was the selection of the qubits outside the partial interaction graph. Selecting the mapping for those qubits is similar to the same heuristic layout that SabreLayout is trying to solve, just for a subset of qubits. In VF2PartialLayout a simple nearest neighbor based approach was used for selecting qubits from the coupling graph for any virtual qubits outside the partial layout. In practice this ended up performing worse than SabreLayout. To address the shortcomings of that pass this commit combines the partial layout selection from that external plugin with SabreLayout. The sabre layout algorithm starts by randomly selecting a layout and then progressively working forwards and backwards across the circuit and swap mapping it to find the permutation caused by inserted swaps. Those permutations are then used to modify the random layout and eventual an initial layout that minimizes the number of swaps needed is selected. With this commit instead of using a completely random layout the initial guess starts with the partial layout found in the same way as VF2PartialLayout. Then the remaining qubits are selected at random and the Sabrelayout algorithm is run in the same manner as before. This hopefully should improve the quality of the results because we're starting from a partial layout that doesn't require swaps.
3808492
to
be74de6
Compare
This commit builds on the VF2PartialLayout pass which was an experiment available as an external plugin here: https://github.com/mtreinish/vf2_partial_layout That pass used the vf2 algorithm in rustworkx to find the deepest partial interaction graph of a circuit which is isomorphic with the coupling graph and uses that mapping to apply an initial layout. The issue with the performance of that pass was the selection of the qubits outside the partial interaction graph. Selecting the mapping for those qubits is similar to the same heuristic layout that SabreLayout is trying to solve, just for a subset of qubits. In VF2PartialLayout a simple nearest neighbor based approach was used for selecting qubits from the coupling graph for any virtual qubits outside the partial layout. In practice this ended up performing worse than SabreLayout. To address the shortcomings of that pass this commit combines the partial layout selection from that external plugin with SabreLayout. The sabre layout algorithm starts by randomly selecting a layout and then progressively working forwards and backwards across the circuit and swap mapping it to find the permutation caused by inserted swaps. Those permutations are then used to modify the random layout and eventual an initial layout that minimizes the number of swaps needed is selected. With this commit instead of using a completely random layout for all the initial guesses this starts a single trial with the partial layout found in the same way as VF2PartialLayout. Then the remaining qubits are selected at random and the Sabrelayout algorithm is run in the same manner as before. This hopefully should improve the quality of the results because we're starting from a partial layout that doesn't require swaps for those qubits. A similar (almost identical approach) was tried in Qiskit#9174 except instead of seeding a single trial with the partial layout it used the partial layout for all the the trials. In that case the results were not generally better and the results were mixed. At the time my guess was that using the partial layout constrained the search space too much and was inducing more swaps to be needed. However, looking at the details in issue Qiskit#10160 this adapts Qiskit#9174 to see if doing the partial layout in a more limited manner has any impact there.
This commit builds on the VF2PartialLayout pass which was an experiment available as an external plugin here: https://github.com/mtreinish/vf2_partial_layout That pass used the vf2 algorithm in rustworkx to find the deepest partial interaction graph of a circuit which is isomorphic with the coupling graph and uses that mapping to apply an initial layout. The issue with the performance of that pass was the selection of the qubits outside the partial interaction graph. Selecting the mapping for those qubits is similar to the same heuristic layout that SabreLayout is trying to solve, just for a subset of qubits. In VF2PartialLayout a simple nearest neighbor based approach was used for selecting qubits from the coupling graph for any virtual qubits outside the partial layout. In practice this ended up performing worse than SabreLayout. To address the shortcomings of that pass this commit combines the partial layout selection from that external plugin with SabreLayout. The sabre layout algorithm starts by randomly selecting a layout and then progressively working forwards and backwards across the circuit and swap mapping it to find the permutation caused by inserted swaps. Those permutations are then used to modify the random layout and eventual an initial layout that minimizes the number of swaps needed is selected. With this commit instead of using a completely random layout for all the initial guesses this starts a single trial with the partial layout found in the same way as VF2PartialLayout. Then the remaining qubits are selected at random and the Sabrelayout algorithm is run in the same manner as before. This hopefully should improve the quality of the results because we're starting from a partial layout that doesn't require swaps for those qubits. A similar (almost identical approach) was tried in Qiskit#9174 except instead of seeding a single trial with the partial layout it used the partial layout for all the the trials. In that case the results were not generally better and the results were mixed. At the time my guess was that using the partial layout constrained the search space too much and was inducing more swaps to be needed. However, looking at the details in issue Qiskit#10160 this adapts Qiskit#9174 to see if doing the partial layout in a more limited manner has any impact there.
Summary
This commit builds on the VF2PartialLayout pass which was an experiment
available as an external plugin here:
https://github.com/mtreinish/vf2_partial_layout
That pass used the vf2 algorithm in rustworkx to find the deepest
partial interaction graph of a circuit which is isomorphic with the
coupling graph and uses that mapping to apply an initial layout. The
issue with the performance of that pass was the selection of the qubits
outside the partial interaction graph. Selecting the mapping for those
qubits is similar to the same heuristic layout that SabreLayout is
trying to solve, just for a subset of qubits. In VF2PartialLayout a
simple nearest neighbor based approach was used for selecting qubits
from the coupling graph for any virtual qubits outside the partial
layout. In practice this ended up performing worse than SabreLayout.
To address the shortcomings of that pass this commit combines the
partial layout selection from that external plugin with SabreLayout.
The sabre layout algorithm starts by randomly selecting a layout and
then progressively working forwards and backwards across the circuit
and swap mapping it to find the permutation caused by inserted swaps.
Those permutations are then used to modify the random layout and
eventual an initial layout that minimizes the number of swaps needed is
selected. With this commit instead of using a completely random layout
the initial guess starts with the partial layout found in the same way
as VF2PartialLayout. Then the remaining qubits are selected at random
and the Sabrelayout algorithm is run in the same manner as before. This
hopefully should improve the quality of the results because we're
starting from a partial layout that doesn't require swaps.
Details and comments
This is based on top of #9116 and will need to be rebased after #9116 merges
TODO: