Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Use VF2 to find a partial layout for seeding SabreLayout #9174

Closed
wants to merge 29 commits into from

Conversation

mtreinish
Copy link
Member

Summary

This commit builds on the VF2PartialLayout pass which was an experiment
available as an external plugin here:

https://github.com/mtreinish/vf2_partial_layout

That pass used the vf2 algorithm in rustworkx to find the deepest
partial interaction graph of a circuit which is isomorphic with the
coupling graph and uses that mapping to apply an initial layout. The
issue with the performance of that pass was the selection of the qubits
outside the partial interaction graph. Selecting the mapping for those
qubits is similar to the same heuristic layout that SabreLayout is
trying to solve, just for a subset of qubits. In VF2PartialLayout a
simple nearest neighbor based approach was used for selecting qubits
from the coupling graph for any virtual qubits outside the partial
layout. In practice this ended up performing worse than SabreLayout.

To address the shortcomings of that pass this commit combines the
partial layout selection from that external plugin with SabreLayout.
The sabre layout algorithm starts by randomly selecting a layout and
then progressively working forwards and backwards across the circuit
and swap mapping it to find the permutation caused by inserted swaps.
Those permutations are then used to modify the random layout and
eventual an initial layout that minimizes the number of swaps needed is
selected. With this commit instead of using a completely random layout
the initial guess starts with the partial layout found in the same way
as VF2PartialLayout. Then the remaining qubits are selected at random
and the Sabrelayout algorithm is run in the same manner as before. This
hopefully should improve the quality of the results because we're
starting from a partial layout that doesn't require swaps.

Details and comments

This is based on top of #9116 and will need to be rebased after #9116 merges

TODO:

  • Rebase after Oxidize SabreLayout pass #9116 merges
  • Fix tests (different layouts being found in exact layout tests)
  • Add release note
  • Run benchmarks to validate performance (in quality and run time)

This commit modifies the SabreLayout pass when run without the
routing_pass argument to run primarily in Rust. This builds on top of
the rust version of SabreSwap previously added in Qiskit#7977, Qiskit#8388,
and Qiskit#8572. Internally, when the routing_pass argument is not set
SabreLayout will perform the full sabre algorithm both layout selection
and final swap mapping in rust and return the selected initial layout,
the final layout, the toplogical sorting used to traverse the circuit,
and a SwapMap for any swaps inserted. This is then used to build the
output circuit in place of running separate layout and routing passes.
The preset pass managers are updated to handle the new combined layout
and routing mode of operation for SabreLayout. The routing stage to the
preset pass managers remains intact, it will just operate as if a
perfect layout was selected and skip SabreSwap because the circuit is
already matching the connectivity constraints.

Besides just operating more quickly because the heavy lifting of the
algorithm operates more efficiently in a compiled language, doing this
in rust also lets change our parallelization model for running multiple
seed in Sabre. Just as in Qiskit#8572 we added support for SabreSwap to run
multiple parallel trials with different seeds this commit adds a
layout_trials argument to SabreLayout to try multiple seeds in parallel.
When this is used it parallelizes at the outer layer for each
layout/routing combination and the total minimal swap count seed is used.
So for example if you set swap_trials=5 and layout_trails=5 that will run
5 tasks in the threadpool with 5 different seeds for the outer layout run.
Inside that every time sabre swap is run (which will be multiple times
as part of layout plus the final routing run) it tries 5 different seeds
for each execution serially inside that parallel task. This should
hopefully further improve the quality of the transpiler output and better
match expectations for users who were previously calling transpile()
multiple times to emulate this behavior.

Implements Qiskit#9090
Previously this PR was using copy() to copy the coupling map before we
mutated it to be symmetric (a requirement for the sabre algorithm).
However, this modification of the object was leaking out causing test
failures. This commit switches it to a deepcopy to ensure there are no
shared references (and a comment added to explain it's needed).
This PR branch modifies the default behavior of the SabreLayout pass so
it is now a transformation pass that computes a layout, applies it, and
then performs routing. This means when using sabre layout in a custom
pass manager we no longer need to embed a layout after computing the
layout. The failing unitary synthesis tests were using a custom pass
manager and trying to apply the layout again after SabreLayout already
did. This commit just removes this now unecessary steps from the test
code.
Now that the routing stage is integrated into the SabreLayout pass we
should be running the BarrierBeforeMeasurement pass prior to layout in
the preset pass managers instead of before routing. The goal of the pass
is to prevent the routing algorithms for accidentally reusing a qubit
after a final measurement which would be invalid by inserting a barrier
before the measurements to ensure all qubits are swap mapped prior to
adding the measurements during routing. While this might not strictly be
necessary (it didn't affect any test output) it feels like best practice
to ensure we're doing this prior to potentially routing to prevent
issues.
For reproducible results with a fixed seed this commit sets a fixed
number of layout_trials for the SabreLayout pass in the preset pass
managers. If we did not set a fixed value than the output of the
transpiler with a fixed seed will vary based on the number of
physical cores that is running the compilation. To start
optimization levels 0 and 1 use 5, level 2 uses 10, and level
3 uses 20 which matches the swap_trials argument we used. This is just a
starting point, we can adjust these values later if needed.
This commit updates the tests which are checking exact layouts with a
fixed seed when running SabreLayout. The changes to SabreLayout breaks
exact seed reproducibility from the earlier version of the pass. So we
need to update these tests for their new layout assignment from the
improved pass. One exception is a test which was trying to assert that
transpile() preserves a swap if it's in the basis set. However, the new
layout and routing output from SabreLayout for that test was resulting
in all the swaps getting optimized away at optimization level 3
(resulting in 13 cx gates instead of ~4 cx gates and 5 swaps before,
which would be more efficient on real hardware). So the test was removed
and only run at lower optimziation levels.
The dedicated tests for SabreLayout were not running a fixed number of
trials. This was causing a different layout to be returned in tests when
run across multiple systems as the number of trials defaults to the
number of physical CPUs. This commit fixes the trial count to the number
of cores on the local system where the layout was updated. This should
fix the non-determinism in the tests causing failures in CI and on
different local systems.
If there is only a single layout trial being run we don't have to worry
about trying to do too much work in parallel at once by parallelizing
the inner sabre swap execution. This commit updates the threading logic
to enable running the inner sabre swap trials in parallel if there is
only a single layout trial.
This commit corrects a bug in the PR branch that was caused by applying
the selected initial layout in a trial to the swapped order node list.
This was causing unexpected results when applying the circuit because
the intent was to apply it only to the original input not the reversed
input.
In the case we're evaluating the layout trials serially instead of in a
parallel iterator we don't need to clone the dag nodes list. This is
because nothing will be modifying it in parallel, so we don't need a
thread local copy. Each call to layout_trial() will keep the dag nodes
vector intact (see previous commit for fixing this) so it can just be
passed by reference if there are no parallel threads involved.
This commit fixes an issue prevent seed randomization when no seed is
specified. On subsequent uses of a pass SabreLayout would not randomize
the seed between runs because it was setting the seed to instance state.
This commit fixes this issue by relying on initializing the RNG from
entropy each time run() is called if no user specified seed is provided.
This commit fixes the routing run to run from a trivial layout instead
of the initial layout. By the time we do final routing for a trial we've
already applied the selected initial layout to the SabreDAG. So the
correct layout to use for running final swap mapping is a trivial layout
where logical bit 0 is at physical bit 0. Using initial layout twice
means we end up mapping more than is needed resulting in incorrect
results.
This change was incorrect, the output was already in the correct order
and this was causing the behavior it strived to fix. This commit reverts
the addition of the extra mem::swap() call to fix things.

This reverts commit d98ef6c.
This commit deduplicates the trivial layout generation for the NLayout
class. Previously there were a few places both in rust and python that
sabre layout was manually generating a trivial NLayout object. THis
commit adds a static method to the NLayout class that allows both Python
and Rust to easily create a new trivial NLayout object instead of
manually creating the object.
Since more recent commits fixed a few bugs in the behavior of the
SabreLayout pass, the previously updated fixed layout tests were no
longer correct. This commit updates the tests which were now failing
because the layout changed again after fixing bugs in the new pass code.
Looking at profiles for running the new SabreLayout pass, as expected
the runtime of the rust SabreSwap routines is dominating. This is
because we've basically serialized the sabre swap routines and are
running multiple seed trials. As an experiment this commit sets the
inner SabreSwap routines to run in parallel too. Since the rayon
algorithm uses a work stealing algorithm this hopefully shouldn't cause
too much extra overhead, especially because the layout trials are quite
fast. This ideally means we're just scheduling each sabre swap trial in
a big parallel work queue and rayon does the rest of the magic to figure
out how to execute things. Initial testing is showing an improvement for
large circuits and a more modest improvement for more modest circuits.
This commit adds a new argument, skip_routing, to the SabreLayout
constructor. The intent of this new option is to enable mixing custom
routing_method user arguments with SabreLayout in it's new accelerated
mode of operation. In the earlier commits no matter what users specified
the preset pass manager construction would use sabreswap for routing as
it was run internally as part of layout. This meant doing something
like:

transpile(qc, backend, routing_method='stochastic')

would really run SabreSwap which is clearly not the user intent. To
provide the layout benefits with multiple seed trials this new argument
allows disabling the application of the routing found. This comes with a
runtime penalty because effectively we end up running routing twice and
only using one of the results. But for custom user provided methods or
plugins this seems like a reasonable tradeoff.
In Qiskit#9132 we updated the random seed parameters in the rust code for
sabre swap to make the seed optional and default to initializing from
entropy if it's not specified. This commit updates the usage to account
for this change on main.
@mtreinish mtreinish requested a review from a team as a code owner November 21, 2022 20:39
@qiskit-bot
Copy link
Collaborator

Thank you for opening a new pull request.

Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient.

While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone.

One or more of the the following people are requested to review this:

@mtreinish mtreinish added on hold Can not fix yet performance Changelog: New Feature Include in the "Added" section of the changelog labels Nov 21, 2022
@mtreinish
Copy link
Member Author

mtreinish commented Nov 21, 2022

From some quick initial testing I'm not convinced this is any better performing than just a random starting layout, it seems to be marginally slower (because of the vf2 part) and also the quality seems to be worse in general (but not always) as the output returns more swaps on average than just with #9116 . I'll continue to play with this some more and see if I can find something I'm doing wrong or other ways to improve this, but if it's not any better we can just abandon this approach.

This commit builds on the VF2PartialLayout pass which was an experiment
available as an external plugin here:

https://github.com/mtreinish/vf2_partial_layout

That pass used the vf2 algorithm in rustworkx to find the deepest
partial interaction graph of a circuit which is isomorphic with the
coupling graph and uses that mapping to apply an initial layout. The
issue with the performance of that pass was the selection of the qubits
outside the partial interaction graph. Selecting the mapping for those
qubits is similar to the same heuristic layout that SabreLayout is
trying to solve, just for a subset of qubits. In VF2PartialLayout a
simple nearest neighbor based approach was used for selecting qubits
from the coupling graph for any virtual qubits outside the partial
layout. In practice this ended up performing worse than SabreLayout.

To address the shortcomings of that pass this commit combines the
partial layout selection from that external plugin with SabreLayout.
The sabre layout algorithm starts by randomly selecting a layout and
then progressively working forwards and backwards across the circuit
and swap mapping it to find the permutation caused by inserted swaps.
Those permutations are then used to modify the random layout and
eventual an initial layout that minimizes the number of swaps needed is
selected. With this commit instead of using a completely random layout
the initial guess starts with the partial layout found in the same way
as VF2PartialLayout. Then the remaining qubits are selected at random
and the Sabrelayout algorithm is run in the same manner as before. This
hopefully should improve the quality of the results because we're
starting from a partial layout that doesn't require swaps.
@mtreinish mtreinish force-pushed the vf2-partial-sabre-layout branch from 3808492 to be74de6 Compare November 21, 2022 21:55
@mtreinish mtreinish closed this Jan 12, 2023
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request May 26, 2023
This commit builds on the VF2PartialLayout pass which was an experiment
available as an external plugin here:

https://github.com/mtreinish/vf2_partial_layout

That pass used the vf2 algorithm in rustworkx to find the deepest
partial interaction graph of a circuit which is isomorphic with the
coupling graph and uses that mapping to apply an initial layout. The
issue with the performance of that pass was the selection of the qubits
outside the partial interaction graph. Selecting the mapping for those
qubits is similar to the same heuristic layout that SabreLayout is
trying to solve, just for a subset of qubits. In VF2PartialLayout a
simple nearest neighbor based approach was used for selecting qubits
from the coupling graph for any virtual qubits outside the partial
layout. In practice this ended up performing worse than SabreLayout.

To address the shortcomings of that pass this commit combines the
partial layout selection from that external plugin with SabreLayout.
The sabre layout algorithm starts by randomly selecting a layout and
then progressively working forwards and backwards across the circuit
and swap mapping it to find the permutation caused by inserted swaps.
Those permutations are then used to modify the random layout and
eventual an initial layout that minimizes the number of swaps needed is
selected. With this commit instead of using a completely random layout
for all the initial guesses this starts a single trial with the partial
layout found in the same way as VF2PartialLayout. Then the remaining
qubits are selected at random and the Sabrelayout algorithm is run in
the same manner as before. This hopefully should improve the quality
of the results because we're starting from a partial layout that
doesn't require swaps for those qubits.

A similar (almost identical approach) was tried in Qiskit#9174 except instead
of seeding a single trial with the partial layout it used the partial
layout for all the the trials. In that case the results were not
generally better and the results were mixed. At the time my guess was
that using the partial layout constrained the search space too much and
was inducing more swaps to be needed. However, looking at the details in
issue Qiskit#10160 this adapts Qiskit#9174 to see if doing the partial layout in a
more limited manner has any impact there.
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request May 26, 2023
This commit builds on the VF2PartialLayout pass which was an experiment
available as an external plugin here:

https://github.com/mtreinish/vf2_partial_layout

That pass used the vf2 algorithm in rustworkx to find the deepest
partial interaction graph of a circuit which is isomorphic with the
coupling graph and uses that mapping to apply an initial layout. The
issue with the performance of that pass was the selection of the qubits
outside the partial interaction graph. Selecting the mapping for those
qubits is similar to the same heuristic layout that SabreLayout is
trying to solve, just for a subset of qubits. In VF2PartialLayout a
simple nearest neighbor based approach was used for selecting qubits
from the coupling graph for any virtual qubits outside the partial
layout. In practice this ended up performing worse than SabreLayout.

To address the shortcomings of that pass this commit combines the
partial layout selection from that external plugin with SabreLayout.
The sabre layout algorithm starts by randomly selecting a layout and
then progressively working forwards and backwards across the circuit
and swap mapping it to find the permutation caused by inserted swaps.
Those permutations are then used to modify the random layout and
eventual an initial layout that minimizes the number of swaps needed is
selected. With this commit instead of using a completely random layout
for all the initial guesses this starts a single trial with the partial
layout found in the same way as VF2PartialLayout. Then the remaining
qubits are selected at random and the Sabrelayout algorithm is run in
the same manner as before. This hopefully should improve the quality
of the results because we're starting from a partial layout that
doesn't require swaps for those qubits.

A similar (almost identical approach) was tried in Qiskit#9174 except instead
of seeding a single trial with the partial layout it used the partial
layout for all the the trials. In that case the results were not
generally better and the results were mixed. At the time my guess was
that using the partial layout constrained the search space too much and
was inducing more swaps to be needed. However, looking at the details in
issue Qiskit#10160 this adapts Qiskit#9174 to see if doing the partial layout in a
more limited manner has any impact there.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: New Feature Include in the "Added" section of the changelog on hold Can not fix yet performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants