Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of VF2 scoring and add support for scoring passes #9026

Merged
merged 27 commits into from
Nov 18, 2022

Conversation

mtreinish
Copy link
Member

@mtreinish mtreinish commented Oct 28, 2022

Summary

This commit makes 2 key changes to the vf2 layout pass. The first is it migrates the scoring routine to rust. When running vf2 layout and vf2 post layout we're bottle necked by the performance of the scoring of a layout since in practice scoring a large circuit ends up taking more time than the vf2_mapping() function. To address this the scoring function is migrated to rust where the iteration will be much faster. To enable this rust migration the average error map is made into an ErrorMap class which can be efficiently be accessed by reference from rust. This additionally also enables a convenient interface for future expansion of the vf2 layout passes. The VF2LayoutPass and VF2PostLayout passes will now both look for a "vf2_avg_error_map" entry in the property set which contains a ErrorMap used for scoring. If present that array will be used for scoring instead of the computing one from the target's error rates. This will enable custom analysis passes to be run pre-layout to compute or inject a custom scoring heuristic.

Details and comments

TODO:

  • Fix test failures. Fails with faulty qubits (I assume the backend properties mapping is broken and only worked by chance before this) and fake melbourne (which had a weird broken qubit and I bet the stored payload is broken)
  • Benchmark and potentially tune performance

This commit makes 2 key changes to the vf2 layout pass. The first is it
migrates the scoring routine to rust. When running vf2 layout and vf2
post layout we're bottlenecked by the performance of the scoring of a
layout since in practice scoring a large circuit ends up taking more
time than the vf2_mapping() function. To address this the scoring
function is migrated to rust where the iteration will be much faster. To
enable this rust migration the average error map is made into a 2D numpy
array which can be efficiently be accessed by reference from rust. This
additionally also enables a convenient interface for future expansion of
the vf2 layout passes. The VF2LayoutPass and VF2PostLayout passes will
now both look for a "vf2_avg_error_map" entry in the property set which
contains a 2d array used for scoring. If present that array will be used
for scoring instead of the computing one from the target's error rates.
This will enable custom analysis passes to be run pre-layout to compute
or inject a custom scoring heuristic.
@mtreinish mtreinish added on hold Can not fix yet performance Changelog: New Feature Include in the "Added" section of the changelog Rust This PR or issue is related to Rust code in the repository labels Oct 28, 2022
@mtreinish mtreinish requested a review from a team as a code owner October 28, 2022 18:58
@qiskit-bot
Copy link
Collaborator

Thank you for opening a new pull request.

Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient.

While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone.

One or more of the the following people are requested to review this:

mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Oct 28, 2022
This commit fixes a few copy paste errors and errors in the docstring
for the VF2PostLayout pass. It also adds a a link to the paper for the
pass.

This was originally part of Qiskit#9026 as these fixes were part of modifying
the docstring to document the new feature being added in that PR. This
commit just extracts those docstring fixes from that PR.
For BackendV1 based backends it's possible for the BackendProperties
object for that beackend to get out of sync with the number fo qubits
actually available in the system. In such cases looking up the noise
characteristics can potentially fail when building the error map because
the reported number of qubits is less than the qubits there are
properties for. This wasn't an issue in the previous error map data
structure because it was a dictionary and it would just add the error
rate for the extra qubits even though it wasn't valid. However, now that
we're using a numpy array with a fixed size this isn't the case anymore
and an error would be raised in these cases. To workaround this issue
this commit skips any qubits outside the allowed range in the
BackendProperties when building the error map to account for this
potential discrepency. The extra properties couldn't be used anyway
since they're not valid device qubits in such cases.
@mtreinish mtreinish changed the title [WIP] Improve performance of VF2 scoring and add support for scoring passes Improve performance of VF2 scoring and add support for scoring passes Oct 31, 2022
@mtreinish mtreinish removed the on hold Can not fix yet label Oct 31, 2022
@coveralls
Copy link

coveralls commented Oct 31, 2022

Pull Request Test Coverage Report for Build 3500758612

  • 138 of 164 (84.15%) changed or added relevant lines in 7 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.04%) to 84.575%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/transpiler/passes/layout/vf2_utils.py 39 41 95.12%
src/vf2_layout.rs 54 57 94.74%
src/error_map.rs 18 39 46.15%
Totals Coverage Status
Change from base Build 3500114179: -0.04%
Covered Lines: 62629
Relevant Lines: 74051

💛 - Coveralls

This commit updates the vf2 layout scoring to work with a dictionary
object instead of a Layout object.  Previously we were creating a Layout
object on each mapping found and passing that to scoring. However, this
was unecessary overhead as the Layout object is slow to create and
interact with. Since we only need a Layout object if we're potentially
returning the layout as the best result we can avoid this extra
overhead.
This commit removes the lookup for the QISKIT_IN_PARALLEL env variable
from the rust code for vf2 scoring. THis was adding unecessary overhead
to a frequently called function when it only needs to be computed once.
This commit moves the lookup to python outside the for loop and just
passes the evaluated boolean to the rust function instead.
mergify bot added a commit that referenced this pull request Oct 31, 2022
This commit fixes a few copy paste errors and errors in the docstring
for the VF2PostLayout pass. It also adds a a link to the paper for the
pass.

This was originally part of #9026 as these fixes were part of modifying
the docstring to document the new feature being added in that PR. This
commit just extracts those docstring fixes from that PR.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
mergify bot pushed a commit that referenced this pull request Oct 31, 2022
This commit fixes a few copy paste errors and errors in the docstring
for the VF2PostLayout pass. It also adds a a link to the paper for the
pass.

This was originally part of #9026 as these fixes were part of modifying
the docstring to document the new feature being added in that PR. This
commit just extracts those docstring fixes from that PR.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
(cherry picked from commit 234816c)
mergify bot added a commit that referenced this pull request Nov 1, 2022
This commit fixes a few copy paste errors and errors in the docstring
for the VF2PostLayout pass. It also adds a a link to the paper for the
pass.

This was originally part of #9026 as these fixes were part of modifying
the docstring to document the new feature being added in that PR. This
commit just extracts those docstring fixes from that PR.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
(cherry picked from commit 234816c)

Co-authored-by: Matthew Treinish <[email protected]>
Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving scoring to Rust generally seems sensible to me. Most of the comments below are very minor.

My two main things are:

  • I'm concerned we're overloading NaN in the gate-error matrix with two incompatible meanings.
  • I think that even for 1000q systems, the 2D matrix is probably going to be fine, but there's a possibility we might end up with better cache locality in the scoring for large systems if we considered some sparser structure, since we generally expect that most real-world systems will have limited connectivity, so this 2D matrix will in practice be very sparse. This certainly doesn't need investigating for this PR, just wondering if you'd thought anything about it?

qiskit/transpiler/passes/layout/vf2_layout.py Outdated Show resolved Hide resolved
qiskit/transpiler/passes/layout/vf2_layout.py Show resolved Hide resolved
qiskit/transpiler/passes/layout/vf2_post_layout.py Outdated Show resolved Hide resolved
qiskit/transpiler/passes/layout/vf2_utils.py Outdated Show resolved Hide resolved
qiskit/transpiler/passes/layout/vf2_utils.py Outdated Show resolved Hide resolved
edge_list: IndexMap<[usize; 2], i32>,
error_matrix: PyReadonlyArray2<f64>,
layout: &NLayout,
strict_direction: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flag seems unnecessary in Rust with things now represented as an error_matrix? I'd have thought that it's just built in during the construction of the matrix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessarily reflected in the error matrix, it really depends on the backend properties since we could end up with an error rate defined directionally in the backend and it's hard to know until after we finish building the matrix so we don't build the error matrix bidirectional if strict_direction=False. It becomes more a question of scoring behavior than representing it. If strict_direction = False the and err_mat[[0, 1]] is NaN it will try err_mat[[1, 0]]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's fair enough then. In all the situations I could think of, the backend would just have constructed the directional error matrix itself, but if there's a chance those are decoupled, then it's right to include the swap.

That said, it feels a bit odd in the scoring that we don't take into account both sides of the link in other cases. If it's not strict directionality, but the two directions have different average errors, it feels weird that we don't take that into account somehow? In my mental model, that situation shouldn't be possible, but if the error matrix and strict directionality aren't coupled at the level of the backend, it starts to feel possible and unclear in how it should be handled.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a tradeoff to find a path vs not finding a path. There are two modes of operation right now with the vf2 mapping, either we work with directed graphs and respect the direction of gates on the backend or we treat every edge as weak and work with undirected edges and then rely on GateDirection later to correct things. In the former case if the edge is defined bidirectionally with different error rates rustworkx will return a different mapping with either direction and score them differently. But for the latter case we can't really make an assertion about the order of the qubits for the qargs because it's not explicitly set.

That being said I think you're right for the undirected case maybe we should be looking at the other direction in scoring if both directions have defined error rates. We'll have to think of the best way to do this. (for this I just copied the scoring algorithm we were using before anyway)

src/vf2_layout.rs Outdated Show resolved Hide resolved
src/vf2_layout.rs Outdated Show resolved Hide resolved
src/vf2_layout.rs Outdated Show resolved Hide resolved
mtreinish and others added 8 commits November 1, 2022 10:56
This commit deduplicates a bunch of the rust side code for scoring into
2 closures and replaces all the reduce() calls with product() to
accomplish the same thing.
In order to support large (> 1000) qubit systems efficiently this commit
pivots away from using a 2d numpy array to represent the average error
rates for a target. For 1000q this error matrix would take 8 MB of
memory but for 10k qubit it would take 800 MB. Considering by their
nature these error matricies should be fairly sparse as connectivity in
typical QPUs is sparse. This was just wasted memory as we'll end up with
a lot of NaN values in the array. Instead this commit adds a new Rust
struct/Python class ErrorMap which just wraps a HashMap and maps a 2
element int array to a float. This way we only store entries where there
is defined connectivity and are more memory efficient.
@mtreinish
Copy link
Member Author

I've updated the PR to use a custom hash map based class instead of a 2d numpy array to represent the average error map. This should both be much more memory efficient and also make the usage a bit clearer for people to interact with.

mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Nov 2, 2022
This commit moves the NLayout rust class out of the stochastic swap
python module into a new standalone nlayout module in
qiskit._accelerate. The NLayout class was originally added with the
stochastic swap rust code, but since then it's started being used by
other rust code including SabreSwap (and soon to be VF2Layout and
VF2PostLayout scoring in Qiskit#9026).
mergify bot pushed a commit that referenced this pull request Nov 2, 2022
This commit moves the NLayout rust class out of the stochastic swap
python module into a new standalone nlayout module in
qiskit._accelerate. The NLayout class was originally added with the
stochastic swap rust code, but since then it's started being used by
other rust code including SabreSwap (and soon to be VF2Layout and
VF2PostLayout scoring in #9026).
@mtreinish
Copy link
Member Author

To show the performance improvements I threw together a small test script:

import statistics
import time

from qiskit.transpiler.passes.layout import VF2Layout
from qiskit.circuit import QuantumCircuit
from qiskit.providers.fake_provider import FakeMumbaiV2
from qiskit.converters import circuit_to_dag

qc = QuantumCircuit(7)

qc.h(0)
qc.cz(0, 1)
qc.cz(1, 2)
qc.cz(2, 3)
qc.measure_all()
dag = circuit_to_dag(qc)
backend = FakeMumbaiV2()

times = []
vf2_pass = VF2Layout(target=backend.target, max_trials=-1)
for i in range(5):
    print(f"Run {i}")
    start = time.perf_counter()
    vf2_pass.run(dag)
    stop = time.perf_counter()
    times.append(stop - start)
    print(stop - start)
print(statistics.geometric_mean(times))

This script is a worst case from a scoring perspective, it's mapping a line with 4 nodes and 3 free nodes onto the coupling graph. This means there are a lot of possible permutations for valid isomorphic mappings but each is trivial for rustworkx to calculate. It also turns off any limits in the pass so it will fully iterate through all available mappings.

Running this script with this PR applied returned:

6.825820831970144

Then running it on main:

14.780817176541099

This obviously is a best case improvement. In more realistic testing with limits set I was seeing a ~10% improvement over main with this PR applied.

Copy link
Contributor

@kevinhartman kevinhartman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Should we add some test to make sure the avg_error_map property set setting is properly loaded/used when present?

qiskit/transpiler/passes/layout/vf2_layout.py Outdated Show resolved Hide resolved
qiskit/transpiler/passes/layout/vf2_utils.py Show resolved Hide resolved
qiskit/transpiler/passes/layout/vf2_utils.py Outdated Show resolved Hide resolved
src/error_map.rs Outdated Show resolved Hide resolved
src/error_map.rs Outdated Show resolved Hide resolved
@mtreinish
Copy link
Member Author

LGTM.

Should we add some test to make sure the avg_error_map property set setting is properly loaded/used when present?

Good call, I added a test with avg_error_map in: 50aab71

Copy link
Contributor

@kevinhartman kevinhartman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mergify mergify bot merged commit 83c84b0 into Qiskit:main Nov 18, 2022
@mtreinish mtreinish deleted the vf2-rust branch November 19, 2022 00:13
Cryoris pushed a commit to Cryoris/qiskit-terra that referenced this pull request Jan 12, 2023
…Qiskit#9026)

* Improve performance of VF2 scoring and add support for scoring passes

This commit makes 2 key changes to the vf2 layout pass. The first is it
migrates the scoring routine to rust. When running vf2 layout and vf2
post layout we're bottlenecked by the performance of the scoring of a
layout since in practice scoring a large circuit ends up taking more
time than the vf2_mapping() function. To address this the scoring
function is migrated to rust where the iteration will be much faster. To
enable this rust migration the average error map is made into a 2D numpy
array which can be efficiently be accessed by reference from rust. This
additionally also enables a convenient interface for future expansion of
the vf2 layout passes. The VF2LayoutPass and VF2PostLayout passes will
now both look for a "vf2_avg_error_map" entry in the property set which
contains a 2d array used for scoring. If present that array will be used
for scoring instead of the computing one from the target's error rates.
This will enable custom analysis passes to be run pre-layout to compute
or inject a custom scoring heuristic.

* Handle missing qubits from properties Payload

For BackendV1 based backends it's possible for the BackendProperties
object for that beackend to get out of sync with the number fo qubits
actually available in the system. In such cases looking up the noise
characteristics can potentially fail when building the error map because
the reported number of qubits is less than the qubits there are
properties for. This wasn't an issue in the previous error map data
structure because it was a dictionary and it would just add the error
rate for the extra qubits even though it wasn't valid. However, now that
we're using a numpy array with a fixed size this isn't the case anymore
and an error would be raised in these cases. To workaround this issue
this commit skips any qubits outside the allowed range in the
BackendProperties when building the error map to account for this
potential discrepency. The extra properties couldn't be used anyway
since they're not valid device qubits in such cases.

* Limit number of intermediate Layout objects created

This commit updates the vf2 layout scoring to work with a dictionary
object instead of a Layout object.  Previously we were creating a Layout
object on each mapping found and passing that to scoring. However, this
was unecessary overhead as the Layout object is slow to create and
interact with. Since we only need a Layout object if we're potentially
returning the layout as the best result we can avoid this extra
overhead.

* Move environment variable check outside loop

This commit removes the lookup for the QISKIT_IN_PARALLEL env variable
from the rust code for vf2 scoring. THis was adding unecessary overhead
to a frequently called function when it only needs to be computed once.
This commit moves the lookup to python outside the for loop and just
passes the evaluated boolean to the rust function instead.

* Fix rust lint

* Apply suggestions from code review

Co-authored-by: Jake Lishman <[email protected]>

* Simplify duplicated rust iteration code

This commit deduplicates a bunch of the rust side code for scoring into
2 closures and replaces all the reduce() calls with product() to
accomplish the same thing.

* Update qiskit/transpiler/passes/layout/vf2_layout.py

Co-authored-by: Jake Lishman <[email protected]>

* Use np.full() instead of np.empty() and np.fill()

* Pivot from 2d numpy array to a custom ErrorMap class

In order to support large (> 1000) qubit systems efficiently this commit
pivots away from using a 2d numpy array to represent the average error
rates for a target. For 1000q this error matrix would take 8 MB of
memory but for 10k qubit it would take 800 MB. Considering by their
nature these error matricies should be fairly sparse as connectivity in
typical QPUs is sparse. This was just wasted memory as we'll end up with
a lot of NaN values in the array. Instead this commit adds a new Rust
struct/Python class ErrorMap which just wraps a HashMap and maps a 2
element int array to a float. This way we only store entries where there
is defined connectivity and are more memory efficient.

* Fix lint

* Fix import path after rebase

* Update release notes

* Apply suggestions from code review

Co-authored-by: Kevin Hartman <[email protected]>

* Build empty ErrorMap in case of no target or coupling map

* Add helper function for layout creation in VF2Layout scoring loop

* Add test with custom ErrorMap analysis pass

Co-authored-by: Jake Lishman <[email protected]>
Co-authored-by: Kevin Hartman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: New Feature Include in the "Added" section of the changelog performance Rust This PR or issue is related to Rust code in the repository
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants