Improve performance of VF2 scoring and add support for scoring passes #9026
This commit makes 2 key changes to the vf2 layout pass. The first is that it migrates the scoring routine to Rust. When running vf2 layout and vf2 post layout we're bottlenecked by the performance of scoring a layout, since in practice scoring a large circuit ends up taking more time than the vf2_mapping() function. To address this the scoring function is migrated to Rust, where the iteration will be much faster. To enable this Rust migration the average error map is made into a 2D numpy array which can be efficiently accessed by reference from Rust. This also enables a convenient interface for future expansion of the vf2 layout passes. The VF2Layout and VF2PostLayout passes will now both look for a "vf2_avg_error_map" entry in the property set which contains a 2D array used for scoring. If present, that array will be used for scoring instead of computing one from the target's error rates. This will enable custom analysis passes to be run pre-layout to compute or inject a custom scoring heuristic.
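The property-set lookup described above could be sketched roughly as follows. This is a plain-Python illustration only: the property set and the error map are both modeled as ordinary dicts, and `build_error_map_from_target` and `resolve_error_map` are hypothetical helper names, not Qiskit functions.

```python
def build_error_map_from_target(target_errors):
    """Hypothetical stand-in for deriving average errors from a target."""
    return dict(target_errors)


def resolve_error_map(property_set, target_errors):
    """Prefer a user-supplied error map, else fall back to the target's."""
    custom = property_set.get("vf2_avg_error_map")
    if custom is not None:
        return custom
    return build_error_map_from_target(target_errors)


# Toy data: keys are (qubit, qubit) pairs, values are average error rates.
target_errors = {(0, 0): 0.001, (1, 1): 0.002, (0, 1): 0.01}
custom_errors = {(0, 0): 0.5, (1, 1): 0.5, (0, 1): 0.9}

# No entry in the property set: the map is computed from the target.
default_map = resolve_error_map({}, target_errors)

# An earlier analysis pass injected its own heuristic: that one wins.
injected_map = resolve_error_map({"vf2_avg_error_map": custom_errors}, target_errors)
```

The point of the design is that the layout pass never needs to know where the scores came from, so a custom analysis pass only has to write one property-set key.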
This commit fixes a few copy-paste errors and other errors in the docstring for the VF2PostLayout pass. It also adds a link to the paper for the pass. This was originally part of Qiskit#9026, as these fixes were made while modifying the docstring to document the new feature being added in that PR. This commit just extracts those docstring fixes from that PR.
For BackendV1 based backends it's possible for the BackendProperties object for that backend to get out of sync with the number of qubits actually available in the system. In such cases looking up the noise characteristics can potentially fail when building the error map, because the reported number of qubits is less than the number of qubits there are properties for. This wasn't an issue in the previous error map data structure because it was a dictionary and it would just add the error rate for the extra qubits even though it wasn't valid. However, now that we're using a numpy array with a fixed size this isn't the case anymore and an error would be raised in these cases. To work around this issue this commit skips any qubits outside the allowed range in the BackendProperties when building the error map, to account for this potential discrepancy. The extra properties couldn't be used anyway since they're not valid device qubits in such cases.
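A minimal sketch of the workaround, under stated assumptions: nested lists stand in for the fixed-size numpy array so the example needs only the standard library, and `build_error_matrix` plus the shape of `qubit_errors` are hypothetical, not the actual Qiskit code.

```python
import math


def build_error_matrix(num_qubits, qubit_errors):
    """Build a dense matrix of average errors, NaN where nothing is known.

    qubit_errors maps a tuple of qubit indices to an error rate. Entries
    naming a qubit >= num_qubits are skipped instead of raising IndexError.
    """
    matrix = [[math.nan] * num_qubits for _ in range(num_qubits)]
    for qubits, error in qubit_errors.items():
        if any(q >= num_qubits for q in qubits):
            continue  # stale property for a qubit the device does not report
        if len(qubits) == 1:
            matrix[qubits[0]][qubits[0]] = error
        else:
            matrix[qubits[0]][qubits[1]] = error
    return matrix


# The properties mention qubit 2, but the backend only reports 2 qubits.
props = {(0,): 0.001, (1,): 0.002, (0, 1): 0.01, (2,): 0.003, (1, 2): 0.02}
matrix = build_error_matrix(2, props)
```

With a dictionary the stale entries for qubit 2 would have been stored silently; with a fixed-size structure they have to be filtered explicitly.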
This commit updates the vf2 layout scoring to work with a dictionary object instead of a Layout object. Previously we were creating a Layout object for each mapping found and passing that to scoring. However, this was unnecessary overhead, as the Layout object is slow to create and interact with. Since we only need a Layout object if we're potentially returning the layout as the best result, we can avoid this extra overhead.
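The idea can be illustrated with a small sketch. Everything here is hypothetical: `ExpensiveLayout` is a stand-in for a slow-to-build layout object, the score is a toy sum of qubit error rates, and the sketch simplifies further by building the object only once, for the final winner.

```python
class ExpensiveLayout:
    """Hypothetical stand-in for a Layout object that is slow to build."""

    instances = 0

    def __init__(self, mapping):
        ExpensiveLayout.instances += 1
        self.mapping = dict(mapping)


def score(mapping, qubit_error):
    """Toy score: sum of the error rates of the chosen physical qubits."""
    return sum(qubit_error[physical] for physical in mapping.values())


def best_layout(mappings, qubit_error):
    """Score plain dicts and wrap only the best one (assumes >= 1 mapping)."""
    best_mapping, best_score = None, float("inf")
    for mapping in mappings:  # plain dicts: virtual index -> physical index
        current = score(mapping, qubit_error)
        if current < best_score:
            best_mapping, best_score = mapping, current
    # Build the heavyweight object once, only for the winner.
    return ExpensiveLayout(best_mapping), best_score


qubit_error = {0: 0.03, 1: 0.01, 2: 0.02}
candidates = [{0: 0, 1: 1}, {0: 1, 1: 2}, {0: 2, 1: 0}]
layout, layout_score = best_layout(candidates, qubit_error)
```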
This commit removes the lookup for the QISKIT_IN_PARALLEL env variable from the Rust code for vf2 scoring. This was adding unnecessary overhead to a frequently called function when it only needs to be computed once. This commit moves the lookup to Python outside the for loop and just passes the evaluated boolean to the Rust function instead.
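A rough sketch of hoisting the lookup out of the loop. The way the variable is parsed here (case-insensitive comparison with "TRUE") is an assumption, and `score_one` is a hypothetical stub standing in for the Rust scoring function.

```python
import os


def score_one(mapping, run_in_parallel):
    """Hypothetical scoring stub; it only receives the precomputed flag."""
    return (len(mapping), run_in_parallel)


def score_all(mappings):
    # Read the environment once, before the loop, not once per mapping.
    run_in_parallel = os.getenv("QISKIT_IN_PARALLEL", "FALSE").upper() == "TRUE"
    return [score_one(mapping, run_in_parallel) for mapping in mappings]


# Set the variable for the demonstration only.
os.environ["QISKIT_IN_PARALLEL"] = "TRUE"
results = score_all([{0: 1}, {0: 2, 1: 3}])
```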
Moving scoring to Rust generally seems sensible to me. Most of the comments below are very minor.
My two main things are:
- I'm concerned we're overloading NaN in the gate-error matrix with two incompatible meanings.
- I think that even for 1000q systems, the 2D matrix is probably going to be fine, but there's a possibility we might end up with better cache locality in the scoring for large systems if we considered some sparser structure, since we generally expect that most real-world systems will have limited connectivity, so this 2D matrix will in practice be very sparse. This certainly doesn't need investigating for this PR, just wondering if you'd thought anything about it?
    edge_list: IndexMap<[usize; 2], i32>,
    error_matrix: PyReadonlyArray2<f64>,
    layout: &NLayout,
    strict_direction: bool,
This flag seems unnecessary in Rust with things now represented as an error_matrix? I'd have thought that it's just built in during the construction of the matrix.
It's not necessarily reflected in the error matrix. It really depends on the backend properties, since we could end up with an error rate defined directionally in the backend, and it's hard to know until after we finish building the matrix, so we don't build the error matrix bidirectionally if strict_direction=False. It becomes more a question of scoring behavior than of representation. If strict_direction=False and err_mat[[0, 1]] is NaN, it will try err_mat[[1, 0]].
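The fallback described in this reply can be sketched in a few lines. This is an illustration rather than the Rust implementation: a nested list stands in for the error matrix, and `edge_error` is a hypothetical helper name.

```python
import math


def edge_error(err_mat, a, b, strict_direction):
    """Look up the error for edge (a, b), optionally trying (b, a)."""
    error = err_mat[a][b]
    if math.isnan(error) and not strict_direction:
        # No rate reported for a -> b: fall back to the reverse direction.
        error = err_mat[b][a]
    return error


nan = math.nan
# Only the 1 -> 0 direction has a reported error rate.
err_mat = [
    [0.001, nan],
    [0.02, 0.002],
]

forward_strict = edge_error(err_mat, 0, 1, strict_direction=True)
forward_loose = edge_error(err_mat, 0, 1, strict_direction=False)
```

Note that the reverse entry is only consulted when the forward one is NaN, which is why an edge with different rates in both directions is scored on the forward rate alone.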
Yeah, that's fair enough then. In all the situations I could think of, the backend would just have constructed the directional error matrix itself, but if there's a chance those are decoupled, then it's right to include the swap.
That said, it feels a bit odd in the scoring that we don't take into account both sides of the link in other cases. If it's not strict directionality, but the two directions have different average errors, it feels weird that we don't take that into account somehow? In my mental model, that situation shouldn't be possible, but if the error matrix and strict directionality aren't coupled at the level of the backend, it starts to feel possible and unclear in how it should be handled.
It's a tradeoff between finding a path and not finding a path. There are two modes of operation right now with the vf2 mapping: either we work with directed graphs and respect the direction of gates on the backend, or we treat every edge as weak and work with undirected edges and then rely on GateDirection later to correct things. In the former case, if the edge is defined bidirectionally with different error rates, rustworkx will return a different mapping with either direction and score them differently. But for the latter case we can't really make an assertion about the order of the qubits for the qargs because it's not explicitly set.
That being said, I think you're right: for the undirected case maybe we should be looking at the other direction in scoring if both directions have defined error rates. We'll have to think of the best way to do this. (For this I just copied the scoring algorithm we were using before anyway.)
Co-authored-by: Jake Lishman <[email protected]>
This commit deduplicates a bunch of the rust side code for scoring into 2 closures and replaces all the reduce() calls with product() to accomplish the same thing.
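The Rust change itself is not shown here. As a loose Python analogue of swapping a hand-written reduction for a dedicated product, the two forms below compute the same value; the `fidelities` list is made-up sample data.

```python
import functools
import math
import operator

# Made-up per-gate fidelities, for illustration only.
fidelities = [0.99, 0.98, 0.97]

# Explicit fold with a multiplication operator and a starting value.
via_reduce = functools.reduce(operator.mul, fidelities, 1.0)

# Dedicated product helper: same result, intent stated directly.
via_prod = math.prod(fidelities)
```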
Co-authored-by: Jake Lishman <[email protected]>
In order to support large (> 1000) qubit systems efficiently this commit pivots away from using a 2D numpy array to represent the average error rates for a target. For 1000q this error matrix would take 8 MB of memory, but for 10k qubits it would take 800 MB. By their nature these error matrices should be fairly sparse, as connectivity in typical QPUs is sparse, so this was just wasted memory: we'd end up with a lot of NaN values in the array. Instead this commit adds a new Rust struct/Python class ErrorMap which just wraps a HashMap and maps a 2 element int array to a float. This way we only store entries where there is defined connectivity and are more memory efficient.
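The memory figures follow from an n x n matrix of 8-byte floats, which the sketch below checks. `SparseErrorMap` is a hypothetical dict-backed analogue of the idea, written for illustration; it is not the ErrorMap class added by this commit.

```python
def dense_matrix_bytes(num_qubits, bytes_per_entry=8):
    """Size of an n x n matrix of 64-bit floats."""
    return num_qubits * num_qubits * bytes_per_entry


class SparseErrorMap:
    """Hypothetical dict-backed analogue of a hash-map error map."""

    def __init__(self):
        self._errors = {}

    def add_error(self, qubits, error):
        # Keys are 2-element qubit index pairs; (q, q) holds a 1q error.
        self._errors[tuple(qubits)] = error

    def get(self, qubits, default=None):
        return self._errors.get(tuple(qubits), default)

    def __len__(self):
        return len(self._errors)


mb_1k = dense_matrix_bytes(1_000) / 1e6    # megabytes for 1000 qubits
mb_10k = dense_matrix_bytes(10_000) / 1e6  # megabytes for 10k qubits

error_map = SparseErrorMap()
error_map.add_error((0, 0), 0.001)
error_map.add_error((0, 1), 0.01)
```

A missing pair is simply absent from the map, so there is no need to spend memory on a NaN placeholder for every unconnected qubit pair.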
I've updated the PR to use a custom hash map based class instead of a 2D numpy array to represent the average error map. This should both be much more memory efficient and also make the usage a bit clearer for people to interact with.
This commit moves the NLayout rust class out of the stochastic swap python module into a new standalone nlayout module in qiskit._accelerate. The NLayout class was originally added with the stochastic swap rust code, but since then it's started being used by other rust code including SabreSwap (and soon to be VF2Layout and VF2PostLayout scoring in Qiskit#9026).
To show the performance improvements I threw together a small test script:

    import statistics
    import time

    from qiskit.transpiler.passes.layout import VF2Layout
    from qiskit.circuit import QuantumCircuit
    from qiskit.providers.fake_provider import FakeMumbaiV2
    from qiskit.converters import circuit_to_dag

    # A 4-qubit line plus 3 idle qubits
    qc = QuantumCircuit(7)
    qc.h(0)
    qc.cz(0, 1)
    qc.cz(1, 2)
    qc.cz(2, 3)
    qc.measure_all()
    dag = circuit_to_dag(qc)
    backend = FakeMumbaiV2()
    times = []
    # max_trials=-1 turns off the trial limit so every mapping is scored
    vf2_pass = VF2Layout(target=backend.target, max_trials=-1)
    for i in range(5):
        print(f"Run {i}")
        start = time.perf_counter()
        vf2_pass.run(dag)
        stop = time.perf_counter()
        times.append(stop - start)
        print(stop - start)
    print(statistics.geometric_mean(times))

This script is a worst case from a scoring perspective: it's mapping a line with 4 nodes and 3 free nodes onto the coupling graph. This means there are a lot of possible permutations for valid isomorphic mappings, but each is trivial for rustworkx to calculate. It also turns off any limits in the pass so it will fully iterate through all available mappings. Running this script with this PR applied returned:
Then running it on main:
This obviously is a best case improvement. In more realistic testing with limits set I was seeing a ~10% improvement over main with this PR applied.
LGTM.
Should we add some test to make sure the avg_error_map property set setting is properly loaded/used when present?
Co-authored-by: Kevin Hartman <[email protected]>
Good call, I added a test with |
LGTM!
…Qiskit#9026)

* Improve performance of VF2 scoring and add support for scoring passes

  This commit makes 2 key changes to the vf2 layout pass. The first is that it migrates the scoring routine to Rust. When running vf2 layout and vf2 post layout we're bottlenecked by the performance of scoring a layout, since in practice scoring a large circuit ends up taking more time than the vf2_mapping() function. To address this the scoring function is migrated to Rust, where the iteration will be much faster. To enable this Rust migration the average error map is made into a 2D numpy array which can be efficiently accessed by reference from Rust. This also enables a convenient interface for future expansion of the vf2 layout passes. The VF2Layout and VF2PostLayout passes will now both look for a "vf2_avg_error_map" entry in the property set which contains a 2D array used for scoring. If present, that array will be used for scoring instead of computing one from the target's error rates. This will enable custom analysis passes to be run pre-layout to compute or inject a custom scoring heuristic.

* Handle missing qubits from properties Payload

  For BackendV1 based backends it's possible for the BackendProperties object for that backend to get out of sync with the number of qubits actually available in the system. In such cases looking up the noise characteristics can potentially fail when building the error map, because the reported number of qubits is less than the number of qubits there are properties for. This wasn't an issue in the previous error map data structure because it was a dictionary and it would just add the error rate for the extra qubits even though it wasn't valid. However, now that we're using a numpy array with a fixed size this isn't the case anymore and an error would be raised in these cases. To work around this issue this commit skips any qubits outside the allowed range in the BackendProperties when building the error map, to account for this potential discrepancy. The extra properties couldn't be used anyway since they're not valid device qubits in such cases.

* Limit number of intermediate Layout objects created

  This commit updates the vf2 layout scoring to work with a dictionary object instead of a Layout object. Previously we were creating a Layout object for each mapping found and passing that to scoring. However, this was unnecessary overhead, as the Layout object is slow to create and interact with. Since we only need a Layout object if we're potentially returning the layout as the best result, we can avoid this extra overhead.

* Move environment variable check outside loop

  This commit removes the lookup for the QISKIT_IN_PARALLEL env variable from the Rust code for vf2 scoring. This was adding unnecessary overhead to a frequently called function when it only needs to be computed once. This commit moves the lookup to Python outside the for loop and just passes the evaluated boolean to the Rust function instead.

* Fix rust lint

* Apply suggestions from code review

  Co-authored-by: Jake Lishman <[email protected]>

* Simplify duplicated rust iteration code

  This commit deduplicates a bunch of the Rust side code for scoring into 2 closures and replaces all the reduce() calls with product() to accomplish the same thing.

* Update qiskit/transpiler/passes/layout/vf2_layout.py

  Co-authored-by: Jake Lishman <[email protected]>

* Use np.full() instead of np.empty() and np.fill()

* Pivot from 2d numpy array to a custom ErrorMap class

  In order to support large (> 1000) qubit systems efficiently this commit pivots away from using a 2D numpy array to represent the average error rates for a target. For 1000q this error matrix would take 8 MB of memory, but for 10k qubits it would take 800 MB. By their nature these error matrices should be fairly sparse, as connectivity in typical QPUs is sparse, so this was just wasted memory: we'd end up with a lot of NaN values in the array. Instead this commit adds a new Rust struct/Python class ErrorMap which just wraps a HashMap and maps a 2 element int array to a float. This way we only store entries where there is defined connectivity and are more memory efficient.

* Fix lint

* Fix import path after rebase

* Update release notes

* Apply suggestions from code review

  Co-authored-by: Kevin Hartman <[email protected]>

* Build empty ErrorMap in case of no target or coupling map

* Add helper function for layout creation in VF2Layout scoring loop

* Add test with custom ErrorMap analysis pass

Co-authored-by: Jake Lishman <[email protected]>
Co-authored-by: Kevin Hartman <[email protected]>
Summary
This commit makes 2 key changes to the vf2 layout pass. The first is that it migrates the scoring routine to Rust. When running vf2 layout and vf2 post layout we're bottlenecked by the performance of scoring a layout, since in practice scoring a large circuit ends up taking more time than the vf2_mapping() function. To address this the scoring function is migrated to Rust, where the iteration will be much faster. To enable this Rust migration the average error map is made into an ErrorMap class which can be efficiently accessed by reference from Rust. This also enables a convenient interface for future expansion of the vf2 layout passes. The VF2Layout and VF2PostLayout passes will now both look for a "vf2_avg_error_map" entry in the property set which contains an ErrorMap used for scoring. If present, it will be used for scoring instead of computing one from the target's error rates. This will enable custom analysis passes to be run pre-layout to compute or inject a custom scoring heuristic.
Details and comments
TODO: