Oxidize TwoQubitWeylDecomposition #11946

Merged: 49 commits into Qiskit:main from rust-whales-transcription, Mar 14, 2024

@mtreinish mtreinish commented Mar 4, 2024

Summary

This commit is part 1 of migrating the default 2q unitary synthesis to leverage parallel Rust, as described in #8774. The eventual goal is to run unitary synthesis in parallel for all the unitary matrices in a circuit in a single call from the UnitarySynthesis pass. This commit lays the initial groundwork by starting with the largest piece of the default 2q unitary synthesis code, the TwoQubitWeylDecomposition class. It migrates the entire class to be a pyclass in Rust. There is still a Python subclass that handles the actual QuantumCircuit generation and the string representations, which depend on circuit generation. The goal is to keep the same basic algorithm in place but re-implement it as-is in Rust, as a common starting point for eventual improvements to the underlying algorithm as well as for parallelizing the synthesis of multiple 2q unitary matrices.

Details and comments

TODO:

  • Fix test failures (this should be a drop in replacement)
    • Restore specialization tests using rust specialization enum instead of isinstance check. These were initially removed because the specialized subclassing mechanism has been removed, but we shouldn't lose the coverage.
  • Figure out path to raise QiskitError from rust code.
  • Benchmark, profile, and optimize after tests are all passing
  • Add release note
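
As a rough illustration of the second TODO item, a test can compare an enum-valued attribute instead of checking `isinstance` against a specialized subclass. This is only a sketch: the `Specialization` enum, its member names, the `Decomposition` class, and `check_specialization` are all hypothetical stand-ins, not the names used in this PR.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Specialization(Enum):
    """Hypothetical stand-in for the Rust-side specialization enum."""

    GENERAL = auto()
    ID_EQUIV = auto()
    MIRROR_CONTROLLED_EQUIV = auto()


@dataclass
class Decomposition:
    """Hypothetical decomposition object that carries its specialization as data."""

    a: float
    b: float
    c: float
    specialization: Specialization


def check_specialization(decomp: Decomposition, expected: Specialization) -> bool:
    """Replaces an isinstance() check against a specialized subclass."""
    return decomp.specialization is expected


# Example: a decomposition tagged as the identity-equivalent case.
decomp = Decomposition(0.0, 0.0, 0.0, Specialization.ID_EQUIV)
```

With a single class and an enum tag, the test coverage no longer depends on the subclass hierarchy that was removed.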

@mtreinish mtreinish added the on hold, type: feature request, performance, synthesis, and Rust labels Mar 4, 2024
@mtreinish mtreinish added this to the 1.1.0 milestone Mar 4, 2024
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Mar 11, 2024
This commit is a follow up to Qiskit#11019 that uses the rust implementation
of computing the weyl coordinates for a unitary and the
transform_to_magic_basis functions everywhere. Previously they were only
used internally by the num_basis_gates() rust function to estimate the
number of basis gates needed for a decomposition of a given unitary.
This will likely be superseded by Qiskit#11946 when that is working, but this
is mainly an incremental step working towards finishing Qiskit#11946 that
proves the internal rust functions work in the larger decomposer code to
isolate the source of failures in Qiskit#11946.
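
For readers unfamiliar with the magic basis transform mentioned in this commit message, here is a minimal pure-Python sketch. It assumes the standard magic (phased Bell) basis matrix; the exact normalization and argument conventions of the Rust `transform_to_magic_basis` function are not shown in this thread, so `to_magic_basis` and the helpers below are illustrative only. The property it demonstrates is that a tensor product of two SU(2) matrices becomes a real matrix in this basis.

```python
import cmath
import math

S = 1 / math.sqrt(2)
# Columns are the magic basis vectors (Bell states with fixed phases).
B = [
    [S + 0j, 1j * S, 0j, 0j],
    [0j, 0j, 1j * S, S + 0j],
    [0j, 0j, 1j * S, -S + 0j],
    [S + 0j, -1j * S, 0j, 0j],
]


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def dagger(x):
    """Conjugate transpose."""
    return [[x[j][i].conjugate() for j in range(len(x))] for i in range(len(x[0]))]


def kron(a, b):
    """Kronecker product of two square matrices."""
    n, m = len(a), len(b)
    return [
        [a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
        for i in range(n * m)
    ]


def rz(theta):
    return [[cmath.exp(-0.5j * theta), 0j], [0j, cmath.exp(0.5j * theta)]]


def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[complex(c), complex(-s)], [complex(s), complex(c)]]


def to_magic_basis(u):
    """Illustrative transform: B^dagger @ u @ B."""
    return matmul(dagger(B), matmul(u, B))


# A local (non-entangling) gate becomes a real matrix in the magic basis.
m = to_magic_basis(kron(rz(0.3), ry(1.1)))
max_imag = max(abs(entry.imag) for row in m for entry in row)
```

Here `max_imag` should be zero up to floating-point rounding, which is the property that makes this basis convenient for computing Weyl coordinates.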
@mtreinish mtreinish added the Changelog: New Feature label and removed the type: feature request label Mar 11, 2024
@mtreinish mtreinish force-pushed the rust-whales-transcription branch from 1498975 to ccccbe6 Compare March 11, 2024 08:39
mtreinish and others added 13 commits March 11, 2024 17:41
This commit fixes a typo in the formula in the function.
This is the same fix from Qiskit#11953.

Co-authored-by: Shelly Garion <[email protected]>
To aid in debugging and rule out rng differences causing different
results this commit switches the first iteration of the randomized loop
to have hard coded values that are identical to what the rng in numpy
was returning. It is very unlikely that this will have any impact
because the specific random numbers used shouldn't matter; this is
mostly to rule it out as a possibility while debugging the remaining
test failures.
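
The debugging technique described here can be sketched as a generator that pins the first iteration to fixed values and falls back to the RNG afterwards. This is an assumption-laden illustration: `random_pairs` is a hypothetical helper, and `FIRST_PAIR` holds placeholder numbers, since the values actually hard-coded in the PR are not shown in this thread.

```python
import random

# Placeholder values only; the real hard-coded numbers are not in this thread.
FIRST_PAIR = (1.0, -1.0)


def random_pairs(rng, first_pair, iterations):
    """Yield coefficient pairs, pinning the first one for reproducibility."""
    for i in range(iterations):
        if i == 0:
            # Fixed values so two implementations can be compared directly.
            yield first_pair
        else:
            yield (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


pairs = list(random_pairs(random.Random(2024), FIRST_PAIR, 5))
```

Pinning only the first iteration keeps the randomized retry behavior intact while removing the RNG as a variable for inputs that succeed on the first try.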
This commit fixes two fundamental issues in the code. The first is that the
rz and ry matrices were being incorrectly constructed for a given angle.
This caused the specializations that compute the 1q matrices in
the decomposition from a product with these gates' matrices to
return invalid results. The second is that the MirrorControlledEquiv
specialization had the angles backwards when computing the matrix of the
rz gates used in the products for the 1q matrices. The correct form is

`K1l = K1l @ Rz(K1r)` and `K1r = K1r @ Rz(K1l)`, not
`K1l = K1l @ Rz(K1l)` and `K1r = K1r @ Rz(K1r)`.

This was a typo from the original transcription.
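
For reference, the standard rz and ry rotation matrices for an angle can be written out as below. This is a minimal pure-Python sketch using the usual half-angle convention; it is not the Rust code from this PR, and the helper names (`matmul`, `dagger`, `is_close`) are illustrative.

```python
import cmath
import math


def rz(theta):
    """Rz(theta) = diag(exp(-i*theta/2), exp(i*theta/2))."""
    return [[cmath.exp(-0.5j * theta), 0j], [0j, cmath.exp(0.5j * theta)]]


def ry(theta):
    """Ry(theta) = [[cos(theta/2), -sin(theta/2)], [sin(theta/2), cos(theta/2)]]."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[complex(c), complex(-s)], [complex(s), complex(c)]]


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def dagger(x):
    """Conjugate transpose."""
    return [[x[j][i].conjugate() for j in range(len(x))] for i in range(len(x[0]))]


def is_close(x, y, tol=1e-12):
    """Entrywise comparison of two matrices of equal shape."""
    return all(
        abs(x[i][j] - y[i][j]) < tol for i in range(len(x)) for j in range(len(x[0]))
    )


# Rotations about the same axis compose by adding angles.
composed = matmul(rz(0.4), rz(0.7))
```

Checks such as angle additivity and unitarity are cheap ways to catch a mis-constructed rotation matrix before it propagates into the 1q products.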
@mtreinish mtreinish force-pushed the rust-whales-transcription branch from ccccbe6 to 38e7c62 Compare March 11, 2024 08:41
coveralls commented Mar 11, 2024

Pull Request Test Coverage Report for Build 8280746382

Details

  • 812 of 897 (90.52%) changed or added relevant lines in 4 files are covered.
  • 18 unchanged lines in 5 files lost coverage.
  • Overall coverage decreased (-0.02%) to 89.279%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| crates/accelerate/src/euler_one_qubit_decomposer.rs | 104 | 124 | 83.87% |
| crates/accelerate/src/two_qubit_decompose.rs | 677 | 742 | 91.24% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| crates/qasm2/src/expr.rs | 1 | 93.81% |
| qiskit/synthesis/two_qubit/weyl.py | 1 | 97.62% |
| crates/qasm2/src/lex.rs | 3 | 91.69% |
| qiskit/synthesis/two_qubit/two_qubit_decompose.py | 3 | 95.03% |
| crates/accelerate/src/euler_one_qubit_decomposer.rs | 10 | 90.58% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 8265611630: | -0.02% |
| Covered Lines: | 59581 |
| Relevant Lines: | 66736 |

💛 - Coveralls

@levbishop levbishop left a comment

A few small comments to consider - I'm not dead-set on any of them.

mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Mar 14, 2024
This commit is the second part of migrating the default 2q unitary
synthesis method to leverage parallel rust as described in Qiskit#8774. The
eventual goal is to be able to run unitary synthesis in parallel for all
the unitary matrices in a single call from the `UnitarySynthesis` pass.
The TwoQubitBasisDecomposer class is one of the default decomposers used
by the unitary synthesis plugin. After this we can build an interface
that will run the decomposition in parallel for a given decomposer.

This commit re-implements the TwoQubitBasisDecomposer class in rust. It
keeps the same algorithm from the previous python version but implements
it in rust. This builds off of Qiskit#11946 and for the operation of the
decomposer class the TwoQubitWeylDecomposition class is used solely
through rust.

This commit depends on Qiskit#11946 and will need to be rebased after Qiskit#11946
is merged.

Fixes Qiskit#12004
@levbishop levbishop left a comment

This is great. 🚀

@jakelishman jakelishman left a comment

Thanks for all the work!

@jakelishman jakelishman added this pull request to the merge queue Mar 14, 2024
Merged via the queue into Qiskit:main with commit dd802ca Mar 14, 2024
12 checks passed
@mtreinish mtreinish deleted the rust-whales-transcription branch March 14, 2024 19:32
github-merge-queue bot pushed a commit that referenced this pull request Mar 28, 2024
* Oxidize TwoQubitBasisDecomposer

This commit is the second part of migrating the default 2q unitary
synthesis method to leverage parallel rust as described in #8774. The
eventual goal is to be able to run unitary synthesis in parallel for all
the unitary matrices in a single call from the `UnitarySynthesis` pass.
The TwoQubitBasisDecomposer class is one of the default decomposers used
by the unitary synthesis plugin. After this we can build an interface
that will run the decomposition in parallel for a given decomposer.

This commit re-implements the TwoQubitBasisDecomposer class in rust. It
keeps the same algorithm from the previous python version but implements
it in rust. This builds off of #11946 and for the operation of the
decomposer class the TwoQubitWeylDecomposition class is used solely
through rust.

This commit depends on #11946 and will need to be rebased after #11946
is merged.

Fixes #12004

* Fix errors after rebase

* Fix traces method

* Fix pulse optimized synthesis

* Add release notes

* Fix lint

* Use consts for static decomposition arrays

* Run cargo fmt

* Handle basis_fidelity inside unitary synthesis path

* Cast input to TwoQubitBasisDecomposer.num_basis_gates

* Use statics instead of consts

* Pre-allocate 2q circuit sequence outside the pulse optimal path.
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Apr 5, 2024
This commit tweaks the heuristic effort in optimization level 2 to be
more of a middle ground between level 1 and 3; with a better balance
between output quality and runtime. This places it to be a better
default for a pass manager we use if one isn't specified. The
tradeoff here is that the vf2layout and vf2postlayout search space is
reduced to be the same as level 1. There are diminishing margins of
return on the vf2 layout search especially for cases when there are a
large number of qubit permutations for the mapping found. Then the
number of sabre trials is brought up to the same level as optimization
level 3, as this can have a significant impact on output and the extra
runtime cost is minimal. The larger change is that the optimization
passes from level 3 are now used in level 2. This ends up mainly being
2q peephole optimization.
With the performance improvements from Qiskit#12010 and Qiskit#11946 and all the
follow-on PRs this is now fast enough to rely on in optimization level
2.
github-merge-queue bot pushed a commit that referenced this pull request Apr 23, 2024
* Increase heuristic effort for optimization level 2

This commit tweaks the heuristic effort in optimization level 2 to be
more of a middle ground between level 1 and 3; with a better balance
between output quality and runtime. This places it to be a better
default for a pass manager we use if one isn't specified. The
tradeoff here is that the vf2layout and vf2postlayout search space is
reduced to be the same as level 1. There are diminishing margins of
return on the vf2 layout search especially for cases when there are a
large number of qubit permutations for the mapping found. Then the
number of sabre trials is brought up to the same level as optimization
level 3, as this can have a significant impact on output and the extra
runtime cost is minimal. The larger change is that the optimization
passes from level 3 are now used in level 2. This ends up mainly being
2q peephole optimization.
With the performance improvements from #12010 and #11946 and all the
follow-on PRs this is now fast enough to rely on in optimization level
2.

* Add test workaround from level 3 to level 2 too

* Expand vf2 call limit on VF2Layout

For the initial VF2Layout call this commit expands the vf2 call limit
back to the previous level instead of reducing it to the same as level 1.
The idea behind making this change is that spending up to 10s to find a
perfect layout is a worthwhile tradeoff as that will greatly improve the
result from execution. But scoring multiple layouts to find the lowest
error rate subgraph has a diminishing margin of return in most cases as
there typically aren't thousands of unique subgraphs and often when we
hit the scoring limit it's just permuting the qubits inside a subgraph
which doesn't provide the most value.

For VF2PostLayout the lower call limits from level 1 are still used. This
is because the search for isomorphic subgraphs is typically much
shorter with the vf2++ node ordering heuristic, so we don't need to spend
as much time looking for alternative subgraphs.

* Move 2q peephole outside of optimization loop in O2

Due to potential instability in the 2q peephole optimization we run, we
were using the `MinimumPoint` pass to provide backtracking when we reach
a local minimum. However, this pass adds a significant amount of
overhead because it deep copies the circuit at every iteration of the
optimization loop that improves the output quality. This commit tweaks
the O2 pass manager construction to only run 2q peephole once, and then
updates the optimization loop to be what the previous O2 optimization
loop was.
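
The overhead described in this commit message can be illustrated with a toy model. The sketch below is not Qiskit's `MinimumPoint` pass or its pass manager API; it only assumes the general pattern of "deep copy the state whenever the cost improves" and contrasts it with a plain run-to-fixed-point loop. All function names and the list-of-gate-names circuit are hypothetical.

```python
import copy


def cancel_one_pair(gates):
    """Toy optimization step: drop the first adjacent pair of identical gates."""
    for i in range(len(gates) - 1):
        if gates[i] == gates[i + 1]:
            return gates[:i] + gates[i + 2:]
    return gates


def loop_with_backtracking(state, step, cost, max_iterations=20):
    """Keep a deep copy of the best state seen so the loop can backtrack."""
    best_state = copy.deepcopy(state)
    best_cost = cost(state)
    copies = 1
    for _ in range(max_iterations):
        state = step(state)
        if cost(state) < best_cost:
            # One deep copy per improving iteration: this is the overhead.
            best_state = copy.deepcopy(state)
            best_cost = cost(state)
            copies += 1
    return best_state, copies


def loop_to_fixed_point(state, step, cost):
    """Stop as soon as a step no longer reduces the cost; no copies needed."""
    while True:
        candidate = step(state)
        if cost(candidate) >= cost(state):
            return state
        state = candidate


gates = ["h", "h", "x", "z", "z", "x"]
best, copies = loop_with_backtracking(gates, cancel_one_pair, len)
fixed = loop_to_fixed_point(gates, cancel_one_pair, len)
```

When every step in the loop is stable (never makes the result worse), the simpler loop reaches the same result without the per-iteration copies, which is the motivation for moving the potentially unstable step out of the loop.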
Labels
Changelog: New Feature, performance, Rust, synthesis
Development

Successfully merging this pull request may close these issues.

Port TwoQubitWeylDecomposition to Rust
6 participants