Use rust gates for Optimize1QGatesDecomposition #12650

mtreinish · 2024-06-24T21:10:41Z

Summary

This commit moves to using rust gates for the Optimize1QGatesDecomposition transpiler pass. It takes in a sequence of runs (which are a list of DAGOpNodes) from the python side of the transpiler pass which are generated from DAGCircuit.collect_1q_runs() (which in the future should be moved to rust after #12550 merges). The rust portion of the pass now iterates over each run, performs the matrix multiplication to compute the unitary of the run, then synthesizes that unitary, computes the estimated error of the circuit synthesis and returns a tuple of the circuit sequence in terms of rust StandardGate enums. The python portion of the code then takes those sequences and does inplace substitution of each run with the sequence returned from rust.

Once #12550 merges we should be able to move the input collect_1q_runs() call and perform the output node substitions in rust making the full pass execute in the rust domain without any python interaction.

Additionally, the OneQubitEulerDecomposer class is updated to use rust for circuit generation instead of doing this python side. The internal changes done to use rust gates in the transpiler pass meant we were half way to this already by emitting rust StandardGates instead of python gate objects. The dag handling is still done in Python however until #12550 merges.

This also includes an implementation of the r gate, I temporarily added this to unblock this effort as it was the only gate missing needed to complete this. We can rebase this if a standalone implementation of the gate merges before this. Rebased on top of: #12662

Details and comments

TODO:

Investigate multithreading across runs. This might not make sense until [DAGCircuit Oxidation] Port DAGCircuit to Rust #12550 merges so we can do a parallel bridge iterator over runs and avoid having to collect runs generated in python

This commit moves to using rust gates for the Optimize1QGatesDecomposition transpiler pass. It takes in a sequence of runs (which are a list of DAGOpNodes) from the python side of the transpiler pass which are generated from DAGCircuit.collect_1q_runs() (which in the future should be moved to rust after Qiskit#12550 merges). The rust portion of the pass now iterates over each run, performs the matrix multiplication to compute the unitary of the run, then synthesizes that unitary, computes the estimated error of the circuit synthesis and returns a tuple of the circuit sequence in terms of rust StandardGate enums. The python portion of the code then takes those sequences and does inplace substitution of each run with the sequence returned from rust. Once Qiskit#12550 merges we should be able to move the input collect_1q_runs() call and perform the output node substitions in rust making the full pass execute in the rust domain without any python interaction. Additionally, the OneQubitEulerDecomposer class is updated to use rust for circuit generation instead of doing this python side. The internal changes done to use rust gates in the transpiler pass meant we were half way to this already by emitting rust StandardGates instead of python gate objects. The dag handling is still done in Python however until Qiskit#12550 merges. This also includes an implementation of the r gate, I temporarily added this to unblock this effort as it was the only gate missing needed to complete this. We can rebase this if a standalone implementation of the gate merges before this.

qiskit-bot · 2024-06-24T21:10:46Z

One or more of the following people are relevant to this code:

@Cryoris
@Qiskit/terra-core
@ajavadia
@kevinhartman
@levbishop
@mtreinish

coveralls · 2024-06-24T21:34:58Z

Pull Request Test Coverage Report for Build 9652482316

Details

354 of 400 (88.5%) changed or added relevant lines in 9 files are covered.
18 unchanged lines in 5 files lost coverage.
Overall coverage increased (+0.04%) to 89.796%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
crates/accelerate/src/euler_one_qubit_decomposer.rs	218	219	99.54%
qiskit/transpiler/passes/optimization/optimize_1q_decomposition.py	36	38	94.74%
qiskit/synthesis/one_qubit/one_qubit_decompose.py	2	15	13.33%
crates/circuit/src/dag_node.rs	25	39	64.1%
crates/circuit/src/operations.rs	30	46	65.22%

Files with Coverage Reduction	New Missed Lines	%
qiskit/transpiler/passes/optimization/optimize_1q_decomposition.py	1	95.35%
crates/qasm2/src/lex.rs	3	93.38%
qiskit/synthesis/one_qubit/one_qubit_decompose.py	4	70.67%
crates/accelerate/src/euler_one_qubit_decomposer.rs	4	91.37%
crates/qasm2/src/parse.rs	6	97.61%

Totals
Change from base Build 9650973845:	0.04%
Covered Lines:	64062
Relevant Lines:	71342

💛 - Coveralls

qiskit/transpiler/passes/optimization/optimize_1q_decomposition.py

Previously this PR was re-computing the target bases to synthesize with for each run found in the circuit. But in cases where there were multiple runs repeated on a qubit this was unecessary work. Prior to moving this code to rust there was already caching code to make this optimization, but the rust path short circuited around this. This commit fixes this so we're caching the target bases for each qubit and only computing it once.

coveralls · 2024-06-25T12:48:53Z

Pull Request Test Coverage Report for Build 9662366170

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

357 of 403 (88.59%) changed or added relevant lines in 9 files are covered.
20 unchanged lines in 6 files lost coverage.
Overall coverage increased (+0.03%) to 89.786%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
crates/accelerate/src/euler_one_qubit_decomposer.rs	217	218	99.54%
qiskit/transpiler/passes/optimization/optimize_1q_decomposition.py	40	42	95.24%
qiskit/synthesis/one_qubit/one_qubit_decompose.py	2	15	13.33%
crates/circuit/src/dag_node.rs	25	39	64.1%
crates/circuit/src/operations.rs	30	46	65.22%

Files with Coverage Reduction	New Missed Lines	%
qiskit/transpiler/passes/optimization/optimize_1q_decomposition.py	1	95.45%
crates/qasm2/src/expr.rs	1	94.02%
qiskit/synthesis/one_qubit/one_qubit_decompose.py	4	70.67%
crates/qasm2/src/lex.rs	4	91.86%
crates/accelerate/src/euler_one_qubit_decomposer.rs	4	91.35%
crates/qasm2/src/parse.rs	6	97.61%

Totals
Change from base Build 9650973845:	0.03%
Covered Lines:	64056
Relevant Lines:	71343

💛 - Coveralls

crates/accelerate/src/euler_one_qubit_decomposer.rs

mtreinish · 2024-06-25T16:35:47Z

I took a quick attempt at implementing this with multithreading locally and I could get it to work but it required doing extra collection of all the data from the collect runs into a second vec to strip the PyRef from each node. This added more overhead than parallelism saved in this case. I think it should be possible to investigate doing this in parallel after #12550 merges because then we can generate the runs from rust directly. For multithreading we'll likely still need to collect the runs into a vec (to avoid mutation while iterating) but we can measure if that overhead is higher than the speedup from doing the synthesis in parallel and investigate in a followup.

mtreinish · 2024-06-25T16:55:07Z

I threw together a quick benchmark to test this:

import statistics
import time

from qiskit.transpiler.passes import Optimize1qGatesDecomposition
from qiskit.circuit.random import random_circuit
from qiskit.converters import circuit_to_dag

seed = 42
n_qubits = 20
depth = 102400

qc = random_circuit(n_qubits, depth, measure=True, seed=seed)
dag = circuit_to_dag(qc)
basis_gates = ['sx', 'x', 'rz', 'cx', 'id']
opt_pass = Optimize1qGatesDecomposition(basis=basis_gates)
times = []
for i in range(10):
    start = time.perf_counter()
    opt_pass.run(dag)
    stop = time.perf_counter()
    times.append(stop - start)
print(f"Mean runtime: {statistics.geometric_mean(times)}")

and it's showing a pretty noticeable speedup over Qiskit 1.1. With this PR I got:

3.931521232593536 sec

and with Qiskit 1.1 I got:

6.184648169533495 sec

Looking at a profile with this script the majority of the time with this PR is spent in collect 1q runs and dag interactions in python.

coveralls · 2024-06-25T16:57:17Z

Pull Request Test Coverage Report for Build 9666227832

Details

374 of 420 (89.05%) changed or added relevant lines in 9 files are covered.
12 unchanged lines in 4 files lost coverage.
Overall coverage increased (+0.08%) to 89.797%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
crates/accelerate/src/euler_one_qubit_decomposer.rs	234	235	99.57%
qiskit/transpiler/passes/optimization/optimize_1q_decomposition.py	40	42	95.24%
qiskit/synthesis/one_qubit/one_qubit_decompose.py	2	15	13.33%
crates/circuit/src/dag_node.rs	25	39	64.1%
crates/circuit/src/operations.rs	30	46	65.22%

Files with Coverage Reduction	New Missed Lines	%
qiskit/transpiler/passes/optimization/optimize_1q_decomposition.py	1	95.45%
crates/qasm2/src/lex.rs	3	92.62%
qiskit/synthesis/one_qubit/one_qubit_decompose.py	4	70.67%
crates/accelerate/src/euler_one_qubit_decomposer.rs	4	91.51%

Totals
Change from base Build 9661630219:	0.08%
Covered Lines:	64079
Relevant Lines:	71360

💛 - Coveralls

coveralls · 2024-06-25T18:12:39Z

Pull Request Test Coverage Report for Build 9667036450

Details

374 of 418 (89.47%) changed or added relevant lines in 9 files are covered.
14 unchanged lines in 4 files lost coverage.
Overall coverage increased (+0.09%) to 89.805%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
crates/accelerate/src/euler_one_qubit_decomposer.rs	234	235	99.57%
qiskit/synthesis/one_qubit/one_qubit_decompose.py	2	15	13.33%
crates/circuit/src/dag_node.rs	25	39	64.1%
crates/circuit/src/operations.rs	30	46	65.22%

Files with Coverage Reduction	New Missed Lines	%
qiskit/transpiler/passes/optimization/optimize_1q_decomposition.py	1	96.92%
qiskit/synthesis/one_qubit/one_qubit_decompose.py	4	70.67%
crates/accelerate/src/euler_one_qubit_decomposer.rs	4	91.51%
crates/qasm2/src/lex.rs	5	92.62%

Totals
Change from base Build 9661630219:	0.09%
Covered Lines:	64083
Relevant Lines:	71358

💛 - Coveralls

mtreinish · 2024-06-25T19:19:09Z

To compare this against the current state of main I ran asv for the pass specific benchmarks:

Benchmarks that have improved:

| Change   | Before [1ed5951a] <optimize-1q-more-rust~3^2>   | After [130969a3] <optimize-1q-more-rust>   |   Ratio | Benchmark (Parameter)                                                                                         |
|----------|-------------------------------------------------|--------------------------------------------|---------|---------------------------------------------------------------------------------------------------------------|
| -        | 134±1ms                                         | 46.9±0.4ms                                 |    0.35 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(20, 1024, ['rz', 'x', 'sx', 'cx', 'id'])        |
| -        | 99.3±1ms                                        | 34.0±0.2ms                                 |    0.34 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(14, 1024, ['rz', 'x', 'sx', 'cx', 'id'])        |
| -        | 116±0.7ms                                       | 39.0±0.5ms                                 |    0.34 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(20, 1024, ['u', 'cx', 'id'])                    |
| -        | 85.6±0.3ms                                      | 27.9±0.1ms                                 |    0.33 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(14, 1024, ['u', 'cx', 'id'])                    |
| -        | 123±1ms                                         | 41.1±0.5ms                                 |    0.33 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(20, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id']) |
| -        | 92.0±0.5ms                                      | 29.6±0.2ms                                 |    0.32 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(14, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id']) |
| -        | 47.4±0.5ms                                      | 12.6±0.04ms                                |    0.27 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(5, 1024, ['rz', 'x', 'sx', 'cx', 'id'])         |
| -        | 43.6±0.3ms                                      | 10.6±0.05ms                                |    0.24 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(5, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])  |
| -        | 40.7±0.4ms                                      | 9.31±0.06ms                                |    0.23 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(5, 1024, ['u', 'cx', 'id'])                     |

Benchmarks that have stayed the same:

| Change   | Before [1ed5951a] <optimize-1q-more-rust~3^2>   | After [130969a3] <optimize-1q-more-rust>   |   Ratio | Benchmark (Parameter)                                                                                           |
|----------|-------------------------------------------------|--------------------------------------------|---------|-----------------------------------------------------------------------------------------------------------------|
|          | 2.65±0.01s                                      | 2.66±0.02s                                 |    1    | passes.PassBenchmarks.time_optimize_swap_before_measure(20, 1024)                                               |
|          | 231±1ms                                         | 230±1ms                                    |    1    | passes.PassBenchmarks.time_optimize_swap_before_measure(5, 1024)                                                |
|          | 737±10ms                                        | 719±6ms                                    |    0.98 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id']) |
|          | 1.53±0.01s                                      | 1.51±0s                                    |    0.98 | passes.PassBenchmarks.time_optimize_swap_before_measure(14, 1024)                                               |
|          | 1.14±0.01s                                      | 1.11±0s                                    |    0.97 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id']) |
|          | 256±2ms                                         | 247±1ms                                    |    0.97 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(5, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])  |
|          | 822±10ms                                        | 778±3ms                                    |    0.95 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['rz', 'x', 'sx', 'cx', 'id'])        |
|          | 746±7ms                                         | 706±2ms                                    |    0.95 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['u', 'cx', 'id'])                    |
|          | 1.26±0.01s                                      | 1.20±0s                                    |    0.95 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['rz', 'x', 'sx', 'cx', 'id'])        |
|          | 1.16±0.02s                                      | 1.10±0s                                    |    0.95 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['u', 'cx', 'id'])                    |
|          | 298±2ms                                         | 277±1ms                                    |    0.93 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(5, 1024, ['rz', 'x', 'sx', 'cx', 'id'])         |
|          | 260±2ms                                         | 239±1ms                                    |    0.92 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(5, 1024, ['u', 'cx', 'id'])                     |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

coveralls · 2024-06-26T10:57:32Z

Pull Request Test Coverage Report for Build 9678061360

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

375 of 419 (89.5%) changed or added relevant lines in 10 files are covered.
29 unchanged lines in 7 files lost coverage.
Overall coverage increased (+0.08%) to 89.798%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
crates/accelerate/src/euler_one_qubit_decomposer.rs	234	235	99.57%
qiskit/synthesis/one_qubit/one_qubit_decompose.py	2	15	13.33%
crates/circuit/src/dag_node.rs	25	39	64.1%
crates/circuit/src/operations.rs	30	46	65.22%

Files with Coverage Reduction	New Missed Lines	%
qiskit/transpiler/passes/optimization/optimize_1q_decomposition.py	1	96.92%
crates/qasm2/src/expr.rs	1	94.02%
qiskit/synthesis/one_qubit/one_qubit_decompose.py	4	70.67%
qiskit/synthesis/discrete_basis/solovay_kitaev.py	4	94.74%
crates/accelerate/src/euler_one_qubit_decomposer.rs	4	91.51%
crates/qasm2/src/lex.rs	5	92.62%
qiskit/providers/fake_provider/generic_backend_v2.py	10	94.71%

Totals
Change from base Build 9661630219:	0.08%
Covered Lines:	64084
Relevant Lines:	71365

💛 - Coveralls

This commit moves to use rust gates for the ConsolidateBlocks transpiler pass. Instead of generating the unitary matrices for the gates in a 2q block Python side and passing that list to a rust function this commit switches to passing a list of DAGOpNodes to the rust and then generating the matrices inside the rust function directly. This is similar to what was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster to get the matrix for standard gates, it also reduces the eager construction of Python gate objects which was a significant source of overhead after Qiskit#12459. To that end this builds on the thread of work in the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for other passes to minimize eager gate object construction.

jlapeyre · 2024-07-02T22:37:38Z

Another note for later. This function

qiskit/crates/accelerate/src/euler_one_qubit_decomposer.rs

Line 386 in 3fb1753

pub fn generate_circuit(

has return type PyResult<OneQubitGateSequence>. But as far as I can tell it is only called in one place, and that is in the same file. The returned value is unwrapped with no check. In fact the return value is always ok. This is so the function can be called from Python. (maybe in the future, since I don't see this called anywhere) Reading and understanding the unwrap takes a bit of time.

I'd prefer that the Python-side function (if needed) wraps a function that returns OneQubitGateSequence rather than having this special requirement of the Python interface intrude in code that is only used in this function.

This is a quick fix. But probably outside the scope of this PR.

crates/accelerate/src/euler_one_qubit_decomposer.rs

jlapeyre · 2024-07-02T22:59:03Z

I left one more suggested simplification edit.

Other than that, this looks good to go.

coveralls · 2024-07-03T00:20:20Z

Pull Request Test Coverage Report for Build 9769535246

Details

347 of 361 (96.12%) changed or added relevant lines in 7 files are covered.
20 unchanged lines in 6 files lost coverage.
Overall coverage increased (+0.06%) to 89.846%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
crates/circuit/src/dag_node.rs	37	38	97.37%
qiskit/synthesis/one_qubit/one_qubit_decompose.py	2	15	13.33%

Files with Coverage Reduction	New Missed Lines	%
qiskit/transpiler/passes/optimization/optimize_1q_decomposition.py	1	96.92%
crates/qasm2/src/expr.rs	1	94.02%
qiskit/synthesis/one_qubit/one_qubit_decompose.py	4	70.67%
crates/qasm2/src/lex.rs	4	93.38%
crates/accelerate/src/euler_one_qubit_decomposer.rs	4	93.34%
crates/qasm2/src/parse.rs	6	96.69%

Totals
Change from base Build 9767176201:	0.06%
Covered Lines:	64785
Relevant Lines:	72107

💛 - Coveralls

This commit moves to use rust gates for the ConsolidateBlocks transpiler pass. Instead of generating the unitary matrices for the gates in a 2q block Python side and passing that list to a rust function this commit switches to passing a list of DAGOpNodes to the rust and then generating the matrices inside the rust function directly. This is similar to what was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster to get the matrix for standard gates, it also reduces the eager construction of Python gate objects which was a significant source of overhead after Qiskit#12459. To that end this builds on the thread of work in the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for other passes to minimize eager gate object construction.

In the recently merged Qiskit#12650 a new rust function was added for the filter function of the collect_1q_runs() method's internal filter function. As part of this we need to do exclude any non-DAGOpNode dag nodes passed to the filter function. This was previously implemented using the pyo3 extract() method so that if the pyobject can be coerced into a DAGOpNode for later use we continue but if it errors we know the input object is not a DAGOpNode and return false. However, as we don't need to return the error to python we should be using downcast instead of extract (as per the pyo3 performance guide [1]) to avoid the error conversion cost. This was an oversight in Qiskit#12650 and I only used extract() out of habit. [1] https://pyo3.rs/v0.22.0/performance#extract-versus-downcast

In the recently merged #12650 a new rust function was added for the filter function of the collect_1q_runs() method's internal filter function. As part of this we need to do exclude any non-DAGOpNode dag nodes passed to the filter function. This was previously implemented using the pyo3 extract() method so that if the pyobject can be coerced into a DAGOpNode for later use we continue but if it errors we know the input object is not a DAGOpNode and return false. However, as we don't need to return the error to python we should be using downcast instead of extract (as per the pyo3 performance guide [1]) to avoid the error conversion cost. This was an oversight in #12650 and I only used extract() out of habit. [1] https://pyo3.rs/v0.22.0/performance#extract-versus-downcast

This commit moves to use rust gates for the ConsolidateBlocks transpiler pass. Instead of generating the unitary matrices for the gates in a 2q block Python side and passing that list to a rust function this commit switches to passing a list of DAGOpNodes to the rust and then generating the matrices inside the rust function directly. This is similar to what was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster to get the matrix for standard gates, it also reduces the eager construction of Python gate objects which was a significant source of overhead after Qiskit#12459. To that end this builds on the thread of work in the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for other passes to minimize eager gate object construction.

* Use rust gates for ConsolidateBlocks This commit moves to use rust gates for the ConsolidateBlocks transpiler pass. Instead of generating the unitary matrices for the gates in a 2q block Python side and passing that list to a rust function this commit switches to passing a list of DAGOpNodes to the rust and then generating the matrices inside the rust function directly. This is similar to what was done in #12650 for Optimize1qGatesDecomposition. Besides being faster to get the matrix for standard gates, it also reduces the eager construction of Python gate objects which was a significant source of overhead after #12459. To that end this builds on the thread of work in the two PRs #12692 and #12701 which changed the access patterns for other passes to minimize eager gate object construction. * Add rust filter function for DAGCircuit.collect_2q_runs() * Update crates/accelerate/src/convert_2q_block_matrix.rs --------- Co-authored-by: John Lapeyre <[email protected]>

This commit builds off of what Qiskit#12650 did for the 1q decomposer and moves to using rust gates for the 2q decomposer too. This means that the circuit sequence generation is using rust's StandardGate representation directly instead of relying on mapping strings. For places where circuits are generated (calling `TwoQubitWeylDecomposition.circuit()` or or `TwoQubitBasisDecomposer.__call__` without the `use_dag()` flag) the entire circuit is generated in Rust and returned to Python.

This commit builds off of what Qiskit#12650 did for the 1q decomposer and moves to using rust gates for the 2q decomposer too. This means that the circuit sequence generation is using rust's StandardGate representation directly instead of relying on mapping strings. For places where circuits are generated (calling `TwoQubitWeylDecomposition.circuit()` or or `TwoQubitBasisDecomposer.__call__` without the `use_dag` flag) the entire circuit is generated in Rust and returned to Python.

* Use Rust gates for 2q unitary synthesis This commit builds off of what #12650 did for the 1q decomposer and moves to using rust gates for the 2q decomposer too. This means that the circuit sequence generation is using rust's StandardGate representation directly instead of relying on mapping strings. For places where circuits are generated (calling `TwoQubitWeylDecomposition.circuit()` or or `TwoQubitBasisDecomposer.__call__` without the `use_dag` flag) the entire circuit is generated in Rust and returned to Python. * Run cargo fmt and black post rebase

* Use Rust gates for 2q unitary synthesis This commit builds off of what Qiskit#12650 did for the 1q decomposer and moves to using rust gates for the 2q decomposer too. This means that the circuit sequence generation is using rust's StandardGate representation directly instead of relying on mapping strings. For places where circuits are generated (calling `TwoQubitWeylDecomposition.circuit()` or or `TwoQubitBasisDecomposer.__call__` without the `use_dag` flag) the entire circuit is generated in Rust and returned to Python. * Run cargo fmt and black post rebase

* Use rust gates for Optimize1QGatesDecomposition This commit moves to using rust gates for the Optimize1QGatesDecomposition transpiler pass. It takes in a sequence of runs (which are a list of DAGOpNodes) from the python side of the transpiler pass which are generated from DAGCircuit.collect_1q_runs() (which in the future should be moved to rust after Qiskit#12550 merges). The rust portion of the pass now iterates over each run, performs the matrix multiplication to compute the unitary of the run, then synthesizes that unitary, computes the estimated error of the circuit synthesis and returns a tuple of the circuit sequence in terms of rust StandardGate enums. The python portion of the code then takes those sequences and does inplace substitution of each run with the sequence returned from rust. Once Qiskit#12550 merges we should be able to move the input collect_1q_runs() call and perform the output node substitions in rust making the full pass execute in the rust domain without any python interaction. Additionally, the OneQubitEulerDecomposer class is updated to use rust for circuit generation instead of doing this python side. The internal changes done to use rust gates in the transpiler pass meant we were half way to this already by emitting rust StandardGates instead of python gate objects. The dag handling is still done in Python however until Qiskit#12550 merges. This also includes an implementation of the r gate, I temporarily added this to unblock this effort as it was the only gate missing needed to complete this. We can rebase this if a standalone implementation of the gate merges before this. * Cache target decompositions for each qubit Previously this PR was re-computing the target bases to synthesize with for each run found in the circuit. But in cases where there were multiple runs repeated on a qubit this was unecessary work. Prior to moving this code to rust there was already caching code to make this optimization, but the rust path short circuited around this. This commit fixes this so we're caching the target bases for each qubit and only computing it once. * Optimize rust implementation slightly * Avoid extra allocations by inlining matrix multiplication * Remove unnecessary comment * Remove stray code block * Add import path for rust gate * Use rust gate in circuit constructor * Remove duplicated op_name getter and just use existing name getter * Apply suggestions from code review Co-authored-by: John Lapeyre <[email protected]> * Simplify construction of target_basis_vec * Fix rebase issue * Update crates/accelerate/src/euler_one_qubit_decomposer.rs Co-authored-by: John Lapeyre <[email protected]> * Update crates/accelerate/src/euler_one_qubit_decomposer.rs --------- Co-authored-by: John Lapeyre <[email protected]>

In the recently merged Qiskit#12650 a new rust function was added for the filter function of the collect_1q_runs() method's internal filter function. As part of this we need to do exclude any non-DAGOpNode dag nodes passed to the filter function. This was previously implemented using the pyo3 extract() method so that if the pyobject can be coerced into a DAGOpNode for later use we continue but if it errors we know the input object is not a DAGOpNode and return false. However, as we don't need to return the error to python we should be using downcast instead of extract (as per the pyo3 performance guide [1]) to avoid the error conversion cost. This was an oversight in Qiskit#12650 and I only used extract() out of habit. [1] https://pyo3.rs/v0.22.0/performance#extract-versus-downcast

* Use rust gates for ConsolidateBlocks This commit moves to use rust gates for the ConsolidateBlocks transpiler pass. Instead of generating the unitary matrices for the gates in a 2q block Python side and passing that list to a rust function this commit switches to passing a list of DAGOpNodes to the rust and then generating the matrices inside the rust function directly. This is similar to what was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster to get the matrix for standard gates, it also reduces the eager construction of Python gate objects which was a significant source of overhead after Qiskit#12459. To that end this builds on the thread of work in the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for other passes to minimize eager gate object construction. * Add rust filter function for DAGCircuit.collect_2q_runs() * Update crates/accelerate/src/convert_2q_block_matrix.rs --------- Co-authored-by: John Lapeyre <[email protected]>

* Use Rust gates for 2q unitary synthesis This commit builds off of what Qiskit#12650 did for the 1q decomposer and moves to using rust gates for the 2q decomposer too. This means that the circuit sequence generation is using rust's StandardGate representation directly instead of relying on mapping strings. For places where circuits are generated (calling `TwoQubitWeylDecomposition.circuit()` or or `TwoQubitBasisDecomposer.__call__` without the `use_dag` flag) the entire circuit is generated in Rust and returned to Python. * Run cargo fmt and black post rebase

mtreinish added performance Changelog: None Do not include in changelog synthesis Rust This PR or issue is related to Rust code in the repository mod: transpiler Issues and PRs related to Transpiler labels Jun 24, 2024

mtreinish added this to the 1.2.0 milestone Jun 24, 2024

mtreinish requested review from alexanderivrii, ShellyGarion and a team as code owners June 24, 2024 21:10

mtreinish changed the title ~~[WIP] Use rust gates for Optimize1QGatesDecomposition~~ Use rust gates for Optimize1QGatesDecomposition Jun 24, 2024

mtreinish commented Jun 24, 2024

View reviewed changes

qiskit/transpiler/passes/optimization/optimize_1q_decomposition.py Outdated Show resolved Hide resolved

mtreinish added 2 commits June 25, 2024 06:37

Optimize rust implementation slightly

ac38a21

mtreinish added 2 commits June 25, 2024 11:50

Avoid extra allocations by inlining matrix multiplication

cab69de

Merge branch 'main' into optimize-1q-more-rust

411c300

mtreinish commented Jun 25, 2024

View reviewed changes

crates/accelerate/src/euler_one_qubit_decomposer.rs Outdated Show resolved Hide resolved

Remove unnecessary comment

1dd4ceb

mtreinish added 2 commits June 25, 2024 13:31

Remove stray code block

515c3ff

Add import path for rust gate

130969a

Use rust gate in circuit constructor

c325ffe

jlapeyre reviewed Jul 2, 2024

View reviewed changes

crates/accelerate/src/euler_one_qubit_decomposer.rs Outdated Show resolved Hide resolved

Update crates/accelerate/src/euler_one_qubit_decomposer.rs

1b65d73

jlapeyre enabled auto-merge July 3, 2024 02:54

jlapeyre approved these changes Jul 3, 2024

View reviewed changes

jlapeyre added this pull request to the merge queue Jul 3, 2024

Merged via the queue into Qiskit:main with commit bfc69a4 Jul 3, 2024
15 checks passed

mtreinish deleted the optimize-1q-more-rust branch July 3, 2024 10:00

mtreinish mentioned this pull request Jul 3, 2024

Optimize DAGCircuit.collect_1q_runs() filter function #12719

Merged

mtreinish mentioned this pull request Jul 8, 2024

Use Rust gates for 2q unitary synthesis #12740

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use rust gates for Optimize1QGatesDecomposition #12650

Use rust gates for Optimize1QGatesDecomposition #12650

mtreinish commented Jun 24, 2024 •

edited

Loading

qiskit-bot commented Jun 24, 2024

coveralls commented Jun 24, 2024 •

edited

Loading

coveralls commented Jun 25, 2024 •

edited

Loading

mtreinish commented Jun 25, 2024

mtreinish commented Jun 25, 2024

coveralls commented Jun 25, 2024 •

edited

Loading

coveralls commented Jun 25, 2024 •

edited

Loading

mtreinish commented Jun 25, 2024

coveralls commented Jun 26, 2024 •

edited

Loading

jlapeyre commented Jul 2, 2024

jlapeyre commented Jul 2, 2024

coveralls commented Jul 3, 2024 •

edited

Loading

Use rust gates for Optimize1QGatesDecomposition #12650

Use rust gates for Optimize1QGatesDecomposition #12650

Conversation

mtreinish commented Jun 24, 2024 • edited Loading

Summary

Details and comments

qiskit-bot commented Jun 24, 2024

coveralls commented Jun 24, 2024 • edited Loading

Pull Request Test Coverage Report for Build 9652482316

Details

💛 - Coveralls

coveralls commented Jun 25, 2024 • edited Loading

Pull Request Test Coverage Report for Build 9662366170

Warning: This coverage report may be inaccurate.

Details

💛 - Coveralls

mtreinish commented Jun 25, 2024

mtreinish commented Jun 25, 2024

coveralls commented Jun 25, 2024 • edited Loading

Pull Request Test Coverage Report for Build 9666227832

Details

💛 - Coveralls

coveralls commented Jun 25, 2024 • edited Loading

Pull Request Test Coverage Report for Build 9667036450

Details

💛 - Coveralls

mtreinish commented Jun 25, 2024

coveralls commented Jun 26, 2024 • edited Loading

Pull Request Test Coverage Report for Build 9678061360

Warning: This coverage report may be inaccurate.

Details

💛 - Coveralls

jlapeyre commented Jul 2, 2024

jlapeyre commented Jul 2, 2024

coveralls commented Jul 3, 2024 • edited Loading

Pull Request Test Coverage Report for Build 9769535246

Details

💛 - Coveralls

mtreinish commented Jun 24, 2024 •

edited

Loading

coveralls commented Jun 24, 2024 •

edited

Loading

coveralls commented Jun 25, 2024 •

edited

Loading

coveralls commented Jun 25, 2024 •

edited

Loading

coveralls commented Jun 25, 2024 •

edited

Loading

coveralls commented Jun 26, 2024 •

edited

Loading

coveralls commented Jul 3, 2024 •

edited

Loading