Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Rust gates for 2q unitary synthesis #12740

Merged
merged 3 commits into from
Jul 23, 2024

Conversation

mtreinish
Copy link
Member

Summary

This commit builds off of what #12650 did for the 1q decomposer and moves to using rust gates for the 2q decomposer too. This means that the circuit sequence generation is using rust's StandardGate representation directly instead of relying on mapping strings. For places where circuits are generated (calling TwoQubitWeylDecomposition.circuit() or or TwoQubitBasisDecomposer.__call__ without the use_dag flag) the entire circuit is generated in Rust and returned to Python.

Details and comments

This commit builds off of what Qiskit#12650 did for the 1q decomposer and
moves to using rust gates for the 2q decomposer too. This means that the
circuit sequence generation is using rust's StandardGate representation
directly instead of relying on mapping strings. For places where
circuits are generated (calling `TwoQubitWeylDecomposition.circuit()` or
or `TwoQubitBasisDecomposer.__call__` without the `use_dag` flag) the
entire circuit is generated in Rust and returned to Python.
@mtreinish mtreinish added performance Changelog: None Do not include in changelog synthesis Rust This PR or issue is related to Rust code in the repository mod: transpiler Issues and PRs related to Transpiler labels Jul 8, 2024
@mtreinish mtreinish added this to the 1.2.0 milestone Jul 8, 2024
@mtreinish mtreinish requested review from alexanderivrii, ShellyGarion and a team as code owners July 8, 2024 21:13
@qiskit-bot
Copy link
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core
  • @kevinhartman
  • @levbishop
  • @mtreinish

@coveralls
Copy link

coveralls commented Jul 8, 2024

Pull Request Test Coverage Report for Build 10066245198

Details

  • 154 of 156 (98.72%) changed or added relevant lines in 8 files are covered.
  • 6 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.03%) to 89.944%

Changes Missing Coverage Covered Lines Changed/Added Lines %
crates/accelerate/src/two_qubit_decompose.rs 113 114 99.12%
crates/circuit/src/operations.rs 1 2 50.0%
Files with Coverage Reduction New Missed Lines %
crates/accelerate/src/two_qubit_decompose.rs 1 90.25%
crates/qasm2/src/lex.rs 5 92.62%
Totals Coverage Status
Change from base Build 10065639500: 0.03%
Covered Lines: 65846
Relevant Lines: 73208

💛 - Coveralls

@jakelishman
Copy link
Member

Note: this will conflict with #12730.

@mtreinish
Copy link
Member Author

mtreinish commented Jul 11, 2024

I wrote a little script to test 2q unitary synthesis performance:

import statistics
import timefrom qiskit.synthesis.two_qubit import TwoQubitBasisDecomposer
from qiskit.circuit.library import CXGate
from qiskit.quantum_info import random_unitarydecomp = TwoQubitBasisDecomposer(CXGate(), euler_basis="ZSXX")
unitaries = [random_unitary(4) for _ in range(10000)]
runtimes = []
for mat in unitaries:
    start = time.perf_counter()
    decomp(mat)
    stop = time.perf_counter()
    runtimes.append(stop - start)
print(f"Mean runtime per unitary: {statistics.geometric_mean(runtimes)} sec.")
print(f"Total runtime for 10k unitary matrices: {sum(runtimes)} sec.")

With 1.1.1 I ran this script on a m1 max mbp with Python 3.12 and it returned:

Mean runtime per unitary: 0.00014786494136577558 sec.
Total runtime for 10k unitary matrices: 1.504167526960373 sec.

Then with this PR applied on the same configuration this script returned:

Mean runtime per unitary: 3.408156984749889e-05 sec.
Total runtime for 10k unitary matrices: 0.361447433475405 sec.

EDIT: Just to showcase how far we've come in Qiskit 1.0.2 running this script returned:

Mean runtime per unitary: 0.00175304350427186 sec.
Total runtime for 10k unitary matrices: 17.648815331980586 sec.

ElePT
ElePT previously approved these changes Jul 15, 2024
Copy link
Contributor

@ElePT ElePT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have taken a look and up to what I can see, it makes sense. I am not sure if you'd prefer merging this before or after #12730, so I will not add it to the queue just in case.

let euler_basis: EulerBasis = match euler_basis {
Some(basis) => EulerBasis::__new__(basis.deref())?,
None => self.default_euler_basis,
};
let target_1q_basis_list: Vec<EulerBasis> = vec![euler_basis];

let mut gate_sequence = Vec::new();
let mut gate_sequence: WeylCircuitSequence = Vec::with_capacity(21);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the capacity specifically 21?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An arbitrary 2q decomposition is 3 CX plus local operations, which with single-rotation Eulers is up to 3 gates per local per qubit. Naively, this can run to 27 gates (3x CX interspersed into 4x 6 1q local operations), but maybe there's a way through the decomposition that guarantees that the last set of 6 is never needed? I don't have the maths off the top of my head.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I counted it out as 21 for the two qubit decomposer: https://github.com/Qiskit/qiskit/blob/main/crates/accelerate/src/two_qubit_decompose.rs#L1836-L1842 (but might have miscounted) and was just reusing that number here.

@mtreinish
Copy link
Member Author

I have taken a look and up to what I can see, it makes sense. I am not sure if you'd prefer merging this before or after #12730, so I will not add it to the queue just in case.

#12730 has merged now and I've rebased this on top of it now.

Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks generally fine to merge for me (as Elena had already reviewed) - just one minor question that's skippable.

Comment on lines 430 to 450
fn weyl_gate(
&self,
simplify: bool,
sequence: &mut TwoQubitSequenceVec,
sequence: &mut WeylCircuitSequence,
atol: f64,
global_phase: &mut f64,
) {
match self.specialization {
Specialization::MirrorControlledEquiv => {
sequence.push(("swap".to_string(), SmallVec::new(), smallvec![0, 1]));
sequence.push((
"rzz".to_string(),
smallvec![(PI4 - self.c) * 2.],
smallvec![0, 1],
StandardGate::SwapGate,
SmallVec::new(),
smallvec![Qubit(0), Qubit(1)],
));
sequence.push((
StandardGate::RZZGate,
smallvec![Param::Float((PI4 - self.c) * 2.)],
smallvec![Qubit(0), Qubit(1)],
));
*global_phase += PI4
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind too much, but is there anything stopping us from skipping the WeylCircuitSequence intermediate allocations, and just directly building a CircuitData?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, I think we could use CircuitData directly here. I don't remember what my thinking was exactly on using the intermediate type here. Maybe I just wanted to use CircuitData::from_standard_gates()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always just look at that in a follow-up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this locally, but let's do this in a follow-up because to do it well I needed to add helper methods to CircuitData to make it easier to manipulate from rust space (unless I missed methods to do the same already) and think it would be good to review those interfaces in isolation (I'm not keen on the clbit interner usage I have there):

diff --git a/crates/accelerate/src/two_qubit_decompose.rs b/crates/accelerate/src/two_qubit_decompose.rs
index ed4d32bd3..3a0c395ac 100644
--- a/crates/accelerate/src/two_qubit_decompose.rs
+++ b/crates/accelerate/src/two_qubit_decompose.rs
@@ -398,8 +398,6 @@ impl Specialization {
     }
 }
 
-type WeylCircuitSequence = Vec<(StandardGate, SmallVec<[Param; 3]>, SmallVec<[Qubit; 2]>)>;
-
 #[derive(Clone, Debug)]
 #[allow(non_snake_case)]
 #[pyclass(module = "qiskit._accelerate.two_qubit_decompose", subclass)]
@@ -430,56 +428,57 @@ impl TwoQubitWeylDecomposition {
     fn weyl_gate(
         &self,
         simplify: bool,
-        sequence: &mut WeylCircuitSequence,
+        sequence: &mut CircuitData,
         atol: f64,
         global_phase: &mut f64,
-    ) {
+    ) -> PyResult<()> {
         match self.specialization {
             Specialization::MirrorControlledEquiv => {
-                sequence.push((
+                sequence.push_standard_gate(
                     StandardGate::SwapGate,
                     SmallVec::new(),
                     smallvec![Qubit(0), Qubit(1)],
-                ));
-                sequence.push((
+                )?;
+                sequence.push_standard_gate(
                     StandardGate::RZZGate,
                     smallvec![Param::Float((PI4 - self.c) * 2.)],
                     smallvec![Qubit(0), Qubit(1)],
-                ));
+                )?;
                 *global_phase += PI4
             }
             Specialization::SWAPEquiv => {
-                sequence.push((
+                sequence.push_standard_gate(
                     StandardGate::SwapGate,
                     SmallVec::new(),
                     smallvec![Qubit(0), Qubit(1)],
-                ));
+                )?;
                 *global_phase -= 3. * PI / 4.
             }
             _ => {
                 if !simplify || self.a.abs() > atol {
-                    sequence.push((
+                    sequence.push_standard_gate(
                         StandardGate::RXXGate,
                         smallvec![Param::Float(-self.a * 2.)],
                         smallvec![Qubit(0), Qubit(1)],
-                    ));
+                    )?;
                 }
                 if !simplify || self.b.abs() > atol {
-                    sequence.push((
+                    sequence.push_standard_gate(
                         StandardGate::RYYGate,
                         smallvec![Param::Float(-self.b * 2.)],
                         smallvec![Qubit(0), Qubit(1)],
-                    ));
+                    )?;
                 }
                 if !simplify || self.c.abs() > atol {
-                    sequence.push((
+                    sequence.push_standard_gate(
                         StandardGate::RZZGate,
                         smallvec![Param::Float(-self.c * 2.)],
                         smallvec![Qubit(0), Qubit(1)],
-                    ));
+                    )?;
                 }
             }
         }
+        Ok(())
     }
 }
 
@@ -1059,7 +1058,7 @@ impl TwoQubitWeylDecomposition {
         };
         let target_1q_basis_list: Vec<EulerBasis> = vec![euler_basis];
 
-        let mut gate_sequence: WeylCircuitSequence = Vec::with_capacity(21);
+        let mut gate_sequence = CircuitData::with_capacity(py, 2, 21, Param::Float(0.))?;
         let mut global_phase: f64 = self.global_phase;
 
         let c2r = unitary_to_gate_sequence_inner(
@@ -1072,11 +1071,11 @@ impl TwoQubitWeylDecomposition {
         )
         .unwrap();
         for gate in c2r.gates {
-            gate_sequence.push((
+            gate_sequence.push_standard_gate(
                 gate.0,
                 gate.1.into_iter().map(Param::Float).collect(),
                 smallvec![Qubit(0)],
-            ))
+            )?
         }
         global_phase += c2r.global_phase;
         let c2l = unitary_to_gate_sequence_inner(
@@ -1089,11 +1088,11 @@ impl TwoQubitWeylDecomposition {
         )
         .unwrap();
         for gate in c2l.gates {
-            gate_sequence.push((
+            gate_sequence.push_standard_gate(
                 gate.0,
                 gate.1.into_iter().map(Param::Float).collect(),
                 smallvec![Qubit(1)],
-            ))
+            )?
         }
         global_phase += c2l.global_phase;
         self.weyl_gate(
@@ -1101,7 +1100,7 @@ impl TwoQubitWeylDecomposition {
             &mut gate_sequence,
             atol.unwrap_or(ANGLE_ZERO_EPSILON),
             &mut global_phase,
-        );
+        )?;
         let c1r = unitary_to_gate_sequence_inner(
             self.K1r.view(),
             &target_1q_basis_list,
@@ -1112,11 +1111,11 @@ impl TwoQubitWeylDecomposition {
         )
         .unwrap();
         for gate in c1r.gates {
-            gate_sequence.push((
+            gate_sequence.push_standard_gate(
                 gate.0,
                 gate.1.into_iter().map(Param::Float).collect(),
                 smallvec![Qubit(0)],
-            ))
+            )?
         }
         global_phase += c2r.global_phase;
         let c1l = unitary_to_gate_sequence_inner(
@@ -1129,13 +1128,14 @@ impl TwoQubitWeylDecomposition {
         )
         .unwrap();
         for gate in c1l.gates {
-            gate_sequence.push((
+            gate_sequence.push_standard_gate(
                 gate.0,
                 gate.1.into_iter().map(Param::Float).collect(),
                 smallvec![Qubit(1)],
-            ))
+            )?
         }
-        CircuitData::from_standard_gates(py, 2, gate_sequence, Param::Float(global_phase))
+        gate_sequence.global_phase = Param::Float(global_phase);
+        Ok(gate_sequence)
     }
 }
 
diff --git a/crates/circuit/src/circuit_data.rs b/crates/circuit/src/circuit_data.rs
index a325ca4e1..2822fef75 100644
--- a/crates/circuit/src/circuit_data.rs
+++ b/crates/circuit/src/circuit_data.rs
@@ -96,7 +96,7 @@ pub struct CircuitData {
     clbits: BitData<Clbit>,
     param_table: ParamTable,
     #[pyo3(get)]
-    global_phase: Param,
+    pub global_phase: Param,
 }
 
 impl CircuitData {
@@ -127,22 +127,8 @@ impl CircuitData {
         I: IntoIterator<Item = (StandardGate, SmallVec<[Param; 3]>, SmallVec<[Qubit; 2]>)>,
     {
         let instruction_iter = instructions.into_iter();
-        let mut res = CircuitData {
-            data: Vec::with_capacity(instruction_iter.size_hint().0),
-            qargs_interner: IndexedInterner::new(),
-            cargs_interner: IndexedInterner::new(),
-            qubits: BitData::new(py, "qubits".to_string()),
-            clbits: BitData::new(py, "clbits".to_string()),
-            param_table: ParamTable::new(),
-            global_phase,
-        };
-        if num_qubits > 0 {
-            let qubit_cls = QUBIT.get_bound(py);
-            for _i in 0..num_qubits {
-                let bit = qubit_cls.call0()?;
-                res.add_qubit(py, &bit, true)?;
-            }
-        }
+        let mut res =
+            Self::with_capacity(py, num_qubits, instruction_iter.size_hint().0, global_phase)?;
         let no_clbit_index = (&mut res.cargs_interner)
             .intern(InternerKey::Value(Vec::new()))?
             .index;
@@ -164,6 +150,58 @@ impl CircuitData {
         Ok(res)
     }
 
+    /// Build an empty CircuitData object with an initially allocated instruction capacity
+    pub fn with_capacity(
+        py: Python,
+        num_qubits: u32,
+        instruction_capacity: usize,
+        global_phase: Param,
+    ) -> PyResult<Self> {
+        let mut res = CircuitData {
+            data: Vec::with_capacity(instruction_capacity),
+            qargs_interner: IndexedInterner::new(),
+            cargs_interner: IndexedInterner::new(),
+            qubits: BitData::new(py, "qubits".to_string()),
+            clbits: BitData::new(py, "clbits".to_string()),
+            param_table: ParamTable::new(),
+            global_phase,
+        };
+        if num_qubits > 0 {
+            let qubit_cls = QUBIT.get_bound(py);
+            for _i in 0..num_qubits {
+                let bit = qubit_cls.call0()?;
+                res.add_qubit(py, &bit, true)?;
+            }
+        }
+        Ok(res)
+    }
+
+    /// Append a standard gate to this CircuitData
+    pub fn push_standard_gate(
+        &mut self,
+        operation: StandardGate,
+        params: SmallVec<[Param; 3]>,
+        qargs: SmallVec<[Qubit; 2]>,
+    ) -> PyResult<()> {
+        let no_clbit_index = (&mut self.cargs_interner)
+            .intern(InternerKey::Value(Vec::new()))?
+            .index;
+        let params = (!params.is_empty()).then(|| Box::new(params));
+        let qubits = (&mut self.qargs_interner)
+            .intern(InternerKey::Value(qargs.to_vec()))?
+            .index;
+        self.data.push(PackedInstruction {
+            op: operation.into(),
+            qubits,
+            clbits: no_clbit_index,
+            params,
+            extra_attrs: None,
+            #[cfg(feature = "cache_pygates")]
+            py_op: RefCell::new(None),
+        });
+        Ok(())
+    }
+
     fn handle_manual_params(
         &mut self,
         py: Python,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's follow up. I've tagged for merge.

@jakelishman jakelishman added this pull request to the merge queue Jul 23, 2024
Merged via the queue into Qiskit:main with commit e362da5 Jul 23, 2024
15 checks passed
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jul 24, 2024
In the recently merged Qiskit#12740 a path was added for constructing the
`CircuitData` in rust when synthesizing to a `QuantumCircuit`
directly. However, in that PR this was done through a layer of
indirection by first collecting the gates into an intermediate `Vec`
and then passing an iterator of that into
`CircuitData::from_standard_gates()`. However this resulted in an
unecessary allocation for two `Vec`s and it would have been better
to just directly construct the `CircuitData` object directly. However,
to accomplish this we needed more rust space methods for creating
and manipulating a `CircuitData` object as it's mostly being constructed
from Python space or using `CircuitData::from_standard_gates()` with
an iterator so far. This commit makes those additions and then updates
the `TwoQubitWeylDecomposition` code to directly construct the
`CircuitData` object instead of using an intermediate `Vec`.
ElePT pushed a commit to mtreinish/qiskit-core that referenced this pull request Jul 24, 2024
* Use Rust gates for 2q unitary synthesis

This commit builds off of what Qiskit#12650 did for the 1q decomposer and
moves to using rust gates for the 2q decomposer too. This means that the
circuit sequence generation is using rust's StandardGate representation
directly instead of relying on mapping strings. For places where
circuits are generated (calling `TwoQubitWeylDecomposition.circuit()` or
or `TwoQubitBasisDecomposer.__call__` without the `use_dag` flag) the
entire circuit is generated in Rust and returned to Python.

* Run cargo fmt and black post rebase
Procatv pushed a commit to Procatv/qiskit-terra-catherines that referenced this pull request Aug 1, 2024
* Use Rust gates for 2q unitary synthesis

This commit builds off of what Qiskit#12650 did for the 1q decomposer and
moves to using rust gates for the 2q decomposer too. This means that the
circuit sequence generation is using rust's StandardGate representation
directly instead of relying on mapping strings. For places where
circuits are generated (calling `TwoQubitWeylDecomposition.circuit()` or
or `TwoQubitBasisDecomposer.__call__` without the `use_dag` flag) the
entire circuit is generated in Rust and returned to Python.

* Run cargo fmt and black post rebase
github-merge-queue bot pushed a commit that referenced this pull request Aug 8, 2024
* Directly construct CircuitData in TwoQubitWeylDecomposition

In the recently merged #12740 a path was added for constructing the
`CircuitData` in rust when synthesizing to a `QuantumCircuit`
directly. However, in that PR this was done through a layer of
indirection by first collecting the gates into an intermediate `Vec`
and then passing an iterator of that into
`CircuitData::from_standard_gates()`. However this resulted in an
unecessary allocation for two `Vec`s and it would have been better
to just directly construct the `CircuitData` object directly. However,
to accomplish this we needed more rust space methods for creating
and manipulating a `CircuitData` object as it's mostly being constructed
from Python space or using `CircuitData::from_standard_gates()` with
an iterator so far. This commit makes those additions and then updates
the `TwoQubitWeylDecomposition` code to directly construct the
`CircuitData` object instead of using an intermediate `Vec`.

* Use set_global_phase instead of direct assignment

* Use slice inputs for push_standard_gate() params and qargs
@mtreinish mtreinish deleted the use-rust-gates-unitary-synth branch September 13, 2024 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: None Do not include in changelog mod: transpiler Issues and PRs related to Transpiler performance Rust This PR or issue is related to Rust code in the repository synthesis
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants