-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Rust gates for 2q unitary synthesis #12740
Use Rust gates for 2q unitary synthesis #12740
Conversation
This commit builds off of what Qiskit#12650 did for the 1q decomposer and moves to using rust gates for the 2q decomposer too. This means that the circuit sequence generation is using rust's StandardGate representation directly instead of relying on mapping strings. For places where circuits are generated (calling `TwoQubitWeylDecomposition.circuit()` or or `TwoQubitBasisDecomposer.__call__` without the `use_dag` flag) the entire circuit is generated in Rust and returned to Python.
One or more of the following people are relevant to this code:
|
Pull Request Test Coverage Report for Build 10066245198Details
💛 - Coveralls |
Note: this will conflict with #12730. |
I wrote a little script to test 2q unitary synthesis performance: import statistics
import time
from qiskit.synthesis.two_qubit import TwoQubitBasisDecomposer
from qiskit.circuit.library import CXGate
from qiskit.quantum_info import random_unitary
decomp = TwoQubitBasisDecomposer(CXGate(), euler_basis="ZSXX")
unitaries = [random_unitary(4) for _ in range(10000)]
runtimes = []
for mat in unitaries:
start = time.perf_counter()
decomp(mat)
stop = time.perf_counter()
runtimes.append(stop - start)
print(f"Mean runtime per unitary: {statistics.geometric_mean(runtimes)} sec.")
print(f"Total runtime for 10k unitary matrices: {sum(runtimes)} sec.") With 1.1.1 I ran this script on a m1 max mbp with Python 3.12 and it returned:
Then with this PR applied on the same configuration this script returned:
EDIT: Just to showcase how far we've come in Qiskit 1.0.2 running this script returned:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have taken a look and up to what I can see, it makes sense. I am not sure if you'd prefer merging this before or after #12730, so I will not add it to the queue just in case.
let euler_basis: EulerBasis = match euler_basis { | ||
Some(basis) => EulerBasis::__new__(basis.deref())?, | ||
None => self.default_euler_basis, | ||
}; | ||
let target_1q_basis_list: Vec<EulerBasis> = vec![euler_basis]; | ||
|
||
let mut gate_sequence = Vec::new(); | ||
let mut gate_sequence: WeylCircuitSequence = Vec::with_capacity(21); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the capacity specifically 21?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An arbitrary 2q decomposition is 3 CX plus local operations, which with single-rotation Eulers is up to 3 gates per local per qubit. Naively, this can run to 27 gates (3x CX interspersed into 4x 6 1q local operations), but maybe there's a way through the decomposition that guarantees that the last set of 6 is never needed? I don't have the maths off the top of my head.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I counted it out as 21 for the two qubit decomposer: https://github.com/Qiskit/qiskit/blob/main/crates/accelerate/src/two_qubit_decompose.rs#L1836-L1842 (but might have miscounted) and was just reusing that number here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks generally fine to merge for me (as Elena had already reviewed) - just one minor question that's skippable.
fn weyl_gate( | ||
&self, | ||
simplify: bool, | ||
sequence: &mut TwoQubitSequenceVec, | ||
sequence: &mut WeylCircuitSequence, | ||
atol: f64, | ||
global_phase: &mut f64, | ||
) { | ||
match self.specialization { | ||
Specialization::MirrorControlledEquiv => { | ||
sequence.push(("swap".to_string(), SmallVec::new(), smallvec![0, 1])); | ||
sequence.push(( | ||
"rzz".to_string(), | ||
smallvec![(PI4 - self.c) * 2.], | ||
smallvec![0, 1], | ||
StandardGate::SwapGate, | ||
SmallVec::new(), | ||
smallvec![Qubit(0), Qubit(1)], | ||
)); | ||
sequence.push(( | ||
StandardGate::RZZGate, | ||
smallvec![Param::Float((PI4 - self.c) * 2.)], | ||
smallvec![Qubit(0), Qubit(1)], | ||
)); | ||
*global_phase += PI4 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind too much, but is there anything stopping us from skipping the WeylCircuitSequence
intermediate allocations, and just directly building a CircuitData
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, I think we could use CircuitData
directly here. I don't remember what my thinking was exactly on using the intermediate type here. Maybe I just wanted to use CircuitData::from_standard_gates()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can always just look at that in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this locally, but let's do this in a follow-up because to do it well I needed to add helper methods to CircuitData
to make it easier to manipulate from rust space (unless I missed methods to do the same already) and think it would be good to review those interfaces in isolation (I'm not keen on the clbit interner usage I have there):
diff --git a/crates/accelerate/src/two_qubit_decompose.rs b/crates/accelerate/src/two_qubit_decompose.rs
index ed4d32bd3..3a0c395ac 100644
--- a/crates/accelerate/src/two_qubit_decompose.rs
+++ b/crates/accelerate/src/two_qubit_decompose.rs
@@ -398,8 +398,6 @@ impl Specialization {
}
}
-type WeylCircuitSequence = Vec<(StandardGate, SmallVec<[Param; 3]>, SmallVec<[Qubit; 2]>)>;
-
#[derive(Clone, Debug)]
#[allow(non_snake_case)]
#[pyclass(module = "qiskit._accelerate.two_qubit_decompose", subclass)]
@@ -430,56 +428,57 @@ impl TwoQubitWeylDecomposition {
fn weyl_gate(
&self,
simplify: bool,
- sequence: &mut WeylCircuitSequence,
+ sequence: &mut CircuitData,
atol: f64,
global_phase: &mut f64,
- ) {
+ ) -> PyResult<()> {
match self.specialization {
Specialization::MirrorControlledEquiv => {
- sequence.push((
+ sequence.push_standard_gate(
StandardGate::SwapGate,
SmallVec::new(),
smallvec![Qubit(0), Qubit(1)],
- ));
- sequence.push((
+ )?;
+ sequence.push_standard_gate(
StandardGate::RZZGate,
smallvec![Param::Float((PI4 - self.c) * 2.)],
smallvec![Qubit(0), Qubit(1)],
- ));
+ )?;
*global_phase += PI4
}
Specialization::SWAPEquiv => {
- sequence.push((
+ sequence.push_standard_gate(
StandardGate::SwapGate,
SmallVec::new(),
smallvec![Qubit(0), Qubit(1)],
- ));
+ )?;
*global_phase -= 3. * PI / 4.
}
_ => {
if !simplify || self.a.abs() > atol {
- sequence.push((
+ sequence.push_standard_gate(
StandardGate::RXXGate,
smallvec![Param::Float(-self.a * 2.)],
smallvec![Qubit(0), Qubit(1)],
- ));
+ )?;
}
if !simplify || self.b.abs() > atol {
- sequence.push((
+ sequence.push_standard_gate(
StandardGate::RYYGate,
smallvec![Param::Float(-self.b * 2.)],
smallvec![Qubit(0), Qubit(1)],
- ));
+ )?;
}
if !simplify || self.c.abs() > atol {
- sequence.push((
+ sequence.push_standard_gate(
StandardGate::RZZGate,
smallvec![Param::Float(-self.c * 2.)],
smallvec![Qubit(0), Qubit(1)],
- ));
+ )?;
}
}
}
+ Ok(())
}
}
@@ -1059,7 +1058,7 @@ impl TwoQubitWeylDecomposition {
};
let target_1q_basis_list: Vec<EulerBasis> = vec![euler_basis];
- let mut gate_sequence: WeylCircuitSequence = Vec::with_capacity(21);
+ let mut gate_sequence = CircuitData::with_capacity(py, 2, 21, Param::Float(0.))?;
let mut global_phase: f64 = self.global_phase;
let c2r = unitary_to_gate_sequence_inner(
@@ -1072,11 +1071,11 @@ impl TwoQubitWeylDecomposition {
)
.unwrap();
for gate in c2r.gates {
- gate_sequence.push((
+ gate_sequence.push_standard_gate(
gate.0,
gate.1.into_iter().map(Param::Float).collect(),
smallvec![Qubit(0)],
- ))
+ )?
}
global_phase += c2r.global_phase;
let c2l = unitary_to_gate_sequence_inner(
@@ -1089,11 +1088,11 @@ impl TwoQubitWeylDecomposition {
)
.unwrap();
for gate in c2l.gates {
- gate_sequence.push((
+ gate_sequence.push_standard_gate(
gate.0,
gate.1.into_iter().map(Param::Float).collect(),
smallvec![Qubit(1)],
- ))
+ )?
}
global_phase += c2l.global_phase;
self.weyl_gate(
@@ -1101,7 +1100,7 @@ impl TwoQubitWeylDecomposition {
&mut gate_sequence,
atol.unwrap_or(ANGLE_ZERO_EPSILON),
&mut global_phase,
- );
+ )?;
let c1r = unitary_to_gate_sequence_inner(
self.K1r.view(),
&target_1q_basis_list,
@@ -1112,11 +1111,11 @@ impl TwoQubitWeylDecomposition {
)
.unwrap();
for gate in c1r.gates {
- gate_sequence.push((
+ gate_sequence.push_standard_gate(
gate.0,
gate.1.into_iter().map(Param::Float).collect(),
smallvec![Qubit(0)],
- ))
+ )?
}
global_phase += c2r.global_phase;
let c1l = unitary_to_gate_sequence_inner(
@@ -1129,13 +1128,14 @@ impl TwoQubitWeylDecomposition {
)
.unwrap();
for gate in c1l.gates {
- gate_sequence.push((
+ gate_sequence.push_standard_gate(
gate.0,
gate.1.into_iter().map(Param::Float).collect(),
smallvec![Qubit(1)],
- ))
+ )?
}
- CircuitData::from_standard_gates(py, 2, gate_sequence, Param::Float(global_phase))
+ gate_sequence.global_phase = Param::Float(global_phase);
+ Ok(gate_sequence)
}
}
diff --git a/crates/circuit/src/circuit_data.rs b/crates/circuit/src/circuit_data.rs
index a325ca4e1..2822fef75 100644
--- a/crates/circuit/src/circuit_data.rs
+++ b/crates/circuit/src/circuit_data.rs
@@ -96,7 +96,7 @@ pub struct CircuitData {
clbits: BitData<Clbit>,
param_table: ParamTable,
#[pyo3(get)]
- global_phase: Param,
+ pub global_phase: Param,
}
impl CircuitData {
@@ -127,22 +127,8 @@ impl CircuitData {
I: IntoIterator<Item = (StandardGate, SmallVec<[Param; 3]>, SmallVec<[Qubit; 2]>)>,
{
let instruction_iter = instructions.into_iter();
- let mut res = CircuitData {
- data: Vec::with_capacity(instruction_iter.size_hint().0),
- qargs_interner: IndexedInterner::new(),
- cargs_interner: IndexedInterner::new(),
- qubits: BitData::new(py, "qubits".to_string()),
- clbits: BitData::new(py, "clbits".to_string()),
- param_table: ParamTable::new(),
- global_phase,
- };
- if num_qubits > 0 {
- let qubit_cls = QUBIT.get_bound(py);
- for _i in 0..num_qubits {
- let bit = qubit_cls.call0()?;
- res.add_qubit(py, &bit, true)?;
- }
- }
+ let mut res =
+ Self::with_capacity(py, num_qubits, instruction_iter.size_hint().0, global_phase)?;
let no_clbit_index = (&mut res.cargs_interner)
.intern(InternerKey::Value(Vec::new()))?
.index;
@@ -164,6 +150,58 @@ impl CircuitData {
Ok(res)
}
+ /// Build an empty CircuitData object with an initially allocated instruction capacity
+ pub fn with_capacity(
+ py: Python,
+ num_qubits: u32,
+ instruction_capacity: usize,
+ global_phase: Param,
+ ) -> PyResult<Self> {
+ let mut res = CircuitData {
+ data: Vec::with_capacity(instruction_capacity),
+ qargs_interner: IndexedInterner::new(),
+ cargs_interner: IndexedInterner::new(),
+ qubits: BitData::new(py, "qubits".to_string()),
+ clbits: BitData::new(py, "clbits".to_string()),
+ param_table: ParamTable::new(),
+ global_phase,
+ };
+ if num_qubits > 0 {
+ let qubit_cls = QUBIT.get_bound(py);
+ for _i in 0..num_qubits {
+ let bit = qubit_cls.call0()?;
+ res.add_qubit(py, &bit, true)?;
+ }
+ }
+ Ok(res)
+ }
+
+ /// Append a standard gate to this CircuitData
+ pub fn push_standard_gate(
+ &mut self,
+ operation: StandardGate,
+ params: SmallVec<[Param; 3]>,
+ qargs: SmallVec<[Qubit; 2]>,
+ ) -> PyResult<()> {
+ let no_clbit_index = (&mut self.cargs_interner)
+ .intern(InternerKey::Value(Vec::new()))?
+ .index;
+ let params = (!params.is_empty()).then(|| Box::new(params));
+ let qubits = (&mut self.qargs_interner)
+ .intern(InternerKey::Value(qargs.to_vec()))?
+ .index;
+ self.data.push(PackedInstruction {
+ op: operation.into(),
+ qubits,
+ clbits: no_clbit_index,
+ params,
+ extra_attrs: None,
+ #[cfg(feature = "cache_pygates")]
+ py_op: RefCell::new(None),
+ });
+ Ok(())
+ }
+
fn handle_manual_params(
&mut self,
py: Python,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let's follow up. I've tagged for merge.
In the recently merged Qiskit#12740 a path was added for constructing the `CircuitData` in rust when synthesizing to a `QuantumCircuit` directly. However, in that PR this was done through a layer of indirection by first collecting the gates into an intermediate `Vec` and then passing an iterator of that into `CircuitData::from_standard_gates()`. However this resulted in an unecessary allocation for two `Vec`s and it would have been better to just directly construct the `CircuitData` object directly. However, to accomplish this we needed more rust space methods for creating and manipulating a `CircuitData` object as it's mostly being constructed from Python space or using `CircuitData::from_standard_gates()` with an iterator so far. This commit makes those additions and then updates the `TwoQubitWeylDecomposition` code to directly construct the `CircuitData` object instead of using an intermediate `Vec`.
* Use Rust gates for 2q unitary synthesis This commit builds off of what Qiskit#12650 did for the 1q decomposer and moves to using rust gates for the 2q decomposer too. This means that the circuit sequence generation is using rust's StandardGate representation directly instead of relying on mapping strings. For places where circuits are generated (calling `TwoQubitWeylDecomposition.circuit()` or or `TwoQubitBasisDecomposer.__call__` without the `use_dag` flag) the entire circuit is generated in Rust and returned to Python. * Run cargo fmt and black post rebase
* Use Rust gates for 2q unitary synthesis This commit builds off of what Qiskit#12650 did for the 1q decomposer and moves to using rust gates for the 2q decomposer too. This means that the circuit sequence generation is using rust's StandardGate representation directly instead of relying on mapping strings. For places where circuits are generated (calling `TwoQubitWeylDecomposition.circuit()` or or `TwoQubitBasisDecomposer.__call__` without the `use_dag` flag) the entire circuit is generated in Rust and returned to Python. * Run cargo fmt and black post rebase
* Directly construct CircuitData in TwoQubitWeylDecomposition In the recently merged #12740 a path was added for constructing the `CircuitData` in rust when synthesizing to a `QuantumCircuit` directly. However, in that PR this was done through a layer of indirection by first collecting the gates into an intermediate `Vec` and then passing an iterator of that into `CircuitData::from_standard_gates()`. However this resulted in an unecessary allocation for two `Vec`s and it would have been better to just directly construct the `CircuitData` object directly. However, to accomplish this we needed more rust space methods for creating and manipulating a `CircuitData` object as it's mostly being constructed from Python space or using `CircuitData::from_standard_gates()` with an iterator so far. This commit makes those additions and then updates the `TwoQubitWeylDecomposition` code to directly construct the `CircuitData` object instead of using an intermediate `Vec`. * Use set_global_phase instead of direct assignment * Use slice inputs for push_standard_gate() params and qargs
Summary
This commit builds off of what #12650 did for the 1q decomposer and moves to using rust gates for the 2q decomposer too. This means that the circuit sequence generation is using rust's StandardGate representation directly instead of relying on mapping strings. For places where circuits are generated (calling
TwoQubitWeylDecomposition.circuit()
or orTwoQubitBasisDecomposer.__call__
without theuse_dag
flag) the entire circuit is generated in Rust and returned to Python.Details and comments