Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accelerated implementation of objective function class for approximate quantum compiling #7697

Merged
merged 23 commits into from
Oct 28, 2022
Merged
Show file tree
Hide file tree
Changes from 22 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
5881389
added accelerated version of existing approximate quantum compiling m…
A-tA-v Feb 22, 2022
e4eb1cd
Merge branch 'main' into drl_albert
A-tA-v Feb 25, 2022
bca3739
Merge branch 'main' of https://github.com/Qiskit/qiskit-terra into dr…
A-tA-v Mar 15, 2022
b2df386
Merge branch 'main' of https://github.com/Qiskit/qiskit-terra into dr…
A-tA-v Mar 17, 2022
1cf41c3
Merge branch 'main' of https://github.com/Qiskit/qiskit-terra into dr…
A-tA-v Mar 17, 2022
490b994
Merge branch 'main' of https://github.com/Qiskit/qiskit-terra into dr…
A-tA-v Mar 22, 2022
bcd488e
Merge branch 'main' of https://github.com/Qiskit/qiskit-terra into dr…
A-tA-v Mar 22, 2022
26f5e0c
cleaning up approximate quantum compiling code for the pull request
A-tA-v Mar 22, 2022
0e6a075
Merge branch 'main' of https://github.com/Qiskit/qiskit-terra into dr…
A-tA-v Mar 22, 2022
cbe700e
further clean up of AQC code to make pylint happy
A-tA-v Mar 22, 2022
ece5a75
Merge branch 'main' into drl_albert
A-tA-v Apr 21, 2022
c3cd6be
Merge branch 'main' of https://github.com/Qiskit/qiskit-terra into dr…
A-tA-v Apr 25, 2022
64ae333
further clean up and minor improvements
A-tA-v Apr 25, 2022
487f79c
Merge branch 'drl_albert' of https://github.com/A-tA-v/qiskit-terra i…
A-tA-v Apr 25, 2022
c5a6b80
Merge branch 'main' of https://github.com/Qiskit/qiskit-terra into dr…
A-tA-v Apr 26, 2022
a3dee86
minor compatibility issue on Windows
A-tA-v Apr 26, 2022
45b4f74
Merge branch 'main' of https://github.com/Qiskit/qiskit-terra into dr…
A-tA-v May 10, 2022
df60e65
Merge branch 'main' into drl_albert
A-tA-v Jun 7, 2022
a3bf131
Merge branch 'main' of https://github.com/Qiskit/qiskit-terra into dr…
A-tA-v Jun 22, 2022
a0a1302
Merge branch 'drl_albert' of https://github.com/A-tA-v/qiskit-terra i…
A-tA-v Jun 22, 2022
31b48a4
Merge branch 'main' into drl_albert
A-tA-v Jul 28, 2022
6a1ceb0
Merge branch 'main' into drl_albert
A-tA-v Oct 11, 2022
a5480d0
Merge branch 'main' into drl_albert
mergify[bot] Oct 28, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 14 additions & 3 deletions qiskit/transpiler/synthesis/aqc/__init__.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# This code is part of Qiskit.
#
# (C) Copyright IBM 2021.
# (C) Copyright IBM 2022.
#
# This code is licensed under the Apache License, Version 2.0. You may
# obtain a copy of this license in the LICENSE.txt file in the root directory
Expand Down Expand Up @@ -37,6 +37,7 @@
CNOTUnitCircuit
CNOTUnitObjective
DefaultCNOTUnitObjective
FastCNOTUnitObjective
Mathematical Detail
Expand Down Expand Up @@ -124,7 +125,7 @@
num_qubits=num_qubits,
network_layout=layout,
connectivity_type=connectivity,
depth=depth,
depth=depth
)
# Create an optimizer to be used by AQC
Expand All @@ -143,7 +144,7 @@
aqc.compile_unitary(
target_matrix=unitary,
approximate_circuit=approximate_circuit,
approximating_objective=approximating_objective,
approximating_objective=approximating_objective
)
Now ``approximate_circuit`` is a circuit that approximates the target unitary to a certain
Expand All @@ -153,6 +154,15 @@
.. autofunction:: make_cnot_network
One can take advantage of accelerated version of objective function. It implements the same
mathematical algorithm as the default one ``DefaultCNOTUnitObjective`` but runs several times
faster. Instantiation of accelerated objective function class is similar to the default case:
# Create an objective that defines our optimization problem
approximating_objective = FastCNOTUnitObjective(num_qubits=num_qubits, cnots=cnots)
The rest of the code in the above example does not change.
References:
[1]: Liam Madden, Andrea Simonetto, Best Approximate Quantum Compiling Problems.
Expand All @@ -165,3 +175,4 @@
from .cnot_structures import make_cnot_network
from .cnot_unit_circuit import CNOTUnitCircuit
from .cnot_unit_objective import CNOTUnitObjective, DefaultCNOTUnitObjective
from .fast_gradient.fast_gradient import FastCNOTUnitObjective
21 changes: 14 additions & 7 deletions qiskit/transpiler/synthesis/aqc/aqc.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# This code is part of Qiskit.
#
# (C) Copyright IBM 2021.
# (C) Copyright IBM 2022.
#
# This code is licensed under the Apache License, Version 2.0. You may
# obtain a copy of this license in the LICENSE.txt file in the root directory
Expand Down Expand Up @@ -35,10 +35,17 @@ class AQC:
* Approximate objective is tightly coupled with the approximate circuit implementation and
provides two methods for computing objective function and gradient with respect to approximate
circuit parameters. This objective is passed to the optimizer. Currently, there's only one
implementation based on 4-rotations CNOT unit blocks: :class:`.DefaultCNOTUnitObjective`. This
is a naive implementation of the objective function and gradient and may suffer from performance
issues.
circuit parameters. This objective is passed to the optimizer. Currently, there are two
implementations based on 4-rotations CNOT unit blocks: :class:`.DefaultCNOTUnitObjective` and
its accelerated version :class:`.FastCNOTUnitObjective`. Both implementations share the same
idea of maximization the Hilbert-Schmidt product between the target matrix and its
approximation. The former implementation approach should be considered as a baseline one. It
may suffer from performance issues, and is mostly suitable for a small number of qubits
(up to 5 or 6), whereas the latter, accelerated one, can be applied to larger problems.
* One should take into consideration the exponential growth of matrix size with the number of
qubits because the implementation not only creates a potentially large target matrix, but
also allocates a number of temporary memory buffers comparable in size to the target matrix.
"""

def __init__(
Expand All @@ -49,8 +56,8 @@ def __init__(
"""
Args:
optimizer: an optimizer to be used in the optimization procedure of the search for
the best approximate circuit. By default :obj:`.L_BFGS_B` is used with max iterations
is set to 1000.
the best approximate circuit. By default, :obj:`.L_BFGS_B` is used with max
iterations set to 1000.
seed: a seed value to be user by a random number generator.
"""
super().__init__()
Expand Down
6 changes: 3 additions & 3 deletions qiskit/transpiler/synthesis/aqc/cnot_structures.py
Original file line number Diff line number Diff line change
Expand Up @@ -225,14 +225,14 @@ def _cartan_network(num_qubits: int) -> np.ndarray:
"""
n = num_qubits
if n > 3:
cnots = np.array([[0, 0, 0], [1, 1, 1]])
mult = np.array([[n - 2, n - 3, n - 2, n - 3], [n - 1, n - 1, n - 1, n - 1]])
cnots = np.asarray([[0, 0, 0], [1, 1, 1]])
mult = np.asarray([[n - 2, n - 3, n - 2, n - 3], [n - 1, n - 1, n - 1, n - 1]])
for _ in range(n - 2):
cnots = np.hstack((np.tile(np.hstack((cnots, mult)), 3), cnots))
mult[0, -1] -= 1
mult = np.tile(mult, 2)
elif n == 3:
cnots = np.array(
cnots = np.asarray(
[
[0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
[1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1],
Expand Down
2 changes: 1 addition & 1 deletion qiskit/transpiler/synthesis/aqc/cnot_unit_circuit.py
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,7 @@ def __init__(
self._num_cnots = cnots.shape[1]
self._tol = tol

# Thetas to be optimizer by the AQC algorithm
# Thetas to be optimized by the AQC algorithm
self._thetas = None

@property
Expand Down
6 changes: 3 additions & 3 deletions qiskit/transpiler/synthesis/aqc/cnot_unit_objective.py
Original file line number Diff line number Diff line change
Expand Up @@ -178,9 +178,9 @@ def gradient(self, param_values: np.ndarray) -> np.ndarray:
# the partial derivative of the circuit with respect to an angle
# is the same circuit with the corresponding pauli gate, multiplied
# by a global phase of -1j / 2, next to the rotation gate (it commutes)
pauli_x = np.multiply(-1j / 2, np.array([[0, 1], [1, 0]]))
pauli_y = np.multiply(-1j / 2, np.array([[0, -1j], [1j, 0]]))
pauli_z = np.multiply(-1j / 2, np.array([[1, 0], [0, -1]]))
pauli_x = np.multiply(-1j / 2, np.asarray([[0, 1], [1, 0]]))
pauli_y = np.multiply(-1j / 2, np.asarray([[0, -1j], [1j, 0]]))
pauli_z = np.multiply(-1j / 2, np.asarray([[1, 0], [0, -1]]))

n = self._num_qubits
d = int(2**n)
Expand Down
164 changes: 164 additions & 0 deletions qiskit/transpiler/synthesis/aqc/fast_gradient/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,164 @@
# This code is part of Qiskit.
#
# (C) Copyright IBM 2022.
#
# This code is licensed under the Apache License, Version 2.0. You may
# obtain a copy of this license in the LICENSE.txt file in the root directory
# of this source tree or at http://www.apache.org/licenses/LICENSE-2.0.
#
# Any modifications or derivative works of this code must retain this
# copyright notice, and modified files need to carry a notice indicating
# that they have been altered from the originals.

r"""
================================================================================
Fast implementation of objective function class
(:mod:`qiskit.transpiler.synthesis.aqc.fast_gradient`)
================================================================================
.. currentmodule:: qiskit.transpiler.synthesis.aqc.fast_gradient
Extension to the implementation of Approximate Quantum Compiler as described in the paper [1].
Interface
=========
The main public class of this module is FastCNOTUnitObjective. It replaces the default objective
function implementation :class:`.DefaultCNOTUnitObjective` for faster computation.
The individual classes include the public one (FastCNOTUnitObjective) and few
internal ones:
.. autosummary::
:toctree: ../stubs
:template: autosummary/class_no_inherited_members.rst
FastCNOTUnitObjective
LayerBase
Layer1Q
Layer2Q
PMatrix
Mathematical Details
====================
In what follows we briefly outline the main ideas underlying the accelerated implementation
of objective function class.
* The key ingredient of approximate compiling is the efficient optimization procedure
that minimizes :math:`\|V - U\|_{\mathrm{F}}` on a classical computer, where :math:`U`
is a given (target) unitary matrix and :math:`V` is a matrix of approximating quantum
circuit. Alternatively, we maximize the Hilbert-Schmidt product between :math:`U` and
:math:`V` as outlined in the main part of the documentation.
* The circuit :math:`V` can be represented as a sequence of 2-qubit gates (layers)
applied one after another. The corresponding matrix takes the form:
:math:`V = C_0 C_1 \ldots C_{L-1} F`, where :math:`L` is the length of the sequence
(number of layers). If the total number of qubits :math:`n > 2`, every
:math:`C_i = C_i(\Theta_i)` is a sparse, :math:`2^n \times 2^n` matrix of 2-qubit gate
(CNOT unit block) parameterized by a sub-set of parameters :math:`\Theta_i`
(4 parameters per unit block), and :math:`F` is a matrix that comprises the action
of all 1-qubit gates in front of approximating circuit. See the paper [1] for details.
* Over the course of optimization we compute the value of objective function and its
gradient, which implies computation of :math:`V` and its derivatives
:math:`{\partial V}/{\partial \Theta_i}` for all :math:`i`, given the current estimation
of all the parameters :math:`\Theta`.
* A naive implementation of the product :math:`V = C_0 C_1 \ldots C_{L-1} F` and its
derivatives would include computation and memorization of forward and backward partial
products as required by the backtracking algorithm. This is wasteful in terms of
performance and resource allocation.
* Minimization of :math:`\|V - U\|_{\mathrm{F}}^2` is equivalent to maximization of
:math:`\text{Re}\left(\text{Tr}\left(U^{\dagger} V\right)\right)`. By cyclic permutation
of the sequence of matrices under trace operation, we can avoid memorization of intermediate
partial products of gate matrices :math:`C_i`. Note, matrix size grows exponentially with
the number of qubits, quickly becoming prohibitively large.
* Sparse structure of :math:`C_i` can be exploited to speed up matrix-matrix multiplication.
However, using sparse matrices as such does not give performance gain because sparse patterns
tend to violate data proximity inside the cache memory of modern CPUs. Instead, we make use
of special structure of gate matrices :math:`C_i` coupled with permutation ones. Although
permutation is not cache friendly either, its impact is seemingly less severe than that
of sparse matrix multiplication (at least in Python implementation).
* On every optimization iteration we, first, compute :math:`V = C_0 C_1 \ldots C_{L-1} F`
given the current estimation of all the parameters :math:`\Theta`.
* As for the gradient of objective function, it can be shown (by moving cyclically around
an individual matrices under trace operation) that:
.. math::
\text{Tr}\left( U^{\dagger} \frac{\partial V}{\partial \Theta_{l,k}} \right) =
\langle \text{vec}\left(E_l\right), \text{vec}\left(
\frac{\partial C_l}{\partial \Theta_{l,k}}\right) \rangle,
where :math:`\Theta_{l,k}` is a :math:`k`-th parameter of :math:`l`-th CNOT unit block,
and :math:`E_l=C_{l-1}\left(C_{l-2}\left(\cdots\left(C_0\left(U^{\dagger}V
C_0^{\dagger}\right)C_1^{\dagger}\right) \cdots\right)C_{l-1}^{\dagger}\right)C_l^{\dagger}`
is an intermediate matrix.
* For every :math:`l`-th gradient component, we compute the trace using the matrix
:math:`E_l`, then this matrix is updated by multiplication on left and on the right
by corresponding gate matrices :math:`C_l` and :math:`C_{l+1}^{\dagger}` respectively
and proceed to the next gradient component.
* We save computations and resources by not storing intermediate partial products of
:math:`C_i`. Instead, incrementally updated matrix :math:`E_l` keeps all related
information. Also, vectorization of involved matrices (see the above formula) allows
us to replace matrix-matrix multiplication by "cheaper" vector-vector one under the
trace operation.
* The matrices :math:`C_i` are sparse. However, even for relatively small matrices
(< 1M elements) sparse-dense multiplication can be very slow. Construction of sparse
matrices takes a time as well. We should update every gate matrix on each iteration
of optimization loop.
* In fact, any gate matrix :math:`C_i` can be transformed to what we call a standard
form: :math:`C_i = P^T \widetilde{C}_i P`, where :math:`P` is an easily computable
permutation matrix and :math:`\widetilde{C}_i` has a block-diagonal layout:
.. math::
\widetilde{C}_i = \left(
\begin{array}{ccc}
G_{4 \times 4} & \ddots & 0 \\
\ddots & \ddots & \ddots \\
0 & \ddots & G_{4 \times 4}
\end{array}
\right)
* The 2-qubit gate matrix :math:`G_{4 \times 4}` is repeated along diagonal of the full
:math:`2^n \times 2^n` :math:`\widetilde{C}_i`.
* We do not actually create neither matrix :math:`\widetilde{C}_i` nor :math:`P`.
In fact, only :math:`G_{4 \times 4}` and a permutation array (of size :math:`2^n`)
are kept in memory.
* Consider left-hand side multiplication by some dense, :math:`2^n \times 2^n` matrix :math:`M`:
.. math::
C_i M = P^T \widetilde{C}_i P M = P^T \left( \widetilde{C}_i \left( P M \right) \right)
* First, we permute rows of :math:`M`, which is equivalent to the product :math:`P M`, but
without expensive multiplication of two :math:`2^n \times 2^n` matrices.
* Second, we compute :math:`\widetilde{C}_i P M` multiplying every block-diagonal sub-matrix
:math:`G_{4 \times 4}` by the corresponding rows of :math:`P M`. This is the dense-dense
matrix multiplication, which is very well optimized on modern CPUs. Important: the total
number of operations is :math:`O(2^{2 n})` in contrast to :math:`O(2^{3 n})` as in general
case.
* Third, we permute rows of :math:`\widetilde{C}_i P M` by applying :math:`P^T`.
* Right-hand side multiplication is done in a similar way.
* In summary, we save computational resources by exploiting some properties of 2-qubit gate
matrices :math:`C_i` and using hardware optimized multiplication of dense matrices. There
is still a room for further improvement, of course.
References:
[1]: Liam Madden, Andrea Simonetto, Best Approximate Quantum Compiling Problems.
`arXiv:2106.05649 <https://arxiv.org/abs/2106.05649>`_
"""
Loading