Merge branch 'main' into test-all-optionals
Showing 26 changed files with 2,330 additions and 112 deletions.
164 changes: 164 additions & 0 deletions
qiskit/transpiler/synthesis/aqc/fast_gradient/__init__.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,164 @@ | ||
# This code is part of Qiskit. | ||
# | ||
# (C) Copyright IBM 2022. | ||
# | ||
# This code is licensed under the Apache License, Version 2.0. You may | ||
# obtain a copy of this license in the LICENSE.txt file in the root directory | ||
# of this source tree or at http://www.apache.org/licenses/LICENSE-2.0. | ||
# | ||
# Any modifications or derivative works of this code must retain this | ||
# copyright notice, and modified files need to carry a notice indicating | ||
# that they have been altered from the originals. | ||
|
||
r""" | ||
================================================================================ | ||
Fast implementation of the objective function class | ||
(:mod:`qiskit.transpiler.synthesis.aqc.fast_gradient`) | ||
================================================================================ | ||
.. currentmodule:: qiskit.transpiler.synthesis.aqc.fast_gradient | ||
An extension of the Approximate Quantum Compiler implementation described in the paper [1]. | ||
Interface | ||
========= | ||
The main public class of this module is FastCNOTUnitObjective. It replaces the default objective | ||
function implementation :class:`.DefaultCNOTUnitObjective` for faster computation. | ||
The individual classes include the public one (FastCNOTUnitObjective) and a few | ||
internal ones: | ||
.. autosummary:: | ||
:toctree: ../stubs | ||
:template: autosummary/class_no_inherited_members.rst | ||
FastCNOTUnitObjective | ||
LayerBase | ||
Layer1Q | ||
Layer2Q | ||
PMatrix | ||
Mathematical Details | ||
==================== | ||
In what follows, we briefly outline the main ideas underlying the accelerated implementation | ||
of the objective function class. | ||
* The key ingredient of approximate compiling is an efficient optimization procedure | ||
that minimizes :math:`\|V - U\|_{\mathrm{F}}` on a classical computer, where :math:`U` | ||
is a given (target) unitary matrix and :math:`V` is the matrix of the approximating quantum | ||
circuit. Alternatively, we maximize the Hilbert-Schmidt product between :math:`U` and | ||
:math:`V` as outlined in the main part of the documentation. | ||
* The circuit :math:`V` can be represented as a sequence of 2-qubit gates (layers) | ||
applied one after another. The corresponding matrix takes the form: | ||
:math:`V = C_0 C_1 \ldots C_{L-1} F`, where :math:`L` is the length of the sequence | ||
(number of layers). If the total number of qubits :math:`n > 2`, every | ||
:math:`C_i = C_i(\Theta_i)` is a sparse :math:`2^n \times 2^n` matrix of a 2-qubit gate | ||
(CNOT unit block) parameterized by a subset of parameters :math:`\Theta_i` | ||
(4 parameters per unit block), and :math:`F` is a matrix that comprises the action | ||
of all 1-qubit gates in front of the approximating circuit. See the paper [1] for details. | ||
* Over the course of optimization we compute the value of the objective function and its | ||
gradient, which requires computing :math:`V` and its derivatives | ||
:math:`{\partial V}/{\partial \Theta_i}` for all :math:`i`, given the current estimate | ||
of all the parameters :math:`\Theta`. | ||
* A naive implementation of the product :math:`V = C_0 C_1 \ldots C_{L-1} F` and its | ||
derivatives would compute and store forward and backward partial | ||
products, as required by the backtracking algorithm. This is wasteful in terms of | ||
both performance and memory. | ||
* Minimization of :math:`\|V - U\|_{\mathrm{F}}^2` is equivalent to maximization of | ||
:math:`\text{Re}\left(\text{Tr}\left(U^{\dagger} V\right)\right)`. By cyclic permutation | ||
of the sequence of matrices under the trace operation, we can avoid storing intermediate | ||
partial products of the gate matrices :math:`C_i`. Note that the matrix size grows exponentially | ||
with the number of qubits, quickly becoming prohibitively large. | ||
* The sparse structure of :math:`C_i` can be exploited to speed up matrix-matrix multiplication. | ||
However, using sparse matrices as such does not give a performance gain, because sparse patterns | ||
tend to violate data locality in the cache memory of modern CPUs. Instead, we make use | ||
of the special structure of the gate matrices :math:`C_i` coupled with permutation matrices. | ||
Although permutation is not cache friendly either, its impact is seemingly less severe than that | ||
of sparse matrix multiplication (at least in a Python implementation). | ||
* On every optimization iteration we first compute :math:`V = C_0 C_1 \ldots C_{L-1} F` | ||
given the current estimate of all the parameters :math:`\Theta`. | ||
* As for the gradient of the objective function, it can be shown (by cyclically permuting | ||
the individual matrices under the trace operation) that: | ||
.. math:: | ||
\text{Tr}\left( U^{\dagger} \frac{\partial V}{\partial \Theta_{l,k}} \right) = | ||
\langle \text{vec}\left(E_l\right), \text{vec}\left( | ||
\frac{\partial C_l}{\partial \Theta_{l,k}}\right) \rangle, | ||
where :math:`\Theta_{l,k}` is the :math:`k`-th parameter of the :math:`l`-th CNOT unit block, | ||
and :math:`E_l=C_{l-1}\left(C_{l-2}\left(\cdots\left(C_0\left(U^{\dagger}V | ||
C_0^{\dagger}\right)C_1^{\dagger}\right) \cdots\right)C_{l-1}^{\dagger}\right)C_l^{\dagger}` | ||
is an intermediate matrix. | ||
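The replacement of a trace of a matrix product by an inner product of vectorized
matrices can be checked numerically. The sketch below is only an illustration of the
generic identity :math:`\text{Tr}(A^{\dagger} B) = \langle \text{vec}(A), \text{vec}(B) \rangle`;
the matrices ``a`` and ``b`` are random stand-ins, and which factor carries the conjugate
transpose in this module's own convention is not stated here, so that choice is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # 2**3, i.e. three qubits (arbitrary choice for the illustration)

# Random complex matrices standing in for E_l and dC_l/dTheta (assumed, not real data).
a = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
b = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))

# Matrix route: form the full product first, then take its trace, O(dim**3).
trace_via_matmul = np.trace(a.conj().T @ b)

# Vector route: inner product of the flattened matrices, O(dim**2).
# np.vdot conjugates its first argument, which supplies the dagger.
trace_via_vdot = np.vdot(a.ravel(), b.ravel())

assert np.isclose(trace_via_matmul, trace_via_vdot)
```

Only :math:`2^{2n}` multiplications are needed in the vector route, compared with
:math:`2^{3n}` for an explicit matrix product.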
* For every :math:`l`-th gradient component, we compute the trace using the matrix | ||
:math:`E_l`. The matrix is then updated by multiplication on the left and on the right | ||
by the corresponding gate matrices :math:`C_l` and :math:`C_{l+1}^{\dagger}`, respectively, | ||
and we proceed to the next gradient component. | ||
* We save computations and resources by not storing intermediate partial products of | ||
:math:`C_i`. Instead, the incrementally updated matrix :math:`E_l` keeps all the related | ||
information. Also, vectorization of the involved matrices (see the above formula) allows | ||
us to replace matrix-matrix multiplication by a "cheaper" vector-vector one under the | ||
trace operation. | ||
* The matrices :math:`C_i` are sparse. However, even for relatively small matrices | ||
(< 1M elements) sparse-dense multiplication can be very slow. Construction of sparse | ||
matrices takes time as well, and every gate matrix must be updated on each iteration | ||
of the optimization loop. | ||
* In fact, any gate matrix :math:`C_i` can be transformed to what we call a standard | ||
form: :math:`C_i = P^T \widetilde{C}_i P`, where :math:`P` is an easily computable | ||
permutation matrix and :math:`\widetilde{C}_i` has a block-diagonal layout: | ||
.. math:: | ||
\widetilde{C}_i = \left( | ||
\begin{array}{ccc} | ||
G_{4 \times 4} & \ddots & 0 \\ | ||
\ddots & \ddots & \ddots \\ | ||
0 & \ddots & G_{4 \times 4} | ||
\end{array} | ||
\right) | ||
* The 2-qubit gate matrix :math:`G_{4 \times 4}` is repeated along the diagonal of the full | ||
:math:`2^n \times 2^n` matrix :math:`\widetilde{C}_i`. | ||
* We do not actually create either the matrix :math:`\widetilde{C}_i` or :math:`P`. | ||
In fact, only :math:`G_{4 \times 4}` and a permutation array (of size :math:`2^n`) | ||
are kept in memory. | ||
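The standard form can be illustrated with a small numerical check. This is a minimal
sketch under stated assumptions, not the module's implementation: the helper names
``gate_permutation`` and ``full_gate_reference`` are hypothetical, qubit 0 is assumed to be
the most significant bit of the matrix index, and the permutation is assumed to move the
two gate qubits to the least significant positions. The dense :math:`P` and
:math:`\widetilde{C}_i` are built here only to verify the identity.

```python
import numpy as np

def gate_permutation(n, j, k):
    # Hypothetical helper: permutation array of size 2**n that moves the bits of
    # qubits j and k to the two least significant positions (big-endian assumed).
    rest = [q for q in range(n) if q not in (j, k)]
    idx = np.arange(2**n).reshape((2,) * n)
    return idx.transpose(rest + [j, k]).ravel()

def full_gate_reference(g, n, j, k):
    # Hypothetical reference: dense 2**n x 2**n matrix of the 4x4 gate g acting
    # on qubits (j, k), obtained by applying g to every column of the identity.
    dim = 2**n
    g4 = g.reshape(2, 2, 2, 2)  # axes: (out_j, out_k, in_j, in_k)
    eye = np.eye(dim, dtype=g.dtype).reshape((2,) * n + (dim,))
    out = np.tensordot(g4, eye, axes=([2, 3], [j, k]))
    out = np.moveaxis(out, [0, 1], [j, k])
    return out.reshape(dim, dim)

rng = np.random.default_rng(0)
n, j, k = 3, 0, 2
dim = 2**n
g = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

perm = gate_permutation(n, j, k)
p = np.eye(dim)[perm]                    # dense P, so that P @ M == M[perm]
c_tilde = np.kron(np.eye(dim // 4), g)   # block-diagonal matrix with g repeated
c_standard_form = p.T @ c_tilde @ p

assert np.allclose(c_standard_form, full_gate_reference(g, n, j, k))
```

The gate ``g`` is a random matrix rather than a unitary, because the identity is purely
structural and does not depend on unitarity.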
* Consider left-hand side multiplication of some dense :math:`2^n \times 2^n` matrix :math:`M` by :math:`C_i`: | ||
.. math:: | ||
C_i M = P^T \widetilde{C}_i P M = P^T \left( \widetilde{C}_i \left( P M \right) \right) | ||
* First, we permute rows of :math:`M`, which is equivalent to the product :math:`P M`, but | ||
without expensive multiplication of two :math:`2^n \times 2^n` matrices. | ||
* Second, we compute :math:`\widetilde{C}_i P M` by multiplying every block-diagonal sub-matrix | ||
:math:`G_{4 \times 4}` by the corresponding rows of :math:`P M`. This is a dense-dense | ||
matrix multiplication, which is very well optimized on modern CPUs. Important: the total | ||
number of operations is :math:`O(2^{2 n})`, in contrast to :math:`O(2^{3 n})` in the general | ||
case. | ||
* Third, we permute rows of :math:`\widetilde{C}_i P M` by applying :math:`P^T`. | ||
* Right-hand side multiplication is done in a similar way. | ||
* In summary, we save computational resources by exploiting some properties of the 2-qubit gate | ||
matrices :math:`C_i` and using hardware-optimized multiplication of dense matrices. There | ||
is still room for further improvement, of course. | ||
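The three-step left multiplication described above (permute rows, multiply by the
:math:`4 \times 4` blocks, permute back) can be sketched as follows. This is an
illustrative sketch, not the code of this module: ``gate_permutation`` and
``apply_gate_left`` are hypothetical names, and the bit ordering (qubit 0 as the most
significant bit) is an assumption. The dense matrix ``c_dense`` is constructed only to
check the result.

```python
import numpy as np

def gate_permutation(n, j, k):
    # Hypothetical helper: permutation array that moves the bits of qubits j and k
    # to the two least significant positions (big-endian ordering assumed).
    rest = [q for q in range(n) if q not in (j, k)]
    return np.arange(2**n).reshape((2,) * n).transpose(rest + [j, k]).ravel()

def apply_gate_left(g, perm, m):
    # Computes P^T (C_tilde (P m)) using only the 4x4 gate g and the array perm.
    dim = m.shape[0]
    pm = m[perm]                          # step 1: row permutation, equals P @ m
    blocks = pm.reshape(dim // 4, 4, -1)  # consecutive groups of 4 rows
    prod = (g @ blocks).reshape(dim, -1)  # step 2: g applied to each group of rows
    out = np.empty_like(prod)
    out[perm] = prod                      # step 3: inverse permutation, equals P^T @ prod
    return out

rng = np.random.default_rng(1)
n, j, k = 4, 1, 3
dim = 2**n
g = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
m = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
perm = gate_permutation(n, j, k)

# Dense reference, built only for verification: C = P^T C_tilde P.
p = np.eye(dim)[perm]
c_dense = p.T @ np.kron(np.eye(dim // 4), g) @ p

fast = apply_gate_left(g, perm, m)
assert np.allclose(fast, c_dense @ m)
```

The batched product ``g @ blocks`` relies on NumPy broadcasting of ``matmul`` over the
leading axis, so the block-diagonal matrix is never formed inside ``apply_gate_left``.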
References: | ||
[1]: Liam Madden, Andrea Simonetto, Best Approximate Quantum Compiling Problems. | ||
`arXiv:2106.05649 <https://arxiv.org/abs/2106.05649>`_ | ||
""" |