Use singleton matrices for unparametrised standard gates #10296
This makes the array form of standard gates with zero parameters singleton class attributes that reject modification. The class-level `__array__` methods are updated to return exactly the same instance, except in very unusual circumstances, which means that `Gate.to_matrix()` and `numpy.asarray()` calls on the objects will return the same instance. This avoids a decent amount of construction time, and avoids several Python-space list allocations and array allocations.

The dtypes of the static arrays are all standardised to be `complex128`. Gate matrices are in general unitary, and `Gate.to_matrix()` already enforces a cast to `complex128`. For gates that allowed their dtypes to be inferred, there were several cases where native ints and floats would be used, meaning that `Gate.to_matrix()` would also involve an extra matrix allocation to hold the cast, which just wasted time.

For standard controlled gates, we store both the closed- and open-controlled matrices of singly controlled gates. For gates with more than one control, we only store the "all ones" controlled case, as a memory/speed trade-off; open controls are much less common than closed controls.

For the most part this won't have an effect on peak memory usage, since all the allocated matrices in standard Qiskit usage would be freed by the garbage collector almost immediately. This will, however, reduce construction costs and garbage-collector pressure, since fewer allocations and frees will occur, and no calculations will need to be done.
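A minimal NumPy-only sketch of the behaviour described above. `FakeXGate` is a hypothetical stand-in rather than Qiskit's real `XGate`, and the `_ARRAY` attribute name is an assumption made for illustration.

```python
import numpy as np


class FakeXGate:
    """Hypothetical stand-in for a parameterless standard gate."""

    # Built once at class-definition time, stored as complex128, then frozen.
    _ARRAY = np.array([[0, 1], [1, 0]], dtype=np.complex128)
    _ARRAY.setflags(write=False)

    def __array__(self, dtype=None, copy=None):
        # Fast path: hand back the shared instance when no cast is needed.
        if dtype is None or np.dtype(dtype) == self._ARRAY.dtype:
            return self._ARRAY
        # Unusual case: a different dtype was requested, so NumPy must cast.
        return np.asarray(self._ARRAY, dtype=dtype)

    def to_matrix(self):
        return self.__array__(dtype=complex)


# Two separate gate objects share one matrix instance.
print(FakeXGate().to_matrix() is FakeXGate().to_matrix())  # True

# The shared matrix rejects modification.
try:
    FakeXGate._ARRAY[0, 0] = 1
except ValueError as exc:
    print("write rejected:", exc)

# Why the dtype is fixed up front: an inferred integer matrix needs a new
# allocation to be cast to complex, whereas a complex128 one does not.
int_matrix = np.array([[0, 1], [1, 0]])
print(np.asarray(int_matrix, dtype=complex) is int_matrix)  # False
print(np.asarray(FakeXGate._ARRAY, dtype=complex) is FakeXGate._ARRAY)  # True
```

Because the array is read-only, callers that want to mutate the result have to take an explicit copy first, for example with `FakeXGate().to_matrix().copy()`.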
Oh wait, lol - I wrote the commit message before I actually finished making the code do what I described with regard to the larger controlled gates. I'll push up that change. |
This is a good change, not just for optimization, but for organization and standardization as well. We could add some performance and safety by introducing something like this:

    import numpy as np

    _array = np.array([[1, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]], dtype=np.complex128)

    # Don't use this, just for comparison
    def _as_array_plain(_array, dtype=None):
        return np.asarray(_array, dtype=dtype)

    # Use this if the unitary has non-zero imaginary part
    # I chose TypeError even though the type is not part of the standard Python type apparatus.
    def _as_array_complex(_array, dtype=None):
        if dtype is None or dtype == 'complex128':
            return _array
        if dtype == 'float64':
            raise TypeError("Can't convert unitary with non-zero imaginary part to float")
        return np.asarray(_array, dtype=dtype)

    # Use this if the unitary is real, although stored as complex128
    def _as_array_float(_array, dtype=None):
        if dtype is None or dtype == 'complex128':
            return _array
        if dtype == 'float64':
            return _array.real
        return np.asarray(_array, dtype=dtype)

Here are some times:

    In [1]: %timeit _as_array_plain(_array)
    102 ns ± 0.906 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

    In [2]: %timeit _as_array_float(_array)
    42.2 ns ± 0.362 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

    In [3]: %timeit _as_array_plain(_array, dtype='float64')
    /home/lapeyre/programming_play/playpython/numpy/array_convert.py:27: ComplexWarning: Casting complex values to real discards the imaginary part
      return np.asarray(_array, dtype=dtype)
    1.11 µs ± 9.25 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

    In [4]: %timeit _as_array_float(_array, dtype='float64')
    136 ns ± 0.59 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

For example the X gate would call |
That's interesting with the timings. I don't think there's a need to optimise a potential returned float matrix; a prospective caller can easily do call |
I understand that it makes sense for the caller to get the complex matrix and call numpy made a design choice to warn and return the real part instead of error when you do |
Personally I think Fwiw, evaluating The call that matters for timing within Qiskit is |
I must be missing something. I see this

    In [2]: v = np.array([1.,2.,3.])

    In [3]: %timeit v.dtype == "float64"
    94.6 ns ± 0.778 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

    In [4]: %timeit v.dtype == np.dtype("float64")
    205 ns ± 0.895 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

    In [5]: cmptype = np.dtype("float64")

    In [6]: %timeit v.dtype == cmptype
    37.3 ns ± 0.205 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

    In [7]: dtype2 = np.dtype("float64")

    In [8]: %timeit dtype2 == cmptype
    23.5 ns ± 0.175 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

    In [9]: %timeit dtype2 == "float64"
    76.6 ns ± 0.556 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

    In [10]: %timeit np.dtype("float64") == "float64"
    227 ns ± 0.684 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

and

    In [37]: xg = XGate()

    In [38]: %timeit np.asarray(xg)
    1.73 µs ± 13 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

    In [39]: %timeit xg.to_matrix()
    988 ns ± 6.61 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) |
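For anyone wanting to repeat the dtype comparison outside IPython, here is a small standalone sketch using the standard-library `timeit` module. The `FLOAT64` name and the iteration count are arbitrary choices for illustration, and absolute timings will vary by machine and NumPy version. All three expressions evaluate equal; they differ only in how much work is needed to coerce the right-hand side.

```python
import timeit

import numpy as np

v = np.array([1.0, 2.0, 3.0])
FLOAT64 = np.dtype("float64")  # built once, reused for every comparison

candidates = {
    'v.dtype == "float64"': lambda: v.dtype == "float64",
    'v.dtype == np.dtype("float64")': lambda: v.dtype == np.dtype("float64"),
    "v.dtype == FLOAT64": lambda: v.dtype == FLOAT64,
}

N = 100_000
results = {}
for label, func in candidates.items():
    results[label] = func()  # record the comparison result itself
    seconds = timeit.timeit(func, number=N)
    print(f"{label:34s} {seconds / N * 1e9:8.1f} ns per call")
```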
Here are three sort of separate points
I think we may be able to avoid calling that on user-defined gates. But in any case, I don't think supporting this for efficiency is important. |
I'm thinking of something like this:

    import numpy
    from qiskit.circuit.exceptions import CircuitError
    from qiskit.circuit.library import XGate, YGate, CXGate, CZGate, CCXGate
    from qiskit.circuit.gate import Gate

    def setmatrix(matrix_def):
        # Store as complex128 and make the array read-only.
        matrix = numpy.array(matrix_def, dtype=numpy.complex128)
        matrix.setflags(write=False)
        return matrix

    def _to_matrix(gate):
        if hasattr(gate, "_ARRAY"):
            return gate._ARRAY
        if hasattr(gate, "_ARRAY_0"):
            return gate._ARRAY_1 if gate.ctrl_state == 1 else gate._ARRAY_0
        if hasattr(gate, "_ARRAY_3") and gate.ctrl_state == 3:
            return gate._ARRAY_3
        if hasattr(gate, "__array__"):
            return gate.__array__(dtype=complex)
        raise CircuitError(f"to_matrix not defined for this {type(gate)}")

    Gate.to_matrix = _to_matrix

Used like this

    XGate._ARRAY = setmatrix([[0, 1], [1, 0]])
    CXGate._ARRAY_1 = setmatrix([[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]])
    CXGate._ARRAY_0 = setmatrix([[0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]])
    array3 = numpy.eye(8)
    array3[[3, 7], :] = array3[[7, 3], :]
    CCXGate._ARRAY_3 = setmatrix(array3) |
On the dtype comparisons, I was mistaken because I thought |
Instead of defining the array functions manually for each class, this adds a small amount of metaprogramming that adds them in with the correct `ndarray` properties set, including for controlled gates.
Decorators added in 02f5623. |
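As a rough illustration of the kind of metaprogramming this commit message describes, here is a NumPy-only sketch of two class decorators. The names `with_static_array` and `with_static_controlled_array`, the `_ARRAY` attribute, and the demo classes are all hypothetical; this is not the code added in the PR. The controlled variant assumes a single control on the least-significant qubit, and it stores both the closed- and open-controlled matrices.

```python
import numpy as np


def _freeze(matrix):
    """Return a read-only complex128 copy of ``matrix``."""
    array = np.array(matrix, dtype=np.complex128)
    array.setflags(write=False)
    return array


def _return_or_cast(array, dtype):
    # Return the shared instance unless a different dtype is requested.
    if dtype is None or np.dtype(dtype) == array.dtype:
        return array
    return np.asarray(array, dtype=dtype)


def with_static_array(matrix):
    """Hypothetical decorator: attach one frozen matrix to a gate class."""
    array = _freeze(matrix)

    def array_method(self, dtype=None, copy=None):
        return _return_or_cast(array, dtype)

    def decorator(cls):
        cls._ARRAY = array
        cls.__array__ = array_method
        return cls

    return decorator


def with_static_controlled_array(base):
    """Hypothetical decorator for a singly controlled gate (control = qubit 0)."""
    base = np.asarray(base, dtype=np.complex128)
    identity = np.eye(base.shape[0])
    projectors = [np.diag([1, 0]), np.diag([0, 1])]  # |0><0| and |1><1|
    # Index 0 holds the open-controlled matrix, index 1 the closed-controlled one.
    arrays = [
        _freeze(np.kron(base, projectors[s]) + np.kron(identity, projectors[1 - s]))
        for s in (0, 1)
    ]

    def array_method(self, dtype=None, copy=None):
        return _return_or_cast(arrays[self.ctrl_state], dtype)

    def decorator(cls):
        cls.__array__ = array_method
        return cls

    return decorator


@with_static_array([[0, 1], [1, 0]])
class DemoX:
    pass


@with_static_controlled_array([[0, 1], [1, 0]])
class DemoCX:
    def __init__(self, ctrl_state=1):
        self.ctrl_state = ctrl_state


print(np.asarray(DemoCX(ctrl_state=1)).real)
print(np.asarray(DemoCX(ctrl_state=0)).real)
```

Keeping the freezing and dtype handling inside the decorators means each gate class only has to state its matrix once.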
I don't know if the decorators improve legibility. But they do what I was mostly concerned with, which is to factor out repeated code. It's certainly more concise now. |
Not to users, no, I put them in a private module. We can always export them later if we think they're API surface we'd want to support. |
Good. It would be nice to see how this works for a while without running the risk of the labor involved in deprecation. |
Looks good.
* Remove spurious `SXGate._ARRAY`

  This was left in only as a mistake during the writing of #10296; originally the `with_gate_array` decorator didn't exist and all classes had manual `_ARRAY` specifiers, but `SXGate`'s got left in accidentally when all the rest were removed in favour of the decorator.

* Remove unused import
Details and comments
I ran some bits of the benchmark suite, not the full thing. One particularly notable speedup was in `CommutationAnalysis`, where I'm seeing ~15% improvements in the benchmarks.

I'd expect to see some improvements in bits of unitary synthesis and the 1q and 2q conversions into unitaries too for real workloads, though I didn't immediately see them in my quick runs - the benchmark suite might not use too many of the gates that see the speedup, though, or I might have missed those benchmarks.