Use singleton matrices for unparametrised standard gates #10296

jakelishman · 2023-06-15T19:13:56Z

Summary

This makes the array form of standard gates with zero parameters singleton class attributes that reject modification. The class-level __array__ methods are updated to return exactly the same instance, except in very unusual circumstances, which means that Gate.to_matrix() and numpy.asarray() calls on the objects will return the same instance. This avoids a decent amount of construction time, and avoids several Python-space list allocations and array allocations.

The dtypes of the static arrays are all standardised to be complex128. Gate matrices are in general unitary, Gate.to_matrix() already enforces a cast to complex128. For gates that allowed their dtypes to be inferred, there were several cases where native ints and floats would be used, meaning that Gate.to_matrix() would also involve an extra matrix allocation to hold the cast, which just wasted time.

For standard controlled gates, we store both the closed- and open-controlled matrices singly controlled gates. For gates with more than one control, we only store the "all ones" controlled case, as a memory/speed trade-off; open controls are much less common than closed controls.

For the most part this won't have an effect on peak memory usage, since all the allocated matrices in standard Qiskit usage would be freed by the garbage collector almost immediately. This will, however, reduce construction costs and garbage-collector pressure, since fewer allocations+frees will occur, and no calculations will need to be done.

Details and comments

I ran some bits of the benchmark suite, not the full thing. One particularly notable speedup was in CommutationAnalysis, where I'm seeing ~15% improvements in the benchmarks:

       before           after         ratio
     [81964e64]       [339f4dca]
     <singleton-matrices~1>       <singleton-matrices>
-      1.86±0.02s       1.65±0.01s     0.89  passes.PassBenchmarks.time_commutation_analysis(20, 1024)
-        563±10ms         488±20ms     0.87  passes.PassBenchmarks.time_commutation_analysis(5, 1024)
-       1.33±0.1s       1.12±0.01s     0.84  passes.PassBenchmarks.time_commutation_analysis(14, 1024)

I'd expect to see some improvements bits of unitary synthesis and the 1q and 2q conversions into unitaries too for real workloads, though I didn't immediately see them in my quick runs - the benchmark suite might not use too many of the gates that see the speedup, though, or I might have missed those benchmarks.

This makes the array form of standard gates with zero parameters singleton class attributes that reject modification. The class-level `__array__` methods are updated to return exactly the same instance, except in very unusual circumstances, which means that `Gate.to_matrix()` and `numpy.asarray()` calls on the objects will return the same instance. This avoids a decent amount of construction time, and avoids several Python-space list allocations and array allocations. The dtypes of the static arrays are all standardised to by complex128. Gate matrices are in general unitary, `Gate.to_matrix()` already enforces a cast to `complex128`. For gates that allowed their dtypes to be inferred, there were several cases where native ints and floats would be used, meaning that `Gate.to_matrix()` would also involve an extra matrix allocation to hold the cast, which just wasted time. For standard controlled gates, we store both the closed- and open-controlled matrices singly controlled gates. For gates with more than one control, we only store the "all ones" controlled case, as a memory/speed trade-off; open controls are much less common than closed controls. For the most part this won't have an effect on peak memory usage, since all the allocated matrices in standard Qiskit usage would be freed by the garbage collector almost immediately. This will, however, reduce construction costs and garbage-collector pressure, since fewer allocations+frees will occur, and no calculations will need to be done.

qiskit-bot · 2023-06-15T19:14:04Z

One or more of the the following people are requested to review this:

@Cryoris
@Qiskit/terra-core
@ajavadia

jakelishman · 2023-06-15T19:27:25Z

Oh wait, lol - I wrote the commit message before I actually finished making the code do what I described with regard to the larger controlled gates. I'll push up that change.

coveralls · 2023-06-15T19:43:53Z

Pull Request Test Coverage Report for Build 5581753665

82 of 84 (97.62%) changed or added relevant lines in 13 files are covered.
4 unchanged lines in 2 files lost coverage.
Overall coverage increased (+0.04%) to 86.062%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
qiskit/circuit/_utils.py	30	32	93.75%

Files with Coverage Reduction	New Missed Lines	%
crates/qasm2/src/lex.rs	2	91.65%
qiskit/extensions/quantum_initializer/squ.py	2	80.0%

Totals
Change from base Build 5580403198:	0.04%
Covered Lines:	72297
Relevant Lines:	84006

💛 - Coveralls

jlapeyre · 2023-06-16T18:56:35Z

This is a good change, not just for optimization, but for organization and standardization as well.

We could add some performance and safety by introducing something like this:

_array = np.array([[1, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]], dtype=np.complex128)

# Don't use this, just for comparison
def _as_array_plain(_array, dtype=None):
    return np.asarray(_array, dtype=dtype)

# Use this if the unitary has non-zero imaginary part
# I chose TypeError even though the type is not part of the standard Python type apparatus.
def _as_array_complex(_array, dtype=None):
    if dtype is None or dtype == 'complex128':
        return _array
    if dtype == 'float64':
        raise TypeError("Can't convert unitary with non-zero imaginary part to float")
    return np.asarray(_array, dtype=dtype)

# Use this if the unitary is real, although stored as complex128
def _as_array_float(_array, dtype=None):
    if dtype is None or dtype == 'complex128':
        return _array
    if dtype == 'float64':
        return _array.real
    return np.asarray(_array, dtype=dtype)

Here are some times:

In [1]: %timeit _as_array_plain(_array)
102 ns ± 0.906 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [2]: %timeit _as_array_float(_array)
42.2 ns ± 0.362 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [3]: %timeit _as_array_plain(_array, dtype='float64')
/home/lapeyre/programming_play/playpython/numpy/array_convert.py:27: ComplexWarning: Casting complex values to real discards the imaginary part
  return np.asarray(_array, dtype=dtype)
1.11 µs ± 9.25 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4]: %timeit _as_array_float(_array, dtype='float64')
136 ns ± 0.59 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

For example the X gate would call as_array_float and the Y gate would call _as_array_complex

jakelishman · 2023-06-16T19:30:44Z

That's interesting with the timings. I don't think there's a need to optimise a potential returned float matrix; a prospective caller can easily do call .real, since they'd need to know that the matrix was real for it to make sense anyway, and the logical return value of "unitary matrix" is complex only.

jlapeyre · 2023-06-16T20:38:36Z

I understand that it makes sense for the caller to get the complex matrix and call real themselves. But we are offering them an interface to use dtype. A user has to investigate or otherwise know that real will be more efficient. One essential part of why I suggested this is that we know which are real and which are not. So we can make it more efficient and safer at the same time. The user might make a mistake.

numpy made a design choice to warn and return the real part instead of error when you do asarray(v, dtype='float64') for complex v. I don't think we need to follow that choice here. These are higher level objects that have explicit, static, representations.

jakelishman · 2023-07-17T15:38:38Z

Personally I think Gate.__array__ should universally be typed as complex; unitary matrices are complex in general, and a user should be aware that they'll need to handle the conversion. I'm not sure it's worth maintaining additional code in Qiskit to accommodate the very edge case that somebody might want a gate matrix in a non-standard dtype, when it's a basic conversion that a user can easily do themselves.

Fwiw, evaluating dtype == "float64" will miss things. The correct comparison would be dtype == np.dtype("float64"), which is a call that takes longer than the existing np.asarray call.

The call that matters for timing within Qiskit is Gate.to_matrix, which does a general hasattr and then explicitly must pass dtype=np.complex128 to handle user-defined custom gates whose __array__ methods might naturally return non-complex types. That means we'll always need to be concerned about the overhead of dtype checking, which wipes out the performance benefit of _as_array_float.

jlapeyre · 2023-07-17T21:14:37Z

dtype == np.dtype("float64"), which is a call that takes longer than the existing np.asarray call.

I must be missing something. I see this

In [2]: v = np.array([1.,2.,3.])

In [3]: %timeit v.dtype == "float64"
94.6 ns ± 0.778 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [4]: %timeit v.dtype == np.dtype("float64")
205 ns ± 0.895 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [5]: cmptype = np.dtype("float64")

In [6]: %timeit v.dtype == cmptype
37.3 ns ± 0.205 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [7]: dtype2 = np.dtype("float64")

In [8]: %timeit dtype2 == cmptype
23.5 ns ± 0.175 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [9]: %timeit dtype2 == "float64"
76.6 ns ± 0.556 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [10]: %timeit np.dtype("float64") == "float64"
227 ns ± 0.684 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

and

In [37]: xg = XGate()

In [38]: %timeit np.asarray(xg)
1.73 µs ± 13 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [39]: %timeit xg.to_matrix()
988 ns ± 6.61 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

jlapeyre · 2023-07-17T22:01:58Z

Here are three sort of separate points

My preference would be that there be a standard way to convert a Gate to a numerical matrix that returns np.complex128 and does not support anything else. For example does not take an argument dtype, or raises an error if one is passed. That gives us what we need and doesn't supply extra API to support. (There is always some overlooked workflow in the wild, that hopefully is brought to our attention.)
It's clear that returning a precomputed array would be a great performance boost. EDIT: I missed _ARRAY.setflags(write=False), which answers the following concern. One potential problem is that a user may modify the data. I would expect that some user has done this even though its a bad idea just because it's easy. We'll of course say that modifying the array is not supported. But it would remain a source of bugs. I expect not too common. But, I've seen this bug in other contexts: ie an array whose data is meant to be immutable, but some caller eventually modifies it. I support just trying it anyway.
There is probably a way to factor the code in this PR. To me it seems like the obvious thing to do. But I suppose one could argue that it's preferable to introducing any structure. I think a small amount of structure is worth the trouble (I'm thinking more of a function or two rather than classes). I imagine that the implementation may be changed. Eg, how many control states do we support and how do we do it. This would be easier if it were done in one place.

wipes out the performance benefit of _as_array_float

I think we may be able to avoid calling that on user-defined gates. But in any case, I don't think supporting this for efficiency is important.

jlapeyre · 2023-07-17T22:43:50Z

I'm thinking of something like this:

import numpy
from qiskit.circuit.library import XGate, YGate, CXGate, CZGate, CCXGate
from qiskit.circuit.gate import Gate

def setmatrix(matrix_def):
    matrix = numpy.array(matrix_def, dtype=numpy.complex128)
    matrix.setflags(write=False)
    return matrix

def _to_matrix(gate):
    if hasattr(gate, "_ARRAY"):
        return gate._ARRAY
    if hasattr(gate, "_ARRAY_0"):
        return gate._ARRAY_1 if gate.ctrl_state == 1 else gate._ARRAY_0
    if hasattr(gate, "_ARRAY_3") and gate.ctrl_state == 3:
        return gate._ARRAY_3
    if hasattr(gate, "__array__"):
        return gate.__array__(dtype=complex)
    raise CircuitError(f"to_matrix not defined for this {type(gate)}")

Gate.to_matrix = _to_matrix

Used like this

XGate._ARRAY = setmatrix([[1, 0], [0, 1]])

CXGate._ARRAY_1 = setmatrix([[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]])
CXGate._ARRAY_0 = setmatrix([[0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]])

array3 = numpy.eye(8)
array3[[3, 7], :] = array3[[7, 3], :]
CCXGate._ARRAY_3 = setmatrix(array3)

jakelishman · 2023-07-17T22:49:33Z

We have that: it's Gate.to_matrix. It's beyond the scope of this PR to change how that works by calling Gate.__array__, because that would change subclass interactions.
We're agreed here.
I can add a little bit of metaprogrammatic generation of the __array__ method and class variable if you think it'll improve legibility a lot - I'd do it as two decorators _add_array_method(base_array: np.ndarray) or _add_controlled_array_method(base_gate_array: np.ndarray, precomputed_control_states: Iterable[int]).

On the dtype comparisons, I was mistaken because I thought np.float64 was an instance of np.dtype, but it isn't. I was seeing that np.float64 != "float64", but that's not actually a dtype comparison like I thought it was.

Instead of defining the array functions manually for each class, this adds a small amount of metaprogramming that adds them in with the correct `ndarray` properties set, including for controlled gates.

jakelishman · 2023-07-17T23:53:09Z

Decorators added in 02f5623.

jlapeyre · 2023-07-18T03:59:56Z

I don't know if the decorators improve legibility. But they do what I was mostly concerned with, which is to factor out repeated code. It's certainly more concise now.
It looks like you want to make the decorators available to users, right?

jakelishman · 2023-07-18T08:32:59Z

Not to users, no, I put them in a private module. We can always export them later if we think they're API surface we'd want to support.

jlapeyre · 2023-07-18T14:50:51Z

Not to users, no, I put them in a private module

Good. It would be nice to see how this works for a while without running the risk of the labor involved in deprecation.

jlapeyre

Looks good.

This was left in only as a mistake during the writing of Qiskit#10296; originally the `with_gate_array` decorator didn't exist and all classes had manual `_ARRAY` specifiers, but `SXGate`'s got left in accidentally when all the rest were removed in favour of the decorator.

* Remove spurious `SXGate._ARRAY` This was left in only as a mistake during the writing of #10296; originally the `with_gate_array` decorator didn't exist and all classes had manual `_ARRAY` specifiers, but `SXGate`'s got left in accidentally when all the rest were removed in favour of the decorator. * Remove unused import

jakelishman added performance Changelog: None Do not include in changelog labels Jun 15, 2023

jakelishman requested a review from a team as a code owner June 15, 2023 19:13

jakelishman added 2 commits June 15, 2023 21:11

Store only all-ones controls for large matrices

659b68e

Fix lint

ac4fcb3

jakelishman added this to the 0.25.0 milestone Jun 16, 2023

jlapeyre self-assigned this Jun 16, 2023

jakelishman added 2 commits July 18, 2023 00:50

Use metaprogramming decorator to make __array__ methods

02f5623

Instead of defining the array functions manually for each class, this adds a small amount of metaprogramming that adds them in with the correct `ndarray` properties set, including for controlled gates.

Merge remote-tracking branch 'ibm/main' into singleton-matrices

858c3f9

jlapeyre approved these changes Jul 18, 2023

View reviewed changes

jlapeyre added this pull request to the merge queue Jul 18, 2023

Merged via the queue into Qiskit:main with commit d62f780 Jul 18, 2023

jakelishman deleted the singleton-matrices branch July 19, 2023 21:19

jakelishman mentioned this pull request Sep 21, 2023

Remove spurious SXGate._ARRAY #10876

Merged

jakelishman mentioned this pull request Jul 9, 2024

Fix C3SXGate to_matrix method #12742

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use singleton matrices for unparametrised standard gates #10296

Use singleton matrices for unparametrised standard gates #10296

jakelishman commented Jun 15, 2023 •

edited by jlapeyre

Loading

qiskit-bot commented Jun 15, 2023

jakelishman commented Jun 15, 2023

coveralls commented Jun 15, 2023 •

edited

Loading

jlapeyre commented Jun 16, 2023 •

edited

Loading

jakelishman commented Jun 16, 2023

jlapeyre commented Jun 16, 2023

jakelishman commented Jul 17, 2023

jlapeyre commented Jul 17, 2023

jlapeyre commented Jul 17, 2023 •

edited

Loading

jlapeyre commented Jul 17, 2023

jakelishman commented Jul 17, 2023 •

edited

Loading

jakelishman commented Jul 17, 2023

jlapeyre commented Jul 18, 2023 •

edited

Loading

jakelishman commented Jul 18, 2023

jlapeyre commented Jul 18, 2023

jlapeyre left a comment

Use singleton matrices for unparametrised standard gates #10296

Use singleton matrices for unparametrised standard gates #10296

Conversation

jakelishman commented Jun 15, 2023 • edited by jlapeyre Loading

Summary

Details and comments

qiskit-bot commented Jun 15, 2023

jakelishman commented Jun 15, 2023

coveralls commented Jun 15, 2023 • edited Loading

Pull Request Test Coverage Report for Build 5581753665

💛 - Coveralls

jlapeyre commented Jun 16, 2023 • edited Loading

jakelishman commented Jun 16, 2023

jlapeyre commented Jun 16, 2023

jakelishman commented Jul 17, 2023

jlapeyre commented Jul 17, 2023

jlapeyre commented Jul 17, 2023 • edited Loading

jlapeyre commented Jul 17, 2023

jakelishman commented Jul 17, 2023 • edited Loading

jakelishman commented Jul 17, 2023

jlapeyre commented Jul 18, 2023 • edited Loading

jakelishman commented Jul 18, 2023

jlapeyre commented Jul 18, 2023

jlapeyre left a comment

Choose a reason for hiding this comment

jakelishman commented Jun 15, 2023 •

edited by jlapeyre

Loading

coveralls commented Jun 15, 2023 •

edited

Loading

jlapeyre commented Jun 16, 2023 •

edited

Loading

jlapeyre commented Jul 17, 2023 •

edited

Loading

jakelishman commented Jul 17, 2023 •

edited

Loading

jlapeyre commented Jul 18, 2023 •

edited

Loading