Performance regression caused by #7087 #7528

mtreinish · 2022-01-13T20:57:16Z

Environment

Qiskit Terra version: main (after 05b60a5 )
Python version: 3.8
Operating system: linux

What is happening?

The nightly benchmark runs have flagged several run time performance regressions after #7087 merged:

none are huge in absolute time (on the order of ms) and likely won't be noticeable in a larger transpile() call or application but we should try to fix these because #7087 really shouldn't have had any performance impact.

How can we reproduce the issue?

Run any of the transpiler passes identified in the regressions linked

What should happen?

The addition of a new abstract class defining the interface for a circuit operation shouldn't cause a noticeable performance regression

Any suggestions?

Identify where the passes are spending more time after the addition of the Operation class and fix the bottleneck.

The text was updated successfully, but these errors were encountered:

alexanderivrii · 2022-01-17T12:41:22Z

After a bit of experimenting with the first problematic testcase (passes.MultiQBlockPassBenchmarks.time_collect_multiq_block), we see the following (we = Shelly, Eli and myself):

The slowdown seems to be due to inheriting Operation from ABC: class Operation(ABC): ...; changing this to class Operation: ... removes the slowdown.
The above is due to many __instancecheck__ calls within ABC (and for some reason these are not free), which is in turn due to many isinstance calls in Qiskit.
In the current testcase most of the isinstance calls come from the function collect_key in qiskit/transpiler/passes/optimization/collect_multiqubit_blocks.py, where the function above is used in dag.topological_op_nodes(key=collect_key). We saw on average 2 calls to isinstance per node in the dag.

What would be the best way to proceed? Should we get rid of dependence on ABC? Should we try to optimize the number of times isinstance is called in this transpiler pass? Or maybe the slowdown is acceptable?

jakelishman · 2022-01-17T12:53:09Z

The __instancecheck__ mechanism of ABCMeta is to allow a registration method for classes that don't actually inherit from the defined type, like the generic types in typing (dict appears to be an subclass of typing.MutableMapping, for example, even though it doesn't actually inherit from it). This is why isinstance is slowed on ABCs than other things.

We can't give up isinstance; it's the correct way to test the types of objects when they may be inherited. It's also the reason we put in this class at all, to ensure that QuantumCircuit has an easy way to check that an object is of the right type - it's just got to inherit from Operation.

Perhaps we could define a lightweight metaclass that inherits from ABCMeta but rebinds __instancecheck__ and __subclasscheck__ to the relevant unbound method from type. That should remove the overhead back down to pretty much exactly what it was before (technically it'll be marginally slower because there will be an extra MRO entry in most classes, but it shouldn't be noticeable).

eliarbel · 2022-01-17T13:02:52Z

For topological ordering, can we eventually get rid of using isinstance to derive the ordering labels by relying on proper polymorphism instead ? i.e. if all circuit elements derive from Operation, we can think of a polymorphic method that returns some ID info which can be leveraged by many algorithms. Though run-time type resolution is very common in Python for functional purposes (e.g. not just input validation), I think it should be avoided whenever possible

mtreinish · 2022-01-17T13:29:21Z

For topological sort we shouldn't be relying on isinstance anywhere. For the lexicographical topological sort we do the tie breaking string sort key is str(qargs) and the rest of the sort should be isolated to retworkx internals. The place where we use isinstance more is filtering over the sorted nodes to get only gates, which we need to do to find the ordered subset of operations in the circuit which are gates. We could add a slotted type or gate boolean attribute to do the same which probably would be faster than isinstance with an abcmeta.

jakelishman · 2022-01-17T13:32:24Z

The sort key here is specific to CollectMultiQBlocks; it's not the default sort key for DAGCircuit.topological_sort, which does use object polymorphism (sort of) by accessing x.sort_key.

edit: Matthew just said something pretty similar haha.

I don't see why we should have a gate Boolean instead of a Gate interface. That's kind of the point of #7087 in the first place - having those features be a defined interface means you can attach extra functionality, and you get static type-checking that the operations you want to perform are well defined. It's also something we explicitly removed from DAGNode.

eliarbel · 2022-01-17T14:07:24Z

So it seems we should live ABC as a base class for Operation and treat the performance degradation either in collect_key. @jakelishman what do you mean having a Gate interface instead of a gate Boolean, without using isinstance ?

jakelishman · 2022-01-17T14:21:44Z

I think you've misunderstood me here - I think the best way to treat the performance is to change the base class of Operation to use a metaclass that has the original versions of __instancecheck__ and __subclasscheck__ that avoids the ABC.register overhead checking. I also think the usage of isinstance(x, Gate) is appropriate in this key function; it needs to find out if a particular circuit Operation also implements the more specialised interface Gate, and that's the exact reason for #7087 in the first place.

I'd agree that using run-time type information is bad if you're trying switch on all possible subclass instances to do something different - that's absolutely something polymorphism should be used for - but in this case there is specialist behaviour that is necessary only for one specific subinterface to Operation. If we wanted to use polymorphism over all implementors of Operation here, we'd have to add a dummy method to Operation, simply so Gate can override it, and everything else can ignore it. Anything that wants to be treated as a gate should inherit from Gate to be properly polymorphic, not try to use gate-specific hooks without implementing the rest of the interface.

Gate is actually a concrete class that can be instantiated, but really we treat it as the "unitary circuit operation" interface throughout Qiskit, and we require that all such operations inherit from Gate, so isinstance is the appropriate check to me.

kdk · 2022-01-21T17:17:26Z

I think this needs some more investigation before picking a direction here, especially to determine if we would expect this slow-down to grow as we increase either the number of operations in a circuit, the number of operation-types or something else.

From my best effort reading of https://github.com/python/cpython/blob/f4c03484da59049eb62a9bf7777b963e2267d187/Modules/_abc.c#L586 , it seems like the isinstance calculation should be cached per-type, so I natively wouldn't expect much slow down after the first call. It looks like this cache is at least being populated, but isinstance for a type with an ABC base class is still in my timing 4x slower (~100 ns to ~400 ns) than for a type with a concrete base class.

>>> ABCMeta._dump_registry(Gate)
Class: qiskit.circuit.gate.Gate
Inv. counter: 60
_abc_registry: set()
_abc_cache: {<weakref at 0x13f23fdd0; to 'ABCMeta' at 0x7feff5cf5be0 (XGate)>}
_abc_negative_cache: set()
_abc_negative_cache_version: 60

It sounds like there was some investigation into the origin of the regression for CollectMultiQBlocks, but it's less clear to me where the regressions around the BasisTranslator are originating. That pass doesn't contain any obvious isinstance calls, and none of the DAGCircuit methods it uses (.op_nodes, and .substitute_ndoe_with_dag) do either. My best guess would be that this is coming from its use of the circuit_to_dag and dag_to_circuit converters and their use of QuantumCircuit._append, but if that were the case, I would've expected this commit to show up as a regression in some of our circuit building benchmarks, but those are all noticeably flat, even with 100k+ gates:

https://qiskit.github.io/qiskit/#circuit_construction.CircuitConstructionBench.time_circuit_construction?commits=05b60a5e
https://qiskit.github.io/qiskit/#circuit_construction.CircuitConstructionBench.time_circuit_copy?commits=05b60a5e
https://qiskit.github.io/qiskit/#converters.ConverterBenchmarks.time_dag_to_circuit?commits=05b60a5e
https://qiskit.github.io/qiskit/#converters.ConverterBenchmarks.time_circuit_to_dag?commits=05b60a5e
https://qiskit.github.io/qiskit/#ripple_adder.RippleAdderConstruction.time_build_ripple_adder?commits=05b60a5e
https://qiskit.github.io/qiskit/#ripple_adder.RippleAdderTranspile.time_transpile_square_grid_ripple_adder?commits=05b60a5e

@jakelishman 's suggestion seems like a reasonable way to solve the regression, but I am a bit hesitant to immediately jump to making an Operation a non-standard ABC without a better understanding of the origin of the problem, and it's likelihood to increase in magnitude . If we do go forward with that approach, we should make sure that when someone someday tries Operation.register(MyCustomOp) they see something like raise NotImplemenetedError("see GH-7528").

Possibly helpful reference: https://stackoverflow.com/questions/42378726/why-is-checking-isinstancesomething-mapping-so-slow

alexanderivrii · 2022-02-04T08:56:27Z

After investigating this a bit more (using cProfile):

As discussed previously, isinstance slows things down when Operation inherits from ABC
Additionally, it seems that returning fields directly, e.g.

class Instruction:
    def __init__(self):
        ...
        self.name = name

is faster than calling (polymorphic) functions:

    @property
    def name(self):
        """Return the name."""
        return self._name

and in the PR we have indeed changed name, num_qubits, and num_clbits to be properties. I believe this is what's responsible for basis_translator and assembler benchmarks.
(Note: if it would help, I could upload my profiling results somewhere).

Here are some numbers on a toy example, with classes

class Base(ABC): ...
class B(Base): ...
class C(B): ...

isinstance(elem, C) with Base not inheriting from ABC: 0.49
isinstance(elem, C) with Base inheriting from ABC: 1.17
inheriting from ABC, but using a function to return the "type": 0.57 (when returning int) and 0.51 (when returning bool)
inheriting from ABC, but using a field self._type: 0.33 when int, and 0.26 when bool

So, working with fields directly (especially bool ones) seems fastest (and does not depend on whether the base class inherits from ABC or not). Specifically, for CollectMultiQBlocks this would mean keeping a field in DagNode representing whether it's a DagInNode or DagOutNode, but (if I understand correctly) this is very closely related to what we are trying to deprecate.

I have tried to implement @jakelishman 's suggestion as follows (and probably did not do it fully correctly):

class ABCMeta_Hack(ABCMeta):

    def __instancecheck__(self, instance):
        return type.__instancecheck__(self, instance)

    def __subclasscheck__(self, subclass):
        return type.__subclasscheck__(self, subclass)


class ABC_Hack(metaclass=ABCMeta_Hack):
    __slots__ = ()

class Base(ABC_Hack): ...

but on the toy example it did not improve things compared to inheriting from ABC. If this wold help, here is the profiling data for Base not inheriting from anything, vs. Base inheriting from ABC_Hack vs Base inheriting from ABC:

 NO INHERITANCE:

  5000002 function calls in 1.363 seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.830    0.830    1.363    1.363 test_code\test_isinstance.py:126(experiment_isinst)
  5000000    0.533    0.000    0.533    0.000 {built-in method builtins.isinstance}

ABC_HACK:

  11669922 function calls in 3.176 seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.800    0.800    3.176    3.176 test_isinstance.py:126(experiment_isinst)
  5000000    1.094    0.000    2.376    0.000 {built-in method builtins.isinstance}
  3334960    0.864    0.000    1.282    0.000 test_code\test_isinstance.py:12(__instancecheck__)
  3334960    0.418    0.000    0.418    0.000 {function ABCMeta_Hack.__instancecheck__ at 0x0199BB68}

ABC:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.776    0.776    3.156    3.156 test_code\test_isinstance.py:126(experiment_isinst)
  5000000    1.132    0.000    2.380    0.000 {built-in method builtins.isinstance}
  3334960    0.663    0.000    1.248    0.000 Python\Python39-32\lib\abc.py:96(__instancecheck__)
  3334960    0.585    0.000    0.585    0.000 {built-in method _abc._abc_instancecheck}
        2    0.000    0.000    0.000    0.000 Python39-32\lib\abc.py:100(__subclasscheck__)
        2    0.000    0.000    0.000    0.000 {built-in method _abc._abc_subclasscheck}

kdk · 2022-02-15T15:48:45Z

Thanks for looking into this @alexanderivrii .

PR we have indeed changed name, num_qubits, and num_clbits to be properties. I believe this is what's responsible for basis_translator and assembler benchmarks.

Interesting, this seems like a likely culprit. It might be interesting to see if this can be mitigated by using class-level attributes, e.g.

class Clifford(Operation):
    name = 'clifford'

    def __init__(...):
        ....

This should satisfy the ABC requirements and be faster, and be more common than the case where these properties need to be dynamic (and thus @properties). (See also: #6883 )

(Note: if it would help, I could upload my profiling results somewhere).

I believe you should be able to attach files in the Github UI (by clicking the lower bar or drag and drop), but failing that, flamegraphs or cProfile outputs are still helpful.

kdk · 2022-03-29T15:20:11Z

For 0.20, since the circuit and transpiler are still using isinstance(..., Instruction), we should remove Operation as a base class for any of the instructions in the circuit library and defer the release note until we can investigate further.

mtreinish added the bug Something isn't working label Jan 13, 2022

mtreinish assigned kdk and ShellyGarion Jan 13, 2022

mtreinish added the performance label Jan 13, 2022

mtreinish added this to the 0.20 milestone Jan 13, 2022

ShellyGarion assigned alexanderivrii Jan 16, 2022

kdk mentioned this issue Mar 2, 2022

[WIP] broadcast arguments mixin #7718

Closed

kdk assigned eliarbel and unassigned kdk, ShellyGarion and alexanderivrii Mar 29, 2022

eliarbel mentioned this issue Mar 30, 2022

Temporarily remove Operation class usage for 0.20 #7840

Merged

jakelishman linked a pull request Mar 30, 2022 that will close this issue

Temporarily remove Operation class usage for 0.20 #7840

Merged

mergify bot closed this as completed in #7840 Mar 30, 2022

alexanderivrii mentioned this issue Apr 26, 2022

Adding Cliffords to QuantumCircuits as Operations #7966

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance regression caused by #7087 #7528

Performance regression caused by #7087 #7528

mtreinish commented Jan 13, 2022 •

edited

Loading

alexanderivrii commented Jan 17, 2022

jakelishman commented Jan 17, 2022 •

edited

Loading

eliarbel commented Jan 17, 2022

mtreinish commented Jan 17, 2022

jakelishman commented Jan 17, 2022 •

edited

Loading

eliarbel commented Jan 17, 2022

jakelishman commented Jan 17, 2022

kdk commented Jan 21, 2022

alexanderivrii commented Feb 4, 2022 •

edited

Loading

kdk commented Feb 15, 2022

kdk commented Mar 29, 2022

Performance regression caused by #7087 #7528

Performance regression caused by #7087 #7528

Comments

mtreinish commented Jan 13, 2022 • edited Loading

Environment

What is happening?

How can we reproduce the issue?

What should happen?

Any suggestions?

alexanderivrii commented Jan 17, 2022

jakelishman commented Jan 17, 2022 • edited Loading

eliarbel commented Jan 17, 2022

mtreinish commented Jan 17, 2022

jakelishman commented Jan 17, 2022 • edited Loading

eliarbel commented Jan 17, 2022

jakelishman commented Jan 17, 2022

kdk commented Jan 21, 2022

alexanderivrii commented Feb 4, 2022 • edited Loading

kdk commented Feb 15, 2022

kdk commented Mar 29, 2022

mtreinish commented Jan 13, 2022 •

edited

Loading

jakelishman commented Jan 17, 2022 •

edited

Loading

jakelishman commented Jan 17, 2022 •

edited

Loading

alexanderivrii commented Feb 4, 2022 •

edited

Loading