Performance regression caused by dagnode op property #6493

mtreinish · 2021-06-01T13:50:48Z

Information

Qiskit Terra version: main after 3c979ebd
Python version: any
Operating system: any

What is the current behavior?

After PR #6199 merged, a regression was flagged on our nightly benchmarks for certain transpiler passes where we access a DAG node's op frequently. For example:

https://qiskit.github.io/qiskit/#passes.PassBenchmarks.time_cx_cancellation?machine=dedicated-benchmarking-softlayer-baremetal&os=Linux%204.15.0-46-generic&ram=16GB&p-n_qubits=5&p-depth=1024&commits=3c979ebd

As was pointed out in #6433, this is caused by Python function call overhead from using @property, especially compared to the slotted attribute access that was there before.

Steps to reproduce the problem

Run a transpiler pass or any dagnode operation that frequently uses dagnode.op

What is the expected behavior?

No performance regression.

Suggested solutions

Either revert the change or change our internal dagnode usage to use the _op attribute on access for performance critical code (which is part of what #6433 does for one pass).

enavarro51 · 2021-06-01T16:27:05Z

After a quick check, it looks like there are about 400 instances of some form of node.op in the code. I assume changing those to node._op is preferable to reverting #6199. If the change is preferred, I could take this on.

kdk · 2021-06-01T16:55:15Z

> After a quick check, it looks like there are about 400 instances of some form of node.op in the code. I assume changing those to node._op is preferable to reverting #6199. If the change is preferred, I could take this on.

If possible, I'd prefer to only access node._op in the cases where we know node.op is causing a performance problem. Even then, this might be better fixed by restructuring the interface so access to a private attribute isn't needed, even for performance. (Maybe redefine node.op to be Optional[Instruction] that's not None if and only if node.type == 'op', or separate sub-classes for IODAGNode and OpDagNode. This still isn't great, but is maybe a step in the right direction.)

That said, I'm a little confused. node.op as an @property has existed since DAGNode was introduced in #1815. #6199 added an @property for DAGNode.name.

mtreinish · 2021-06-01T17:11:16Z

That's my fault for a lack of precision in opening the issue. #6199 introduced a regression around DAGNode.name access. The benchmark I linked in the issue is for cx cancellation, which calls DAGNode.name for each node in the dag as part of https://github.com/Qiskit/qiskit-terra/blob/main/qiskit/transpiler/passes/optimization/cx_cancellation.py#L30, which is why we see the regression there in the nightly benchmarks.

In the issue I was conflating that with the changes made in #6433, which were about DAGNode.op and DAGNode.qargs. Those had the same root cause (function call overhead) and were probably bottlenecks we've had sitting around for some time and just never noticed before.

Maybe @IvanIsCoding can comment here about his profiling on the collect_2q_blocks pass to show the overhead from the function call.

I agree with @kdk that we probably only want to do that in performance critical parts, but not generally. For example, in the circuit drawers (assuming it's used there, which I think it is) it wouldn't make sense to change anything.

enavarro51 · 2021-06-01T17:51:27Z

So in #6199, we deprecated passing a name when instantiating a DAGNode. We then added a property for accessing the name that points to _op.name. It looks like there are fewer than 100 uses of node.name, mostly in the passes. How about we change those node.name accesses to node._op.name, since that's where it points anyway?
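As a rough illustration of the two access paths being discussed, here is a minimal sketch. `FakeOp` and `SketchNode` are hypothetical stand-ins written for this example; they are not the real Qiskit classes and only assume that `name` is a property forwarding to `_op.name`.

```python
class FakeOp:
    """Hypothetical stand-in for an instruction; not a Qiskit class."""

    __slots__ = ("name",)

    def __init__(self, name):
        self.name = name


class SketchNode:
    """Hypothetical stand-in for a DAG node; not the real DAGNode."""

    __slots__ = ("type", "_op")

    def __init__(self, type, op=None):
        self.type = type
        self._op = op

    @property
    def name(self):
        # Every access goes through a Python-level function call.
        return self._op.name if self._op is not None else None


def count_cx_via_property(nodes):
    # Public path: one property call per node.
    return sum(1 for node in nodes if node.name == "cx")


def count_cx_via_private(nodes):
    # Private path: skips the property call. Only valid when every node
    # is known to carry an op, which is an assumption of this sketch.
    return sum(1 for node in nodes if node._op.name == "cx")


nodes = [SketchNode("op", FakeOp("cx" if i % 2 else "h")) for i in range(10)]
```

Both functions return the same count; the only difference is whether each iteration pays for the property call.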

IvanIsCoding · 2021-06-01T19:59:46Z

> That's my fault for a lack of precision in opening the issue. #6199 introduced a regression around DAGNode.name access, if you look at the benchmark I linked in the issue it's for cx cancellation which will be calling DAGNode.name for each node in the dag as part of: https://github.com/Qiskit/qiskit-terra/blob/main/qiskit/transpiler/passes/optimization/cx_cancellation.py#L30 which is why we see the regression there in the nightly benchmarks
>
> In the issue I was conflating that with the changes made in #6433 which were about DAGNode.op and DAGNode.qargs which were the same root cause function call overhead and were probably potential bottlenecks we've had sitting around for some time and just never noticed before.
>
> Maybe @IvanIsCoding can comment here about his profiling on the collect_2q_blocks pass to show the overhead from the function call.
>
> I agree with @kdk that we probably only want to do that in performance critical parts, but not generally. For example, in the circuit drawers (assuming it's used there, which I think it is) it wouldn't make sense to change anything.

I can jump in and say that in my particular case in #6433, @property is used just for the setter and not for the getter. Hence, node.op and node._op are equivalent. It might not be as straightforward in this case.

The speedup from replacing op with _op and qargs with _qargs in collect_2q_blocks comes from:

- Almost every if/else checked those two properties, often multiple times per if statement.
- _qargs/_op are heavily optimized because of the use of __slots__. That is the fastest attribute lookup you can do in Python!
- qargs/op are not optimized because of the @property, so there is a function call, which slows things down.

My advice is to benchmark and see if the gains in the specific case are worth it. In collect_2q_blocks, they were. Counting appearances of node.name is a good first guess, but mind you, the count doesn't account for whether they're in loops that execute many times or in if statements that are never reached.
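One way to follow that advice is to time the actual loop rather than count occurrences. The sketch below assumes a hypothetical `Node` class with a slotted `_qargs` and a `qargs` property; the class, the two counting functions, and `best_time` are made up for illustration and are not taken from collect_2q_blocks.

```python
import timeit


class Node:
    """Hypothetical node with a slotted attribute behind a property."""

    __slots__ = ("_qargs",)

    def __init__(self, qargs):
        self._qargs = qargs

    @property
    def qargs(self):
        return self._qargs


def two_qubit_count_public(nodes):
    # Goes through the property on every iteration.
    total = 0
    for node in nodes:
        if len(node.qargs) == 2:
            total += 1
    return total


def two_qubit_count_private(nodes):
    # Reads the slotted attribute directly.
    total = 0
    for node in nodes:
        if len(node._qargs) == 2:
            total += 1
    return total


# Every third node acts on two qubits.
nodes = [Node((0, 1)) if i % 3 == 0 else Node((0,)) for i in range(3000)]


def best_time(func, number=20, repeat=3):
    """Best-of-`repeat` time for one call of func(nodes), in seconds."""
    timings = timeit.repeat(lambda: func(nodes), number=number, repeat=repeat)
    return min(timings) / number
```

Comparing `best_time(two_qubit_count_public)` with `best_time(two_qubit_count_private)` on a representative input gives a per-pass estimate of the gain, which is the number that decides whether the private access is worth it.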

kdk · 2021-06-02T23:39:31Z

Out of curiosity, I ran the following through timeit:

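class C:
    def __init__(self):
        self.foo = 'bar'

    @property
    def property_foo(self):
        # Plain property: one extra function call per access.
        return self.foo

    @property
    def cond_property_foo(self):
        # Property with an (always false) condition check before returning.
        if not True or 'true' != 'true':
            raise ValueError()
        return self.foo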

| time per call | without slots | with slots |
| --- | --- | --- |
| `c.foo` | 35.4 ns ± 1.5 ns | 26 ns ± 0.197 ns |
| `c.property_foo` | 119 ns ± 3.56 ns | 107 ns ± 2.5 ns |
| `c.cond_property_foo` | 159 ns ± 24 ns | 140 ns ± 2.86 ns |

(N.B. Python 3.6 on OSX 10.15)

At least for this microbenchmark, it seems in general like __slots__ saves 10-15 ns per call, @property costs ~80 ns, and the condition checking another 30-40 ns per call. If these are attributes that are accessed frequently enough in a loop (and if these numbers are representative), this could add up to the observed regression.
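For anyone who wants to rerun a comparison like this, here is a minimal sketch of a harness. The class names `NoSlots` and `WithSlots` and the helper `ns_per_access` are assumptions made for this example; the slotted variant is a guess at what "with slots" meant, since only the unslotted class is shown above. Absolute numbers will differ by machine and Python version.

```python
import timeit


class NoSlots:
    def __init__(self):
        self.foo = "bar"

    @property
    def property_foo(self):
        return self.foo


class WithSlots:
    # Assumed slotted variant: instances have no per-instance __dict__.
    __slots__ = ("foo",)

    def __init__(self):
        self.foo = "bar"

    @property
    def property_foo(self):
        return self.foo


def ns_per_access(obj, attr, number=200_000):
    """Best-of-three time per attribute access, in nanoseconds."""
    stmt = f"obj.{attr}"
    timings = timeit.repeat(stmt, globals={"obj": obj}, number=number, repeat=3)
    return min(timings) / number * 1e9


if __name__ == "__main__":
    for cls in (NoSlots, WithSlots):
        obj = cls()
        for attr in ("foo", "property_foo"):
            print(f"{cls.__name__}.{attr}: {ns_per_access(obj, attr):.1f} ns")
```

Taking the minimum over several repeats reduces noise from other processes, which matters when the differences of interest are tens of nanoseconds.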

That said, my preference here would be to either:

- use the _ attributes only in the cases where we know there is an active performance concern (with a comment explaining why we're using the private attribute), or
- restructure DAGNode (either via subclasses for IO and Op types, or by allowing .name, .op, ... to be None) so that the @property and explicit type check aren't necessary.

That we have a performance concern over convenience code that wraps and re-raises an AttributeError as a QiskitError suggests to me that there's room for improvement in the design. In general, it would be good to avoid building interfaces that are fast only if the consumer knows how to hold them in the right way, when we can make interfaces that are intentionally fast (as fast as python can be) by design.

mtreinish · 2021-06-03T16:54:33Z

Personally I think the subclass approach makes the most sense. That way we can avoid the property altogether. Having the explicit type attribute always seemed a bit weird.
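A minimal sketch of what the subclass approach could look like, assuming plain slotted attributes and an isinstance check in place of the type string. The class names `SketchOpNode`, `SketchInNode`, and `SketchOutNode` are hypothetical and this is not the design that was eventually merged.

```python
from types import SimpleNamespace


class SketchNode:
    """Hypothetical common base; carries no attributes of its own."""

    __slots__ = ()


class SketchOpNode(SketchNode):
    """Operation node: op and qargs are plain slotted attributes."""

    __slots__ = ("op", "qargs")

    def __init__(self, op, qargs=()):
        self.op = op
        self.qargs = qargs


class SketchInNode(SketchNode):
    """Input node: only knows its wire."""

    __slots__ = ("wire",)

    def __init__(self, wire):
        self.wire = wire


class SketchOutNode(SketchNode):
    """Output node: only knows its wire."""

    __slots__ = ("wire",)

    def __init__(self, wire):
        self.wire = wire


def count_named_ops(nodes, name):
    # The isinstance check replaces the explicit type string, and
    # node.op is a direct slot lookup with no property in between.
    return sum(
        1 for node in nodes
        if isinstance(node, SketchOpNode) and node.op.name == name
    )


# SimpleNamespace stands in for an instruction object with a name.
example_nodes = [
    SketchInNode("q0"),
    SketchOpNode(SimpleNamespace(name="cx"), (0, 1)),
    SketchOpNode(SimpleNamespace(name="h"), (0,)),
    SketchOutNode("q0"),
]
```

With this layout, input and output nodes simply have no `op` attribute, so there is nothing to guard with a type check inside a property.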

mtreinish added the bug (Something isn't working) and performance labels Jun 1, 2021

kdk added this to the 0.18 milestone Jun 1, 2021

kdk assigned enavarro51 Jun 10, 2021

enavarro51 mentioned this issue Jun 13, 2021

Replace DAGNode class with OpNode, InNode, and OutNode classes #6567

Merged

kdk modified the milestones: 0.18, 0.19 Jun 15, 2021

kdk closed this as completed in #6567 Aug 3, 2021

This was referenced Sep 17, 2021

Fast vs safe attribute access #7035

Closed

Make internal Layout and CouplingMap attrs slotted and adjust passes for fast access #7036

Merged
