Using multiple comparison operators can cause performance issues #89705

Closed
akuvfx mannequin opened this issue Oct 20, 2021 · 8 comments
Labels
3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments


akuvfx mannequin commented Oct 20, 2021

BPO 45542
Nosy @tim-one, @stevendaprano, @serhiy-storchaka, @brandtbucher, @sweeneyde, @akuvfx
PRs
  • bpo-45542: Improve performance of comparison operators #29109
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2021-10-20.18:25:33.499>
    labels = ['interpreter-core', '3.11', 'performance']
    title = 'Using multiple comparison operators can cause performance issues'
    updated_at = <Date 2022-04-07.18:29:37.622>
    user = 'https://github.com/akuvfx'

    bugs.python.org fields:

    activity = <Date 2022-04-07.18:29:37.622>
    actor = 'brandtbucher'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2021-10-20.18:25:33.499>
    creator = 'akuvfx'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 45542
    keywords = ['patch']
    message_count = 7.0
    messages = ['404508', '404536', '404553', '404576', '416706', '416719', '416735']
    nosy_count = 6.0
    nosy_names = ['tim.peters', 'steven.daprano', 'serhiy.storchaka', 'brandtbucher', 'Dennis Sweeney', 'akuvfx']
    pr_nums = ['29109']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue45542'
    versions = ['Python 3.11']


    For example:

        def f(x):
            return 1 < x < 3

    will be slower than:

        def f(x):
            return 1 < x and x < 3

    The first function will generate following bytecode:

              0 LOAD_CONST               1 (1)
              2 LOAD_FAST                0 (x)
              4 DUP_TOP
              6 ROT_THREE
              8 COMPARE_OP               0 (<)
             10 JUMP_IF_FALSE_OR_POP    18
             12 LOAD_CONST               2 (3)
             14 COMPARE_OP               0 (<)
             16 RETURN_VALUE
        >>   18 ROT_TWO
             20 POP_TOP
             22 RETURN_VALUE
    

    This performs unnecessary stack operations: it duplicates x, rotates three items for no reason, then rotates two items back and pops a value. That would be fine if the middle value were a more complex expression that needed to be duplicated rather than recalculated (recalculating it would definitely be slower), but for simple values like names or constants it actually hurts. The bytecode for the first function should look the same as for the second, which is:

              0 LOAD_CONST               1 (1)
              2 LOAD_FAST                0 (x)
              4 COMPARE_OP               0 (<)
              6 JUMP_IF_TRUE_OR_POP     14
              8 LOAD_FAST                0 (x)
             10 LOAD_CONST               2 (3)
             12 COMPARE_OP               0 (<)
        >>   14 RETURN_VALUE
    
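    The difference is easy to reproduce with the stdlib `dis` module (a minimal sketch; exact opcode names vary across CPython versions, e.g. newer interpreters use COPY/SWAP rather than DUP_TOP/ROT_THREE):

    ```python
    import dis

    def chained(x):
        return 1 < x < 3

    def unchained(x):
        return 1 < x and x < 3

    # Both functions agree for ordinary values; only the stack traffic differs.
    dis.dis(chained)
    dis.dis(unchained)
    ```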

    @akuvfx akuvfx mannequin added 3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Oct 20, 2021
    @sweeneyde (Member) commented:

    The PR changes behavior slightly:

    def f():
        class A:
            def __lt__(self, other):
                nonlocal x
                x += 100
                return True
        a = A()
        x = 1
        print(a < x < 10)
        x = 1
        print(a < x and x < 10)
    ### Before ###
    >>> f()
    True
    False
    
    ### After ###
    >>> f()
    False
    False

    So strictly speaking, this would be backwards-incompatible; whether that behavior difference matters in practice, I am not totally sure.
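    The example above can be packaged as a runnable sketch; on stock CPython (without the PR), the chained form captures x once before `__lt__` mutates it, so the two spellings disagree:

    ```python
    def f():
        results = []

        class A:
            def __lt__(self, other):
                # Side effect: rebinding x makes the evaluation order observable.
                nonlocal x
                x += 100
                return True

        a = A()
        x = 1
        results.append(a < x < 10)        # x captured before __lt__ bumps it -> True
        x = 1
        results.append(a < x and x < 10)  # x reloaded after the bump (101) -> False
        return results
    ```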


    tim-one commented Oct 21, 2021

    I think Dennis's example is fatal: from section 6.10 ("Comparisons"):

    """
    Comparisons can be chained arbitrarily, e.g., x < y <= z is equivalent to x < y and y <= z, except that y is evaluated only once (but in both cases z is not evaluated at all when x < y is found to be false).
    """

    So doing LOAD_FAST twice on x (in 1 < x < 3) is prohibited by the language definition. It doesn't matter whether it's plain x or f(x) where f() is some arbitrary function: the object the middle comparand signifies is fixed at whatever it was when the first comparison is evaluated.
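    The "evaluated only once" rule is easy to observe directly with a side-effecting middle operand (a small sketch of the section 6.10 behavior):

    ```python
    calls = []

    def mid():
        # Record each evaluation of the middle operand.
        calls.append(1)
        return 2

    # Equivalent to: t = mid(); 1 < t and t < 3 -- mid() is called exactly once.
    result = 1 < mid() < 3
    ```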

    @serhiy-storchaka (Member) commented:

    Agree with Tim.

    The idea of optimizing stack manipulation operations for constants is interesting, but is it a common enough case to justify complicating the compiler?

    See also rejected bpo-27236.

    @sweeneyde (Member) commented:

    https://bugs.python.org/issue47221 was opened as a duplicate of this.

    Unless there are any new ideas for getting around the concerns here, I think this can be closed.

    @stevendaprano (Member) commented:

    I came here from bpo-47221.

    If I am reading this correctly, it concerns me that stack operations (which should be fast) are apparently slow?

    If we can't reduce the number of stack operations, can we speed them up?

    @sweeneyde (Member) commented:

    For reference, chaining is about 1.18x slower in this microbenchmark on GCC:

    ./python -m pyperf timeit -s "x = 100" "if 10 < x < 30: print('no')" --duplicate=10
    .....................
    Mean +- std dev: 21.3 ns +- 0.2 ns
    ./python -m pyperf timeit -s "x = 100" "if 10 < x and x < 30: print('no')" --duplicate=10
    .....................
    Mean +- std dev: 18.0 ns +- 0.5 ns
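    A rougher stdlib-only reproduction with `timeit` (numbers vary by machine and CPython version; pyperf, used above, is more careful about measurement stability):

    ```python
    import timeit

    # Time the chained vs. unchained spellings of the same bounds check.
    chained = timeit.timeit("10 < x < 30", setup="x = 100", number=1_000_000)
    unchained = timeit.timeit("10 < x and x < 30", setup="x = 100", number=1_000_000)
    print(f"chained:   {chained:.3f}s")
    print(f"unchained: {unchained:.3f}s")
    ```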

    For a related case, in #75153, the bytecode generated by "a, b = a0, b0" was changed.
    Before: [load_a0, load_b0, swap, store_a, store_b]
    After: [load_a0, load_b0, store_b, store_a]
    However, this was only changed when the stores were STORE_FASTs. STORE_GLOBAL/STORE_NAME/STORE_DEREF cases still have the SWAP.
    In the STORE_GLOBAL cases, you can construct scenarios with custom __del__ methods where storing b and then a has different behavior than storing a and then b. No such cases can be constructed for STORE_FAST without resorting to frame hacking.
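    Both cases can be inspected with `dis` (a sketch; whether a SWAP actually appears depends on the CPython version, since the reordering described above landed for STORE_FAST targets only):

    ```python
    import dis

    def local_targets(a0, b0):
        a, b = a0, b0    # local stores may be reordered, dropping the SWAP
        return a, b

    def global_targets(a0, b0):
        global ga, gb
        ga, gb = a0, b0  # STORE_GLOBAL targets keep the SWAP
        return ga, gb

    dis.dis(local_targets)
    dis.dis(global_targets)
    ```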

    I wonder if the same argument applies here: maybe @akuvfx's PR could be altered to use LOAD_FAST twice for each variable *only* if everything in sight is the result of a LOAD_FAST or a LOAD_CONST. My example above uses a LOAD_DEREF, so its behavior could remain unchanged.

    The argument that this would be within the language spec is maybe a little more dubious than in the "a, b = a0, b0" case, though, since custom __lt__ methods are rather better specified than custom __del__ methods.

    Thoughts?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @iritkatriel (Member) commented:

    Closing since, unfortunately, the proposed optimization is not safe.

    @iritkatriel iritkatriel closed this as not planned Sep 10, 2022