Using multiple comparison operators can cause performance issues #89705

Closed
akuvfx mannequin opened this issue Oct 20, 2021 · 8 comments
Labels
3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments


akuvfx mannequin commented Oct 20, 2021

BPO 45542
Nosy @tim-one, @stevendaprano, @serhiy-storchaka, @brandtbucher, @sweeneyde, @akuvfx
PRs
  • bpo-45542: Improve performance of comparison operators #29109
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2021-10-20.18:25:33.499>
    labels = ['interpreter-core', '3.11', 'performance']
    title = 'Using multiple comparison operators can cause performance issues'
    updated_at = <Date 2022-04-07.18:29:37.622>
    user = 'https://github.com/akuvfx'

    bugs.python.org fields:

    activity = <Date 2022-04-07.18:29:37.622>
    actor = 'brandtbucher'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2021-10-20.18:25:33.499>
    creator = 'akuvfx'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 45542
    keywords = ['patch']
    message_count = 7.0
    messages = ['404508', '404536', '404553', '404576', '416706', '416719', '416735']
    nosy_count = 6.0
    nosy_names = ['tim.peters', 'steven.daprano', 'serhiy.storchaka', 'brandtbucher', 'Dennis Sweeney', 'akuvfx']
    pr_nums = ['29109']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue45542'
    versions = ['Python 3.11']


    For example:

        def f(x):
            return 1 < x < 3

    will be slower than:

        def f(x):
            return 1 < x and x < 3

    The first function will generate following bytecode:

              0 LOAD_CONST               1 (1)
              2 LOAD_FAST                0 (x)
              4 DUP_TOP
              6 ROT_THREE
              8 COMPARE_OP               0 (<)
             10 JUMP_IF_FALSE_OR_POP    18
             12 LOAD_CONST               2 (3)
             14 COMPARE_OP               0 (<)
             16 RETURN_VALUE
        >>   18 ROT_TWO
             20 POP_TOP
             22 RETURN_VALUE
    

    This performs unnecessary stack operations: it duplicates x, rotates three items for no reason, then rotates two items back and pops a value. That would be fine if the middle value were a more complex expression that needed to be duplicated rather than recalculated (recalculating it would definitely be slower), but for simple values like names or constants it actually hurts. The bytecode for the first function should look the same as for the second, which is:

              0 LOAD_CONST               1 (1)
              2 LOAD_FAST                0 (x)
              4 COMPARE_OP               0 (<)
              6 JUMP_IF_TRUE_OR_POP     14
              8 LOAD_FAST                0 (x)
             10 LOAD_CONST               2 (3)
             12 COMPARE_OP               0 (<)
        >>   14 RETURN_VALUE
    
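    The difference is easy to reproduce with the stdlib `dis` module (a minimal sketch; exact opcode names vary across CPython versions, e.g. newer interpreters use COPY/SWAP rather than DUP_TOP/ROT_THREE):

    ```python
    import dis

    def chained(x):
        return 1 < x < 3

    def unchained(x):
        return 1 < x and x < 3

    # Both functions agree for ordinary values; only the stack traffic differs.
    dis.dis(chained)
    dis.dis(unchained)
    ```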

    @akuvfx akuvfx mannequin added 3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Oct 20, 2021
    @sweeneyde (Member) commented:

    The PR changes behavior slightly:

    def f():
        class A:
            def __lt__(self, other):
                nonlocal x
                x += 100
                return True
        a = A()
        x = 1
        print(a < x < 10)
        x = 1
        print(a < x and x < 10)
    ### Before ###
    >>> f()
    True
    False
    
    ### After ###
    >>> f()
    False
    False

    So strictly speaking, this would be backwards-incompatible; whether that behavior difference matters in practice, I am not totally sure.
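    The example above can be packaged as a runnable sketch; on stock CPython (without the PR), the chained form captures x once before `__lt__` mutates it, so the two spellings disagree:

    ```python
    def f():
        results = []

        class A:
            def __lt__(self, other):
                # Side effect: rebinding x makes the evaluation order observable.
                nonlocal x
                x += 100
                return True

        a = A()
        x = 1
        results.append(a < x < 10)        # x captured before __lt__ bumps it -> True
        x = 1
        results.append(a < x and x < 10)  # x reloaded after the bump (101) -> False
        return results
    ```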


    tim-one commented Oct 21, 2021

    I think Dennis's example is fatal: from section 6.10 ("Comparisons"):

    """
    Comparisons can be chained arbitrarily, e.g., x < y <= z is equivalent to x < y and y <= z, except that y is evaluated only once (but in both cases z is not evaluated at all when x < y is found to be false).
    """

    So doing LOAD_FAST twice on x (in 1 < x < 3) is prohibited by the language definition. It doesn't matter whether it's plain x or f(x) where f() is some arbitrary function: the object the middle comparand signifies is fixed at whatever it was when the first comparison is evaluated.
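    The "evaluated only once" rule is easy to observe directly with a side-effecting middle operand (a small sketch of the section 6.10 behavior):

    ```python
    calls = []

    def mid():
        # Record each evaluation of the middle operand.
        calls.append(1)
        return 2

    # Equivalent to: t = mid(); 1 < t and t < 3 -- mid() is called exactly once.
    result = 1 < mid() < 3
    ```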

    @serhiy-storchaka (Member) commented:

    Agree with Tim.

    The idea of optimizing stack manipulation operations for constants is interesting, but is it a common enough case to justify complicating the compiler?

    See also rejected bpo-27236.

    @sweeneyde (Member) commented:

    https://bugs.python.org/issue47221 was opened as a duplicate of this.

    Unless there are any new ideas for getting around the concerns here, I think this can be closed.

    @stevendaprano (Member) commented:

    I came here from bpo-47221.

    If I am reading this correctly, it concerns me that stack operations (which should be fast) are apparently slow?

    If we can't reduce the number of stack operations, can we speed them up?

    @sweeneyde (Member) commented:

    For reference, chaining is about 1.18x slower in this microbenchmark on GCC:

    ./python -m pyperf timeit -s "x = 100" "if 10 < x < 30: print('no')" --duplicate=10
    .....................
    Mean +- std dev: 21.3 ns +- 0.2 ns
    ./python -m pyperf timeit -s "x = 100" "if 10 < x and x < 30: print('no')" --duplicate=10
    .....................
    Mean +- std dev: 18.0 ns +- 0.5 ns
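    A rougher stdlib-only reproduction with `timeit` (numbers vary by machine and CPython version; pyperf, used above, is more careful about measurement stability):

    ```python
    import timeit

    # Time the chained vs. unchained spellings of the same bounds check.
    chained = timeit.timeit("10 < x < 30", setup="x = 100", number=1_000_000)
    unchained = timeit.timeit("10 < x and x < 30", setup="x = 100", number=1_000_000)
    print(f"chained:   {chained:.3f}s")
    print(f"unchained: {unchained:.3f}s")
    ```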

    For a related case, in #75153, the bytecode generated by "a, b = a0, b0" was changed.
    Before: [load_a0, load_b0, swap, store_a, store_b]
    After: [load_a0, load_b0, store_b, store_a]
    However, this was only changed when the stores were STORE_FASTs. STORE_GLOBAL/STORE_NAME/STORE_DEREF cases still have the SWAP.
    In the STORE_GLOBAL cases, you can construct scenarios with custom __del__ methods where storing b and then a has different behavior than storing a and then b. No such cases can be constructed for STORE_FAST without resorting to frame hacking.
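    Both cases can be inspected with `dis` (a sketch; whether a SWAP actually appears depends on the CPython version, since the reordering described above landed for STORE_FAST targets only):

    ```python
    import dis

    def local_targets(a0, b0):
        a, b = a0, b0    # local stores may be reordered, dropping the SWAP
        return a, b

    def global_targets(a0, b0):
        global ga, gb
        ga, gb = a0, b0  # STORE_GLOBAL targets keep the SWAP
        return ga, gb

    dis.dis(local_targets)
    dis.dis(global_targets)
    ```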

    I wonder if the same argument applies here: maybe @akuvfx's PR could be altered to use LOAD_FAST twice for each variable *only* if everything in sight is the result of a LOAD_FAST or a LOAD_CONST. My example above uses a LOAD_DEREF, so its behavior could remain unchanged.

    The argument that this would be within the language spec is maybe a little more dubious than in the "a, b = a0, b0" case, though, since custom __lt__ methods are rather better specified than custom __del__ methods.

    Thoughts?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @iritkatriel (Member) commented:

    Closing since, unfortunately, the proposed optimization is not safe.

    @iritkatriel iritkatriel closed this as not planned Sep 10, 2022