Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUF017 Avoid quadratic list summation: consider itertools.chain.from_iterable #9045

Closed
RonnyPfannschmidt opened this issue Dec 7, 2023 · 7 comments
Labels
rule Implementing or modifying a lint rule

Comments

@RonnyPfannschmidt
Copy link

currently RUF017

takes

mess = [[1], [1,2],[1], [1,2],[1], [1,2]]

tags = sorted(sum([x for x in mess], []))

and turns it into

import functools
import operator
mess = [[1], [1,2],[1], [1,2],[1], [1,2]]

tags = sorted(functools.reduce(operator.iadd, [x for x in mess], []))

instead of the more idiomatic

import itertools

mess = [[1], [1,2],[1], [1,2],[1], [1,2]]

tags = sorted(itertools.chain.from_iterable(mess))

based on the applications of the pattern i have seen so far, i would almost always recommend to replace
the sum with a list(chain.from_iterable(...))

with the additional "optimization" of in-lining list comprehensions and/or generator expressions when applicable

@charliermarsh
Copy link
Member

This makes sense... We could at least do this for cases in which it's directly within a sorted or similar.

@charliermarsh charliermarsh added the rule Implementing or modifying a lint rule label Dec 7, 2023
@charliermarsh
Copy link
Member

Interestingly, though, the functools.reduce variant does seem to be quite a bit faster.

import timeit

code1 = """
tags = sorted(itertools.chain.from_iterable(mess))
"""

code2 = """
tags = sorted(functools.reduce(operator.iadd, [x for x in mess], []))
"""

repeats = 10000

time1 = timeit.timeit(stmt=code1, setup='import itertools\nmess = [[i, i + 1] for i in range(10000)]', number=repeats)
print(f"{time1} seconds")

time2 = timeit.timeit(stmt=code2, setup='import itertools\nimport functools\nimport operator\nmess = [[i] for i in range(10000)]', number=repeats)
print(f"{time2} seconds")

Yields:

4.019917957950383 seconds
2.7585187909426168 seconds

@RonnyPfannschmidt
Copy link
Author

I suspect this relates to different optimization levels of the Code paths

@hauntsaninja
Copy link
Contributor

Here's some more benchmarking from when we added this: #5073 (comment)

@Skylion007
Copy link
Contributor

Skylion007 commented Dec 22, 2023

This would be covered under a REFURB/FURB (FURB179) rule when we choose to implement it. #1348

@hauntsaninja
Copy link
Contributor

I'd recommend closing this issue, since the functools variant is meaningfully faster: #5073 (comment)

@charliermarsh
Copy link
Member

I agree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rule Implementing or modifying a lint rule
Projects
None yet
Development

No branches or pull requests

4 participants