[PYTHON] CRPYT404: Use generator expressions instead of list comprehensions in for-loops declaration (Team 1.5) #127

Closed
PaulWassermann opened this issue Apr 5, 2023 · 1 comment · Fixed by #152
Labels: 🗃️ rule (rule improvement or rule development or bug), 🏆 challenge2023 🏆 (work done during the ecoCode Challenge 2023), python

Comments


Team 1.5°C

dedece35 added the 🗃️ rule, python, and 🏆 challenge2023 🏆 labels on Apr 5, 2023
PaulWassermann changed the title from "[PYTHON] Avoid lists when unnecessary" to "[PYTHON] CRPYT404: Avoid lists when unnecessary" on Apr 6, 2023
PaulWassermann changed the title from "[PYTHON] CRPYT404: Avoid lists when unnecessary" to "[PYTHON] CRPYT404: Use generator comprehensions instead of list comprehensions in for-loops declaration" on Apr 6, 2023

PaulWassermann commented Apr 6, 2023

Explanation

Python generators resemble lazy lists from other programming languages: when iterated over, they compute their values on the fly. They lack some list behaviors (indexing, len(), ...) but are memory-efficient because, unlike lists, they do not store all of their values in memory. Thus, when a list comprehension is used directly as the iterable of a for-loop declaration, it can generally be replaced with a generator expression (also called a generator comprehension).
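A minimal sketch of that behavior, for illustration only. The helper traced_square and the calls list are names made up for this example; they simply record when each value is actually computed.

```python
calls = []  # hypothetical log of which inputs have been processed so far


def traced_square(n):
    """Square n and record that the computation happened."""
    calls.append(n)
    return n * n


# Creating the generator computes nothing yet.
lazy = (traced_square(n) for n in range(5))
calls_after_creation = list(calls)

# Values are computed one at a time, on demand.
first = next(lazy)
calls_after_first = list(calls)

# List-only operations are not available on a generator.
try:
    len(lazy)
    has_len = True
except TypeError:
    has_len = False

try:
    lazy[0]
    supports_indexing = True
except TypeError:
    supports_indexing = False

print(calls_after_creation, first, calls_after_first, has_len, supports_indexing)
```

Nothing is computed until next() asks for a value, and both len() and indexing raise TypeError on the generator.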

Code examples

The code example below creates a list through list comprehension, only for it to be iterated over in a for loop:

for index in [index for index in range(1_000_000)]:
    ...

The list of 1,000,000 integers is built and stored in memory in full, taking unnecessary space since the list is not referenced anywhere else in the code.

Below is the preferred way to write it:

for index in (index for index in range(1_000_000)):
    ...

Notice the use of parentheses instead of brackets: we created a generator through a generator comprehension. By design, a generator takes almost no space in memory, however many values it yields, which relieves memory pressure on the hardware.
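A rough way to see the difference, assuming CPython. Note that sys.getsizeof only reports the size of the container itself (for the list, its array of references), not the integer objects it points to, so the list's real footprint is larger than the figure printed here.

```python
import sys

N = 1_000_000

as_list = [index for index in range(N)]
as_generator = (index for index in range(N))

# Size of the container objects only; exact values vary by Python version.
list_bytes = sys.getsizeof(as_list)
generator_bytes = sys.getsizeof(as_generator)

print(f"list:      {list_bytes:,} bytes")
print(f"generator: {generator_bytes:,} bytes")
```

The generator's size stays the same whatever the value of N, while the list's size grows with it.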

Next steps

Inspect use of list comprehensions outside for-loop declarations

Explanation

If given more time, one could implement a rule that inspects list comprehensions used in local contexts (for example, in a function body) and assesses whether each one can be replaced with a generator comprehension.

Challenges

To avoid raising too many false positives, one could keep track, for each list defined by a list comprehension, of its references in the code, and ensure that no list-specific operations (indexing, len(), ...) are used on it. We would also need to ensure that the list is iterated over only once (a generator can be iterated over only once). If those two conditions can be checked through static analysis in a local context, then we could raise an issue for using a list comprehension where a generator comprehension could be used.
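The single-iteration constraint is easy to demonstrate. In this small illustration, the second pass over the generator silently yields nothing, which is exactly the kind of behavior change a false positive would introduce:

```python
values = (i for i in range(3))
first_pass = list(values)    # consumes the generator
second_pass = list(values)   # generator is already exhausted

as_list = [i for i in range(3)]
first_list_pass = list(as_list)
second_list_pass = list(as_list)  # a list can be iterated again

print(first_pass, second_pass)            # [0, 1, 2] []
print(first_list_pass, second_list_pass)  # [0, 1, 2] [0, 1, 2]
```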

Non-compliant code example:

def func():
    my_list = [i for i in range(1_000)]
    return sum(my_list)

Compliant code example:

def func():
    my_generator = (i for i in range(1_000))
    return sum(my_generator)

Code examples that should not raise this issue (although, admittedly, they could be rewritten):

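def func():
    my_list = [i for i in range(1_000)]

    # Iterated twice: a generator would be exhausted after the first loop.
    for index in my_list:
        ...

    for index in my_list:
        ...

def func():
    my_list = [i for i in range(1_000)]

    # Indexing is a list-specific operation that generators do not support.
    return my_list[0]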
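The cases above could be told apart with a small static check. The following is only a sketch built on Python's standard ast module, not the ecoCode implementation. The function names and the SINGLE_PASS_CONSUMERS set are assumptions made for this example. It takes a conservative whitelist approach: a name is flagged only if it is read exactly once, and that single read is either a for-loop iterable or the sole argument of a known single-pass built-in.

```python
import ast

# Assumption: an illustrative set of built-ins that consume their argument once.
SINGLE_PASS_CONSUMERS = {"sum", "any", "all", "min", "max"}


def _is_single_pass_use(func, load):
    """Check whether the given Name node is used as a one-shot iterable."""
    for node in ast.walk(func):
        if isinstance(node, ast.For) and node.iter is load:
            return True
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in SINGLE_PASS_CONSUMERS
                and len(node.args) == 1
                and node.args[0] is load):
            return True
    return False


def replaceable_list_comprehensions(source):
    """Return names bound to a list comprehension that look replaceable."""
    flagged = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        # Collect simple assignments of the form `name = [... for ...]`.
        candidates = set()
        for node in ast.walk(func):
            if (isinstance(node, ast.Assign)
                    and len(node.targets) == 1
                    and isinstance(node.targets[0], ast.Name)
                    and isinstance(node.value, ast.ListComp)):
                candidates.add(node.targets[0].id)
        for name in sorted(candidates):
            # Every place where the name is read inside the function.
            loads = [n for n in ast.walk(func)
                     if isinstance(n, ast.Name)
                     and n.id == name
                     and isinstance(n.ctx, ast.Load)]
            if len(loads) == 1 and _is_single_pass_use(func, loads[0]):
                flagged.append(name)
    return flagged


sample = """
def func():
    my_list = [i for i in range(1_000)]
    return sum(my_list)
"""
print(replaceable_list_comprehensions(sample))  # ['my_list']
```

Known limitations of this sketch: it does not notice a for-loop that is itself nested inside another loop (so the list would be iterated several times), it ignores reassignments and name collisions with nested functions, and it matches built-ins by name only.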

Inspect use of list comprehensions in function calls for functions that take iterable arguments

Explanation

A good example is worth hundreds of words. The following code should never be used:

sum([i ** 2 for i in range(1_000)])

This should always be preferred:

sum(i ** 2 for i in range(1_000))

Note: when a generator comprehension is the sole argument of a function call, its own parentheses can be omitted.
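A short illustration of where the parentheses are optional and where they are required. The start value of 100 is arbitrary and only there to add a second argument:

```python
# Sole argument: the generator's own parentheses may be omitted.
total = sum(i ** 2 for i in range(10))

# With a second argument, the generator must keep its parentheses.
total_with_start = sum((i ** 2 for i in range(10)), 100)

# Omitting them in that case is rejected at compile time.
try:
    compile("sum(i ** 2 for i in range(10), 100)", "<example>", "eval")
    unparenthesized_ok = True
except SyntaxError:
    unparenthesized_ok = False

print(total, total_with_start, unparenthesized_ok)  # 285 385 False
```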

Challenges

The main challenge here is to detect when a function actually needs a list input and not just an iterable input.

Another one would be to define the scope of the rule: which functions would be inspected (a predefined set of built-in functions like filter, sum, any, all, ..., or all functions with a type hint indicating that one parameter should be an Iterable)?
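As a starting point, the "predefined set of built-in functions" option could look like the sketch below, again using the standard ast module. The ITERABLE_CONSUMERS set and the function name are assumptions for illustration. Matching is done on the function's name only, so a user-defined function that shadows a built-in would be flagged by mistake.

```python
import ast

# Assumption: the built-ins named in this issue; a real rule would need a vetted list.
ITERABLE_CONSUMERS = {"filter", "sum", "any", "all"}


def list_comprehension_arguments(source):
    """Return (line number, function name) for calls given a list comprehension."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in ITERABLE_CONSUMERS
                and any(isinstance(arg, ast.ListComp) for arg in node.args)):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)


sample = (
    "sum([i ** 2 for i in range(1_000)])\n"  # flagged
    "sum(i ** 2 for i in range(1_000))\n"    # already a generator
    "len([i for i in range(3)])\n"           # len needs a real list
)
print(list_comprehension_arguments(sample))  # [(1, 'sum')]
```

The type-hint option would need more than syntax matching, since it requires resolving which function is actually being called.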

PaulWassermann changed the title from "[PYTHON] CRPYT404: Use generator comprehensions instead of list comprehensions in for-loops declaration" to "[PYTHON] CRPYT404: Use generator comprehensions instead of list comprehensions in for-loops declaration (Team 1.5)" on Apr 6, 2023
dedece35 linked a pull request on May 8, 2023 that will close this issue
dedece35 changed the title from "[PYTHON] CRPYT404: Use generator comprehensions instead of list comprehensions in for-loops declaration (Team 1.5)" to "[PYTHON] CRPYT404: Use generator expressions instead of list comprehensions in for-loops declaration (Team 1.5)" on Nov 14, 2023