Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup codebase using Refurb #14887

Closed
wants to merge 21 commits into from
Closed

Conversation

dosisod
Copy link
Contributor

@dosisod dosisod commented Mar 13, 2023

This PR includes a lot of cleanups from a tool I wrote called Refurb. Most of the checks are readability related, but some of them do provide (small) performance improvements, such as using list comprehensions in more places, and micro-optimizations such as using x.copy() instead of a slice copy (x[:]). I made a commit for each error code (or group of similar error codes), so feel free to yay or nay any error codes which you don't want included.

The full list of error codes and their meanings can be viewed here.

This PR ended up being a lot bigger then I intended, so if you would like me to break it up into smaller PR's I would be more then happy to do so!

Only one of these where a FURB125 error, the rest where FURB126 errors. The
FURB126 errors are `else` blocks which can be removed because all `if`/`elif`
blocks above it return. This does not detect `elif` blocks which can be removed
because all `elif`/`if` block above it return, which might be a future check
for Refurb.
There was only one FURB137 error. Most of them where list comprehension
related. In CPython list-comprehensions are faster then the equivalent loop
and append version, so it would be curious to see what the speedups look like
(if any) for the Mypyc compiled version.
@github-actions

This comment has been minimized.

Turns out you cannot compare different types with `None` using the `is`
operator. I would think that because `None` is common between the 2 optional
types that this would be valid, but it would appear not.
@github-actions
Copy link
Contributor

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

Copy link
Collaborator

@cdce8p cdce8p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I like these changes in general. Especially the isinstance simplifications and if instead of elif / else after an early return. Left a few comments for things that might be questionable IMO. Other contributors might feel different about those though.

I think it would be interesting to know the performance impact of these changes. Especially the change towards more comprehensions. There is a script for that you might want to take a look at: https://github.com/python/mypy/blob/master/misc/perf_compare.py

if base_static is False and compare_static is False:
if base_static is compare_static is False:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my opinion: Even if technically correct, I'm not a huge fan of this style. It makes reasoning about the result of the IfExp more difficult, especially if it's a more complicated one. I would recommend to revert all FURB124 changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was also one of the few changes that I really didn't like when I skimmed over

Comment on lines -299 to +296
elif len(ctx.arg_types) == 2 and len(ctx.arg_types[1]) == 1 and len(ctx.args[1]) == 1:
elif len(ctx.arg_types) == 2 and len(ctx.arg_types[1]) == len(ctx.args[1]) == 1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good example where FURB124 isn't ideal and even makes the code more difficult to read.

Comment on lines -2396 to +2400
for j, _ in enumerate(tvars)
for j in range(len(tvars))
]
down_args: list[Type] = [
UninhabitedType() if i == j else AnyType(TypeOfAny.special_form)
for j, _ in enumerate(tvars)
for j in range(len(tvars))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These don't look like improvements to me. I would revert the FURB148 changes.
It's much easier to read and understand enumerate than range(len(...)) even if we ignore one of the outputs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if isinstance(stmt, IfStmt) and seen_unconditional_func_def is False:
if isinstance(stmt, IfStmt) and not seen_unconditional_func_def:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is questionable. I would prefer the explicit is False. Especially in combination with and / or it's much to easy to make mistakes with not.

-> Revert FURB149

Comment on lines 785 to +787
stmt.else_body is None
or stmt.body[0].is_unreachable is False
and stmt.else_body.is_unreachable is False
or not stmt.body[0].is_unreachable
and not stmt.else_body.is_unreachable
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Sometimes being explicit isn't necessarily a bad thing.

@JukkaL
Copy link
Collaborator

JukkaL commented Mar 13, 2023

Various individual changes look good, though this seems too big as a single PR. If something would actually regress, it would be difficult to figure out why. Also, some of the changes may actually slow things down when compiled with mypyc, even though the change might be beneficial when using interpreted Python.

What about creating PRs which clean up one class of issues each, so that they can be independently reviewed (I'd recommend only creating one or two PRs at a time).

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes could slow things down when compiled (didn't double check these).

@@ -129,7 +129,7 @@ def main() -> None:
exit(1)

if not args:
args = DEFAULT_COMMANDS[:]
args = DEFAULT_COMMANDS.copy()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be slower when compiled with mypyc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some benchmarks I made comparing x[:], x.copy(), and list(x) (I tried using the perf_compare.py file, but it just froze my computer during the compilation stage).

For each benchmark I ran 10 million iterations, repeated the test 3 times, and took the average. I did this for both the interpreted and compiled versions (currently using CPython 3.10.9, and Mypyc checked out from this PR). This is the benchmark code I used (which can probably be improved):

import time

diff = 0.0
nums = [1, 2, 3, 4]

for _ in range(10_000_000):
    start = time.time()

    # <snippet to test here>

    end = time.time()
    diff += end - start

print(diff)

These are the stats for my machine (units are in seconds):

Mode x = nums[:] x = nums.copy() x = list(nums)
Interpreted 2.610 2.451 3.161
Compiled 2.767 2.428 3.245

Interestingly, both nums[:] and list(nums) are slightly slower when compiled, and nums.copy() is fastest overall, including when compiled.

@@ -183,8 +182,7 @@ def update_register_assignments_to_set_bitmap(
IntOp.OR,
op.line,
)
new_ops.append(new)
new_ops.append(Assign(reg, new))
new_ops.extend((new, Assign(reg, new)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be slower when compiled with mypyc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building off my last comment, here are some benchmarks for append() vs extend(). Here is the relevant portions of the benchmarking code with the new snippets added:

for _ in range(10_000_000):
    ...
    nums: list[int] = []

    # append
    # nums.append(1)
    # nums.append(2)

    # extend
    nums.extend((1, 2))
    ...

I commented out only the append and extend methods for each test, and here are the results (units are in seconds):

Mode append extend
Interpreted 3.114 2.559
Compiled 7.573 6.874

Shockingly, those snippets are much slower when compiled. I moved nums out of the loop, meaning only the append/extend part would be tested, and this is what I got:

Mode append extend
Interpreted 2.533 2.019
Compiled 2.532 2.038

Meaning they are pretty much on par with each other.

And, for comparison, here is the speed comparison for creating an empty list in a loop (with no append/extend):

Mode x = []
Interpreted 1.995
Compiled 5.396

Perhaps my test methodology is wrong, but we might want to look into why allocating empty lists is so much slower.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for running the experiments! The results don't look like what I'd have expected -- very interesting. I'll investigate this.

@@ -544,7 +545,7 @@ def run_analysis(
block_kill[block] = kill

# Set up initial state for worklist algorithm.
worklist = list(blocks)
worklist = blocks.copy()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be slower when compiled with mypyc.

@JukkaL
Copy link
Collaborator

JukkaL commented Mar 13, 2023

These changes could slow things down when compiled (didn't double check these).

I flagged only one instance of each category. Others like these would be similar.

@@ -20,8 +20,8 @@
def make_cache(input_dir: str, sqlite: bool) -> MetadataStore:
if sqlite:
return SqliteMetadataStore(input_dir)
else:
return FilesystemMetadataStore(input_dir)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of these changes as they reduce symmetry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, might be a good modification to the rule to avoid rewriting into an early-return where the two parts are somewhat symmetrical in number of statements, or maybe simply when they're short.

@dosisod
Copy link
Contributor Author

dosisod commented Mar 14, 2023

Thank you all for the feedback! I will go ahead and break up this PR into a bunch of smaller ones over the next few days.

To summarize though, the errors which should be reverted are:

  • FURB124: x is y or x is z -> x is y is z
  • FURB126: Simplifying else and elif blocks
  • FURB148: x, _ in enumerate(y) -> x in range(len(x))
  • FURB149: x is False -> x

Once we agree on what should/shouldn't be reverted I'll go ahead and close this PR.

dosisod added a commit to dosisod/mypy that referenced this pull request Mar 15, 2023
This PR removes certain `len()` comparisons when it would be faster/more
succinct to do a truthy check instead, such as `len(x) == 0` vs `not x`.

See python#14887
@dosisod dosisod closed this Mar 15, 2023
dosisod added a commit to dosisod/mypy that referenced this pull request Mar 23, 2023
This PR fixes the [`FURB145`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb145-no-slice-copy)
error code. In my (limited) testing using `.copy()` is faster then both
interpreted and compiled versions of `[:]` (see python#14887 (comment)).

See python#14887
hauntsaninja pushed a commit that referenced this pull request Mar 25, 2023
This PR removes certain `len()` comparisons when it would be faster/more
succinct to do a truthy check instead, such as `len(x) == 0` vs `not x`.

This fixes the
[`FURB115`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb115-no-len-compare)
errors in Refurb (see #14887).
sobolevn pushed a commit that referenced this pull request Apr 24, 2023
This PR fixes the
[`FURB145`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb145-no-slice-copy)
error code from Refurb. In my (limited) testing, using `x.copy()` is
faster then both the interpreted and compiled versions of `x[:]` (see
#14887 (comment)).

See #14887 for more info.
@dosisod dosisod mentioned this pull request May 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants