Cleanup codebase using Refurb #14887

dosisod · 2023-03-13T04:48:46Z

This PR includes a lot of cleanups from a tool I wrote called Refurb. Most of the checks are readability related, but some of them do provide (small) performance improvements, such as using list comprehensions in more places, and micro-optimizations such as using x.copy() instead of a slice copy (x[:]). I made a commit for each error code (or group of similar error codes), so feel free to yay or nay any error codes which you don't want included.

The full list of error codes and their meanings can be viewed here.

This PR ended up being a lot bigger then I intended, so if you would like me to break it up into smaller PR's I would be more then happy to do so!

Only one of these where a FURB125 error, the rest where FURB126 errors. The FURB126 errors are `else` blocks which can be removed because all `if`/`elif` blocks above it return. This does not detect `elif` blocks which can be removed because all `elif`/`if` block above it return, which might be a future check for Refurb.

There was only one FURB137 error. Most of them where list comprehension related. In CPython list-comprehensions are faster then the equivalent loop and append version, so it would be curious to see what the speedups look like (if any) for the Mypyc compiled version.

Turns out you cannot compare different types with `None` using the `is` operator. I would think that because `None` is common between the 2 optional types that this would be valid, but it would appear not.

github-actions · 2023-03-13T06:00:51Z

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

cdce8p

Thanks for the PR! I like these changes in general. Especially the isinstance simplifications and if instead of elif / else after an early return. Left a few comments for things that might be questionable IMO. Other contributors might feel different about those though.

I think it would be interesting to know the performance impact of these changes. Especially the change towards more comprehensions. There is a script for that you might want to take a look at: https://github.com/python/mypy/blob/master/misc/perf_compare.py

cdce8p · 2023-03-13T13:01:39Z

mypy/checker.py

-                if base_static is False and compare_static is False:
+                if base_static is compare_static is False:


Just my opinion: Even if technically correct, I'm not a huge fan of this style. It makes reasoning about the result of the IfExp more difficult, especially if it's a more complicated one. I would recommend to revert all FURB124 changes.

This was also one of the few changes that I really didn't like when I skimmed over

cdce8p · 2023-03-13T13:02:27Z

mypy/plugins/default.py

-        elif len(ctx.arg_types) == 2 and len(ctx.arg_types[1]) == 1 and len(ctx.args[1]) == 1:
+        elif len(ctx.arg_types) == 2 and len(ctx.arg_types[1]) == len(ctx.args[1]) == 1:


Another good example where FURB124 isn't ideal and even makes the code more difficult to read.

cdce8p · 2023-03-13T13:10:19Z

mypy/checker.py

-                for j, _ in enumerate(tvars)
+                for j in range(len(tvars))
            ]
            down_args: list[Type] = [
                UninhabitedType() if i == j else AnyType(TypeOfAny.special_form)
-                for j, _ in enumerate(tvars)
+                for j in range(len(tvars))


These don't look like improvements to me. I would revert the FURB148 changes.
It's much easier to read and understand enumerate than range(len(...)) even if we ignore one of the outputs.

I was wondering how refurb knows that tvars is a sequence, apparently it uses mypy!
https://github.com/dosisod/refurb/blob/467b133cd2751811671b86e7678240aa423dd7c7/refurb/checks/builtin/no_ignored_enumerate.py#L3-L12

cdce8p · 2023-03-13T13:13:57Z

mypy/fastparse.py

-            if isinstance(stmt, IfStmt) and seen_unconditional_func_def is False:
+            if isinstance(stmt, IfStmt) and not seen_unconditional_func_def:


This one is questionable. I would prefer the explicit is False. Especially in combination with and / or it's much to easy to make mistakes with not.

-> Revert FURB149

cdce8p · 2023-03-13T13:14:46Z

mypy/fastparse.py

            stmt.else_body is None
-            or stmt.body[0].is_unreachable is False
-            and stmt.else_body.is_unreachable is False
+            or not stmt.body[0].is_unreachable
+            and not stmt.else_body.is_unreachable


Same here. Sometimes being explicit isn't necessarily a bad thing.

JukkaL · 2023-03-13T13:26:07Z

Various individual changes look good, though this seems too big as a single PR. If something would actually regress, it would be difficult to figure out why. Also, some of the changes may actually slow things down when compiled with mypyc, even though the change might be beneficial when using interpreted Python.

What about creating PRs which clean up one class of issues each, so that they can be independently reviewed (I'd recommend only creating one or two PRs at a time).

JukkaL

These changes could slow things down when compiled (didn't double check these).

JukkaL · 2023-03-13T13:26:56Z

runtests.py

@@ -129,7 +129,7 @@ def main() -> None:
        exit(1)

    if not args:
-        args = DEFAULT_COMMANDS[:]
+        args = DEFAULT_COMMANDS.copy()


This might be slower when compiled with mypyc.

Here are some benchmarks I made comparing x[:], x.copy(), and list(x) (I tried using the perf_compare.py file, but it just froze my computer during the compilation stage).

For each benchmark I ran 10 million iterations, repeated the test 3 times, and took the average. I did this for both the interpreted and compiled versions (currently using CPython 3.10.9, and Mypyc checked out from this PR). This is the benchmark code I used (which can probably be improved):

import time diff = 0.0 nums = [1, 2, 3, 4] for _ in range(10_000_000): start = time.time() # <snippet to test here> end = time.time() diff += end - start print(diff)

These are the stats for my machine (units are in seconds):

Mode x = nums[:] x = nums.copy() x = list(nums)

Interpreted 2.610 2.451 3.161

Compiled 2.767 2.428 3.245

Interestingly, both nums[:] and list(nums) are slightly slower when compiled, and nums.copy() is fastest overall, including when compiled.

JukkaL · 2023-03-13T13:27:05Z

mypyc/transform/uninit.py

@@ -183,8 +182,7 @@ def update_register_assignments_to_set_bitmap(
                        IntOp.OR,
                        op.line,
                    )
-                    new_ops.append(new)
-                    new_ops.append(Assign(reg, new))
+                    new_ops.extend((new, Assign(reg, new)))


This may be slower when compiled with mypyc.

Building off my last comment, here are some benchmarks for append() vs extend(). Here is the relevant portions of the benchmarking code with the new snippets added:

for _ in range(10_000_000): ... nums: list[int] = [] # append # nums.append(1) # nums.append(2) # extend nums.extend((1, 2)) ...

I commented out only the append and extend methods for each test, and here are the results (units are in seconds):

Mode append extend

Interpreted 3.114 2.559

Compiled 7.573 6.874

Shockingly, those snippets are much slower when compiled. I moved nums out of the loop, meaning only the append/extend part would be tested, and this is what I got:

Mode append extend

Interpreted 2.533 2.019

Compiled 2.532 2.038

Meaning they are pretty much on par with each other.

And, for comparison, here is the speed comparison for creating an empty list in a loop (with no append/extend):

Mode x = []

Interpreted 1.995

Compiled 5.396

Perhaps my test methodology is wrong, but we might want to look into why allocating empty lists is so much slower.

Thanks for running the experiments! The results don't look like what I'd have expected -- very interesting. I'll investigate this.

JukkaL · 2023-03-13T13:29:40Z

mypyc/analysis/dataflow.py

@@ -544,7 +545,7 @@ def run_analysis(
        block_kill[block] = kill

    # Set up initial state for worklist algorithm.
-    worklist = list(blocks)
+    worklist = blocks.copy()


This can be slower when compiled with mypyc.

JukkaL · 2023-03-13T13:32:07Z

These changes could slow things down when compiled (didn't double check these).

I flagged only one instance of each category. Others like these would be similar.

JelleZijlstra · 2023-03-13T13:32:52Z

misc/apply-cache-diff.py

@@ -20,8 +20,8 @@
 def make_cache(input_dir: str, sqlite: bool) -> MetadataStore:
    if sqlite:
        return SqliteMetadataStore(input_dir)
-    else:
-        return FilesystemMetadataStore(input_dir)
+


Not a fan of these changes as they reduce symmetry.

Yes, might be a good modification to the rule to avoid rewriting into an early-return where the two parts are somewhat symmetrical in number of statements, or maybe simply when they're short.

dosisod · 2023-03-14T02:58:44Z

Thank you all for the feedback! I will go ahead and break up this PR into a bunch of smaller ones over the next few days.

To summarize though, the errors which should be reverted are:

FURB124: x is y or x is z -> x is y is z
FURB126: Simplifying else and elif blocks
FURB148: x, _ in enumerate(y) -> x in range(len(x))
FURB149: x is False -> x

Once we agree on what should/shouldn't be reverted I'll go ahead and close this PR.

This PR removes certain `len()` comparisons when it would be faster/more succinct to do a truthy check instead, such as `len(x) == 0` vs `not x`. See python#14887

This PR fixes the [`FURB145`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb145-no-slice-copy) error code. In my (limited) testing using `.copy()` is faster then both interpreted and compiled versions of `[:]` (see python#14887 (comment)). See python#14887

This PR removes certain `len()` comparisons when it would be faster/more succinct to do a truthy check instead, such as `len(x) == 0` vs `not x`. This fixes the [`FURB115`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb115-no-len-compare) errors in Refurb (see #14887).

This PR fixes the [`FURB145`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb145-no-slice-copy) error code from Refurb. In my (limited) testing, using `x.copy()` is faster then both the interpreted and compiled versions of `x[:]` (see #14887 (comment)). See #14887 for more info.

This PR fixes the [`FURB121`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb121-use-isinstance-issubclass-tuple) error code in [Refurb](https://github.com/dosisod/refurb). See python#14887

This PR fixes the [`FURB121`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb121-use-isinstance-issubclass-tuple) error code in [Refurb](https://github.com/dosisod/refurb). See #14887

This PR fixes the [`FURB122`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb122-use-writelines), [`FURB129`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb129-simplify-readlines), and [`FURB130`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb130-no-in-dict-keys) error codes in [Refurb](https://github.com/dosisod/refurb). See python#14887

This PR fixes the [`FURB129`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb129-simplify-readlines), and [`FURB130`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb130-no-in-dict-keys) error codes in [Refurb](https://github.com/dosisod/refurb). See #14887

dosisod added 20 commits March 12, 2023 21:28

Fix FURB102, FURB110, and FURB111 error codes

2f27180

Fix FURB112 errors

9116680

Fix FURB113 errors

b1fcda7

Fix FURB115 errors

85b353b

Fix FURB119 errors

8177749

Fix FURB121 errors

46691e4

Fix FURB123 errors

8c30532

Fix FURB124 errors

ebf51a0

Fix FURB129 errors

3b8c1ee

Fix FURB130 errors

a435dff

Fix FURB143 errors

62855b4

Fix FURB145 errors

6a1bafe

Fix FURB148 errors

dde666d

Fix FURB149 error

807ab1e

Remove colon

fda7235

Fix a few misc errors

dfa1288

Fix FURB errors in misc/root folder

adf4757

Run black

961a537

This comment has been minimized.

Sign in to view

Fix type errors:

d9db61d

Turns out you cannot compare different types with `None` using the `is` operator. I would think that because `None` is common between the 2 optional types that this would be valid, but it would appear not.

cdce8p reviewed Mar 13, 2023

View reviewed changes

JukkaL reviewed Mar 13, 2023

View reviewed changes

JelleZijlstra reviewed Mar 13, 2023

View reviewed changes

dosisod added a commit to dosisod/mypy that referenced this pull request Mar 15, 2023

Simplify len() comparisons:

a2cbbf7

This PR removes certain `len()` comparisons when it would be faster/more succinct to do a truthy check instead, such as `len(x) == 0` vs `not x`. See python#14887

dosisod mentioned this pull request Mar 15, 2023

Simplify len() comparisons #14907

Merged

dosisod closed this Mar 15, 2023

dosisod mentioned this pull request Mar 23, 2023

Replace x[:] with x.copy() #14949

Merged

dosisod mentioned this pull request Apr 27, 2023

Simplify isinstance calls #15149

Merged

dosisod mentioned this pull request May 4, 2023

Simplify iterator usage #15185

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cleanup codebase using Refurb #14887

Cleanup codebase using Refurb #14887

dosisod commented Mar 13, 2023

This comment has been minimized.

github-actions bot commented Mar 13, 2023

cdce8p left a comment

cdce8p Mar 13, 2023

AlexWaygood Mar 13, 2023

cdce8p Mar 13, 2023

cdce8p Mar 13, 2023

ikonst Mar 14, 2023

cdce8p Mar 13, 2023

cdce8p Mar 13, 2023

JukkaL commented Mar 13, 2023

JukkaL left a comment

JukkaL Mar 13, 2023

dosisod Mar 14, 2023

JukkaL Mar 13, 2023

dosisod Mar 14, 2023

JukkaL Mar 14, 2023

JukkaL Mar 13, 2023

JukkaL commented Mar 13, 2023

JelleZijlstra Mar 13, 2023

ikonst Mar 14, 2023

dosisod commented Mar 14, 2023

		if base_static is False and compare_static is False:
		if base_static is compare_static is False:

		elif len(ctx.arg_types) == 2 and len(ctx.arg_types[1]) == 1 and len(ctx.args[1]) == 1:
		elif len(ctx.arg_types) == 2 and len(ctx.arg_types[1]) == len(ctx.args[1]) == 1:

		if isinstance(stmt, IfStmt) and seen_unconditional_func_def is False:
		if isinstance(stmt, IfStmt) and not seen_unconditional_func_def:

Cleanup codebase using Refurb #14887

Cleanup codebase using Refurb #14887

Conversation

dosisod commented Mar 13, 2023

This comment has been minimized.

github-actions bot commented Mar 13, 2023

cdce8p left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JukkaL commented Mar 13, 2023

JukkaL left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JukkaL commented Mar 13, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dosisod commented Mar 14, 2023