-
-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup codebase using Refurb #14887
Conversation
Only one of these where a FURB125 error, the rest where FURB126 errors. The FURB126 errors are `else` blocks which can be removed because all `if`/`elif` blocks above it return. This does not detect `elif` blocks which can be removed because all `elif`/`if` block above it return, which might be a future check for Refurb.
There was only one FURB137 error. Most of them where list comprehension related. In CPython list-comprehensions are faster then the equivalent loop and append version, so it would be curious to see what the speedups look like (if any) for the Mypyc compiled version.
This comment has been minimized.
This comment has been minimized.
Turns out you cannot compare different types with `None` using the `is` operator. I would think that because `None` is common between the 2 optional types that this would be valid, but it would appear not.
According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I like these changes in general. Especially the isinstance
simplifications and if
instead of elif
/ else
after an early return. Left a few comments for things that might be questionable IMO. Other contributors might feel different about those though.
I think it would be interesting to know the performance impact of these changes. Especially the change towards more comprehensions. There is a script for that you might want to take a look at: https://github.com/python/mypy/blob/master/misc/perf_compare.py
if base_static is False and compare_static is False: | ||
if base_static is compare_static is False: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just my opinion: Even if technically correct, I'm not a huge fan of this style. It makes reasoning about the result of the IfExp more difficult, especially if it's a more complicated one. I would recommend to revert all FURB124
changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was also one of the few changes that I really didn't like when I skimmed over
elif len(ctx.arg_types) == 2 and len(ctx.arg_types[1]) == 1 and len(ctx.args[1]) == 1: | ||
elif len(ctx.arg_types) == 2 and len(ctx.arg_types[1]) == len(ctx.args[1]) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another good example where FURB124
isn't ideal and even makes the code more difficult to read.
for j, _ in enumerate(tvars) | ||
for j in range(len(tvars)) | ||
] | ||
down_args: list[Type] = [ | ||
UninhabitedType() if i == j else AnyType(TypeOfAny.special_form) | ||
for j, _ in enumerate(tvars) | ||
for j in range(len(tvars)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These don't look like improvements to me. I would revert the FURB148
changes.
It's much easier to read and understand enumerate
than range(len(...))
even if we ignore one of the outputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering how refurb knows that tvars
is a sequence, apparently it uses mypy!
https://github.com/dosisod/refurb/blob/467b133cd2751811671b86e7678240aa423dd7c7/refurb/checks/builtin/no_ignored_enumerate.py#L3-L12
if isinstance(stmt, IfStmt) and seen_unconditional_func_def is False: | ||
if isinstance(stmt, IfStmt) and not seen_unconditional_func_def: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is questionable. I would prefer the explicit is False
. Especially in combination with and
/ or
it's much to easy to make mistakes with not
.
-> Revert FURB149
stmt.else_body is None | ||
or stmt.body[0].is_unreachable is False | ||
and stmt.else_body.is_unreachable is False | ||
or not stmt.body[0].is_unreachable | ||
and not stmt.else_body.is_unreachable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Sometimes being explicit isn't necessarily a bad thing.
Various individual changes look good, though this seems too big as a single PR. If something would actually regress, it would be difficult to figure out why. Also, some of the changes may actually slow things down when compiled with mypyc, even though the change might be beneficial when using interpreted Python. What about creating PRs which clean up one class of issues each, so that they can be independently reviewed (I'd recommend only creating one or two PRs at a time). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes could slow things down when compiled (didn't double check these).
@@ -129,7 +129,7 @@ def main() -> None: | |||
exit(1) | |||
|
|||
if not args: | |||
args = DEFAULT_COMMANDS[:] | |||
args = DEFAULT_COMMANDS.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be slower when compiled with mypyc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some benchmarks I made comparing x[:]
, x.copy()
, and list(x)
(I tried using the perf_compare.py
file, but it just froze my computer during the compilation stage).
For each benchmark I ran 10 million iterations, repeated the test 3 times, and took the average. I did this for both the interpreted and compiled versions (currently using CPython 3.10.9, and Mypyc checked out from this PR). This is the benchmark code I used (which can probably be improved):
import time
diff = 0.0
nums = [1, 2, 3, 4]
for _ in range(10_000_000):
start = time.time()
# <snippet to test here>
end = time.time()
diff += end - start
print(diff)
These are the stats for my machine (units are in seconds):
Mode | x = nums[:] |
x = nums.copy() |
x = list(nums) |
---|---|---|---|
Interpreted | 2.610 | 2.451 | 3.161 |
Compiled | 2.767 | 2.428 | 3.245 |
Interestingly, both nums[:]
and list(nums)
are slightly slower when compiled, and nums.copy()
is fastest overall, including when compiled.
@@ -183,8 +182,7 @@ def update_register_assignments_to_set_bitmap( | |||
IntOp.OR, | |||
op.line, | |||
) | |||
new_ops.append(new) | |||
new_ops.append(Assign(reg, new)) | |||
new_ops.extend((new, Assign(reg, new))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be slower when compiled with mypyc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Building off my last comment, here are some benchmarks for append()
vs extend()
. Here is the relevant portions of the benchmarking code with the new snippets added:
for _ in range(10_000_000):
...
nums: list[int] = []
# append
# nums.append(1)
# nums.append(2)
# extend
nums.extend((1, 2))
...
I commented out only the append
and extend
methods for each test, and here are the results (units are in seconds):
Mode | append |
extend |
---|---|---|
Interpreted | 3.114 | 2.559 |
Compiled | 7.573 | 6.874 |
Shockingly, those snippets are much slower when compiled. I moved nums
out of the loop, meaning only the append/extend part would be tested, and this is what I got:
Mode | append |
extend |
---|---|---|
Interpreted | 2.533 | 2.019 |
Compiled | 2.532 | 2.038 |
Meaning they are pretty much on par with each other.
And, for comparison, here is the speed comparison for creating an empty list in a loop (with no append/extend):
Mode | x = [] |
---|---|
Interpreted | 1.995 |
Compiled | 5.396 |
Perhaps my test methodology is wrong, but we might want to look into why allocating empty lists is so much slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for running the experiments! The results don't look like what I'd have expected -- very interesting. I'll investigate this.
@@ -544,7 +545,7 @@ def run_analysis( | |||
block_kill[block] = kill | |||
|
|||
# Set up initial state for worklist algorithm. | |||
worklist = list(blocks) | |||
worklist = blocks.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be slower when compiled with mypyc.
I flagged only one instance of each category. Others like these would be similar. |
@@ -20,8 +20,8 @@ | |||
def make_cache(input_dir: str, sqlite: bool) -> MetadataStore: | |||
if sqlite: | |||
return SqliteMetadataStore(input_dir) | |||
else: | |||
return FilesystemMetadataStore(input_dir) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of these changes as they reduce symmetry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, might be a good modification to the rule to avoid rewriting into an early-return where the two parts are somewhat symmetrical in number of statements, or maybe simply when they're short.
Thank you all for the feedback! I will go ahead and break up this PR into a bunch of smaller ones over the next few days. To summarize though, the errors which should be reverted are:
Once we agree on what should/shouldn't be reverted I'll go ahead and close this PR. |
This PR removes certain `len()` comparisons when it would be faster/more succinct to do a truthy check instead, such as `len(x) == 0` vs `not x`. See python#14887
This PR fixes the [`FURB145`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb145-no-slice-copy) error code. In my (limited) testing using `.copy()` is faster then both interpreted and compiled versions of `[:]` (see python#14887 (comment)). See python#14887
This PR removes certain `len()` comparisons when it would be faster/more succinct to do a truthy check instead, such as `len(x) == 0` vs `not x`. This fixes the [`FURB115`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb115-no-len-compare) errors in Refurb (see #14887).
This PR fixes the [`FURB145`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb145-no-slice-copy) error code from Refurb. In my (limited) testing, using `x.copy()` is faster then both the interpreted and compiled versions of `x[:]` (see #14887 (comment)). See #14887 for more info.
This PR fixes the [`FURB121`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb121-use-isinstance-issubclass-tuple) error code in [Refurb](https://github.com/dosisod/refurb). See python#14887
This PR fixes the [`FURB121`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb121-use-isinstance-issubclass-tuple) error code in [Refurb](https://github.com/dosisod/refurb). See python#14887
This PR fixes the [`FURB121`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb121-use-isinstance-issubclass-tuple) error code in [Refurb](https://github.com/dosisod/refurb). See #14887
This PR fixes the [`FURB122`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb122-use-writelines), [`FURB129`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb129-simplify-readlines), and [`FURB130`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb130-no-in-dict-keys) error codes in [Refurb](https://github.com/dosisod/refurb). See python#14887
This PR fixes the [`FURB129`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb129-simplify-readlines), and [`FURB130`](https://github.com/dosisod/refurb/blob/master/docs/checks.md#furb130-no-in-dict-keys) error codes in [Refurb](https://github.com/dosisod/refurb). See #14887
This PR includes a lot of cleanups from a tool I wrote called Refurb. Most of the checks are readability related, but some of them do provide (small) performance improvements, such as using list comprehensions in more places, and micro-optimizations such as using
x.copy()
instead of a slice copy (x[:]
). I made a commit for each error code (or group of similar error codes), so feel free to yay or nay any error codes which you don't want included.The full list of error codes and their meanings can be viewed here.
This PR ended up being a lot bigger then I intended, so if you would like me to break it up into smaller PR's I would be more then happy to do so!