Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FURB109 explanation is technically incorrect #44

Closed
henryiii opened this issue Oct 4, 2022 · 5 comments · Fixed by #49
Closed

FURB109 explanation is technically incorrect #44

henryiii opened this issue Oct 4, 2022 · 5 comments · Fixed by #49
Assignees
Labels
documentation Improvements or additions to documentation

Comments

@henryiii
Copy link

henryiii commented Oct 4, 2022

This:

Since tuples cannot change value over time, it is more performant to use
them in `for` loops, generators, etc.:

Is wrong. Besides the fact tuples are not usually more performant than lists, the expression in question actually produces the same byte code:

>>> import dist
>>> def f(x):
...     return x in [1, 2, 3]
>>> dis.dis(f)
  2           0 LOAD_FAST                0 (x)
              2 LOAD_CONST               1 ((1, 2, 3))
              4 CONTAINS_OP              0
              6 RETURN_VALUE
>>> def f(x):
...     return x in (1, 2, 3)
>>> dis.dis(f)
  2           0 LOAD_FAST                0 (x)
              2 LOAD_CONST               1 ((1, 2, 3))
              4 CONTAINS_OP              0
              6 RETURN_VALUE

In general, lists tend to be homogenous variable length "lists" and tuples heterogeneous data.
This actually is potentially faster and is logically the correct thing:

>>> def f(x):
...     return x in {1,2,3}
...
>>> dis.dis(f)
  2           0 LOAD_FAST                0 (x)
              2 LOAD_CONST               1 (frozenset({1, 2, 3}))
              4 CONTAINS_OP              0
              6 RETURN_VALUE

However, it's not equivalent - the items need to be hashable, and the comparison is not quite the same. Sets are also much slower to construct, but as long as they are inline they are directly loaded in byte code. This could be much faster to check, but that depends on the size, and inline is not likely to be that large.

@dosisod
Copy link
Owner

dosisod commented Oct 5, 2022

I created this Godbolt link in case you or anyone else wants to play with this more.

Perhaps there are instances where the bytecode would differ, but from the few examples I tried the bytecode was the same. I think propping up this check as a "performance" boost is misleading, and should be changed. Perhaps making this a "consistency" check would be more beneficial.

If that is the case, users should be able to specify whether they would prefer () over [], though there currently is no ability to pass data to a check.

@dosisod dosisod self-assigned this Oct 5, 2022
@dosisod dosisod added the documentation Improvements or additions to documentation label Oct 5, 2022
@dosisod
Copy link
Owner

dosisod commented Oct 5, 2022

And yes, sets/frozensets are potentially faster if you have many elements, but in my opinion they look "gross", and I seldom see any codebases using sets literals for in operations. Also, quantifying when a set should be used over a tuple/list will be hard, since you will probably not be able to determine the size of the set.

@henryiii
Copy link
Author

henryiii commented Oct 5, 2022

I didn't know you could use godbolt on Python.

I'm not sure why they look gross, containership is a natural question for a set or a dict, while it's "hacked on" to a list / tuple as a convenience - it's just a looping check. I was strongly in favor of x in {...} until I realized they were not fully identical. Sometimes you want one over the other - there was a case (in awkward) where they give different results. I think it might be if you have custom __eq__? But don't remember exactly what was different.

Agree on consistency (and I'm not against the check - though picking one over the other is stylistic rather than performance).

@dosisod
Copy link
Owner

dosisod commented Oct 5, 2022

I think gross is too strong of a word. Perhaps my dislike comes from the fact that {"key", "value"} looks very similar to {"key": "value"}, or maybe I just haven't seen it enough in practice. I am totally not opposed to using sets with in operations, just the usage of the set literal itself feels unnatural to me.

@dopplershift
Copy link

I found this discussion fascinating and learned something from it. In fairness to refurb (and to everyone who wasn't actually aware of this behavior), it wasn't until Python 3.8 that this optimization was made. (And I can't even find it in the "What's New".)

So when running against code still supporting <=3.7, the tool is technically still correct. 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants