Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable recursive type aliases behind a flag #13297

Merged
merged 7 commits into from
Aug 3, 2022

Conversation

ilevkivskyi
Copy link
Member

This PR exposes recursive type aliases that were secretly there for last ~3 years. For now they will still be behind an opt-in flag, because they are not production ready.

As we discussed with @JukkaL during PyCon, I use couple hacks to make them minimally usable, as proper solutions will take time. I may clean up some of them in near future (or may not).

You can see few added test cases to get an idea of what is supported, example:

Nested = Sequence[Union[T, Nested[T]]]

def flatten(seq: Nested[T]) -> List[T]:
    flat: List[T] = []
    for item in seq:
        if isinstance(item, Sequence):
            res.extend(flatten(item))
        else:
            res.append(item)
    return flat

reveal_type(flatten([1, [2, [3]]]))  # N: Revealed type is "builtins.list[builtins.int]"

There are several known issues, missing features, etc. Some comments:

  • People use get_proper_type() too much (I also used to do this). This can cause giant error messages, infinite recursion, and even performance issues. We should try to avoid expanding unless necessary, and always pass on original type when possible.
  • We should fail gracefully on invalid aliases A = Union[A, ...] and B = Type[B] (first is theoretically meaningless, second is technically valid, but hard to support)
  • We need a better formatting for recursive types. Currently it is just two verbosity levels: deep unroll with full names or just Name[Arg, ...].
  • Local (function level) recursive aliases may need to be prohibited (or we can store them in global namespace, like we do with local named tuples etc).
  • Recursive classes (most notably class str(Sequence[str]): ...) can interfere with recursive aliases causing infinite recursion. I will likely fix this soon (should be easy unless I am missing something).
  • Recursive named tuples and typed dicts are still not supported (unless you use dirty NT = Union[_NT, _NT] trick). Btw supporting recursive and generic named tuples/typed dicts should be quite easy if we agree to store them internally as type aliases (or maybe we can introduce couple other non-proper types).
  • We should get rid of is_same_type() altogether, replacing with either equality or (proper) equivalence call by call. Also I am probably going to unify (proper) subtype visitors.

@AlexWaygood
Copy link
Member

  • Recursive named tuples and typed dicts are still not supported

I think there's already an open PR to support recursive namedtuples btw:

@ilevkivskyi
Copy link
Member Author

I think there's already an open PR to support recursive namedtuples btw:

Yeah, but it will not work, our internal representation is not suitable for them being recursive.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@ilevkivskyi
Copy link
Member Author

@hauntsaninja @JelleZijlstra could one of you (or both) please take a look at this?

Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exciting progress. I have an optional suggestion.

return False
if not isinstance(s.lvalues[0], NameExpr):
return False
if s.unanalyzed_type is not None and not self.is_pep_613(s):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we always return True if is_pep_613(s) returns True?

It may also be reasonable to require explicit TypeAlias for recursive aliases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we always return True if is_pep_613(s) returns True?

We are interested only in aliases that can be recursive, so as a micro-optimization we only do this for r.h.s that can be recursive. I will add a comment on this.

It may also be reasonable to require explicit TypeAlias for recursive aliases.

Hm, yeah this will simplify logic a bit. OTOH it may be hard to reliably detect cases where it was intended and missing, to give a good error message. I will keep the current logic as is, it is easy to change later, as this part is decoupled from everything else.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 3, 2022

Diff from mypy_primer, showing the effect of this PR on open source code:

dragonchain (https://github.com/dragonchain/dragonchain)
- dragonchain/lib/database/redis.py:399:36: error: Argument 2 to "zadd" of "SortedSetCommands" has incompatible type "Dict[str, int]"; expected "Mapping[Union[str, bytes], Union[bytes, float, str]]"
+ dragonchain/lib/database/redis.py:399:36: error: Argument 2 to "zadd" of "SortedSetCommands" has incompatible type "Dict[str, int]"; expected "Mapping[Union[str, bytes], Union[bytes, float, int, str]]"
- dragonchain/lib/database/redis_utest.py:218:9: error: "Callable[[Union[str, bytes], Mapping[Union[str, bytes], Union[bytes, float, str]], bool, bool, bool, bool, Optional[Any], Optional[Any]], int]" has no attribute "assert_called_once_with"
+ dragonchain/lib/database/redis_utest.py:218:9: error: "Callable[[Union[str, bytes], Mapping[Union[str, bytes], Union[bytes, float, int, str]], bool, bool, bool, bool, Optional[Any], Optional[Any]], int]" has no attribute "assert_called_once_with"

@ilevkivskyi ilevkivskyi merged commit d2063d2 into python:master Aug 3, 2022
@ilevkivskyi ilevkivskyi deleted the enable-recursive-aliases branch August 3, 2022 10:30
ilevkivskyi added a commit that referenced this pull request Aug 4, 2022
This is a follow up for #13297

I move around some calls to `get_proper_type()` to preserve original types as much as possible (plus few related required low-risk changes). This makes error messages much more concise, before some error messages in the tests I added were truly epic:
```
Incompatible types in assignment (expression has type "Sequence[Union[B, Sequence[Union[B, Sequence[Union[B, Sequence[Union[B, Sequence[Union[B, Sequence[Union[B, NestedB]]]]]]]]]]]]", variable has type "int")
```
ilevkivskyi added a commit that referenced this pull request Aug 5, 2022
…#13328)

This is a follow-up for #13297 

The fix for infinite recursion is kind of simple, but it is hard to make inference infer something useful. Currently we handle all most common cases, but it is quite fragile (I however have few tricks left if people will complain about inference).
ilevkivskyi added a commit that referenced this pull request Aug 5, 2022
…13336)

This is a follow up for #13297.

See some motivation in the original PR (also in the docstrings).
@ilevkivskyi
Copy link
Member Author

I implemented all of the minor follow-up items from the list above. We are only left with more major items: make TypedDicts and NamedTuples recursive and generic.

TBH I was thinking again about this, and it seems to me it would be so much simpler to do this by storing them in symbol tables as TypeAliases, instead of as TypeInfos that are used for fallbacks. One could say storing them as type aliases is a hack, but IMO current design is also a hack (docstring says TypeInfos should be in one-to-one correspondence with classes). So we would be replacing one hack with another (IMO less ugly) hack.

As I mentioned elsewhere it also kind of makes sense, as these two are conceptually similar (and this analogy would be even stronger if we would allow inline NamedTuples/TypedDicts):

Point = Tuple[int, int]
Point = NamedTuple('Point', [('x', int), ('y', int)])

Opts = Dict[str, int]
Opts = TypedDict('Opts', {'start': int, 'limit': int})

The other option of adding new dedicated nodes and/or non-proper types also seems dangerous, as we already have various special-casing for TypeAliasType to prevent infinite recursion etc.

@ilevkivskyi
Copy link
Member Author

Hm, it may be easy to make the above proposed logic conditional on the --enable-recursive-aliases flag. I will try this tomorrow.

Btw in the meantime I discovered another case were can't get rid of infinite recursion, see #13352. I even switched to full nerd mode and read some recent research papers, it looks like it may be a still open problem (funny thing is that the term I invented for myself to call them "diverging" is actually the official name for these things).

ilevkivskyi added a commit that referenced this pull request Aug 8, 2022
This is another fix for recursive aliases. Ref #13297 

Previously this special casing caused `Optional[...]` to fail with infinite recursion, where `Union[..., None]` worked. I also delete a duplicate helper, and replace it with another existing one that handles type aliases properly.

(I have no idea why some TypeGuard test started passing, but we have `xfail-strict` so I am re-enabling this test.)
@ilevkivskyi
Copy link
Member Author

OK, so I tried to do the above and it worked quite well, couple comments:

  • After all I think switching from TypeInfo to TypeAlias under a flag is a bad idea, this is a significant change, and it is better if it will be constantly tested (I was anyway running all tests with this flag while working on this).
  • There are still around a dozen or so tests failing if I enable the switch. First, it was assumed in several cases that named tuple will be fully analyzed only once, which can't be the case anymore. Also various internal machinery (like fixup.py, fine grained deps etc) expect named tuples to be TypeInfos. It is all fixable in one-two days, but I wanted to try another experiment: allow TypeAliasType to point to either TypeAlias or TypeInfo, this may be simpler to implement.

@ilevkivskyi
Copy link
Member Author

After all, the most promising seem to be a compromise approach, where we still store named tuples as TypeInfos in symbol tables, and TypeAliasType still always points to a TypeAlias. But, whenever a type analyzer sees a TypeInfo with a tuple type, it creates a type alias, see #13371.

ilevkivskyi added a commit that referenced this pull request Aug 9, 2022
This is another follow up on #13297.

We can't support aliases like `Nested = Union[T, Nested[List[T]]]` (and it looks like no-one can, without hacks like fixed type recursion limit). I would propose to just ban them for now. We can reconsider if people will ask for this.
ilevkivskyi added a commit that referenced this pull request Aug 10, 2022
This is a continuation of #13297 

The main change here is that although named tuples are still stored in symbol tables as `TypeInfo`s, when type analyzer sees them, it creates a `TypeAliasType` targeting what it would return before (a `TupleType` with a fallback to an instance of that `TypeInfo`). Although it is a significant change, IMO this is the simplest but still clean way to support recursive named tuples.

Also it is very simple to extend to TypedDicts, but I wanted to make the latter in a separate PR, to minimize the scope of changes.
It would be great if someone can take a look at this PR soon.

The most code changes are to make named tuples semantic analysis idempotent, previously they were analyzed "for real" only once, when all types were ready. It is not possible anymore if we want them to be recursive. So I pass in `existing_info` everywhere, and update it instead of creating a new one every time.
@ilevkivskyi
Copy link
Member Author

The last major item of my "grand plan" is to support generic NamedTuples and TypedDicts. The latter should be easy unless I am missing something. But for tuple types we have a problem with fallbacks. In some sense this is not new, we have special delayed treatment for tuple fallbacks. But now we will probably need to delay it "even more".

To illustrate, what Instance would we put in bases in class C(Tuple[T, S]): ...? In case of actual types we can just schedule a patch and fill in the fallback after semantic analysis. But for type variables there is nothing we can put there before we apply type arguments. This is problematic because bases are stored in TypeInfo, i.e. it is just one type for all instances of C. My current idea is to just say we don't care, and put tuple[Any, ...] in the base, and compute the fallback wherever needed on the fly. We already do this in several places, so it sounds totally doable. Most notably we would need to special case map_instance_to_supertype(). You may say, well it is a bit dangerous to introduce a potential cycle map_instance_to_supertype -> join_types -> is_subtype -> map_instance_to_supertype, but in fact we already have a loop map_instance_to_supertype -> expand_type -> make_simplified_union -> is_subtype -> map_instance_to_supertype and it doesn't case much trouble (I have found only one bug on the tracker caused by this #6730).

I will think a bit more, and if I will not come up with a better idea, I will move on with the "on the fly fallbacks" tomorrow.

ilevkivskyi added a commit that referenced this pull request Aug 11, 2022
This is a continuation of #13297
Depends on #13371 

It was actually quite easy, essentially just a 1-to-1 mapping from the other PR.
ilevkivskyi added a commit that referenced this pull request Aug 15, 2022
Fixes #685

This builds on top of some infra I added for recursive types (Ref #13297). Implementation is based on the idea in #13297 (comment). Generally it works well, but there are actually some problems for named tuples that are recursive. Special-casing them in `maptype.py` is a bit ugly, but I think this is best we can get at the moment.
ilevkivskyi added a commit that referenced this pull request Aug 15, 2022
Fixes #3863

This builds on top of some infra I added for recursive types (Ref #13297). Implementation is quite straightforward. The only non-trivial thing is that when extending/merging TypedDicts, the item types need to me mapped to supertype during semantic analysis. This means we can't call `is_subtype()` etc., and can in theory get types like `Union[int, int]`. But OTOH this equally applies to type aliases, and doesn't seem to cause problems.
@sg495
Copy link

sg495 commented Feb 5, 2023

I'll plainly admit my ignorance here: I still don't quite understand how this test actually runs fine in 3.9.

For me, running the first example from the test in 3.9.13 results in NameError:

from typing import Dict, List, Union
JSON = Union[str, List[JSON], Dict[str, JSON]]
# NameError: name 'JSON' is not defined

On the other hand, the same line written with forward references runs without issues (and mypy 0.991 correctly catches the nested list item error):

from typing import Dict, List, Union
JSON = Union[str, List["JSON"], Dict[str, "JSON"]]
class Bad: ...
x: JSON = ["foo", {"bar": [Bad()]}]
# mypy error list-item - List item 0 has incompatible type "Bad"; expected "Union[str, List[JSON], Dict[str, JSON]]"

@ilevkivskyi: when you have a second, could you please write a brief comment clarifying why that worked? I know the example works fine with forward refs, but I'm intrigued by the possibility of it running without (it'd make for much neater code 😄).

@JelleZijlstra
Copy link
Member

This test won't work at runtime. Mypy's test suite does not actually run its tests at runtime.

PEP-695 may provide a mechanism for writing recursive type aliases without stringification, but that's still in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants