-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Foundations for non-linear solver and polymorphic application #15287
Foundations for non-linear solver and polymorphic application #15287
Conversation
This comment has been minimized.
This comment has been minimized.
I will go through the |
OK, so even before looking into
These should be easy to fix. Another thing is that my algorithm cannot solve a relatively simple system like Btw I forgot to include one important missing feature in this MVP implementation: we need to support type variables with bounds/constraints. For example here: T = TypeVar("T")
S = TypeVar("S")
U = TypeVar("U", bound=int)
def dec(fn: Callable[[T], S]) -> Callable[[T], list[S]]: ...
def id(x: U) -> U: ... the inferred type of Finally, @JukkaL actually hiding this completely behind a flag will have one problem: this will require adding a flag to |
OK, so I spent some more time this weekend reading papers and thinking. Here are some comments on the three ingredients of this PR:
|
I was being naive, it is not that trivial, so the plan stays the same (follow-up PRs will follow roughly in the order as they are listed in this PR's description, 1 and 2 will likely be combined into a single PR). |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
OK, so I finally went through the First category is self-types. Previously, for example the code like this worked from typing import Self, Callable, TypeVar
T = TypeVar("T")
def dec(f: Callable[..., T]) -> Callable[..., T]: ...
class C:
@dec
def test(self) -> Self: ...
c: C
reveal_type(c.test()) # Revealed type is "__main__.C" Here is the old logic:
Moreover, even more tricky decorators like T1 = TypeVar("T1")
T2 = TypeVar("T2")
def dec2(f: Callable[[T1], T2]) -> CustomProperty[T1, T2]: ... used to (accidentally) work when applied to methods with self-types, while in fact we can't even express an inferred type of Second category is errors where a user writes a bad constructor signature in a generic class like this: class Value: ...
T = TypeVar("T", bound=Value)
class Box(Generic[T]):
def __init__(self, item: Value) -> None: ... Previously things like this worked: xs: List[Value]
items = map(Box, xs) since we would infer (again, a meaningless) type For the first issue I added a special case that should cover the "new style" self-types (by simply excluding them from new inference logic), just to minimize the fallout. For the second issue, I don't think we need to do anything, the old behavior is clearly wrong. Soon I am going to hide the new inference logic behind an internal flag (btw I think I will use |
This comment has been minimized.
This comment has been minimized.
Apologies for the delay, I was supposed to review this today, but I didn't have enough uninterrupted time. I've reserved a few hours tomorrow to review this PR. Thanks for your patience. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! It's exciting that we will be able to support more complex decorator use cases. Also it's great to have type variable leakage bugs fixed, as they were annoying and super confusing. This opens up many additional use cases. Also I like that you are working on improvements incrementally and the new feature is behind a feature flag until we have more confidence.
This was a very complex piece work. Most of my comments are about adding comments or unit tests to make the code easier to understand and maintain in the long term. Feel free to merge this PR without making all the suggested changes, since the feature is not enabled by default.
Thank you for working on this!
mypy/solve.py
Outdated
# Solve each type variable separately. | ||
The whole algorithm consists of five steps: | ||
* Propagate via linear constraints to get all possible constraints for each variable | ||
* Find dependencies between type variables, group them in SCCs, and sor topologically |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: 'sor'
mypy/solve.py
Outdated
def solve_iteratively( | ||
batch: list[TypeVarId], cmap: dict[TypeVarId, list[Constraint]], free_vars: list[TypeVarId] | ||
) -> dict[TypeVarId, Type | None]: | ||
"""Solve constraints for type variables sequentially, updating targets after each step.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't clear to me what 'targets' means here without studying the implementation. Clarify?
Document that we are solving for batch (not free_vars), and document free_vars.
It could be useful to eventually have some unit tests for this (no need to do it in this PR, but before the new type inference logic is enabled by default).
|
||
def normalize_constraints( | ||
constraints: list[Constraint], vars: list[TypeVarId] | ||
) -> list[Constraint]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would nice to have some unit tests for this (not need to do this in this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On reflection, I'm not sure if it's worth having unit tests for all of these functions. It may make more sense to have some unit tests for top-level functions. @jhance has recently written some unit tests that target type inference in mypy/test/testconstraints.py
, and I think that they are helpful, even if we are not trying to cover all edge cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I think it is a good idea. Of course not all of these really needs to be tested, but for most of them it is a good thing. Especially for e.g. transitive_closure()
since logic there is really easy to get wrong. I will however add them in a separate PR.
|
||
def propagate_constraints_for( | ||
var: TypeVarId, direction: int, cmap: dict[TypeVarId, list[Constraint]] | ||
) -> list[Constraint]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, it would be nice to eventually have some unit tests for this. The logic is a little tricky to follow.
mypy/solve.py
Outdated
"""Find transitive closure for given constraints on type variables. | ||
|
||
Transitive closure gives maximal set of lower/upper bounds for each type variable, such | ||
we cannot deduce any further bounds by chaining other existing bounds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give an example or two that illustrates what this means in the docstring? Again, it would be good to have some unit tests to validate the logic and avoid regressions in the future.
leafs = raw_batches[0] | ||
free_vars = [] | ||
for scc in leafs: | ||
if all( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment here about the purpose of this if statement (I think I figured it out but it wasn't obvious).
|
||
def solve_one( | ||
lowers: Iterable[Type], uppers: Iterable[Type], not_allowed_vars: list[TypeVarId] | ||
) -> Type | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly some unit tests would be nice at some point.
|
||
def compute_dependencies( | ||
cmap: dict[TypeVarId, list[Constraint]] | ||
) -> dict[TypeVarId, list[TypeVarId]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm repeating myself, but again unit tests would be nice :-)
... | ||
def id(x: U) -> U: | ||
... | ||
reveal_type(dec(id)) # N: Revealed type is "def [S] (S`1) -> builtins.list[S`1]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add an actual use of the decorator to at least some of the decorator-like use cases, for more end-to-end testing? E.g. something like this:
@dec
def f(...) -> ...: ... # Generic function
reveal_type(f(...))
I also ran some self check benchmarks and this doesn't seem to have a performance effect when disabled, as expected. I also couldn't see much of an impact when enabled. These are still preliminary results. |
@JukkaL thanks for review! I will address everything except the unit tests soon. Unit tests will go into a separate PR. |
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅ |
This is a first follow-up for #15287 (I like how my PR titles sound like research paper titles, LOL) This PR completes the new type inference foundations by switching to a complete and well founded algorithm [1] for transitive closure (that replaces more ad hoc initial algorithm that covered 80% of cases and was good for experimenting with new inference scheme). In particular the algorithm in this PR covers two important edge cases (see tests). Some comments: * I don't intend to switch the default for `--new-type-inference`, I just want to see the effect of the switch on `mypy_primer`, I will switch back to false before merging * This flag is still not ready to be publicly announced, I am going to make another 2-3 PRs from the list in #15287 before making this public. * I am not adding yet the unit tests as discussed in previous PR. This PR is already quite big, and the next one (support for upper bounds and values) should be much smaller. I am going to add unit tests only for `transitive_closure()` which is the core of new logic. * While working on this I fixed couple bugs exposed in `TypeVarTuple` support: one is rare technical corner case, another one is serious, template and actual where swapped during constraint inference, effectively causing outer/return context to be completely ignored for instances. * It is better to review the PR with "ignore whitespace" option turned on (there is big chunk in solve.py that is just change of indentation). * There is one questionable design choice I am making in this PR, I am adding `extra_tvars` as an attribute of `Constraint` class, while it logically should not be attributed to any individual constraint, but rather to the full list of constrains. However, doing this properly would require changing the return type of `infer_constrains()` and all related functions, which would be a really big refactoring. [1] Definition 7.1 in https://inria.hal.science/inria-00073205/document --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
This is a third PR in series following #15287 and #15754. This one is quite simple: I just add basic support for polymorphic inference involving type variables with upper bounds and values. A complete support would be quite complicated, and it will be a corner case to already rare situation. Finally, it is written in a way that is easy to tune in the future. I also use this PR to add some unit tests for all three PRs so far, other two PRs only added integration tests (and I clean up existing unit tests as well).
…as (#15837) This is a third follow-up for #15287 (likely there will be just one more PR, for `TypeVarTuple`s, and few less important items I mentioned in the original PR I will leave for more distant future). After all this PR turned out to be larger than I wanted. The problem is that `Concatenate` support for `ParamSpec` was quite broken, and this caused many of my tests fail. So I decided to include some major cleanup in this PR (I tried splitting it into a separate PR but it turned out to be tricky). After all, if one ignores added tests, it is almost net zero line count. The main problems that I encountered are: * First, valid substitutions for a `ParamSpecType` were: another `ParamSpecType`, `Parameters`, and `CallableType` (and also `AnyType` and `UninhabitedType` but those seem to be handled trivially). Having `CallableType` in this list caused various missed cases, bogus `get_proper_type()`s, and was generally counter-intuitive. * Second (and probably bigger) issue is that it is possible to represent `Concatenate` in two different forms: as a prefix for `ParamSpecType` (used mostly for instances), and as separate argument types (used mostly for callables). The problem is that some parts of the code were implicitly relying on it being in one or the other form, while some other code uncontrollably switched between the two. I propose to fix this by introducing some simplifications and rules (some of which I enforce by asserts): * Only valid non-trivial substitutions (and consequently upper/lower bound in constraints) for `ParamSpecType` are `ParamSpecType` and `Parameters`. * When `ParamSpecType` appears in a callable it must have an empty `prefix`. * `Parameters` cannot contain other `Parameters` (and ideally also `ParamSpecType`s) among argument types. * For inference we bring `Concatenate` to common representation (because both callables and instances may appear in the same expression). Using the `ParamSpecType` representation with `prefix` looks significantly simpler (especially in solver). Apart from this actual implementation of polymorphic inference is simple/straightforward, I just handle the additional `ParamSpecType` cases (in addition to `TypeVarType`) for inference, for solver, and for application. I also enabled polymorphic inference for lambda expressions, since they are handled by similar code paths. Some minor comments: * I fixed couple minor bugs uncovered by this PR (see e.g. test case for accidental `TypeVar` id clash). * I switch few tests to `--new-type-inference` because there error messages are slightly different, and so it is easier for me to test global flip to `True` locally. * I may tweak some of the "ground rules" if `mypy_primer` output will be particularly bad. --------- Co-authored-by: Ivan Levkivskyi <[email protected]>
This is the fifth PR in the series started by #15287, and a last one for the foreseeable future. This completes polymorphic inference sufficiently for extensive experimentation, and enabling polymorphic fallback by default. Remaining items for which I am going to open follow-up issues: * Enable `--new-type-inference` by default (should be done before everything else in this list). * Use polymorphic inference during unification. * Use polymorphic inference as primary an only mechanism, rather than a fallback if basic inference fails in some way. * Move `apply_poly()` logic from `checkexpr.py` to `applytype.py` (this one depends on everything above). * Experiment with backtracking in the new solver. * Experiment with universal quantification for types other that `Callable` (btw we already have a hacky support for capturing a generic function in an instance with `ParamSpec`). Now some comments on the PR proper. First of all I decided to do some clean-up of `TypeVarTuple` support, but added only strictly necessary parts of the cleanup here. Everything else will be in follow up PR(s). The polymorphic inference/solver/application is practically trivial here, so here is my view on how I see large-scale structure of `TypeVarTuple` implementation: * There should be no special-casing in `applytype.py`, so I deleted everything from there (as I did for `ParamSpec`) and complemented `visit_callable_type()` in `expandtype.py`. Basically, `applytype.py` should have three simple steps: validate substitutions (upper bounds, values, argument kinds etc.); call `expand_type()`; update callable type variables (currently we just reduce the number, but in future we may also add variables there, see TODO that I added). * The only valid positions for a variadic item (a.k.a. `UnpackType`) are inside `Instance`s, `TupleType`s, and `CallableType`s. I like how there is an agreement that for callables there should never be a prefix, and instead prefix should be represented with regular positional arguments. I think that ideally we should enforce this with an `assert` in `CallableType` constructor (similar to how I did this for `ParamSpec`). * Completing `expand_type()` should be a priority (since it describes basic semantics of `TypeVarLikeType`s). I think I made good progress in this direction. IIUC the only valid substitution for `*Ts` are `TupleType.items`, `*tuple[X, ...]`, `Any`, and `<nothing>`, so it was not hard. * I propose to only allow `TupleType` (mostly for `semanal.py`, see item below), plain `TypeVarTupleType`, and a homogeneous `tuple` instances inside `UnpackType`. Supporting unions of those is not specified by the PEP and support will likely be quite tricky to implement. Also I propose to even eagerly expand type aliases to tuples (since there is no point in supporting recursive types like `A = Tuple[int, *A]`). * I propose to forcefully flatten nested `TupleType`s, there should be no things like `Tuple[X1, *Tuple[X2, *Ts, Y2], Y1]` etc after semantic analysis. (Similarly to how we always flatten `Parameters` for `ParamSpec`, and how we flatten nested unions in `UnionType` _constructor_). Currently we do the flattening/normalization of tuples in `expand_type()` etc. * I suspect `build_constraints_for_unpack()` may be broken, at least when it was used for tuples and callables it did something wrong in few cases I tested (and there are other symptoms I mentioned in a TODO). I therefore re-implemented logic for callables/tuples using a separate dedicated helper. I will investigate more later. As I mentioned above I only implemented strictly minimal amount of the above plan to make my tests pass, but still wanted to write this out to see if there are any objections (or maybe I don't understand something). If there are no objections to this plan, I will continue it in separate PR(s). Btw, I like how with this plan we will have clear logical parallels between `TypeVarTuple` implementation and (recently updated) `ParamSpec` implementation. --------- Co-authored-by: Ivan Levkivskyi <[email protected]>
Fixes #1317
Fixes #5738
Fixes #12919
(also fixes a
FIX
comment that is more than 10 years old according to git blame)Note: although this PR fixes most typical use-cases for type inference against generic functions, it is intentionally incomplete, and it is made in a way to limit implications to small scope.
This PR has essentially three components (better infer, better solve, better apply - all three are needed for this MVP to work):
constraints.py
: if the actual function is generic, we unify it with template before inferring constraints. This prevents leaking generic type variables of actual in the solutions (which makes no sense), but also introduces new kind of constraintsT <: F[S]
, where type variables we solve for appear in target type. These are much harder to solve, but also it is a great opportunity to play with them to prepare for single bin inference (if we will switch to it in some form later). Note unifying is not the best solution, but a good first approximation (see below on what is the best solution).solve.py
. The full algorithm is outlined in the docstring forsolve_non_linear()
. It looks like it should be able to solve arbitrary constraints that don't (indirectly) contain "F-bounded" things likeT <: list[T]
. Very short the idea is to compute transitive closure, then organize constraints by topologically sorted SCCs.checkexpr.py
. In cases where solver identifies there are free variables (e.g. we have just one constraintS <: list[T]
, soT
is free, and solution forS
islist[T]
) it will apply the solutions while creating new generic functions. For example, if we have a functiondef [S, T] (fn: Callable[[S], T]) -> Callable[[S], T]
applied to a functiondef [U] (x: U) -> U
, this will result indef [T] (T) -> T
as the return.I want to put here some thoughts on the last ingredient, since it may be mysterious, but now it seems to me it is actually a very well defined procedure. The key point here is thinking about generic functions as about infinite intersections or infinite overloads. Now reducing these infinite overloads/intersections to finite ones it is easy to understand what is actually going on. For example, imagine we live in a world with just two types
int
andstr
. Now we have two functions:the first one can be seen as overload over
and second as an overload over
Now what happens when I apply
dec(id)
? We need to choose an overload that matches the argument (this is what we call type inference), but here is a trick, in this case two overloads ofdec
match the argument type. So (and btw I think we are missing this for real overloads) we construct a new overload that returns intersection of matching overloads# 1
and# 4
. So if we generalize this intuition to the general case, the inference is selection of an (infinite) parametrized subset among the bigger parameterized set of intersecting types. The only question is whether resulting infinite intersection is representable in our type system. For exampleforall T. dict[T, T]
can make sense but is not representable, whileforall T. (T) -> T
is a well defined type. And finally, there is a very easy way to find whether a type is representable or not, we are already doing this during semantic analyzis. I use the same logic (that I used to view as ad-hoc because of lack of good syntax for callables) to bind type variables in the inferred type.OK, so here is the list of missing features, and some comments on them:
Sequence[T] <: S <: Sequence[U] => T <: U
ParamSpec
(and probablyTypeVarTuple
). Current support for applying callables withParamSpec
to generics is hacky, and kind of dead-end. Although(Callable[P, T]) -> Callable[P, List[T]]
works when applied toid
, even a slight variation like(Callable[P, List[T]]) -> Callable[P, T]
fails. I think it needs to be re-worked in the framework I propose (the tests I added are just to be sure I don't break existing code)<nothing>
/object
).LRUCache[[x: T], T]
. Btw note that I apply force expansion to type aliases and callback protocols. Since I can't transform e.g.A = Callable[[T], T]
into a generic callable without getting proper type.T <: List[int], T <: List[S]
.I am planning to address at least majority of the above items, but I think we should move slowly, since in my experience type inference is really fragile topic with hard to predict long reaching consequences. Please play with this PR if you want to and have time, and please suggest tests to add.