Check that literal strings/int/float belong to / is excluded from a set/range of values #478
Comments
I don't see how it is "very overkill" to write:

```python
from your_lib import foo, mode
foo(mode.started)
```

instead of

```python
from your_lib import foo
foo('started')
```

Anyway, if we are going to consider something like this, I would propose an already discussed idea: literal types, so that the first example would become:

```python
from typing import Union, Literal

Mode = Union[Literal['started'], Literal['paused'], Literal['cancelled']]

def foo(mode: Mode): ...

foo('started')  # OK
foo('wrong')  # Error!
x = 'started'
foo(x)  # Also an error, we can't track a 'str' variable
```

The same would work, e.g., to describe overloads:

```python
from typing import IO, Any, Literal, overload

@overload
def open(name: str, mode: Literal['r']) -> IO[str]: ...
@overload
def open(name: str, mode: Literal['rb']) -> IO[bytes]: ...
@overload
def open(name: str, mode: str) -> IO[Any]: ...
```

This will not cover integer ranges, but I think that is too advanced for a type checker (I think we could support just simple/common dependent types).
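PEP 586 later standardized this idea as `typing.Literal` (Python 3.8+), where `Literal['started', 'paused', 'cancelled']` is shorthand for the union of single-value literals above. A minimal sketch, assuming a hypothetical `foo` that also wants a runtime check for untyped callers: `typing.get_args` can read the allowed values back out of the alias, so the static type and the runtime validation share one source of truth.

```python
from typing import Literal, get_args

# Equivalent (per PEP 586) to
# Union[Literal['started'], Literal['paused'], Literal['cancelled']]
Mode = Literal["started", "paused", "cancelled"]

# Reuse the alias at runtime so the allowed values cannot drift apart.
VALID_MODES = get_args(Mode)


def foo(mode: Mode) -> str:
    """Hypothetical function: a type checker rejects foo('wrong') statically;
    the check below rejects it at runtime for callers that are not type-checked."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return f"mode set to {mode}"
```

A type checker flags `foo('wrong')` before the code runs; the runtime check only matters for code that is not checked.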
Another use case is booleans, e.g., the return value from … Both boolean flags and strings as enums are endemic within Python's numerical computing ecosystem.
@shoyer It would help us if you could give additional concrete examples of cases where string or boolean values affect types (in numerical libraries or elsewhere).
Let me give examples culled from the API docs for pandas, which I think reflect common usage in numerical libraries. Format: `argument_name=literal_value -> return_type`

- pandas.read_csv
- pandas.cut
- pandas.concat
- pandas.get_dummies
- pandas.to_datetime
- pandas.Series.drop
- pandas.Series.reset_index
- pandas.DataFrame.to_dict

These are not cherry-picked examples; these are common functions/methods used in a large fraction of code using pandas. So in practice, this will be a very real obstacle when/if we try to add type annotations to pandas (pandas-dev/pandas#14468). Certainly pandas does not exhibit best practices here, but nonetheless it's a very popular library.

This list intentionally only includes cases where the type signature itself varies based on the literal value, which almost entirely precludes usefully typing a function unless we have a way of recognizing literal values. There are many more examples, including in libraries like numpy or scipy, where a limited set of strings describes all valid values. From a principled perspective, most of these should probably be using enums instead of strings or booleans, but the value-dependent semantics remain.
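To make the "return type varies with a literal argument" case concrete, here is a sketch of how overloads on `Literal[True]` / `Literal[False]` express it. The function `load` is hypothetical and is not a pandas API; it only mirrors the pattern of a flag that switches between returning data eagerly and returning an iterator.

```python
from typing import Iterable, Iterator, List, Literal, Union, overload


# Hypothetical function: the return type depends on a boolean flag.
@overload
def load(lines: Iterable[str], iterator: Literal[False] = ...) -> List[str]: ...
@overload
def load(lines: Iterable[str], iterator: Literal[True]) -> Iterator[str]: ...
@overload
def load(lines: Iterable[str], iterator: bool) -> Union[List[str], Iterator[str]]: ...


def load(lines: Iterable[str], iterator: bool = False) -> Union[List[str], Iterator[str]]:
    # Strip whitespace from each line, lazily (generator) or eagerly (list).
    stripped = (line.strip() for line in lines)
    return stripped if iterator else list(stripped)
```

The last overload, taking a plain `bool`, covers call sites where the flag is not a literal, so those calls still type-check with the wider union return type.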
@shoyer Thanks for the additional examples! They are very helpful. The current type system clearly seems to work poorly for numerical Python libraries. Here are a few additional things that would be useful for moving this forward:
Another example is …
Looking through NumPy, I notice some themes. There are a few common arguments that only have particular valid values (enum equivalents):

These are found on a very large portion of NumPy's API (e.g., most array creation routines, shape-changing routines, copying routines and all ufuncs). NumPy doesn't use enums, so the "strings as enums" pattern is endemic throughout the library.

There are also a few functions whose return value depends on boolean flags: …

I have not looked into the more esoteric corners of the library (e.g., masked arrays, …).

So on the whole, NumPy is certainly in a much better position than pandas: there are only a handful of functions where the return type depends on literal values (although they are widely used).

I'm not going to bother going through SciPy, as the API is larger and more varied and I'm less familiar with it, but I hope you'll trust me that its situation is very similar to that of NumPy. There is a large overlap in the community maintaining both libraries, and for a short while they were even integrated in a single project. Certainly strings are used as enums throughout. I don't know if there are any commonly used functions that can return multiple types.

One last thing I'll note is that there are quite a few further examples of type-unstable behavior (even in NumPy) if we ever tried to make array shapes part of the type system.
Note that we solved the specific problem with … In theory you can now write numpy-specific plugins that do the same thing for the numpy APIs you list above, as long as the call sites typically pass literals. Adding a new mechanism to the type system that would let you define a subtype of …
While plugins work, they are less than ideal. I think it would be much better to have literal types in the type system. It is a fairly common pattern for literal keyword arguments to cause different behavior. Additionally, in my mind it is much more maintainable to write the types via an overload than via a plugin (less indirection, less magic, etc.). That being said, for smaller cases, plugins are an acceptable stop-gap measure.
No argument there! We've just been worried about the cost of adding literal types to the type system vs. the cost of developing a plugin system. (Also, the syntax for the …
Another example of use in … `var: Literal['value']`.

2) Constant qualifier; it is needed for situations like this:

```python
@overload
def func(mode: Literal['b']) -> IO[bytes]: ...
@overload
def func(mode: Literal['s']) -> IO[str]: ...

MODE: Const[str] = 'b'
func(MODE)  # this should be IO[bytes]
```

This example is oversimplified, but I have seen such patterns for data types, array/matrix dimensions/sizes, etc. (see the next point).

I think that we could move step by step here. The first two are actually not so hard to implement, would already cover a large amount of numeric code, and could also be useful for other (non-numeric) code. The idea is that …
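The `Const` qualifier above is a proposal, not an existing construct. The closest standard equivalent today is `typing.Final` (PEP 591): mypy treats a `Final` name initialized with a literal as that literal type where one is expected, so `func(MODE)` should resolve to the `IO[bytes]` overload. A sketch of the same example, with a hypothetical implementation that returns in-memory buffers:

```python
import io
from typing import IO, Final, Literal, Union, overload


@overload
def func(mode: Literal["b"]) -> IO[bytes]: ...
@overload
def func(mode: Literal["s"]) -> IO[str]: ...


def func(mode: str) -> Union[IO[bytes], IO[str]]:
    # Hypothetical implementation: an in-memory binary or text buffer.
    if mode == "b":
        return io.BytesIO()
    if mode == "s":
        return io.StringIO()
    raise ValueError(f"unknown mode {mode!r}")


# Final plays the role of the proposed Const: MODE can never be rebound,
# so a checker may treat it as Literal['b'] when picking an overload.
MODE: Final = "b"
buf = func(MODE)
```

Without `Final`, `MODE = 'b'` would be inferred as plain `str`, and no overload above would accept it.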
This is true, but strings like …

@gvanrossum @ilevkivskyi I assume by "plugin" you are referring specifically to mypy?

@ilevkivskyi I'm not sure I understand why …
The mypy plugin approach really only works well if the special signatures are rare enough. These "special" cases don't sound very special at all in numpy and pandas, so a plugin-based approach is less than optimal. Also, the plugin approach is very specific to mypy.

@ilevkivskyi Your proposal sounds pretty reasonable. I agree that 3) sounds much harder than the rest. Before considering implementation, it would make sense to have some partial draft stubs for numpy and pandas that use the proposed features. There is a risk that there are other problems (such as array shapes) that we'd also need to solve before we can have useful stubs. Here are some additional comments:
BTW, value types could be useful in a broader sense, not only for literals per se. For example, one may have overloaded functions for various enum members:

```python
from enum import Enum
from typing import Any, Literal, overload

class Color(Enum):
    RED = 1
    GREEN = 2

@overload
def f(x: Literal[Color.RED]) -> Any: ...
@overload
def f(x: Literal[Color.GREEN]) -> Any: ...
```
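A runnable variant of the enum example, assuming (hypothetically) that each member selects a different return type instead of `Any`, and adding the implementation that the overloads describe:

```python
from enum import Enum
from typing import Literal, Union, overload


class Color(Enum):
    RED = 1
    GREEN = 2


# Hypothetical: each enum member selects a different return type.
@overload
def f(x: Literal[Color.RED]) -> int: ...
@overload
def f(x: Literal[Color.GREEN]) -> str: ...


def f(x: Color) -> Union[int, str]:
    if x is Color.RED:
        return x.value        # int payload for RED
    return x.name.lower()     # str payload for GREEN
```

A checker then infers `int` for `f(Color.RED)` and `str` for `f(Color.GREEN)`. A call with a `Color` value that is not a known member would need an extra overload taking `Color`.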
@vlasovskikh Good point, enums are a good match as well. I remember another discussion where overloading based on enum values was proposed but can't find it now. |
Another example is default arguments, where you want to detect that no value was passed. You could do that with …

@TeamSpen210 Can you give an example of how this would work with default arguments?
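One way to detect that no value was passed, sketched here as an assumption and not necessarily what the commenter had in mind, is a single-member enum used as a sentinel. Because the sentinel has its own literal type, `None` stays available as an ordinary value, and a checker can narrow the parameter after an `is` test. The names `_Missing` and `lookup` are hypothetical.

```python
from enum import Enum
from typing import Dict, Literal, Optional, Union


class _Missing(Enum):
    # Single-member enum: its only value means "argument not passed".
    MISSING = "MISSING"


def lookup(
    table: Dict[str, Optional[int]],
    key: str,
    default: Union[Optional[int], Literal[_Missing.MISSING]] = _Missing.MISSING,
) -> Optional[int]:
    """Hypothetical helper: distinguish 'no default given' from default=None."""
    if key in table:
        return table[key]
    if default is _Missing.MISSING:
        raise KeyError(key)
    # Here a type checker should have narrowed `default` to Optional[int].
    return default
```

A plain `object()` sentinel also works at runtime, but it cannot be spelled as its own type in the signature.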
Sorry for the long silence; here are some comments:
I think …
I also wanted to say this, but I am not sure, since it would be the single case where …
I am fine with either.
I think we can allow …
I think we can implement it first only for global and local variables. For class/instance variables this may be harder and seems related to python/mypy#4019.
I think the simplest way is to infer …
Do you mean any expression of …
But the matrix multiply case does not work. The following code shows an example of a type constraint:
Has there been any progress with the discussion about this in another place?
At the PyCon typing meeting, most people liked the idea; now someone just needs to implement it. The original plan for mypy was the end of summer, but no guarantees.
Will …
a) …
I see, thank you :)
Small update: we've started work on a PoC implementation of literal types in mypy and are hoping to have the core implementation work done sometime around early 2019. We also have a preliminary draft of what we think the semantics of …
Thanks for the update, @Michael0x2a. I have two questions about the process:
I think that we can wait until there has been some discussion of the TypedDict PEP (which still needs to be written). It still seems possible to me that the definition syntax may be tweaked during the PEP process.
In what forum is the TypedDict PEP going to be developed? I would love to be involved, as I'm working on implementing TypedDicts in Pyre now and want to make sure we're heading in a standards-compliant direction.
@mrkmndz We'll share drafts of the TypedDict PEP on the typing-sig mailing list. It may be best to develop the initial draft as a GitHub PR to make commenting easy. This might happen in December/January. If you have questions before that, feel free to open an issue here or on the mypy issue tracker.
@dkgi -- I opened a pull request for my branch as Ivan suggested: Michael0x2a/peps#1
Most things discussed in this issue are now supported by …
Opened in python/mypy#4040, but moved here on @JukkaL's advice.
Some debate took place there, but I'll copy the original post here for context:

It's a common practice to pass literal strings as arguments. In Python, it's even more important, as strings are often used instead of byte flags, constants or enums.

You often end up checking if those literals are passed correctly so you can give some debug information:

- sorry, the parameter mode expects a string among "started, paused, cancelled";
- you can only use an integer between 0 and 16.
- etc.

The problem with that is that it's done at runtime, despite the fact that those arguments are completely static. So the IDE can't use this information to help you write code. Typically, I would have code completion and linting helping me with "started/paused/cancelled" if it were an enum, but not if it's a string.

With this concept I could do:

So that mypy could alert my users if they do:

```python
foo("Start", 6)
```

Of course, it's only for string/int/float literals, maybe bytes and tuples, as we need the information to be immutable and simple to be usable.
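The integer half of this request (0 to 16) is the part that literal types do not cover well. A sketch of one common workaround, assumed here rather than proposed in the thread: a `typing.NewType` whose only sanctioned constructor performs the range check once, so a type checker forces callers through it. `Level`, `make_level` and this `foo` are hypothetical.

```python
from typing import Literal, NewType

Mode = Literal["started", "paused", "cancelled"]

# Hypothetical: a distinct static type for "int already checked to be in 0..16".
Level = NewType("Level", int)


def make_level(n: int) -> Level:
    # The single place where the range is validated (at runtime).
    if not 0 <= n <= 16:
        raise ValueError(f"you can only use an integer between 0 and 16, got {n}")
    return Level(n)


def foo(mode: Mode, level: Level) -> str:
    # mode is checked statically via Literal; level via the NewType.
    return f"{mode}:{level}"
```

With this, a checker rejects `foo("Start", 6)` twice: `"Start"` is not one of the literal modes, and a bare `int` is not a `Level`. `foo("started", make_level(6))` passes.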