Skip to content
This repository has been archived by the owner on Jun 10, 2020. It is now read-only.

[WIP] Typehint np.random #11

Closed
wants to merge 5 commits into from
Closed

[WIP] Typehint np.random #11

wants to merge 5 commits into from

Conversation

alanhdu
Copy link
Contributor

@alanhdu alanhdu commented Mar 1, 2018

No description provided.

@alanhdu
Copy link
Contributor Author

alanhdu commented Mar 1, 2018

This isn't ready yet, but I wanted to get feedback about whether this was the right approach (lots of @overloads for type-hinting). The problems I have with it are:

  • It's verbose and irritating (I actually have a jinja2 template that generates this file), although this is a one-time cost so I'm not too concerned.
  • overload semantics are a little in flux -- because of error: Overloaded function signatures 1 and 2 overlap with incompatible return types python/mypy#4020, some of the overloads are counted as overlapping from MyPy and return an error (see that last commit)
  • The type signatures aren't strictly correct -- if you pass in a scalar ndarray into one of these functions, you get a scalar non-ndarray out. I'm not sure how to deal with this until we can distinguish scalar ndarrays and non-scalar ndarrays.

I'm not sure what the alternative approaches are -- I thought about trying to create a special _Distribution that subclasses Callable and somehow does all of the broadcasting for us, but I couldn't actually figure out how to get that to work.

@overload
def beta(self, a: float, b: float) -> float: ...
@overload
def beta(self, a: float, b: float, size: _Size) -> float: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If size is specified, you get an ndarray, not a float.

@overload
def beta(self, a: float, b: float, size: _Size) -> float: ...
@overload
def beta(self, a: ndarray, b: float) -> ndarray: ...
Copy link
Member

@shoyer shoyer Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this isn't quite right -- if you pass in a zero-dimensional ndarray, you get a float back, not a ndarray:

In [17]: rs.beta(np.array(2), 2)
Out[17]: 0.3340049666412606

This sort of behavior (zero dimensional arrays -> scalars) is unfortunately quite common with NumPy.

EDIT: I see you already noted this in your comment :)

@overload
def choice(self, a: int, *,
replace: bool = ..., p: Optional[Sequence[float]] = ...
) -> int: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is np.int64, not int (at least not on Python 3)

@overload
def choice(self, a: Sequence[_T], *,
replace: bool = ..., p: Optional[Sequence[float]] = ...
) -> _T: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a gets cast to an ndarray, so the return value is the corresponding NumPy scalar type.

_T = TypeVar('_T')

class RandomState:
def __init__(self, state: Optional[Union[int, ndarray]] = ...) -> None: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor argument is called seed, not state:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.RandomState.html

Also, it can accept sequences of integers ("array_like") not just ndarray instances.

@overload
def beta(self, a: float, b: ndarray) -> ndarray: ...
@overload
def beta(self, a: ndarray, b: ndarray) -> ndarray: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method can also accept arbitrary "array-like" types which it coerces to ndarrays, e.g., lists of numbers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... I'm not sure how to represent "array-like" types. A naive approach would be something like:

_ArrayLike = Union[Sequence[_T], ndarray]

But nested sequences are also _ArrayLike and MyPy doesn't have support for recursive types (which we'd need for arbitrarily nested sequences).

@shoyer
Copy link
Member

shoyer commented Mar 1, 2018

It's verbose and irritating (I actually have a jinja2 template that generates this file), although this is a one-time cost so I'm not too concerned.

There are going to be quite a few functions like these (e.g., all NumPy ufuncs), so it may make sense to incorporate some sort of template generation in the build process for this library.

I guess ufuncs are objects, we so might be able to specify the type-casting rules once (for each number of function arguments) on the base class. But there are still lots of NumPy functions/methods that aren't ufuncs...

overload semantics are a little in flux -- because of python/mypy#4020, some of the overloads are counted as overlapping from MyPy and return an error (see that last commit)

This is more concerning to me -- it basically blocks anyone from using our annotations with MyPy. I think we need to ensure that mypy can use our annotations without errors.

The type signatures aren't strictly correct -- if you pass in a scalar ndarray into one of these functions, you get a scalar non-ndarray out. I'm not sure how to deal with this until we can distinguish scalar ndarrays and non-scalar ndarrays.

Yes, this sort of behavior is very unfortunate for us, and unfortunately is also quite prevalent in NumPy. My inclination is that our type-stubs should prioritize correctness more than catching all possible errors. Zero-dimensional arrays are somewhat usual to see in NumPy, but they do come up. Without typing support for array dimensionality, this would mean that many of these return values should be unsatisfying Any types.

@alanhdu
Copy link
Contributor Author

alanhdu commented Mar 2, 2018

I believe I've figured out how to hack around the @overload problem -- by explicitly annotating size: None, MyPy no longer thinks that they're conflicting types.

There are going to be quite a few functions like these (e.g., all NumPy ufuncs), so it may make sense to incorporate some sort of template generation in the build process for this library.

Yeah, I think some template generation "build" makes a lot of sense. my approach w/ Jinja2 is pretty ugly (because of how it handles whitespace) and still fails flake8 despite my best efforts. Do you know of a good way to template out the AST level (or something that's not whitespace sensitive?)? For prettifying, I've thought about using yapf or some other autoformatter (although we might have to tweak them for .pyi files).

Yes, this sort of behavior is very unfortunate for us, and unfortunately is also quite prevalent in NumPy. My inclination is that our type-stubs should prioritize correctness more than catching all possible errors. Zero-dimensional arrays are somewhat usual to see in NumPy, but they do come up. Without typing support for array dimensionality, this would mean that many of these return values should be unsatisfying Any types.

I'm willing to defer to your judgement here, but I'd personally lean the other way -- IME w/ MyPy is still limited enough that you almost never have "seamless" integration with any reasonably large codebase, and given that you have to adapt the coding style (to write "type-friendly" code) and do integration work anyways, I think it's reasonable to prioritize "correctness" (as long as the burden's not too much, like having to add some int and float casts for numpy scalars).

In any case, I think Any is an ok return type -- we could also return a Union[int, ndarray] or whatever, although that's arguably the worst of both worlds (since you'd almost certainly need to do a type assertion on the output anyways).

It conflicts with `@overload`s in type stubs
@alanhdu alanhdu force-pushed the random branch 2 times, most recently from b38cc2b to 813f0b9 Compare July 6, 2018 23:02
Better overloading rules
@alanhdu
Copy link
Contributor Author

alanhdu commented Jul 6, 2018

@shoyer Sorry for the (very long) delay!

I've pushed a new revision which (I believe) fully types out np.random (although I need to go through and double-check and write some tests). I decided to start over with a custom stub generator (in generate/random.py and generate/utils.py), which I found to be much smoother and easier to read than trying to use Jinja2 (significant whitespace meant that Jinja2 was finicky).

To avoid having to think too much about formatting, I also used https://github.com/ambv/black as a (manual) post-processing step of the generated stub files.

I think the main unresolved questions I have are:

  • Does this light-weight stub generator look good, or does it need more structure?
  • What's the best way to integrate a stub generator and the stubs? do we check both of them into the repo? Do we automatically run the generator in CI?
  • Should black just autoformat everything in the repo? (I think the answer is yes)
  • I don't think we resolved the behavior with 0-D ndarrays. I still believe that the current behavior is the "best" compromise, although I'm willing to defer to whatever you think's best.

WDYT?

@alanhdu alanhdu mentioned this pull request Nov 20, 2018
person142 added a commit to person142/numpy-stubs that referenced this pull request Mar 26, 2020
The most significant benefit is changes to `@overload` around what
counts as overlapping signatures that could unlock some functionality;
see e.g.

- numpy#44 (comment)
- numpy#11 (comment)
rgommers pushed a commit that referenced this pull request Mar 29, 2020
The most significant benefit is changes to `@overload` around what
counts as overlapping signatures that could unlock some functionality;
see e.g.

- #44 (comment)
- #11 (comment)
@person142
Copy link
Member

Closing; but feel free to make a new PR against the NumPy main repo!

@person142 person142 closed this Jun 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants