Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removed ABCs from pandas._typing #27424

Merged
merged 14 commits into from
Jul 24, 2019
Merged

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Jul 16, 2019

This reveals a few new typing issues so won't pass CI but submitting as draft

Here's one way I think that works to avoid import cycles. Basically any internal imports in pandas._typing need to be set in a TYPE_CHECKING block to prevent their runtime evaluation from causing import cycles.

To illustrate the effectiveness like this if you put something like this at the end of pandas._typing:

def func(obj: FrameOrSeries) -> None:
    reveal_type(obj)

It would previously give you

pandas/_typing.py:43: note: Revealed type is 'Any'

Whereas now gives:

pandas/_typing.py:44: note: Revealed type is 'pandas.core.series.Series*'
pandas/_typing.py:44: note: Revealed type is 'pandas.core.frame.DataFrame*'

@topper-123 any way to check VSCode to see if this works with code completion?

In the long run this may not be needed with PEP 563:

https://www.python.org/dev/peps/pep-0563/

But I think this hack gets us there in the meantime

@WillAyd WillAyd added the Typing type annotations, mypy/pyright type checking label Jul 16, 2019
@WillAyd
Copy link
Member Author

WillAyd commented Jul 16, 2019

So the mypy complaints may be pointing to unexpected behavior with ExtensionArrays since casting is not part of their signature. To illustrate:

>>> import pandas as pd
>>> from pandas.core.dtypes.common import ensure_int_or_float

>>> ensure_int_or_float(pd.Series(range(3), dtype='Int64'))
0    0
1    1
2    2
dtype: int64

# Directly against PandasArray now
>>> ensure_int_or_float(pd.Series(range(3), dtype='Int64').array)
array([0., 1., 2.])  # returns float

Can ignore for now but bringing up in case it yields any insights @jreback @TomAugspurger

The IntervalIndex complaint may or may not be a bug as well @jschendel

@topper-123
Copy link
Contributor

I've tried it using reveal_type on on klass in pandas/groupby/generic.py, and it works.

PyCharm doesn't give code completions on this though, but that could possibly be that TypeVars are not implemented in PyCharm, maybe. I haven't myself gotten TypeVars to work in PyCharm, and I've tried several versions.

I don't use VSCode, so I don't if it gives completions there

@WillAyd WillAyd marked this pull request as ready for review July 16, 2019 23:20
@WillAyd
Copy link
Member Author

WillAyd commented Jul 20, 2019 via email

@pep8speaks
Copy link

pep8speaks commented Jul 21, 2019

Hello @WillAyd! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-07-24 15:11:11 UTC

FilePathOrBuffer = Union[str, Path, IO[AnyStr]]

FrameOrSeries = TypeVar("FrameOrSeries", ABCSeries, ABCDataFrame)
FrameOrSeries = TypeVar("FrameOrSeries", "Series", "DataFrame")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
FrameOrSeries = TypeVar("FrameOrSeries", "Series", "DataFrame")
FrameOrSeries = Union["Series", "DataFrame"]

quote from https://mypy.readthedocs.io/en/latest/generics.html...
"User-defined generics are a moderately advanced feature and you can get far without ever using them..."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks but this just loosens the type system rather than actually fixing anything. TypeVar is going to be generally more useful for checking functions that can be fully generic in nature.

Might just change the return of this one and see how many others require Union in the future

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. Union[Series, DataFrame] might be better written as NDFrame anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the "user-defined generics" you are referring to are more applicable to containers not TypeVars. Right now we just use a blanket Series as a return object, though in the future we could do something like Series[int] and Series[str], etc...; the Series would be the user-defined generic in that case

The TypeVar in the docs you linked is just a way of parametrizing that user-defined generic, so that a Series[int] would have to stay as a Series[int] through it's lifecycle; without that parametrization we allow Series[int] to become Series[str] without any complaints from mypy today

We are probably a ways off of doing user-defined generics but this is great that you looked into it. Certainly open to ideas on that front if you think of a good way to implement as we get more familiar with these annotations

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. Union[Series, DataFrame] might be better written as NDFrame anyway?

Hmm that would work though we don't typically import NDFrame anywhere so I don't think want to start here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave as FrameOrSeries as its more descriptive

@simonjayhawkins
Copy link
Member

@WillAyd

taken a quick look as requested in #27512 (comment)

There's some failures showing up in there in the io.formats file so may uncover some others here as well since that's a central module getting touched

there is only one mypy error pandas/core/window.py:266: error: Incompatible return value type (got "Series", expected "DataFrame") and that can be fixed by using a Union instead of TypeVar.

@@ -906,35 +906,35 @@ def get_indexer(
)
raise InvalidIndexError(msg)

target = ensure_index(target)
target_as_index = ensure_index(target)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you also make AnyArrayLike an Alias instead of a TypeVar, then this module doesn't need to be changed at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing though - this just loosens the type checking which isn't desired. Actually moved towards TypeVars for reasons described in #26453 (comment)

Might update #27050 to include some of that info

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually moved towards TypeVars for reasons described in #26453 (comment)

i don't think that applies here.

target does not need to be a TypeVar, the type of target does not need to be maintained.

target can be any of ["ExtensionArray", "Index", "Series", "SparseSeries", np.ndarray]

ensure_index is not yet typed. but presumably will have a return type of Index.

so reassigning with an Index to a variable that has a type of Union of ["ExtensionArray", "Index", "Series", "SparseSeries", np.ndarray] is perfectly valid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add Union alternatives for each TypeVar in the central module but I think that confounds the point of generic programming and/or makes our type system weaker. Another option would be to allow redefinition of variables which mypy supplies a setting for:

https://mypy.readthedocs.io/en/latest/command_line.html?highlight=allow-redefinition#miscellaneous-strictness-flags

But I also think that makes for a weaker type system, and generally there's not a lot of downside to creating a separate variable here instead of allowing it's type to implicitly be altered by the return of ensure_index

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be to allow redefinition of variables which mypy supplies a setting for:

https://mypy.readthedocs.io/en/latest/command_line.html?highlight=allow-redefinition#miscellaneous-strictness-flags

But I also think that makes for a weaker type system

definitely. we should rule this out.

generally there's not a lot of downside to creating a separate variable here instead of allowing it's type to implicitly be altered by the return of ensure_index

disagree. the return type of ensure_index would need to conform to the type of target.

is creating new variables not weakening the type system?

@WillAyd
Copy link
Member Author

WillAyd commented Jul 22, 2019

@simonjayhawkins thanks for checking these. Actually was getting an error locally in io.formats which is why I pinged you on this but doesn't appear on CI here so might have just been a local merge issue. I'll push an update to the issue here today to see how it looks

@@ -240,7 +240,7 @@ def _prep_values(self, values: Optional[np.ndarray] = None) -> np.ndarray:

return values

def _wrap_result(self, result, block=None, obj=None) -> FrameOrSeries:
def _wrap_result(self, result, block=None, obj=None):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not ideal to remove FrameOrSeries here but mypy was complaining since this function is not fully generic (i.e. a conditional branch could determine the returned object)

Options discussed were:

  1. Use Union["Series, "DataFrame"] as a return type
  2. Use NDFrame as a return type
  3. Remove

I looked at option 1 first it caused a lot of flake8 complaints with other imports in the module where we import Series directly within a function, so would add a lot of noqa directives to the file to make it work. Option 2 was suggested by @simonjayhawkins and would be viable but I figured just remove for now and can open up a follow up issue to try and make this function generic.

It was type checking to Any before this change anyway so we aren't loosing any inference. Open to thoughts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was type checking to Any before this change anyway so we aren't loosing any inference.

adding Any (or not typing) is not so bad. There is a network effect benefit from typing, so the quicker we add type hints to modules, the quicker we see a benefit.

having Any can easily be picked up in a future pass with one of many cli flags to mypy.

so I don't think any PRs adding type annotations should be held pending perfection (and especially unrelated PRs with "Any interest in adding an annotation while you are looking at this?" . xref #26795)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so we are on the same page anything without an annotation is implicitly Any so typing it as such doesn't change anything.

How do you view the Any annotations currently in the code? I interpret that as "this has already been reviewed and really should allow anything" versus "I just added Any to get this to pass". I don't see a lot of value add to the latter since that's what happens implicitly without an annotation anyway but certainly a point for discussion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I interpret that as "this has already been reviewed and really should allow anything" versus "I just added Any to get this to pass". I don't see a lot of value add to the latter since that's what happens implicitly without an annotation

+1 for this. I strongly prefer to leave something un-typed as an invitation to the next person to try to figure it out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this. I strongly prefer to leave something un-typed as an invitation to the next person to try to figure it out.

just done an experiment.. it appears that Any is considered a type annotation and so the code gets checked, whereas leaving it out causes the type checker to not check the code.

so -1 for not adding Any to un-typed function signatures.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's a case where having a function typed with Any everywhere is valuable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's a case where having a function typed with Any everywhere is valuable?

not everywhere!

just where they are currently omitted because the union is of too many types. such as key in indexing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it appears that Any is considered a type annotation and so the code gets checked, whereas leaving it out causes the type checker to not check the code.

Have to double check but it's not the field you were actually annotated that throws the error as much as something else right? I think mypy might skip a function if it has no annotations so adding Any may cause it to throw errors for inferences of local variables it can't make.

With that said I really don't think adding Any is a great approach. If we wanted the former there is a --check-untyped-def flag we could enable.

This thread is probably best as a separate issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to double check but it's not the field you were actually annotated that throws the error as much as something else right? I think mypy might skip a function if it has no annotations so adding Any may cause it to throw errors for inferences of local variables it can't make.

that's correct. or even incorrect calls to typed functions.

With that said I really don't think adding Any is a great approach.

disagree.

If we wanted the former there is a --check-untyped-def flag we could enable.

ok locally for a module, but how to set-up on the CI? if we are typing gradually we are not yet ready for this.

Adding Any to a function, indicates the body of that function is ready to be type checked.

This thread is probably best as a separate issue

i'm opening another PR shortly, where Any is added for all unions of 5 or more items. https://monkeytype.readthedocs.io/en/latest/generation.html#monkeytype.typing.RewriteLargeUnion so we can continue the discusion there.

@jreback
Copy link
Contributor

jreback commented Jul 23, 2019

@WillAyd can you rebase

@jreback jreback added this to the 1.0 milestone Jul 24, 2019
@jreback
Copy link
Contributor

jreback commented Jul 24, 2019

@WillAyd lgtm. can you rebase once more (as merged bunch of things). looks good to merge (I know you had some open quesitons, but if ok can address in followup)

@WillAyd WillAyd merged commit 3b96ada into pandas-dev:master Jul 24, 2019
@WillAyd
Copy link
Member Author

WillAyd commented Jul 24, 2019

Thanks for review @jreback and @simonjayhawkins

@WillAyd WillAyd deleted the remove-abcs branch July 24, 2019 16:43
quintusdias pushed a commit to quintusdias/pandas_dev that referenced this pull request Aug 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove ABC Usage from pandas._typing
7 participants