-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST/TYP: start testing public api types #40202
Conversation
ballpark, how big should we expect this form of testing to get? e.g. 3 files for every .py file in core/? |
Hello @simonjayhawkins! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-03-04 12:28:38 UTC |
from #40200
so I would expect that these would be added to the typing tests. so the file will grow at the rate of PRs adding typing ( that we want to test) to public apis I added a I don't think we need to mirror the source code structure for this as we are unlikely to test typing for internals. (although we could have an internals section if there are some methods that require complex overloading that would benefit from testing) The structure (and scope) either needs more discussion if it is a concern, or be allowed to evolve over time. |
I hadn't appreciated this straight away, but there also needs to be a test for the bool case. e.g. # t.py
import pandas as pd
inplace: bool
reveal_type(pd.DataFrame([1,2,3]).reset_index(level=0))
reveal_type(pd.DataFrame([1,2,3]).reset_index(inplace=False))
reveal_type(pd.DataFrame([1,2,3]).reset_index(drop=True, inplace=True))
reveal_type(pd.DataFrame([1,2,3]).reset_index(level=0, inplace=True))
reveal_type(pd.DataFrame([1,2,3]).reset_index(level=0, drop=False, inplace=True))
reveal_type(pd.DataFrame([1,2,3]).reset_index(level=0, inplace=True, col_fill='a'))
reveal_type(pd.DataFrame([1,2,3]).reset_index(level=0, col_fill='a'))
reveal_type(pd.DataFrame([1,2,3]).reset_index(inplace=True))
reveal_type(pd.DataFrame([1,2,3]).reset_index(inplace=inplace))
reveal_type(pd.DataFrame([1,2,3]).reset_index(drop=True, inplace=inplace))
reveal_type(pd.DataFrame([1,2,3]).reset_index(level=0, inplace=inplace))
reveal_type(pd.DataFrame([1,2,3]).reset_index(level=0, drop=False, inplace=inplace))
reveal_type(pd.DataFrame([1,2,3]).reset_index(level=0, inplace=inplace, col_fill='a'))
reveal_type(pd.DataFrame([1,2,3]).reset_index(inplace=inplace)) on master this gives: $ mypy t.py
t.py:5: note: Revealed type is 'pandas.core.frame.DataFrame'
t.py:6: note: Revealed type is 'pandas.core.frame.DataFrame'
t.py:7: note: Revealed type is 'None'
t.py:8: note: Revealed type is 'None'
t.py:9: note: Revealed type is 'None'
t.py:10: note: Revealed type is 'None'
t.py:11: note: Revealed type is 'pandas.core.frame.DataFrame'
t.py:12: note: Revealed type is 'None'
t.py:13: error: No overload variant of "reset_index" of "DataFrame" matches argument type "bool" [call-overload]
t.py:13: note: Possible overload variants:
t.py:13: note: def reset_index(self, level: Union[Hashable, Sequence[Hashable], None] = ..., drop: bool = ..., inplace: Literal[False] = ..., col_level: Hashable = ..., col_fill: Hashable = ...) -> DataFrame
t.py:13: note: def reset_index(self, level: Union[Hashable, Sequence[Hashable], None] = ..., drop: bool = ..., inplace: Literal[True] = ..., col_level: Hashable = ..., col_fill: Hashable = ...) -> None
t.py:13: note: Revealed type is 'Any'
t.py:14: error: No overload variant of "reset_index" of "DataFrame" matches argument types "bool", "bool" [call-overload]
t.py:14: note: Possible overload variants:
t.py:14: note: def reset_index(self, level: Union[Hashable, Sequence[Hashable], None] = ..., drop: bool = ..., inplace: Literal[False] = ..., col_level: Hashable = ..., col_fill: Hashable = ...) -> DataFrame
t.py:14: note: def reset_index(self, level: Union[Hashable, Sequence[Hashable], None] = ..., drop: bool = ..., inplace: Literal[True] = ..., col_level: Hashable = ..., col_fill: Hashable = ...) -> None
t.py:14: note: Revealed type is 'Any'
t.py:15: error: No overload variant of "reset_index" of "DataFrame" matches argument types "int", "bool" [call-overload]
t.py:15: note: Possible overload variants:
t.py:15: note: def reset_index(self, level: Union[Hashable, Sequence[Hashable], None] = ..., drop: bool = ..., inplace: Literal[False] = ..., col_level: Hashable = ..., col_fill: Hashable = ...) -> DataFrame
t.py:15: note: def reset_index(self, level: Union[Hashable, Sequence[Hashable], None] = ..., drop: bool = ..., inplace: Literal[True] = ..., col_level: Hashable = ..., col_fill: Hashable = ...) -> None
t.py:15: note: Revealed type is 'Any'
t.py:16: error: No overload variant of "reset_index" of "DataFrame" matches argument types "int", "bool", "bool" [call-overload]
t.py:16: note: Possible overload variants:
t.py:16: note: def reset_index(self, level: Union[Hashable, Sequence[Hashable], None] = ..., drop: bool = ..., inplace: Literal[False] = ..., col_level: Hashable = ..., col_fill: Hashable = ...) -> DataFrame
t.py:16: note: def reset_index(self, level: Union[Hashable, Sequence[Hashable], None] = ..., drop: bool = ..., inplace: Literal[True] = ..., col_level: Hashable = ..., col_fill: Hashable = ...) -> None
t.py:16: note: Revealed type is 'Any'
t.py:17: error: No overload variant of "reset_index" of "DataFrame" matches argument types "int", "bool", "str" [call-overload]
t.py:17: note: Possible overload variants:
t.py:17: note: def reset_index(self, level: Union[Hashable, Sequence[Hashable], None] = ..., drop: bool = ..., inplace: Literal[False] = ..., col_level: Hashable = ..., col_fill: Hashable = ...) -> DataFrame
t.py:17: note: def reset_index(self, level: Union[Hashable, Sequence[Hashable], None] = ..., drop: bool = ..., inplace: Literal[True] = ..., col_level: Hashable = ..., col_fill: Hashable = ...) -> None
t.py:17: note: Revealed type is 'Any'
t.py:18: error: No overload variant of "reset_index" of "DataFrame" matches argument type "bool" [call-overload]
t.py:18: note: Possible overload variants:
t.py:18: note: def reset_index(self, level: Union[Hashable, Sequence[Hashable], None] = ..., drop: bool = ..., inplace: Literal[False] = ..., col_level: Hashable = ..., col_fill: Hashable = ...) -> DataFrame
t.py:18: note: def reset_index(self, level: Union[Hashable, Sequence[Hashable], None] = ..., drop: bool = ..., inplace: Literal[True] = ..., col_level: Hashable = ..., col_fill: Hashable = ...) -> None
t.py:18: note: Revealed type is 'Any'
Found 6 errors in 1 file (checked 1 source file) while on the latest commit from #40200 it gives: $ mypy t.py
t.py:5: note: Revealed type is 'pandas.core.frame.DataFrame'
t.py:6: note: Revealed type is 'pandas.core.frame.DataFrame'
t.py:7: note: Revealed type is 'None'
t.py:8: note: Revealed type is 'None'
t.py:9: note: Revealed type is 'None'
t.py:10: note: Revealed type is 'None'
t.py:11: note: Revealed type is 'pandas.core.frame.DataFrame'
t.py:12: note: Revealed type is 'None'
t.py:13: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
t.py:14: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
t.py:15: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
t.py:16: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
t.py:17: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
t.py:18: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]' |
Anyway, thanks @simonjayhawkins for doing this - having spent several hours in the last few days making sense of these overloads, I've seen that it's really easy to fool oneself into thinking that the type hints are correct because |
Sorry, this is probably not a place for a discussion. Wouldn't it be easier to automatically decorate all functions with a runtime type checker during testing, for example with typeguard? Assuming there is a runtime type checker that supports all typing features (I think typeguard does not support overloads). An alternative to typeguard could be to use mypyc to compile compatible python files. Mypyc inserts type checks into the compiled code. But I fear that mypyc does not yet support enough python language features to be useful for pandas. |
sure is for now. this PR is a possible alternative to #39813 (comment)
other options are definitely worth discussing and maybe we should open a dedicated issue. my preference for the numpy approach (if we do test types) is that not only are we closely aligned as projects with numpy, but also thier method is "tried and tested" unlike the approach suggested in #39813 where we have no idea where the pain points maybe. we must also remember that there is also the "do nothing" option and not explicitly test the types. IIRC this was suggested in the January dev meeting (Mostly relying on user feedback to report issues.) |
@jreback @WillAyd @jbrockmendel @Dr-Irv @jorisvandenbossche Is this something we might want to do? |
I very much support pandas having some kind of tests that test our typing declarations. BUT I think it will easier for us to do what I said before in #39813 (comment) (quoting again here):
My reason for going that direction is because users will encounter typing issues in their own code (and I have first hand experience with this using the type stubs that Microsoft's pylance/pyright have put out there), and I think it will be easier for the community to fix our source code that has typed methods for the public API (and add an appropriate test in In addition, it's not clear to me how the current proposal helps us test the variety of arguments that can be passed to methods. We know that our existing testing code already tries different argument types, so if we typed our testing code (and ran mypy on it), then we're getting the benefit of one set of tests that tests functionality as well as typing signatures. |
@simonjayhawkins is this ready? i am not sure anyone has looked yet - certainly can |
I opened this PR with the intention of supporting review of #40200 and #40197. #40200 is merged and I don't want to block #40197 pending discussion here. so closing this. to be clear, I am -1 on going down the route of type checking the tests at the moment.
I had already written the above. so will keep that narrative here. but leave the PR open for now. |
will give it another look shortly. so converted to draft for now. |
closing to clear queue. |
This is no longer the case as the authors of https://github.com/VirtusLab/pandas-stubs/tree/master/tests/snippets can help assess this as an alternative.
I'll link to this from #45252, so this option is also considered. |
There has been some discussion in the recent developer meetings regarding the testing of types.
This PR uses the a method of testing used (maybe developed?) by numpy.
IMO this is most useful for testing overloads, but could be extended to the complete public api.