TYP: Use stubtest to ensure consistency with pyi files #47760

Closed
hauntsaninja opened this issue Jul 16, 2022 · 8 comments · Fixed by #47817
Labels
Enhancement Needs Triage Issue that has not been reviewed by a pandas team member Typing type annotations, mypy/pyright type checking

Comments

@hauntsaninja
Contributor

hauntsaninja commented Jul 16, 2022

Problem Description

Currently it's possible for stub files (.pyi) to become out of date with the implementation (.pyx, .py), e.g.:
pandas-dev/pandas-stubs#33 (comment)

Feature Description

stubtest is a tool shipped with mypy that compares the runtime implementation against the stubs. For example, it found these issues (along with several others):
#47756
#47758
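To make "compares the runtime implementation against the stubs" concrete, here is a deliberately tiny illustration of one class of check, parameter-name comparison, of the kind behind the kth_smallest entry in the output later in this thread. It is a toy written for explanation only, not stubtest's implementation (stubtest also checks argument kinds, defaults, types, missing objects and more); compare_param_names and the example signatures are hypothetical, and the messages only loosely imitate stubtest's wording.

```python
import ast
import inspect
from typing import Callable, List


def stub_param_names(stub_source: str, func_name: str) -> List[str]:
    """Return the named parameters of func_name as declared in stub source."""
    tree = ast.parse(stub_source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            a = node.args
            # Same order as inspect: positional-only, positional, keyword-only.
            return [p.arg for p in a.posonlyargs + a.args + a.kwonlyargs]
    raise LookupError(f"{func_name} not found in stub")


def runtime_param_names(func: Callable) -> List[str]:
    """Return the named parameters of a runtime callable (no *args/**kwargs)."""
    return [
        p.name
        for p in inspect.signature(func).parameters.values()
        if p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
    ]


def compare_param_names(func: Callable, stub_source: str) -> List[str]:
    """Toy check: report name mismatches and runtime-only parameters."""
    stub = stub_param_names(stub_source, func.__name__)
    runtime = runtime_param_names(func)
    errors = []
    for s, r in zip(stub, runtime):
        if s != r:
            errors.append(f'stub argument "{s}" differs from runtime argument "{r}"')
    for extra in runtime[len(stub):]:
        errors.append(f'stub does not have argument "{extra}"')
    return errors


if __name__ == "__main__":
    def kth_smallest(arr, k):  # hypothetical runtime signature
        ...

    print(compare_param_names(kth_smallest, "def kth_smallest(a: object, k: int) -> object: ..."))
    # ['stub argument "a" differs from runtime argument "arr"']
```

Parsing the stub with ast (rather than importing it) mirrors the basic idea that the stub is read statically while the runtime side is introspected.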

Alternative Solutions

I'm not aware of other tools that would work for this use case.

Additional Context

cc @twoertwein

@hauntsaninja hauntsaninja added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 16, 2022
@hauntsaninja
Contributor Author

hauntsaninja commented Jul 16, 2022

Assuming a set-up and activated virtualenv, the following basically works (the configuration may need double-checking to make sure there aren't problematic false negatives):

python -m pip install pypyp
# only check modules that correspond to stub files
python -m mypy.stubtest $(find pandas -name '*.pyi' | python -m pyp 'x.replace("/", ".")[:-len(".pyi")]') --ignore-missing-stub --concise --mypy-config-file pyproject.toml

@hauntsaninja
Contributor Author

hauntsaninja commented Jul 16, 2022

Output of the above on main:

pandas._libs.lib.NoDefault is not a recognised type alias
pandas._libs.lib._NoDefault.no_default variable differs from runtime type Literal['NO_DEFAULT']
pandas._libs.lib.ndarray_obj_2d is not present at runtime
pandas._libs.missing.NAType.__new__ is inconsistent, stub does not have *args argument "args"
pandas._libs.algos.ensure_complex128 is not present at runtime
pandas._libs.algos.ensure_complex64 is not present at runtime
pandas._libs.algos.ensure_float32 is not present at runtime
pandas._libs.algos.ensure_uint16 is not present at runtime
pandas._libs.algos.ensure_uint32 is not present at runtime
pandas._libs.algos.ensure_uint64 is not present at runtime
pandas._libs.algos.ensure_uint8 is not present at runtime
pandas._libs.algos.kth_smallest is inconsistent, stub argument "a" differs from runtime argument "arr"
pandas._libs.groupby.group_cummax is inconsistent, stub does not have argument "mask"
pandas._libs.groupby.group_cummax is inconsistent, stub does not have argument "result_mask"
pandas._libs.groupby.group_cummax is inconsistent, stub does not have argument "skipna"
pandas._libs.groupby.group_cummin is inconsistent, stub does not have argument "mask"
pandas._libs.groupby.group_cummin is inconsistent, stub does not have argument "result_mask"
pandas._libs.groupby.group_cummin is inconsistent, stub does not have argument "skipna"
pandas._libs.groupby.group_last is inconsistent, runtime argument "result_mask" has a default value but stub argument does not
pandas._libs.groupby.group_last is inconsistent, stub does not have argument "is_datetimelike"
pandas._libs.groupby.group_max is inconsistent, stub argument "mask" differs from runtime argument "is_datetimelike"
pandas._libs.groupby.group_max is inconsistent, runtime argument "is_datetimelike" has a default value of type Literal[False], which is incompatible with stub argument type Union[numpy.ndarray[Any, Any], None]
pandas._libs.groupby.group_max is inconsistent, stub argument "result_mask" differs from runtime argument "mask"
pandas._libs.groupby.group_max is inconsistent, stub does not have argument "result_mask"
pandas._libs.groupby.group_min is inconsistent, stub argument "mask" differs from runtime argument "is_datetimelike"
pandas._libs.groupby.group_min is inconsistent, runtime argument "is_datetimelike" has a default value of type Literal[False], which is incompatible with stub argument type Union[numpy.ndarray[Any, Any], None]
pandas._libs.groupby.group_min is inconsistent, stub argument "result_mask" differs from runtime argument "mask"
pandas._libs.groupby.group_min is inconsistent, stub does not have argument "result_mask"
pandas._libs.groupby.group_nth is inconsistent, runtime argument "result_mask" has a default value but stub argument does not
pandas._libs.groupby.group_nth is inconsistent, stub does not have argument "is_datetimelike"
pandas._libs.groupby.group_rank is inconsistent, runtime argument "ties_method" has a default value of type Literal['average'], which is incompatible with stub argument type Union[Literal['aveage'], Literal['min'], Literal['max'], Literal['first'], Literal['dense']]
pandas._libs.internals.BlockPlacement cannot be subclassed at runtime, but isn't marked with @final in the stub
pandas._libs.parsers.TextReader.set_error_bad_lines is not present at runtime
pandas._libs.sparse.SparseIndex.__init__ is inconsistent, stub does not have *args argument "args"
pandas._libs.sparse.SparseIndex.equals is not present at runtime
pandas._libs.sparse.SparseIndex.indices is not present at runtime
pandas._libs.sparse.SparseIndex.intersect is not present at runtime
pandas._libs.sparse.SparseIndex.lookup is not present at runtime
pandas._libs.sparse.SparseIndex.lookup_array is not present at runtime
pandas._libs.sparse.SparseIndex.make_union is not present at runtime
pandas._libs.sparse.SparseIndex.nbytes is not present at runtime
pandas._libs.sparse.SparseIndex.ngaps is not present at runtime
pandas._libs.sparse.SparseIndex.to_block_index is not present at runtime
pandas._libs.sparse.SparseIndex.to_int_index is not present at runtime
pandas._libs.join.asof_join_backward is not present at runtime
pandas._libs.join.asof_join_backward_on_X_by_Y is inconsistent, stub does not have argument "use_hashtable"
pandas._libs.join.asof_join_forward is not present at runtime
pandas._libs.join.asof_join_forward_on_X_by_Y is inconsistent, stub does not have argument "use_hashtable"
pandas._libs.join.asof_join_nearest is not present at runtime
pandas._libs.join.asof_join_nearest_on_X_by_Y is inconsistent, stub does not have argument "use_hashtable"
pandas._libs.tslibs.offsets.BaseOffset._apply_array is inconsistent, stub argument "dtarr" differs from runtime argument "other"
pandas._libs.tslibs.offsets.BaseOffset.freqstr is not a function
pandas._libs.tslibs.offsets.BusinessHour.rollback is inconsistent, stub argument "dt" differs from runtime argument "other"
pandas._libs.tslibs.offsets.BusinessHour.rollforward is inconsistent, stub argument "dt" differs from runtime argument "other"
pandas._libs.tslibs.timedeltas.Timedelta.__new__ is inconsistent, runtime argument "unit" has a default value of type None, which is incompatible with stub argument type builtins.str
pandas._libs.tslibs.timedeltas.UnitChoices is not present at runtime
pandas._libs.tslibs.dtypes.PeriodDtypeBase._freq_group_code is not a function
pandas._libs.tslibs.dtypes.PeriodDtypeBase._freqstr is not a function
pandas._libs.tslibs.timezones.get_dst_info is not present at runtime
pandas._libs.tslibs.timestamps.Timestamp.astimezone is inconsistent, stub argument "tz" has a default value but runtime argument does not
pandas._libs.tslibs.timestamps.Timestamp.day_of_month is not present at runtime
pandas._libs.tslibs.timestamps.Timestamp.fromtimestamp is inconsistent, stub argument "t" differs from runtime argument "ts"
pandas._libs.tslibs.timestamps.Timestamp.replace is inconsistent, runtime argument "year" has a default value of type None, which is incompatible with stub argument type builtins.int
pandas._libs.tslibs.timestamps.Timestamp.replace is inconsistent, runtime argument "month" has a default value of type None, which is incompatible with stub argument type builtins.int
pandas._libs.tslibs.timestamps.Timestamp.replace is inconsistent, runtime argument "day" has a default value of type None, which is incompatible with stub argument type builtins.int
pandas._libs.tslibs.timestamps.Timestamp.replace is inconsistent, runtime argument "hour" has a default value of type None, which is incompatible with stub argument type builtins.int
pandas._libs.tslibs.timestamps.Timestamp.replace is inconsistent, runtime argument "minute" has a default value of type None, which is incompatible with stub argument type builtins.int
pandas._libs.tslibs.timestamps.Timestamp.replace is inconsistent, runtime argument "second" has a default value of type None, which is incompatible with stub argument type builtins.int
pandas._libs.tslibs.timestamps.Timestamp.replace is inconsistent, runtime argument "microsecond" has a default value of type None, which is incompatible with stub argument type builtins.int
pandas._libs.tslibs.timestamps.Timestamp.replace is inconsistent, stub argument "tzinfo" differs from runtime argument "nanosecond"
pandas._libs.tslibs.timestamps.Timestamp.replace is inconsistent, stub argument "fold" differs from runtime argument "tzinfo"
pandas._libs.tslibs.timestamps.Timestamp.replace is inconsistent, runtime argument "tzinfo" has a default value of type builtins.type, which is incompatible with stub argument type builtins.int
pandas._libs.tslibs.timestamps.Timestamp.replace is inconsistent, stub does not have argument "fold"
pandas._libs.tslibs.timestamps.Timestamp.utcfromtimestamp is inconsistent, stub argument "t" differs from runtime argument "ts"
pandas._libs.tslibs.ccalendar.dayofweek is not present at runtime
pandas._libs.tslibs.ccalendar.is_leapyear is not present at runtime
pandas._libs.tslibs.nattype.NaTType.asm8 is not a function
pandas._libs.tslibs.nattype.is_null_datetimelike is not present at runtime
pandas._libs.hashtable.Complex128Vector.__init__ is inconsistent, stub does not have *args argument "args"
pandas._libs.hashtable.Complex64Vector.__init__ is inconsistent, stub does not have *args argument "args"
pandas._libs.hashtable.Float32Vector.__init__ is inconsistent, stub does not have *args argument "args"
pandas._libs.hashtable.Float64Vector.__init__ is inconsistent, stub does not have *args argument "args"
pandas._libs.hashtable.HashTable.__contains__ is not present at runtime
pandas._libs.hashtable.HashTable.__len__ is not present at runtime
pandas._libs.hashtable.HashTable.factorize is not present at runtime
pandas._libs.hashtable.HashTable.get_item is not present at runtime
pandas._libs.hashtable.HashTable.get_labels is not present at runtime
pandas._libs.hashtable.HashTable.get_state is not present at runtime
pandas._libs.hashtable.HashTable.lookup is not present at runtime
pandas._libs.hashtable.HashTable.map_locations is not present at runtime
pandas._libs.hashtable.HashTable.set_item is not present at runtime
pandas._libs.hashtable.HashTable.sizeof is not present at runtime
pandas._libs.hashtable.HashTable.unique is not present at runtime
pandas._libs.hashtable.Int16Vector.__init__ is inconsistent, stub does not have *args argument "args"
pandas._libs.hashtable.Int32Vector.__init__ is inconsistent, stub does not have *args argument "args"
pandas._libs.hashtable.Int64Vector.__init__ is inconsistent, stub does not have *args argument "args"
pandas._libs.hashtable.Int8Vector.__init__ is inconsistent, stub does not have *args argument "args"
pandas._libs.hashtable.ObjectVector.__init__ is inconsistent, stub does not have *args argument "args"
pandas._libs.hashtable.StringVector.__init__ is inconsistent, stub does not have *args argument "args"
pandas._libs.hashtable.UInt16Vector.__init__ is inconsistent, stub does not have *args argument "args"
pandas._libs.hashtable.UInt32Vector.__init__ is inconsistent, stub does not have *args argument "args"
pandas._libs.hashtable.UInt64Vector.__init__ is inconsistent, stub does not have *args argument "args"
pandas._libs.hashtable.UInt8Vector.__init__ is inconsistent, stub does not have *args argument "args"
pandas._libs.properties.cache_readonly.__get__ is inconsistent, stub argument "__type" has a default value but runtime argument does not
pandas._libs.properties.cache_readonly.deleter is not present at runtime
pandas._libs.properties.cache_readonly.getter is not present at runtime
pandas._libs.properties.cache_readonly.setter is not present at runtime
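With --concise, each line above is one "<object> <message>" pair, so the output is easy to post-process. As a sketch for triage (the category labels and the tally helper are assumptions for illustration, not part of stubtest), the lines can be counted by message kind and by reported object:

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Message shapes seen in the --concise output above; anything unmatched
# (e.g. "is not a recognised type alias") is counted as "other".
CATEGORIES = [
    ("not present at runtime", re.compile(r" is not present at runtime$")),
    ("inconsistent signature", re.compile(r" is inconsistent, ")),
    ("not a function", re.compile(r" is not a function$")),
    ("variable type differs", re.compile(r" variable differs from runtime type ")),
]


def tally(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count concise stubtest lines by message category and by object name."""
    by_category: Counter = Counter()
    by_object: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # In --concise mode the dotted object name is the first token.
        by_object[line.split(" ", 1)[0]] += 1
        for label, pattern in CATEGORIES:
            if pattern.search(line):
                by_category[label] += 1
                break
        else:
            by_category["other"] += 1
    return by_category, by_object


# Usage, assuming the output was saved to a hypothetical stubtest_output.txt:
# with open("stubtest_output.txt") as f:
#     by_category, by_object = tally(f)
# print(by_category.most_common())
# print(by_object.most_common(5))
```

For the output above, counting by object puts Timestamp.replace near the top, since most of its reports come from a single signature mismatch.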

@twoertwein twoertwein added the Typing type annotations, mypy/pyright type checking label Jul 17, 2022
@twoertwein
Member

When I run python -m mypy.stubtest $(find pandas -name '*.pyi' | python -m pyp 'x.replace("/", ".")[:-len(".pyi")]') --ignore-missing-stub --concise --mypy-config-file pyproject.toml using mypy 0.980+dev.8428af7e23c582bf1e088777a83888cad98cb233, I get four type errors that mypy itself does not report:

error: not checking stubs due to mypy build errors:
pandas/core/arrays/string_arrow.py:114: error: Incompatible types in assignment (expression has type "StringDtype", variable has type "ArrowDtype")  [assignment]
pandas/core/arrays/string_arrow.py:154: error: Incompatible return value type (got "ArrowDtype", expected "StringDtype")  [return-value]
pandas/core/window/rolling.py:1842: error: Unused "type: ignore" comment
pandas/core/window/rolling.py:1848: error: Unused "type: ignore" comment

@hauntsaninja
Contributor Author

hauntsaninja commented Jul 17, 2022

You're one commit too early: it's "fixed" on the latest mypy by python/mypy#13161 (i.e. it runs without errors), but I'm not sure I understand the exact connection. The first two errors actually looked plausible to me, since _dtype in ArrowExtensionArray is ArrowDtype.

@twoertwein
Copy link
Member

@hauntsaninja I changed some type annotations locally (and added some new ignore comments) so that mypy and stubtest happen to agree :) (not great, but it works for now). I would like to call stubtest from within Python, but the following seems to exit without running stubtest:

import os
from pathlib import Path
import sys

from mypy import stubtest

if __name__ == "__main__":
    # find pyi files
    root = Path.cwd()
    pyi_modules = [
        str(pyi.relative_to(root)).replace(os.sep, ".")
        for pyi in root.glob("pandas/**/*.pyi")
    ]

    args = pyi_modules + [
        "--ignore-missing-stub",
        "--concise",
        "--mypy-config-file",
        "pyproject.toml",
    ]
    sys.exit(stubtest.test_stubs(stubtest.parse_options(args)))

@hauntsaninja
Contributor Author

I wasn't able to reproduce this on a minimal test case (using mypy 0.971):

~/tmp/tmp_stubt λ cat z.py 
x = "asdf"
~/tmp/tmp_stubt λ cat z.pyi
x: int
~/tmp/tmp_stubt λ python3                  
Python 3.9.9 (main, Jul 18 2022, 16:20:03) 
[Clang 13.0.0 (clang-1300.0.27.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from mypy import stubtest
>>> stubtest.test_stubs(stubtest.parse_options(["z"]))   
error: z.x variable differs from runtime type Literal['asdf']
Stub: at line 1
builtins.int
Runtime:
'asdf'

Found 1 error (checked 1 module)

I'll try again later on my laptop, which has a pandas dev environment.

@twoertwein
Member

twoertwein commented Jul 20, 2022

Feel free to check out this branch https://github.com/twoertwein/pandas/tree/stubtest and run python scripts/run_stubtest.py (I'm using mypy 0.971 and Python 3.10).

@twoertwein
Member

I found my (stupid) mistake: I wasn't removing the .pyi suffix! I would have received errors if I had not passed --ignore-missing-stub.
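With that diagnosis, a corrected version of the script posted earlier might look like the sketch below. It is not necessarily what ended up in scripts/run_stubtest.py or #47817; the helper names stub_modules and main are hypothetical. The substantive change is stripping the suffix with Path.with_suffix("") before joining the path parts with dots, which matches what the find | pyp pipeline produces. mypy is imported inside main() so the path helper can be used without mypy installed.

```python
from pathlib import Path
from typing import List


def stub_modules(root: Path) -> List[str]:
    """Return dotted module names for every pandas/**/*.pyi file under root."""
    modules = []
    for pyi in sorted(root.glob("pandas/**/*.pyi")):
        # with_suffix("") drops ".pyi" (the step the original script missed);
        # joining .parts avoids hard-coding "/" or os.sep.
        modules.append(".".join(pyi.relative_to(root).with_suffix("").parts))
    return modules


def main() -> int:
    """Run stubtest from the repository root with the flags used in this thread."""
    # Imported lazily so stub_modules() works without mypy installed.
    from mypy import stubtest

    args = stub_modules(Path.cwd()) + [
        "--ignore-missing-stub",
        "--concise",
        "--mypy-config-file",
        "pyproject.toml",
    ]
    return stubtest.test_stubs(stubtest.parse_options(args))
```

Run from the repository root, calling sys.exit(main()) under an `if __name__ == "__main__":` guard gives the script stubtest's exit status. Because --ignore-missing-stub silently skips modules it cannot find, it is worth printing stub_modules(Path.cwd()) once to confirm it yields names like pandas._libs.lib rather than pandas._libs.lib.pyi.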
