
ENH: Using pyright to analyze missing type declarations #39813

Open
Dr-Irv opened this issue Feb 14, 2021 · 5 comments
Labels
Enhancement Typing type annotations, mypy/pyright type checking

Comments

@Dr-Irv
Contributor

Dr-Irv commented Feb 14, 2021

This describes a procedure for using the command line tool pyright (https://github.com/microsoft/pyright/blob/master/docs/command-line.md) to identify places in the pandas code that are missing type declarations. xref #28142

  1. Install pyright: See https://github.com/microsoft/pyright#command-line
  2. In your pandas development folder, create an empty file py.typed in the same folder as pandas\__init__.py
  3. To get the complete analysis as a text file, cd in your shell to the root of your pandas checkout (the folder containing README.md) and run pyright --verifytypes pandas! > pyright.out
  4. To find the modules that need the most work, run the verifytypes.py script shown below with python verifytypes.py. It prints the 20 module/message-type combinations with the most missing-type diagnostics.
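Steps 2 and 3 above can be scripted. The sketch below is hypothetical (the helper names ensure_py_typed, verifytypes_command and run_verifytypes are not part of pandas or pyright) and assumes pyright is on PATH. It creates the empty py.typed marker and writes pyright's text report to a file, mirroring the shell redirect.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def ensure_py_typed(package_dir: Path) -> Path:
    """Step 2: create an empty py.typed marker beside the package's __init__.py."""
    marker = package_dir / "py.typed"
    marker.touch(exist_ok=True)  # an existing marker is left as it is
    return marker


def verifytypes_command(package: str = "pandas!") -> list[str]:
    """Step 3: the pyright invocation from the steps above, as an argument list."""
    return ["pyright", "--verifytypes", package]


def run_verifytypes(out_file: Path, package: str = "pandas!") -> int:
    """Run pyright and write its text report to out_file (like `> pyright.out`)."""
    with out_file.open("w", encoding="utf-8") as fh:
        # Assumption: on Windows, an npm-installed pyright is a .cmd shim that needs
        # the shell; on POSIX, shell=True with a list would drop the arguments.
        result = subprocess.run(
            verifytypes_command(package), stdout=fh, shell=sys.platform == "win32"
        )
    return result.returncode
```

From the repository root, ensure_py_typed(Path("pandas")) followed by run_verifytypes(Path("pyright.out")) would reproduce steps 2 and 3.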

Open issues for adding types:

  1. We will need to systematically bring over the typing work done by Microsoft in https://github.com/microsoft/python-type-stubs/tree/main/pandas to help enhance our type declarations.
  2. Using pyright to find where types are missing will not tell us whether we are missing appropriate overloads. See the example below.
  3. Most likely, the best way to test whether we have all the overloads correct is to fully type our test code, adding # type: ignore comments where we are specifically testing for incorrect types.
verifytypes.py utility
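import json
import subprocess
import sys

import pandas as pd


def getpyrightout() -> bytes:
    # Run pyright's type-completeness report on pandas and capture the JSON output.
    # shell=True is only needed on Windows (pyright is a .cmd shim there). On POSIX,
    # shell=True with an argument list passes the extra items to the shell itself,
    # so pyright would run without any arguments.
    pyrightout = subprocess.run(
        ["pyright", "--outputjson", "--verifytypes", "pandas!"],
        capture_output=True,
        shell=sys.platform == "win32",
    )
    # The exit code is deliberately not checked: only the JSON report is used.
    return pyrightout.stdout


def processjson(jsonstr: bytes) -> None:
    d = json.loads(jsonstr)
    msgsSeries = pd.Series([k["message"] for k in d["diagnostics"]])
    # Each message quotes the symbol it refers to: split into the text before the
    # first quote, the quoted symbol, and everything after it.
    msgsdf = msgsSeries.str.split('"', n=2, expand=True)
    msgsdf.columns = ["primary", "element", "extra"]
    typemsgs = msgsdf[msgsdf.primary.str.startswith("Type")].copy()
    # Drop a trailing ".ClassName..." suffix to get the module name. regex=True is
    # required because pandas >= 2.0 treats the pattern as a literal by default.
    typemsgs["module"] = typemsgs["element"].str.replace(
        r"\.[A-Z][a-z_A-Z\.]*$", "", regex=True
    )
    notest = typemsgs[~typemsgs.module.str.startswith("pandas.tests")]
    # Count diagnostics per (module, message type) and show the 20 largest groups.
    print(
        notest.groupby(["module", "primary"])
        .size()
        .sort_values(ascending=False)
        .head(20)
    )


if __name__ == "__main__":
    processjson(getpyrightout())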
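To make the grouping in processjson easier to follow, here is a pandas-free sketch of the same aggregation using only the standard library. The function name summarize is hypothetical, and it assumes, as the script does, that each relevant message has the form 'Type ... "symbol" ...'; the sample messages in the usage below are made up, not real pyright output.

```python
from __future__ import annotations

import re
from collections import Counter

# Same pattern as verifytypes.py: a trailing ".ClassName..." suffix on the symbol.
_CLASS_SUFFIX = re.compile(r"\.[A-Z][a-z_A-Z\.]*$")


def summarize(messages: list[str], top: int = 20) -> list[tuple[tuple[str, str], int]]:
    """Count 'Type...' messages per (module, message prefix), largest groups first."""
    counts: Counter[tuple[str, str]] = Counter()
    for msg in messages:
        parts = msg.split('"', 2)  # [primary, element, extra]
        if len(parts) < 2 or not parts[0].startswith("Type"):
            continue  # no quoted symbol, or not a type diagnostic
        primary, element = parts[0], parts[1]
        module = _CLASS_SUFFIX.sub("", element)
        if module.startswith("pandas.tests"):
            continue  # test modules are excluded, as in the script
        counts[(module, primary)] += 1
    return counts.most_common(top)
```

For example, 'Type of "pandas.core.frame.DataFrame.foo" is partially unknown' is counted under ("pandas.core.frame", "Type of "). Note that a module-level function such as "pandas.core.frame.some_func" keeps its function name, because the pattern only strips names that start with an uppercase letter.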
Example using DataFrame.fillna() where overloads are needed

This is taken from https://github.com/microsoft/python-type-stubs/blob/main/pandas/core/frame.pyi

    @overload
    def fillna(
        self,
        value: Optional[Union[Scalar, Dict, Series, DataFrame]] = ...,
        method: Optional[Literal["backfill", "bfill", "ffill", "pad"]] = ...,
        axis: Optional[AxisType] = ...,
        limit: int = ...,
        downcast: Optional[Dict] = ...,
        *,
        inplace: Literal[True]
    ) -> None: ...
    @overload
    def fillna(
        self,
        value: Optional[Union[Scalar, Dict, Series, DataFrame]] = ...,
        method: Optional[Literal["backfill", "bfill", "ffill", "pad"]] = ...,
        axis: Optional[AxisType] = ...,
        limit: int = ...,
        downcast: Optional[Dict] = ...,
        *,
        inplace: Literal[False] = ...
    ) -> DataFrame: ...
    @overload
    def fillna(
        self,
        value: Optional[Union[Scalar, Dict, Series, DataFrame]] = ...,
        method: Optional[Union[_str, Literal["backfill", "bfill", "ffill", "pad"]]] = ...,
        axis: Optional[AxisType] = ...,
        *,
        limit: int = ...,
        downcast: Optional[Dict] = ...,
    ) -> Union[None, DataFrame]: ...
    @overload
    def fillna(
        self,
        value: Optional[Union[Scalar, Dict, Series, DataFrame]] = ...,
        method: Optional[Union[_str, Literal["backfill", "bfill", "ffill", "pad"]]] = ...,
        axis: Optional[AxisType] = ...,
        inplace: Optional[_bool] = ...,
        limit: int = ...,
        downcast: Optional[Dict] = ...,
    ) -> Union[None, DataFrame]: ...
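The stub above picks the return type from the literal value of inplace. Below is a stripped-down, runnable sketch of the same pattern; Frame is a hypothetical stand-in for DataFrame holding a single list of floats, and only value and inplace are modeled. A single signature returning Optional[Frame] would count as fully typed for --verifytypes, yet every caller would have to narrow away None; that is the kind of gap pyright's completeness check cannot detect.

```python
from __future__ import annotations

from typing import Literal, Optional, overload


class Frame:
    """Hypothetical stand-in for DataFrame: one column of optional floats."""

    def __init__(self, values: list[Optional[float]]) -> None:
        self.values = values

    @overload
    def fillna(self, value: float, *, inplace: Literal[True]) -> None: ...
    @overload
    def fillna(self, value: float, *, inplace: Literal[False] = ...) -> Frame: ...
    @overload
    def fillna(self, value: float, *, inplace: bool) -> Optional[Frame]: ...

    def fillna(self, value: float, *, inplace: bool = False) -> Optional[Frame]:
        # Replace missing entries with `value`.
        filled: list[Optional[float]] = [value if v is None else v for v in self.values]
        if inplace:
            self.values = filled  # mutate and return None, like pandas
            return None
        return Frame(filled)  # leave self unchanged and return a new object
```

With these overloads, a checker should infer Frame for f.fillna(0.0), None for f.fillna(0.0, inplace=True), and Optional[Frame] only when inplace is a plain bool.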
@rhshadrach
Member

For type-checking tests: by default mypy does not check the bodies of unannotated functions, so adding the return type -> None to a test function makes mypy type-check it. I think all that would remain is to type-hint pytest fixtures and parameters.

Also, adding # type: ignore is an additional test; our CI will fail if a type-ignore is not necessary.
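A small illustration of both points. The function add_one is a hypothetical stand-in for pandas code, and plain try/except replaces pytest.raises only to keep the sketch dependency-free. The -> None annotations opt the test bodies into mypy's checking, and the deliberate wrong-type call carries a # type: ignore[arg-type] (mypy's error code for a bad argument type), which an unused-ignore check would flag if the call ever became valid.

```python
def add_one(x: int) -> int:
    """Hypothetical function under test."""
    return x + 1


def test_add_one() -> None:
    # The "-> None" annotation is what makes mypy check this body by default.
    result = add_one(1)
    assert result == 2


def test_add_one_rejects_str() -> None:
    # Deliberately passing the wrong type: silence the checker for this line only.
    try:
        add_one("a")  # type: ignore[arg-type]
    except TypeError:
        return
    raise AssertionError("expected TypeError")
```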

@bashtage
Contributor

pyright is a bit daft IMO. It complains about things like

self.some_int = int(val)

which can only be an int.
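One way to quiet this, assuming --verifytypes only accepts inferred types for simple cases such as assignments of literal values, is to declare the attribute's type explicitly. Holder is a hypothetical class mirroring the snippet above.

```python
class Holder:
    """Hypothetical class mirroring the self.some_int = int(val) example."""

    # Declared at class level, so the attribute's type no longer relies on inference.
    some_int: int

    def __init__(self, val: str) -> None:
        self.some_int = int(val)
```

An inline annotation, self.some_int: int = int(val), declares the same thing; the class-level form keeps all attribute types in one place.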

@simonjayhawkins
Member

> 3. Most likely, the best way to test if we have all the overloads correct is by fully typing our tests code, and adding # ignore comments when we are specifically testing for incorrect types.

see also #40202 for a POC of a more explicit and comprehensive way of testing overloads
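One generic way to make overload tests explicit, not necessarily the approach taken in #40202, is typing.assert_type (Python 3.11+, or typing_extensions on older versions, which is assumed to be installed there). A type checker reports an error if the inferred type differs, while at runtime assert_type simply returns its first argument. The overloaded to_number function is hypothetical.

```python
import sys
from typing import Literal, overload

if sys.version_info >= (3, 11):
    from typing import assert_type
else:  # assumption: typing_extensions is available on Python < 3.11
    from typing_extensions import assert_type


@overload
def to_number(text: str, *, as_int: Literal[True]) -> int: ...
@overload
def to_number(text: str, *, as_int: Literal[False] = ...) -> float: ...
def to_number(text: str, *, as_int: bool = False) -> float:
    return int(text) if as_int else float(text)


def test_to_number_overloads() -> None:
    # Fails under mypy/pyright if overload resolution picks the wrong return type.
    assert assert_type(to_number("3", as_int=True), int) == 3
    assert assert_type(to_number("3"), float) == 3.0
```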

@jbrockmendel
Member

@Dr-Irv IIUC we're doing this now. is this issue still active?

@Dr-Irv
Contributor Author

Dr-Irv commented Jan 23, 2023

> @Dr-Irv IIUC we're doing this now. is this issue still active?

I created this issue as a reference so that we could identify which parts of the pandas source are missing type declarations.

So it is still valid, unless we feel that all of the pandas source now has type declarations (which I don't think is true).

I did edit the description to refer to pandas-stubs.
