TYP: Fix typing in ExtensionDtype registry #41203

Merged on Aug 3, 2021 (17 commits)

Dr-Irv
Contributor

@Dr-Irv Dr-Irv commented Apr 28, 2021

Simplifying #40421 to just fix up typing in the registry for Extension Dtypes, and return type for construct_from_string in pandas/core/dtypes/base.py

@simonjayhawkins simonjayhawkins added the Typing type annotations, mypy/pyright type checking label Apr 28, 2021
@jreback jreback added this to the 1.3 milestone Apr 30, 2021
@Dr-Irv
Contributor Author

Dr-Irv commented May 16, 2021

@simonjayhawkins can you review, please?

@jreback
Contributor

jreback commented May 21, 2021

cc @simonjayhawkins if you can (a few open by @Dr-Irv )

Member

@simonjayhawkins simonjayhawkins left a comment

Thanks @Dr-Irv. generally lgtm.

pandas/core/dtypes/base.py (outdated)
pandas/core/dtypes/base.py (outdated)
@@ -268,7 +271,7 @@ def construct_from_string(cls, string: str):
         return cls()

     @classmethod
-    def is_dtype(cls, dtype: object) -> bool:
+    def is_dtype(cls: type_t[ExtensionDtypeT], dtype: object) -> bool:
Member

not needed, revert

Contributor Author

done

pandas/core/dtypes/base.py (outdated)
...

@overload
def find(self, dtype: ExtensionDtype) -> ExtensionDtype:
Member

Suggested change
-    def find(self, dtype: ExtensionDtype) -> ExtensionDtype:
+    def find(self, dtype: ExtensionDtypeT) -> ExtensionDtypeT:

Contributor Author

done
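
The suggestion above swaps the base class for a bound TypeVar so that the checker keeps track of which subclass went in. A minimal sketch of the difference, using hypothetical stand-in classes (`BaseDtype`, `MyInt64Dtype`) rather than the real pandas types:

```python
from typing import TypeVar


class BaseDtype:
    """Hypothetical stand-in for pandas' ExtensionDtype."""


class MyInt64Dtype(BaseDtype):
    """Hypothetical subclass, playing the role of Int64Dtype."""


# Bound TypeVar: a checker tracks which subclass was passed in.
BaseDtypeT = TypeVar("BaseDtypeT", bound=BaseDtype)


def find_plain(dtype: BaseDtype) -> BaseDtype:
    # A checker only knows the result is *some* BaseDtype.
    return dtype


def find_generic(dtype: BaseDtypeT) -> BaseDtypeT:
    # A checker infers MyInt64Dtype when given a MyInt64Dtype.
    return dtype


result = find_generic(MyInt64Dtype())
```

Both functions behave identically at runtime; only the statically inferred type of `result` differs.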

pandas/core/dtypes/base.py (outdated)
@simonjayhawkins simonjayhawkins added the ExtensionArray Extending pandas with custom dtypes or arrays. label May 22, 2021
@simonjayhawkins
Member

@Dr-Irv removing the milestones on these typing PRs as our types are not public so the 1.3.0rc date is not a concern

@simonjayhawkins simonjayhawkins removed this from the 1.3 milestone May 24, 2021
@Dr-Irv
Contributor Author

Dr-Irv commented May 30, 2021

@simonjayhawkins pushed changes a week ago, so looking forward to a new review.


         Returns
         -------
         return the first matching dtype, otherwise return None
         """
         if not isinstance(dtype, str):
-            dtype_type = dtype
+            dtype_type: type[ExtensionDtype] | type
Member

what about NpDtype added to the signature above?

Contributor Author

In this PR, NpDtype is Union[str, np.dtype] . If dtype is str, this code isn't entered. If dtype is an np.dtype, then an np.dtype is a type, so adding the np.dtype type to the type of dtype_type is redundant.

Member

the union of type[ExtensionDtype] and type is just type?

Contributor Author

fixed in next commit
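
The point in this thread is that every `type[X]` is already a `type`, so a single `type` annotation covers both branches. A rough sketch of the "class or instance in, class out" step under discussion; the helper name `to_class` is hypothetical and does not appear in pandas:

```python
def to_class(dtype: object) -> type:
    """Return dtype itself if it is a class, otherwise the class of the instance."""
    # One annotation is enough: `type[Something] | type` collapses to `type`.
    dtype_type: type
    if isinstance(dtype, type):
        dtype_type = dtype
    else:
        dtype_type = type(dtype)
    return dtype_type
```

For example, `to_class(int)` and `to_class(3)` both give `int`.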

dtype: type[ExtensionDtype]
| ExtensionDtype
| NpDtype
| type_t[str | float | int | complex | bool | object],
Member

Aren't these subsumed by NpDtype?

Contributor Author

No, NpDtype includes str

Member

does #41945 help simplify

Contributor Author

yes, in next commit

Contributor Author

Had to undo this

@simonjayhawkins simonjayhawkins mentioned this pull request Jun 1, 2021
4 tasks
@github-actions
Contributor

github-actions bot commented Jul 1, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 1, 2021

Merged master into this branch to remove the stale label and to check that it passes tests.

@jreback jreback added this to the 1.4 milestone Jul 2, 2021
Contributor

@jreback jreback left a comment

Member

@simonjayhawkins simonjayhawkins left a comment

Thanks @Dr-Irv. sorry for the delay in reviewing the latest changes.

@@ -422,28 +427,52 @@ def register(self, dtype: type[ExtensionDtype]) -> None:

         self.dtypes.append(dtype)

-    def find(self, dtype: type[ExtensionDtype] | str) -> type[ExtensionDtype] | None:
+    @overload
+    def find(self, dtype: type[ExtensionDtypeT]) -> ExtensionDtypeT:
Member

>>> typ
<class 'pandas.core.arrays.integer.Int64Dtype'>
>>> pd.core.dtypes.base.Registry().find(typ)
<class 'pandas.core.arrays.integer.Int64Dtype'>
>>> 
Suggested change
-    def find(self, dtype: type[ExtensionDtypeT]) -> ExtensionDtypeT:
+    def find(self, dtype: type[ExtensionDtypeT]) -> type[ExtensionDtypeT]:

Contributor Author

Turns out that type[ExtensionDtypeT] is not a possible return type, so I removed it.

Member

see code sample in comment

Contributor Author

Right, so I can put it back, but then type[ExtensionDtypeT] conflicts with npt.DTypeLike, because that can be any type, so have to go back to specifying the specific types.

Member

type[object] can create issues. but unfortunately object is a valid dtype, use ignores where needed.

Member

The goal here is to avoid false positives for users passing a valid dtype and also avoid using our NpDtype alias. Ideally we want to remove use of that alias completely.

Examining this function and ignoring the docstring, any type can be passed and the return would be None if not an EA type. i.e. this function never raises a TypeError.

maybe widen the types to any, use object and if mypy complains use Any or ignore?

Member

If working with the public api, you may want to consider reviving #40202.

If not, create a test file and post the results (and the test file used) in a similar manner to #40200.

Contributor Author

Registry.find() is not in the public API, so do we need to worry about "false positives for users" ? This code is just so pandas developers don't mess things up.

Member

ah yes, The registry attribute was privatized in #40538 so that pyright --verifytypes passes. The changes in this PR are not to fix pyright issues?

Contributor Author

TBH, I don't remember! I started this adventure back in January. I did this PR as part of getting ExtensionArray working. I haven't been using pyright to verify this.
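
To make the runtime behaviour discussed in this thread concrete (a class in gives the class back, an instance in gives the instance back, a string is tried against each registered dtype, and anything else gives None without raising), here is a self-contained sketch. `BaseDtype`, `MyInt64Dtype` and `MiniRegistry` are hypothetical stand-ins written for illustration; this is not the pandas implementation.

```python
from __future__ import annotations

from typing import cast


class BaseDtype:
    """Hypothetical stand-in for ExtensionDtype."""

    name = "base"

    @classmethod
    def construct_from_string(cls, string: str) -> BaseDtype:
        if string != cls.name:
            raise TypeError(f"Cannot construct a '{cls.__name__}' from '{string}'")
        return cls()


class MyInt64Dtype(BaseDtype):
    """Hypothetical subclass, playing the role of Int64Dtype."""

    name = "MyInt64"


class MiniRegistry:
    """Simplified registry sketch (an assumption, not pandas' Registry)."""

    def __init__(self) -> None:
        self.dtypes: list[type[BaseDtype]] = []

    def register(self, dtype: type[BaseDtype]) -> None:
        self.dtypes.append(dtype)

    def find(self, dtype: object) -> type[BaseDtype] | BaseDtype | None:
        if not isinstance(dtype, str):
            dtype_type = dtype if isinstance(dtype, type) else type(dtype)
            if issubclass(dtype_type, BaseDtype):
                # Class in, class out; instance in, instance out.
                return cast("BaseDtype | type[BaseDtype]", dtype)
            return None
        # Strings are tried against every registered dtype in order.
        for candidate in self.dtypes:
            try:
                return candidate.construct_from_string(dtype)
            except TypeError:
                pass
        return None
```

Note that the non-string branch never consults the list of registered dtypes, which matches the interpreter session above where a fresh `Registry()` still returns the class it was given.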


@overload
def find(
self, dtype: NpDtype | type_t[str | float | int | complex | bool | object] | str
Member

type_t and type used. choose one for consistency.

Contributor Author

changed to type_t in next commit

-                return dtype
+                # cast needed here as mypy doesn't know we have figured
+                # out it is an ExtensionDtype
+                return cast(ExtensionDtype, dtype)
Member

dtype could also be type[ExtensionDtype] ?

Contributor Author

Not at this point in the logic

Member

using code sample from #41203 (comment)

>>> typ
<class 'pandas.core.arrays.integer.Int64Dtype'>
>>> pd.core.dtypes.base.Registry().find(typ)
> /home/simon/pandas/pandas/core/dtypes/base.py(460)find()
-> return cast(ExtensionDtype, dtype)
(Pdb) dtype
<class 'pandas.core.arrays.integer.Int64Dtype'>
(Pdb) type(dtype)
<class 'type'>
(Pdb) 

Contributor Author

OK, I see. Fixed in a new commit.
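
The debugger session above works because `typing.cast` performs no runtime check; it simply returns its argument, so an inaccurate cast goes unnoticed until a reviewer or a checker catches it. A small illustration with a hypothetical stand-in class:

```python
from typing import cast


class BaseDtype:
    """Hypothetical stand-in for ExtensionDtype."""


value: object = BaseDtype  # note: the class itself, not an instance

# cast() only informs the type checker; at runtime it returns its argument unchanged.
claimed = cast(BaseDtype, value)

print(claimed is BaseDtype)            # True
print(isinstance(claimed, BaseDtype))  # False
```

The checker now believes `claimed` is a `BaseDtype` instance even though it is the class object, which is the mismatch the review comment points out.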



             return None

-        for dtype_type in self.dtypes:
+        for dtype_loop in self.dtypes:
Member

according to the annotations self.dtypes: list[type[ExtensionDtype]]

and dtype_type is dtype_type: type[ExtensionDtype] | type

so why is this changed?

Contributor Author

Reverted back due to change of declaration of dtype_type above.

-                # out it is an ExtensionDtype
-                return cast(ExtensionDtype, dtype)
+                # out it is an ExtensionDtype or type_t[ExtensionDtype]
+                return cast(Union[ExtensionDtype, type_t[ExtensionDtype]], dtype)
Member

nit, to avoid adding the Union import, i have a preference for,

Suggested change
-                return cast(Union[ExtensionDtype, type_t[ExtensionDtype]], dtype)
+                return cast("ExtensionDtype | type_t[ExtensionDtype]", dtype)

Contributor Author

fixed in next commit
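
The string form suggested above works because `cast` never evaluates its first argument at runtime, so the `|` syntax inside the quotes is only ever read by the type checker and no `Union` import is needed. A short sketch with a hypothetical stand-in class and helper name:

```python
from typing import cast


class BaseDtype:
    """Hypothetical stand-in for ExtensionDtype."""


def passthrough(dtype: object):
    # The target type is a plain string: nothing is evaluated at runtime,
    # so this runs even where `X | Y` between classes is unsupported.
    return cast("BaseDtype | type[BaseDtype]", dtype)
```

Calling `passthrough(BaseDtype)` or `passthrough(BaseDtype())` returns the argument untouched.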

Member

@simonjayhawkins simonjayhawkins left a comment

Using this test file with mypy:

import numpy as np
import pandas as pd

from pandas.core.dtypes.base import _registry


reveal_type(_registry.find("int"))
reveal_type(_registry.find(object))
reveal_type(_registry.find(np.dtype))
reveal_type(_registry.find(np.dtype("object")))
reveal_type(_registry.find(pd.Int64Dtype))
reveal_type(_registry.find(pd.Int64Dtype()))

/home/simon/t.py:7: note: Revealed type is "Union[Type[<nothing>], pandas.core.dtypes.base.ExtensionDtype, None]"
/home/simon/t.py:8: note: Revealed type is "Union[Type[<nothing>], pandas.core.dtypes.base.ExtensionDtype, None]"
/home/simon/t.py:9: note: Revealed type is "Union[Type[<nothing>], pandas.core.dtypes.base.ExtensionDtype, None]"
/home/simon/t.py:10: note: Revealed type is "Union[Type[<nothing>], pandas.core.dtypes.base.ExtensionDtype, None]"
/home/simon/t.py:11: note: Revealed type is "Type[pandas.core.arrays.integer.Int64Dtype*]"
/home/simon/t.py:12: note: Revealed type is "pandas.core.arrays.integer.Int64Dtype*"

adding another overload for string type (and maybe using npt.DTypeLike) can get us closer?

/home/simon/t.py:7: note: Revealed type is "Union[pandas.core.dtypes.base.ExtensionDtype, None]"
/home/simon/t.py:8: note: Revealed type is "None"
/home/simon/t.py:9: note: Revealed type is "None"
/home/simon/t.py:10: note: Revealed type is "None"
/home/simon/t.py:11: note: Revealed type is "Type[pandas.core.arrays.integer.Int64Dtype*]"
/home/simon/t.py:12: note: Revealed type is "pandas.core.arrays.integer.Int64Dtype*"

diff --git a/pandas/core/dtypes/base.py b/pandas/core/dtypes/base.py
index abac3faa97..508c92c671 100644
--- a/pandas/core/dtypes/base.py
+++ b/pandas/core/dtypes/base.py
@@ -17,7 +17,7 @@ import numpy as np
 from pandas._libs.hashtable import object_hash
 from pandas._typing import (
     DtypeObj,
-    NpDtype,
+    npt,
     type_t,
 )
 from pandas.errors import AbstractMethodError
@@ -437,16 +437,21 @@ class Registry:
 
     @overload
     def find(
-        self, dtype: NpDtype | type_t[str | float | int | complex | bool | object]
-    ) -> type_t[ExtensionDtypeT] | ExtensionDtype | None:
+        self, dtype: str
+    ) -> ExtensionDtype | None:
+        ...
+
+    @overload
+    def find(
+        self, dtype: npt.DTypeLike
+    ) -> None:
         ...
 
     def find(
         self,
         dtype: type_t[ExtensionDtype]
         | ExtensionDtype
-        | NpDtype
-        | type_t[str | float | int | complex | bool | object],
+        | npt.DTypeLike
     ) -> type_t[ExtensionDtype] | ExtensionDtype | None:
         """
         Parameters

we may in the future decide to create an alias for the literal strings that represent built-in EA dtypes, but not needed here.

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 4, 2021

> adding another overload for string type (and maybe using npt.DTypeLike) can get us closer?

So if you do:

    @overload
    def find(self, dtype: type_t[ExtensionDtypeT]) -> type_t[ExtensionDtypeT]:
        ...

    @overload
    def find(self, dtype: ExtensionDtypeT) -> ExtensionDtypeT:
        ...

    @overload
    def find(self, dtype: str) -> ExtensionDtype | None:
        ...

    @overload
    def find(
        self, dtype: npt.DTypeLike
    ) ->None:
        ...

you get this from mypy:

pandas\core\dtypes\base.py:431: error: Overloaded function signatures 1 and 4 overlap with incompatible return types  [misc]
pandas\core\dtypes\base.py:439: error: Overloaded function signatures 3 and 4 overlap with incompatible return types  [misc]

That's because npt.DTypeLike includes str and type as possible values. If we get a str, we can return ExtensionDtype or None.

In the next commit, I changed the 4th overload to be:

    @overload
    def find(
        self, dtype: npt.DTypeLike
    ) -> type_t[ExtensionDtype] | ExtensionDtype | None:
        ...

That's because the npt.DTypeLike is too wide, and there is not a way to just say "the numpy types" plus "the base types"
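
The resolution described above can be sketched as a stack of overloads that ends in a wide fallback whose return type is the union of everything the narrower overloads return. As far as I understand mypy's overlap check, this avoids the "incompatible return types" error because each narrower return type is compatible with the fallback's union. The sketch below is a module-level function with hypothetical stand-in classes, and it uses `object` as an assumed placeholder for `npt.DTypeLike`:

```python
from __future__ import annotations

from typing import TypeVar, overload


class BaseDtype:
    """Hypothetical stand-in for ExtensionDtype."""

    name = "base"


class MyInt64Dtype(BaseDtype):
    """Hypothetical subclass, playing the role of Int64Dtype."""

    name = "MyInt64"


BaseDtypeT = TypeVar("BaseDtypeT", bound=BaseDtype)

# Hypothetical list of registered dtypes.
_KNOWN: list[type[BaseDtype]] = [MyInt64Dtype]


@overload
def find(dtype: type[BaseDtypeT]) -> type[BaseDtypeT]: ...
@overload
def find(dtype: BaseDtypeT) -> BaseDtypeT: ...
@overload
def find(dtype: str) -> BaseDtype | None: ...
@overload
def find(dtype: object) -> type[BaseDtype] | BaseDtype | None: ...


def find(dtype):
    # Single runtime implementation shared by all overloads.
    if isinstance(dtype, str):
        for candidate in _KNOWN:
            if candidate.name == dtype:
                return candidate()
        return None
    dtype_type = dtype if isinstance(dtype, type) else type(dtype)
    return dtype if issubclass(dtype_type, BaseDtype) else None
```

The cost of the wide fallback is visible in the revealed types posted later in the thread: non-extension inputs resolve to the full union rather than to `None`.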

@simonjayhawkins
Member

Thanks @Dr-Irv can you post the output of the code sample in #41203 (review) with the latest changes.

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 4, 2021

> Thanks @Dr-Irv can you post the output of the code sample in #41203 (review) with the latest changes.

mypytest.py:7: note: Revealed type is "Union[pandas.core.dtypes.base.ExtensionDtype, None]"
mypytest.py:8: note: Revealed type is "Union[Type[pandas.core.dtypes.base.ExtensionDtype], pandas.core.dtypes.base.ExtensionDtype, None]"
mypytest.py:9: note: Revealed type is "Union[Type[pandas.core.dtypes.base.ExtensionDtype], pandas.core.dtypes.base.ExtensionDtype, None]"
mypytest.py:10: note: Revealed type is "Union[Type[pandas.core.dtypes.base.ExtensionDtype], pandas.core.dtypes.base.ExtensionDtype, None]"
mypytest.py:11: note: Revealed type is "Type[pandas.core.arrays.integer.Int64Dtype*]"
mypytest.py:12: note: Revealed type is "pandas.core.arrays.integer.Int64Dtype*"

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 14, 2021

@simonjayhawkins ping...

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 26, 2021

pinging @simonjayhawkins

@jreback
Contributor

jreback commented Jul 28, 2021

@Dr-Irv if you can rebase

cc @simonjayhawkins

Member

@simonjayhawkins simonjayhawkins left a comment

Thanks @Dr-Irv

@simonjayhawkins simonjayhawkins merged commit 60a3389 into pandas-dev:master Aug 3, 2021
feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021
@Dr-Irv Dr-Irv deleted the extensionv1 branch February 13, 2023 20:51
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Typing type annotations, mypy/pyright type checking
5 participants