-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TYP: Fix typing in ExtensionDtype registry #41203
Conversation
@simonjayhawkins can you review, please? |
cc @simonjayhawkins if you can (a few open by @Dr-Irv ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Dr-Irv. generally lgtm.
pandas/core/dtypes/base.py
Outdated
@@ -268,7 +271,7 @@ def construct_from_string(cls, string: str): | |||
return cls() | |||
|
|||
@classmethod | |||
def is_dtype(cls, dtype: object) -> bool: | |||
def is_dtype(cls: type_t[ExtensionDtypeT], dtype: object) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed, revert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/core/dtypes/base.py
Outdated
... | ||
|
||
@overload | ||
def find(self, dtype: ExtensionDtype) -> ExtensionDtype: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def find(self, dtype: ExtensionDtype) -> ExtensionDtype: | |
def find(self, dtype: ExtensionDtypeT) -> ExtensionDtypeT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@Dr-Irv removing the milestones on these typing PRs as our types are not public so the 1.3.0rc date is not a concern |
@simonjayhawkins pushed changes a week ago, so looking forward to a new review. |
pandas/core/dtypes/base.py
Outdated
|
||
Returns | ||
------- | ||
return the first matching dtype, otherwise return None | ||
""" | ||
if not isinstance(dtype, str): | ||
dtype_type = dtype | ||
dtype_type: type[ExtensionDtype] | type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about NpDtype added to the signature above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this PR, NpDtype
is Union[str, np.dtype]
. If dtype
is str
, this code isn't entered. If dtype
is an np.dtype
, then an np.dtype
is a type
, so adding the np.dtype
type to the type of dtype_type
is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the union of type[ExtensionDtype]
and type
is just type
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in next commit
pandas/core/dtypes/base.py
Outdated
dtype: type[ExtensionDtype] | ||
| ExtensionDtype | ||
| NpDtype | ||
| type_t[str | float | int | complex | bool | object], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arent these subsumed by NpDtype?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, NpDtype
includes str
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does #41945 help simplify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, in next commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to undo this
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
Merged to master to remove stale label and to see that it passes tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm @simonjayhawkins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Dr-Irv. sorry for the delay in reviewing the latest changes.
pandas/core/dtypes/base.py
Outdated
@@ -422,28 +427,52 @@ def register(self, dtype: type[ExtensionDtype]) -> None: | |||
|
|||
self.dtypes.append(dtype) | |||
|
|||
def find(self, dtype: type[ExtensionDtype] | str) -> type[ExtensionDtype] | None: | |||
@overload | |||
def find(self, dtype: type[ExtensionDtypeT]) -> ExtensionDtypeT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> typ
<class 'pandas.core.arrays.integer.Int64Dtype'>
>>> pd.core.dtypes.base.Registry().find(typ)
<class 'pandas.core.arrays.integer.Int64Dtype'>
>>>
def find(self, dtype: type[ExtensionDtypeT]) -> ExtensionDtypeT: | |
def find(self, dtype: type[ExtensionDtypeT]) -> type[ExtensionDtypeT]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out that type[ExtensionDtypeT]
is not a possible return type, so I removed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see code sample in comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so I can put it back, but then type[ExtensionDtypeT]
conflicts with npt.DTypeLike
, because that can be any type, so have to go back to specifying the specific types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type[object] can create issues. but unfortunately object
is a valid dtype, use ignores where needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal here is to avoid false positives for users passing a valid dtype and also avoid using our NpDtype
alias. Ideally we want to remove use of that alias completely.
Examining this function and ignoring the docstring, any type can be passed and the return would be None
if not an EA type. i.e. this function never raises a TypeError.
maybe widen the types to any, use object
and if mypy complains use Any
or ignore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Registry.find()
is not in the public API, so do we need to worry about "false positives for users" ? This code is just so pandas developers don't mess things up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, The registry attribute was privatized in #40538 so that pyright --verifytypes
passes. The changes in this PR are not to fix pyright issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I don't remember! It's been since January since I started this adventure. I did this PR as part of getting ExtensionArray
working. I haven't been using pyright
to verify this.
pandas/core/dtypes/base.py
Outdated
|
||
@overload | ||
def find( | ||
self, dtype: NpDtype | type_t[str | float | int | complex | bool | object] | str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type_t and type used. choose one for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to type_t
in next commit
pandas/core/dtypes/base.py
Outdated
dtype: type[ExtensionDtype] | ||
| ExtensionDtype | ||
| NpDtype | ||
| type_t[str | float | int | complex | bool | object], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does #41945 help simplify
pandas/core/dtypes/base.py
Outdated
return dtype | ||
# cast needed here as mypy doesn't know we have figured | ||
# out it is an ExtensionDtype | ||
return cast(ExtensionDtype, dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dtype
could also be type[ExtensionDtype]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at this point in the logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using code sample from #41203 (comment)
>>> typ
<class 'pandas.core.arrays.integer.Int64Dtype'>
>>> pd.core.dtypes.base.Registry().find(typ)
> /home/simon/pandas/pandas/core/dtypes/base.py(460)find()
-> return cast(ExtensionDtype, dtype)
(Pdb) dtype
<class 'pandas.core.arrays.integer.Int64Dtype'>
(Pdb) type(dtype)
<class 'type'>
(Pdb)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see. Fixed in a new commit.
pandas/core/dtypes/base.py
Outdated
|
||
Returns | ||
------- | ||
return the first matching dtype, otherwise return None | ||
""" | ||
if not isinstance(dtype, str): | ||
dtype_type = dtype | ||
dtype_type: type[ExtensionDtype] | type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the union of type[ExtensionDtype]
and type
is just type
?
pandas/core/dtypes/base.py
Outdated
|
||
return None | ||
|
||
for dtype_type in self.dtypes: | ||
for dtype_loop in self.dtypes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to the annotations self.dtypes: list[type[ExtensionDtype]]
and dtype_type
is dtype_type: type[ExtensionDtype] | type
so why is this changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted back due to change of declaration of dtype_type
above.
pandas/core/dtypes/base.py
Outdated
# out it is an ExtensionDtype | ||
return cast(ExtensionDtype, dtype) | ||
# out it is an ExtensionDtype or type_t[ExtensionDtype] | ||
return cast(Union[ExtensionDtype, type_t[ExtensionDtype]], dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, to avoid adding the Union import, i have a preference for,
return cast(Union[ExtensionDtype, type_t[ExtensionDtype]], dtype) | |
return cast("ExtensionDtype | type_t[ExtensionDtype]", dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in next commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using
import numpy as np
import pandas as pd
from pandas.core.dtypes.base import _registry
reveal_type(_registry.find("int"))
reveal_type(_registry.find(object))
reveal_type(_registry.find(np.dtype))
reveal_type(_registry.find(np.dtype("object")))
reveal_type(_registry.find(pd.Int64Dtype))
reveal_type(_registry.find(pd.Int64Dtype()))
/home/simon/t.py:7: note: Revealed type is "Union[Type[<nothing>], pandas.core.dtypes.base.ExtensionDtype, None]"
/home/simon/t.py:8: note: Revealed type is "Union[Type[<nothing>], pandas.core.dtypes.base.ExtensionDtype, None]"
/home/simon/t.py:9: note: Revealed type is "Union[Type[<nothing>], pandas.core.dtypes.base.ExtensionDtype, None]"
/home/simon/t.py:10: note: Revealed type is "Union[Type[<nothing>], pandas.core.dtypes.base.ExtensionDtype, None]"
/home/simon/t.py:11: note: Revealed type is "Type[pandas.core.arrays.integer.Int64Dtype*]"
/home/simon/t.py:12: note: Revealed type is "pandas.core.arrays.integer.Int64Dtype*"
adding another overload for string type (and maybe using npt.DTypeLike) can get us closer?
/home/simon/t.py:7: note: Revealed type is "Union[pandas.core.dtypes.base.ExtensionDtype, None]"
/home/simon/t.py:8: note: Revealed type is "None"
/home/simon/t.py:9: note: Revealed type is "None"
/home/simon/t.py:10: note: Revealed type is "None"
/home/simon/t.py:11: note: Revealed type is "Type[pandas.core.arrays.integer.Int64Dtype*]"
/home/simon/t.py:12: note: Revealed type is "pandas.core.arrays.integer.Int64Dtype*"
diff --git a/pandas/core/dtypes/base.py b/pandas/core/dtypes/base.py
index abac3faa97..508c92c671 100644
--- a/pandas/core/dtypes/base.py
+++ b/pandas/core/dtypes/base.py
@@ -17,7 +17,7 @@ import numpy as np
from pandas._libs.hashtable import object_hash
from pandas._typing import (
DtypeObj,
- NpDtype,
+ npt,
type_t,
)
from pandas.errors import AbstractMethodError
@@ -437,16 +437,21 @@ class Registry:
@overload
def find(
- self, dtype: NpDtype | type_t[str | float | int | complex | bool | object]
- ) -> type_t[ExtensionDtypeT] | ExtensionDtype | None:
+ self, dtype: str
+ ) -> ExtensionDtype | None:
+ ...
+
+ @overload
+ def find(
+ self, dtype: npt.DTypeLike
+ ) -> None:
...
def find(
self,
dtype: type_t[ExtensionDtype]
| ExtensionDtype
- | NpDtype
- | type_t[str | float | int | complex | bool | object],
+ | npt.DTypeLike
) -> type_t[ExtensionDtype] | ExtensionDtype | None:
"""
Parameters
we may in the future decide to create an alias for the literal strings that represent built-in EA dtypes, but not needed here.
So if you do: @overload
def find(self, dtype: type_t[ExtensionDtypeT]) -> type_t[ExtensionDtypeT]:
...
@overload
def find(self, dtype: ExtensionDtypeT) -> ExtensionDtypeT:
...
@overload
def find(self, dtype: str) -> ExtensionDtype | None:
...
@overload
def find(
self, dtype: npt.DTypeLike
) ->None:
... you get this from mypy:
That's because In the next commit, I changed the 4th overload to be: @overload
def find(
self, dtype: npt.DTypeLike
) -> type_t[ExtensionDtype] | ExtensionDtype | None:
... That's because the |
Thanks @Dr-Irv can you post the output of the code sample in #41203 (review) with the latest changes. |
|
@simonjayhawkins ping... |
pinging @simonjayhawkins |
@Dr-Irv if you can rebase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Dr-Irv
Simplifying #40421 to just fix up typing in the registry for Extension Dtypes, and return type for
construct_from_string
inpandas/core/dtypes/base.py