TYP: remove NaTType as possible result of Timestamp and Timedelta constructor #46171
This reverts commit 84b119f.
Aren't the "type: ignore[misc]"s that this removes there specifically because mypy does pick it up?
How big a deal is this for users? I'm uncomfortable with intentionally making the annotations less accurate. The ideal solution is upstream in the type checkers accurately reflecting the capabilities of
IMHO, this is a big deal. Otherwise, the code above would have to be written as:

    from typing import cast
    import pandas as pd
    print("version ", pd.__version__)
    # cast() is needed only to satisfy the type checker
    ts: pd.Timestamp = cast(pd.Timestamp, pd.Timestamp("2022-02-27"))
    print(ts)

There's a tradeoff to consider here between how much the static type checker picks up and which errors would get picked up at runtime. Right now,
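To make the tradeoff concrete, here is a stdlib-only sketch. FakeTimestamp, FakeNaTType, FAKE_NAT and make_timestamp are hypothetical stand-ins, not pandas API. When a constructor-like function is annotated to return a union, callers have to narrow the result, either with cast (which performs no runtime check) or with an isinstance test (which a type checker understands and which also holds at runtime).

```python
from __future__ import annotations

from typing import cast


class FakeTimestamp:
    """Hypothetical stand-in for pd.Timestamp."""

    def __init__(self, text: str) -> None:
        self.text = text

    def isoformat(self) -> str:
        return self.text


class FakeNaTType:
    """Hypothetical stand-in for NaTType; not a FakeTimestamp subclass."""


FAKE_NAT = FakeNaTType()


def make_timestamp(value: str | None) -> FakeTimestamp | FakeNaTType:
    # Mirrors a constructor whose return type depends on the runtime value.
    if value is None or value == "NaT":
        return FAKE_NAT
    return FakeTimestamp(value)


# Option 1: cast silences the checker but checks nothing at runtime.
ts = cast(FakeTimestamp, make_timestamp("2022-02-27"))


# Option 2: isinstance narrows the union for the checker and is checked at runtime.
def describe(value: str | None) -> str:
    result = make_timestamp(value)
    if isinstance(result, FakeNaTType):
        return "NaT"
    # A type checker treats result as FakeTimestamp from here on.
    return result.isoformat()
```

With the union annotation, option 2 is the safe pattern; with a Timestamp-only annotation, neither step is required, which is the convenience this PR is after.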
Actually,
How about using overloads to return either Timestamp or NaTType? We probably cannot catch every case in which NaTType might be returned, but we can at least catch the obvious ones. Edit: Or could NaTType inherit from both Timedelta and Timestamp?
I have another solution that seems to be working locally. The idea is as follows:
I can push that if people want to take a look, but it's a more substantive change. Alternatively, I could create a second PR with that approach. Comments welcome.
It would need some more gymnastics to avoid having Timedelta inherit from datetime. -0.5 on the original. Making our annotations less accurate for the sake of allowing users to make their annotations less accurate just seems counterproductive. #24983 might be a better long-term solution.
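On whether NaTType could simply inherit from both Timestamp and Timedelta: the standard-library bases already get in the way. Timestamp subclasses datetime.datetime and Timedelta subclasses datetime.timedelta, and in CPython those two are C types with incompatible instance layouts, so no class can derive from both. A minimal check (the class name BothKinds is hypothetical):

```python
import datetime


def can_combine_datetime_and_timedelta() -> bool:
    """Try to build a class that has both datetime and timedelta as bases."""
    try:
        # Hypothetical class; CPython rejects it with a TypeError because
        # the two C base types have incompatible instance layouts.
        type("BothKinds", (datetime.datetime, datetime.timedelta), {})
    except TypeError as exc:
        print(f"TypeError: {exc}")
        return False
    return True


combinable = can_combine_datetime_and_timedelta()
```

Any typing-only workaround therefore has to avoid a real shared subclass, which is part of the "gymnastics" mentioned above.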
So I have something that works: it only affects typing, and essentially is treating … The question is whether I put that up as a separate PR or push it here.
I cannot comment on whether the type annotations should purposefully differ from the implementation. How do the Microsoft/VirtusLab stubs handle it? I understand that from an end-user perspective, the current situation is as bad as it is with read_csv (DataFrame or TextFileReader), except that for read_csv we can more easily create overloads. Since mypy anyway does not recognize that … If Timestamp/Timedelta were very simple (took only one argument), I wouldn't mind the following as a middle ground:

    # cases when it is guaranteed to be NaTType
    # Literal[float("NaN")] is not valid (PEP 586 does not allow float literals)
    @overload
    def __new__(cls, value: None | Literal[float("NaN")] | ...) -> NaTType: ...
    # might be NaTType, but in most cases is Timestamp
    @overload
    def __new__(cls, value: BaseOffset | ...) -> Timestamp: ...
    # in case the argument is a union of the above
    @overload
    def __new__(cls, value: BaseOffset | ... | None | ...) -> Timestamp | NaTType: ...
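A runnable, stdlib-only version of this middle ground, written as a plain function rather than Timestamp.__new__. All names are hypothetical stand-ins, and only None is singled out as "guaranteed NaT", since a NaN float cannot be expressed as a Literal:

```python
from __future__ import annotations

from typing import overload


class FakeTimestamp:
    """Hypothetical stand-in for pd.Timestamp."""

    def __init__(self, text: str) -> None:
        self.text = text


class FakeNaTType:
    """Hypothetical stand-in for NaTType (deliberately not a FakeTimestamp subclass)."""


FAKE_NAT = FakeNaTType()
# Strings that give NaT at runtime, taken from the list in this thread.
NAT_STRINGS = frozenset({"NaT", "nat", "NAT", "nan", "NaN", "NAN"})


# Guaranteed NaT: the argument is None.
@overload
def make_timestamp(value: None) -> FakeNaTType: ...


# "Most likely" case: annotated as a timestamp even though some strings
# (see NAT_STRINGS) still give NaT at runtime.
@overload
def make_timestamp(value: str) -> FakeTimestamp: ...


# The argument is itself a union of the above, so return the union.
@overload
def make_timestamp(value: str | None) -> FakeTimestamp | FakeNaTType: ...


def make_timestamp(value: str | None) -> FakeTimestamp | FakeNaTType:
    # Runtime behavior is value-based; the overloads only describe it statically.
    if value is None or value in NAT_STRINGS:
        return FAKE_NAT
    return FakeTimestamp(value)
```

The second overload is where the "most likely" compromise shows up: a checker types make_timestamp("nan") as FakeTimestamp even though it returns FAKE_NAT.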
Well, I'm working on the Microsoft ones now. I copied over what was in pandas, discovered this issue, and have submitted a PR there to do what I propose here. As for VirtusLab, their stubs do not indicate
We could do something similar with the current … Here's a question: I think the only way
A near-complete list of inputs that produce NaT: None, np.nan, NaT, np.datetime64("NaT"), "NaT", "nat", "NAT", "nan", "NaN", "NAN", NaT.value
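Most entries in that list share a static type (str, float, int) with values that produce ordinary timestamps, which is why overloads can only single out a few of them. A hypothetical stdlib-only predicate covering the entries expressible without NumPy or pandas might look like this. It assumes that np.nan is an ordinary Python float and that NaT.value is the int64 minimum, -2**63, which pandas uses as its NaT sentinel.

```python
import math

NAT_STRINGS = frozenset({"NaT", "nat", "NAT", "nan", "NaN", "NAN"})
# Assumption stated in the lead-in: NaT.value equals the int64 minimum.
INT64_MIN = -(2**63)


def is_nat_input(value: object) -> bool:
    """Hypothetical check for the stdlib-expressible part of the list.

    pd.NaT itself and np.datetime64("NaT") are left out to stay stdlib-only.
    """
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and value in NAT_STRINGS:
        return True
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return value == INT64_MIN
    return False
```

Every branch after the first one depends on the value, not the type, so a static signature can only express the None case precisely.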
I played around with this, and I don't think this is possible. You can't have different … I think to make things good for users, we have to go with what is in this PR. Side note: I tried something like this:

    from __future__ import annotations
    from typing import overload

    class A:
        pass

    class B:
        pass

    class C(A, B):
        @overload
        def __new__(cls, x: int) -> A:
            ...

        @overload
        def __new__(cls, x: str) -> B:
            ...

        def __new__(cls, x: int | str) -> C | A | B:
            if isinstance(x, int):
                return super(A, cls).__new__(cls)
            elif isinstance(x, str):
                return super(B, cls).__new__(cls)
            else:
                raise Exception("bad argument")

    y = C(2)
    z = C("hey")
    # reveal_type() is interpreted by the type checker
    reveal_type(y)
    reveal_type(z)

pyright (with defaults) reports:
mypy (with defaults) reports:
While I would like to avoid mypy ignore comments, I think your approach is working well:
From my perspective it looks good. The question is what we prefer: type annotations that are easy to use ("most likely" overloads) or that are as strict as possible (always returning the union). Hopefully, the implementation will change in the future so that we don't need to make these trade-offs.
@@ -56,7 +54,7 @@ class Timestamp(datetime):
         tzinfo: _tzinfo | None = ...,
         *,
         fold: int | None = ...,
-    ) -> _DatetimeT | NaTType: ...
+    ) -> _DatetimeT: ...
can you add a comment (maybe ref this issue) about the reasoning here.
done in next commit
pls rebase as well
pandas/io/formats/format.py
@@ -1767,16 +1767,13 @@ def _format_datetime64_dateonly(
     nat_rep: str = "NaT",
     date_format: str | None = None,
 ) -> str:
-    if x is NaT:
+    if x is NaT or isinstance(x, NaTType):
why is NaTType leaking here? (is this a typing issue? e.g. maybe cast?)
I think it is better to just do if isinstance(x, NaTType) here; the typing makes it clear. Will be in next commit.
checking for both NaT and NaTType is redundant. the existing check should be marginally more performant than an isinstance check.
The issue here is that x is of type NaTType | Timestamp, so if you just do x is NaT, then mypy can't conclude that x is Timestamp when the condition is False; but if you do isinstance(x, NaTType), it can make that conclusion when the condition is False.
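For comparison, identity checks do narrow when the sentinel is the sole member of an Enum, the pattern PEP 484 describes for singleton types in unions. pandas' NaT is not defined this way, so every name below is hypothetical; the sketch only shows what a type checker needs in order to narrow on an is comparison.

```python
from __future__ import annotations

import enum
from typing import Final


class _NaTSentinel(enum.Enum):
    """Hypothetical single-member enum standing in for NaTType."""

    NAT = "NaT"


# Final lets type checkers treat NAT as the literal enum member.
NAT: Final = _NaTSentinel.NAT


class FakeTimestamp:
    """Hypothetical stand-in for pd.Timestamp."""

    def __init__(self, year: int) -> None:
        self.year = year


def format_year(x: FakeTimestamp | _NaTSentinel, nat_rep: str = "NaT") -> str:
    if x is NAT:
        return nat_rep
    # _NaTSentinel has a single member, so after the identity check a
    # type checker can narrow x to FakeTimestamp.
    return str(x.year)
```

With a sentinel like this, the cheaper identity check and precise narrowing would no longer be in tension.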
I'm aware. I don't think that justifies making the check redundant.
That's why I just changed it to if isinstance(x, NaTType) in e79d253.
as commented elsewhere, -1 on this change.
pandas/io/sas/sas_xport.py
@@ -139,7 +140,7 @@
 """


-def _parse_date(datestr: str) -> datetime:
+def _parse_date(datestr: str) -> datetime | NaTType:
I thought we have a type that is a datetime scalar | NaTType already?
probably not bc ATM NaTType subclasses datetime
@@ -1286,7 +1291,7 @@ def test_resample_consistency():
     tm.assert_series_equal(s10_2, rl)


-dates1 = [
+dates1: List[Union[datetime, NaTType]] = [
ahh, this should be a first-class type (the Union)
create a type in next commit
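A minimal sketch of such a first-class alias. The alias name DatetimeOrNaT and the NaTType stand-in are hypothetical; the name actually chosen in the follow-up commit is not shown in this thread.

```python
from __future__ import annotations

from datetime import datetime
from typing import List, Union


class NaTType:
    """Hypothetical stand-in; pandas' real NaTType is implemented in Cython."""


NaT = NaTType()

# Hypothetical alias: one place to spell "datetime scalar or NaT".
DatetimeOrNaT = Union[datetime, NaTType]


def count_missing(values: List[DatetimeOrNaT]) -> int:
    """Count the NaT entries in a list annotated with the alias."""
    return sum(1 for v in values if isinstance(v, NaTType))


dates1: List[DatetimeOrNaT] = [datetime(2000, 1, 1), NaT, datetime(2000, 1, 3)]
```

Annotations such as the one on dates1 in the test diff then read List[DatetimeOrNaT] instead of repeating the Union inline.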
Change test for NaT in io/formats/format.py
@jbrockmendel ok here?
I remain -0.5, not gonna make a fuss about it.
thanks! point noted.
@@ -1767,16 +1767,13 @@ def _format_datetime64_dateonly(
     nat_rep: str = "NaT",
     date_format: str | None = None,
 ) -> str:
-    if x is NaT:
+    if isinstance(x, NaTType):
-1 on code changes to satisfy the type checker. We know that static type checkers have issues with singletons, and imo that was adequately noted in the removed comment adjacent to the ignore statement.
It may be nbd in some cases, but I am always -1 on principle.
    ) -> _DatetimeT: ...
    # GH 46171
    # While Timestamp can return pd.NaT, having the constructor return
    # a Union with NaTType makes things awkward for users of pandas
I would be inclined to return Any in the public API if there are genuine concerns, rather than have the incorrect return type.
I disagree: users type checking their code could be doing it to ensure more robust code, which is a different requirement from intellisense and other linting productivity tools. The types should be precise (or less precise, such as Any, if we have issues to resolve) but not wrong.
given @twoertwein's and @jbrockmendel's comments, I think this one should be reverted.
are you aware of other projects that use a "most likely" approach?
Unfortunately, no. But I would be happy to make a compromise to avoid having two different type annotations (internal usage vs. public usage). While I'm not a fan of two type annotations, I agree that we should start with the MS stubs to have a smooth transition from a user perspective.
yep. a reason that i'm not too actively involved in typing issues at the moment. letting it play out for now. So if this PR is prep for that then OK, but I would like to also see the impact on our current inline types and code changes to be kept to a minimum.
As I noted in the linked issue #24983, I believe removing that type annotation is a bad idea. In particular, the fact that NaTType is not a subclass of Timestamp and/or Timedelta is not just an inconvenience for the static typing of user code, but also a risk: whenever a NaTType is being returned, user code expecting either a Timedelta or a Timestamp might break. Thus, the perceived inconvenience to the user is actually just the type checker doing its job: alerting users to potential dangers arising from wrong assumptions. Sweeping that alert under the rug defeats the purpose of type annotations.
@mroeschke should we consider reverting for 1.5?
I wouldn't revert the whole PR. You could just revert the places where
Having said that, it would be far better if
It would be good to have the more correct annotation back for 1.5. I am not fully caught up on what partial changes are needed to restore the old annotation, so if someone could provide a PR to correct that, it would be appreciated. Otherwise, if I am not able to get to it/figure it out, I'll just fully revert for 1.5.
I will try to revert just that part.
A couple of notes:
I created a PR #48112 to revert just this typing change in the constructors, but if people are using |
While working on the Microsoft type stubs, I discovered this. With the current code, if you run pyright on the following with default settings for basic type checking (not using what we have in pyproject.toml), you get this:

mypy doesn't pick this up. The real issue here is that you generally expect a Timestamp constructor to return a Timestamp, so by returning the Union, we are then forcing end users to cast the result of the constructor, which is pretty inconvenient.

So even though the constructor could return a NaTType, we shouldn't force users to code for that due to static type checking.

This PR just takes out the NaTType as a return type for the constructors.