Separate NaT values for Timedelta ("NaTD") and Period? #24983

Open
shoyer opened this issue Jan 28, 2019 · 2 comments
Labels
Enhancement Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Needs Discussion Requires discussion from core team before further action Period Period data type Timedelta Timedelta data type

Comments

@shoyer
Member

shoyer commented Jan 28, 2019

Separate scalar missing values for Timedelta, Timestamp and Period would go a long way towards achieving predictable types in pandas. As noted in #19124, some operations cannot be made consistent while all three types share a single pd.NaT scalar. Most recently this came up in #24957.
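To make the typing problem concrete, here is a minimal sketch (not part of the original report, and the variable names are only illustrative). It shows that the missing entry of a timedelta-typed Series comes back as the shared pd.NaT scalar rather than as a Timedelta:

```python
import pandas as pd

# A timedelta-typed Series with one missing entry.
s = pd.Series(pd.to_timedelta(["1 day", None]))

present = s.iloc[0]
missing = s.iloc[1]

# The non-missing element is a Timedelta scalar ...
print(isinstance(present, pd.Timedelta))   # True
# ... but the missing element is not: it is the generic NaT scalar.
print(isinstance(missing, pd.Timedelta))   # False
print(type(missing).__name__)              # NaTType

# The very same object is what a missing Timestamp looks like.
print(missing is pd.Timestamp("NaT"))      # True
```

Because the scalar carries no information about whether it came from a timedelta or a datetime context, downstream code cannot infer the intended type from the value alone.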

This is listed in the pandas2 tracker (wesm/pandas2#74), but I think it might even be achievable for pandas 1.x? There would only be backwards compatibility issues if people are explicitly checking object identity against the pd.NaT scalar, which is a bit of an anti-pattern.
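As a rough illustration of the compatibility concern (the helper names below are hypothetical, not pandas API), compare an identity check against pd.NaT with a check through pd.isna. Only the former depends on there being exactly one missing-value object:

```python
import numpy as np
import pandas as pd


def is_missing_by_identity(value):
    # Anti-pattern: relies on every missing datetime-like value
    # being the single pd.NaT object.
    return value is pd.NaT


def is_missing(value):
    # Does not depend on object identity.
    return pd.isna(value)


candidates = [pd.NaT, np.timedelta64("NaT"), pd.Timedelta("1D")]

by_identity = [is_missing_by_identity(v) for v in candidates]
by_isna = [bool(is_missing(v)) for v in candidates]

print(by_identity)  # [True, False, False]
print(by_isna)      # [True, True, False]
```

The identity check already misses NumPy's own NaT scalar today, so code written against pd.isna would presumably be the kind least affected by any new scalar.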

@jbrockmendel
Member

xref #24645

@burnpanck
Contributor

We ran into this in our production code, where we do some timedelta gymnastics. An innocent-looking piece of code iterates over the rows of a pandas DataFrame and, among other things, applies np.maximum to a timedelta in that row and a constant lower bound. This fails with a UFuncTypeError only when the row happens to contain a NaT, even though that piece of code would otherwise work fine with NaTs. In our view this is therefore an Issue rather than an Enhancement, as it breaks invariants that one would expect from the type system.
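For readers hitting the same thing, one possible defensive workaround is sketched below. It is not the production code described above; the column name `delay` and the helpers `as_timedelta64` and `clamp_lower` are made up for illustration. The idea is to convert each scalar to an explicitly timedelta-typed NumPy value before calling np.maximum, so that a missing entry is never left to be interpreted as a datetime:

```python
import numpy as np
import pandas as pd


def as_timedelta64(value):
    """Return a numpy.timedelta64, mapping any missing value to a timedelta NaT."""
    if pd.isna(value):
        # Make the "missing timedelta" explicit instead of passing pd.NaT on.
        return np.timedelta64("NaT", "ns")
    return pd.Timedelta(value).to_timedelta64()


def clamp_lower(value, lower):
    """Apply a lower bound to a timedelta scalar that may be missing."""
    return np.maximum(as_timedelta64(value), as_timedelta64(lower))


df = pd.DataFrame({"delay": pd.to_timedelta(["1s", None, "10s"])})
lower_bound = pd.Timedelta("5s")

results = [clamp_lower(row.delay, lower_bound) for row in df.itertuples(index=False)]

for r in results:
    # The missing row stays missing (on recent NumPy versions,
    # np.maximum propagates NaT) instead of raising.
    print(r, np.isnat(r))
```

This only papers over the symptom at the call site; the underlying ambiguity of the shared scalar is what the issue is about.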

While searching for this issue, I also came across #46171. If I understand correctly, that PR made the type annotations less accurate for the convenience of users. I believe the inconvenience was actually an alarm signal from the type checkers pointing to the underlying issue, and the PR simply swept that signal under the rug. Without that PR, the type checker might have prevented us from running into this in production.
