-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix docstrings, error messages #19672
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few comments, but in general, I would defer this to when we actually changed things instead of now already trying to be "prepared" or "robust" for such changes without knowing how it will exactly be done in practice.
I would rather put effort in actually finding out how we can such a refactor to dispatch ops to Array-likes, backing both ops for Series/Block and Index, but that is something you have to sync closely with @TomAugspurger to make sure you don't conflict with what he is doing (and I think help is certainly welcome! we just need to make sure we are not doing duplicate work)
pandas/core/indexes/datetimelike.py
Outdated
@@ -794,7 +796,7 @@ def isin(self, values): | |||
|
|||
def shift(self, n, freq=None): | |||
""" | |||
Specialized shift which produces a DatetimeIndex | |||
Specialized shift which produces a new object of the same type as self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a useful description for users (they don't necessarily know what 'self' is)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good note. But the reason for this change isn't future-looking; this is also the docstring for TimedeltaIndex.shift
.
.format(typ=type(other))) | ||
raise TypeError("cannot add {cls} and {typ}" | ||
.format(cls=type(self).__name__, | ||
typ=type(other).__name__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this is an improvement in robustness, as if series would use the same code, this 'cls' would become either "Series" or "TimedeltaArray", and "Series" is not that informative, and "TimedeltaArray" should not bubble up to user when using a Series (that would be an internal implementation detail).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above is always a TimedeltaIndex given the elif statement above
pandas/core/indexes/datetimelike.py
Outdated
@@ -804,7 +806,7 @@ def shift(self, n, freq=None): | |||
|
|||
Returns | |||
------- | |||
shifted : DatetimeIndex | |||
shifted : type(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
|
||
if isinstance(other, (datetime, compat.string_types)): | ||
if isinstance(other, datetime): | ||
# GH#18435 strings get a pass from tzawareness compat | ||
self._assert_tzawareness_compat(other) | ||
|
||
other = _to_m8(other, tz=self.tz) | ||
result = func(other) | ||
result = binop(self, other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because DatetimeIndex.__mro__
has 12 entries, so with the standard super
usage it isn't obvious where the method is defined.
Haven't looked closely yet, but I haven't touched datetimeindex yet. Including the correct class name in the error message has been an issue, so those changes should be worthwhile. Should be able to look closer in a few days. |
Just pushed a fix for the |
Codecov Report
@@ Coverage Diff @@
## master #19672 +/- ##
==========================================
+ Coverage 91.58% 91.58% +<.01%
==========================================
Files 150 150
Lines 48867 48875 +8
==========================================
+ Hits 44755 44763 +8
Misses 4112 4112
Continue to review full report at Codecov.
|
Thoughts on this? There are a number of broken cases to fix in and around these methods; ideally we'd close this out first. |
.format(typ=type(other))) | ||
raise TypeError("cannot add {cls} and {typ}" | ||
.format(cls=type(self).__name__, | ||
typ=type(other).__name__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above is always a TimedeltaIndex given the elif statement above
.format(typ=type(other).__name__)) | ||
raise TypeError("cannot subtract {cls} and {typ}" | ||
.format(cls=type(self).__name__, | ||
typ=type(other).__name__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
|
||
def _sub_datelike(self, other): | ||
# subtract a datetime from myself, yielding a TimedeltaIndex | ||
from pandas import TimedeltaIndex | ||
if isinstance(other, DatetimeIndex): | ||
# require tz compat | ||
if not self._has_same_tz(other): | ||
raise TypeError("DatetimeIndex subtraction must have the same " | ||
"timezones or no timezones") | ||
raise TypeError("{cls} subtraction must have the same " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, this will always be subtraction of DatetimeIndexes ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is now, but one of the on-deck PRs will have ndarray[datetime64] go down this path too to fix:
dti = pd.date_range('2016-01-01', periods=3)
arr = np.array(['2015-01-01', '2015-02-02', '2015-03-03'], dtype='datetime64[ns]')
>>> dti - arr
DatetimeIndex(['1971-01-01', '1970-12-01', '1970-11-03'], dtype='datetime64[ns]', freq=None)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about instead of using type(self).__name__
we make a little function which can format these. I agree with @jorisvandenbossche point here, in the case of an Index type the dtype is clear, but not for a Series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems worthwhile but a pretty low priority. Closing for now.
.format(typ=type(other).__name__)) | ||
raise TypeError("cannot subtract {cls} and {typ}" | ||
.format(cls=type(self).__name__, | ||
typ=type(other).__name__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here?
@@ -59,23 +59,25 @@ def _td_index_cmp(opname, cls, nat_result=False): | |||
""" | |||
|
|||
def wrapper(self, other): | |||
msg = "cannot compare a TimedeltaIndex with type {0}" | |||
func = getattr(super(TimedeltaIndex, self), opname) | |||
msg = "cannot compare a {cls} with type {typ}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here? self
is always a TimedeltaIndex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TDI/DTI/PI have very similar implementations for these methods, some of which can be de-duplicated if we make these class-agnostic. More to the point, we want these messages to remain accurate if/when they get moved to the array level. Also in principle this could be subclassed and we'd like the name to come out right there too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some tests which lock down these error messages? IOW the types that are supposed to appear
@@ -794,7 +796,7 @@ def isin(self, values): | |||
|
|||
def shift(self, n, freq=None): | |||
""" | |||
Specialized shift which produces a DatetimeIndex | |||
Specialized shift which produces a %(klass)s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doing the overriding in the subclass just to fix this name, we can maybe reword the docstring to be clear enough?
Eg mention it works for Datetime- and TimedeltaIndex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was the idea in the earlier version. I'm open to suggestions.
|
||
def _sub_datelike(self, other): | ||
# subtract a datetime from myself, yielding a TimedeltaIndex | ||
from pandas import TimedeltaIndex | ||
if isinstance(other, DatetimeIndex): | ||
# require tz compat | ||
if not self._has_same_tz(other): | ||
raise TypeError("DatetimeIndex subtraction must have the same " | ||
"timezones or no timezones") | ||
raise TypeError("{cls} subtraction must have the same " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about instead of using type(self).__name__
we make a little function which can format these. I agree with @jorisvandenbossche point here, in the case of an Index type the dtype is clear, but not for a Series.
@@ -901,6 +903,11 @@ def _sub_datelike_dti(self, other): | |||
new_values[mask] = libts.iNaT | |||
return new_values.view('i8') | |||
|
|||
@Substitution(klass='DatetimeIndex') | |||
@Appender(DatetimeIndexOpsMixin.shift.__doc__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be a shared doc-string instead (and same for the other things you added). this is the existing pattern.
@@ -59,23 +59,25 @@ def _td_index_cmp(opname, cls, nat_result=False): | |||
""" | |||
|
|||
def wrapper(self, other): | |||
msg = "cannot compare a TimedeltaIndex with type {0}" | |||
func = getattr(super(TimedeltaIndex, self), opname) | |||
msg = "cannot compare a {cls} with type {typ}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some tests which lock down these error messages? IOW the types that are supposed to appear
@@ -409,7 +412,7 @@ def _add_datelike(self, other): | |||
result = checked_add_with_arr(i8, other.value, | |||
arr_mask=self._isnan) | |||
result = self._maybe_mask_results(result, fill_value=iNaT) | |||
return DatetimeIndex(result, name=self.name, copy=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what changed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just got indented (next line in the diff) b/c this branch is never reached except for in the indented case.
The discussion in #19558 makes me think getting PeriodArray and DatetimeArray implemented sooner rather than later will make things easier for other goals. In preparation for that, this PR goes through (most) of the methods that will need to be ported and adapts their error messages to be class-agnostic, and to use
str.format
syntax.Also fixes a couple of copy/paste typos in docstrings.