Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Index __mul__-like ops with timedelta scalars #19333

Merged
merged 29 commits into from
Feb 22, 2018

Conversation

jbrockmendel
Copy link
Member

Fixes the following current behavior for each of the numeric index classes and each of the basic timedelta-like scalars:

idx = pd.Index(range(3))
td = pd.Timedelta(days=1)

>>> idx * td
TypeError: can only perform ops with timedelta like values
>>> td * idx
TypeError: can only perform ops with timedelta like values
>>> td / idx
TypeError: can only perform ops with timedelta like values
>>> td // idx
TypeError: can only perform ops with timedelta like values
  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@@ -428,7 +428,9 @@ Conversion
- Bug in localization of a naive, datetime string in a ``Series`` constructor with a ``datetime64[ns, tz]`` dtype (:issue:`174151`)
- :func:`Timestamp.replace` will now handle Daylight Savings transitions gracefully (:issue:`18319`)
- Bug in :class:`Index` multiplication and division methods where operating with a ``Series`` would return an ``Index`` object instead of a ``Series`` object (:issue:`19042`)

- Multiplication of :class:`TimedeltaIndex` by ``TimedeltaIndex`` will now raise ``TypeError`` instead of raising ``ValueError`` in cases of length mis-match (:issue:`????`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when you don't have an issue number, use the PR number

@@ -3864,7 +3864,21 @@ def dropna(self, how='any'):
return self._shallow_copy()

def _evaluate_with_timedelta_like(self, other, op, opstr, reversed=False):
raise TypeError("can only perform ops with timedelta like values")
# Timedelta knows how to operate with np.array, so dispatch to that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be left alone, and rather define this in TimedeltaIndex?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This for for numeric-dtyped indexes. e.g. pd.Index(range(3)) * pd.Timedelta(days=1) goes through this path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why doesn't this just raise NotImplementedError? which then Timedelta would handle? or is that too recursive a path?

Copy link
Member Author

@jbrockmendel jbrockmendel Jan 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For one thing b/c there’s pytimedelta and timedelta64 that need to be handled.

@jreback jreback added Timedelta Timedelta data type Compat pandas objects compatability with Numpy or Python functions labels Jan 21, 2018
@jreback
Copy link
Contributor

jreback commented Jan 21, 2018

I just updated the whatsnew, to be more organized, so rebase again.

@jbrockmendel
Copy link
Member Author

OK. Should I put the multiplication/division stuff in Numeric instead of Conversion? A few days back you suggested an Ops section, but I wasn't clear on what the scope of that would be.

@jreback
Copy link
Contributor

jreback commented Jan 21, 2018

I would put this in datetimelike, numeric is for well, numeric types! this is sort of an arbitrary split but makes each section shorter

@@ -446,6 +447,9 @@ Numeric
- Bug in :class:`Index` multiplication and division methods where operating with a ``Series`` would return an ``Index`` object instead of a ``Series`` object (:issue:`19042`)
- Bug in the :class:`DataFrame` constructor in which data containing very large positive or very large negative numbers was causing ``OverflowError`` (:issue:`18584`)
- Bug in :class:`Index` constructor with ``dtype='uint64'`` where int-like floats were not coerced to :class:`UInt64Index` (:issue:`18400`)
- Bug in :class:`Index` multiplication and division methods where operating with a ``Series`` would return an ``Index`` object instead of a ``Series`` object (:issue:`19042`)
- Multiplication of :class:`TimedeltaIndex` by ``TimedeltaIndex`` will now raise ``TypeError`` instead of raising ``ValueError`` in cases of length mis-match (:issue`19333`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1st one ok, other 2 in datetimelike section

@@ -26,6 +26,36 @@ def full_like(array, value):
return ret


class TestIndexArithmetic(object):
@pytest.mark.parametrize('index', [pd.Int64Index(range(1, 11)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the issue number as a comment (or PR number if no issue)

class TestIndexArithmetic(object):
@pytest.mark.parametrize('index', [pd.Int64Index(range(1, 11)),
pd.UInt64Index(range(1, 11)),
pd.Float64Index(range(1, 11)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this how we are importing in this module, e.g. pd.*? usually we don't do this (its ok whichever, just want to be consistent)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I'm being inconsistent here, will change.

expected = pd.TimedeltaIndex(['1 Day', '12 Hours'])

result = scalar_td / index
tm.assert_index_equal(result, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also test that the commute raises here

@@ -297,7 +297,7 @@ def test_dti_mul_dti_raises(self):

def test_dti_mul_too_short_raises(self):
idx = self._holder(np.arange(5, dtype='int64'))
with pytest.raises(ValueError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the problem is now caught in TimedeltaIndex when checking what it is operating against, whereas before the problem was caught in numpy when trying to operate on differently-sized arrays.

pd.UInt64Index(range(1, 3)),
pd.Float64Index(range(1, 3)),
pd.RangeIndex(1, 3)])
@pytest.mark.parametrize('scalar_td', [Timedelta(days=1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests really belong with the scalar timedelta tests I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah I thought about that and eventually decided that since the affected code was in the Index class, the test belonged in the Index tests. But either way works for me.

@@ -26,6 +26,36 @@ def full_like(array, value):
return ret


class TestIndexArithmetic(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename this class its index with timedelta scalar

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment. future readers are going to have no idea of where to find tests

@jbrockmendel
Copy link
Member Author

Any idea what's special about the circleCI setup? I can't replicate the error there.

@codecov
Copy link

codecov bot commented Jan 22, 2018

Codecov Report

Merging #19333 into master will decrease coverage by 0.02%.
The diff coverage is 95.23%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19333      +/-   ##
==========================================
- Coverage   91.61%   91.59%   -0.03%     
==========================================
  Files         150      150              
  Lines       48892    48911      +19     
==========================================
+ Hits        44792    44798       +6     
- Misses       4100     4113      +13
Flag Coverage Δ
#multiple 89.96% <95.23%> (-0.03%) ⬇️
#single 41.77% <14.28%> (-0.02%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/range.py 95.74% <100%> (+0.09%) ⬆️
pandas/core/indexes/base.py 96.41% <92.3%> (-0.04%) ⬇️
pandas/plotting/_converter.py 65.22% <0%> (-1.74%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aa59954...9e3dec4. Read the comment docs.

@jbrockmendel
Copy link
Member Author

@gfyoung any wisdom to offer about what makes circleci config special?

@gfyoung
Copy link
Member

gfyoung commented Jan 24, 2018

any wisdom to offer about what makes circleci config special?

@jbrockmendel : I wish 😉 I've generally found Circle and AppVeyor to be more PITA than Travis due to weird failures like these. AFAIK, Travis and Circle are quite similar, so unless you have some kind of flakiness in your tests, I'm not 100% sure why you are getting these results.

@jbrockmendel
Copy link
Member Author

@gfyoung OK thanks. I'd rebase and re-push and hope that it sorts itself out, but don't want to put the strain on Travis.

@gfyoung
Copy link
Member

gfyoung commented Jan 24, 2018

but don't want to put the strain on Travis.

@jbrockmendel : No worries. I'm still kind of perplexed why Circle gets the completely wrong result when doing such a simple operation.

@jreback
Copy link
Contributor

jreback commented Jan 25, 2018

looks fine. ping when all green.

@jreback
Copy link
Contributor

jreback commented Feb 7, 2018

look on circleci on how to do this

@jbrockmendel
Copy link
Member Author

https://circleci.com/docs/2.0/ssh-access-jobs/

Looks like I need to send you a public key and have you authorize it for me to debug.

@jreback
Copy link
Contributor

jreback commented Feb 10, 2018

you simply need to sign up on CircleCI, then you can run personal builds.

@jbrockmendel
Copy link
Member Author

@gfyoung I made it onto CircleCI but couldn't replicate the test failure. Can I impose on you to give it a try? There's a beer and a high-five in it for you if that helps.

@gfyoung
Copy link
Member

gfyoung commented Feb 14, 2018

@jbrockmendel : Can you push those to me on GitHub? 😄 I restarted both AppVeyor and Circle for you. That being said, not sure yet what's causing the discrepancy. Might be something architecturally-related (e.g. int32 vs. int64).

@jbrockmendel
Copy link
Member Author

Oh is it a 32 bit build thats broken? I hadn't noticed that.

@jreback jreback added this to the 0.23.0 milestone Feb 15, 2018
values, other = other, values

with np.errstate(all='ignore'):
result = op(values, other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some comments (or expand the top ones), this code does a lot

@jreback
Copy link
Contributor

jreback commented Feb 15, 2018

small comments, is there an issue about this (or just PR)? rebase and ping on green.

@jbrockmendel
Copy link
Member Author

small comments, is there an issue about this (or just PR)? rebase and ping on green.

Just the PR I think.

My attempt to SSH into CircleCI to debug the weird failing test failed. I'll try again at some point, hoping a fresh pair of eyeballs takes a crack at it in the interim.

@jbrockmendel
Copy link
Member Author

I think I got this fixed, reopening.

@jbrockmendel jbrockmendel reopened this Feb 21, 2018
@jreback
Copy link
Contributor

jreback commented Feb 21, 2018

what did it turn out to be?

@jbrockmendel
Copy link
Member Author

What did it turn out to be?

I'll open an issue for the underlying cause because it might cause problems elsewhere: in sufficiently old numpy (the build that caught it used 1.9.2) we see the following:

value = np.timedelta64(86400000000000,'ns')
new_value = int(value)
assert new_value == value  # <-- in older numpy, this does not raise

As a result RangeIndex(np.timedelta64(1, 'ns'), np.timedelta64(10, 'ns'), np.timedelta64(1, 'ns')) doesn't raise and instead returns RangeIndex(1, 10, 1).

@jreback
Copy link
Contributor

jreback commented Feb 21, 2018

we do't need a new issue for this. see my comments there. rebase and I'll have a look.

@jreback
Copy link
Contributor

jreback commented Feb 21, 2018

value = np.timedelta64(86400000000000,'ns')
new_value = int(value)
assert new_value == value  # <-- in older numpy, this does not raise

there are many places in the code base where this is handled. This is the point of needing to check for timedeta before its checked for integer.

@jbrockmendel
Copy link
Member Author

Rebase accomplished.

@jreback jreback merged commit 3ab8623 into pandas-dev:master Feb 22, 2018
@jreback
Copy link
Contributor

jreback commented Feb 22, 2018

thanks! hey sometimes my stubborness pay off ..... thanks for sticking with it!

@jbrockmendel jbrockmendel deleted the idx_mul_tdi branch February 22, 2018 05:45
harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants