
BUG: Patch Checked Add Method #14324

Closed
wants to merge 12 commits into from

Conversation

gfyoung
Member

@gfyoung gfyoung commented Sep 29, 2016

Title is self-explanatory. Follow-up to #14237.

Merged in PR #14453

@codecov-io

codecov-io commented Sep 29, 2016

Current coverage is 85.25% (diff: 90.47%)

Merging #14324 into master will decrease coverage by 0.01%

@@             master     #14324   diff @@
==========================================
  Files           140        140          
  Lines         50634      50645    +11   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43173      43175     +2   
- Misses         7461       7470     +9   
  Partials          0          0          

Powered by Codecov. Last update d98e982...85ca49d

@jorisvandenbossche jorisvandenbossche added Bug Timedelta Timedelta data type labels Sep 29, 2016
@jorisvandenbossche
Member

Although this is probably more correct, I don't think the performance degradation is allowable:

# master

In [1]: arr = np.arange(1000000)
   ...: td = pd.to_timedelta(arr)
   ...: ts = pd.Timestamp('2000')
   ...: 

In [2]: %timeit td + ts
100 loops, best of 3: 2.51 ms per loop
# PR

In [2]: %timeit td + ts
10 loops, best of 3: 41.1 ms per loop

The problem is in np.iinfo(np.int64).min - b > arr when b is positive (and probably in the other check when b is negative)
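For context, the overflow hazard described above can be avoided for a scalar addend by rearranging each comparison so the subtraction itself cannot overflow. A minimal standalone sketch of the idea (the helper name is made up; this is not the PR's actual code):

```python
import numpy as np

def would_overflow(arr, b):
    """Return True if arr + b would overflow int64 for any element.

    Sketch only: b is assumed to be a scalar. The comparisons are
    arranged so that imax - b (for b > 0) and imin - b (for b < 0)
    are always representable, unlike imin - b for positive b.
    """
    imax = np.iinfo(np.int64).max
    imin = np.iinfo(np.int64).min
    if b > 0:
        # arr + b > imax  <=>  arr > imax - b  (imax - b cannot underflow)
        return bool((arr > imax - b).any())
    elif b < 0:
        # arr + b < imin  <=>  arr < imin - b  (imin - b cannot overflow)
        return bool((arr < imin - b).any())
    return False
```

For example, would_overflow(np.arange(10, dtype=np.int64), 1) is False, while a value near np.iinfo(np.int64).max plus a positive b trips the check.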

@gfyoung
Member Author

gfyoung commented Sep 29, 2016

@jorisvandenbossche : I draw a hard line on correctness. This is impeding my ability to patch the other TDI overflow issues because there are tests that fail because of this bug. Perhaps it would instead be worthwhile to break the check up into two separate statements, though I'm not sure that would help much given that it's Python.

@jreback
Contributor

jreback commented Sep 30, 2016

try this. The key to this expression is lazy evaluation, as it's a compound expression. numexpr does this for free.

In [5]: arr = np.arange(1000000)
   ...: td = pd.to_timedelta(arr)
   ...: ts = pd.Timestamp('2000')
   ...: a = td.view('i8')
   ...: b = ts.value
   ...: 

In [7]: nmax = np.iinfo(np.int64).max 

In [8]: nmin = np.iinfo(np.int64).min

In [11]: ((b > 0) & (np.iinfo(np.int64).max - b < arr)).any() or ((b < 0) & (np.iinfo(np.int64).min - b > arr)).any()
Out[11]: False

In [12]: %timeit ((b > 0) & (np.iinfo(np.int64).max - b < arr)).any() or ((b < 0) & (np.iinfo(np.int64).min - b > arr)).any()
10 loops, best of 3: 57.7 ms per loop

In [14]: %timeit pd.eval('((b > 0) & (nmax - b < arr)) | ((b < 0) & (nmin - b > arr))').any()
100 loops, best of 3: 6.24 ms per loop

In [15]: pd.eval('((b > 0) & (nmax - b < arr)) | ((b < 0) & (nmin - b > arr))').any()
Out[15]: False

@jreback
Contributor

jreback commented Sep 30, 2016

numba can do even better (but we don't have this in the quiver atm)

In [17]: def f(a,b,arr,nmax,nmin):
    ...:     return ((b > 0) & (nmax - b < arr)) | ((b < 0) & (nmin - b > arr))

In [18]: f = numba.jit(f,nopython=True)

In [19]: %timeit f(a,b,arr,nmax,nmin).any()
The slowest run took 59.62 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.28 ms per loop

In [20]: f(a,b,arr,nmax,nmin).any()
Out[20]: False

@jorisvandenbossche
Member

Something like this is quite convoluted, but is a bit faster than the eval approach:

In [62]: %%timeit
    ...: b2 = np.broadcast_to(b, arr.shape)
    ...: mask1 = b2 > 0
    ...: mask2 = b2 < 0
    ...: if not mask1.any():
    ...:     to_raise = (np.iinfo(np.int64).min - b2 > arr).any()
    ...: elif not mask2.any():
    ...:     to_raise = (np.iinfo(np.int64).max - b2 < arr).any()
    ...: else:
    ...:     to_raise = (np.iinfo(np.int64).max - b2[mask1] < arr[mask1]).any() or (np.iinfo(np.int64).min - b2[mask2] > arr[mask2]).any()
    ...: 
100 loops, best of 3: 4.2 ms per loop
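The same branching idea, wrapped as a standalone function for clarity (a sketch with a hypothetical name, not the PR's actual code; it generalizes to an array-valued b, and the negative-overflow branch indexes with mask2 rather than mask1):

```python
import numpy as np

def checked_add_overflows(arr, b):
    """Mask-based int64 overflow check for arr + b (illustrative sketch)."""
    imax = np.iinfo(np.int64).max
    imin = np.iinfo(np.int64).min
    b2 = np.broadcast_to(b, arr.shape)  # requires numpy >= 1.10
    mask1 = b2 > 0  # positions where overflow toward +inf is possible
    mask2 = b2 < 0  # positions where overflow toward -inf is possible
    to_raise = False
    if mask1.any():
        to_raise = bool((imax - b2[mask1] < arr[mask1]).any())
    if not to_raise and mask2.any():
        to_raise = bool((imin - b2[mask2] > arr[mask2]).any())
    return to_raise
```

With a scalar b the broadcast is trivial and only one of the two branches ever runs, which is where the speedup over the unconditional compound expression comes from.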

@gfyoung gfyoung force-pushed the add-overflow-patch branch 2 times, most recently from 9d31ec7 to ad4cd85 Compare October 2, 2016 20:19
@gfyoung
Member Author

gfyoung commented Oct 4, 2016

@jorisvandenbossche : Are you proposing I use this more convoluted method then? Thoughts?

@gfyoung gfyoung force-pushed the add-overflow-patch branch 2 times, most recently from c395af3 to 760b1bb Compare October 7, 2016 21:21
@gfyoung
Member Author

gfyoung commented Oct 9, 2016

@jorisvandenbossche : Some feedback on my question above would be greatly appreciated.

@jreback
Contributor

jreback commented Oct 9, 2016

@gfyoung yes pls use one of the methods proposed above

@gfyoung gfyoung force-pushed the add-overflow-patch branch 3 times, most recently from d0563a7 to cf4da12 Compare October 9, 2016 20:25
@gfyoung
Member Author

gfyoung commented Oct 9, 2016

@jorisvandenbossche : Your solution is unfortunately not backwards compatible with older versions of numpy (e.g. 1.7.1) that don't have np.broadcast_to. Let me see if np.broadcast_arrays could work. If not, I'll have to go with @jreback's suggestion.

@jreback
Contributor

jreback commented Oct 9, 2016

so put a conditional on the numpy version (or use the eval soln)

@gfyoung
Member Author

gfyoung commented Oct 9, 2016

@jreback : I thought we don't do those kinds of checks (or try to avoid them). In any case, I'm trying another method, as I mentioned above. IMO this isn't something THAT important that we need to check numpy versions.

@jreback
Contributor

jreback commented Oct 9, 2016

i don't see a reason not to do a check on the numpy version

or use the eval soln (which is fine but will fall back to numpy if numexpr is not installed)

@gfyoung
Member Author

gfyoung commented Oct 9, 2016

@jreback : Fair enough. Also, I have no objections to your eval solution. I'm trying it right now, as np.broadcast_arrays (the alternative to np.broadcast_to) doesn't fit the bill.

@gfyoung gfyoung force-pushed the add-overflow-patch branch from cf4da12 to 08b6395 Compare October 9, 2016 21:30
@jorisvandenbossche
Member

The eval solution will be very slow if you don't have numexpr installed, so I would prefer not to use it.

@gfyoung
Member Author

gfyoung commented Oct 9, 2016

@jorisvandenbossche : Unfortunately, your solution is not backwards compatible. And if you don't like @jreback's suggestion, then what do you suggest? Compared to my original change, the eval solution would at least be an improvement.

@jreback
Contributor

jreback commented Oct 9, 2016

just check the numpy version

i am not sure why you think this is an issue

@gfyoung
Member Author

gfyoung commented Oct 9, 2016

@jreback : And then what, use the slower solution that I used before if the version is not compatible? Would it be preferable to fallback to my original solution or use the eval one you used?

@jreback
Contributor

jreback commented Oct 9, 2016

yep
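The version-gated fallback being agreed on here can be sketched as a small dispatch helper (an illustrative sketch with made-up names, not the PR's actual code; pandas-era code would use a LooseVersion-style comparison):

```python
import numpy as np

def overflow_check_impl():
    """Choose an overflow-check strategy based on the numpy version.

    Illustrative sketch: parse the first two numpy version components
    and gate on 1.10, the release that introduced np.broadcast_to.
    """
    major, minor = (int(x) for x in np.__version__.split('.')[:2])
    if (major, minor) >= (1, 10):
        # np.broadcast_to is available: use the mask-based approach
        return 'broadcast_to'
    # older numpy: fall back to the pd.eval expression, which is fast
    # with numexpr installed and degrades to plain numpy without it
    return 'eval'
```

The gate only runs once at import time in practice, so the version check itself adds no per-operation overhead.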

…projects (pandas-dev#14406)

As per [their blog post of the 27th April](https://blog.readthedocs.com/securing-subdomains/) ‘Securing subdomains’:

> Starting today, Read the Docs will start hosting projects from subdomains on the domain readthedocs.io, instead of on readthedocs.org. This change addresses some security concerns around site cookies while hosting user generated data on the same domain as our dashboard.

Test Plan: Manually visited all the links I’ve modified.
@gfyoung gfyoung force-pushed the add-overflow-patch branch from 896c9f1 to ba515fe Compare October 13, 2016 04:39
@jorisvandenbossche
Member

This looks good to me.
@gfyoung final question: can you run the timedelta benchmark (test_add_td_ts) and compare this to 0.18 as well?

@jreback jreback added this to the 0.19.1 milestone Oct 13, 2016
@jreback
Contributor

jreback commented Oct 13, 2016

lgtm

@gfyoung
Member Author

gfyoung commented Oct 13, 2016

@jorisvandenbossche : Using the same setup as above, 100 loops, best of 3: 3.08 ms per loop

@jorisvandenbossche
Member

And compared to 0.18? (on my machine I get 1.15 ms, so probably something of a 3x slowdown)

@gfyoung gfyoung force-pushed the add-overflow-patch branch from ba515fe to ee1653b Compare October 13, 2016 14:16
@gfyoung
Member Author

gfyoung commented Oct 13, 2016

@jorisvandenbossche : That timing information is for v0.18.0.

@jorisvandenbossche
Member

@gfyoung That is a bit strange, because then you have 4.06 ms for this PR and 3.08 ms for 0.18. I get 5.11 vs 1.11, so a much larger difference.

Anyway, it's a rather large performance degradation, but we decided to go down this path.

@gfyoung gfyoung force-pushed the add-overflow-patch branch from ee1653b to de31e6a Compare October 14, 2016 17:17
Josh Owen and others added 5 commits October 15, 2016 09:37
…x (GH14327) (pandas-dev#14428)

Existing logic under "if level is not None:" assumed that index was a MultiIndex.
Now we check and also handle the case where an Index is passed in with a None grouper.
This resolves GH 14327
Deprecated back in `v0.17.0` <a href="https://github.com/pydata/pandas/commit/987b7e7e586b8df1d127406c69e0a9094a1a5322">here</a>. Seems appropriate to carry through now.

Author: gfyoung <[email protected]>

Closes pandas-dev#13602 from gfyoung/remove-coerce-to-timedelta and squashes the following commits:

091af83 [gfyoung] CLN: Removed coerce param in pd.to_timedelta
closes pandas-dev#14369

Author: Brandon M. Burroughs <[email protected]>

Closes pandas-dev#14416 from brandonmburroughs/concat_with_axis_rows and squashes the following commits:

9aa260d [Brandon M. Burroughs] Changing axis handling to depend on object passed
49442be [Brandon M. Burroughs] Using dataframe _get_axis_number instance method
64702fb [Brandon M. Burroughs] Updating documentation for concat
fdd5260 [Brandon M. Burroughs] Removing duplicate expected dfs
3f08b07 [Brandon M. Burroughs] Adding concat tests for axis 0, 1, and 'index'
cf3f998 [Brandon M. Burroughs] Adding ValueError test for concat Series axis 'columns'
a6694b9 [Brandon M. Burroughs] Updating documentation
584ebd2 [Brandon M. Burroughs] BUG: Allow concat to take string axis names
@gfyoung gfyoung force-pushed the add-overflow-patch branch from de31e6a to 94076aa Compare October 15, 2016 20:00
jreback and others added 2 commits October 15, 2016 16:02
1) Add checks to ensure that add overflow does not
   occur in either the positive or negative direction.
2) Add benchmarks to ensure that operations involving
   this checked add function are not significantly impacted.
@gfyoung gfyoung force-pushed the add-overflow-patch branch from 94076aa to 85ca49d Compare October 16, 2016 19:30
@jreback
Contributor

jreback commented Oct 19, 2016

@gfyoung you may need to close this and open a new PR. It seems that people are unable to rebase and push correctly to the original pandas repo (further, appveyor was disconnected), so this might be a good test.

@gfyoung
Member Author

gfyoung commented Oct 19, 2016

@jreback : Ah, I can see that now. Will close and repost.

@gfyoung gfyoung closed this Oct 19, 2016
@gfyoung gfyoung deleted the add-overflow-patch branch October 19, 2016 16:12
@jorisvandenbossche
Member

@jreback BTW, I contacted github about it, and it should be solved now (was an error on their side). So the closing/open new PR should no longer be needed for all older PRs.

@jreback
Contributor

jreback commented Oct 19, 2016

@jorisvandenbossche good to know! thanks!

@gfyoung
Member Author

gfyoung commented Oct 19, 2016

A little too late now but oh well. 😄 New PR in #14453.

9 participants