Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: have series/panel arithmetic operators use expressions (numexpr) #3765

Closed
jreback opened this issue Jun 5, 2013 · 31 comments · Fixed by #5022
Closed

PERF: have series/panel arithmetic operators use expressions (numexpr) #3765

jreback opened this issue Jun 5, 2013 · 31 comments · Fixed by #5022
Labels
Performance Memory or execution speed performance
Milestone

Comments

@jreback
Copy link
Contributor

jreback commented Jun 5, 2013

No description provided.

@jtratner
Copy link
Contributor

jtratner commented Jun 9, 2013

I have this done for Series and implemented for scalar arithmetic in Panel. (less clear if it's possible / useful to accelerate combination operations with numexpr). @cpcloud @jreback could one of you write up how to create a Panel with >10K items with integer, floats, mixed integer and integers with zeros?

Basically, like these (which are what I'm using for test cases in test_expressions.py currently), but using Panel instead (don't need multiple, just one version each would be fine):

_frame  = DataFrame(np.random.randn(10000, 4), columns = list('ABCD'), dtype='float64')
_frame2 = DataFrame(np.random.randn(100, 4),   columns = list('ABCD'), dtype='float64')
_mixed  = DataFrame({ 'A' : _frame['A'].copy(), 'B' : _frame['B'].astype('float32'), 'C' : _frame['C'].astype('int64'), 'D' : _frame['D'].astype('int32') })
_mixed2 = DataFrame({ 'A' : _frame2['A'].copy(), 'B' : _frame2['B'].astype('float32'), 'C' : _frame2['C'].astype('int64'), 'D' : _frame2['D'].astype('int32') })
_integer  = DataFrame(np.random.randint(1, 100, size=(10001, 4)), columns = list('ABCD'), dtype='int64')

@jreback
Copy link
Contributor Author

jreback commented Jun 9, 2013

also examples in
pandas.utils.testing.makePanel

In [17]: _frame2 = DataFrame(np.random.randn(100, 4),   columns = list('ABCD'), dtype='float64')

In [18]: df = DataFrame({ 'A' : _frame2['A'].copy(), 'B' : _frame2['B'].astype('float32'), 'C' : _frame2['C'].astype('int64'), 'D' : _frame2['D'].astype('int32') })

In [19]: Panel(dict(ItemA = df, ItemB = (df+1).astype('int64')))
Out[19]: 
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 100 (major_axis) x 4 (minor_axis)
Items axis: ItemA to ItemB
Major_axis axis: 0 to 99
Minor_axis axis: A to D

@jtratner
Copy link
Contributor

jtratner commented Jun 9, 2013

@cpcloud FYI: numexpr isn't working with // right now (at least with our containers), produces the following exception within _evaluate_numexpr:

    Exception: TypeError("unsupported operand type(s) for //: 'VariableNode' and 'VariableNode'",)

@cpcloud
Copy link
Member

cpcloud commented Jun 9, 2013

Yup check out my list on the eval thread I think I note it there. Not sure what to do there. What's weird is that they support scalar floor division

@jtratner
Copy link
Contributor

jtratner commented Jun 9, 2013

@cpcloud maybe we could special case it to do true division and then convert it to int?

@cpcloud
Copy link
Member

cpcloud commented Jun 9, 2013

Yeah that should be fine.

@cpcloud
Copy link
Member

cpcloud commented Jun 9, 2013

Hm but let me think about it cuz for a complex expression how to cast without performing the op?

@jtratner
Copy link
Contributor

jtratner commented Jun 9, 2013

I setup series to use numexpr. Is this speed increase at all useful? Unclear how good it actually would be...

In [10]: ser = Series(np.random.randn(1000000))

In [11]: ser2 = Series(np.random.randn(1000000))

In [12]: %%timeit
   ....: ser * ser2
   ....:
100 loops, best of 3: 6.87 ms per loop

In [13]: expr.set_use_numexpr(True)

In [14]: %%timeit
   ....: ser * ser2
   ....:
100 loops, best of 3: 6.76 ms per loop

In [15]: %%timeit
   ....: ser / ser2
   ....:
100 loops, best of 3: 7.89 ms per loop

In [16]: expr.set_use_numexpr(False)

In [17]: %%timeit
   ....: ser / ser2
   ....:
100 loops, best of 3: 10.4 ms per loop

@jreback
Copy link
Contributor Author

jreback commented Jun 9, 2013

In reality these are very similar to numpy operations and they prob cannot be parallelized, which is where a large part fo the speedups come from; doing these same operations on frames yields pretty big speedups (and even better when we have eval, which will enable multiple operates to be sent to ne)

@jtratner
Copy link
Contributor

jtratner commented Jun 9, 2013

It just occurred to me that I'm using a 2 core Macbook Air, so I might not notice as much of a difference or I'm not using the right test cases to see the difference. (frame appears to be accelerated by about 30%, huge panel very little and series very little)

@jreback
Copy link
Contributor Author

jreback commented Jun 9, 2013

build the eval branch that @cpcloud is working on

that's interesting because the more things u have in a term the faster it gets

@cpcloud
Copy link
Member

cpcloud commented Jun 9, 2013

@jtratner just fyi as of now only scalar arith ops with series/frame and already-aligned frames and series arith ops work as well...

@cpcloud
Copy link
Member

cpcloud commented Jun 9, 2013

it's also fun to make a huge frame and run htop to see all of my cores being used for ops :)

@jtratner
Copy link
Contributor

hey, do you have any instructions on how to set up vbench/vb_suite? Not clear and I'd like to help write cases and/or check out performance after changes.

@jreback
Copy link
Contributor Author

jreback commented Jun 12, 2013

see this #3156

@jtratner
Copy link
Contributor

@jreback I worked something up and would be interested in yoru thoughts on it. After I started with this, I realized that I was duplicating all the code to create the special arithmetic methods (i.e. __*__) and flex methods (e.g., pow, mul, etc.) so I pushed them all into two classmethods pandas/core/generic.py (replicating how Panel handled aggregated methods). It definitely uses some magic to get it done and I'm not sure whether it adds more complexity than it's worth. It shouldn't add any extra cost performance-wise, since it still binds to the class at compile/import time. See jtratner/pandas@899576d

Upsides:

  • Less places to update arithmetic methods (and therefore less chance of errors like in BUG: Fix __truediv__ numexpr error #3764 .
  • Creates a consistent arithmetic method interface-ish thing (previously I think Series and Panel were missing some of the operators).
  • Doesn't change existing behavior except to speed up certain operations
  • Allows SparsePanel to opt out of numexpr
  • Doesn't override methods defined in immediate superclass.
  • Doesn't add an extra layer of function calls...ultimately, everything is bound nearly the same way in the derived classes.
  • Points the way to more unification of arithmetic methods between Series, DataFrame, and Panel.

Downsides:

  • Adds more indirection, making it slightly harder to follow the code.
  • More difficult / impossible for static code analyzers to determine all the methods are defined on them (potentially could be remedied by putting stub methods on NDFrame. That said, this issue must crop up elsewhere with all the Appender usage on docstrings, right?

@jreback
Copy link
Contributor Author

jreback commented Jun 15, 2013

I like it!

your downside points are correct but clearly IMHO outweighed by
less code and less chance of errors when things get modified

I am of the camp that you should only have code to do something in one place
but sometimes I jump thru hoops with subclass/magic to make it happen

great job!

can merge first thing in 0.12

helps consolidation - always a good thing

@cpcloud
Copy link
Member

cpcloud commented Jun 15, 2013

this is nice! great to have the behavior propagate down into the pandas object hierarchy with minimal fuss

@jtratner
Copy link
Contributor

@jreback @cpcloud thanks! :) now I just need to add a testing mechanism to numexpr to check that this actually uses numexpr everywhere I'm claiming it does (going to follow @jreback's suggestions above). Right now it just checks that the numexpr speedups do not change any of the existing behavior on calculations and that it doesn't mess up default axes on DataFrame, etc.

@jreback
Copy link
Contributor Author

jreback commented Jun 15, 2013

@jtratner as an aside - my heuristic of when to use ne may be too conservative
u may want to try a couple of diff values to see where it helps vs it's overhead

I am talkin about the 20k num of elements here (I forgot what number is actually there)

@jtratner
Copy link
Contributor

@cpcloud @jreback Blah, Travis is going to fail right now because of a URL error in unrelated package [this time it's a yahoo test] :-/ .

@cpcloud
Copy link
Member

cpcloud commented Jun 15, 2013

number is 10k

@cpcloud
Copy link
Member

cpcloud commented Jun 15, 2013

what's with those anyway i keep getting google/yahoo failures

@jtratner
Copy link
Contributor

Maybe Travis' IPs are getting flagged as spamming because of all the builds?

On Fri, Jun 14, 2013 at 9:18 PM, Phillip Cloud [email protected]:

what's with those anyway i keep getting google/yahoo failures


Reply to this email directly or view it on GitHubhttps://github.com//issues/3765#issuecomment-19489118
.

@jreback
Copy link
Contributor Author

jreback commented Jun 15, 2013

I thought there was a fix somewhere?

need a decorator to skip a network test that fails because of a urlibopen failure (eg connection refused)
they exist in some tests in test_yhoo/ google but should be on all

@cpcloud
Copy link
Member

cpcloud commented Jun 15, 2013

anyway let me not derail the conversation here

@jtratner
Copy link
Contributor

@jreback @cpcloud this is 90% of the way done, I just have to get a test suite working for testing that evaluate actually successfully evaluated an expression. The only problem that I have with this is that it kills the ability for static code analyzers to note that add, div, mul, pow, etc. are methods and have real docstrings. Have you encountered this elsewhere? Do you have any suggestions on handling this? (or maybe it's not worth dealing with?) [btw - it's here jtratner/pandas@899576d

@cpcloud
Copy link
Member

cpcloud commented Jun 20, 2013

personally, i never use that feature of static code analyzers. i usually just want them to tell me, e.g., "hey you forgot to assign a variable here, dummy!" so i would say it's not worth it...expressions in pandas are a bit different anyway so the best way to get information about them short of reading the code, is in the online documentation.

@jtratner
Copy link
Contributor

@cpcloud sorry, I was conflating two things. I mean that, in the original code, the static analyzer could at least tell you that the add, div, and mul methods existed.


separately - I'm trying to add some kind of testing functionality to core/expressions.py so that you can actually assert "Hey, this test case actually used numexpr - yay!". As it stands now, it's not really possible to tell if numexpr was successfully used or not. (except potentially by performance time)

@jreback
Copy link
Contributor Author

jreback commented Sep 20, 2013

@cpcloud @jtratner I think eval fixes this...so close?

@cpcloud
Copy link
Member

cpcloud commented Sep 20, 2013

i think this will be closed by @jtratner's arith op refactor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment