-
Notifications
You must be signed in to change notification settings - Fork 1
/
2015-09-28-python-performance.html
196 lines (169 loc) · 10.3 KB
/
2015-09-28-python-performance.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<link rel="alternate"
type="application/rss+xml"
href="https://bastibe.de/rss.xml"
title="RSS feed for https://bastibe.de/">
<title>Python Numeric Performance</title>
<meta name="author" content="Bastian Bechtold">
<meta name="referrer" content="no-referrer">
<link href= "static/style.css" rel="stylesheet" type="text/css" />
<link rel="icon" href="static/favicon.ico">
<link rel="apple-touch-icon-precomposed" href="static/favicon-152.png">
<link rel="msapplication-TitleImage" href="static/favicon-144.png">
<link rel="msapplication-TitleColor" href="#0141ff">
<script src="static/katex.min.js"></script>
<script src="static/auto-render.min.js"></script>
<script src="static/lightbox.js"></script>
<link rel="stylesheet" href="static/katex.min.css">
<script>document.addEventListener("DOMContentLoaded", function() { renderMathInElement(document.body); });</script>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8">
<meta name="viewport" content="initial-scale=1,width=device-width,minimum-scale=1"></head>
<body>
<div id="preamble" class="status"><div class="header">
<a href="https://bastibe.de">Basti's Scratchpad on the Internet</a>
<div class="sitelinks">
<a href="https://github.com/bastibe">Github</a> | <a href="https://bastibe.de/projects.html">Projects</a> | <a href="https://bastibe.de/uses.html">Uses</a> | <a href="https://bastibe.de/reviews.html">Reviews</a> | <a href="https://bastibe.de/about.html">About</a>
</div>
</div></div>
<div id="content">
<div class="post-date">28 Sep 2015</div><h1 class="post-title"><a href="https://bastibe.de/2015-09-28-python-performance.html">Python Numeric Performance</a></h1>
<p>
Recently, I was working on a dynamic programming algorithm that involves a lot of number crunching in nested loops. The algorithm looks like this:
</p>
<div class="org-src-container">
<pre class="src src-python">def y_change_probability_python(oct_per_sec):
""" ... """
b = 1.781
mu = -0.301
return 1/(2*b)*math.exp(-abs(oct_per_sec-mu)/b)
def y_idx_range_python(y_idx, max_y_factor, height):
""" ... """
y = (y_idx/height)*(max_y-min_y)+min_y
y_lo = max(y/max_y_factor, min_y)
y_hi = min(y*max_y_factor, max_y)
y_lo_idx = int((y_lo-min_y)/(max_y-min_y)*height)
y_hi_idx = int((y_hi-min_y)/(max_y-min_y)*height)
return y_lo_idx, y_hi_idx
def find_tracks_python(cost_matrix, delta_x):
""" ... """
tracks = np.zeros(correlogram.shape, dtype=np.int64)
cum_cost = np.zeros(correlogram.shape)
max_y_factor = (2**5)**(delta_x)
height = correlogram.shape[1]
probabilities = np.empty((height, height))
for y_idx in range(height):
y = (y_idx/height)*(max_y-min_y)+min_y
for y_pre_idx in range(*y_idx_range_numba(y_idx, max_y_factor, height)):
y_pre = (y_pre_idx/height)*(max_y-min_y)+min_y
doubles_per_x = math.log2((y/y_pre)**(1/delta_x))
probabilities[y_idx, y_pre_idx] = y_change_probability_numba(doubles_per_x)
for x_idx, cost_column in enumerate(cost_matrix):
if x_idx == 0:
cum_cost[x_idx] = cost_column
continue
for y_idx, cost in enumerate(cost_column):
for y_pre_idx in range(*y_idx_range_numba(y_idx, max_y_factor, height)):
weighted_cum_cost = cum_cost[x_idx-1, y_pre_idx] + cost*probabilities[y_idx, y_pre_idx]
if weighted_cum_cost > cum_cost[x_idx, y_idx]:
cum_cost[x_idx, y_idx] = weighted_cum_cost
tracks[x_idx, y_idx] = y_pre_idx
cum_cost[x_idx, y_idx] = cum_cost[x_idx-1, tracks[x_idx, y_idx]] + cost
return tracks, cum_cost
</pre>
</div>
<p>
I'm not going into the details of what this algorithm does, but note that it iterates over every column and row of the matrix <code>cost_matrix</code>, and then iterates over another range <code>previous_y_range</code> for each of the values in <code>cost_matrix</code>. On the way, it does a lot of basic arithmetic and some algebra.
</p>
<p>
The problem is, this is very slow. For a \(90 \times 200\) <code>cost_matrix</code>, this takes about 260 ms. Lots of loops? Lots of simple mathematics? Slow? That sounds like a perfect match for <a href="http://www.numpy.org/">Numpy</a>!
</p>
<p>
If you can express your code in terms of linear algebra, Numpy will execute them in highly-optimized C code. The problem is, translating loops into linear algebra is not always easy. In this case, it took some effort:
</p>
<div class="org-src-container">
<pre class="src src-python">def y_change_probability_numpy(doubles_per_x):
""" ... """
b = 1.781
mu = -0.301
return 1/(2*b)*np.exp(-np.abs(doubles_per_x-mu)/b)
def find_frequency_tracks_numpy(cost_matrix, delta_x):
""" ... """
tracks = np.zeros(cost_matrix.shape, dtype=np.int)
cum_cost = np.zeros(cost_matrix.shape)
max_y_factor = (2**5)**(delta_t) # allow at most 5 octaves per second (3 sigma)
height = cost_matrix.shape[1]
# pre-allocate probabilities matrix as minus infinity. This matrix
# will be sparsely filled with positive probability values, and
# empty values will have minus infinite probability.
probabilities = -np.ones((height, height))*np.inf
for y_idx in range(probabilities.shape[0]):
y = (y_idx/height)*(max_y-min_y)+min_y
y_pre_idx = np.arange(int((max(y/max_y_factor, min_y)-min_y)/(max_y-min_y)*height),
int((min(y*max_y_factor, max_y)-min_y)/(max_y-min_y)*height))
y_pre = (y_pre_idx/height)*(max_y-min_y)+min_y
doubles_per_x = np.log2((y/y_pre)**(1/delta_x))
probabilities[y_idx, y_pre_idx] = y_change_probability(doubles_per_x)
cum_cost[0] = cost_matrix[0]
for x_idx in range(1, len(cost_matrix)):
cost_column = cost_matrix[x_idx:x_idx+1] # extract cost_column as 2d-vector!
weighted_cum_cost = cum_cost[x_idx-1] + cost_column.T*probabilities
tracks[x_idx] = np.argmax(weighted_cum_cost, axis=1)
cum_cost[x_idx] = cum_cost[x_idx-1, tracks[x_idx]] + cost_column
return tracks, cum_corrs
</pre>
</div>
<p>
This code does not look much like the original, but calculates exactly the same thing. This takes about 15 ms for a \(90 \times 200\) <code>cost_matrix</code>, which is about 17 times faster than the original code! Yay Numpy! And furthermore, this code is arguably more readable than the original, since it is written at a higher level of abstraction.
</p>
<p>
Another avenue for performance optimization is <a href="http://numba.pydata.org/">Numba</a>. Numba applies dark and powerful magic to compile humble Python functions into blazingly fast machine code. It is proper magic, if you ask me. Simply add an innocuous little decorator to your functions, and let Numba do it's thing. If all goes well, your code will work just as before, except with unheard-of performance:
</p>
<div class="org-src-container">
<pre class="src src-python">@jit(numba.float64(numba.float64), nopython=True)
def y_change_probability_numba(doubles_per_x):
...
@jit((numba.int64, numba.float64, numba.int64), nopython=True)
def y_idx_range_numba(y_idx, max_y_factor, height):
...
@jit((numba.float64[:,:], numba.float64), nopython=True)
def find_tracks_numba(cost_matrix, delta_t):
...
</pre>
</div>
<p>
However, Numba is no silver bullet, and does not support all of Python yet. In the present case, it is missing support for <code>enumerate</code> for Numpy matrices. Thus, I had to rewrite the first two loops like this:
</p>
<div class="org-src-container">
<pre class="src src-python"># python version
for x_idx, cost_column in enumerate(cost_matrix):
...
# numba version
for x_idx in range(len(cost_matrix)):
cost_column = cost_matrix[x_idx]
...
</pre>
</div>
<p>
Another area that proved problematic is N-D slice writing. Instead of using expressions like <code>m1[x,y:y+3] = m2</code>, you have to write <code>for idx in range(3): m1[x,y+idx] = m2[idx]</code>. Not a difficult transformation, but it basically forced me to unroll all the nice vectorized code of the Numpy version back to their original pure-Python form. That said, Numba is getting better and better, and many constructs that used to be uncompilable (<code>yield</code>) are not a problem any more.
</p>
<p>
Anyway, with that done, the above code went down from 260 ms to 2.2 ms. This is a 120-fold increase in performance, and still seven times faster than Numpy, with minimal code changes. This is proper magic!
</p>
<p>
So why wouldn't you just always use Numba? After all, when it comes down to raw performance, Numba is the clear winner. The big difference between performance optimization using Numpy and Numba is that properly vectorizing your code for Numpy often reveals simplifications and abstractions that make it easier to reason about your code. Numpy forces you to think in terms of vectors, matrices, and linear algebra, and this often makes your code <i>more beautiful</i>. Numba on the other hand often requires you to make your code <i>less beautiful</i> to conform to it's subset of compilable Python.
</p>
<div class="taglist"><a href="https://bastibe.de/tags.html">Tags</a>: <a href="https://bastibe.de/tag-python.html">python</a> </div>
<div id="comments"><script async src="https://talk.hyvor.com/embed/embed.js" type="module"></script>
<hyvor-talk-comments id="hyvorcomments" website-id="3390" page-id=""></hyvor-talk-comments>
<script type="text/javascript">
document.getElementById("hyvorcomments").setAttribute("page-id", location.pathname);
</script></div></div>
<div id="postamble" class="status"><div id="archive">
<a href="https://bastibe.de/archive.html">Other posts</a>
</div>
<center><a rel="license" href="https://creativecommons.org/licenses/by-sa/3.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/3.0/88x31.png" /></a><br /><span xmlns:dct="https://purl.org/dc/terms/" href="https://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">bastibe.de</span> by <a xmlns:cc="https://creativecommons.org/ns#" href="https://bastibe.de" property="cc:attributionName" rel="cc:attributionURL">Bastian Bechtold</a> is licensed under a <a rel="license" href="https://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution-ShareAlike 3.0 Unported License</a>.</center></div>
</body>
</html>