Today we'll be going over how to optimise code in Python, and will end with a real-world use case.
In general, optimising code is useful, and is often much easier than modifying the hardware something runs on. There are a few considerations to take into account when optimising, though:
- First, is optimisation necessary at all?
  - If it takes 200ms and that is fast enough, optimisation effort is largely wasted.
- If so, which parts of the code should you optimise?
  - If the code is made up of two parts, A and B, and part B takes 90% of the time, is optimising part A really worth it?
- Finally, how do you go about optimising it?
Once we decide we do need to optimise something, what code should we change or specifically optimise?
- We can find this out by profiling our code, to see which parts or functions take the most time.
- For Python code, we can make use of two tools, `cProfile` and `snakeviz`. To install `snakeviz`, run `pip install snakeviz`.
- Then, to create a profile, run your script using `python -m cProfile -o temp.dat /path/to/my/script.py`
- And view it by running `snakeviz temp.dat`
- This gives you a nice visual indication of what is taking the most time, and where it would be fruitful to optimise first.
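Profiles can also be created from inside Python rather than from the command line; a minimal sketch (the expression being profiled is just a placeholder):

```python
import cProfile
import pstats

# Profile a single expression; the output file is the same kind snakeviz reads.
cProfile.run("sum(i * i for i in range(100_000))", "temp.dat")

# Print the five functions with the highest cumulative time.
stats = pstats.Stats("temp.dat")
stats.sort_stats("cumulative").print_stats(5)
```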
In Python, the following snippet can be used to measure the time it takes to run some code:
```python
from timeit import default_timer as tmr  # the highest-resolution timer available (time.perf_counter)

start_time = tmr()
...  # code to time goes here
end_time = tmr()
duration_in_seconds = end_time - start_time
```
- You can also use `timeit` to run the same snippet multiple times to get a more accurate measurement of how long it takes.
- In Jupyter Notebooks, putting `%%timeit` at the top of a cell will tell you how long it took to run.
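For example, a small sketch using `timeit` directly (the snippet being timed is arbitrary):

```python
from timeit import timeit, repeat

# Run the snippet 1,000 times and report the total time in seconds.
total = timeit("sorted(range(1000))", number=1000)

# Repeat that measurement 5 times; the minimum is the least noisy estimate.
best = min(repeat("sorted(range(1000))", number=1000, repeat=5))

print(f"total: {total:.4f}s  best-of-5: {best:.4f}s")
```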
Now, once you have identified what code is slow, how do you actually make it faster?
- Analyse and improve the complexity of the code.
- In Python, try to replace (slow!) loops with e.g. numpy vectorised operations.
  - Most of the work is then done in C.
- Caching
  - Bandwidth vs latency
- Are you wasting work (i.e. recomputing things) that could be saved by
  - pre-computing
  - memoising previous results (see the sketch below)
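As a sketch of memoisation in Python, the standard library's `functools.lru_cache` does it with a single decorator (Fibonacci is just a stand-in workload):

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # cache the result for every distinct argument
def fib(n: int) -> int:
    # Without the cache this recursion is exponential; with it, each
    # fib(k) is computed once, so the whole call is linear.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # fast, despite the naive-looking recursion
```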
We'll work through three examples:
- Searching for names in a list
- Why Python loops are not the best
- Generating cities in Minecraft
```python
from typing import List

def v1_baseline_search_for_names_in_list(queries: List[str], names_to_search_in: List[str]) -> List[int]:
    list_of_answers = []
    for q in queries:
        # Scan the whole list from the start for every single query.
        for i, n in enumerate(names_to_search_in):
            if q == n:
                list_of_answers.append(i)
                break
        else:
            # for/else: runs only if the loop never hit `break`,
            # i.e. q was not found anywhere.
            list_of_answers.append(-1)
    return list_of_answers
```
The main issue is that the loop searches through the entire `names_to_search_in` list for each element in `queries`! This is effectively quadratic complexity.
- Solution: precompute the index of every element, so that we can get it in constant time.
  - This uses a hashmap, or dictionary in Python.
```python
from typing import Dict, List

def get_dictionary(names_to_search_in: List[str]) -> Dict[str, int]:
    # Map each name to its index (a later duplicate overwrites an earlier one).
    return {n: i for i, n in enumerate(names_to_search_in)}

def v2_precompute(queries: List[str], names_to_search_in: List[str]) -> List[int]:
    # Build the name -> index map once; each lookup is then O(1) on average.
    dictionary_of_value_to_index = get_dictionary(names_to_search_in)
    list_of_answers = []
    for q in queries:
        list_of_answers.append(dictionary_of_value_to_index.get(q, -1))
    return list_of_answers
```
- Much faster!
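To check the difference yourself, here is a quick sketch of a comparison; the data sizes and random names are arbitrary choices, not measurements from these notes:

```python
import random
import string
from timeit import default_timer as tmr

# Hypothetical test data: 5,000 random names, queried 5,000 times.
names = ["".join(random.choices(string.ascii_lowercase, k=8)) for _ in range(5_000)]
queries = random.choices(names, k=5_000)

for fn in (v1_baseline_search_for_names_in_list, v2_precompute):
    start = tmr()
    fn(queries, names)
    print(f"{fn.__name__}: {tmr() - start:.3f}s")
```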
- The dot product between two lists `a` and `b`
  - Naive: `sum([i * j for i, j in zip(a, b)])`
  - Numpy: `np.dot(a, b)`
- Times:
  - Naive: ~250ms
  - Numpy: ~1.2ms
- A massive increase in speed!
- This can be impactful when dealing with any kind of numerical data, e.g. images: avoid loops in Python and rely on vectorisation.
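As a sketch of how such a comparison can be run (the list size here is an assumption; the notes don't state what was measured):

```python
import numpy as np
from timeit import timeit

n = 1_000_000  # assumed size, chosen for illustration
a = list(range(n))
b = list(range(n))
a_np, b_np = np.array(a), np.array(b)

# Average over 10 runs of each version.
naive = timeit(lambda: sum([i * j for i, j in zip(a, b)]), number=10) / 10
vectorised = timeit(lambda: np.dot(a_np, b_np), number=10) / 10
print(f"naive: {naive:.4f}s  numpy: {vectorised:.4f}s")
```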
For the Minecraft example, the main slow part is the `run_compositional.py` script.
- Seems like the `clear_all` function takes around 20 seconds.
- The other main cost is the `SegregatedCall` function from grpc.
One fix is to batch the calls to the grpc code better: send every 10,000 blocks instead of every 100 blocks.
- Line 50: `if len(blocks_list) >= 100:` becomes `if len(blocks_list) >= 10000:`
- Massive speedup!
- Similarly, on line 20, changing `L = 100` to `L = 500` batches these calls further.
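A minimal sketch of the batching pattern itself; `add_block` and `send_blocks` are illustrative stand-ins, not the real script's API:

```python
BATCH_SIZE = 10_000  # was effectively 100 before the change

blocks_list = []

def send_blocks(blocks):
    # Stand-in for the real gRPC call: one call per batch, not per block.
    print(f"sending {len(blocks)} blocks in one call")

def add_block(block):
    blocks_list.append(block)
    if len(blocks_list) >= BATCH_SIZE:  # the line-50 condition from above
        send_blocks(blocks_list)
        blocks_list.clear()
```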
- Code optimisation can be very useful, and could save you a lot of time.
- It is important to know whether optimisation is even worth it, though.
  - In some cases it very much is; in others it may just be a waste of time.