# bigO

bigO automatically measures the empirical computational complexity (in both time and space) of functions. To use bigO, you just need to add a `@bigO.track` decorator to your function, together with a "length" function that measures the size of its input (the "n"). You can then run your function as usual, ideally across a wide range of inputs, and bigO will report the inferred computational complexity of that function.

bigO accumulates its results in a file named `bigO_data.json` in the local directory; you can then generate graphs of the time and space complexity of each tracked function by running `python3 -m bigO`.
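Because the data file is plain JSON, you can also inspect it directly. Here is a minimal sketch that assumes only that `bigO_data.json` is standard JSON; its exact schema is an implementation detail of bigO:

```python
import json

# Load the measurements that bigO has accumulated so far.
with open("bigO_data.json") as f:
    data = json.load(f)

# Peek at the top-level structure; the exact schema is an
# implementation detail of bigO.
print(list(data))
```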

## Demonstration

### Inferring Time and Space Bounds

The file `test/facts.py` is a small program that demonstrates bigO:

```python
import bigO

def fact(x: int) -> int:
    v = 1
    for i in range(1, x + 1):
        v *= i
    return v

@bigO.track(lambda xs: len(xs))
def factorialize(xs: list[int]) -> list[int]:
    new_list = [fact(x) for x in xs]
    return new_list

# Exercise the function - more inputs are better!
for i in range(30):
    factorialize(list(range(i * 100)))
```

Now run the program as usual:

```bash
python3 test/facts.py
```

Now you can easily generate graphs of all tracked functions. Just run the following command in the same directory:

```bash
python3 -m bigO
```

This command creates the file `bigO.pdf`, which contains graphs like this:

*(figure: inferred time and space complexity plots)*

### Verifying Time and Space Bounds

bigO will also verify declared bounds of functions. The file `test/facts_bounds.py` declares `factorialize` as follows:

```python
@bigO.bounds(lambda xs: len(xs),
             time="O(n*log(n))",
             mem="O(n)")
def factorialize(xs: list[int]) -> list[int]:
    ...
```

Running

```bash
python3 -m bigO
```

now creates this plot, which shows that the timing data matches a worse bound than the declared one, while the memory data matches the declared bound:

*(figure: declared vs. inferred time and space bounds)*

The analysis currently supports these performance models: `O(1)`, `O(log(log(n)))`, `O(log(n))`, `O(log(n)**2)`, `O(log(n)**3)`, `O(sqrt(n))`, `O(n)`, `O(n*log(n))`, `O(n**2)`, `O(n**3)`, `O(n**k)`, and `O(2**n)`. It is trivial to add additional forms.

### Lightweight A/B Performance Experiments

bigO also lets you run lightweight A/B performance tests. The file `tests/test_ab_sort.py` demonstrates this with two sorting functions:

```python
import random
import numpy as np
from bigO.bigO import ab_test

def insertion_sort(arr: np.ndarray) -> np.ndarray:
    ...

@ab_test(lambda x: len(x), alt=insertion_sort)
def quick_sort(arr: np.ndarray) -> np.ndarray:
    ...

# Exercise quick_sort on arrays of random sizes.
for _ in range(200):
    quick_sort(np.random.rand(random.randint(1, 100)))
```

The `quick_sort` function is annotated to indicate that bigO should compare its time and memory consumption to those of `insertion_sort`. At run time, bigO will randomly select which of the two functions to run on each call to `quick_sort`.
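Conceptually, the decorator acts like a per-call coin flip between the two implementations. The following is a simplified sketch of that idea only (not bigO's actual implementation), omitting all of the time, memory, and length bookkeeping:

```python
import functools
import random

def ab_test_sketch(length, alt):
    """Illustrative only: randomly dispatch each call to either the
    decorated function or the alternative. The real ab_test decorator
    also records time, memory, and length(input) for later analysis."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Flip a coin to choose which implementation runs this call.
            chosen = random.choice((func, alt))
            return chosen(*args, **kwargs)
        return wrapper
    return decorator
```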

After running the program, the command

```bash
python3 -m bigO
```

compares the running times across the input size range, identifying segments where one function performs statistically significantly better than the other. The resulting plot shows the smoothed performance curves, shades the regions where one function outperforms the other, and reports the p-value of each segment's statistical test:

*(figure: A/B comparison of quick_sort and insertion_sort)*

### Verifying Hard Limits on Time, Space, and Input Size

bigO also lets you verify at run time that function calls do not exceed hard limits on time, memory, or input size. The file `test/test_limits_insertion_sort.py` demonstrates this:

```python
@limits(len,
        time=0.1,
        mem=1_000_000,
        length=1800)
def insertion_sort(arr: np.ndarray) -> np.ndarray:
    ...
```

After running that program, the command

```bash
python3 -m bigO
```

produces the following plots, showing histograms of those metrics together with the specified limits:

*(figure: histograms of time, memory, and input length with the specified limits)*

You can also specify only one or two of the limits instead of all three.
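For example, to enforce only the time limit:

```python
@limits(len, time=0.1)
def insertion_sort(arr: np.ndarray) -> np.ndarray:
    ...
```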

## Technical Details

### Curve-fitting

bigO's curve-fitting based approach is directly inspired by "Measuring Empirical Computational Complexity" by Goldsmith et al. (FSE 2007), which uses log-log plots to fit a power-law distribution.
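The key observation is that a power law `t ≈ a * n**b` becomes a straight line in log-log space, so the exponent `b` can be recovered with an ordinary linear fit. A minimal illustration of that idea, with made-up measurements (this is not bigO's code):

```python
import numpy as np

# Made-up measurements: input sizes and observed running times
# that grow roughly quadratically.
ns = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
ts = np.array([0.011, 0.045, 0.18, 0.73, 2.9])

# A power law t = a * n**b is linear in log-log space:
#   log(t) = log(a) + b * log(n)
b, log_a = np.polyfit(np.log(ns), np.log(ts), deg=1)
print(f"fitted exponent b = {b:.2f}")  # close to 2, i.e. O(n**2)
```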

Unlike that work, bigO also measures space complexity by tracking memory allocations during function execution.

In addition, bigO uses a more general curve-fitting approach that can handle complexity classes that do not follow a power law, and it uses the Akaike Information Criterion (AIC) to select the best-fitting model. Further, bigO measures the statistical significance of its complexity inference results via p-values, computed with the technique outlined in "An Empirical Investigation of Statistical Significance in NLP" by Berg-Kirkpatrick, Burkett, and Klein (EMNLP-CoNLL 2012).
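To make AIC-based selection concrete, here is an illustrative sketch (not bigO's implementation): fit each candidate model by least squares and keep the one with the lowest AIC, which rewards goodness of fit while penalizing extra parameters.

```python
import numpy as np
from scipy.optimize import curve_fit

# A few candidate models, each mapping n to a predicted running time.
MODELS = {
    "O(n)":        lambda n, a, b: a + b * n,
    "O(n*log(n))": lambda n, a, b: a + b * n * np.log(n),
    "O(n**2)":     lambda n, a, b: a + b * n**2,
}

def aic(y, y_pred, k):
    # AIC for a least-squares fit: n * log(RSS / n) + 2k,
    # where k is the number of fitted parameters.
    rss = np.sum((y - y_pred) ** 2)
    return len(y) * np.log(rss / len(y)) + 2 * k

ns = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
ts = np.array([0.011, 0.045, 0.18, 0.73, 2.9])

def fit_and_score(model):
    params, _ = curve_fit(model, ns, ts)
    return aic(ts, model(ns, *params), k=len(params))

best = min(MODELS, key=lambda name: fit_and_score(MODELS[name]))
print("best model:", best)  # O(n**2) for this data
```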

For A/B testing, bigO smooths the performance curves of the two functions, segments the input range by approximating the crossover points of those curves, and then performs a standard permutation test on each segment to determine whether the difference in performance between the two functions across that range is statistically significant. The test statistic is the area between the two curves, approximated by numerical integration with the trapezoid rule.
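A sketch of such a test (illustrative; bigO's actual implementation may differ in its details): take the trapezoid-rule area between the curves as the observed statistic, then build a null distribution by randomly flipping the sign of the pointwise differences, which corresponds to the two implementations being exchangeable under the null hypothesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def trapezoid(ys, xs):
    # Trapezoid-rule approximation of the integral of ys over xs.
    return float(np.sum((ys[1:] + ys[:-1]) * np.diff(xs)) / 2.0)

def permutation_test(xs, ya, yb, trials=10_000):
    """P-value for the difference between two performance curves
    sampled at the same input sizes xs."""
    diff = ya - yb
    observed = abs(trapezoid(diff, xs))
    hits = 0
    for _ in range(trials):
        # Under the null hypothesis the labels are exchangeable,
        # so each pointwise difference can flip sign at random.
        signs = rng.choice((-1.0, 1.0), size=len(xs))
        if abs(trapezoid(diff * signs, xs)) >= observed:
            hits += 1
    return hits / trials
```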

A version of bigO matching the log-log approach of the paper above can be run as follows:

```bash
python3 -m bigO.graph
```

This command creates the file `bigO.pdf`, which contains graphs like this:

*(figure: log-log complexity graphs)*