Update docs for changes since 1.2.2 #150

Merged: 9 commits, Dec 1, 2024
7 changes: 5 additions & 2 deletions README.md
@@ -28,10 +28,13 @@ julia> @b rand(1000) hash # How long does it take to hash that array?

julia> @b rand(1000) _.*5 # How long does it take to multiply it by 5 element wise?
172.970 ns (3 allocs: 7.875 KiB)
```

[Why Chairmarks?](https://Chairmarks.lilithhafner.com/stable/why)
julia> @b rand(100,100) inv,_^2,sum # Is it faster to invert, square, or sum a matrix? [THIS USAGE IS EXPERIMENTAL]
(92.917 μs (9 allocs: 129.203 KiB), 27.166 μs (3 allocs: 78.203 KiB), 1.083 μs)
```

[Tutorial](https://Chairmarks.lilithhafner.com/stable/tutorial)

[Why Chairmarks?](https://Chairmarks.lilithhafner.com/stable/why)

[API Reference](https://Chairmarks.lilithhafner.com/stable/reference)
1 change: 1 addition & 0 deletions docs/Project.toml
@@ -1,4 +1,5 @@
[deps]
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
Chairmarks = "0ca39b1e-fe0b-4e98-acfc-b1656634c4de"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
DocumenterVitepress = "4710194d-e776-4893-9690-8d956a29c365"
9 changes: 9 additions & 0 deletions docs/src/explanations.md
@@ -67,6 +67,15 @@ stops respecting the requested runtime budget and so it could very well perform
more precisely than Chairmarks (it's hard to compete with a 500ms benchmark when you only have
1ms). In practice, however, Chairmarks stays pretty reliable even for fairly low runtimes.
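
For a sense of what that looks like (a sketch; `seconds` is the runtime-budget keyword accepted
by `@b` and `@be`, and the printed time is illustrative and will vary by machine):

```julia
julia> @b rand(1000) hash seconds=0.001  # ~1 ms budget instead of the default
1.791 μs
```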

When comparing different implementations of the same function, `@b rand f,g` can be more reliable
than `judge(minimum(@benchmark(f(x) setup=(x=rand()))), minimum(@benchmark(g(x) setup=(x=rand()))))`
because the former randomly interleaves calls to `f` and `g` in the same context and scope
with the same inputs while the latter runs all evaluations of `f` before all evaluations of
`g` and—typically less importantly—uses different random inputs.

!!! warning
Comparative benchmarking is experimental and may be removed or changed in future versions
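
A minimal sketch (reusing the `f`/`g` pair from the migration guide; the timings are
illustrative and will vary by machine):

```julia
julia> f() = sum(rand() for _ in 1:1000);  # baseline

julia> g() = sum(rand() for _ in 1:1010);  # ~1% more work

julia> @b f,g  # both implementations are sampled interleaved, in the same context
(1.063 μs, 1.073 μs)
```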

## How does tuning work?

First of all, what is "tuning" for? It's for tuning the number of evaluations per sample.
3 changes: 3 additions & 0 deletions docs/src/index.md
@@ -30,4 +30,7 @@ julia> @b rand(1000) hash # How long does it take to hash that array?

julia> @b rand(1000) _.*5 # How long does it take to multiply it by 5 element wise?
172.970 ns (3 allocs: 7.875 KiB)

julia> @b rand(100,100) inv,_^2,sum # Is it faster to invert, square, or sum a matrix? [THIS USAGE IS EXPERIMENTAL]
(92.917 μs (9 allocs: 129.203 KiB), 27.166 μs (3 allocs: 78.203 KiB), 1.083 μs)
```
85 changes: 84 additions & 1 deletion docs/src/migration.md
@@ -3,7 +3,7 @@ CurrentModule = Chairmarks
DocTestSetup = quote
using Chairmarks
end
DocTestFilters = [r"\d\d?\d?\.\d{3} [μmn]?s( \(.*\))?"]
DocTestFilters = [r"\d\d?\d?\.\d{3} [μmn]?s( \(.*\))?| (time: |memory:) .*% => (improvement|regression|invariant) \((5|1).00% tolerance\)"]
```

# [How to migrate from BenchmarkTools to Chairmarks](@id migration)
@@ -95,6 +95,40 @@ Benchmark results have the following fields:

Note that more fields may be added as more information becomes available.

### Comparisons

Chairmarks does not provide a `judge` function to decide if two benchmarks are significantly
different. However, you can get accurate data to inform that judgement by passing a
comma-separated list of functions to `@b` or `@be`.

!!! warning
Comparative benchmarking is experimental and may be removed or changed in future versions

```jldoctest; setup=(using BenchmarkTools)
julia> f() = sum(rand() for _ in 1:1000)
f (generic function with 1 method)

julia> g() = sum(rand() for _ in 1:1010)
g (generic function with 1 method)

julia> @b f,g
(1.121 μs, 1.132 μs)

julia> @b f,g
(1.063 μs, 1.073 μs)

julia> judge(minimum(@benchmark(f())), minimum(@benchmark(g())))
BenchmarkTools.TrialJudgement:
time: -5.91% => improvement (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)

julia> judge(minimum(@benchmark(f())), minimum(@benchmark(g())))
BenchmarkTools.TrialJudgement:
time: -0.78% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
```
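
If you want a single judge-style number from the Chairmarks result, one option (an illustrative
helper, not part of the Chairmarks API) is to compute the relative difference of the two
summarized times yourself; the comparative form returns a tuple whose elements have a `time`
field in seconds:

```julia
julia> results = @b f,g;  # tuple of summarized results, one per function

julia> pct_change = 100 * (results[2].time / results[1].time - 1);  # ≈ +0.9 here: g is ~1% slower than f
```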

### Nonconstant globals and interpolation

Like BenchmarkTools, benchmarks that include access to nonconstant globals will receive a
@@ -121,3 +155,52 @@ julia> @b rand($x) # interpolate (most familiar to BenchmarkTools users)
julia> @b x rand # put the access in the setup phase (most concise in simple cases)
15.507 ns (2 allocs: 112 bytes)
```

### `BenchmarkGroup`s

It is possible to use `BenchmarkTools.BenchmarkGroup` with Chairmarks. Replacing
`@benchmarkable` invocations with `@be` invocations and wrapping the group in a function
suffices. You don't have to run `tune!` and instead of calling `run`, call the function.
Even running `Statistics.median(suite)` works—although any custom plotting might need a
couple of tweaks.

```julia
using BenchmarkTools, Statistics

function create_benchmarks()
functions = Function[sqrt, inv, cbrt, sin, cos]
group = BenchmarkGroup()
for (index, func) in enumerate(functions)
group[index] = @benchmarkable $func(x) setup=(x=rand())
end
group
end

suite = create_benchmarks()

tune!(suite)

median(run(suite))
# edit code
median(run(suite))
```

```julia
using Chairmarks, BenchmarkTools, Statistics  # BenchmarkGroup comes from BenchmarkTools

function run_benchmarks()
functions = Function[sqrt, inv, cbrt, sin, cos]
group = BenchmarkGroup()
for (index, func) in enumerate(functions)
group[nameof(func)] = @be rand func
end
group
end

median(run_benchmarks())
# edit code
median(run_benchmarks())
```

This behavior emerged naturally rather than being intentionally designed, so expect some
rough edges. See https://github.com/LilithHafner/Chairmarks.jl/issues/70 for more info.
2 changes: 2 additions & 0 deletions docs/src/reference.md
@@ -12,12 +12,14 @@ version number if the change is not expected to cause significant disruptions.
- [`Chairmarks.Benchmark`](@ref)
- [`@b`](@ref)
- [`@be`](@ref)
- [`Chairmarks.summarize`](@ref)
- [`Chairmarks.DEFAULTS`](@ref)

```@docs
Chairmarks.Sample
Chairmarks.Benchmark
@b
@be
Chairmarks.summarize
Chairmarks.DEFAULTS
```
29 changes: 19 additions & 10 deletions docs/src/tutorial.md
@@ -89,22 +89,31 @@ julia> @b rand(100) hash

The first argument is called once per sample, and the second argument is called once per
evaluation, each time passing the result of the first argument. We can also use the special
`_` variable to refer to the output of the previous step. Here, we compare two different
implementations of the norm of a vector
`_` variable to refer to the output of the previous step. Here, we benchmark computing the
norm of a vector:

```jldoctest
julia> @b rand(100) sqrt(sum(_ .* _))
37.628 ns (2 allocs: 928 bytes)
38.373 ns (2 allocs: 928 bytes)
```

The `_` refers to the array whose norm is to be computed.

We can perform a comparison of two different implementations of the same specification by
providing a comma-separated list of functions to benchmark. Here, we compare two ways of
computing the norm of a vector:

julia> @b rand(100) sqrt(sum(x->x^2, _))
11.053 ns
!!! warning
Comparative benchmarking is experimental and may be removed or changed in future versions

```jldoctest
julia> @b rand(100) sqrt(sum(_ .* _)),sqrt(sum(x->x^2, _))
(40.373 ns (2 allocs: 928 bytes), 11.440 ns)
```

The _ refers to the array whose norm is to be computed. Both implementations are quite fast.
These measurements are on a 3.5 GHz CPU so it appears that the first implementation takes
about one clock cycle per element, with a bit of overhead. The second, on the other hand,
appears to be running much faster than that, likely because it is making use of SIMD
instructions.
This invocation pattern runs the setup function once per sample and randomly selects which
implementation to run first for each sample. This makes comparative benchmarks robust to
fluctuations in system load.
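
The returned value is an ordinary tuple, so the two results can also be inspected
programmatically (a sketch; `time` is the seconds-per-evaluation field of each result, as
described in the reference docs):

```julia
julia> results = @b rand(100) sqrt(sum(_ .* _)),sqrt(sum(x->x^2, _));

julia> results[1].time > results[2].time  # the non-allocating version is faster here
true
```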

## Common pitfalls

35 changes: 20 additions & 15 deletions docs/src/why.md
@@ -10,27 +10,21 @@ DocTestFilters = [r"\d\d?\d?\.\d{3} [μmn]?s( \(.*\))?"]

Capable of detecting a 1% difference in runtime in ideal conditions

!!! warning
Comparative benchmarking is experimental and may be removed or changed in future versions

```jldoctest
julia> f(n) = sum(rand() for _ in 1:n)
f (generic function with 1 method)

julia> @b f(1000)
1.074 μs

julia> @b f(1000)
1.075 μs

julia> @b f(1000)
1.076 μs
julia> @b f(1000), f(1010)
(1.064 μs, 1.074 μs)

julia> @b f(1010)
1.086 μs
julia> @b f(1000), f(1010)
(1.063 μs, 1.073 μs)

julia> @b f(1010)
1.087 μs

julia> @b f(1010)
1.087 μs
julia> @b f(1000), f(1010)
(1.064 μs, 1.074 μs)
```

## Efficient
@@ -89,6 +83,17 @@ julia> @b rand(100) sort(_, by=x -> exp(-x)) issorted(_, rev=true) || error()
5.358 μs (2 allocs: 1.750 KiB)
```

The function being benchmarked can be a comma-separated list of functions, in which case a tuple
of the results is returned.

!!! warning
Comparative benchmarking is experimental and may be removed or changed in future versions

```jldoctest
julia> @b rand(100) sort(_, alg=InsertionSort),sort(_, alg=MergeSort)
(1.245 μs (2 allocs: 928 bytes), 921.875 ns (4 allocs: 1.375 KiB))
```

See [`@be`](@ref) for more info

## Truthful
2 changes: 1 addition & 1 deletion src/Chairmarks.jl
@@ -17,7 +17,7 @@ module Chairmarks

using Printf

VERSION >= v"1.11.0-DEV.469" && eval(Meta.parse("public Sample, Benchmark, DEFAULTS"))
VERSION >= v"1.11.0-DEV.469" && eval(Meta.parse("public Sample, Benchmark, DEFAULTS, summarize"))
export @b, @be

include("types.jl")
29 changes: 26 additions & 3 deletions src/public.jl
@@ -5,8 +5,8 @@ Benchmark `f` and return the fastest [`Sample`](@ref).

Use [`@be`](@ref) for full results.

`@b args...` is equivalent to `summarize(@be args...)`. See the docstring for [`@be`](@ref)
for more information.
`@b args...` is equivalent to `Chairmarks.summarize(@be args...)`. See the docstring of
[`@be`](@ref) for more information.

# Examples

@@ -34,6 +34,9 @@ julia> @b (x = 0; for _ in 1:50; x = hash(x); end; x) # We can use arbitrary exp

julia> @b (x = 0; for _ in 1:5e8; x = hash(x); end; x) # This runs for a long time, so it is only run once (with no warmup)
2.447 s (without a warmup)

julia> @b rand(10) hash,objectid # Which hash algorithm is faster? [THIS USAGE IS EXPERIMENTAL]
(17.256 ns, 4.246 ns)
```
"""
macro b(args...)
@@ -148,6 +151,14 @@ At a high level, the implementation of this function looks like this
So `init` will be called once, `setup` and `teardown` will be called once per sample, and
`f` will be called `evals` times per sample.

# Experimental Features

You can pass a comma separated list of functions or expressions to `@be` and they will all
be benchmarked at the same time with interleaved samples, returning a tuple of `Benchmark`s.

!!! warning
Comparative benchmarking is experimental and may be removed or changed in future versions

# Examples

```jldoctest; filter = [r"\\d\\d?\\d?\\.\\d{3} [μmn]?s( \\(.*\\))?"=>s"RES", r"\\d+ (sample|evaluation)s?"=>s"### \\1"], setup=(using Random)
@@ -203,14 +214,26 @@ Benchmark: 3387 samples with 144 evaluations
julia> @be (x = 0; for _ in 1:5e8; x = hash(x); end; x) # This runs for a long time, so it is only run once (with no warmup)
Benchmark: 1 sample with 1 evaluation
2.488 s (without a warmup)

julia> @be rand(10) hash,objectid # Which hash algorithm is faster? [THIS USAGE IS EXPERIMENTAL]
Benchmark: 14887 samples with 436 evaluations
min 17.106 ns
median 18.922 ns
mean 20.974 ns
max 234.998 ns
Benchmark: 14887 samples with 436 evaluations
min 4.110 ns
median 4.683 ns
mean 4.979 ns
max 42.911 ns
```
"""
macro be(args...)
process_args(args)
end

"""
summarize(b::Benchmark) -> Any
`summarize(@be ...)` is equivalent to `@b ...`

Used by `@b` to summarize the output of `@be`. Currently implemented as elementwise `minimum`.
"""