From 1cd9b88c6eeac62b9818ebc9d616d3c62557b6b1 Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Sat, 30 Nov 2024 12:09:30 -0600 Subject: [PATCH 1/9] Update docs for chagnes since 1.2.2 --- docs/src/explanations.md | 9 +++++ docs/src/migration.md | 83 ++++++++++++++++++++++++++++++++++++++++ docs/src/reference.md | 2 + docs/src/tutorial.md | 29 +++++++++----- docs/src/why.md | 32 ++++++++-------- src/Chairmarks.jl | 2 +- src/public.jl | 25 +++++++++++- 7 files changed, 155 insertions(+), 27 deletions(-) diff --git a/docs/src/explanations.md b/docs/src/explanations.md index 8d395c4f..199a772d 100644 --- a/docs/src/explanations.md +++ b/docs/src/explanations.md @@ -67,6 +67,15 @@ stops respecting the requested runtime budget and so it could very well perform precisely than Chairmarks (it's hard to compete with a 500ms benchmark when you only have 1ms). In practice, however, Chairmarks stays pretty reliable even for fairly low runtimes. +When comparing different implementations of the same function, `@b rand f,g` can be more reliable +than `judge(minimum(@benchmark(f(x) setup=(x=rand()))), minimum(@benchmark(g(x) setup=(x=rand())))` +because the former randomly interleaves calls to `f` and `g` in the same context and scope +with the same inputs while the latter runs all evaluations of `f` before all evaluations of +`g` and—typically less importantly—uses different random inputs. + +!!! warning + Comparative benchmarking is experimental and may be removed or its syntax changed in future versions + ## How does tuning work? First of all, what is "tuning" for? It's for tuning the number of evaluations per sample. diff --git a/docs/src/migration.md b/docs/src/migration.md index 7bdfe80b..f6533747 100644 --- a/docs/src/migration.md +++ b/docs/src/migration.md @@ -95,6 +95,40 @@ Benchmark results have the following fields: Note that more fields may be added as more information becomes available. +### Comparisons + +Chairmarks does not provide a `judge` function to decide if two benchmarks are significantly +different. However, you can get accurate data to inform that judgement by passing passing a +comma separated list of functions to `@b` or `@be`. + + +!!! warning + Comparative benchmarking is experimental and may be removed or its syntax changed in future versions + +```jldoctest +julia> f() = sum(rand() for _ in 1:1000) +f (generic function with 1 method) + +julia> g() = sum(rand() for _ in 1:1010) +g (generic function with 1 method) + +julia> @b f,g +(1.121 μs, 1.132 μs) + +julia> @b f,g +(1.063 μs, 1.073 μs) + +julia> judge(minimum(@benchmark(f())), minimum(@benchmark(g()))) +BenchmarkTools.TrialJudgement: + time: -5.91% => improvement (5.00% tolerance) + memory: +0.00% => invariant (1.00% tolerance) + +julia> judge(minimum(@benchmark(f())), minimum(@benchmark(g()))) +BenchmarkTools.TrialJudgement: + time: -0.78% => invariant (5.00% tolerance) + memory: +0.00% => invariant (1.00% tolerance) +``` + ### Nonconstant globals and interpolation Like BenchmarkTools, benchmarks that include access to nonconstant globals will receive a @@ -121,3 +155,52 @@ julia> @b rand($x) # interpolate (most familiar to BenchmarkTools users) julia> @b x rand # put the access in the setup phase (most concise in simple cases) 15.507 ns (2 allocs: 112 bytes) ``` + +### `BenchmarkGroup`s + +It is possible to use `BenchmarkTools.BenchmarkGroup` with Chairmarks. Replacing +`@benchmarkable` invocations with `@be` invocations and wrapping the group in a function +suffices. 
You don't have to run `tune!` and instead of calling `run`, call the function. +Even running `Statistics.median(suite)` works—although any custom plotting might need a +couple of tweaks. + +```julia +using BenchmarkTools, Statistics + +function create_benchmarks() + functions = Function[sqrt, inv, cbrt, sin, cos] + group = BenchmarkGroup() + for (index, func) in enumerate(functions) + group[index] = @benchmarkable $func(x) setup=(x=rand()) + end + group +end + +suite = create_benchmarks() + +tune!(suite) + +median(run(suite)) +# edit code +median(run(suite)) +``` + +```julia +using Chairmarks, Statistics + +function run_benchmarks() + functions = Function[sqrt, inv, cbrt, sin, cos] + group = BenchmarkGroup() + for (index, func) in enumerate(functions) + group[nameof(func)] = @be rand func + end + group +end + +median(run_benchmarks()) +# edit code +median(run_benchmarks()) +``` + +This behavior emerged naturally rather than being intentionally designed so expect some +rough edges. See https://github.com/LilithHafner/Chairmarks.jl/issues/70 for more info. diff --git a/docs/src/reference.md b/docs/src/reference.md index eaf5426f..85223ccb 100644 --- a/docs/src/reference.md +++ b/docs/src/reference.md @@ -12,6 +12,7 @@ version number if the change is not expected to cause significant disruptions. - [`Chairmarks.Benchmark`](@ref) - [`@b`](@ref) - [`@be`](@ref) +- [`summarize`](@ref) - [`Chairmarks.DEFAULTS`](@ref) ```@docs @@ -19,5 +20,6 @@ Chairmarks.Sample Chairmarks.Benchmark @b @be +Chairmarks.summarize Chairmarks.DEFAULTS ``` diff --git a/docs/src/tutorial.md b/docs/src/tutorial.md index aecea66e..e8ac6acc 100644 --- a/docs/src/tutorial.md +++ b/docs/src/tutorial.md @@ -89,22 +89,31 @@ julia> @b rand(100) hash The first argument is called once per sample, and the second argument is called once per evaluation, each time passing the result of the first argument. We can also use the special -`_` variable to refer to the output of the previous step. Here, we compare two different -implementations of the norm of a vector +`_` variable to refer to the output of the previous step. Here, we benchmark computing the +norm of a vector: ```jldoctest julia> @b rand(100) sqrt(sum(_ .* _)) -37.628 ns (2 allocs: 928 bytes) +38.373 ns (2 allocs: 928 bytes) +``` + +The _ refers to the array whose norm is to be computed. + +We can perform a comparison of two different implementations of the same specification by +providing a comma-separated list of functions to benchmark. Here, we compare two ways of +computing the norm of a vector: -julia> @b rand(100) sqrt(sum(x->x^2, _)) -11.053 ns +!!! warning + Comparative benchmarking is experimental and may be removed or its syntax changed in future versions + +```jldoctest +julia> @b rand(100) sqrt(sum(_ .* _)),sqrt(sum(x->x^2, _)) +(40.373 ns (2 allocs: 928 bytes), 11.440 ns) ``` -The _ refers to the array whose norm is to be computed. Both implementations are quite fast. -These measurements are on a 3.5 GHz CPU so it appears that the first implementation takes -about one clock cycle per element, with a bit of overhead. The second, on the other hand, -appears to be running much faster than that, likely because it is making use of SIMD -instructions. +This invocation pattern runs the setup function once per sample and randomly selects which +implementation to run first for each sample. This makes comparative benchmarks robust to +fluctuations in system load. 
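
Each element of the printed pair is an ordinary result object, so a comparison can also be
inspected programmatically. The sketch below is not a doctest and its numbers are only
illustrative (carried over from the run above); it shows how the per-implementation times
can be read through the `time` field of `Chairmarks.Sample`, which is reported in seconds:

```julia
# Illustrative sketch, not a doctest; exact numbers vary between runs and machines.
julia> @b rand(100) sqrt(sum(_ .* _)),sqrt(sum(x->x^2, _))
(40.373 ns (2 allocs: 928 bytes), 11.440 ns)

julia> bcast, mapped = ans;       # the comparative form returns a plain Tuple of results

julia> (bcast.time, mapped.time)  # each element is a `Chairmarks.Sample`; `time` is in seconds
(4.0373e-8, 1.144e-8)
```
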
## Common pitfalls diff --git a/docs/src/why.md b/docs/src/why.md index 67d7ee0a..c5dfc66f 100644 --- a/docs/src/why.md +++ b/docs/src/why.md @@ -14,23 +14,14 @@ Capable of detecting 1% difference in runtime in ideal conditions julia> f(n) = sum(rand() for _ in 1:n) f (generic function with 1 method) -julia> @b f(1000) -1.074 μs +julia> @b f(1000), f(1010) +(1.064 μs, 1.074 μs) -julia> @b f(1000) -1.075 μs +julia> @b f(1000), f(1010) +(1.063 μs, 1.073 μs) -julia> @b f(1000) -1.076 μs - -julia> @b f(1010) -1.086 μs - -julia> @b f(1010) -1.087 μs - -julia> @b f(1010) -1.087 μs +julia> @b f(1000), f(1010) +(1.064 μs, 1.074 μs) ``` ## Efficient @@ -89,6 +80,17 @@ julia> @b rand(100) sort(_, by=x -> exp(-x)) issorted(_, rev=true) || error() 5.358 μs (2 allocs: 1.750 KiB) ``` +The function being benchmarked can be a comma separated list of functions in which case a tuple +of the results is returned + +!!! warning + Comparative benchmarking is experimental and may be removed or its syntax changed in future versions + +```jldoctest +julia> @b rand(100) sort(_, alg=InsertionSort),sort(_, alg=MergeSort) +(1.245 μs (2 allocs: 928 bytes), 921.875 ns (4 allocs: 1.375 KiB)) +``` + See [`@be`](@ref) for more info ## Truthful diff --git a/src/Chairmarks.jl b/src/Chairmarks.jl index 989e10ae..1d7de008 100644 --- a/src/Chairmarks.jl +++ b/src/Chairmarks.jl @@ -17,7 +17,7 @@ module Chairmarks using Printf -VERSION >= v"1.11.0-DEV.469" && eval(Meta.parse("public Sample, Benchmark, DEFAULTS")) +VERSION >= v"1.11.0-DEV.469" && eval(Meta.parse("public Sample, Benchmark, DEFAULTS, summarize")) export @b, @be include("types.jl") diff --git a/src/public.jl b/src/public.jl index a239c537..ec83504c 100644 --- a/src/public.jl +++ b/src/public.jl @@ -34,6 +34,9 @@ julia> @b (x = 0; for _ in 1:50; x = hash(x); end; x) # We can use arbitrary exp julia> @b (x = 0; for _ in 1:5e8; x = hash(x); end; x) # This runs for a long time, so it is only run once (with no warmup) 2.447 s (without a warmup) + +julia> @b rand(10) hash,objectid # Which hash algorithm is faster? [THIS USAGE IS EXPERIMENTAL] +(17.256 ns, 4.246 ns) ``` """ macro b(args...) @@ -148,6 +151,14 @@ At a high level, the implementation of this function looks like this So `init` will be called once, `setup` and `teardown` will be called once per sample, and `f` will be called `evals` times per sample. +# Experimental Features + +You can pass a comma separated list of functions or expressions to `@be` and they will all +be benchmarked at the same time with interleaved samples, returning a tuple of `Benchmark`s. + +!!! warning + Comparative benchmarking is experimental and may be removed or its syntax changed in future versions + # Examples ```jldoctest; filter = [r"\\d\\d?\\d?\\.\\d{3} [μmn]?s( \\(.*\\))?"=>s"RES", r"\\d+ (sample|evaluation)s?"=>s"### \\1"], setup=(using Random) @@ -203,6 +214,18 @@ Benchmark: 3387 samples with 144 evaluations julia> @be (x = 0; for _ in 1:5e8; x = hash(x); end; x) # This runs for a long time, so it is only run once (with no warmup) Benchmark: 1 sample with 1 evaluation 2.488 s (without a warmup) + +julia> @be rand(10) hash,objectid # Which hash algorithm is faster? [THIS USAGE IS EXPERIMENTAL] +Benchmark: 14887 samples with 436 evaluations + min 17.106 ns + median 18.922 ns + mean 20.974 ns + max 234.998 ns +Benchmark: 14887 samples with 436 evaluations + min 4.110 ns + median 4.683 ns + mean 4.979 ns + max 42.911 ns ``` """ macro be(args...) @@ -210,7 +233,7 @@ macro be(args...) 
end """ - summarize(b::Benchmark) -> Any +`summarize(@be ...)` is equivalent to `@b ...` Used by `@b` to summarize the output of `@be`. Currently implemented as elementwise `minimum`. """ From 8e654b8a8d06e4d3c65dd7acd9d8eeb4a95eb4da Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Sat, 30 Nov 2024 12:21:49 -0600 Subject: [PATCH 2/9] Add comparison to README and index.html (under experimental advisory) and demote the Why? link because it's beginning to become self evident --- README.md | 7 +++++-- docs/src/index.md | 3 +++ 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index aa9b1946..32fe5dc1 100644 --- a/README.md +++ b/README.md @@ -28,10 +28,13 @@ julia> @b rand(1000) hash # How long does it take to hash that array? julia> @b rand(1000) _.*5 # How long does it take to multiply it by 5 element wise? 172.970 ns (3 allocs: 7.875 KiB) -``` -[Why Chairmarks?](https://Chairmarks.lilithhafner.com/stable/why) +julia> @b rand(100,100) inv,_^2,sum # Is it be faster to invert, square, or sum a matrix? [THIS USAGE IS EXPERIMENTAL] +(92.917 μs (9 allocs: 129.203 KiB), 27.166 μs (3 allocs: 78.203 KiB), 1.083 μs) +``` [Tutorial](https://Chairmarks.lilithhafner.com/stable/tutorial) +[Why Chairmarks?](https://Chairmarks.lilithhafner.com/stable/why) + [API Reference](https://Chairmarks.lilithhafner.com/stable/reference) diff --git a/docs/src/index.md b/docs/src/index.md index 6b988d2d..708b6706 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -30,4 +30,7 @@ julia> @b rand(1000) hash # How long does it take to hash that array? julia> @b rand(1000) _.*5 # How long does it take to multiply it by 5 element wise? 172.970 ns (3 allocs: 7.875 KiB) + +julia> @b rand(100,100) inv,_^2,sum # Is it be faster to invert, square, or sum a matrix? [THIS USAGE IS EXPERIMENTAL] +(92.917 μs (9 allocs: 129.203 KiB), 27.166 μs (3 allocs: 78.203 KiB), 1.083 μs) ``` From d016842b41f9c21924678e21feb84ce3da58dbea Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Sat, 30 Nov 2024 12:34:07 -0600 Subject: [PATCH 3/9] fixups for doctests --- docs/src/migration.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/src/migration.md b/docs/src/migration.md index f6533747..00755b43 100644 --- a/docs/src/migration.md +++ b/docs/src/migration.md @@ -3,7 +3,7 @@ CurrentModule = Chairmarks DocTestSetup = quote using Chairmarks end -DocTestFilters = [r"\d\d?\d?\.\d{3} [μmn]?s( \(.*\))?"] +DocTestFilters = [r"\d\d?\d?\.\d{3} [μmn]?s( \(.*\))?| (time: |memory:) .*% => (improvement|regression|invariant) \((5|1).00% tolerance)"] ``` # [How to migrate from BenchmarkTools to Chairmarks](@id migration) @@ -105,7 +105,7 @@ comma separated list of functions to `@b` or `@be`. !!! 
warning Comparative benchmarking is experimental and may be removed or its syntax changed in future versions -```jldoctest +```jldoctest setup=:(using BenchmarkTools) julia> f() = sum(rand() for _ in 1:1000) f (generic function with 1 method) From e4c1d4eb647804391f67a354fbce40f230284b5f Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Sat, 30 Nov 2024 12:37:25 -0600 Subject: [PATCH 4/9] Escape closing paren --- docs/src/migration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/migration.md b/docs/src/migration.md index 00755b43..fdc7655c 100644 --- a/docs/src/migration.md +++ b/docs/src/migration.md @@ -3,7 +3,7 @@ CurrentModule = Chairmarks DocTestSetup = quote using Chairmarks end -DocTestFilters = [r"\d\d?\d?\.\d{3} [μmn]?s( \(.*\))?| (time: |memory:) .*% => (improvement|regression|invariant) \((5|1).00% tolerance)"] +DocTestFilters = [r"\d\d?\d?\.\d{3} [μmn]?s( \(.*\))?| (time: |memory:) .*% => (improvement|regression|invariant) \((5|1).00% tolerance\)"] ``` # [How to migrate from BenchmarkTools to Chairmarks](@id migration) From 2194dd7aa67e8c93792843ece7d9292a097b1be0 Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Sat, 30 Nov 2024 12:50:19 -0600 Subject: [PATCH 5/9] Workaround https://github.com/JuliaDocs/Documenter.jl/issues/2613 --- docs/src/migration.md | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/docs/src/migration.md b/docs/src/migration.md index fdc7655c..a0f96ba8 100644 --- a/docs/src/migration.md +++ b/docs/src/migration.md @@ -105,7 +105,14 @@ comma separated list of functions to `@b` or `@be`. !!! warning Comparative benchmarking is experimental and may be removed or its syntax changed in future versions -```jldoctest setup=:(using BenchmarkTools) + +```@meta +DocTestSetup = quote + using Chairmarks, BenchmarkTools +end +``` + +```jldoctest julia> f() = sum(rand() for _ in 1:1000) f (generic function with 1 method) @@ -129,6 +136,12 @@ BenchmarkTools.TrialJudgement: memory: +0.00% => invariant (1.00% tolerance) ``` +```@meta +DocTestSetup = quote + using Chairmarks +end +``` + ### Nonconstant globals and interpolation Like BenchmarkTools, benchmarks that include access to nonconstant globals will receive a From 8af4c1982c8f3cf1bb63cd2ad5ac863ca3a85125 Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Sat, 30 Nov 2024 12:54:24 -0600 Subject: [PATCH 6/9] Add BenchmarkTools as a docs dep --- docs/Project.toml | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/Project.toml b/docs/Project.toml index 97de8994..854fb257 100644 --- a/docs/Project.toml +++ b/docs/Project.toml @@ -1,4 +1,5 @@ [deps] +BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf" Chairmarks = "0ca39b1e-fe0b-4e98-acfc-b1656634c4de" Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4" DocumenterVitepress = "4710194d-e776-4893-9690-8d956a29c365" From 6c1cf154aca1317365c10cd71c0a01840fe8cca1 Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Sat, 30 Nov 2024 13:01:38 -0600 Subject: [PATCH 7/9] Use correct setup syntax --- docs/src/migration.md | 15 +-------------- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/docs/src/migration.md b/docs/src/migration.md index a0f96ba8..7564ab57 100644 --- a/docs/src/migration.md +++ b/docs/src/migration.md @@ -105,14 +105,7 @@ comma separated list of functions to `@b` or `@be`. !!! 
warning Comparative benchmarking is experimental and may be removed or its syntax changed in future versions - -```@meta -DocTestSetup = quote - using Chairmarks, BenchmarkTools -end -``` - -```jldoctest +```jldoctest; setup=(using BenchmarkTools) julia> f() = sum(rand() for _ in 1:1000) f (generic function with 1 method) @@ -136,12 +129,6 @@ BenchmarkTools.TrialJudgement: memory: +0.00% => invariant (1.00% tolerance) ``` -```@meta -DocTestSetup = quote - using Chairmarks -end -``` - ### Nonconstant globals and interpolation Like BenchmarkTools, benchmarks that include access to nonconstant globals will receive a From 6519c69cd445b32bb3e89ac09deb011cd85f0de2 Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Sat, 30 Nov 2024 13:04:22 -0600 Subject: [PATCH 8/9] Qualify summarize and minor rewording --- docs/src/reference.md | 2 +- src/public.jl | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/src/reference.md b/docs/src/reference.md index 85223ccb..782c090a 100644 --- a/docs/src/reference.md +++ b/docs/src/reference.md @@ -12,7 +12,7 @@ version number if the change is not expected to cause significant disruptions. - [`Chairmarks.Benchmark`](@ref) - [`@b`](@ref) - [`@be`](@ref) -- [`summarize`](@ref) +- [`Chairmarks.summarize`](@ref) - [`Chairmarks.DEFAULTS`](@ref) ```@docs diff --git a/src/public.jl b/src/public.jl index ec83504c..44cd1997 100644 --- a/src/public.jl +++ b/src/public.jl @@ -5,8 +5,8 @@ Benchmark `f` and return the fastest [`Sample`](@ref). Use [`@be`](@ref) for full results. -`@b args...` is equivalent to `summarize(@be args...)`. See the docstring for [`@be`](@ref) -for more information. +`@b args...` is equivalent to `Chairmarks.summarize(@be args...)`. See the docstring of +[`@be`](@ref) for more information. # Examples From ca3db1674ae589b2027d361512c0f14492989c5d Mon Sep 17 00:00:00 2001 From: Lilith Orion Hafner Date: Sun, 1 Dec 2024 08:58:39 -0600 Subject: [PATCH 9/9] Update experimental warnings --- docs/src/explanations.md | 2 +- docs/src/migration.md | 2 +- docs/src/tutorial.md | 2 +- docs/src/why.md | 5 ++++- src/public.jl | 2 +- 5 files changed, 8 insertions(+), 5 deletions(-) diff --git a/docs/src/explanations.md b/docs/src/explanations.md index 199a772d..a055f1d5 100644 --- a/docs/src/explanations.md +++ b/docs/src/explanations.md @@ -74,7 +74,7 @@ with the same inputs while the latter runs all evaluations of `f` before all eva `g` and—typically less importantly—uses different random inputs. !!! warning - Comparative benchmarking is experimental and may be removed or its syntax changed in future versions + Comparative benchmarking is experimental and may be removed or changed in future versions ## How does tuning work? diff --git a/docs/src/migration.md b/docs/src/migration.md index 7564ab57..70fbe75c 100644 --- a/docs/src/migration.md +++ b/docs/src/migration.md @@ -103,7 +103,7 @@ comma separated list of functions to `@b` or `@be`. !!! warning - Comparative benchmarking is experimental and may be removed or its syntax changed in future versions + Comparative benchmarking is experimental and may be removed or changed in future versions ```jldoctest; setup=(using BenchmarkTools) julia> f() = sum(rand() for _ in 1:1000) diff --git a/docs/src/tutorial.md b/docs/src/tutorial.md index e8ac6acc..e345c200 100644 --- a/docs/src/tutorial.md +++ b/docs/src/tutorial.md @@ -104,7 +104,7 @@ providing a comma-separated list of functions to benchmark. Here, we compare two computing the norm of a vector: !!! 
warning - Comparative benchmarking is experimental and may be removed or its syntax changed in future versions + Comparative benchmarking is experimental and may be removed or changed in future versions ```jldoctest julia> @b rand(100) sqrt(sum(_ .* _)),sqrt(sum(x->x^2, _)) diff --git a/docs/src/why.md b/docs/src/why.md index c5dfc66f..41918c24 100644 --- a/docs/src/why.md +++ b/docs/src/why.md @@ -10,6 +10,9 @@ DocTestFilters = [r"\d\d?\d?\.\d{3} [μmn]?s( \(.*\))?"] Capable of detecting 1% difference in runtime in ideal conditions +!!! warning + Comparative benchmarking is experimental and may be removed or changed in future versions + ```jldoctest julia> f(n) = sum(rand() for _ in 1:n) f (generic function with 1 method) @@ -84,7 +87,7 @@ The function being benchmarked can be a comma separated list of functions in whi of the results is returned !!! warning - Comparative benchmarking is experimental and may be removed or its syntax changed in future versions + Comparative benchmarking is experimental and may be removed or changed in future versions ```jldoctest julia> @b rand(100) sort(_, alg=InsertionSort),sort(_, alg=MergeSort) diff --git a/src/public.jl b/src/public.jl index 44cd1997..4d2484f1 100644 --- a/src/public.jl +++ b/src/public.jl @@ -157,7 +157,7 @@ You can pass a comma separated list of functions or expressions to `@be` and the be benchmarked at the same time with interleaved samples, returning a tuple of `Benchmark`s. !!! warning - Comparative benchmarking is experimental and may be removed or its syntax changed in future versions + Comparative benchmarking is experimental and may be removed or changed in future versions # Examples