PSA: It is possible to use `BenchmarkTools.BenchmarkGroup` with Chairmarks #70

asinghvi17 · 2024-03-06T16:54:04Z

Simply replacing @benchmarkable with @be suffices, and you don't have to run tune! or run either!

Even running Statistics.median(suite) works - although any custom plotting utilities might need a couple of tweaks :)
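
For anyone skimming, here is a minimal sketch of what that looks like. The benchmark names ("sum", "sort") and workloads are made up for illustration, and it assumes BenchmarkTools, Chairmarks, and Statistics are all installed.

```julia
using BenchmarkTools: BenchmarkGroup
using Chairmarks
using Statistics: median

# A BenchmarkGroup can hold arbitrary values, so Chairmarks results fit in it.
suite = BenchmarkGroup()

# `@be setup f` runs immediately and returns a Chairmarks.Benchmark,
# so there is no separate tune!/run step.
suite["sum"] = @be rand(1000) sum
suite["sort"] = @be rand(1000) sort

# median(suite) maps median over the leaves and returns another BenchmarkGroup.
med = median(suite)
```

Because the entries are already results, `suite` here corresponds to what `run(suite)` would return in a BenchmarkTools workflow, not to the suite definition itself.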

LilithHafner · 2024-03-07T15:57:35Z

What? I had no idea. This is lovely :)

MilesCranmer · 2024-03-13T17:55:38Z

Is there a way to take a BenchmarkGroup and "translate" it somehow? Or otherwise run a benchmark suite using Chairmarks instead of BenchmarkTools? Many projects have historically defined their benchmark suite in benchmarks/benchmarks.jl and use BenchmarkTools. So to measure performance over time it is perhaps not practical to replace all previous @benchmarkable (especially if there are setup=... in the macro).

I ask specifically in the context of AirspeedVelocity.jl: MilesCranmer/AirspeedVelocity.jl#35

asinghvi17 · 2024-03-13T17:58:20Z

I mean they both have the same syntax, so an operation at all leaf nodes of that dict could easily reassign the @benchmarkables to the results of @be on that same Expr (if you can directly invoke a macro on an Expr).

MilesCranmer · 2024-03-13T17:59:52Z

I think they have slightly different syntax, no?

  Positional argument disambiguation
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡

  setup, teardown, and init are optional and are parsed with that precedence giving these possible forms:

  @be f
  @be setup f
  @be setup f teardown
  @be init setup f teardown

whereas @benchmarkable is

help?> @benchmarkable
  @benchmarkable <expr to benchmark> [setup=<setup expr>]


  Create a Benchmark instance for the given expression. @benchmarkable has similar syntax with @benchmark. See also @benchmark.
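
To make the difference concrete, here is a rough side-by-side sketch of the same benchmark written both ways. The workload (`sort` on a random vector) is only an illustration, and the variable names `lazy` and `eager` are made up.

```julia
import BenchmarkTools
using Chairmarks

# BenchmarkTools: setup is a keyword argument and the benchmarked expression
# refers to the variable bound in setup. Nothing runs yet.
lazy = BenchmarkTools.@benchmarkable sort(x) setup=(x = rand(100)) seconds=1

# Running it is a separate, explicit step.
trial = run(lazy)

# Chairmarks: setup is positional and its value is passed to the function.
# This runs immediately and returns the results.
eager = @be rand(100) sort
```

So a mechanical translation would have to turn `setup=(x = ...)` plus an expression using `x` into the positional `setup f` form, which is more than a rename of the macro.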

asinghvi17 · 2024-03-13T18:00:31Z

Ah that's true - I had not used setup much but I guess it could also be translated by way of Expr rewriting

MilesCranmer · 2024-03-13T18:04:01Z

Wait, I'm a bit confused. @benchmarkable returns a benchmark (that you can then execute). Whereas @be appears to actually return the results of benchmarking the expression. Is that correct?

asinghvi17 · 2024-03-13T18:05:05Z

Yep! Creating a benchmark suite with @be instead of @benchmarkable gives you the equivalent of that suite after run.

MilesCranmer · 2024-03-13T18:05:58Z

Oh, but isn't the whole point of @benchmarkable for it to be lazily evaluated, so you can tune! it? And @benchmark for eager evaluation?

asinghvi17 · 2024-03-13T18:13:58Z

That's true, but if running in a non-interactive framework I don't think it really matters?

Even then, the performance difference is substantial enough that it's actually possible to do semi-interactive workflows with Chairmarks.
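
One way to get something re-runnable without @benchmarkable is to wrap the suite construction in a function and call it again after each change. This is only a sketch: `run_suite` is a hypothetical helper, not part of either package, and the benchmarks in it are placeholders.

```julia
using BenchmarkTools: BenchmarkGroup
using Chairmarks
using Statistics: median

# Hypothetical helper: every call rebuilds the group and re-runs each benchmark.
function run_suite()
    suite = BenchmarkGroup()
    # seconds=0.1 caps the time budget per benchmark to keep the loop interactive.
    suite["sum"] = @be rand(1000) sum seconds=0.1
    suite["sort"] = @be rand(1000) sort seconds=0.1
    return suite
end

before = median(run_suite())
# ... edit the library, let Revise.jl pick up the change ...
after = median(run_suite())
```

This keeps the suite definition in one place, at the cost of not having a stored, tunable benchmark object.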

MilesCranmer · 2024-03-13T18:33:48Z

I guess I just wouldn't consider @benchmarkable -> @be to be a complete solution (maybe this deserves a new issue with a feature request). For example, if I need a single benchmark result I would use @benchmark. But if I want a suite of benchmarks that I can store in my REPL, in between Revise.jl-ing my library, then the suite is something I would want to re-run. In principle it doesn't seem too bad to add a compatibility layer?

This is the @benchmarkable code. The logic doesn't seem too complicated: it just generates a function `samplefunc` that returns the recorded time, GC time, memory, and allocations for a particular expression.

"""
    @benchmarkable <expr to benchmark> [setup=<setup expr>]

Create a `Benchmark` instance for the given expression. `@benchmarkable`
has similar syntax with `@benchmark`. See also [`@benchmark`](@ref).
"""
macro benchmarkable(args...)
    core, setup, teardown, quote_vars, quote_vals, params = benchmarkable_parts(args)
    map!(esc, params, params)

    # extract any variable bindings shared between the core and setup expressions
    setup_vars = isa(setup, Expr) ? collectvars(setup) : []
    core_vars = isa(core, Expr) ? collectvars(core) : []
    out_vars = filter(var -> var in setup_vars, core_vars)

    # generate the benchmark definition
    return quote
        generate_benchmark_definition(
            $__module__,
            $(Expr(:quote, out_vars)),
            $(Expr(:quote, setup_vars)),
            $(Expr(:quote, quote_vars)),
            $(esc(Expr(:tuple, Expr.(:quote, quote_vals)...))),
            $(esc(Expr(:quote, core))),
            $(esc(Expr(:quote, setup))),
            $(esc(Expr(:quote, teardown))),
            Parameters($(params...)),
        )
    end
end

# `eval` an expression that forcibly defines the specified benchmark at
# top-level in order to allow transfer of locally-scoped variables into
# benchmark scope.
#
# The double-underscore-prefixed variable names are not particularly hygienic - it's
# possible for them to conflict with names used in the setup or teardown expressions.
# A more robust solution would be preferable.
function generate_benchmark_definition(
    eval_module, out_vars, setup_vars, quote_vars, quote_vals, core, setup, teardown, params
)
    @nospecialize
    corefunc = gensym("core")
    samplefunc = gensym("sample")
    type_vars = [gensym() for i in 1:(length(quote_vars) + length(setup_vars))]
    signature = Expr(:call, corefunc, quote_vars..., setup_vars...)
    signature_def = Expr(
        :where,
        Expr(
            :call,
            corefunc,
            [
                Expr(:(::), var, type) for
                (var, type) in zip([quote_vars; setup_vars], type_vars)
            ]...,
        ),
        type_vars...,
    )
    if length(out_vars) == 0
        invocation = signature
        core_body = core
    elseif length(out_vars) == 1
        returns = :(return $(out_vars[1]))
        invocation = :($(out_vars[1]) = $(signature))
        core_body = :($(core); $(returns))
    else
        returns = :(return $(Expr(:tuple, out_vars...)))
        invocation = :($(Expr(:tuple, out_vars...)) = $(signature))
        core_body = :($(core); $(returns))
    end
    @static if isdefined(Base, :donotdelete)
        invocation = :(
            let x = $invocation
                Base.donotdelete(x)
                x
            end
        )
    end
    return Core.eval(
        eval_module,
        quote
            @noinline $(signature_def) = begin
                $(core_body)
            end
            @noinline function $(samplefunc)(
                $(Expr(:tuple, quote_vars...)), __params::$BenchmarkTools.Parameters
            )
                $(setup)
                __evals = __params.evals
                __gc_start = Base.gc_num()
                __start_time = time_ns()
                __return_val = $(invocation)
                for __iter in 2:__evals
                    $(invocation)
                end
                __sample_time = time_ns() - __start_time
                __gcdiff = Base.GC_Diff(Base.gc_num(), __gc_start)
                $(teardown)
                __time = max((__sample_time / __evals) - __params.overhead, 0.001)
                __gctime = max((__gcdiff.total_time / __evals) - __params.overhead, 0.0)
                __memory = Int(Base.fld(__gcdiff.allocd, __evals))
                __allocs = Int(
                    Base.fld(
                        __gcdiff.malloc +
                        __gcdiff.realloc +
                        __gcdiff.poolalloc +
                        __gcdiff.bigalloc,
                        __evals,
                    ),
                )
                return __time, __gctime, __memory, __allocs, __return_val
            end
            $BenchmarkTools.Benchmark($(samplefunc), $(quote_vals), $(params))
        end,
    )
end

So perhaps a ext/ChairmarksBenchmarkToolsExt.jl could create a @benchmarkable that still uses benchmarkable_parts to extract the pieces, but uses Chairmarks instead for running the thing? I'm not sure how doable this is. Maybe @LilithHafner could share their thoughts.

asinghvi17 · 2024-03-13T18:39:04Z

Hmm! Yeah that logic could probably be directly translated to Chairmarks somehow, probably by creating a new macro which stores an Expr object when run (the equivalent of @benchmarkable). It doesn't look like this is hijackable unless either we do major type piracy by overloading BenchmarkTools.generate_benchmark_definition directly, or offer a function Chairmarks.override_benchmarkable!() to @eval that code in.
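
A very rough sketch of the "new macro" idea, assuming a closure is an acceptable stand-in for a stored Expr. `@lazy_be` is a hypothetical name and does not exist in Chairmarks; it simply defers a `@be` call until the returned function is invoked.

```julia
using Chairmarks

# Hypothetical macro, not part of Chairmarks: forwards its arguments
# unchanged to `@be`, but wraps the call in a zero-argument closure.
macro lazy_be(args...)
    call = Expr(:macrocall, GlobalRef(Chairmarks, Symbol("@be")), __source__, args...)
    return esc(:(() -> $call))
end

# Nothing is benchmarked here; `deferred` is just a function.
deferred = @lazy_be rand(1000) sum seconds=0.1

# Each call runs the benchmark afresh and returns a Chairmarks.Benchmark.
result = deferred()
```

This does not touch BenchmarkTools at all, so it avoids type piracy, but it also does nothing for existing suites written with @benchmarkable.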

LilithHafner · 2024-11-27T21:48:01Z

This should be moved from an issue into the documentation

LilithHafner · 2024-12-01T15:14:23Z

This PSA is in the docs now. Issues with its behavior should be reported separately. Though as I mention in the docs, this isn't intentionally designed behavior so it may never be seamless.

LilithHafner · 2024-12-01T15:15:00Z

Thanks again for reporting, @asinghvi17, and thanks for the qualifications and context, @MilesCranmer

LilithHafner · 2025-01-09T14:35:36Z

Ah, @MilesCranmer, I never responded to your suggestion to use @benchmarkable's API on the frontend and Chairmarks on the backend. That's possible syntactically (i.e. one could write a syntax transformation from BenchmarkTools's input format to Chairmarks's input format), but there would remain semantic holes. Most notably, BenchmarkTools runs the sample function in global scope while Chairmarks runs it in local scope.

MilesCranmer mentioned this issue Mar 13, 2024

Consider switch to Chairmarks.jl MilesCranmer/AirspeedVelocity.jl#35

Open

gdalle mentioned this issue Mar 18, 2024

Unify benchmarking and allocation testing JuliaDiff/DifferentiationInterface.jl#60

Closed

LilithHafner added the documentation label Nov 27, 2024

LilithHafner linked a pull request Nov 30, 2024 that will close this issue

Update docs for chagnes since 1.2.2 #150

Merged

LilithHafner closed this as completed in #150 Dec 1, 2024
