DataFusion + Conbench Integration #1791
Conversation
conbench/.flake8
From a Python perspective, it's probably a mistake to name this directory conbench. Thoughts on what you would like arrow-datafusion/conbench/ to be named? arrow-datafusion/_conbench/? arrow-datafusion/conbench-benchmarks/? arrow-datafusion/conbench-integration/?
Thank you @dianaclarke -- I will try and review this carefully over the next day or two. CC @andygrove and @Dandandan
No rush & no pressure to merge this (or the sister …)
LGTM. I followed the instructions and conbench produced reports showing the benchmark results.
If there are no objections, I plan to merge this PR next week so that I can proceed to the next step of integrating this with CI. The changes in this PR are self-contained, so I think it is low risk to merge.
LGTM
Thank you @dianaclarke, this is going to be a big productivity boost for us on all performance-related PRs :)
Here's a minimal DataFusion + Conbench[1] proof of concept.
[1] https://github.com/conbench/conbench
A few notes (areas for improvement, caveats, etc.):

- Criterion results are in nanoseconds, but the smallest unit Conbench currently speaks is seconds (because Conbench was initially built for macro, not micro, benchmarking). I suspect most places in Conbench would work just fine if nanoseconds were passed in, but I need to audit the code for any places that assume seconds when the benchmark isn't a throughput benchmark.
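Until that audit happens, the conversion itself is trivial. A minimal sketch (the function name is mine, not part of either project's API):

```python
# Hypothetical helper: convert a Criterion point estimate (nanoseconds)
# to the seconds Conbench currently expects.
NS_PER_S = 1_000_000_000

def criterion_ns_to_conbench_s(nanoseconds: float) -> float:
    """Return the Criterion measurement expressed in seconds."""
    return nanoseconds / NS_PER_S
```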
- If the Criterion benchmarks were named better, I could tag them better in Conbench. For example, I suspect sqrt_20_12, sqrt_20_9, sqrt_22_12, and sqrt_22_14 are parameterized variations of the same benchmark. If they were named something like "sqrt, foo=20, bar=12", I could batch them together and tag their parameters so that Conbench would automatically graph them in relation to each other. I was more or less able to do this for the benchmarks that followed a machine-readable naming pattern. Anyhoo, that's easy enough to do down the road as a last integration step, and it does appear from the Criterion docs that they have their own recommendations for naming parameterized benchmarks.
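To illustrate the kind of name parsing involved, here's a hedged sketch assuming a `name_foo_bar` pattern; the parameter names "foo" and "bar" are placeholders (as above), since the real meaning of the numbers would have to come from the benchmark source:

```python
import re

# Hypothetical pattern: a benchmark id like "sqrt_20_12" splits into a
# base name plus two numeric parameters we can tag in Conbench.
NAME_PATTERN = re.compile(r"^(?P<name>[a-z_]+?)_(?P<foo>\d+)_(?P<bar>\d+)$")

def parse_benchmark_id(benchmark_id: str) -> dict:
    """Split a Criterion benchmark id into tags for Conbench batching."""
    match = NAME_PATTERN.match(benchmark_id)
    if match is None:
        # No machine-readable parameters; tag only the name itself.
        return {"name": benchmark_id}
    return {
        "name": match.group("name"),
        "foo": match.group("foo"),
        "bar": match.group("bar"),
    }
```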
- While Criterion benchmarks can also measure throughput in some cases, all of the arrow-datafusion benchmarks report elapsed time (I'm not sure about the arrow-rs benchmarks), so I didn't write code to support potential throughput results from arrow-datafusion/arrow-rs. We may need to revisit that.
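If throughput support does become necessary, the adapter would mostly need to branch on the unit string. A sketch, treating the exact payload shape and the "s"/"B/s" unit strings as assumptions about Conbench rather than a confirmed schema:

```python
# Hypothetical helper: build the stats fragment of a Conbench result.
# "s" (seconds) vs. "B/s" (bytes per second) are assumed unit strings.
def make_stats(values: list, is_throughput: bool) -> dict:
    """Wrap raw measurements with the unit Conbench should interpret."""
    return {
        "data": values,
        "unit": "B/s" if is_throughput else "s",
    }
```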
- We probably want to add some additional context, like the arrow-rs/arrow-datafusion version, the Rust version, any compiler flags, etc.
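A sketch of what that extra context might look like; the keys, the RUSTFLAGS source, and the idea that versions come from `rustc --version` and Cargo metadata are all assumptions, not an existing Conbench schema:

```python
import os

# Hypothetical context dict attached to each benchmark submission.
def build_run_context(rustc_version: str, datafusion_version: str) -> dict:
    """Assemble version/compiler context tags for a benchmark run."""
    return {
        "rustc_version": rustc_version,          # e.g. output of `rustc --version`
        "datafusion_version": datafusion_version, # e.g. from Cargo metadata
        # Compiler flags, if any, would come from the build environment.
        "compiler_flags": os.environ.get("RUSTFLAGS", ""),
    }
```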