Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataFusion + Conbench Integration #1791

Merged
merged 2 commits into from
Feb 20, 2022
Merged

DataFusion + Conbench Integration #1791

merged 2 commits into from
Feb 20, 2022

Conversation

dianaclarke
Copy link
Contributor

@dianaclarke dianaclarke commented Feb 9, 2022

Here's a minimal DataFusion + Conbench[1] proof of concept.

[1] https://github.com/conbench/conbench

A few notes (areas for improvement, caveats, etc):

  • Criterion results are in nanoseconds, but the smallest unit
    Conbench currently speaks is seconds (because Conbench was initially
    for macro not micro benchmarking). I suspect most places in Conbench
    would work just fine if nanoseconds were passed in, but I need to
    audit the code for any places that assume seconds if it isn't a
    throughput benchmark.

  • If the Criterion benchmarks were named better, I could tag them
    better in Conbench. For example, I suspect sqrt_20_12, sqrt_20_9,
    sqrt_22_12, and sqrt_22_14 are parameterized variations of the same
    benchmark, and if they were named something like "sqrt, foo=20,
    bar=12", I could batch them together & tag their parameters so that
    Conbench would automatically graph them in relation to each other. I
    was sort of able to do this with the following benchmarks (because
    there was a machine readable pattern). Anyhoo, that's easy enough to
    do down the road as a last integration step, and it does appear from
    the Criterion docs that they have their own recommendations for how to
    do this.

    • window partition by, u64_narrow, aggregate functions
    • window partition by, u64_narrow, built-in functions
    • window partition by, u64_wide, aggregate functions
    • window partition by, u64_wide, built-in functions
  • While Criterion benchmarks can also measure throughput in some
    cases, all the arrow-datafusion benchmarks were in elapsed time (not
    sure about the arrow-rs benchmarks), so I didn't bother writing code
    to support potential throughput results from
    arrow-datafusion/arrow-rs, but we may need to revisit that.

  • We probably want to add some additional context, like the
    arrow-rs/arrow-datafusion version, rust version, any compiler flags,
    etc.

conbench/.flake8 Outdated
@@ -0,0 +1,19 @@
# Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor Author

@dianaclarke dianaclarke Feb 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a Python perspective, it's probably a mistake to name this directory conbench. Thoughts on what you would like arrow-datafusion/conbench/ to be named?

  • arrow-datafusion/_conbench/?
  • arrow-datafusion/conbench-benchmarks/?
  • arrow-datafusion/conbench-integration?

@alamb
Copy link
Contributor

alamb commented Feb 9, 2022

Thank you @dianaclarke -- I will try and review this carefully over the next day or two. CC @andygrove and @Dandandan

@dianaclarke
Copy link
Contributor Author

dianaclarke commented Feb 9, 2022

Thank you @dianaclarke -- I will try and review this carefully over the next day or two. CC @andygrove and @Dandandan

No rush & no pressure to merge this (or the sister arrow-rs pull request). I don't actually work on anything Arrow related anymore – just didn't want to leave you hanging. Cheers!

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I followed the instructions and conbench produced reports showing the benchmark results.

@andygrove
Copy link
Member

If there are no objections, I plan to merge this PR next week so that I can proceed to the next step of integrating this with CI. The changes in this PR are self-contained so I think it is low risk to merge.

Copy link
Contributor

@Dandandan Dandandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@houqp houqp merged commit 2681386 into apache:master Feb 20, 2022
@houqp
Copy link
Member

houqp commented Feb 20, 2022

Thank you @dianaclarke , this is going to be a big productivity boost for us on all performance related PRs :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants