Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arrow Rust + Conbench Integration #1289

Merged
merged 2 commits into from
Feb 22, 2022
Merged

Conversation

dianaclarke
Copy link
Contributor

@dianaclarke dianaclarke commented Feb 9, 2022

Here's a minimal Arrow Rust + Conbench[1] proof of concept.

[1] https://github.com/conbench/conbench

A few notes (areas for improvement, caveats, etc):

  • Criterion results are in nanoseconds, but the smallest unit
    Conbench currently speaks is seconds (because Conbench was initially
    for macro not micro benchmarking). I suspect most places in Conbench
    would work just fine if nanoseconds were passed in, but I need to
    audit the code for any places that assume seconds if it isn't a
    throughput benchmark.

  • If the Criterion benchmarks were named better, I could tag them
    better in Conbench. For example, I suspect sqrt_20_12, sqrt_20_9,
    sqrt_22_12, and sqrt_22_14 are parameterized variations of the same
    benchmark, and if they were named something like "sqrt, foo=20,
    bar=12", I could batch them together & tag their parameters so that
    Conbench would automatically graph them in relation to each other. I
    was sort of able to do this with the following benchmarks (because
    there was a machine readable pattern). Anyhoo, that's easy enough to
    do down the road as a last integration step, and it does appear from
    the Criterion docs that they have their own recommendations for how to
    do this.

    • window partition by, u64_narrow, aggregate functions
    • window partition by, u64_narrow, built-in functions
    • window partition by, u64_wide, aggregate functions
    • window partition by, u64_wide, built-in functions
  • While Criterion benchmarks can also measure throughput in some
    cases, all the arrow-datafusion benchmarks were in elapsed time (not
    sure about the arrow-rs benchmarks), so I didn't bother writing code
    to support potential throughput results from
    arrow-datafusion/arrow-rs, but we may need to revisit that.

  • We probably want to add some additional context, like the
    arrow-rs/arrow-datafusion version, rust version, any compiler flags,
    etc.

@codecov-commenter
Copy link

codecov-commenter commented Feb 9, 2022

Codecov Report

Merging #1289 (6c3e7b8) into master (35e16be) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1289   +/-   ##
=======================================
  Coverage   83.04%   83.04%           
=======================================
  Files         180      180           
  Lines       52424    52424           
=======================================
  Hits        43537    43537           
  Misses       8887     8887           
Impacted Files Coverage Δ
arrow/src/datatypes/datatype.rs 66.40% <0.00%> (-0.40%) ⬇️
arrow/src/datatypes/field.rs 53.79% <0.00%> (-0.31%) ⬇️
parquet_derive/src/parquet_field.rs 66.21% <0.00%> (ø)
arrow/src/array/transform/mod.rs 84.77% <0.00%> (+0.25%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 35e16be...6c3e7b8. Read the comment docs.

@dianaclarke dianaclarke force-pushed the conbench branch 3 times, most recently from 359e459 to dc59492 Compare February 9, 2022 03:22
conbench/.flake8 Outdated
@@ -0,0 +1,19 @@
# Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor Author

@dianaclarke dianaclarke Feb 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a Python perspective, it's probably a mistake to name this directory conbench. Thoughts on what you would like arrow-rs/conbench/ to be named?

  • arrow-rs/_conbench/?
  • arrow-rs/conbench-benchmarks/?
  • arrow-rs/conbench-integration/?

conbench/README.md Outdated Show resolved Hide resolved
@andygrove
Copy link
Member

I pulled the branch locally and followed the instructions in the README and benchmarks are currently running. It looks like it will take a while so I will check in on results in the morning and continue reviewing the PR.

@dianaclarke
Copy link
Contributor Author

I pulled the branch locally and followed the instructions in the README and benchmarks are currently running. It looks like it will take a while so I will check in on results in the morning and continue reviewing the PR.

Nice!

99% of the time is spent executing cargo bench.

Note that this is only the first step – you should see json formatted benchmark results spew to your terminal, but they won't yet publish to the Arrow Conbench server[1].

You'll want to work with @ElenaHenderson on getting the results to publish to the Arrow Conbench server via buildkite etc.

[1] https://conbench.ursa.dev/

@@ -0,0 +1,130 @@
# Byte-compiled / optimized / DLL files
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the GitHub generated .gitignore for Python packages. I can trim in down, but I usually don't bother.

@andygrove
Copy link
Member

One of the tests failed because it makes an assumption that a target directory exists and it did not. My assumption is that somehow the current working directory was not correct. This is how I initiated the benchmark run.

~/git/arrow-rs/conbench$ conbench arrow-rs --src-dir=`pwd`/..

I see that the Python code should have changed the directory so I am not sure what went wrong. I will start debugging this tonight. We may also want to have our tests write output to a temporary directory rather than making any assumptions about current working directory.

@dianaclarke
Copy link
Contributor Author

One of the tests failed because it makes an assumption that a target directory exists and it did not. My assumption is that somehow the current working directory was not correct. This is how I initiated the benchmark run.

~/git/arrow-rs/conbench$ conbench arrow-rs --src-dir=`pwd`/..

I see that the Python code should have changed the directory so I am not sure what went wrong. I will start debugging this tonight. We may also want to have our tests write output to a temporary directory rather than making any assumptions about current working directory.

If you pull the latest, I just removed the need for --src-dir. Hopefully that works for you.

$ cd ~/arrow-rs/conbench/
$ conda create -y -n conbench python=3.9
$ conda activate conbench
(conbench) $ pip install -r requirements.txt
(conbench) $ conbench arrow-rs

PS. I don't work on arrow at all anymore, so I better go back to my day job now – just didn't want to leave you hanging. Cheers!

@andygrove
Copy link
Member

Thanks @dianaclarke I can take it from here for sure .. and it turns out this benchmark was failing even before this PR. I should have checked that first.

@dianaclarke
Copy link
Contributor Author

Thanks @dianaclarke I can take it from here for sure .. and it turns out this benchmark was failing even before this PR. I should have checked that first.

Ah, I was getting an error with cargo bench in arrow-rs, but I assumed I just didn't have my env set up correctly (I don't know Rust aside from what I learned doing this).

But I can cargo bench in arrow-datafusion, so perhaps it's easier to start with this pull request:

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. After commenting out the failing csv_writer bench, conbench runs to completion and produces reports. A fix to csv_writer has already been merged to master.

@andygrove
Copy link
Member

If there are no objections, I plan to merge this PR next week so that I can proceed to the next step of integrating this with CI. The changes in this PR are self-contained so I think it is low risk to merge.

Copy link
Member

@houqp houqp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @dianaclarke ! Great to see conbench finally lands in arrow-rs :)

@nevi-me nevi-me merged commit 4b89f7e into apache:master Feb 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants