Add bench.sh script to automate benchmarking DataFusion against itself #6131
Some interesting results already -- I ran a quick experiment to see how much link-time optimization (`lto`) helps. The answer is "quite a bit".
benchmarks/README.md

# Benchmark Descriptions:

## `tpch` Benchmark derived from TPC-H

These benchmarks are derived from the [TPC-H][1] benchmark. And we use this repo as the source of tpch-gen and answers:
Next, I plan to review the other benchmarks and consolidate them, along with their data generation and runner scripts, into the bench.sh framework.
Thanks @alamb!
# Gather baseline data for tpch benchmark
./benchmarks/bench.sh run tpch

# Switch to the branch named mybranch and gather data
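
The workflow this snippet describes (gather a baseline, then rerun the same benchmark on a branch) can be sketched as a dry-run helper that only prints the commands in order. This is an illustration, not part of the PR: `plan_branch_benchmark` is a hypothetical name, and the `git checkout` steps are assumed from the comment text. Only `./benchmarks/bench.sh run tpch` appears in the excerpt above.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) the commands for benchmarking
# a baseline branch and then a feature branch.
# plan_branch_benchmark is a hypothetical helper, not part of bench.sh.

plan_branch_benchmark() {
  local base="$1"    # baseline branch, e.g. main
  local branch="$2"  # branch under test, e.g. mybranch
  local bench="$3"   # benchmark name, e.g. tpch

  # The `git checkout` lines are assumed; only `bench.sh run` is from the excerpt.
  printf '%s\n' \
    "git checkout ${base}" \
    "./benchmarks/bench.sh run ${bench}" \
    "git checkout ${branch}" \
    "./benchmarks/bench.sh run ${bench}"
}

plan_branch_benchmark main mybranch tpch
```

Printing the plan first makes it easy to check the order of steps before spending time on a full benchmark run.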
👍 I was curious before about what's the magic for comparing branches
Thanks for the review @yjshen -- I am trying to reduce the amount of magic involved.
I am going to merge this in and we can continue to iterate (next I would like to increase the number of different tests supported)
Which issue does this PR close?
Closes #6127
Rationale for this change
TL;DR: to make it easier to run the benchmarks included with DataFusion against a standard set of scenarios.
See #6127
What changes are included in this PR?
This script currently supports two benchmarks as shown in the usage instructions.
Are these changes tested?
I tested them manually on an x86 mac and a Linux x86 machine.
Are there any user-facing changes?
No, these are just development scripts.