-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add compare.py to compare the output of multiple benchmarks #5655
Conversation
benchmarks/README.md
Outdated
```shell | ||
$ git checkout main | ||
# generate an output script in /tmp/output_main | ||
$ cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path /data --format parquet -o /tmp/output_main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think --path /data
should be replaced with --path ./data
in this line. Also we can change the --format parquet
with --format tbl
(Assuming user doesn't run the conversion script. This is the format of the output of ./tpch-gen.sh
)
benchmarks/README.md
Outdated
$ cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path /data --format parquet -o /tmp/output_main | ||
# generate an output script in /tmp/output_branch | ||
$ git checkout my_branch | ||
$ cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path /data --format parquet -o /tmp/output_my_branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar changes can be applied with above suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for these suggestions, I have made them in dc5099d
benchmarks/README.md
Outdated
```shell | ||
$ git checkout main | ||
# generate an output script in /tmp/output_main | ||
$ cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path /data --format parquet -o /tmp/output_main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also when I run this script unless /tmp/output_main
already exists. I receive IO Error
. Is this expected?. If so, I think we should add mkdir /tmp/output_main
above this line.
I added some minor comments. Other than those comments, This PR is LGTM!. Thanks @alamb for this PR. This is very useful to compare results with friendly report. |
Thanks again for the review @mustafasrepo |
Which issue does this PR close?
Closes #5561
Rationale for this change
See #5561
What changes are included in this PR?
compare.py
script from @Taza53 based on one from @isidentical (see Report and compare benchmark runs against two branches #5561 (comment))Are these changes tested?
Not really,
Are there any user-facing changes?
No