TPC-H benchmark can optionally write JSON output file with benchmark summary #1766

andygrove · 2022-02-06T17:36:27Z

Which issue does this PR close?

Closes #1757.

Rationale for this change

To help with benchmark automation and reporting, I would like the benchmark results to be written to a JSON file.

What changes are included in this PR?

This PR adds a new --output argument to the tpch benchmark. When it is specified, a JSON summary file containing the benchmark results is written to the given directory.
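As a rough illustration of the new option, the sketch below finds the output directory in a raw argument list using only the standard library. The function name `output_dir` is hypothetical, and the benchmark itself presumably relies on a CLI-parsing crate rather than a hand-written scan like this one; only the `-o` and `--output` spellings come from this PR.

```rust
use std::path::PathBuf;

/// Hypothetical helper: returns the directory that follows `-o` or `--output`.
/// Returns `None` when the flag is absent or has no value after it.
pub fn output_dir(args: &[String]) -> Option<PathBuf> {
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if arg == "-o" || arg == "--output" {
            // The value is expected to be the next argument.
            return iter.next().map(PathBuf::from);
        }
    }
    None
}
```

When the flag is absent the function returns `None`, which matches the behavior described above: no summary file is written unless the option is given.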

Example JSON output

{
  "benchmark_version": "5.0.0",
  "datafusion_version": "6.0.0",
  "num_cpus": 48,
  "start_time": 1644167292,
  "arguments": [
    "benchmark",
    "datafusion",
    "--iterations",
    "1",
    "--path",
    "/mnt/bigdata/tpch/sf100-tbl",
    "--format",
    "tbl",
    "--query",
    "1",
    "--batch-size",
    "4096",
    "-o",
    "/tmp"
  ],
  "query": 1,
  "iterations": [
    {
      "elapsed": 210781.71820099998,
      "row_count": 4
    }
  ]
}
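For readers who want to produce or consume this format, here is a minimal sketch of Rust types that mirror the fields in the example above. The type names (`BenchmarkSummary`, `IterationResult`) and the hand-written `to_json` method are assumptions made for illustration; the PR's actual structs may differ, and real code would more likely derive serialization with a JSON library. The sketch emits compact JSON rather than the pretty-printed layout shown above.

```rust
/// Hypothetical per-iteration record; field names follow the example JSON.
#[derive(Debug, Clone, PartialEq)]
pub struct IterationResult {
    /// Elapsed time for one iteration (milliseconds, judging by the console output).
    pub elapsed: f64,
    pub row_count: usize,
}

/// Hypothetical top-level summary; field names follow the example JSON.
#[derive(Debug, Clone, PartialEq)]
pub struct BenchmarkSummary {
    pub benchmark_version: String,
    pub datafusion_version: String,
    pub num_cpus: usize,
    /// Seconds since the Unix epoch.
    pub start_time: u64,
    pub arguments: Vec<String>,
    pub query: usize,
    pub iterations: Vec<IterationResult>,
}

/// Quotes and escapes a string for use as a JSON string literal.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl BenchmarkSummary {
    /// Renders the summary as compact JSON.
    /// Assumes every `elapsed` value is finite (NaN and infinity are not valid JSON).
    pub fn to_json(&self) -> String {
        let args: Vec<String> = self.arguments.iter().map(|a| json_string(a)).collect();
        let iters: Vec<String> = self
            .iterations
            .iter()
            .map(|i| format!("{{\"elapsed\":{},\"row_count\":{}}}", i.elapsed, i.row_count))
            .collect();
        format!(
            "{{\"benchmark_version\":{},\"datafusion_version\":{},\"num_cpus\":{},\"start_time\":{},\"arguments\":[{}],\"query\":{},\"iterations\":[{}]}}",
            json_string(&self.benchmark_version),
            json_string(&self.datafusion_version),
            self.num_cpus,
            self.start_time,
            args.join(","),
            self.query,
            iters.join(",")
        )
    }
}
```

Keeping `arguments` as the raw command line, as the example does, lets a report record exactly how each run was invoked.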

Are there any user-facing changes?

There is a new --output option when running the tpch benchmarks.

Signed-off-by: Andy Grove <[email protected]>

benchmarks/src/bin/tpch.rs

Dandandan · 2022-02-07T11:40:40Z

@andygrove FYI I pushed a fix for a clippy linting error

alamb · 2022-02-09T18:39:57Z

Query 1 iteration 2 took 62666.3 ms and returned 4 rows
Query 1 avg time: 62579.50 ms
Writing summary file to /tmp/tpch-q1-1644431672.json

It is pretty neat:

alamb@MacBook-Pro-2 arrow-datafusion % cat /tmp/tpch-q1-1644431672.json 
{
  "benchmark_version": "5.0.0",
  "datafusion_version": "6.0.0",
  "num_cpus": 16,
  "start_time": 1644431672,
  "arguments": [
    "benchmark",
    "datafusion",
    "-o",
    "/tmp",
    "-p",
    "/Users/alamb/Software/tpch_data/SF1",
    "-q",
    "1",
    "--format",
    "tbl"
  ],
  "query": 1,
  "iterations": [
    {
      "elapsed": 62607.731700000004,
      "row_count": 4
    },
    {
      "elapsed": 62464.438148,
      "row_count": 4
    },
    {
      "elapsed": 62666.318697,
      "row_count": 4
    }
  ]
}
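The console output and the summary file above are consistent with each other, and the relationship can be sketched in a few lines. The helpers below are hypothetical: `average_elapsed_ms` takes the mean of the per-iteration `elapsed` values, and `summary_path` builds a file name of the form `tpch-q<query>-<start_time>.json`, a pattern inferred only from the single log line above rather than from the PR's code.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: mean of the per-iteration elapsed times, in milliseconds.
/// Returns `None` for an empty slice to avoid dividing by zero.
pub fn average_elapsed_ms(elapsed: &[f64]) -> Option<f64> {
    if elapsed.is_empty() {
        return None;
    }
    Some(elapsed.iter().sum::<f64>() / elapsed.len() as f64)
}

/// Hypothetical helper: builds the summary file path inside `output_dir`.
/// The naming pattern is an assumption inferred from the log line
/// "Writing summary file to /tmp/tpch-q1-1644431672.json".
pub fn summary_path(output_dir: &Path, query: usize, start_time: u64) -> PathBuf {
    output_dir.join(format!("tpch-q{}-{}.json", query, start_time))
}
```

Averaging the three `elapsed` values from the file above and formatting the result to two decimal places gives 62579.50, the same figure as the reported "avg time".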

alamb · 2022-02-09T18:57:32Z

Follow-on PR that I needed in order to test this more easily: #1800

andygrove added 4 commits February 5, 2022 09:04

use ordered-float 2.10

08056cf

Signed-off-by: Andy Grove <[email protected]>

Add DATAFUSION_VERSION constant

7dc7a66

Signed-off-by: Andy Grove <[email protected]>

Add option to write JSON summary file with benchmark results

e2216b1

Merge remote-tracking branch 'apache/master' into benchmark-json-summary

17b014a

github-actions bot added the datafusion label (Changes in the datafusion crate) Feb 6, 2022

andygrove changed the title ~~Benchmark json summary~~ TPC-H benchmark can optionally write JSON output file with benchmark summary Feb 6, 2022

update test

ecac56f

Dandandan reviewed Feb 7, 2022

benchmarks/src/bin/tpch.rs (outdated review comment)

Clippy fix

694946a

Dandandan approved these changes Feb 7, 2022


alamb merged commit 1431ef3 into apache:master Feb 9, 2022

alamb mentioned this pull request Feb 9, 2022

Improve the error message and UX of tpch benchmark program #1800

Merged

andygrove deleted the benchmark-json-summary branch January 27, 2023 18:44
