TPC-H benchmark can optionally write JSON output file with benchmark summary #1766

Merged on Feb 9, 2022 (6 commits)

andygrove (Member)

Which issue does this PR close?

Closes #1757.

Rationale for this change

To help with benchmark automation and reporting, I would like the benchmark results to be written to a JSON file.

What changes are included in this PR?

This PR adds a new --output (-o) argument to the tpch benchmark. When it is specified, a JSON file summarizing the benchmark results is written to the given directory.

Example JSON output (elapsed times are in milliseconds):

{
  "benchmark_version": "5.0.0",
  "datafusion_version": "6.0.0",
  "num_cpus": 48,
  "start_time": 1644167292,
  "arguments": [
    "benchmark",
    "datafusion",
    "--iterations",
    "1",
    "--path",
    "/mnt/bigdata/tpch/sf100-tbl",
    "--format",
    "tbl",
    "--query",
    "1",
    "--batch-size",
    "4096",
    "-o",
    "/tmp"
  ],
  "query": 1,
  "iterations": [
    {
      "elapsed": 210781.71820099998,
      "row_count": 4
    }
  ]
}
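A minimal sketch of how a summary like this could be produced, written in Rust since that is DataFusion's language. The BenchmarkSummary and QueryIteration types, the helper names, and the tpch-q<query>-<start_time>.json file name (inferred from the sample path /tmp/tpch-q1-1644431672.json later in this thread) are hypothetical and not taken from the PR. The JSON is written by hand so the example needs only the standard library, rather than assuming any particular serialization crate.

```rust
use std::fmt::Write as _;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// One timed run of the query (hypothetical type mirroring the JSON above).
struct QueryIteration {
    elapsed: f64, // wall-clock time in milliseconds
    row_count: usize,
}

/// Hypothetical summary type with the same fields as the example JSON.
struct BenchmarkSummary {
    benchmark_version: String,
    datafusion_version: String,
    num_cpus: usize,
    start_time: u64, // seconds since the Unix epoch
    arguments: Vec<String>,
    query: usize,
    iterations: Vec<QueryIteration>,
}

/// Encode `s` as a JSON string literal, escaping quotes, backslashes and control characters.
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => write!(out, "\\u{:04x}", c as u32).unwrap(),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl BenchmarkSummary {
    /// Serialize to a single-line JSON object. Assumes every `elapsed` value is
    /// finite, since NaN or infinity would not be valid JSON.
    fn to_json(&self) -> String {
        let args: Vec<String> = self.arguments.iter().map(|a| json_string(a)).collect();
        let iters: Vec<String> = self
            .iterations
            .iter()
            .map(|i| format!("{{\"elapsed\": {}, \"row_count\": {}}}", i.elapsed, i.row_count))
            .collect();
        format!(
            "{{\"benchmark_version\": {}, \"datafusion_version\": {}, \"num_cpus\": {}, \
             \"start_time\": {}, \"arguments\": [{}], \"query\": {}, \"iterations\": [{}]}}",
            json_string(&self.benchmark_version),
            json_string(&self.datafusion_version),
            self.num_cpus,
            self.start_time,
            args.join(", "),
            self.query,
            iters.join(", ")
        )
    }

    /// File name modeled on the sample output `tpch-q1-1644431672.json`.
    fn file_name(&self) -> String {
        format!("tpch-q{}-{}.json", self.query, self.start_time)
    }

    /// Write the summary into `dir` (the value passed to --output) and return the path.
    fn write_to_dir(&self, dir: &Path) -> io::Result<PathBuf> {
        let path = dir.join(self.file_name());
        fs::write(&path, self.to_json())?;
        Ok(path)
    }
}

fn main() -> io::Result<()> {
    let summary = BenchmarkSummary {
        // Placeholder versions; a real binary would embed its own.
        benchmark_version: "5.0.0".to_string(),
        datafusion_version: "6.0.0".to_string(),
        // Assumption: use the standard library's estimate of available parallelism.
        num_cpus: std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1),
        start_time: SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs())
            .unwrap_or(0),
        // The sample arguments start at the subcommand, so skip the program path.
        arguments: std::env::args().skip(1).collect(),
        query: 1,
        iterations: vec![QueryIteration { elapsed: 210781.71820099998, row_count: 4 }],
    };
    let path = summary.write_to_dir(&std::env::temp_dir())?;
    println!("Wrote summary file to {}", path.display());
    Ok(())
}
```

Keeping the start time in the file name means repeated runs into the same --output directory do not overwrite each other.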

Are there any user-facing changes?

There is a new --output option when running the tpch benchmarks.

github-actions bot added the datafusion label (Changes in the datafusion crate) on Feb 6, 2022
andygrove changed the title from "Benchmark json summary" to "TPC-H benchmark can optionally write JSON output file with benchmark summary" on Feb 6, 2022
Dandandan (Contributor)

@andygrove FYI, I pushed a fix for a clippy lint error.

alamb (Contributor) commented Feb 9, 2022

Query 1 iteration 2 took 62666.3 ms and returned 4 rows
Query 1 avg time: 62579.50 ms
Writing summary file to /tmp/tpch-q1-1644431672.json

It is pretty neat:

alamb@MacBook-Pro-2 arrow-datafusion % cat /tmp/tpch-q1-1644431672.json 
{
  "benchmark_version": "5.0.0",
  "datafusion_version": "6.0.0",
  "num_cpus": 16,
  "start_time": 1644431672,
  "arguments": [
    "benchmark",
    "datafusion",
    "-o",
    "/tmp",
    "-p",
    "/Users/alamb/Software/tpch_data/SF1",
    "-q",
    "1",
    "--format",
    "tbl"
  ],
  "query": 1,
  "iterations": [
    {
      "elapsed": 62607.731700000004,
      "row_count": 4
    },
    {
      "elapsed": 62464.438148,
      "row_count": 4
    },
    {
      "elapsed": 62666.318697,
      "row_count": 4
    }
  ]
}
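As a quick consistency check, averaging the three elapsed values in this summary reproduces the "avg time" line printed in the log above. This is a small std-only Rust sketch with the values copied by hand; a real reporting script would parse the file with a JSON library instead.

```rust
/// Arithmetic mean of the per-iteration elapsed times (milliseconds),
/// or None when there are no iterations.
fn average(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        None
    } else {
        Some(values.iter().sum::<f64>() / values.len() as f64)
    }
}

fn main() {
    // "elapsed" values from /tmp/tpch-q1-1644431672.json above.
    let elapsed_ms = [62607.731700000004, 62464.438148, 62666.318697];
    let avg = average(&elapsed_ms).expect("at least one iteration");
    // prints: Query 1 avg time: 62579.50 ms
    println!("Query 1 avg time: {:.2} ms", avg);
}
```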

alamb (Contributor) commented Feb 9, 2022

Follow-on PR that I needed to test this more easily: #1800

andygrove deleted the benchmark-json-summary branch on January 27, 2023 18:44
Labels: datafusion (Changes in the datafusion crate)
Linked issue closed by this PR: Benchmarks should optionally write timings and environment details to a JSON file
Participants: 3