
Benchmark utility to perform diff of output from benchmark runs, allowing for precision differences #782

Merged: andygrove merged 27 commits into NVIDIA:branch-0.3 from andygrove:diff-results, Sep 21, 2020

Conversation

@andygrove (Contributor) commented Sep 16, 2020

This PR adds a convenience method to BenchUtils for comparing the results from two benchmark runs (e.g. verifying that GPU and CPU results match).

This is a simple implementation that pulls results down to the driver for comparison, so it will only work for data sets that fit in the driver's memory, but that is likely adequate for most of the benchmarks we are currently running. For example, TPCx-BB q5 at 10TB scale produces ~600MB of output.

If data needs sorting before comparison, this is delegated to Spark before collecting the results.

Example usage from spark-shell:

scala> val cpu = spark.read.parquet("/data/andygrove/q5-cpu")
scala> val gpu = spark.read.parquet("/data/andygrove/q5-gpu")
scala> import com.nvidia.spark.rapids.tests.common._
scala> BenchUtils.compareResults(cpu, gpu, ignoreOrdering=true, epsilon=0.0)
Collecting rows from DataFrame
Collected 989754 rows in 7.701 seconds                                               
Collecting rows from DataFrame
Collected 989754 rows in 2.325 seconds                                               
Results match
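
To illustrate the "allowing for precision differences" part of the title, here is a minimal, hypothetical sketch of an epsilon-tolerant value comparison. The name approxEqual is illustrative only, not the actual BenchUtils internals:

    // Hypothetical helper, for illustration only: floating-point values may
    // differ by up to `epsilon`; all other values must be exactly equal.
    def approxEqual(a: Any, b: Any, epsilon: Double): Boolean = (a, b) match {
      case (x: Double, y: Double) => math.abs(x - y) <= epsilon
      case (x: Float, y: Float)   => math.abs(x - y) <= epsilon
      case (x, y)                 => x == y
    }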

@andygrove added the "test" (Only impacts tests) label Sep 16, 2020
@andygrove added this to the Sep 14 - Sep 25 milestone Sep 16, 2020
@andygrove self-assigned this Sep 16, 2020
@andygrove requested a review from abellina September 16, 2020 18:37
@@ -0,0 +1,48 @@
package com.nvidia.spark.rapids.tests.common
Member:
The copyright and license header is missing. A build should have flagged this.

def compareResults(df1: DataFrame, df2: DataFrame, epsilon: Double = 0.00001): Unit = {
  val result1: Seq[Seq[Any]] = collectAndSort(df1)
  val result2: Seq[Seq[Any]] = collectAndSort(df2)
Member:
What if ordering of the results is a significant part of the query output? There are cases where we need to preserve order when comparing. For example, if I run a query that collects the top 100, a comparison that sees the same 100 elements in a different order would declare victory when it should not.

Minimally this needs to account for unordered vs. ordered compare.

@andygrove changed the title from "[WIP] Benchmark utility to perform diff of output from benchmark runs, allowing for precision differences" to "Benchmark utility to perform diff of output from benchmark runs, allowing for precision differences" Sep 16, 2020
}
}

private def collectResults(df: DataFrame, ignoreOrdering: Boolean): Seq[Seq[Any]] = {
Collaborator:
Considering this is a benchmarking utility, should we also provide an iterator in case the collected results are too big?

Contributor (author):
Spark's DataFrame.collect() loads the full result set into the driver and returns an Array. If (when) we need to handle comparisons of larger results I think we would need an alternate approach like converting the results to single files, downloading them, and using Scala code (without Spark) to perform the diff.

Collaborator:
It is a relatively minor change to support this with an iterator; we have that support in the Python unit tests. The only issue is that you cannot compare the sizes of the results ahead of time, because you don't know them yet. But I agree that until we hit a situation where we need it, there is no point in doing it. Also, it is very slow to do a comparison like that single-threaded. It might be better to truncate floating-point values and do a few anti-joins to see if there is anything in the left that is not in the right, and vice versa. That might not handle duplicate rows, so we might need something extra there, but it would scale a lot better.
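
A rough sketch of that anti-join idea (illustrative only, not part of this PR; the name distributedDiff and the rounding scale are assumptions). It rounds floating-point columns, then counts rows present on only one side in each direction:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{col, round}
    import org.apache.spark.sql.types.{DoubleType, FloatType}

    // Sketch of the scalable comparison idea discussed above.
    def distributedDiff(df1: DataFrame, df2: DataFrame, scale: Int = 5): (Long, Long) = {
      // Round floating-point columns so tiny precision differences
      // do not register as mismatches.
      def normalize(df: DataFrame): DataFrame =
        df.schema.fields.foldLeft(df) { (d, f) =>
          f.dataType match {
            case DoubleType | FloatType => d.withColumn(f.name, round(col(f.name), scale))
            case _ => d
          }
        }
      val (a, b) = (normalize(df1), normalize(df2))
      // Anti-joins in both directions: rows only in the left, rows only in the
      // right. Note: except() is set-based, so duplicate rows would need
      // extra handling (e.g. grouping with counts), as noted above.
      (a.except(b).count(), b.except(a).count())
    }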

Contributor (author):
I'd like to know more about the iterator approach. I did look at the toLocalIterator method, but it seems to load one partition at a time, so I don't think it would work when we need ordering across the whole result set. Could you point me to an example where we do this in the Python tests?

Collaborator:
It loads one partition of the result at a time, so it does preserve ordering. It is just rather slow, because we are doing the comparison single-threaded.
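
A minimal sketch of that pattern (illustrative names; epsilon handling omitted for brevity): sort on the cluster so the ordering is deterministic, then stream the rows through the driver one partition at a time instead of collecting everything.

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.col

    // Sketch: compare two DataFrames row by row without a full collect().
    // toLocalIterator pulls one partition at a time, preserving sort order.
    def iterativeCompare(df1: DataFrame, df2: DataFrame): Boolean = {
      val it1 = df1.sort(df1.columns.map(col): _*).toLocalIterator()
      val it2 = df2.sort(df2.columns.map(col): _*).toLocalIterator()
      while (it1.hasNext && it2.hasNext) {
        if (it1.next() != it2.next()) {
          return false
        }
      }
      // Both iterators must be exhausted, otherwise the row counts differ.
      !it1.hasNext && !it2.hasNext
    }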

Collaborator:
Check out mortgage_test.py.

Contributor (author):
I added the option to use an iterator and have been testing it out; it seems to work well.

val t1 = System.currentTimeMillis()
val rows = if (ignoreOrdering) {
  // let Spark do the sorting
  df.sort(df.columns.map(col): _*).collect()
Collaborator:
The downside to this is if sort is broken: is this forcing a CPU sort, for instance?

Collaborator:
The only known issue with GPU sort vs. CPU sort is #84, which I don't think we will ever run into outside of artificially generated data sets, which is why we have not pushed to fix it.

Collaborator:
The downside to this is if sort is broken: is this forcing a CPU sort, for instance?

The job config is outside of the tool. To verify the results, I think we should let standard CPU Spark do it. Nothing stops you from running this with the plugin, so it's up to the user at this point.
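
For example, assuming the RAPIDS plugin is installed in the session, the comparison's sort can be forced onto the CPU by disabling the plugin first (spark.rapids.sql.enabled is the plugin's enable/disable config):

    scala> // disable the RAPIDS plugin so the verification sort runs on CPU Spark
    scala> spark.conf.set("spark.rapids.sql.enabled", "false")
    scala> BenchUtils.compareResults(cpu, gpu, ignoreOrdering=true, epsilon=0.0001)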

Collaborator:
I guess we have other tests for correctness, so ignore my comment; and hopefully the GPU sort should be faster.

@tgravescs (Collaborator):

If this is useful by itself it might be nice to document it in the README; otherwise it can wait until more stuff is in place.

* precision.
*
* This is only suitable for data sets that can fit in the driver's memory.

Collaborator:

Could we also add a note that this utility may add a sort and performs a collect? For benchmark timings to be accurate in isolation, it is probably best to run the benchmark with its output going to a file and then compare that output after timing has stopped, or to use this to compare results for a single run after the timings for the other runs have been collected.
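
A sketch of that write-then-compare flow in spark-shell terms (paths are illustrative): the benchmark writes its output to Parquet while being timed, and the diff runs as a separate, untimed step afterwards:

    scala> // untimed verification step, after the timed runs have written their output
    scala> val cpu = spark.read.parquet("/data/benchmarks/q5-cpu")
    scala> val gpu = spark.read.parquet("/data/benchmarks/q5-gpu")
    scala> BenchUtils.compareResults(cpu, gpu, ignoreOrdering=true, epsilon=0.0001)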

@andygrove (Contributor, author):

If this is useful by itself it might be nice to document it in the README; otherwise it can wait until more stuff is in place.

I agree. I filed #794 for writing benchmark documentation.

@andygrove (Contributor, author):

build

@andygrove (Contributor, author):

build

@andygrove (Contributor, author):

build

@abellina merged commit fdf731c into NVIDIA:branch-0.3 Sep 21, 2020
@abellina (Collaborator):

I merged this too quickly, sorry. If anyone has pending concerns, please let me know; I can revert or help fix issues.

@sameerz added the "benchmark" (Benchmarking, benchmarking tools) label and removed the "test" (Only impacts tests) label Oct 4, 2020
sperlingxx pushed a commit to sperlingxx/spark-rapids that referenced this pull request Nov 20, 2020
Benchmark utility to perform diff of output from benchmark runs, allowing for precision differences (NVIDIA#782)

* Benchmark automation POC

Signed-off-by: Andy Grove <[email protected]>

* Default to collecting all results

Signed-off-by: Andy Grove <[email protected]>

* address feedback

* make results action configurable, allowing for results to be written to parquet/csv

* fix typo in javadoc

Signed-off-by: Andy Grove <[email protected]>

* remove unused imports

Signed-off-by: Andy Grove <[email protected]>

* Remove cold/hot run loops since they were causing confusion

Signed-off-by: Andy Grove <[email protected]>

* gc between runs

Signed-off-by: Andy Grove <[email protected]>

* Make gc optional

Signed-off-by: Andy Grove <[email protected]>

* update test

Signed-off-by: Andy Grove <[email protected]>

* Provide specific benchmark methods for collect vs write to CSV or Parquet

Signed-off-by: Andy Grove <[email protected]>

* provide convenience methods to run benchmarks and store action in json

Signed-off-by: Andy Grove <[email protected]>

* Add utility method to perform a diff of the data collected from two DataFrames

Signed-off-by: Andy Grove <[email protected]>

* Add missing license header and remove unused import

Signed-off-by: Andy Grove <[email protected]>

* Provide option to ignore ordering of results

Signed-off-by: Andy Grove <[email protected]>

* revert change to compare method

Signed-off-by: Andy Grove <[email protected]>

* remove rmm_log.txt

Signed-off-by: Andy Grove <[email protected]>

* optimize sorting

* let spark sort the data before collecting it

Signed-off-by: Andy Grove <[email protected]>

* fix message

Signed-off-by: Andy Grove <[email protected]>

* Improved documentation

Signed-off-by: Andy Grove <[email protected]>

* bug fix and optimization to non-iterator case

Signed-off-by: Andy Grove <[email protected]>

* fix typo in javadoc

Signed-off-by: Andy Grove <[email protected]>

* fail fast if row counts do not match

Signed-off-by: Andy Grove <[email protected]>

* remove redundant logic

Signed-off-by: Andy Grove <[email protected]>

* scalastyle

Signed-off-by: Andy Grove <[email protected]>
@andygrove deleted the diff-results branch December 17, 2020 15:27
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Benchmark utility to perform diff of output from benchmark runs, allowing for precision differences (NVIDIA#782)

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Benchmark utility to perform diff of output from benchmark runs, allowing for precision differences (NVIDIA#782)
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
[auto-merge] bot-auto-merge-branch-22.12 to branch-23.02 [skip ci] [bot]