Shrinker performance benchmark #177

jmid · 2021-09-13T22:44:34Z

This PR adds a shrinker benchmark.
It iterates over the (factored) shrinker tests in QCheck_tests.ml and QCheck2_tests.ml in pairs.
For each such pair of tests we measure

the time in seconds for QCheck and QCheck2, denoted Q1/s and Q2/s
the number of successful shrinks, denoted #succ
the number of shrink attempts, denoted #att
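To make the three measurements concrete, here is a minimal sketch of how they could be taken around a greedy shrinking loop. It assumes a shrinker that returns a list of smaller candidates and a predicate `fails` telling whether a candidate still falsifies the property. `measure_shrink`, `stats` and `shrink_int` are hypothetical names, not QCheck's API, and `Sys.time` measures CPU time, which may differ from the clock the benchmark actually uses.

```ocaml
(* Hypothetical sketch, not QCheck's API: time a greedy shrinking loop and
   count successful shrinks (#succ) and shrink attempts (#att). *)
type stats = { time : float; succ : int; att : int }

let measure_shrink ~(shrink : 'a -> 'a list) ~(fails : 'a -> bool) (x : 'a) =
  let succ = ref 0 and att = ref 0 in
  let start = Sys.time () in
  (* Try candidates in order; every candidate tested is one attempt,
     and the first candidate that still fails counts as a success. *)
  let rec first_failing = function
    | [] -> None
    | c :: rest ->
        incr att;
        if fails c then (incr succ; Some c) else first_failing rest
  in
  (* Keep shrinking until no candidate falsifies the property. *)
  let rec loop x =
    match first_failing (shrink x) with
    | Some smaller -> loop smaller
    | None -> x
  in
  let result = loop x in
  (result, { time = Sys.time () -. start; succ = !succ; att = !att })

(* Hypothetical integer shrinker: halve, or step down by one. *)
let shrink_int n = if n <= 0 then [] else [ n / 2; n - 1 ]

(* The property "n < 10" fails for every n >= 10. *)
let result, s = measure_shrink ~shrink:shrink_int ~fails:(fun n -> n >= 10) 100
```

Here the loop stops at 10 after 5 successful shrinks out of 9 attempts; a pair like that is what the #succ/#att columns report per test and seed.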

This is repeated across several seeds to make sure the observations hold across more than one random run.
Finally, we add up the timings to spot general trends.
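The seed iteration and the totals can be sketched as follows, assuming each test run draws all its randomness from an explicit `Random.State.t`. `bench_across_seeds`, `run_test` and `fake_test` are hypothetical; the seeds are the three used in the tables below. Deriving all randomness from the seeded state is what makes a run repeatable, which is why tests with stateful generators are skipped.

```ocaml
(* Hypothetical sketch: run one benchmarked test under each seed and sum
   the per-seed timings into a total, as in the rightmost table columns. *)
let seeds = [ 1234; 8743; 6789 ]

let bench_across_seeds (run_test : Random.State.t -> float) =
  let per_seed =
    (* A fresh state per seed keeps each iteration independent and repeatable. *)
    List.map (fun seed -> run_test (Random.State.make [| seed |])) seeds
  in
  (per_seed, List.fold_left ( +. ) 0.0 per_seed)

(* Stand-in "test" whose result depends only on the random state. *)
let fake_test st = float_of_int (Random.State.int st 1000) /. 1000.0
```

Running `bench_across_seeds fake_test` twice gives identical results, which is the property a stateful generator would break.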

Overall, the benchmark gives a reasonable idea of the state of affairs (see output below):

the QCheck list shrinker tries way too many candidates, often making it unbearably inefficient on large lists, e.g., spending 9.8 secs over 5 million shrink attempts of which only 1,696 are successful(!)
the QCheck string shrinker does not perform as well as the QCheck2 shrinker, e.g., spending 1.342 secs on 4465 shrink steps when QCheck2 gets away with only 24 in 0.002 secs
the QCheck2 list shrinker could also use a hand, e.g., spending 17.010 secs in total (roughly twice QCheck's 8.659 secs) on lists shorter than 4332
the function shrinkers in both QCheck and QCheck2 suffer from the poor list shrinking and from the shrinkers for the involved domain and range types, e.g., QCheck2 spends 15.278 secs on fold_left fold_right uncurried in one iteration, whereas QCheck spends 28.631 secs on fold_left test, fun first in another iteration, with only 125 successful shrinks out of 27750 attempts.
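To see why the number of candidates per step matters, here is a toy comparison of two hypothetical list-candidate generators; neither is QCheck's or QCheck2's actual code. One removes each element in turn, yielding n candidates for a list of length n, while the other drops prefixes of halving size, yielding about log2 n. A greedy shrinker pays for every rejected candidate, so the per-step count multiplies over a long shrink. `List.filteri` requires OCaml 4.11 or later.

```ocaml
(* Toy illustration, not QCheck's shrinkers. *)

(* One candidate per element: the list with element i removed. *)
let remove_each l =
  List.mapi (fun i _ -> List.filteri (fun j _ -> j <> i) l) l

(* Candidates with the first k elements dropped, for k = n/2, n/4, ..., 1. *)
let drop_prefixes l =
  let n = List.length l in
  let rec go k acc =
    if k = 0 then List.rev acc
    else go (k / 2) (List.filteri (fun j _ -> j >= k) l :: acc)
  in
  go (n / 2) []

let () =
  let l = List.init 432 Fun.id in
  Printf.printf "remove_each: %d candidates, drop_prefixes: %d candidates\n"
    (List.length (remove_each l)) (List.length (drop_prefixes l))
```

For a 432-element list this reports 432 versus 8 candidates for the first step alone.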

The benchmark also starts to stress the framework:

On one seed I hit the segfault reported in #175 (Segfault on fac test with OCaml 4.12.0)
I've also had to cherry-pick random seeds that don't make the benchmark take hours on my laptop.

I've therefore drafted (QCheck) performance improvements that will make such hacks unnecessary. I'll commit them separately.

                                                         iteration seed 1234                   iteration seed 8743                   iteration seed 6789               total
Shrink test name                                  Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att    Q1/s   Q2/s
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
big bound issue59                                - skipped as generator is stateful, making it non-repeatable
long_shrink                                       0.051  149/351     1.700 3039/3099    0.010  148/349     1.064 3068/3127    0.013  146/345     1.172 3063/3124    0.074  3.937
ints arent 0 mod 3                                0.000   84/216     0.000    2/2       0.000   71/204     0.000    1/1       0.000   88/215     0.000   88/305     0.000  0.000
ints are 0                                        0.000   62/63      0.000   61/123     0.000   61/62      0.000   61/122     0.000   62/63      0.000   61/123     0.000  0.000
ints < 209609                                    - skipped as generator is stateful, making it non-repeatable
nat < 5001                                        0.000    6/56      0.000    7/77      0.000    6/47      0.000    7/69      0.000    3/24      0.000    8/85      0.000  0.000
char is never produces 'abcdef'                   0.000    0/0       0.000    1/1       0.000    0/0       0.000    0/0       0.000    0/0       0.000    0/0       0.000  0.000
strings are empty                                 0.000  249/250     0.000    8/16      0.050 4466/4467    0.002   13/26      0.000    0/1       0.000    1/2       0.051  0.002
string never has a \000 char                      0.000   25/40      0.000   22/167     0.159 4466/4519    0.003   56/254     0.000   15/18      0.000   15/48      0.159  0.003
string never has a \255 char                      0.001  249/316     0.001   59/318     0.163 4466/4520    0.005   97/529     0.645 9260/9365    0.003   41/194     0.808  0.009
strings have unique chars                         0.003  248/269     0.000   18/30      1.342 4465/4536    0.002   24/52      0.000   14/34      0.000   15/20      1.345  0.002
pairs have different components                   0.000    0/4       0.000    0/6       0.000    0/4       0.000    0/6       0.000    0/6       0.000    0/10      0.000  0.000
pairs have same components                        0.000  125/126     0.000   63/125     0.000  124/125     0.000   62/123     0.000  119/120     0.000   63/125     0.000  0.000
pairs have a zero component                       0.000  124/188     0.000  122/306     0.000  123/186     0.000  122/306     0.000  118/182     0.000  123/308     0.000  0.000
pairs are (0,0)                                   0.000  125/126     0.000   63/125     0.000  124/125     0.000   62/123     0.000  119/120     0.000   63/125     0.000  0.000
pairs are ordered                                 0.000  827/17626   0.000   94/1217    0.000  690/12946   0.000   85/865     0.000  687/13963   0.000   94/1326    0.001  0.001
pairs are ordered reversely                       0.000  124/125     0.000   62/124     0.000  123/124     0.000   62/124     0.000  122/123     0.000   62/124     0.000  0.000
pairs sum to less than 128                        0.000  116/129     0.000   56/126     0.000  120/146     0.000   59/138     0.000  119/141     0.000   57/130     0.000  0.000
pairs lists rev concat                            0.032  140/332     0.016   83/168     0.017  137/335     0.005   75/152     0.000  130/318     0.000   67/136     0.049  0.021
pairs lists no overlap                            0.001   22/47      0.006   27/60      0.000   17/41      0.008   18/41      0.000    6/20      0.000   11/28      0.001  0.014
triples have pair-wise different components       0.000    7/31      0.000    3/15      0.000    6/6       0.000    3/3       0.000    2/6       0.000    3/3       0.000  0.000
triples have same components                      0.000  188/252     0.000   64/127     0.000  177/240     0.000   64/128     0.000  182/246     0.000   62/122     0.000  0.000
triples are ordered                               0.000  188/252     0.000    3/4       0.000  177/178     0.000    3/4       0.000  187/250     0.003   91/1021    0.000  0.003
triples are ordered reversely                     0.000  188/189     0.000   64/126     0.000  177/240     0.000  124/247     0.000  182/183     0.000   65/127     0.000  0.000
quadruples have pair-wise different components    0.000   23/41      0.000    4/4       0.000   11/11      0.000    4/4       0.000   14/38      0.000    4/11      0.000  0.000
quadruples have same components                   0.000  250/377     0.000  126/313     0.000  237/424     0.000  115/292     0.000  242/425     0.000  123/307     0.000  0.001
quadruples are ordered                            0.000  251/315     0.000    5/6       0.000  239/240     0.000    4/5       0.000  244/308     0.000    5/6       0.000  0.000
quadruples are ordered reversely                  0.000  251/252     0.000   66/128     0.000  239/302     0.000  126/250     0.000  244/245     0.000   66/128     0.000  0.000
bind ordered pairs                                0.000  125/125     0.000    1/1       0.000  124/124     0.000    1/1       0.000  120/120     0.000    1/1       0.000  0.000
bind list_size constant                           0.000   50/358     0.000   12/26      0.000   48/338     0.000   12/25      0.000   48/342     0.000   11/21      0.000  0.001
lists are empty                                   0.000   11/16      0.000    8/16      0.000   19/27      0.004   13/26      0.000    4/9       0.000    1/2       0.000  0.004
lists shorter than 10                             0.000   50/1198    0.000   16/30      0.001   71/1637    0.004   21/42      0.000   36/868     0.000   15/29      0.001  0.004
lists shorter than 432                            9.842 1696/5118102  1.213  412/457     9.516 1612/4863421  1.183  405/450     9.506 1667/5037661  0.188  419/447    28.863  2.585
lists shorter than 4332                           3.785   13/190735  6.262 4022/4087    2.496   11/126052  6.124 4020/4067    2.378    7/126607  4.623 4013/4055    8.659 17.010
lists equal to duplication                        0.215   20/23      0.582    4/7       0.000    7/13      0.000    3/6       0.020   20/25      0.140   17/35      0.235  0.722
lists have unique elems                           0.000    7/17      0.000   11/22      0.003   12/44      0.007   17/30      0.000    6/16      0.000   10/20      0.003  0.007
tree contains only 42                             0.000   10/10      0.000    2/2       0.000   10/10      0.000    2/2       0.000   12/13      0.000    2/2       0.000  0.000
fail_pred_map_commute                             0.002  127/628     0.000   16/59      0.000  112/567     0.000   14/65      0.000  114/665     0.002  117/373     0.002  0.002
fail_pred_strings                                 0.000    1/2       0.000    1/4       0.000    1/2       0.000    1/4       0.000    1/3       0.000    2/5       0.000  0.000
fold_left fold_right                              0.000   25/74      0.000   22/73      0.003   43/161     0.011   58/139     0.001   17/45      0.000   39/95      0.004  0.012
fold_left fold_right uncurried                    4.361  111/99424   0.054  325/984     0.141   36/304    15.278 4811/8969    0.000    5/19      0.000    2/13      4.502 15.332
fold_left fold_right uncurried fun last           0.000   26/89      0.000   25/86      0.004   42/136     0.003   54/176     0.000   16/52      0.000   40/93      0.004  0.004
fold_left test, fun first                         0.002   40/57      0.001   15/28     28.631  125/27750   5.639   45/11947  20.790  282/87674   1.635  168/27748  49.423  7.276
                                                                                                                                                                   94.186 46.951

jmid · 2022-04-19T20:57:42Z

I have rebased this benchmark PR on top of the merged test PRs.

We still have a few tests printing to stdout, so to avoid garbling the two outputs I changed the program to write the results to a file, for now simply hard-coded as shrink_bench.log.
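Writing the results to a file rather than stdout can be sketched like this. Only the file name shrink_bench.log comes from the PR; `write_rows`, the row tuple and the column widths are illustrative assumptions loosely modeled on the table layout below.

```ocaml
(* Illustrative sketch: write one fixed-width line per benchmark row to a
   file, so tests that print to stdout cannot garble the table. *)
let write_rows ?(file = "shrink_bench.log")
    (rows : (string * float * int * int) list) =
  (* open_out creates the file, or truncates it if it already exists. *)
  let oc = open_out file in
  Fun.protect
    ~finally:(fun () -> close_out oc)
    (fun () ->
      List.iter
        (fun (name, secs, succ, att) ->
          (* test name, seconds, then #succ/#att *)
          Printf.fprintf oc "%-48s %6.3f %4d/%-6d\n" name secs succ att)
        rows)
```

`Fun.protect` ensures the channel is closed even if formatting a row raises, so a partial log is still flushed to disk.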

For your viewing pleasure, here is the output from a fresh run. The situation is (unsurprisingly) the same as I described above last fall, but PRs #235 and #240 start to change that:

                                                         iteration seed 1234                   iteration seed 8743                   iteration seed 6789               total
Shrink test name                                  Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att   Q1/s  #succ/#att   Q2/s  #succ/#att    Q1/s   Q2/s
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
big bound issue59                                - skipped as generator is stateful, making it non-repeatable
long_shrink                                       0.022  149/351     0.885 3039/3099    0.007  148/349     0.567 3068/3127    0.006  146/345     0.620 3063/3124    0.035  2.072
ints arent 0 mod 3                                0.000   84/216     0.000    2/2       0.000   71/204     0.000    1/1       0.000   88/215     0.000   88/305     0.000  0.000
ints are 0                                        0.000   62/63      0.000   61/123     0.000   61/62      0.000   61/122     0.000   62/63      0.000   61/123     0.000  0.000
ints < 209609                                    - skipped as generator is stateful, making it non-repeatable
nat < 5001                                        0.000    6/56      0.000    7/77      0.000    6/47      0.000    7/69      0.000    3/24      0.000    8/85      0.000  0.000
char never produces 'abcdef'                      0.000    0/0       0.000    1/1       0.000    0/0       0.000    0/0       0.000    0/0       0.000    0/0       0.000  0.000
strings are empty                                 0.000  249/250     0.000    8/16      0.031 4466/4467    0.000   13/26      0.000    0/1       0.000    1/2       0.031  0.000
string never has a \000 char                      0.000   25/40      0.000   22/167     0.093 4466/4519    0.002   56/254     0.000   15/18      0.000   15/48      0.093  0.002
string never has a \255 char                      0.000  249/316     0.001   59/318     0.097 4466/4520    0.002   97/529     0.403 9260/9365    0.002   41/194     0.500  0.005
strings have unique chars                         0.003  248/269     0.000   18/30      0.912 4465/4536    0.001   24/52      0.000   14/34      0.000   15/20      0.915  0.002
pairs have different components                   0.000    0/4       0.000    0/6       0.000    0/4       0.000    0/6       0.000    0/6       0.000    0/10      0.000  0.000
pairs have same components                        0.000  125/126     0.000   63/125     0.000  124/125     0.000   62/123     0.000  119/120     0.000   63/125     0.000  0.000
pairs have a zero component                       0.000  124/188     0.000  122/306     0.000  123/186     0.000  122/306     0.000  118/182     0.000  123/308     0.000  0.000
pairs are (0,0)                                   0.000  125/126     0.000   63/125     0.000  124/125     0.000   62/123     0.000  119/120     0.000   63/125     0.000  0.000
pairs are ordered                                 0.000  827/17626   0.000   94/1217    0.001  690/12946   0.000   85/865     0.000  687/13963   0.000   94/1326    0.001  0.000
pairs are ordered reversely                       0.000  124/125     0.000   62/124     0.000  123/124     0.000   62/124     0.000  122/123     0.000   62/124     0.000  0.000
pairs sum to less than 128                        0.000  116/129     0.000   56/126     0.000  120/146     0.000   59/138     0.000  119/141     0.000   57/130     0.000  0.000
pairs lists rev concat                            0.014  140/332     0.008   83/168     0.008  137/335     0.002   75/152     0.000  130/318     0.000   67/136     0.021  0.011
pairs lists no overlap                            0.000   22/47      0.004   27/60      0.000   17/41      0.002   18/41      0.000    6/20      0.000   11/28      0.001  0.006
triples have pair-wise different components       0.000    7/31      0.000    3/15      0.000    6/6       0.000    3/3       0.000    2/6       0.000    3/3       0.000  0.000
triples have same components                      0.000  188/252     0.000   64/127     0.000  177/240     0.000   64/128     0.000  182/246     0.000   62/122     0.000  0.000
triples are ordered                               0.000  188/252     0.000    3/4       0.000  177/178     0.000    3/4       0.000  187/250     0.000   91/1021    0.000  0.000
triples are ordered reversely                     0.000  188/189     0.000   64/126     0.000  177/240     0.000  124/247     0.000  182/183     0.000   65/127     0.000  0.000
quadruples have pair-wise different components    0.000   23/41      0.000    4/4       0.000   11/11      0.000    4/4       0.000   14/38      0.000    4/11      0.000  0.000
quadruples have same components                   0.000  250/377     0.000  126/313     0.000  237/424     0.001  115/292     0.000  242/425     0.000  123/307     0.000  0.001
quadruples are ordered                            0.000  251/315     0.000    5/6       0.000  239/240     0.000    4/5       0.000  244/308     0.000    5/6       0.000  0.000
quadruples are ordered reversely                  0.000  251/252     0.000   66/128     0.000  239/302     0.000  126/250     0.000  244/245     0.000   66/128     0.000  0.000
forall (a, b) in nat: a < b                       0.000   13/23      0.000    6/16      0.000   10/15      0.000    6/15      0.000    5/6       0.000    4/7       0.000  0.000
forall (a, b, c) in nat: a < b < c                0.000   15/22      0.000    3/7       0.000   26/53      0.000    7/28      0.000    9/9       0.000    3/3       0.000  0.000
forall (a, b, c, d) in nat: a < b < c < d         0.000   23/29      0.000    4/4       0.000   30/56      0.000    4/4       0.000   13/13      0.000    4/4       0.000  0.000
forall (a, b, c, d, e) in nat: a < b < c < d < e  0.000   28/28      0.000    5/5       0.000   33/33      0.000    5/5       0.000   14/14      0.000    5/5       0.000  0.000
forall (a, b, c, d, e, f) in nat: a < b < c < d   0.000   30/30      0.000    6/6       0.000   38/38      0.000    6/6       0.000   16/16      0.000    6/6       0.000  0.000
forall (a, b, c, d, e, f, g) in nat: a < b < c <  0.000   31/31      0.000    7/7       0.000   41/41      0.000    7/7       0.000   22/22      0.000    7/7       0.000  0.000
forall (a, b, c, d, e, f, g, h) in nat: a < b <   0.000   35/35      0.000    8/8       0.000   48/48      0.000    8/8       0.000   22/22      0.000    7/7       0.000  0.000
forall (a, b, c, d, e, f, g, h, i) in nat: a < b  0.000   42/42      0.000    9/9       0.000   55/55      0.000    9/9       0.000   26/26      0.000    8/8       0.000  0.000
bind ordered pairs                                0.000  125/125     0.000    1/1       0.000  124/124     0.000    1/1       0.000  120/120     0.000    1/1       0.000  0.000
bind list_size constant                           0.000   50/358     0.000   12/26      0.000   48/338     0.001   12/25      0.000   48/342     0.000   11/21      0.000  0.001
lists are empty                                   0.000   11/16      0.000    8/16      0.002   19/27      0.003   13/26      0.000    4/9       0.000    1/2       0.002  0.004
lists shorter than 10                             0.000   50/1198    0.000   16/30      0.000   71/1637    0.002   21/42      0.000   36/868     0.000   15/29      0.001  0.002
lists shorter than 432                            6.851 1696/5118102  1.131  412/457     6.873 1612/4863421  1.058  405/450     6.909 1667/5037661  0.123  419/447    20.633  2.312
lists shorter than 4332                           2.713   13/190735  3.794 4022/4087    1.609   11/126052  4.977 4020/4067    1.790    7/126607  3.707 4013/4055    6.112 12.478
lists equal to duplication                        0.195   20/23      0.667    4/7       0.000    7/13      0.000    3/6       0.018   20/25      0.114   17/35      0.213  0.781
lists have unique elems                           0.000    7/17      0.000   11/22      0.003   12/44      0.006   17/30      0.000    6/16      0.000   10/20      0.003  0.007
tree contains only 42                             0.000   10/10      0.000    2/2       0.000   10/10      0.000    2/2       0.000   12/13      0.000    2/2       0.000  0.000
fail_pred_map_commute                             0.000  127/628     0.000   16/59      0.000  112/567     0.000   14/65      0.000  114/665     0.001  117/373     0.001  0.001
fail_pred_strings                                 0.000    1/2       0.000    1/4       0.000    1/2       0.000    1/4       0.000    1/3       0.000    2/5       0.000  0.000
fold_left fold_right                              0.000   25/74      0.000   22/73      0.002   43/161     0.007   58/139     0.000   17/45      0.000   39/95      0.002  0.007
fold_left fold_right uncurried                    3.261  111/99424   0.047  325/984     0.105   36/304     9.131 4811/8969    0.000    5/19      0.000    2/13      3.366  9.177
fold_left fold_right uncurried fun last           0.000   26/89      0.000   25/86      0.001   42/136     0.003   54/176     0.000   16/52      0.000   40/93      0.001  0.003
fold_left test, fun first                         0.001   40/57      0.001   15/28     20.486  125/27750   4.533   45/11947  14.653  282/87674   1.182  168/27748  35.140  5.716
                                                                                                                                                                   67.071 32.588

jmid mentioned this pull request Oct 11, 2021: Deriver: derive shrinkers #191 (Open)

jmid mentioned this pull request Dec 4, 2021: Add tup2 to tup9 for Gen #181 (Merged)

This was referenced Apr 3, 2022:
Shrinker improvements #235 (Closed)
Add regression tests for qualified type #201 (Merged)

jmid mentioned this pull request Apr 16, 2022: More expect tests and decoupling expect test source and runner #237 (Merged)

jmid added 2 commits April 19, 2022 15:40:
d31c963 a shrinker-benchmark across existing tests
230eb76 comment individual functions, print to out-channel

jmid force-pushed the shrinker-performance-benchmark branch from 1bf5c8a to 230eb76 April 19, 2022 20:42

jmid merged commit 90d45d8 into c-cube:master Apr 20, 2022

jmid deleted the shrinker-performance-benchmark branch April 20, 2022 06:59

This was referenced May 1, 2023:
Realign shrinker tests #274 (Merged)
Shrinker adjustments #277 (Merged)
Less exhaustive string shrinking #278 (Open)

jmid mentioned this pull request Jul 14, 2023: Documentation - role of Gen.t vs. arbitrary #281 (Closed)
