reduction performance improvements #3911

ajpotts · 2024-11-21T16:26:47Z

Investigate how to improve the performance of the reduction functions. The benchmarks have been showing a slight downward trend:
https://chapel-lang.org/perf/arkouda/16-node-xc/?startdate=2024/10/01&enddate=2024/11/21&graphs=all

ajpotts · 2024-11-22T18:14:11Z

Investigating the recent performance changes.

On October 30th, PR #3874 was merged, and at that point performance appeared to match historical expectations. Since then, the only change affecting the relevant functions (sum, prod, min, max) has been PR #3876. That PR did not change any of the Chapel code for the benchmarked functions. However, it did change the Python interface to use runtime-generated code.

ajpotts · 2024-11-22T21:44:07Z

However, I could not see any clear difference locally when exercising the Python interface. Running a small problem size (--size=10) with 1000 trials:
python3 -m pytest -c benchmark.ini benchmark_v2/reduce_benchmark.py --benchmark-autosave --benchmark-storage=file://benchmark_v2/.benchmarks --size=10 --trials=1000

On master:

------------------------------------------------------------------------------------- benchmark 'Arkouda_Reduce': 8 tests --------------------------------------------------------------------------------------
Name (time in us)                      Min                   Max                Mean              StdDev              Median                 IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_ak_reduce[float64-sum]      396.7580 (1.0)      3,828.2360 (1.35)     428.2630 (1.00)     146.4466 (1.18)     414.3840 (1.00)      14.6695 (1.20)        17;69        2.3350 (1.00)       1000           1
bench_ak_reduce[int64-max]        398.5590 (1.00)     3,958.4780 (1.40)     427.2626 (1.0)      137.9110 (1.12)     414.7790 (1.01)      14.8625 (1.22)        11;73        2.3405 (1.0)        1000           1
bench_ak_reduce[int64-min]        399.7530 (1.01)     3,282.7670 (1.16)     429.8727 (1.01)     138.9670 (1.12)     412.5240 (1.0)       15.9540 (1.31)        18;68        2.3263 (0.99)       1000           1
bench_ak_reduce[float64-min]      400.5870 (1.01)     2,832.4070 (1.0)      434.3084 (1.02)     123.6151 (1.0)      419.5485 (1.02)      12.1960 (1.0)         22;75        2.3025 (0.98)       1000           1
bench_ak_reduce[int64-prod]       402.3570 (1.01)     3,905.0940 (1.38)     487.3010 (1.14)     224.2630 (1.81)     440.7360 (1.07)      66.3515 (5.44)       41;113        2.0521 (0.88)       1000           1
bench_ak_reduce[float64-max]      402.6960 (1.01)     3,761.3900 (1.33)     435.0992 (1.02)     183.9696 (1.49)     419.2550 (1.02)      14.9410 (1.23)         9;52        2.2983 (0.98)       1000           1
bench_ak_reduce[float64-prod]     405.0780 (1.02)     3,594.1770 (1.27)     437.1224 (1.02)     166.7625 (1.35)     421.3645 (1.02)      15.2345 (1.25)         7;58        2.2877 (0.98)       1000           1
bench_ak_reduce[int64-sum]        407.6870 (1.03)     3,212.4630 (1.13)     561.2634 (1.31)     233.6560 (1.89)     471.0960 (1.14)     129.7065 (10.64)     123;129        1.7817 (0.76)       1000           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean

On pr/3874:

------------------------------------------------------------------------------------- benchmark 'Arkouda_Reduce': 8 tests -------------------------------------------------------------------------------------
Name (time in us)                      Min                   Max                Mean              StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_ak_reduce[float64-min]      280.7100 (1.0)      2,821.8120 (1.0)      302.8397 (1.0)      135.6191 (1.0)      291.5810 (1.0)       4.9830 (1.0)          5;80        3.3021 (1.0)        1000           1
bench_ak_reduce[int64-min]        281.1470 (1.00)     3,754.0010 (1.33)     319.8396 (1.06)     147.1919 (1.09)     297.0130 (1.02)     29.3110 (5.88)        24;58        3.1266 (0.95)       1000           1
bench_ak_reduce[float64-max]      306.3550 (1.09)     4,033.6660 (1.43)     334.8669 (1.11)     186.9720 (1.38)     319.9380 (1.10)      7.9600 (1.60)        4;109        2.9863 (0.90)       1000           1
bench_ak_reduce[int64-max]        306.6460 (1.09)     3,729.6100 (1.32)     333.3853 (1.10)     157.7002 (1.16)     319.5255 (1.10)      7.1795 (1.44)         7;96        2.9995 (0.91)       1000           1
bench_ak_reduce[float64-prod]     307.2600 (1.09)     4,097.3310 (1.45)     337.2750 (1.11)     171.2904 (1.26)     320.9500 (1.10)      9.2685 (1.86)       10;120        2.9649 (0.90)       1000           1
bench_ak_reduce[int64-prod]       307.6670 (1.10)     3,946.4620 (1.40)     396.9619 (1.31)     227.2457 (1.68)     357.0160 (1.22)     41.1030 (8.25)       43;122        2.5191 (0.76)       1000           1
bench_ak_reduce[float64-sum]      315.0510 (1.12)     3,957.4570 (1.40)     349.5556 (1.15)     177.5801 (1.31)     328.6755 (1.13)      9.1370 (1.83)       13;127        2.8608 (0.87)       1000           1
bench_ak_reduce[int64-sum]        322.5230 (1.15)     3,523.9010 (1.25)     386.3780 (1.28)     179.6222 (1.32)     347.7215 (1.19)     37.0760 (7.44)       61;120        2.5881 (0.78)       1000           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean

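The difference is not statistically significant given the large standard deviation: the means differ by roughly 80 to 175 microseconds per call, while the standard deviations in both runs are roughly 125 to 235 microseconds. In any case, the difference is far less than a tenth of a second, so I don't think the Python changes are the cause of the performance regression either.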
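To make the size of the gap concrete, here is a minimal sketch that subtracts the pr/3874 means from the master means for each benchmark. The numbers are copied from the Mean column of the two tables above; the variable and function names are only illustrative and are not part of Arkouda or pytest-benchmark.

```python
# Mean times in microseconds, copied from the "Mean" column of the tables above.
master_mean_us = {
    "float64-sum": 428.2630,
    "int64-max": 427.2626,
    "int64-min": 429.8727,
    "float64-min": 434.3084,
    "int64-prod": 487.3010,
    "float64-max": 435.0992,
    "float64-prod": 437.1224,
    "int64-sum": 561.2634,
}
pr3874_mean_us = {
    "float64-min": 302.8397,
    "int64-min": 319.8396,
    "float64-max": 334.8669,
    "int64-max": 333.3853,
    "float64-prod": 337.2750,
    "int64-prod": 396.9619,
    "float64-sum": 349.5556,
    "int64-sum": 386.3780,
}


def mean_differences(after, before):
    """Return after - before (in microseconds) for benchmarks present in both runs."""
    return {name: after[name] - before[name] for name in sorted(after) if name in before}


diffs_us = mean_differences(master_mean_us, pr3874_mean_us)

for name, diff in diffs_us.items():
    # Print the gap in microseconds and, for scale, in seconds.
    print(f"{name:<14} {diff:8.1f} us  ({diff / 1e6:.6f} s)")
```

Every per-call gap comes out under 0.2 milliseconds, which is the scale being compared against the benchmark regression here.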

Co-authored-by: Amanda Potts

ajpotts added a commit to ajpotts/arkouda that referenced this issue Nov 21, 2024

Closes Bears-R-Us#3911 reduction performance improvements

88b5f03

ajpotts added a commit to ajpotts/arkouda that referenced this issue Nov 22, 2024

Part of Bears-R-Us#3911 reduction performance improvements

59bafe2

ajpotts mentioned this issue Nov 22, 2024

Part of #3911 reduction performance improvements #3914

Merged

ajpotts self-assigned this Dec 6, 2024

github-merge-queue bot pushed a commit that referenced this issue Dec 17, 2024

Part of #3911 reduction performance improvements (#3914)

a77be87

Co-authored-by: Amanda Potts
