reduction performance improvements #3911

Open
ajpotts opened this issue Nov 21, 2024 · 2 comments

ajpotts commented Nov 21, 2024

Investigate how to improve the performance of the reduction functions. The benchmarks have been showing a slight downward trend:
https://chapel-lang.org/perf/arkouda/16-node-xc/?startdate=2024/10/01&enddate=2024/11/21&graphs=all

ajpotts added a commit to ajpotts/arkouda that referenced this issue Nov 21, 2024

ajpotts commented Nov 22, 2024

Investigating the recent performance changes.

On October 30th, PR #3874 was merged, and at that point performance matched historical expectations. Since then, the only change affecting the relevant functions (sum, prod, min, max) has been PR #3876. PR #3876 did not change any of the Chapel code for the benchmarked functions. However, it did change the Python interface to use code generated at runtime.

ajpotts commented Nov 22, 2024

However, I could not see any difference locally in the Python interface. Running with a small size and 1000 trials:
python3 -m pytest -c benchmark.ini benchmark_v2/reduce_benchmark.py --benchmark-autosave --benchmark-storage=file://benchmark_v2/.benchmarks --size=10 --trials=1000

On master:

------------------------------------------------------------------------------------- benchmark 'Arkouda_Reduce': 8 tests --------------------------------------------------------------------------------------
Name (time in us)                      Min                   Max                Mean              StdDev              Median                 IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_ak_reduce[float64-sum]      396.7580 (1.0)      3,828.2360 (1.35)     428.2630 (1.00)     146.4466 (1.18)     414.3840 (1.00)      14.6695 (1.20)        17;69        2.3350 (1.00)       1000           1
bench_ak_reduce[int64-max]        398.5590 (1.00)     3,958.4780 (1.40)     427.2626 (1.0)      137.9110 (1.12)     414.7790 (1.01)      14.8625 (1.22)        11;73        2.3405 (1.0)        1000           1
bench_ak_reduce[int64-min]        399.7530 (1.01)     3,282.7670 (1.16)     429.8727 (1.01)     138.9670 (1.12)     412.5240 (1.0)       15.9540 (1.31)        18;68        2.3263 (0.99)       1000           1
bench_ak_reduce[float64-min]      400.5870 (1.01)     2,832.4070 (1.0)      434.3084 (1.02)     123.6151 (1.0)      419.5485 (1.02)      12.1960 (1.0)         22;75        2.3025 (0.98)       1000           1
bench_ak_reduce[int64-prod]       402.3570 (1.01)     3,905.0940 (1.38)     487.3010 (1.14)     224.2630 (1.81)     440.7360 (1.07)      66.3515 (5.44)       41;113        2.0521 (0.88)       1000           1
bench_ak_reduce[float64-max]      402.6960 (1.01)     3,761.3900 (1.33)     435.0992 (1.02)     183.9696 (1.49)     419.2550 (1.02)      14.9410 (1.23)         9;52        2.2983 (0.98)       1000           1
bench_ak_reduce[float64-prod]     405.0780 (1.02)     3,594.1770 (1.27)     437.1224 (1.02)     166.7625 (1.35)     421.3645 (1.02)      15.2345 (1.25)         7;58        2.2877 (0.98)       1000           1
bench_ak_reduce[int64-sum]        407.6870 (1.03)     3,212.4630 (1.13)     561.2634 (1.31)     233.6560 (1.89)     471.0960 (1.14)     129.7065 (10.64)     123;129        1.7817 (0.76)       1000           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
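As a quick check on how to read the OPS column, the legend's formula (1 / Mean) can be applied to a mean taken from the table. This is only an illustrative sketch: the helper name `ops_kops_per_s` is made up here and is not part of pytest-benchmark.

```python
def ops_kops_per_s(mean_us: float) -> float:
    """Convert a mean time in microseconds to thousands of operations per second."""
    mean_s = mean_us / 1e6      # microseconds -> seconds
    ops_per_s = 1.0 / mean_s    # OPS as defined in the legend: 1 / Mean
    return ops_per_s / 1e3      # ops/s -> Kops/s, the unit used in the table


# Mean for bench_ak_reduce[float64-sum] on master, copied from the table above.
print(round(ops_kops_per_s(428.2630), 4))
```

The result can be compared against the OPS (Kops/s) entry on the same row.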

On pr/3874:

------------------------------------------------------------------------------------- benchmark 'Arkouda_Reduce': 8 tests -------------------------------------------------------------------------------------
Name (time in us)                      Min                   Max                Mean              StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_ak_reduce[float64-min]      280.7100 (1.0)      2,821.8120 (1.0)      302.8397 (1.0)      135.6191 (1.0)      291.5810 (1.0)       4.9830 (1.0)          5;80        3.3021 (1.0)        1000           1
bench_ak_reduce[int64-min]        281.1470 (1.00)     3,754.0010 (1.33)     319.8396 (1.06)     147.1919 (1.09)     297.0130 (1.02)     29.3110 (5.88)        24;58        3.1266 (0.95)       1000           1
bench_ak_reduce[float64-max]      306.3550 (1.09)     4,033.6660 (1.43)     334.8669 (1.11)     186.9720 (1.38)     319.9380 (1.10)      7.9600 (1.60)        4;109        2.9863 (0.90)       1000           1
bench_ak_reduce[int64-max]        306.6460 (1.09)     3,729.6100 (1.32)     333.3853 (1.10)     157.7002 (1.16)     319.5255 (1.10)      7.1795 (1.44)         7;96        2.9995 (0.91)       1000           1
bench_ak_reduce[float64-prod]     307.2600 (1.09)     4,097.3310 (1.45)     337.2750 (1.11)     171.2904 (1.26)     320.9500 (1.10)      9.2685 (1.86)       10;120        2.9649 (0.90)       1000           1
bench_ak_reduce[int64-prod]       307.6670 (1.10)     3,946.4620 (1.40)     396.9619 (1.31)     227.2457 (1.68)     357.0160 (1.22)     41.1030 (8.25)       43;122        2.5191 (0.76)       1000           1
bench_ak_reduce[float64-sum]      315.0510 (1.12)     3,957.4570 (1.40)     349.5556 (1.15)     177.5801 (1.31)     328.6755 (1.13)      9.1370 (1.83)       13;127        2.8608 (0.87)       1000           1
bench_ak_reduce[int64-sum]        322.5230 (1.15)     3,523.9010 (1.25)     386.3780 (1.28)     179.6222 (1.32)     347.7215 (1.19)     37.0760 (7.44)       61;120        2.5881 (0.78)       1000           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean

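I don't consider the difference statistically significant given the large standard deviation: the means differ by roughly 80 to 175 µs per call, while the standard deviations are roughly 125 to 235 µs. In any case, the difference is far less than a tenth of a second, so I don't think the Python changes are the cause of the performance regression either.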
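For reference, the per-benchmark gap between the two runs can be worked out directly from the Mean columns. The sketch below only does that subtraction; the dictionary and function names are illustrative, and the numbers are copied from the two tables above.

```python
# Mean times in microseconds, copied from the "On master" table.
master_mean_us = {
    "float64-sum": 428.2630,
    "int64-max": 427.2626,
    "int64-min": 429.8727,
    "float64-min": 434.3084,
    "int64-prod": 487.3010,
    "float64-max": 435.0992,
    "float64-prod": 437.1224,
    "int64-sum": 561.2634,
}

# Mean times in microseconds, copied from the pr/3874 table.
pr3874_mean_us = {
    "float64-min": 302.8397,
    "int64-min": 319.8396,
    "float64-max": 334.8669,
    "int64-max": 333.3853,
    "float64-prod": 337.2750,
    "int64-prod": 396.9619,
    "float64-sum": 349.5556,
    "int64-sum": 386.3780,
}


def mean_differences(after: dict, before: dict) -> dict:
    """Return after - before for every benchmark name present in `after`."""
    return {name: after[name] - before[name] for name in after}


diffs_us = mean_differences(master_mean_us, pr3874_mean_us)
for name, diff in sorted(diffs_us.items()):
    # Show the gap in microseconds and in seconds for scale.
    print(f"{name:>13}: {diff:7.1f} us ({diff / 1e6:.6f} s)")
```

Every gap comes out well under a millisecond per call, which is the scale the comment above refers to.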

ajpotts added a commit to ajpotts/arkouda that referenced this issue Nov 22, 2024
ajpotts self-assigned this Dec 6, 2024
github-merge-queue bot pushed a commit that referenced this issue Dec 17, 2024