Enhancement on Performance Tests #6

Open · briancylui opened this issue on Aug 14, 2018 · 1 comment

briancylui (Owner) commented Aug 14, 2018

Background

This issue was opened in response to the following inquiry from Intel (@helloguo):

I have one question about the perf tests in SsePerformanceTests.cs. I saw that ‘EXP_RANGE’ is initialized as private const int EXP_RANGE = EXP_MAX / 2;. This usually works very well, since it tests a large range of floating-point values. However, in some cases such as SumSqU, it causes a lot of FP_ASSIST, a hardware event that causes performance problems (see the section ‘Floating-Point Performance Ratios’ in the Intel optimization manual for more details). Basically, FP_ASSIST is caused by denormals, underflow, and NaNs.

I would suggest initializing ‘EXP_RANGE’ as private const int EXP_RANGE = EXP_MAX / 4; or private const int EXP_RANGE = EXP_MAX / 8;. On my machine, changing ‘EXP_RANGE’ to private const int EXP_RANGE = EXP_MAX / 4; reduced the execution time from 469 us to 227 us. Would you mind testing it on your machine and changing the perf tests if it makes sense to you?
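To see why a wide exponent range hurts SumSqU in particular, the Python sketch below simulates the test inputs and counts how often squaring one of them underflows in float32. It assumes EXP_MAX = 127 and a generator that scales a mantissa in [-1, 1) by 2^e, with e drawn uniformly from [-EXP_RANGE + 1, EXP_RANGE]. The generator and all helper names are hypothetical stand-ins, not the actual code in SsePerformanceTests.cs. With EXP_RANGE = EXP_MAX / 2 = 63, inputs can be as small as about 2^-62, so their squares can drop below the smallest normal float32 (2^-126). With EXP_MAX / 4 = 31, the scale factor of a square never goes below 2^-60, so only a vanishingly small mantissa could push it into the denormal range.

```python
import random
import struct

EXP_MAX = 127                 # assumed value: largest unbiased exponent of a float32
FLT_MIN_NORMAL = 2.0 ** -126  # smallest positive normal float32


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def next_float(rng: random.Random, exp_range: int) -> float:
    """Hypothetical test-data generator: a mantissa in [-1, 1) scaled by 2**e,
    with e drawn uniformly from [-exp_range + 1, exp_range]."""
    mantissa = rng.random() * 2.0 - 1.0
    exponent = rng.randint(-exp_range + 1, exp_range)
    return to_float32(mantissa * 2.0 ** exponent)


def tiny_square_fraction(exp_range: int, n: int = 100_000, seed: int = 1) -> float:
    """Fraction of nonzero inputs whose float32 square falls below the normal
    range, i.e. the multiply underflows to a denormal or to zero."""
    rng = random.Random(seed)
    tiny = 0
    for _ in range(n):
        x = next_float(rng, exp_range)
        square = to_float32(x * x)  # x * x is exact in double, then rounded to float32
        if x != 0.0 and abs(square) < FLT_MIN_NORMAL:
            tiny += 1
    return tiny / n


for divisor in (2, 4, 8):
    frac = tiny_square_fraction(EXP_MAX // divisor)
    print(f"EXP_MAX / {divisor}: {frac:.4%} of squares underflow")
```

Under these assumptions, roughly 1 in 126 squares (about 0.8%) underflows for EXP_MAX / 2, because the lowest few exponent buckets together contribute about one bucket's worth of tiny products; EXP_MAX / 4 and EXP_MAX / 8 produce essentially none. Each underflowing product is a candidate for an FP assist inside a loop like SumSqU.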

Results (original; after Change 1, EXP_MAX / 4; after Change 2, EXP_MAX / 8)

Note: These results were obtained in ShortRun mode, so each mean is an average of only 3 measurements and can carry a sizeable error.
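BenchmarkDotNet reports the Error column as half of the 99.9% confidence interval of the mean. The Python sketch below shows why that interval is so wide with only 3 measurements: with 2 degrees of freedom, the Student's t critical value is about 31.6, roughly 9.6 times the large-sample normal value of about 3.29. This is a textbook calculation under the assumption of a t-based interval; the helper names are hypothetical, and the sketch does not claim to reproduce BenchmarkDotNet's internal computation.

```python
import math
from statistics import NormalDist, stdev
from typing import Sequence


def t_quantile_df2(p: float) -> float:
    """Exact quantile of Student's t distribution with 2 degrees of freedom.

    For df = 2 the CDF is F(t) = 1/2 + t / (2 * sqrt(2 + t**2)),
    which inverts in closed form.
    """
    a = 2.0 * p - 1.0
    return a * math.sqrt(2.0 / (1.0 - a * a))


def half_width_999(samples: Sequence[float]) -> float:
    """Textbook half-width of a two-sided 99.9% confidence interval for the
    mean, valid here only for exactly 3 samples (df = 2)."""
    if len(samples) != 3:
        raise ValueError("this sketch only handles n = 3 (df = 2)")
    return t_quantile_df2(0.9995) * stdev(samples) / math.sqrt(len(samples))


t2 = t_quantile_df2(0.9995)       # critical value with only 3 measurements
z = NormalDist().inv_cdf(0.9995)  # large-sample (normal) limit
print(f"t(df=2) = {t2:.2f}, z = {z:.2f}, ratio = {t2 / z:.1f}")
print(f"half-width = {t2 / math.sqrt(3):.2f} x StdDev for n = 3")
```

This gives a half-width of about 18.2 × StdDev. The tables below consistently show Error ≈ 17.7 × StdDev (for example, 363.18 / 20.520 ≈ 17.70), which is close to the textbook figure; the small gap presumably comes from how BenchmarkDotNet computes its quantile.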

1. Original performance test results (EXP_RANGE = EXP_MAX / 2):
BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-20180720-2
  [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT

Toolchain=InProcessToolchain  LaunchCount=1  TargetCount=3  WarmupCount=3

| Method | Mean | Error | StdDev |
|---|---:|---:|---:|
| NativeAddScalarUPerf | 259.8 us | 363.18 us | 20.520 us |
| ManagedAddScalarUPerf | 309.1 us | 434.44 us | 24.546 us |
| NativeScaleUPerf | 312.4 us | 1,012.74 us | 57.222 us |
| ManagedScaleUPerf | 259.0 us | 788.02 us | 44.525 us |
| NativeScaleSrcUPerf | 417.2 us | 969.85 us | 54.798 us |
| ManagedScaleSrcUPerf | 387.1 us | 1,137.38 us | 64.264 us |
| NativeScaleAddUPerf | 333.3 us | 670.78 us | 37.900 us |
| ManagedScaleAddUPerf | 306.6 us | 859.01 us | 48.536 us |
| NativeAddScaleUPerf | 413.0 us | 141.54 us | 7.997 us |
| ManagedAddScaleUPerf | 552.9 us | 2,980.25 us | 168.389 us |
| NativeAddScaleSUPerf | 4,734.3 us | 527.31 us | 29.794 us |
| ManagedAddScaleSUPerf | 4,942.8 us | 3,993.27 us | 225.627 us |
| NativeAddScaleCopyUPerf | 662.1 us | 1,882.68 us | 106.375 us |
| ManagedAddScaleCopyUPerf | 557.3 us | 424.36 us | 23.977 us |
| NativeAddUPerf | 372.0 us | 253.56 us | 14.326 us |
| ManagedAddUPerf | 351.3 us | 178.05 us | 10.060 us |
| NativeAddSUPerf | 4,865.2 us | 1,176.29 us | 66.463 us |
| ManagedAddSUPerf | 4,959.9 us | 4,223.49 us | 238.635 us |
| NativeMulElementWiseUPerf | 569.9 us | 370.85 us | 20.954 us |
| ManagedMulElementWiseUPerf | 544.0 us | 431.84 us | 24.400 us |
| NativeSumUPerf | 305.4 us | 75.23 us | 4.251 us |
| ManagedSumUPerf | 302.6 us | 90.87 us | 5.134 us |
| NativeSumSqUPerf | 605.7 us | 220.48 us | 12.457 us |
| ManagedSumSqUPerf | 627.1 us | 31.06 us | 1.755 us |
| NativeSumSqDiffUPerf | 308.9 us | 105.91 us | 5.984 us |
| ManagedSumSqDiffUPerf | 310.8 us | 121.70 us | 6.876 us |
| NativeSumAbsUPerf | 307.0 us | 156.36 us | 8.835 us |
| ManagedSumAbsUPerf | 306.9 us | 150.29 us | 8.492 us |
| NativeSumAbsDiffUPerf | 310.7 us | 71.38 us | 4.033 us |
| ManagedSumAbsDiffUPerf | 314.4 us | 105.13 us | 5.940 us |
| NativeMaxAbsUPerf | 305.2 us | 96.67 us | 5.462 us |
| ManagedMaxAbsUPerf | 308.2 us | 106.53 us | 6.019 us |
| NativeMaxAbsDiffUPerf | 308.5 us | 149.43 us | 8.443 us |
| ManagedMaxAbsDiffUPerf | 315.4 us | 39.40 us | 2.226 us |
| NativeDotUPerf | 370.4 us | 189.90 us | 10.730 us |
| ManagedDotUPerf | 499.3 us | 1,135.48 us | 64.157 us |
| NativeDotSUPerf | 4,048.1 us | 4,474.43 us | 252.814 us |
| ManagedDotSUPerf | 5,175.8 us | 18,528.29 us | 1,046.883 us |
| NativeDist2Perf | 439.8 us | 1,530.90 us | 86.499 us |
| ManagedDist2Perf | 418.8 us | 613.55 us | 34.667 us |
| NativeSdcaL1UpdateUPerf | 797.2 us | 1,032.66 us | 58.347 us |
| ManagedSdcaL1UpdateUPerf | 836.8 us | 2,398.49 us | 135.519 us |
| NativeSdcaL1UpdateSUPerf | 14,388.6 us | 5,955.75 us | 336.511 us |
| ManagedSdcaL1UpdateSUPerf | 15,092.4 us | 8,062.04 us | 455.520 us |
2. Performance test results after changing private const int EXP_RANGE = EXP_MAX / 2 to EXP_MAX / 4 in test\Microsoft.ML.CpuMath.PerformanceTests\SsePerformanceTests.cs:
BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-20180720-2
  [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT

Toolchain=InProcessToolchain  LaunchCount=1  TargetCount=3  WarmupCount=3

| Method | Mean | Error | StdDev |
|---|---:|---:|---:|
| NativeAddScalarUPerf | 241.5 us | 528.68 us | 29.872 us |
| ManagedAddScalarUPerf | 254.9 us | 436.65 us | 24.672 us |
| NativeScaleUPerf | 225.6 us | 551.65 us | 31.169 us |
| ManagedScaleUPerf | 232.9 us | 199.31 us | 11.261 us |
| NativeScaleSrcUPerf | 309.6 us | 119.61 us | 6.758 us |
| ManagedScaleSrcUPerf | 443.4 us | 2,016.66 us | 113.945 us |
| NativeScaleAddUPerf | 267.6 us | 1,021.77 us | 57.732 us |
| ManagedScaleAddUPerf | 220.9 us | 174.62 us | 9.866 us |
| NativeAddScaleUPerf | 367.9 us | 270.39 us | 15.277 us |
| ManagedAddScaleUPerf | 362.1 us | 423.48 us | 23.927 us |
| NativeAddScaleSUPerf | 4,687.2 us | 2,677.88 us | 151.305 us |
| ManagedAddScaleSUPerf | 5,702.7 us | 9,787.96 us | 553.038 us |
| NativeAddScaleCopyUPerf | 571.9 us | 731.91 us | 41.354 us |
| ManagedAddScaleCopyUPerf | 520.9 us | 310.43 us | 17.540 us |
| NativeAddUPerf | 371.3 us | 181.94 us | 10.280 us |
| ManagedAddUPerf | 390.0 us | 265.46 us | 14.999 us |
| NativeAddSUPerf | 4,880.2 us | 4,622.75 us | 261.194 us |
| ManagedAddSUPerf | 5,215.2 us | 2,576.32 us | 145.567 us |
| NativeMulElementWiseUPerf | 573.8 us | 211.73 us | 11.963 us |
| ManagedMulElementWiseUPerf | 607.4 us | 692.36 us | 39.119 us |
| NativeSumUPerf | 317.1 us | 221.46 us | 12.513 us |
| ManagedSumUPerf | 311.5 us | 63.41 us | 3.583 us |
| NativeSumSqUPerf | 314.4 us | 20.76 us | 1.173 us |
| ManagedSumSqUPerf | 311.5 us | 43.73 us | 2.471 us |
| NativeSumSqDiffUPerf | 358.2 us | 675.55 us | 38.170 us |
| ManagedSumSqDiffUPerf | 366.4 us | 151.24 us | 8.545 us |
| NativeSumAbsUPerf | 320.3 us | 57.05 us | 3.224 us |
| ManagedSumAbsUPerf | 370.5 us | 810.48 us | 45.793 us |
| NativeSumAbsDiffUPerf | 333.6 us | 237.97 us | 13.446 us |
| ManagedSumAbsDiffUPerf | 330.6 us | 212.14 us | 11.986 us |
| NativeMaxAbsUPerf | 314.5 us | 159.91 us | 9.035 us |
| ManagedMaxAbsUPerf | 322.0 us | 267.66 us | 15.123 us |
| NativeMaxAbsDiffUPerf | 328.3 us | 128.60 us | 7.266 us |
| ManagedMaxAbsDiffUPerf | 372.7 us | 1,107.30 us | 62.564 us |
| NativeDotUPerf | 374.5 us | 273.08 us | 15.430 us |
| ManagedDotUPerf | 383.4 us | 146.85 us | 8.297 us |
| NativeDotSUPerf | 3,792.4 us | 839.94 us | 47.458 us |
| ManagedDotSUPerf | 4,351.2 us | 3,492.81 us | 197.351 us |
| NativeDist2Perf | 377.7 us | 247.30 us | 13.973 us |
| ManagedDist2Perf | 447.2 us | 1,652.67 us | 93.379 us |
| NativeSdcaL1UpdateUPerf | 723.1 us | 833.90 us | 47.117 us |
| ManagedSdcaL1UpdateUPerf | 661.7 us | 412.54 us | 23.309 us |
| NativeSdcaL1UpdateSUPerf | 17,905.3 us | 22,845.78 us | 1,290.829 us |
| ManagedSdcaL1UpdateSUPerf | 15,535.5 us | 4,587.61 us | 259.209 us |
3. Performance test results after changing private const int EXP_RANGE = EXP_MAX / 2 to EXP_MAX / 8 in test\Microsoft.ML.CpuMath.PerformanceTests\SsePerformanceTests.cs:
BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-20180720-2
  [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT

Toolchain=InProcessToolchain  LaunchCount=1  TargetCount=3  WarmupCount=3

| Method | Mean | Error | StdDev |
|---|---:|---:|---:|
| NativeAddScalarUPerf | 229.2 us | 131.203 us | 7.4132 us |
| ManagedAddScalarUPerf | 200.3 us | 105.466 us | 5.9590 us |
| NativeScaleUPerf | 230.7 us | 96.228 us | 5.4371 us |
| ManagedScaleUPerf | 224.0 us | 45.792 us | 2.5873 us |
| NativeScaleSrcUPerf | 296.4 us | 167.351 us | 9.4557 us |
| ManagedScaleSrcUPerf | 292.9 us | 160.873 us | 9.0896 us |
| NativeScaleAddUPerf | 245.2 us | 29.924 us | 1.6908 us |
| ManagedScaleAddUPerf | 210.9 us | 105.411 us | 5.9559 us |
| NativeAddScaleUPerf | 360.0 us | 249.734 us | 14.1104 us |
| ManagedAddScaleUPerf | 345.0 us | 88.350 us | 4.9919 us |
| NativeAddScaleSUPerf | 5,138.2 us | 9,108.634 us | 514.6547 us |
| ManagedAddScaleSUPerf | 7,292.9 us | 8,231.585 us | 465.0998 us |
| NativeAddScaleCopyUPerf | 607.0 us | 176.048 us | 9.9470 us |
| ManagedAddScaleCopyUPerf | 579.1 us | 229.272 us | 12.9543 us |
| NativeAddUPerf | 356.1 us | 150.430 us | 8.4996 us |
| ManagedAddUPerf | 365.8 us | 364.408 us | 20.5898 us |
| NativeAddSUPerf | 4,552.6 us | 2,174.675 us | 122.8732 us |
| ManagedAddSUPerf | 4,857.6 us | 3,113.260 us | 175.9050 us |
| NativeMulElementWiseUPerf | 567.1 us | 60.800 us | 3.4353 us |
| ManagedMulElementWiseUPerf | 524.6 us | 260.834 us | 14.7376 us |
| NativeSumUPerf | 300.6 us | 87.834 us | 4.9628 us |
| ManagedSumUPerf | 300.1 us | 100.474 us | 5.6770 us |
| NativeSumSqUPerf | 301.2 us | 134.832 us | 7.6182 us |
| ManagedSumSqUPerf | 301.0 us | 147.377 us | 8.3271 us |
| NativeSumSqDiffUPerf | 304.2 us | 145.773 us | 8.2364 us |
| ManagedSumSqDiffUPerf | 304.8 us | 176.255 us | 9.9587 us |
| NativeSumAbsUPerf | 301.8 us | 138.377 us | 7.8186 us |
| ManagedSumAbsUPerf | 311.9 us | 140.698 us | 7.9497 us |
| NativeSumAbsDiffUPerf | 309.2 us | 8.792 us | 0.4968 us |
| ManagedSumAbsDiffUPerf | 359.3 us | 647.514 us | 36.5857 us |
| NativeMaxAbsUPerf | 319.1 us | 249.761 us | 14.1120 us |
| ManagedMaxAbsUPerf | 356.6 us | 206.461 us | 11.6655 us |
| NativeMaxAbsDiffUPerf | 383.1 us | 465.175 us | 26.2833 us |
| ManagedMaxAbsDiffUPerf | 364.9 us | 107.476 us | 6.0726 us |
| NativeDotUPerf | 432.3 us | 75.997 us | 4.2940 us |
| ManagedDotUPerf | 429.2 us | 70.930 us | 4.0077 us |
| NativeDotSUPerf | 4,622.5 us | 1,927.120 us | 108.8859 us |
| ManagedDotSUPerf | 4,801.0 us | 1,360.842 us | 76.8901 us |
| NativeDist2Perf | 384.5 us | 95.121 us | 5.3745 us |
| ManagedDist2Perf | 377.8 us | 55.466 us | 3.1340 us |
| NativeSdcaL1UpdateUPerf | 812.2 us | 425.857 us | 24.0617 us |
| ManagedSdcaL1UpdateUPerf | 700.9 us | 381.663 us | 21.5646 us |
| NativeSdcaL1UpdateSUPerf | 14,386.1 us | 1,931.457 us | 109.1309 us |
| ManagedSdcaL1UpdateSUPerf | 17,178.5 us | 17,825.752 us | 1,007.1880 us |

briancylui self-assigned this on Aug 14, 2018
briancylui (Owner, Author) commented:

@helloguo Thank you for your inquiry. As the perf test results show, changing EXP_RANGE to EXP_MAX / 4 roughly halves the running time of NativeSumSqU, from 605.7 us to 314.4 us, and likewise that of ManagedSumSqU, from 627.1 us to 311.5 us.
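Given how wide the ShortRun error bars are, one quick sanity check is whether the mean ± Error intervals before and after the change overlap at all. The Python sketch below does this for a few rows copied from the tables above (original EXP_MAX / 2 versus EXP_MAX / 4). The helper names are hypothetical, and non-overlap is only a conservative heuristic, not a formal significance test.

```python
from typing import NamedTuple


class Result(NamedTuple):
    mean_us: float   # "Mean" column, microseconds
    error_us: float  # "Error" column: half of the 99.9% confidence interval


def interval(r: Result) -> tuple[float, float]:
    """Mean ± Error as a (low, high) pair."""
    return (r.mean_us - r.error_us, r.mean_us + r.error_us)


def clearly_different(before: Result, after: Result) -> bool:
    """True when the two mean ± Error intervals do not overlap.

    Non-overlap is a conservative signal of a real difference; overlap on its
    own does not prove the two means are equal.
    """
    b_lo, b_hi = interval(before)
    a_lo, a_hi = interval(after)
    return b_hi < a_lo or a_hi < b_lo


# Rows copied from the tables above: original (EXP_MAX / 2) vs. EXP_MAX / 4.
rows = {
    "NativeSumSqUPerf": (Result(605.7, 220.48), Result(314.4, 20.76)),
    "ManagedSumSqUPerf": (Result(627.1, 31.06), Result(311.5, 43.73)),
    "NativeAddScalarUPerf": (Result(259.8, 363.18), Result(241.5, 528.68)),
}

for name, (before, after) in rows.items():
    speedup = before.mean_us / after.mean_us
    print(f"{name}: {speedup:.2f}x, clearly different: {clearly_different(before, after)}")
```

Both SumSqU rows pass this check (the intervals are far apart), while NativeAddScalarUPerf does not: its 259.8 us versus 241.5 us change is well inside the ShortRun error.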
