This issue page is opened in response to Intel's following inquiry (@helloguo):
I have one question about the perf tests in SsePerformanceTests.cs. I saw that ‘EXP_RANGE’ is initialized as private const int EXP_RANGE = EXP_MAX / 2;. It usually works very well since it tests a large range of floating-point values. However, in some cases such as SumSqU, it causes a lot of FP_ASSIST, a hardware event that causes perf issues (you can find more details here or in the section ‘Floating-Point Performance Ratios’ of the Intel optimization manual). Basically, FP_ASSIST is caused by denormals, underflowing numbers, and NaNs.
I would suggest initializing ‘EXP_RANGE’ as private const int EXP_RANGE = EXP_MAX / 4; or private const int EXP_RANGE = EXP_MAX / 8;. On my machine, when I changed ‘EXP_RANGE’ to private const int EXP_RANGE = EXP_MAX / 4; the execution time dropped from 469 us to 227 us. Do you mind testing it on your machine and changing the perf tests if it makes sense to you?
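To illustrate why a narrower exponent range could reduce FP_ASSIST events in a sum-of-squares kernel, here is a minimal sketch in Python. It is not the code from SsePerformanceTests.cs: the value `EXP_MAX = 127`, the generator `next_float`, and the helper `denormal_square_fraction` are all assumptions made for illustration. It estimates how often squaring a random input would land below the smallest normal float32 value, which is one plausible source of the denormals described above.

```python
import random
import struct

EXP_MAX = 127                 # assumed value; the issue only shows EXP_RANGE = EXP_MAX / 2
FLT_MIN_NORMAL = 2.0 ** -126  # smallest positive normal float32


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def next_float(rng, exp_range):
    """Hypothetical generator: mantissa in [-1, 1) times 2**e,
    with e drawn uniformly from [-exp_range + 1, exp_range]."""
    mantissa = rng.random() * 2.0 - 1.0
    exponent = rng.randint(-exp_range + 1, exp_range)
    return to_float32(mantissa * 2.0 ** exponent)


def denormal_square_fraction(exp_range, count=100_000, seed=0):
    """Fraction of samples whose square is nonzero but below the
    normal float32 range, so a float32 multiply would underflow."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(count):
        x = next_float(rng, exp_range)
        sq = x * x  # exact in float64, since x is a float32 value
        if 0.0 < sq < FLT_MIN_NORMAL:
            hits += 1
    return hits / count


for divisor in (2, 4, 8):
    exp_range = EXP_MAX // divisor
    fraction = denormal_square_fraction(exp_range)
    print(f"EXP_MAX / {divisor} (range {exp_range}): "
          f"{fraction:.4%} of squares are denormal")
```

Under these assumptions, only the widest range produces a noticeable share of denormal squares, because inputs near 2^-62 square to values near 2^-124, right at the edge of the normal float32 range. Whether this matches the real test data depends on how the perf tests actually generate their inputs.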
Results (original, after Change 1, after Change 2)
Note: Since these results were obtained in ShortRun mode, each mean is an average of only 3 measurements and can carry a sizeable error.
Performance test results after changing private const int EXP_RANGE = EXP_MAX / 2 to EXP_MAX / 4 in test\Microsoft.ML.CpuMath.PerformanceTests\SsePerformanceTests.cs:
Performance test results after changing private const int EXP_RANGE = EXP_MAX / 2 to EXP_MAX / 8 in test\Microsoft.ML.CpuMath.PerformanceTests\SsePerformanceTests.cs:
@helloguo Thank you for your inquiry. As the perf test results show, the running time for NativeSumSqU drops by roughly half, from 605.7 us to 314.4 us, and the running time for ManagedSumSqU also drops by roughly half, from 627.1 us to 311.5 us.
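For reference, the speedups implied by the timings quoted in this issue can be worked out with a few lines of Python. The helper names `speedup` and `time_saved_percent` are hypothetical. The numbers are copied from the text above, and given the ShortRun caveat they should be read as rough estimates.

```python
def speedup(before_us, after_us):
    """Ratio of the old running time to the new one."""
    return before_us / after_us


def time_saved_percent(before_us, after_us):
    """Percentage reduction in running time."""
    return 100.0 * (1.0 - after_us / before_us)


# (before, after) in microseconds, as quoted in this issue.
timings = {
    "NativeSumSqU": (605.7, 314.4),
    "ManagedSumSqU": (627.1, 311.5),
    "Reporter's machine (EXP_MAX / 4)": (469.0, 227.0),
}

for name, (before_us, after_us) in timings.items():
    print(f"{name}: {speedup(before_us, after_us):.2f}x faster, "
          f"{time_saved_percent(before_us, after_us):.1f}% less time")
```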