forked from dotnet/machinelearning
Improving performance in .NET Core App 3.0 #3
Comments
Can you manually inline
codegen of test1:
codegen of test2:
Thank you for all the help and guidance from commenters and my mentors. The perf results are now back to the desired behavior, so I am closing the issue.
In the main progress page, the performance tests originally sitting in the `src\Native\CpuMath\` folder give comparable performance results for both native and managed implementations of SSE key intrinsics. However, once moved into the `test\Microsoft.ML.CpuMath.PerformanceTests\` folder, with multi-targeting, using `Span<T>`, and having a lower TargetCount (from ~20 to 3) in the ToolChain, the performance of managed `DotU`, `SumSqU`, `Dist2`, and `SumAbsU` seems to deviate noticeably from that of their native counterparts. Two relevant tables are shown below.

Run in .NET Core App 3.0 (`ManagedXPerf` uses the managed `X`)

Run in .NET Core App 2.1 (`ManagedXPerf` uses the native `X`)

TODOs
When I ran the performance tests in the first half of the PR review period, the results looked fine, but the most recent run above looks quite different. I will look into what causes this.

Experiments made to find the cause of the issue
`Default` to `ShortRun`, i.e. increasing `LaunchCount` and other warm-up steps to make perf measurement more accurate.
Conclusion: Not the main factor.

Removed the dependency on `Span<T>` to resort to using normal input float arrays instead.
Conclusion: Not the main factor.

Removed the dependency on the `VectorSum` function to resort to using original code instead.
Conclusion: This is the main factor.
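To make the last experiment concrete, here is a minimal sketch of a dot product whose final horizontal reduction is done either through a separate helper or with the same instructions written out at the call site. It is an illustration only, not the CpuMath source: the class `DotSketch`, the methods `DotWithHelper`, `DotInlined`, `MultiplyAccumulate`, and `ScalarDot`, and the body of `VectorSum` shown here are all assumptions. It targets .NET Core 3.0 or later and falls back to a scalar loop when SSE is unavailable.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical sketch; names and bodies are assumptions, not the CpuMath code.
public static class DotSketch
{
    // Variant 1: the horizontal reduction goes through a helper method.
    public static float DotWithHelper(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (!Sse.IsSupported) return ScalarDot(a, b, 0);
        Vector128<float> acc = MultiplyAccumulate(a, b, out int done);
        return VectorSum(in acc).ToScalar() + ScalarDot(a, b, done);
    }

    // Variant 2: the same reduction written out at the call site.
    public static float DotInlined(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (!Sse.IsSupported) return ScalarDot(a, b, 0);
        Vector128<float> acc = MultiplyAccumulate(a, b, out int done);
        Vector128<float> partial = Sse.Add(acc, Sse.MoveHighToLow(acc, acc));
        acc = Sse.Add(partial, Sse.Shuffle(partial, partial, 0xB1));
        return acc.ToScalar() + ScalarDot(a, b, done);
    }

    // Assumed helper: after it runs, lane 0 holds the sum of all four lanes.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static Vector128<float> VectorSum(in Vector128<float> v)
    {
        Vector128<float> partial = Sse.Add(v, Sse.MoveHighToLow(v, v));
        return Sse.Add(partial, Sse.Shuffle(partial, partial, 0xB1));
    }

    // Multiplies and accumulates four floats at a time; 'done' reports how
    // many leading elements were consumed by the vector loop.
    private static Vector128<float> MultiplyAccumulate(
        ReadOnlySpan<float> a, ReadOnlySpan<float> b, out int done)
    {
        ReadOnlySpan<Vector128<float>> va = MemoryMarshal.Cast<float, Vector128<float>>(a);
        ReadOnlySpan<Vector128<float>> vb = MemoryMarshal.Cast<float, Vector128<float>>(b);
        int count = Math.Min(va.Length, vb.Length);
        Vector128<float> acc = Vector128<float>.Zero;
        for (int i = 0; i < count; i++)
            acc = Sse.Add(acc, Sse.Multiply(va[i], vb[i]));
        done = count * 4;
        return acc;
    }

    // Scalar loop for the tail elements and for hardware without SSE.
    private static float ScalarDot(ReadOnlySpan<float> a, ReadOnlySpan<float> b, int start)
    {
        int n = Math.Min(a.Length, b.Length);
        float sum = 0f;
        for (int i = start; i < n; i++)
            sum += a[i] * b[i];
        return sum;
    }
}
```

Both variants compute the same value. Whether they also produce the same machine code depends on what the JIT does with the helper call, which is what the codegen comparison in the comments looks at. Benchmarking the two side by side is one way to check whether a given runtime build is affected.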
Perf results after the fix: