Optimize calculation of dot product for double vectors with AVX #2257
Conversation
This improves the performance with best models and should also make training faster. Signed-off-by: Stefan Weil <[email protected]>
This replaces pull request #954. See timing results for OCR there. Feedback on more timing results on other hardware with AVX (also with training) would be interesting.
The current test results show a large performance increase on server hardware, but on typical other platforms (tested on a virtual machine and on a MacBook Pro) the increase is small (about 2 %).
You might want to try using FMA. https://en.wikipedia.org/wiki/FMA_instruction_set
All my machines with AVX also support FMA, so I can try that. Thank you for the hint.
thanks.
@stweil, a reminder about FMA...
Thank you for the reminder. The first test results with debug code look promising:
With production code, both AVX and FMA take the same time for this simple image (real 2.86).
FMA is supported in the latest code and gives about the same performance as AVX.
Thanks!