
Optimize calculation of dot product for double vectors with AVX #2257

Merged
merged 1 commit into from
Feb 21, 2019

Conversation

stweil
Member

@stweil stweil commented Feb 20, 2019

This improves the performance with best models and should also
make training faster.

Signed-off-by: Stefan Weil [email protected]
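
For readers who want to see the general idea, the following is a minimal sketch of an AVX dot product for double vectors. It is not the code from this pull request: the function name `DotProductSketch`, the use of two accumulators, and the unaligned loads are all assumptions made for illustration. The AVX path is only compiled when the compiler defines `__AVX__`; otherwise the function falls back to a plain scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper (not Tesseract's actual function): dot product of two
// double arrays of length n.
double DotProductSketch(const double* u, const double* v, std::size_t n) {
  assert(n == 0 || (u != nullptr && v != nullptr));
  double total = 0.0;
  std::size_t i = 0;
#if defined(__AVX__)
  // Two independent accumulators, each holding four partial sums.
  __m256d sum0 = _mm256_setzero_pd();
  __m256d sum1 = _mm256_setzero_pd();
  for (; i + 8 <= n; i += 8) {
    // Unaligned loads, so the inputs need no special alignment.
    __m256d a0 = _mm256_loadu_pd(u + i);
    __m256d b0 = _mm256_loadu_pd(v + i);
    __m256d a1 = _mm256_loadu_pd(u + i + 4);
    __m256d b1 = _mm256_loadu_pd(v + i + 4);
    sum0 = _mm256_add_pd(sum0, _mm256_mul_pd(a0, b0));
    sum1 = _mm256_add_pd(sum1, _mm256_mul_pd(a1, b1));
  }
  // Reduce the eight partial sums to a single scalar.
  double lanes[4];
  _mm256_storeu_pd(lanes, _mm256_add_pd(sum0, sum1));
  total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
  // Scalar tail, or the whole loop when AVX is not enabled at compile time.
  for (; i < n; ++i) {
    total += u[i] * v[i];
  }
  return total;
}
```

With GCC or Clang the AVX path would typically be enabled with `-mavx`. Because the vector path adds the products in a different order than a scalar loop, results can differ in the last bits for general floating-point input.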

@ghost ghost assigned stweil Feb 20, 2019
@ghost ghost added the review label Feb 20, 2019
@stweil
Member Author

stweil commented Feb 20, 2019

This replaces pull request #954. See timing results for OCR there.

More timing results from other hardware with AVX (including training runs) would be welcome.

@stweil
Member Author

stweil commented Feb 20, 2019

The current test results show a large performance increase on server hardware, but on other typical platforms (tested on a virtual machine and on a MacBook Pro) the increase is small (about 2%).

@amitdo
Collaborator

amitdo commented Feb 20, 2019

@stweil
Member Author

stweil commented Feb 20, 2019

All my machines with AVX also support FMA, so I can try that. Thank you for the hint.

@zdenop zdenop merged commit b686859 into tesseract-ocr:master Feb 21, 2019
@ghost ghost removed the review label Feb 21, 2019
@zdenop
Contributor

zdenop commented Feb 21, 2019

thanks.

@stweil stweil deleted the dp branch February 21, 2019 08:21
@amitdo
Collaborator

amitdo commented Jul 12, 2019

@stweil, a reminder about FMA...

@stweil
Member Author

stweil commented Jul 12, 2019

Thank you for the reminder. The first test results with debug code look promising:

$ time -p tesseract test/testing/phototest.tif - -l tessdata_best/script/Latin -c dotproduct=fma
Page 1
This is a lot of 12 point text to test the
ocr code and see if it works on all types
of file format.

The quick brown dog jumped over the
lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.

real 10.25
user 9.87
sys 0.26
$ time -p tesseract test/testing/phototest.tif - -l tessdata_best/script/Latin -c dotproduct=fma
Page 1
This is a lot of 12 point text to test the
ocr code and see if it works on all types
of file format.

The quick brown dog jumped over the
lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.

real 10.35
user 9.94
sys 0.27
$ time -p tesseract test/testing/phototest.tif - -l tessdata_best/script/Latin -c dotproduct=avx
Page 1
This is a lot of 12 point text to test the
ocr code and see if it works on all types
of file format.

The quick brown dog jumped over the
lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.

real 12.39
user 11.96
sys 0.27
$ time -p tesseract test/testing/phototest.tif - -l tessdata_best/script/Latin -c dotproduct=avx
Page 1
This is a lot of 12 point text to test the
ocr code and see if it works on all types
of file format.

The quick brown dog jumped over the
lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.

real 12.73
user 12.17
sys 0.30

With production code, both AVX and FMA take the same time for this simple image (real 2.86).

@stweil
Member Author

stweil commented Jul 15, 2019

FMA is supported in the latest code and gives about the same performance as AVX.
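
To illustrate what changes with FMA, here is a minimal sketch of the same kind of dot product using a fused multiply-add intrinsic. It is not the code that was committed: the name `DotProductFmaSketch` and the single-accumulator loop are assumptions made for illustration. The only real difference from an AVX version is that the separate multiply and add become one `_mm256_fmadd_pd` call, which rounds once instead of twice. The vector path is only compiled when both `__AVX__` and `__FMA__` are defined.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__AVX__) && defined(__FMA__)
#include <immintrin.h>
#endif

// Hypothetical helper (not Tesseract's actual function): dot product of two
// double arrays of length n, using fused multiply-add where available.
double DotProductFmaSketch(const double* u, const double* v, std::size_t n) {
  assert(n == 0 || (u != nullptr && v != nullptr));
  double total = 0.0;
  std::size_t i = 0;
#if defined(__AVX__) && defined(__FMA__)
  __m256d sum = _mm256_setzero_pd();
  for (; i + 4 <= n; i += 4) {
    // sum = a * b + sum in one instruction with a single rounding step.
    sum = _mm256_fmadd_pd(_mm256_loadu_pd(u + i), _mm256_loadu_pd(v + i), sum);
  }
  // Reduce the four partial sums to a single scalar.
  double lanes[4];
  _mm256_storeu_pd(lanes, sum);
  total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
  // Scalar tail, or the whole loop when FMA is not enabled at compile time.
  for (; i < n; ++i) {
    total += u[i] * v[i];
  }
  return total;
}
```

With GCC or Clang the vector path would typically be enabled with `-mavx -mfma`.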

@amitdo
Collaborator

amitdo commented Jul 15, 2019

Thanks!
