Don't drop words with low certainty #1264

Merged
merged 1 commit into from
Feb 20, 2018

Conversation

amitdo (Collaborator) commented Jan 10, 2018

Fix #681.

amitdo (Collaborator, Author) commented Jan 23, 2018

@theraysmith, @jbreiden,
Please review and approve this PR. I want this to be in Ubuntu 18.04.

zc813 commented Feb 20, 2018

Hi amitdo, I encountered the same problem. Is there anything I can do to help with this pull request?

amitdo (Collaborator, Author) commented Feb 20, 2018

I think it's a good idea to test it on several languages and a variety of pages.

Apart from this, someone with the right permissions will need to merge it...

@zdenop zdenop merged commit 766b7bd into tesseract-ocr:master Feb 20, 2018
@amitdo amitdo deleted the dontdropwords branch February 20, 2018 16:23
zc813 commented Feb 21, 2018

@amitdo Some of the results on Tibetan are better: words that were previously missing are recognized after this commit.
However, several entire lines of text are still missing. Different lines are skipped depending on which psm (3, 6, or 11) is used. It's like #538, but in my case all the text is the same size.

Any idea? Thanks a lot!
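
For anyone who wants to reproduce the psm comparison described above, here is a rough sketch, not part of this PR. It assumes the `tesseract` 4.x command-line tool is on the PATH and that the Tibetan traineddata (`bod`) is installed; the helper names (`build_command`, `lines_missing_per_psm`, `run_all`) and the image path are made up for illustration. It uses exact string matching between lines, which is crude: a line that is recognized slightly differently under another psm will also be reported as missing.

```python
import subprocess


def build_command(image_path, lang, psm):
    """Build a tesseract CLI call that prints the recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def recognized_lines(text):
    """Return the non-empty, whitespace-stripped lines of an OCR result."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def lines_missing_per_psm(outputs):
    """For each psm, list lines that another psm produced but this one did not.

    `outputs` maps a psm number to the raw text tesseract returned for it.
    """
    per_psm = {psm: recognized_lines(text) for psm, text in outputs.items()}
    missing = {}
    for psm, own in per_psm.items():
        own_set = set(own)
        absent = []
        for other_psm, lines in per_psm.items():
            if other_psm == psm:
                continue
            for line in lines:
                if line not in own_set and line not in absent:
                    absent.append(line)
        missing[psm] = absent
    return missing


def run_all(image_path, lang="bod", psms=(3, 6, 11)):
    """Run tesseract once per psm and collect the raw text (needs tesseract installed)."""
    outputs = {}
    for psm in psms:
        result = subprocess.run(
            build_command(image_path, lang, psm),
            capture_output=True, text=True, check=True,
        )
        outputs[psm] = result.stdout
    return outputs


if __name__ == "__main__":
    # "page.png" is a placeholder path, not a file from this thread.
    report = lines_missing_per_psm(run_all("page.png"))
    for psm, absent in report.items():
        print(f"psm {psm}: {len(absent)} line(s) seen only under other modes")
```

Running this on the same page before and after the commit would give a quick count of which lines each mode skips.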

Development

Successfully merging this pull request may close these issues.

LSTM: Words dropped during recognition