Recognize only complete dictionary words only #297
Comments
Tesseract can't do this. The dictionaries are just a hint for Tesseract.
You could try playing with some of the dictionary-related parameters to see if you can achieve the results that you want:
In particular, these two look like they might have promise:
AFAIK it is not possible within Tesseract.
@dong77 Do you have any method to solve this problem?
A related Stack Overflow question is here: http://stackoverflow.com/questions/20599768/tesseract-ocr-recognize-complete-dictionary-words-only
Basically what I want to achieve is to ask Tesseract to recognize only complete words included in my custom dictionary (lang: chi_sim), or to find the best match.
Following the instructions in https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages, I applied a config file with the following content:
But this doesn't seem to work: when I ask Tesseract to recognize the word in this image,
$ tesseract /path/to/the/above/image.jpg stdout -l chi_sim /path/to/my/config_file
it gives me
硝酸嘛庸喹瓢膏
which is not in the dictionary at all. The best match is supposed to be
硝酸咪康唑乳膏
which is included in the dictionary. I searched around and couldn't find a solution. Please help me out. Thank you.
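Since the replies in this thread say Tesseract itself treats the dictionary only as a hint, one possible workaround is to do the "best match" step outside Tesseract, after OCR. The sketch below is not a Tesseract feature: it uses Python's standard `difflib` module to pick the dictionary entry closest to the raw OCR output. The function name `best_dictionary_match`, the `cutoff` value of 0.3, and the two extra dictionary entries are assumptions made for illustration; only the OCR output and the expected word come from this issue.

```python
import difflib


def best_dictionary_match(ocr_text, dictionary, cutoff=0.3):
    """Return the dictionary entry most similar to ocr_text, or None.

    Similarity is difflib's SequenceMatcher ratio (0.0 to 1.0). The cutoff
    is an assumed threshold and would need tuning on real data.
    """
    matches = difflib.get_close_matches(ocr_text, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical custom dictionary: only the first entry appears in the issue,
# the other two are placeholders added for this example.
custom_dictionary = ["硝酸咪康唑乳膏", "阿莫西林胶囊", "布洛芬缓释胶囊"]

# Raw Tesseract output quoted in the issue.
ocr_output = "硝酸嘛庸喹瓢膏"

print(best_dictionary_match(ocr_output, custom_dictionary))
```

With the strings from this issue, the OCR output shares three of seven characters with the intended word, so a fairly low cutoff is needed. Setting it too low risks snapping unrelated text to a dictionary entry, so whether this approach is acceptable depends on how noisy the OCR output is and how similar the dictionary entries are to one another.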