Japanese gives an error #1441

Closed
niksedk opened this issue Mar 29, 2018 · 10 comments

Comments

niksedk commented Mar 29, 2018

When running the tesseract command line with Japanese (latest language files; I tried both tessdata and tessdata_best), this error message is returned:
read_params_file: parameter not found: textord_tabfind_vertical_horizontal_mix

Tesseract version: latest (2cc46fa), built via: vcpkg install tesseract:x64-windows --HEAD

Shreeshrii (Collaborator) commented Mar 29, 2018

@stweil Is it possible that some deprecated parameters are still there in config files?

@niksedk Please test with tessdata_fast and report. Thanks! tessdata_fast is the recommended repository to use for OCR with the new LSTM engine.

amitdo (Collaborator) commented Mar 29, 2018

> Is it possible that some deprecated parameters are still there in config files?

Yes.

That parameter was removed in #1418.

stweil (Member) commented Mar 29, 2018

It looks like we should add a testcase which simply loads all languages.
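A test along those lines could look like the following minimal sketch. It is not the project's test suite: `check_all_langs` and the `TESSERACT` override variable are hypothetical, the sketch assumes the `tesseract` CLI accepts `IMAGE stdout --tessdata-dir DIR -l LANG`, and it simply treats any "parameter not found" message on stderr as a failure.

```shell
#!/bin/sh
# Hypothetical smoke test: try to load every *.traineddata in a directory and
# flag each language whose load prints "parameter not found".
# Assumptions: $TESSERACT (default "tesseract") accepts
#   IMAGE stdout --tessdata-dir DIR -l LANG
# and IMAGE is any small image file.
check_all_langs() {
  tessdata_dir=$1
  image=$2
  tess=${TESSERACT:-tesseract}
  status=0
  for f in "$tessdata_dir"/*.traineddata; do
    [ -e "$f" ] || continue            # glob matched nothing
    lang=$(basename "$f" .traineddata)
    # Capture stderr (where the warning appears) and discard the OCR text.
    log=$("$tess" "$image" stdout --tessdata-dir "$tessdata_dir" -l "$lang" 2>&1 >/dev/null)
    if printf '%s\n' "$log" | grep -q 'parameter not found'; then
      echo "FAIL $lang"
      status=1
    fi
  done
  return "$status"
}
```

Run it as `check_all_langs /path/to/tessdata sample.png`; it prints one `FAIL lang` line per affected language and returns non-zero if any were found. Pointing `TESSERACT` at a stub script makes the loop itself testable without a real build.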

niksedk (Author) commented Mar 29, 2018

@Shreeshrii: thx for the info, tessdata_fast does not crash :)

stweil (Member) commented Mar 29, 2018

According to my tests, jpn.traineddata and jpn_vert.traineddata in tessdata are the only files that need to be fixed. This is addressed by tesseract-ocr/tessdata#92.

All other languages and also the files from tessdata_best and tessdata_fast look good.
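Until fixed files are available, a local workaround could be to drop the stale line from the affected file's config component. The sketch below assumes the component has already been unpacked (for example with `combine_tessdata -u jpn.traineddata jpn.`, which should produce `jpn.config` among other files) and would afterwards be written back with `combine_tessdata -o jpn.traineddata jpn.config`; check the usage text of your `combine_tessdata` build before relying on either flag. `strip_param` is a hypothetical helper, and this is not necessarily what tesseract-ocr/tessdata#92 does.

```shell
#!/bin/sh
# Hypothetical workaround: delete every line of an unpacked config component
# whose first word is the given (removed) parameter name.
# Usage: strip_param CONFIG_FILE PARAM_NAME
strip_param() {
  config=$1
  param=$2
  tmpfile=$(mktemp) || return 1
  # Keep a line unless its first whitespace-separated field equals $param;
  # comments and blank lines are preserved.
  awk -v p="$param" '$1 != p' "$config" > "$tmpfile" && mv "$tmpfile" "$config"
}
```

For example, `strip_param jpn.config textord_tabfind_vertical_horizontal_mix` removes only the line that triggers the error and leaves the other settings untouched.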

amitdo (Collaborator) commented Mar 29, 2018

@stweil, what did you look for? Just textord_tabfind_vertical_horizontal_mix, or all removed parameters?

amitdo (Collaborator) commented Mar 29, 2018

stweil (Member) commented Mar 29, 2018

I first looked for all parameters used in tessdata and checked whether they still exist in tesseract git master. In addition, I looked for textord_tabfind_vertical_horizontal_mix (which was the only problem reported by the first test) in tessdata_best and tessdata_fast.

Finally, I also checked all parameters in tesseract/tessdata (no problem found):

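# Collect the first word (parameter name) of each non-comment line in tessdata/configs and tessdata/tessconfigs, skipping Makefiles.
cat $(ls tessdata/*configs/*|grep -v Make)|sort|uniq|grep -v '^#'|sed 's/[[:space:]].*//'|uniq|grep '^[a-z]' >/tmp/parameters.txt
# Print each collected name that git grep cannot find in the checkout.
for p in $(cat /tmp/parameters.txt|sort|uniq); do git grep -q "$p" || echo "$p not found"; done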
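The same check can be packaged as a reusable function. This is a hedged sketch, not stweil's script: `check_config_params` is a hypothetical name, and it swaps `git grep` for `grep -rFw`, so it runs on any source directory (no git checkout required) and only counts whole-word matches, meaning a name that is merely a prefix of a longer parameter is still reported.

```shell
#!/bin/sh
# Hypothetical helper modeled on the two commands above.
# Usage: check_config_params CONFIG_DIR SOURCE_DIR
# Prints "NAME not found" for each parameter name set in CONFIG_DIR that does
# not occur as a whole word anywhere under SOURCE_DIR.
check_config_params() {
  config_dir=$1
  source_dir=$2
  for f in "$config_dir"/*; do
    case ${f##*/} in *Make*) continue ;; esac  # skip Makefile.am etc.
    [ -f "$f" ] && cat "$f"
  done |
    grep '^[a-z]' |            # parameter lines only; drops comments and blanks
    sed 's/[[:space:]].*//' |  # keep the first word, i.e. the parameter name
    sort -u |
    while read -r p; do
      # -F: fixed string; -w: whole word, so a prefix of a longer name is no match
      grep -rqFw -- "$p" "$source_dir" || echo "$p not found"
    done
}
```

Pass a SOURCE_DIR that does not itself contain the config files; otherwise every name matches its own config line and nothing is ever reported.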

niksedk closed this as completed Mar 29, 2018

delonzhou commented May 28, 2018

@niksedk @stweil I found that the issue still exists in the latest tessdata_best.

stweil (Member) commented May 28, 2018

You are right. See tesseract-ocr/tessdata_best#28.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants