[Bug]: Inconsistent language order in tesseract calls #1112

Closed
abwiersma opened this issue Jun 16, 2023 · 2 comments

@abwiersma
Contributor

What were you trying to do?

I was trying to get consistent hOCR results from a list of language models, but found that even though the list of languages supplied to ocrmypdf was consistent, the list of languages passed on to tesseract was in a random order.

For example:
ocrmypdf -l lang1+lang2+lang3
would result in a random permutation of the -l parameter being passed on to tesseract, such as:
'tesseract', '-l', 'lang2+lang1+lang3'

This breaks consistent language handling, because Tesseract gives the first (primary) language preference over the secondary languages, so a different order can produce different OCR results.
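The report does not say where the reordering came from. One common way this kind of run-to-run variation appears in Python is routing the language list through a `set`: string hashing is randomized per interpreter process, so a set's iteration order can differ between runs. Below is a minimal sketch, assuming that cause, of building the `-l` argument so the user's order is kept. `build_tesseract_lang_arg` and `tesseract_args` are hypothetical helpers for illustration, not ocrmypdf's actual code.

```python
def build_tesseract_lang_arg(languages: list[str]) -> str:
    # Hypothetical helper. dict.fromkeys drops duplicates but, unlike set(),
    # keeps the first occurrence of each language in its original position,
    # so the primary language stays first.
    unique = list(dict.fromkeys(languages))
    return "+".join(unique)


def tesseract_args(languages: list[str]) -> list[str]:
    # Hypothetical helper: the start of a tesseract command line using the
    # order-preserving language string.
    return ["tesseract", "-l", build_tesseract_lang_arg(languages)]


if __name__ == "__main__":
    # Split the value given to `ocrmypdf -l` into individual languages.
    langs = "lang1+lang2+lang3".split("+")
    print(tesseract_args(langs))
```

By contrast, `"+".join(set(languages))` would deduplicate too, but its order is not guaranteed and may change from one run to the next.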

Where are you installing from?

PyPI (pip, poetry, pipx, etc.)

What operating system are you working on?

Linux

Relevant log output

No response

@abwiersma
Contributor Author

I am writing a PR to fix this issue.

@abwiersma
Contributor Author

Fixed with PR

2 participants