Generate multiple lstmf files with a single tesseract invocation #2899

Open
lorenzob opened this issue Feb 26, 2020 · 1 comment
Comments

@lorenzob

Environment

Current Behavior:

The tesseract command line tool can generate only one lstmf file per invocation.

When training on single-line images, this means generating 20,000 to 50,000 lstmf files or more, which can take hours.

Expected Behavior:

It should be possible to generate lstmf files for at least a few thousand images without incurring the overhead of starting a new tesseract process for each one.

I would expect the execution time to drop to a few minutes, if not seconds.

Suggested Fix:

The executable should accept a list of files, or a folder in which it processes all the tif files it finds.
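Until something like that exists, a possible stopgap is to run the per-file invocations in parallel from a small wrapper script. The sketch below is only an illustration, not part of tesseract: `build_command` and `generate_lstmf` are hypothetical helper names, the `--psm 7` default (single text line) is an assumption to adjust for your data, and it assumes the box or ground-truth files that `lstm.train` needs already sit next to each image. It still starts one process per image, so it spreads the startup overhead across cores rather than removing it.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(image: Path, psm: int = 7) -> list[str]:
    """Build the per-image tesseract call (hypothetical helper).

    The output base is the image path without its extension, so
    "lines/0001.tif" is expected to produce "lines/0001.lstmf".
    The psm default of 7 is an assumption, not a recommendation.
    """
    output_base = image.with_suffix("")
    return ["tesseract", str(image), str(output_base),
            "--psm", str(psm), "lstm.train"]


def generate_lstmf(folder: str, workers: int = 4) -> list[int]:
    """Run tesseract once per .tif file in `folder`, several at a time.

    Returns the exit code of each invocation, in sorted file order.
    """
    images = sorted(Path(folder).glob("*.tif"))

    def run(image: Path) -> int:
        # check=False: collect exit codes instead of raising on failure.
        return subprocess.run(build_command(image), check=False).returncode

    # Threads are enough here: each worker just waits on a child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, images))
```

Calling `generate_lstmf("lines", workers=8)` would process every tif file in a hypothetical `lines` folder; any non-zero entry in the returned list marks an image whose invocation failed.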

@stweil
Member

stweil commented Feb 26, 2020

I wonder whether lstmf files are useful at all. The training process could create the necessary data in memory on the fly instead of reading lstmf files.
