Improve tesstrain.sh script #92

Merged: 6 commits, Sep 10, 2015

nickjwhite

Improvements to the tesstrain.sh script.

The only difference from default usage is that the --bin_dir option has been removed, in favour of $PATH.

See the commit log for details.

…ided

The --fontlist argument to tesstrain.sh was always ignored, even if
the language had no specific fonts specified in language-specific.sh.

Change this behaviour so the --fontlist argument is used if no specific
fonts are selected by language-specific.sh.

Previously the fonts specified in language-specific.sh would override
any specified on the command line.

This changes language-specific.sh from overriding a user request to
just setting the default fonts if none are specified with --fontlist.
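
The "defaults only" behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commits: the FONTS variable, the set_default_fonts function and the font names are all hypothetical.

```shell
# Hypothetical sketch; FONTS and set_default_fonts are illustrative names.
# FONTS holds a comma-separated font list taken from --fontlist, or is empty.
FONTS=""

set_default_fonts() {
    # $1 is the language code.
    # An explicit --fontlist always wins, so leave it untouched.
    if [ -n "$FONTS" ]; then
        return 0
    fi
    # Otherwise fall back to per-language defaults (placeholder font names).
    case "$1" in
        eng) FONTS="Example Serif,Example Sans" ;;
        *)   FONTS="Example Fallback" ;;
    esac
}
```

With this shape, calling set_default_fonts after option parsing only fills in fonts when the user did not ask for any.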

The fontconfig initialisation hardcodes using Arial. However it may
not be available, whereas the fonts being used later will be, so use
one of them for initialisation instead.

The --bin_dir option to tesstrain.sh is not useful, as $PATH does the
same job much better, so switch to relying on that instead.

This also makes the code a bit more readable, as it removes the need
to refer to binaries as COMMAND_NAME_EXE rather than just command_name.
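
As a rough illustration of relying on $PATH, a script can check up front that the programs it needs are resolvable. The require_commands helper below is a hypothetical name used for this sketch and is not taken from the commits.

```shell
# Hypothetical helper: returns 0 only if every named program is found via $PATH.
require_commands() {
    missing=0
    for cmd in "$@"; do
        # command -v prints the resolved path and fails if nothing is found.
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "ERROR: $cmd not found in PATH" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A user whose training tools live in a non-standard directory would then prepend that directory to PATH before running the script, rather than passing a separate option.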

The --exposures flag can be used to specify multiple different exposure levels
for a training. There was some code already in tesstrain_utils.sh
to deal with multiple exposure levels, so it looks like this
functionality was always intended.

The default usage does not change, with exposure level 0 being the
only one used if --exposures is not used.
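
A minimal sketch of how such an option might be handled, assuming the levels arrive as a single space-separated string; the actual argument format in tesstrain.sh may differ, and the function names here are hypothetical.

```shell
# Hypothetical sketch: assumes levels are passed as one space-separated string.
EXPOSURES="0"   # default: exposure level 0 only

parse_exposures() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --exposures)
                # The option needs a value after it.
                if [ "$#" -lt 2 ]; then
                    echo "ERROR: --exposures needs a value" >&2
                    return 1
                fi
                EXPOSURES="$2"
                shift 2
                ;;
            *)
                # Ignore everything else in this sketch.
                shift
                ;;
        esac
    done
}

# Print one line per exposure level, standing in for one generation pass each.
list_exposure_runs() {
    for exposure in $EXPOSURES; do
        echo "exposure=$exposure"
    done
}
```

If the option is never given, EXPOSURES keeps its default and only level 0 is processed, matching the unchanged default usage.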

mktemp is a better idea for security, as well as enabling users to
specify a different directory using the TMPDIR environment variable,
which is useful if /tmp is a small tmpfs.

Also fix a bug where the first few log messages were failing as the
workspace directory wasn't being created early enough.
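
The general pattern looks something like the sketch below. The "tesstrain" prefix, the log file name and the variable names are assumptions made for illustration, not the values used in the commit.

```shell
# Hypothetical sketch; the prefix and variable names are illustrative.
make_workspace() {
    # Honour TMPDIR if set, otherwise fall back to /tmp.
    base="${TMPDIR:-/tmp}"
    # mktemp -d creates a new, uniquely named directory and prints its path.
    mktemp -d "$base/tesstrain.XXXXXX"
}

WORKSPACE_DIR="$(make_workspace)" || exit 1
LOG_FILE="$WORKSPACE_DIR/tesstrain.log"

# The directory exists before the first message is written,
# so early log messages have somewhere to go.
echo "Workspace created at $WORKSPACE_DIR" >> "$LOG_FILE"
```

Creating the directory before anything is logged avoids the early-logging failure mentioned above, and setting TMPDIR moves the workspace off a small /tmp.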
nickjwhite changed the title from "Bettertesstrain" to "Improve tesstrain.sh script" on Sep 10, 2015
zdenop added a commit that referenced this pull request Sep 10, 2015
zdenop merged commit b216f6f into tesseract-ocr:master on Sep 10, 2015
zvezdochiot pushed a commit to ImageProcessing-ElectronicPublications/tesseract that referenced this pull request Mar 28, 2021