Add a way to set "Init Only" parameters (user_word_suffix, etc.) #613
Comments
I looked into it, and the init-only parameters appear to be few in number and relatively fringe (notably, those that disable various dictionaries). However, I agree that it would be nice for advanced users to have some way to specify a config file, as is possible on desktop. I will look into whether this can easily be added in a future release.
This feature has been added to the dev/v4 branch and will be released with version 4. If you would like to test before then, instructions are in #662. To easily verify that these options are indeed being set, I am attaching a test image with significantly different results for the legacy model.
[Attached test image and the two "Results with ..." listings are not preserved in this copy.]
See #662 for explanation of Tesseract.js Version 4 changes. List below is auto-generated from commits.
* Added image preprocessing functions (rotate + save images)
* Updated createWorker to be async
* Reworked createWorker to be async and throw errors per #654
* Reworked createWorker to be async and throw errors per #654
* Edited detect to return null when detection fails rather than throwing error per #526
* Updated types per #606 and #580 (#663) (#664)
* Removed unused files
* Added savePDF option to recognize per #488; cleaned up code for linter
* Updated download-pdf example for node to use new savePDF option
* Added OutputFormats option/interface for setting output
* Allowed for Tesseract parameters to be set through recognition options per #665
* Updated docs
* Edited loadLanguage to no longer overwrite cache with data from cache per #666
* Added interface for setting 'init only' options per #613
* Wrapped caching in try block per #609
* Fixed unit tests
* Updated setImage to resolve memory leak per #678
* Added debug output option per #681
* Fixed bug with saving images per #588
* Updated examples
* Updated readme and Tesseract.js-core version
Closing as this was added in Version 4.
I would like to use the command-line parameters "user_word_suffix", "load_freq_dawg", and "load_system_dawg". After sorting through a lot of documentation and looking through a lot of code, I realized that these are "init only" parameters. In the TessBaseAPI code, they need to be passed to Init(), either as a set of keys/values or in a config file. Setting the parameters after initialization doesn't work because, by then, the traineddata files have already been read and the dictionaries have already been built.
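To illustrate the ordering problem described above, here is a minimal toy model in JavaScript. It is not Tesseract's or Tesseract.js's actual code: the `ToyEngine` class and its methods are hypothetical stand-ins, and the only assumption carried over from the real engine is that dictionaries are built once, during initialization.

```javascript
// Toy model (NOT Tesseract's real code) of why "init only" parameters
// have no effect when they are changed after initialization.
class ToyEngine {
  constructor() {
    this.params = { load_system_dawg: "1" }; // default: dictionary enabled
    this.dictionaries = null;                // built once, inside init()
  }

  // Hypothetical stand-in for Init(): merge init-time settings, then build
  // the dictionaries from whatever the parameters say at that moment.
  init(initParams = {}) {
    Object.assign(this.params, initParams);
    this.dictionaries = this.params.load_system_dawg === "1" ? ["system"] : [];
  }

  // Hypothetical stand-in for setting a variable later: the value is stored,
  // but the dictionaries built in init() are not rebuilt.
  setParameter(name, value) {
    this.params[name] = value;
  }
}

// Case 1: parameter changed after init, so the dictionary stays loaded.
const late = new ToyEngine();
late.init();
late.setParameter("load_system_dawg", "0");

// Case 2: parameter supplied to init, so the dictionary is never loaded.
const early = new ToyEngine();
early.init({ load_system_dawg: "0" });

console.log("set after init:", late.dictionaries);
console.log("set during init:", early.dictionaries);
```

In this sketch, both engines end up with the same stored parameter value, but only the one that received it during `init()` actually skips the dictionary.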
Suggested fix:
Add a config filename optional parameter (string) to worker.Initialize(...) that gets passed to api.Init(...).
Other fixes:
Add a worker.SetInitParameters() function, just like worker.SetParameters(), that must be called before worker.Initialize, and pass those keys/values to api.Init().
Add an "initParams" optional parameter to worker.Initialize, which contains key/value pairs that get passed to api.Init.
I'm suggesting the config file option because it feels like the least work to get the desired result.
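The config file and key/value options are closely related, since a Tesseract config file is plain text with one whitespace-separated "name value" pair per line. As a rough sketch of how key/value pairs could be turned into config-file text, consider the helper below. The function name `toConfigFileText` is hypothetical and not part of Tesseract.js, and using "0" to disable a boolean parameter is an assumption about how the values would be written.

```javascript
// Hypothetical helper (not part of Tesseract.js): serialize key/value pairs
// into Tesseract config-file text, one "name value" pair per line.
function toConfigFileText(initParams) {
  return Object.entries(initParams)
    .map(([name, value]) => `${name} ${value}\n`)
    .join("");
}

// Example: disable the system and frequent-word dictionaries.
const initParams = {
  load_system_dawg: "0",
  load_freq_dawg: "0",
};

const configText = toConfigFileText(initParams);
console.log(configText);
```

With a helper like this, either interface could be layered on top of the other: a key/value API could write a temporary config file, or a config-file API could be fed generated text.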