trying to add user words/patterns again: #2324
- Replace the (shallow/copy) Dict::LoadLSTM with the (full/original) Dict::Load in LSTMRecognizer::LoadDictionary.
- Pass the member ParamsVectors of the Tesseract instance into LSTMRecognizer by:
  - extending its Load method with a params pointer;
  - extending its LoadDictionary likewise;
  - after constructing the inner CCUtil and Dict with default params, overwriting these with the true params (via the new ParamUtils::ResetFromParams).
This fixes #403 and #960, but one needs to set …
What is the meaning of those strange Windows build errors?
@bertsky Do the changes use the LSTM ones in LSTMRecognizer::LoadDictionary?
Oh, now I see. Then this approach is probably wrong altogether. So I guess this should only be about adding … But the problem remains that these settings only enter the member params of … So can someone at least answer whether those two parameters are meant to be shared between the pre-LSTM and LSTM processors?
One way to achieve that would be to simply make them global again. But there is this statement in …
Should we still feel bound by that mission, or can global params be used for shared interests?
If you mean user_words_file and user_patterns_file, my guess would be … IMO, for user_words to be useful rather than just a hint, it should produce those user_words exclusively, or there should at least be a config option to limit the results to user_words.
// TODO(daria): remove GlobalParams() when all global Tesseract …
This is a comment from 8 years ago, most probably from Google's internal code.
Please don't make the parameters global.
Tesseract release notes Oct 21 2011 - V3.01
Thanks @amitdo for that hint! Then we will need a solution along the lines of … Perhaps this is also a problem of the semantics of the CLI options … Currently they override langdata and config settings for the outermost …
I still need a good transfer mechanism, though.
I have found a way. Cancelling in favour of #2328.