LSTM: Training - missing file /langdata/radical-stroke.txt #542
Fixed in tesseract-ocr/langdata@3299c60. |
Thanks, Ray.
On 07-Dec-2016 10:29 PM, "theraysmith" wrote:
Fixed in tesseract-ocr/langdata@3299c60.
I'm retesting now. It seems the tutorial works without it, so I imagine
the accuracy numbers in the tutorial will come out different.
|
I've just updated the numbers in the training tutorial. |
Ray, please add the minimum requirements for LSTM training in terms of hardware, software, etc. I realized after running the process that I needed to build ScrollView.jar. I am not sure whether it is REQUIRED or only optional for those who would like to see visual debugging output. It is not built as part of the regular make install of tesseract and the training tools.
I think that is probably dependent on the hardware used. I did not see any progress for more than an hour and a half; I am not sure whether that was because I did not have ScrollView.jar at that point. I ran it later with 500 iterations. I think it may be helpful to have just a single iteration as the first step in the tutorial to make sure that the process is working. Also, the case most people would probably want LSTM training for is fine-tuning, to add a font to the existing training data. It would be helpful to have a separate wiki page for it. It would also be great to know how to add training data based on scanned images for typefaces that are not available as fonts. I will try to test fine-tuning the Hindi traineddata for Sanskrit and post here. |
Please help me out: Config file is optional, continuing... |
tesseract "data/Apex-ground-truth/eng_64.tif" data/Apex-ground-truth/eng_64 --psm 13 lstm.train
|
Did you find the solution? |
Before running |
Hello, thanks for your suggestion. I followed it and now have langdata downloaded into tesstrain, but once I run this command: I'd greatly appreciate your help. |
I cannot give you an easy answer to your question. I am still learning how to train Tesseract, and I feel like Hercules in the Augean stables. What is in your directory? *.lstmf files are generated from *.box and *.tiff (or *.png) files by these lines in the Makefile: https://github.com/tesseract-ocr/tesstrain/blob/main/Makefile#L250-L263
For each pair of box and image files there will be one *.lstmf file. PSM is set to 13 (as can be seen above in the Makefile). The file |
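To make the box/image pairing above concrete, here is a rough Python sketch (not the actual Makefile logic) that lists the command that would be run for each pair. The command shape is copied from the `tesseract ... --psm 13 lstm.train` call quoted earlier in this thread; the function name `lstmf_commands` and the set of image suffixes are my own assumptions for illustration.

```python
from pathlib import Path

# Assumption: these are the image extensions used for ground truth.
IMAGE_SUFFIXES = (".tif", ".tiff", ".png")


def lstmf_commands(ground_truth_dir):
    """Return one tesseract command (as an argument list) per image/box pair.

    Hypothetical helper: it only builds the commands, it does not run them.
    Images without a matching .box file are skipped.
    """
    commands = []
    for image in sorted(Path(ground_truth_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        if not image.with_suffix(".box").exists():
            continue
        # Output base is the image path without its extension;
        # tesseract would write <base>.lstmf next to it.
        base = image.with_suffix("")
        commands.append(
            ["tesseract", str(image), str(base), "--psm", "13", "lstm.train"]
        )
    return commands
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)`. If the result is empty, the directory has no complete box/image pairs, which would be worth checking before looking for other causes.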
@theraysmith I am trying to run the commands given in the training tutorial.
The above messages are from basetrain.log.
Does the langdata repo need to be updated for 4.0 alpha?