I'm looking for some general help because I'm a bit lost, so sorry if this is something dumb.
I want to train the Tesseract model to be good at recognizing the characters on Brazilian car license plates, so I used a regex text generator to produce 100,000 lines of characters in a format similar to the one we use here...
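For readers who want to reproduce the text generation step, here is a minimal Python sketch of the kind of generator described above. It assumes two plate-like layouts: three letters followed by four digits, and the Mercosul-style three letters, digit, letter, two digits. The actual regex used is not given in this post, and the names `random_plate`, `generate_plates` and `write_training_text` are made up for illustration.

```python
import random
import string

LETTERS = string.ascii_uppercase  # only [A-Z] is needed
DIGITS = string.digits            # only [0-9] is needed


def random_plate(rng, mercosul):
    """Return one plate-like string.

    ASSUMPTION: LLLNLNN for Mercosul-style plates, LLLNNNN otherwise.
    """
    letters = "".join(rng.choice(LETTERS) for _ in range(3))
    if mercosul:
        return (letters + rng.choice(DIGITS) + rng.choice(LETTERS)
                + rng.choice(DIGITS) + rng.choice(DIGITS))
    return letters + "".join(rng.choice(DIGITS) for _ in range(4))


def generate_plates(count, seed=0):
    """Return `count` plate-like strings, mixing both layouts about evenly."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    return [random_plate(rng, mercosul=rng.random() < 0.5) for _ in range(count)]


def write_training_text(path, count=100_000, seed=0):
    """Write one plate-like string per line to `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(generate_plates(count, seed)) + "\n")
```

Calling `write_training_text("eng.training_text")` would produce a file like the one described, with one plate-like string per line.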
Then I downloaded the lstm files and, inside the eng lstm folder, replaced the content of eng.training_text with the plate-like text I generated, because the previous content had characters and a text format I won't use (I only need the [A-Z] and [0-9] characters).
After that, I ran the command that generates the training data. I passed eng as the language because one is required, but there is no specific language actually involved, since it's only license plate characters... right?
After running it, I got tif, lstmf and box files for the fonts I used, but they have multiple lines and multiple pages (200).
After looking through the docs for a while, I saw that with the #7 script you can transform png pages into one-line tif images with their respective transcriptions... but I didn't see a way to do that with tif images.
So I wanted to ask the following:
Can I generate the tif files without specifying a language?
Can I take the multi-line / multi-page tif files and transform them into the one-line tif files that are needed for training?
Am I doing this training process the right way, or am I complicating things?
Thanks in advance!
The plot shows the results of a test training run I did by fine-tuning eng.traineddata.
As it shows, waiting for training to reach the target error rate leads to overfitting. The best results are likely to come from the traineddata files built from the 400-700 checkpoints. You can test them with real-life images and verify the results.
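One way to pick a checkpoint without waiting for the target error rate is to compare each checkpoint's error on held-out lines and keep the lowest one. The following is only a sketch of that idea: the `Checkpoint` type, the `best_checkpoint` helper and the numbers in the example are hypothetical and are not taken from the plot.

```python
from typing import NamedTuple


class Checkpoint(NamedTuple):
    iteration: int
    train_error: float  # error rate on the training lines, in percent
    eval_error: float   # error rate on held-out lines, in percent


def best_checkpoint(checkpoints):
    """Return the checkpoint with the lowest held-out error.

    Ties are broken in favor of the earlier iteration.
    """
    if not checkpoints:
        raise ValueError("no checkpoints given")
    return min(checkpoints, key=lambda c: (c.eval_error, c.iteration))


# Made-up numbers for illustration only (NOT the values from the plot):
# training error keeps falling while held-out error starts rising again.
history = [
    Checkpoint(300, 4.0, 3.5),
    Checkpoint(500, 2.0, 2.1),
    Checkpoint(900, 0.5, 3.0),
]
chosen = best_checkpoint(history)
print(chosen.iteration)
```

The held-out lines should be real-life plate images where possible, since that is what the final model has to read.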
Also, as @stweil mentioned recently in a related thread, you can fine-tune with 100+ real-life single-line images of license plates and their ground truth using the tesstrain Makefile.
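Before running the Makefile it can help to check that every line image has a transcription next to it. The sketch below assumes the usual tesstrain layout, where each image such as `plate_001.tif` or `plate_001.png` sits beside a `plate_001.gt.txt` file holding its text. The helper name `missing_transcriptions` and the example directory name are hypothetical.

```python
from pathlib import Path

# ASSUMPTION: only plain .tif and .png line images are used.
IMAGE_SUFFIXES = (".tif", ".png")


def missing_transcriptions(ground_truth_dir):
    """Return the names of line images that have no matching .gt.txt file."""
    directory = Path(ground_truth_dir)
    missing = []
    for image in sorted(directory.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip transcriptions and any other files
        if not image.with_suffix(".gt.txt").exists():
            missing.append(image.name)
    return missing
```

For example, `missing_transcriptions("data/plates-ground-truth")` returns an empty list when every image in that (hypothetically named) directory is paired with its ground truth.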
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
stalebot added the stale label (issues which require input by the reporter which is not provided) on Apr 16, 2022.