Wrong coordinates in LSTM OCR mode with Japanese #1015
Comments
I have the same problem. I am using jpn.traineddata. Thanks. |
I temporarily worked around this in the following way: |
Strange. It looks like a bug. |
Please check if this is fixed by the latest set of commits by Ray.
|
Thank you, sir. |
5 1 1 2 1 1 111 170 43 47 96 ご Please check this. The coordinates are still wrong for the り and め characters. |
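The row quoted above follows the column order of Tesseract's TSV output (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text). A minimal sketch for turning such a row into named fields, so the box can be compared against the image, might look like this. The helper name `parse_tsv_row` and the derived `right`/`bottom` keys are illustrative assumptions, not part of Tesseract; real TSV output is tab-separated, while the pasted row is space-separated, so the sketch splits on any whitespace.

```python
# Column order of Tesseract's TSV output.
TSV_COLUMNS = [
    "level", "page_num", "block_num", "par_num", "line_num", "word_num",
    "left", "top", "width", "height", "conf", "text",
]


def parse_tsv_row(row):
    """Parse one TSV data row into a dict (hypothetical helper)."""
    # Split into at most 12 fields so that text containing spaces stays whole.
    parts = row.split(None, 11)
    if len(parts) < 11:
        raise ValueError("expected at least 11 fields, got %d" % len(parts))
    # The first ten columns are integers.
    record = {name: int(value) for name, value in zip(TSV_COLUMNS[:10], parts[:10])}
    record["conf"] = float(parts[10])
    # The text column may be empty for non-word rows.
    record["text"] = parts[11] if len(parts) == 12 else ""
    # Derived corner coordinates (assumption: added here for convenience).
    record["right"] = record["left"] + record["width"]
    record["bottom"] = record["top"] + record["height"]
    return record


row = parse_tsv_row("5 1 1 2 1 1 111 170 43 47 96 ご")
print(row["text"], row["left"], row["top"], row["right"], row["bottom"])
```

For the row above this gives a box from (111, 170) to (154, 217) for ご, which can then be drawn or checked against the image bounds.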
We are still able to reproduce it in the Arabic language in LSTM mode. |
I'm getting the same behavior for the Thai language in LSTM mode: BoundingBox() often returns the whole image size. 'ร' - Confidence: 94.3645 [0, 0; 400, 266] |
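Until the bug is fixed, one way to spot affected symbols is to flag any box that spans the entire image. The sketch below assumes the box is given as (left, top, right, bottom), that the reported `[0, 0; 400, 266]` uses that order, and that the image is 400x266. The function names and the second sample box are hypothetical and only for illustration.

```python
def covers_whole_image(box, image_size, tolerance=0):
    """Return True if box (left, top, right, bottom) spans the whole image."""
    left, top, right, bottom = box
    width, height = image_size
    return (
        left <= tolerance
        and top <= tolerance
        and right >= width - tolerance
        and bottom >= height - tolerance
    )


def suspicious_symbols(symbols, image_size, tolerance=0):
    """Filter (text, box) pairs whose box is as large as the image."""
    return [
        (text, box)
        for text, box in symbols
        if covers_whole_image(box, image_size, tolerance)
    ]


# The first entry is the box from the report above; the second is made up.
symbols = [
    ("ร", (0, 0, 400, 266)),
    ("ก", (10, 20, 30, 50)),
]
print(suspicious_symbols(symbols, image_size=(400, 266)))
```

A single character is very unlikely to fill the whole page, so boxes flagged this way can be skipped or re-estimated from neighbouring symbols.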
@amitdo Sir, could you show me where to find more information about how Tesseract analyzes the input image to get the coordinates of words and characters, recognizes them with the LSTM or the old method, and finally combines the OCR result with the coordinates? |
See here: |
This also happened to me with the Arabic language. Here is an example that reproduces the problem.
I used OpenCV to draw the boxes. And this is the original image. |
I'm also getting this bug for English text, though I can't provide the data files as they contain PII. |
I got the same issue on beta.4 with jpn.traineddata. In my case, the image size (width, height) and the invalid coordinate values are correlated.
My test image size is 596x118. The same letter appears multiple times (e.g. '字', 'て'), but the bounding box value is wrong only once. FYI, in the above image, the character '日' is recognized incorrectly by jpn.traineddata (traineddata_fast). |
Same issue as #1192 |
Hi all,
I'm using Tesseract to get each character with its coordinates in an image. I'm using ResultIterator with OCR mode = 2 (LSTM) and language = jpn.
Here is my program log and input image. You can see that I got wrong coordinates for the り character. I tested with TSV and hOCR output, but they give me the same result: the coordinates are still wrong.
One more question: I'm trying to add fonts to the jpn data, but maybe I must retrain from scratch. However, I don't know how the jpn Tesseract data that I downloaded from the tessdata repository was actually made.
I tried downloading data from the langdata repository, making images from jpn.traintext, and training with tesstrain.sh and jTessBoxEditor, but I got lower accuracy than with the data I downloaded from the repository. Can somebody tell me exactly how it was made?
Sorry for my bad English.