Wrong coordinates in LSTM OCR mode with Japanese #1015

hoangtocdo90 · 2017-06-30T09:31:52Z

Hi all,
I'm using Tesseract to get each character with its coordinates in an image. I'm using ResultIterator with OCR mode 2 (LSTM) and language jpn.

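tesseract::ResultIterator* ri = api->GetIterator();
int index_char = 0;
std::vector<std::string> char_iterators;
if (ri != nullptr) {
  do {
    char *value = ri->GetUTF8Text(tesseract::RIL_SYMBOL);
    // Map unknown/empty symbols to a space (comparing a char* with "" only compares pointers).
    std::string text = (value == nullptr || value[0] == '\0') ? " " : value;
    delete[] value;  // GetUTF8Text() returns a new[]-allocated string
    float conf = ri->Confidence(tesseract::RIL_SYMBOL);
    int left, top, right, bottom;
    ri->BoundingBox(tesseract::RIL_SYMBOL, &left, &top, &right, &bottom);
    char_iterators.push_back(text);
    index_char++;
  } while (ri->Next(tesseract::RIL_SYMBOL));
  delete ri;  // the iterator returned by GetIterator() is owned by the caller
}
api->ClearAdaptiveClassifier();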

Here is my program log and input image. As you can see, the coordinates of the り character are wrong. I also tested with the TSV and hOCR outputs and got the same result: the coordinates are still wrong.

Char value = こ left= 15 top = 14 right = 51 bottom = 51 conf = 99
Char value = ん left= 64 top = 9 right = 112 bottom = 54 conf = 99
Char value = ば left= 122 top = 5 right = 171 bottom = 54 conf = 99
Char value = ん left= 176 top = 9 right = 224 bottom = 54 conf = 99
Char value = は left= 234 top = 9 right = 281 bottom = 54 conf = 99
Char value = こ left= 295 top = 14 right = 331 bottom = 51 conf = 99
Char value = ん left= 344 top = 9 right = 392 bottom = 54 conf = 99
Char value = ば left= 402 top = 5 right = 445 bottom = 54 conf = 99
Char value = ん left= 456 top = 9 right = 497 bottom = 54 conf = 99
Char value = は left= 514 top = 9 right = 561 bottom = 54 conf = 99
Char value = ご left= 15 top = 79 right = 58 bottom = 126 conf = 99
Char value = 飯 left= 62 top = 80 right = 113 bottom = 130 conf = 99
Char value = 大 left= 120 top = 80 right = 225 bottom = 130 conf = 99
Char value = 盛 left= 242 top = 83 right = 260 bottom = 130 conf = 99
Char value = り left= 2328 top = 1616 right = 2328 bottom = 1616 conf = 99
Char value = 。 left= 289 top = 116 right = 305 bottom = 131 conf = 99
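The り box above is degenerate (left equals right and top equals bottom) and lies far from its neighbors, while later reports in this thread show the same bug as a box covering the whole image. Below is a minimal sketch of a sanity check a caller could run on every value returned by `ResultIterator::BoundingBox()` before trusting it. The `Box` struct and `IsSuspiciousBox` helper are hypothetical, the rules are a heuristic rather than anything Tesseract provides, and the image size has to be supplied by the caller (for example from Leptonica's `pixGetWidth`/`pixGetHeight` on the input `Pix`).

```cpp
#include <cassert>

// Hypothetical box type holding the four values that
// ResultIterator::BoundingBox() writes through its out-parameters.
struct Box {
    int left, top, right, bottom;
};

// Heuristic check (an assumption, not a Tesseract API). A box is treated as
// suspicious if it is empty, extends outside the image, or is exactly the
// whole image. img_w/img_h are the size of the input image in pixels.
bool IsSuspiciousBox(const Box& b, int img_w, int img_h) {
    assert(img_w > 0 && img_h > 0);
    const bool empty = b.right <= b.left || b.bottom <= b.top;
    const bool outside = b.left < 0 || b.top < 0 || b.right > img_w || b.bottom > img_h;
    const bool whole_image =
        b.left == 0 && b.top == 0 && b.right == img_w && b.bottom == img_h;
    return empty || outside || whole_image;
}
```

For example, with an assumed 600x140 image, the 盛 box from the log passes and the り box is flagged as empty. A page consisting of a single glyph that genuinely fills the image would be a false positive, so this only marks candidates for a fallback.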

One more question: I'm trying to add fonts to the jpn data, but maybe I must retrain from scratch. However, I don't actually know how the jpn traineddata I downloaded from the tessdata repository was made. How can I make it?
I downloaded data from the langdata repository, generated images from jpn.traintext, and trained with tesstrain.sh and jTessBoxEditor, but I got lower accuracy than with the data downloaded from the repository. Can somebody tell me exactly how to make it?
Sorry for my bad English.

kandaman · 2017-07-04T13:45:01Z

I have the same problem. I am using jpn.traineddata.
I tried RIL_SYMBOL and RIL_WORD; RIL_SYMBOL works better.
The critical problem is that the recognized characters are very good, but the positions are badly wrong.
I need the image-character pairs; don't you?
If you have any new information, please tell me.

Thanks

hoangtocdo90 · 2017-07-05T03:43:02Z

I temporarily worked around this as follows.
I'm using RIL_SYMBOL. In my case, the wrong coordinates usually appear at the end of a line or the end of a block, which I detect with:
res_it->IsAtFinalElement(RIL_TEXTLINE, RIL_WORD)
res_it->IsAtFinalElement(RIL_PARA, RIL_WORD)
res_it->IsAtFinalElement(RIL_BLOCK, RIL_WORD)
When you get a wrong coordinate, you can estimate a new one from the coordinates of the neighboring (previous/next) ResultIterator elements.
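The neighbor-based estimate can be sketched like this, assuming the caller has already collected, via `BoundingBox()`, the box of the previous symbol and (if there is one) the next symbol on the same line. `EstimateFromNeighbors` and `Box` are hypothetical names, and the sketch assumes horizontal left-to-right text, so as written it would not suit Arabic or vertical Japanese.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical box type: left, top, right, bottom as in BoundingBox().
struct Box {
    int left, top, right, bottom;
};

// Estimates a replacement box for a symbol whose reported box is invalid.
// prev: box of the preceding symbol on the same line (must be valid).
// next: box of the following symbol, or nullptr at the end of a line/block.
Box EstimateFromNeighbors(const Box& prev, const Box* next) {
    assert(prev.right > prev.left && prev.bottom > prev.top);
    if (next != nullptr && next->left > prev.right) {
        // Fill the horizontal gap between the two neighbors and take the
        // union of their vertical extents.
        return {prev.right, std::min(prev.top, next->top),
                next->left, std::max(prev.bottom, next->bottom)};
    }
    // No usable right neighbor: assume the symbol is as wide and as tall as
    // the previous one and sits immediately to its right.
    const int width = prev.right - prev.left;
    return {prev.right, prev.top, prev.right + width, prev.bottom};
}
```

With the 盛 and 。 boxes from the original log, the estimate for り spans x 260 to 289 and y 83 to 131. That places it on the right line, but it remains a guess rather than a recovered coordinate.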

amitdo · 2017-07-05T07:58:12Z

Char value = こ left= 15 top = 14 right = 51 bottom = 51 conf = 99
Char value = ん left= 64 top = 9 right = 112 bottom = 54 conf = 99
Char value = ば left= 122 top = 5 right = 171 bottom = 54 conf = 99
Char value = ん left= 176 top = 9 right = 224 bottom = 54 conf = 99
Char value = は left= 234 top = 9 right = 281 bottom = 54 conf = 99
Char value = こ left= 295 top = 14 right = 331 bottom = 51 conf = 99
Char value = ん left= 344 top = 9 right = 392 bottom = 54 conf = 99
Char value = ば left= 402 top = 5 right = 445 bottom = 54 conf = 99
Char value = ん left= 456 top = 9 right = 497 bottom = 54 conf = 99
Char value = は left= 514 top = 9 right = 561 bottom = 54 conf = 99
Char value = ご left= 15 top = 79 right = 58 bottom = 126 conf = 99
Char value = 飯 left= 62 top = 80 right = 113 bottom = 130 conf = 99
Char value = 大 left= 120 top = 80 right = 225 bottom = 130 conf = 99
Char value = 盛 left= 242 top = 83 right = 260 bottom = 130 conf = 99
Char value = り left= 2328 top = 1616 right = 2328 bottom = 1616 conf = 99
Char value = 。 left= 289 top = 116 right = 305 bottom = 131 conf = 99

Strange. It looks like a bug.

Shreeshrii · 2017-07-20T10:00:20Z

Please check if this is fixed by the latest set of commits by Ray.

jpn-1.txt
jpn-1.tsv.txt

level	page_num	block_num	par_num	line_num	word_num	left	top	width	height	conf	text
1	1	0	0	0	0	0	0	2550	470	-1	
2	1	1	0	0	0	104	96	2286	348	-1	
3	1	1	1	0	0	111	96	546	49	-1	
4	1	1	1	1	0	111	96	546	49	-1	
5	1	1	1	1	1	111	105	36	37	96	こ
5	1	1	1	1	2	160	100	48	45	96	ん
5	1	1	1	1	3	218	96	49	49	96	ば
5	1	1	1	1	4	272	100	48	45	96	ん
5	1	1	1	1	5	330	100	47	45	96	は
5	1	1	1	1	6	391	105	36	37	95	こ
5	1	1	1	1	7	440	100	48	45	96	ん
5	1	1	1	1	8	498	96	49	49	96	ば
5	1	1	1	1	9	552	100	48	45	96	ん
5	1	1	1	1	10	610	100	47	45	95	は
3	1	1	2	0	0	111	170	962	52	-1	
4	1	1	2	1	0	111	170	962	52	-1	
5	1	1	2	1	1	111	170	43	47	96	ご
5	1	1	2	1	2	158	171	107	50	95	飯
5	1	1	2	1	3	271	171	50	50	96	大
5	1	1	2	1	4	338	174	29	47	96	盛
5	1	1	2	1	5	0	0	2550	470	96	り
5	1	1	2	1	6	385	207	16	15	96	。
5	1	1	2	1	7	439	172	123	50	93	今
5	1	1	2	1	8	0	0	2550	470	95	年
5	1	1	2	1	9	567	171	65	51	96	は
5	1	1	2	1	10	624	173	87	48	96	初
5	1	1	2	1	11	0	0	2550	470	95	め
5	1	1	2	1	12	722	178	43	41	96	て
5	1	1	2	1	13	776	171	48	50	94	恋
5	1	1	2	1	14	832	173	101	48	96	人
5	1	1	2	1	15	944	171	48	50	96	出
5	1	1	2	1	16	1001	173	26	46	96	来
5	1	1	2	1	17	1021	191	25	28	96	た
5	1	1	2	1	18	1057	207	16	15	96	。
3	1	1	3	0	0	106	245	2284	126	-1	
4	1	1	3	1	0	106	245	2284	51	-1
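The wrong rows in this TSV output share one signature: a word-level (level 5) row whose box is identical to the page box in the level 1 row (0 0 2550 470). Below is a minimal sketch of a filter for that signature, assuming the 12-column layout shown in the header line. `TsvRow`, `ParseTsvRow` and `WordsWithPageSizedBox` are hypothetical names, and the check only catches this particular failure mode, not boxes that are wrong in other ways.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// The columns of one TSV row that this check needs.
struct TsvRow {
    int level = 0, left = 0, top = 0, width = 0, height = 0;
    std::string text;
};

// Parses one line with the columns:
// level page_num block_num par_num line_num word_num left top width height conf text
// Returns false for the header line or lines with too few fields.
bool ParseTsvRow(const std::string& line, TsvRow* row) {
    assert(row != nullptr);
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, '\t')) fields.push_back(field);
    if (fields.size() < 11 || fields[0] == "level") return false;
    row->level = std::stoi(fields[0]);
    row->left = std::stoi(fields[6]);
    row->top = std::stoi(fields[7]);
    row->width = std::stoi(fields[8]);
    row->height = std::stoi(fields[9]);
    row->text = fields.size() > 11 ? fields[11] : "";
    return true;
}

// Returns the text of level 5 rows whose box equals the level 1 (page) box.
std::vector<std::string> WordsWithPageSizedBox(const std::vector<std::string>& lines) {
    std::vector<std::string> suspicious;
    TsvRow page, row;
    bool have_page = false;
    for (const std::string& line : lines) {
        if (!ParseTsvRow(line, &row)) continue;
        if (row.level == 1) {
            page = row;
            have_page = true;
        } else if (row.level == 5 && have_page && row.left == page.left &&
                   row.top == page.top && row.width == page.width &&
                   row.height == page.height) {
            suspicious.push_back(row.text);
        }
    }
    return suspicious;
}
```

Run over the full table above, this would return り, 年 and め, the three rows whose box is 0 0 2550 470.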

hoangtocdo90 · 2017-07-25T13:11:38Z

Thank you, sir.

hoangtocdo90 · 2017-08-31T16:47:10Z

5 1 1 2 1 1 111 170 43 47 96 ご
5 1 1 2 1 2 158 171 107 50 95 飯
5 1 1 2 1 3 271 171 50 50 96 大
5 1 1 2 1 4 338 174 29 47 96 盛
5 1 1 2 1 5 0 0 2550 470 96 り
5 1 1 2 1 10 624 173 87 48 96 初
5 1 1 2 1 11 0 0 2550 470 95 め
5 1 1 2 1 12 722 178 43 41 96 て

Please check this. The coordinates of the り and め characters are still wrong.

GHamrouni · 2017-10-13T13:52:48Z

We can still reproduce it with Arabic in LSTM mode.
Most bounding boxes are correct, but some boxes contain valid text with wrong coordinates (the region inside the bbox is empty).

SimonTheBaptist · 2017-10-16T08:30:34Z

I'm seeing the same behavior for Thai in LSTM mode: BoundingBox() often returns the whole image.
The image size was (400, 266). Here is a small portion of the results, as [X1, Y1; X2, Y2].
(As a side note, I'm using RIL_WORD, but it seems to behave like RIL_SYMBOL; I'm not sure why.)

'ร' - Confidence: 94.3645 [0, 0; 400, 266]
'ม' - Confidence: 95.7061 [19, 68; 33, 77]
'า' - Confidence: 96.9703 [0, 0; 400, 266]
'ส' - Confidence: 96.976 [35, 67; 50, 77]

wanghaisheng · 2017-11-10T02:37:02Z

@amitdo Could you point me to more information on how Tesseract analyzes the input image to get the coordinates of words/characters, recognizes them with the LSTM or legacy engine, and finally combines the OCR result with the coordinates?

amitdo · 2017-11-17T09:48:48Z

@wanghaisheng

See here:
https://github.com/tesseract-ocr/tesseract/blob/master/lstm/recodebeam.cpp
Search for 'box', 'xcoords', 'blob'

aniseddali · 2017-12-15T17:27:37Z

This also happened to me with Arabic. Here is an example that reproduces the problem:


#include <iostream>
#include <string>
#include <vector>
#include <leptonica/allheaders.h>
#include <opencv2/opencv.hpp>
#include <tesseract/baseapi.h>
#include <tesseract/resultiterator.h>

struct OcrResult
{
	std::string text;
	cv::Rect box;
};

int main(int argc, char *argv[])
{
	if (argc < 2)
	{
		std::cerr << "usage: " << argv[0] << " <image>" << std::endl;
		return 1;
	}
	tesseract::TessBaseAPI api;
	if (api.Init("./data/tessdata/", "ara", tesseract::OcrEngineMode::OEM_LSTM_ONLY) != 0)
		return 1;
	api.SetPageSegMode(tesseract::PageSegMode::PSM_SINGLE_WORD);
	PIX *patch_pix = pixRead(argv[1]);
	if (patch_pix == nullptr)
		return 1;
	api.SetImage(patch_pix);
	api.Recognize(0);

	std::vector<OcrResult> ocrResults;
	tesseract::ResultIterator *ri = api.GetIterator();
	const tesseract::PageIteratorLevel level = tesseract::RIL_WORD;
	if (ri != nullptr)
	{
		do
		{
			char *word = ri->GetUTF8Text(level);
			int left, top, right, bottom;
			// Skip elements without text: std::string(nullptr) is undefined behavior.
			if (word != nullptr && ri->BoundingBox(level, &left, &top, &right, &bottom))
			{
				OcrResult res;
				res.box = cv::Rect(left, top, right - left, bottom - top);
				res.text = word;
				ocrResults.push_back(res);
			}
			delete[] word;
		} while (ri->Next(level));
		delete ri;  // GetIterator() returns an object owned by the caller
	}

	// Draw each box in blue (BGR order) and print its text, one word per key press.
	cv::Mat image = cv::imread(argv[1]);
	cv::Mat DrawingImg = image.clone();
	for (const OcrResult &r : ocrResults)
	{
		cv::rectangle(DrawingImg, r.box, cv::Scalar(255, 0, 0), 1);
		std::cout << r.text << std::endl;
		cv::imshow("DrawingImg", DrawingImg);
		cv::waitKey();
	}

	api.End();
	pixDestroy(&patch_pix);
	return 0;
}

I used OpenCV to draw the boxes.
This image contains two Arabic words, and both are recognized correctly.
However, the box of the first word (the word on the right) is wrong:
it matches some noise at the top of the image.

And this is the original image.

lpatruno · 2018-05-04T17:52:20Z

I'm also getting this bug for English text, though I can't provide the data files as they contain PII.

atuyosi · 2018-08-18T14:33:21Z

I got the same issue on beta.4 with jpn.traineddata.

In my case, the invalid coordinate values are correlated with the image size (width, height).
Even for the same letter, the result can be wrong depending on its position in the image.

$ tesseract -l jpn  'images/sample/test-jpn_01.jpg' stdout tsv | grep 596
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 167
1	1	0	0	0	0	0	0	596	118	-1
5	1	1	1	1	10	0	0	596	118	92	、
5	1	1	1	2	4	0	0	596	118	92	字
5	1	1	1	2	10	0	0	596	118	97	て
5	1	1	1	2	15	0	0	596	118	93	す

My test image is 596x118. The same letters appear multiple times (e.g. '字', 'て'), but the bounding box is wrong for only one occurrence.

FYI, in the above image, the character '日' is recognized incorrectly by jpn.traineddata (traineddata_fast).

amitdo · 2018-10-15T10:18:42Z

Same issue as #1192

Shreeshrii mentioned this issue Jul 21, 2017: Box File disorder, Arabic Language #648 (Open)

hoangtocdo90 closed this as completed Jul 25, 2017

hoangtocdo90 reopened this Aug 31, 2017

Shreeshrii mentioned this issue Sep 13, 2017: GetWords can not get right BoundingBox().x and BoundingBox().y #1124 (Closed)

amitdo mentioned this issue Apr 30, 2018: psm 3 and psm 6 skip different parts of text based on font size #538 (Open)

amitdo mentioned this issue Jul 31, 2018: Incorrect character coordinates #1810 (Closed)

troplin mentioned this issue Sep 14, 2018: RFC: Tesseract 4.0.0 – open tasks #1423 (Closed)

zdenop added training accuracy labels Sep 30, 2018

amitdo mentioned this issue Oct 4, 2018: Noise characters recognized with bbox as the entire page #1192 (Open)

RicketyRick mentioned this issue Dec 18, 2019: Overlapping Character Boundingboxes #2825 (Open)

amitdo added the bounding box label Mar 12, 2021
