Wrong coordinates in LSTM OCR mode with Japanese #1015

Open
hoangtocdo90 opened this issue Jun 30, 2017 · 14 comments

Comments

@hoangtocdo90

hoangtocdo90 commented Jun 30, 2017

Hi all,
I'm using Tesseract to get each character in an image together with its coordinates. I'm using ResultIterator with OCR mode = 2 (LSTM) and language = jpn.

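tesseract::ResultIterator* ri = api->GetIterator();
int index_char = 0;
vector char_iterators;
if (ri != nullptr) {
    do {
        char *value = ri->GetUTF8Text(tesseract::RIL_SYMBOL);
        // Treat an unknown value as a space. Test the first byte: comparing
        // the pointer with "" never detects an empty string.
        const char *text = (value == nullptr || value[0] == '\0') ? " " : value;
        float conf = ri->Confidence(tesseract::RIL_SYMBOL);
        int left, top, right, bottom;
        ri->BoundingBox(tesseract::RIL_SYMBOL, &left, &top, &right, &bottom);
        index_char++;
        delete[] value;  // GetUTF8Text returns a new[]-allocated string
    } while (ri->Next(tesseract::RIL_SYMBOL));
    delete ri;  // the caller owns the iterator
}
api->ClearAdaptiveClassifier();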

Here is my program log and the input image. You can see that I got wrong coordinates for the り character. I also tested the TSV and hOCR outputs, but they give the same result: the coordinates are still wrong.

Char value = こ left= 15 top = 14 right = 51 bottom = 51 conf = 99
Char value = ん left= 64 top = 9 right = 112 bottom = 54 conf = 99
Char value = ば left= 122 top = 5 right = 171 bottom = 54 conf = 99
Char value = ん left= 176 top = 9 right = 224 bottom = 54 conf = 99
Char value = は left= 234 top = 9 right = 281 bottom = 54 conf = 99
Char value = こ left= 295 top = 14 right = 331 bottom = 51 conf = 99
Char value = ん left= 344 top = 9 right = 392 bottom = 54 conf = 99
Char value = ば left= 402 top = 5 right = 445 bottom = 54 conf = 99
Char value = ん left= 456 top = 9 right = 497 bottom = 54 conf = 99
Char value = は left= 514 top = 9 right = 561 bottom = 54 conf = 99
Char value = ご left= 15 top = 79 right = 58 bottom = 126 conf = 99
Char value = 飯 left= 62 top = 80 right = 113 bottom = 130 conf = 99
Char value = 大 left= 120 top = 80 right = 225 bottom = 130 conf = 99
Char value = 盛 left= 242 top = 83 right = 260 bottom = 130 conf = 99
Char value = り left= 2328 top = 1616 right = 2328 bottom = 1616 conf = 99
Char value = 。 left= 289 top = 116 right = 305 bottom = 131 conf = 99

jpn msgothic exp0
One more question: I'm trying to add fonts to the jpn data, but maybe I have to retrain from scratch. I don't actually know how the jpn Tesseract data (which I downloaded from the tessdata repository) was made.
I tried downloading the data from the langdata repository, generating images from jpn.traintext, and training with tesstrain.sh and jTessBoxEditor, but I got lower accuracy than with the data I downloaded from the repository. Can somebody tell me exactly how to make it?
Sorry for my bad English.

@kandaman

kandaman commented Jul 4, 2017

I have the same problem. I am using jpn.traineddata.
I tried RIL_SYMBOL and RIL_WORD; RIL_SYMBOL is better.
The critical problem is that the recognized characters are very good but their positions are very bad.
I need each character paired with its image region. Don't you?
If you have new information, please tell me.

Thanks

@hoangtocdo90
Author

I have temporarily worked around this in the following way.
I'm using RIL_SYMBOL. In my case the wrong coordinates usually appear at the end of a line or the end of a block, which can be checked with:
res_it->IsAtFinalElement(RIL_TEXTLINE, RIL_WORD)
res_it->IsAtFinalElement(RIL_PARA, RIL_WORD)
res_it->IsAtFinalElement(RIL_BLOCK, RIL_WORD)
When you get a wrong coordinate, you can predict a new one from the coordinates of the neighbouring ResultIterator elements (the ones before and after it).
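
A minimal sketch of this kind of workaround, under stated assumptions: the boxes of one text line have already been copied out of the ResultIterator into a plain vector, the text runs horizontally from left to right, and a box counts as wrong when it is empty, lies outside the image, or covers the whole image. The names `Box`, `isSuspect` and `repairLine` are hypothetical and are not part of the Tesseract API. The estimate is only a guess at the real position, not a fix for the underlying bug.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain box type (left, top, right, bottom), mirroring the
// four ints that ResultIterator::BoundingBox fills in.
struct Box {
    int left, top, right, bottom;
};

// Heuristic (an assumption, not Tesseract behaviour): a box is suspect if it
// is empty, lies outside the image, or covers the entire image.
bool isSuspect(const Box& b, int imgW, int imgH) {
    const bool empty = b.right <= b.left || b.bottom <= b.top;
    const bool outside = b.left < 0 || b.top < 0 || b.right > imgW || b.bottom > imgH;
    const bool wholeImage = b.left == 0 && b.top == 0 && b.right == imgW && b.bottom == imgH;
    return empty || outside || wholeImage;
}

// Replace each suspect box with an estimate built from its nearest valid
// neighbours in the same line. Boxes with no valid neighbour are left alone.
void repairLine(std::vector<Box>& line, int imgW, int imgH) {
    assert(imgW > 0 && imgH > 0);
    std::vector<bool> bad(line.size());
    for (std::size_t i = 0; i < line.size(); ++i)
        bad[i] = isSuspect(line[i], imgW, imgH);

    for (std::size_t i = 0; i < line.size(); ++i) {
        if (!bad[i]) continue;
        const Box* prev = nullptr;
        const Box* next = nullptr;
        for (std::size_t j = i; j-- > 0;)
            if (!bad[j]) { prev = &line[j]; break; }
        for (std::size_t j = i + 1; j < line.size(); ++j)
            if (!bad[j]) { next = &line[j]; break; }

        if (prev && next) {
            // Fill the horizontal gap between the two valid neighbours.
            line[i] = {prev->right, std::min(prev->top, next->top),
                       next->left, std::max(prev->bottom, next->bottom)};
        } else if (prev) {
            // End of line: assume the same size as the previous symbol.
            const int w = prev->right - prev->left;
            line[i] = {prev->right, prev->top, prev->right + w, prev->bottom};
        } else if (next) {
            // Start of line: assume the same size as the next symbol.
            const int w = next->right - next->left;
            line[i] = {next->left - w, next->top, next->left, next->bottom};
        }
    }
}
```

Several consecutive suspect boxes all receive the same estimate, and the estimate can be empty when the neighbours touch or overlap, so the result still needs a sanity check before it is used.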

@amitdo
Collaborator

amitdo commented Jul 5, 2017

Char value = こ left= 15 top = 14 right = 51 bottom = 51 conf = 99
Char value = ん left= 64 top = 9 right = 112 bottom = 54 conf = 99
Char value = ば left= 122 top = 5 right = 171 bottom = 54 conf = 99
Char value = ん left= 176 top = 9 right = 224 bottom = 54 conf = 99
Char value = は left= 234 top = 9 right = 281 bottom = 54 conf = 99
Char value = こ left= 295 top = 14 right = 331 bottom = 51 conf = 99
Char value = ん left= 344 top = 9 right = 392 bottom = 54 conf = 99
Char value = ば left= 402 top = 5 right = 445 bottom = 54 conf = 99
Char value = ん left= 456 top = 9 right = 497 bottom = 54 conf = 99
Char value = は left= 514 top = 9 right = 561 bottom = 54 conf = 99
Char value = ご left= 15 top = 79 right = 58 bottom = 126 conf = 99
Char value = 飯 left= 62 top = 80 right = 113 bottom = 130 conf = 99
Char value = 大 left= 120 top = 80 right = 225 bottom = 130 conf = 99
Char value = 盛 left= 242 top = 83 right = 260 bottom = 130 conf = 99
Char value = り left= 2328 top = 1616 right = 2328 bottom = 1616 conf = 99
Char value = 。 left= 289 top = 116 right = 305 bottom = 131 conf = 99

Strange. It looks like a bug.

@Shreeshrii
Collaborator

Shreeshrii commented Jul 20, 2017

Please check if this is fixed by the latest set of commits by Ray.

jpn-1.txt
jpn-1.tsv.txt

level	page_num	block_num	par_num	line_num	word_num	left	top	width	height	conf	text
1	1	0	0	0	0	0	0	2550	470	-1	
2	1	1	0	0	0	104	96	2286	348	-1	
3	1	1	1	0	0	111	96	546	49	-1	
4	1	1	1	1	0	111	96	546	49	-1	
5	1	1	1	1	1	111	105	36	37	96	こ
5	1	1	1	1	2	160	100	48	45	96	ん
5	1	1	1	1	3	218	96	49	49	96	ば
5	1	1	1	1	4	272	100	48	45	96	ん
5	1	1	1	1	5	330	100	47	45	96	は
5	1	1	1	1	6	391	105	36	37	95	こ
5	1	1	1	1	7	440	100	48	45	96	ん
5	1	1	1	1	8	498	96	49	49	96	ば
5	1	1	1	1	9	552	100	48	45	96	ん
5	1	1	1	1	10	610	100	47	45	95	は
3	1	1	2	0	0	111	170	962	52	-1	
4	1	1	2	1	0	111	170	962	52	-1	
5	1	1	2	1	1	111	170	43	47	96	ご
5	1	1	2	1	2	158	171	107	50	95	飯
5	1	1	2	1	3	271	171	50	50	96	大
5	1	1	2	1	4	338	174	29	47	96	盛
5	1	1	2	1	5	0	0	2550	470	96	り
5	1	1	2	1	6	385	207	16	15	96	。
5	1	1	2	1	7	439	172	123	50	93	今
5	1	1	2	1	8	0	0	2550	470	95	年
5	1	1	2	1	9	567	171	65	51	96	は
5	1	1	2	1	10	624	173	87	48	96	初
5	1	1	2	1	11	0	0	2550	470	95	め
5	1	1	2	1	12	722	178	43	41	96	て
5	1	1	2	1	13	776	171	48	50	94	恋
5	1	1	2	1	14	832	173	101	48	96	人
5	1	1	2	1	15	944	171	48	50	96	出
5	1	1	2	1	16	1001	173	26	46	96	来
5	1	1	2	1	17	1021	191	25	28	96	た
5	1	1	2	1	18	1057	207	16	15	96	。
3	1	1	3	0	0	106	245	2284	126	-1	
4	1	1	3	1	0	106	245	2284	51	-1	

jpn

@hoangtocdo90
Author

Thank you, sir.

@hoangtocdo90
Author

5 1 1 2 1 1 111 170 43 47 96 ご
5 1 1 2 1 2 158 171 107 50 95 飯
5 1 1 2 1 3 271 171 50 50 96 大
5 1 1 2 1 4 338 174 29 47 96 盛
5 1 1 2 1 5 0 0 2550 470 96 り
5 1 1 2 1 10 624 173 87 48 96 初
5 1 1 2 1 11 0 0 2550 470 95 め
5 1 1 2 1 12 722 178 43 41 96 て

Please check this. The coordinates are still wrong for the り and め characters.

@GHamrouni

We can still reproduce it with Arabic in LSTM mode.
Most bounding boxes are correct, but some boxes carry valid text with wrong coordinates (the region inside the bbox is empty).

@SimonTheBaptist

I'm getting the same behavior for Thai language in LSTM - BoundingBox() often returns the whole image size.
The image size was 400, 266. Here is a small portion of some results [X1, Y1; X2, Y2].
(As a side note, I'm using RIL_WORD, but it seems to behave like RIL_SYMBOL, I'm not sure why).

'ร' - Confidence: 94.3645 [0, 0; 400, 266]
'ม' - Confidence: 95.7061 [19, 68; 33, 77]
'า' - Confidence: 96.9703 [0, 0; 400, 266]
'ส' - Confidence: 96.976 [35, 67; 50, 77]

@wanghaisheng

@amitdo Sir, could you show me where to find more information about how Tesseract analyzes the input image to get the coordinates of words/characters, then recognizes them with LSTM or the old method, and finally combines the recognized words with the coordinates?

@amitdo
Collaborator

amitdo commented Nov 17, 2017

@aniseddali

This also happened to me with Arabic. Here is an example that reproduces the problem.


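#include <iostream>
#include <string>
#include <vector>
#include <leptonica/allheaders.h>
#include <opencv2/opencv.hpp>
#include <tesseract/baseapi.h>

struct OcrResult
{
	std::string text;
	cv::Rect box;
};
int main(int argc, char *argv[])
{
	if (argc < 2)
		return 1;
	tesseract::TessBaseAPI tesseract;
	// Init returns 0 on success
	if (tesseract.Init("./data/tessdata/", "ara", tesseract::OcrEngineMode::OEM_LSTM_ONLY) != 0)
		return 1;
	tesseract.SetPageSegMode(tesseract::PageSegMode::PSM_SINGLE_WORD);
	PIX *patch_pix = pixRead(argv[1]);
	if (patch_pix == nullptr)
		return 1;
	tesseract.SetImage(patch_pix);
	tesseract.Recognize(0);
	std::vector<OcrResult> ocrResults;
	tesseract::ResultIterator *ri = tesseract.GetIterator();
	tesseract::PageIteratorLevel level = tesseract::RIL_WORD;
	if (ri != nullptr)
	{
		do
		{
			// GetUTF8Text may return nullptr; the result must be freed with delete[]
			char *word = ri->GetUTF8Text(level);
			int left, top, right, bottom;
			if (word != nullptr && ri->BoundingBox(level, &left, &top, &right, &bottom))
			{
				OcrResult res;
				res.box = cv::Rect(left, top, right - left, bottom - top);
				res.text = std::string(word);
				ocrResults.push_back(res);
			}
			delete[] word;
		} while (ri->Next(level));
		delete ri; // the caller owns the iterator
	}
	// Release Tesseract and Leptonica resources before drawing
	tesseract.End();
	pixDestroy(&patch_pix);
	cv::Mat image = cv::imread(argv[1]);
	cv::Mat DrawingImg = image.clone();
	// Draw each box in turn and wait for a key press between boxes
	for (size_t i = 0; i < ocrResults.size(); i++)
	{
		cv::Rect rect = ocrResults[i].box;
		cv::rectangle(DrawingImg, rect, cv::Scalar(255, 0, 0), 1);
		std::cout << ocrResults[i].text << std::endl;
		cv::imshow("DrawingImg", DrawingImg);
		cv::waitKey();
	}
	return 0;
}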

I used OpenCV to draw the boxes.
This image contains two Arabic words. The recognition is correct for both words,
but the box position of the first word (the one on the right) is wrong.
The box matches some noise at the top of the image.

boxes

And this is the original image.

box

@lpatruno

lpatruno commented May 4, 2018

I'm also getting this bug for English text, though I can't provide the data files as they contain PII.

@atuyosi
Contributor

atuyosi commented Aug 18, 2018

I got the same issue on beta.4 with jpn.traineddata.

In my case, the image size (width, height) and the invalid coordinate values are correlated.
Even for the same character, the result can be incorrect depending on its position in the image.

$ tesseract -l jpn  'images/sample/test-jpn_01.jpg' stdout tsv | grep 596
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 167
1	1	0	0	0	0	0	0	596	118	-1
5	1	1	1	1	10	0	0	596	118	92	、
5	1	1	1	2	4	0	0	596	118	92	字
5	1	1	1	2	10	0	0	596	118	97	て
5	1	1	1	2	15	0	0	596	118	93	す

My test image size is 596x118. The same character appears multiple times (e.g. '字', 'て'), but its bounding box is wrong only once.
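
The correlation described above suggests a simple check that could be run over the TSV output: flag every level-5 row whose box is identical to the page box reported by the level-1 row. The following is a rough sketch only. `TsvRow`, `parseRow` and `pageSizedWords` are hypothetical helper names, and the column order is assumed to match the TSV header shown earlier in this thread (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text).

```cpp
#include <sstream>
#include <string>
#include <vector>

// One row of the TSV output, using the column order assumed above.
struct TsvRow {
    int level, page, block, par, line, word;
    int left, top, width, height;
    double conf;
    std::string text;
};

// Parse one tab-separated data line. Returns false for the header line
// or for any line whose numeric columns cannot be read.
bool parseRow(const std::string& s, TsvRow& r) {
    std::istringstream in(s);
    if (!(in >> r.level >> r.page >> r.block >> r.par >> r.line >> r.word
             >> r.left >> r.top >> r.width >> r.height >> r.conf))
        return false;
    r.text.clear();
    in >> r.text;  // stays empty for rows without text (page, block, ...)
    return true;
}

// Return the level-5 rows whose box equals the box of the level-1 (page)
// row with the same page number.
std::vector<TsvRow> pageSizedWords(const std::vector<std::string>& lines) {
    std::vector<TsvRow> rows;
    for (const auto& s : lines) {
        TsvRow r{};
        if (parseRow(s, r)) rows.push_back(r);
    }
    std::vector<TsvRow> flagged;
    for (const auto& w : rows) {
        if (w.level != 5) continue;
        for (const auto& p : rows) {
            if (p.level == 1 && p.page == w.page && p.left == w.left &&
                p.top == w.top && p.width == w.width && p.height == w.height) {
                flagged.push_back(w);
                break;
            }
        }
    }
    return flagged;
}
```

Note that the TSV columns are left, top, width and height, whereas `BoundingBox` reports left, top, right and bottom, so the same check on iterator output has to compare right and bottom with the image width and height.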

test-jpn_01

FYI, in the above image, the character '日' is recognized incorrectly by jpn.traineddata (traineddata_fast).

@amitdo
Collaborator

amitdo commented Oct 15, 2018

Same issue as #1192
