
LSTM vs BLSTM vs MDLSTM #630

Closed
ghost opened this issue Dec 30, 2016 · 15 comments

@ghost

ghost commented Dec 30, 2016

I understand that much of the new Tesseract 4.0 uses a customized implementation of OCRopus, relying essentially on the new LSTM recognition engine.

The main problem is that most of the decisions being taken focus on English and other Latin-script languages, which can already reach 95%+ recognition rates easily. My concern is letting other languages, such as Arabic, reach that precision ceiling as well.

Methods such as BLSTM (bidirectional LSTM) and the two-dimensional variant, MDLSTM, can achieve character-level accuracies of 92% and 96% respectively without explicit segmentation of words (a toy sketch of the idea follows after the links below).

So my question is: are there plans to extend the current LSTM to an MDLSTM (multi-dimensional LSTM)? That could help all languages pass that precision ceiling.

I am planning to test Tesseract 4.0's LSTM on Arabic and want to post results in the future; I hope to see a recognition improvement while testing.
Thank you, Ray, and all contributors, for your hard work; you are appreciated.

More information about BLSTM and MDLSTM:
https://www.nist.gov/sites/default/files/documents/itl/iad/mig/OpenHaRT2013_WorkshopPres_A2IA.pdf
http://www.a2ialab.com/lib/exe/fetch.php?media=presentations:icdar2015_chinese_slides.pdf
https://goo.gl/0wUNfm
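
For readers unfamiliar with the idea, here is a toy sketch (plain NumPy, unrelated to Tesseract's or OCRopus's actual code) of what "recognition without explicit segmentation" means: a bidirectional LSTM reads the pixel columns of a line image in both directions and emits one character distribution per column; a CTC-style decoder then turns those per-column outputs into text, so character boundaries never have to be found explicitly.

```python
# Toy BLSTM line reader: one timestep per pixel column, no character segmentation.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix holds all 4 gates (input, forget, output, candidate).
        self.W = rng.standard_normal((4 * hidden_size, input_size + hidden_size)) * 0.1
        self.b = np.zeros(4 * hidden_size)

    def step(self, x, h_prev, c_prev):
        z = self.W @ np.concatenate([x, h_prev]) + self.b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c_prev + i * g
        h = o * np.tanh(c)
        return h, c

def blstm_over_columns(line_image, fwd, bwd, W_out, b_out):
    """line_image: (height, width) array; each pixel column is one timestep."""
    _, width = line_image.shape
    h_f, c_f = np.zeros(fwd.hidden_size), np.zeros(fwd.hidden_size)
    h_b, c_b = np.zeros(bwd.hidden_size), np.zeros(bwd.hidden_size)
    fwd_states, bwd_states = [], []
    for t in range(width):                      # left-to-right pass
        h_f, c_f = fwd.step(line_image[:, t], h_f, c_f)
        fwd_states.append(h_f)
    for t in reversed(range(width)):            # right-to-left pass
        h_b, c_b = bwd.step(line_image[:, t], h_b, c_b)
        bwd_states.append(h_b)
    bwd_states.reverse()
    outputs = []
    for h_f, h_b in zip(fwd_states, bwd_states):
        logits = W_out @ np.concatenate([h_f, h_b]) + b_out
        e = np.exp(logits - logits.max())
        outputs.append(e / e.sum())             # softmax over character classes
    return np.stack(outputs)                    # shape: (width, num_classes)

rng = np.random.default_rng(0)
height, hidden, num_classes = 48, 32, 120       # 120 stands in for an Arabic charset
fwd, bwd = LSTMCell(height, hidden, rng), LSTMCell(height, hidden, rng)
W_out = rng.standard_normal((num_classes, 2 * hidden)) * 0.1
b_out = np.zeros(num_classes)
line = rng.random((height, 200))                # stand-in for a normalized line image
probs = blstm_over_columns(line, fwd, bwd, W_out, b_out)
print(probs.shape)                              # (200, 120): one distribution per column
```

In a real system a CTC decoder would collapse these per-column distributions into a character string; an MDLSTM replaces the column-wise recurrence with recurrences over both image dimensions.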

@amitdo
Collaborator

amitdo commented Dec 30, 2016

In principle, Tesseract is probably as accurate (or slightly more accurate) than ocropy/clstm.

Tesseract has official trained models for ~100 languages. ocropy has official models for English and German only. Unlike ocropus, Tesseract works on Windows.

BLSTM is implemented and used.

2D-LSTM is also implemented in the library. I think (though I'm not sure) it is not used by the released traineddata files. Using 2D-LSTM means a much longer training time, and for OCRing printed text the accuracy will not necessarily be better than with a 1D BLSTM.

BTW, ocropy doesn't have 2D-LSTM support.
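
(For anyone testing, a minimal sketch of how to pick the engine at run time, assuming pytesseract, Pillow, and a Tesseract 4.0 install whose traineddata includes both the legacy and the LSTM components; --oem 1 selects the new LSTM engine, --oem 0 the legacy one, and --psm 6 simply assumes a single block of text. The file name is a placeholder.)

```python
# Minimal sketch: run the same page through the legacy engine and the new LSTM
# engine and compare the output. Assumes pytesseract and Pillow are installed
# and the traineddata contains both engines; the file name is a placeholder.
from PIL import Image
import pytesseract

image = Image.open("arabic_page.png")

legacy = pytesseract.image_to_string(image, lang="ara", config="--oem 0 --psm 6")
lstm = pytesseract.image_to_string(image, lang="ara", config="--oem 1 --psm 6")

print("Legacy engine:\n", legacy)
print("LSTM engine:\n", lstm)
```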

@Shreeshrii
Collaborator

Shreeshrii commented Dec 30, 2016 via email

@amitdo
Collaborator

amitdo commented Dec 30, 2016

> The main problem is that most of the decisions being taken focus on English and other Latin-script languages

This is not really the situation with the LSTM engine.

The difference in accuracy between Latin-script languages and Arabic is due to:

  1. Better traineddata files for the Latin-script languages.
  2. The 'complexity' of the script; Arabic is much more complex.

@amitdo
Collaborator

amitdo commented Dec 30, 2016

Also, the OCR stage depends on the layout analysis stage, which is weaker for Arabic.

@amitdo
Collaborator

amitdo commented Dec 30, 2016

Shree, Indic scripts are even more complex...

@roozgar

roozgar commented Dec 30, 2016 via email

@ghost
Author

ghost commented Dec 30, 2016

@amitdo Thanks for clearing things up; improved pre-processing may let a 1D LSTM outperform the more complex MDLSTM. You were right.
I see: the main issue is not the OCR engine directly, but layout analysis/segmentation/classification.
Perhaps I should focus on a combination of Tesseract's LSTM and a computer-assisted transcription method,
somewhat similar to: https://sites.google.com/site/paradiitproject/project-definition

@Shreeshrii So Tesseract 4.x has the capability to produce more sophisticated and complex structures.

@roozgar I was looking for a method that achieves an 85%+ recognition rate for Arabic.
Tesseract 3.x used Cube for Arabic, which made me lose hope, but thanks to the developers of Tesseract 4.0 for introducing the new LSTM engine, the hope is back and the community is excited. I am looking forward to testing this version after reading that you got 80% recognition.
@roozgar, can you share your training process, the tif/box files, and the traineddata?
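
If results do get posted, it would help to measure them the same way. Here is a minimal sketch for computing character-level accuracy from an OCR output file and a ground-truth transcription, using only the Python standard library; the file names are placeholders.

```python
# Minimal sketch: character accuracy = matched characters / ground-truth length,
# computed with difflib from the standard library. File names are placeholders.
import difflib

def character_accuracy(ocr_text: str, truth_text: str) -> float:
    matcher = difflib.SequenceMatcher(None, truth_text, ocr_text, autojunk=False)
    # Total size of the character blocks shared by OCR output and ground truth.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(truth_text), 1)

with open("page_ocr.txt", encoding="utf-8") as f:
    ocr = f.read()
with open("page_truth.txt", encoding="utf-8") as f:
    truth = f.read()

print(f"Character accuracy: {character_accuracy(ocr, truth):.1%}")
```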

@roozgar

roozgar commented Dec 30, 2016 via email

@ghost
Author

ghost commented Dec 30, 2016

@roozgar what operating system are you using?

@Shreeshrii
Collaborator

Shreeshrii commented Dec 31, 2016 via email

@roozgar

roozgar commented Dec 31, 2016 via email

@ghost ghost closed this as completed Jan 2, 2017
@amitdo
Collaborator

amitdo commented Jan 8, 2017

This is what is used for most of the languages:
https://github.com/tesseract-ocr/tesseract/wiki/VGSLSpecs#full-example-a-multi-layer-lstm-capable-of-high-quality-ocr

I think it is 2D-LSTM.
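
For anyone trying to read that page: it documents Tesseract's VGSL network-description language. Below is an illustrative spec in the same format with the layer syntax decoded in comments; the layer sizes here are invented for the example, so check the link for the actual string. A spec like this is what gets passed to lstmtraining's --net_spec when training a network from scratch.

```python
# Illustrative VGSL-style spec (made-up sizes; see the wiki link for the real one).
net_spec = (
    "[1,36,0,1 "   # input: batch 1, height 36, variable width, 1 channel (grayscale)
    "Ct5,5,16 "    # 5x5 convolution with 16 outputs and tanh non-linearity
    "Mp3,3 "       # 3x3 max-pooling
    "Lfys48 "      # LSTM run over the y (vertical) dimension, summarizing it away,
                   # which collapses the 2-D input to a 1-D sequence along x
    "Lfx96 "       # forward LSTM along x (the writing direction)
    "Lrx96 "       # reversed LSTM along x; with the previous layer this gives
                   # bidirectional (BLSTM-style) context
    "Lfx256 "      # another forward LSTM along x
    "O1c105]"      # 1-D softmax output over 105 character classes, trained with CTC
)
print(net_spec)
```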

@ghost
Author

ghost commented Jan 9, 2017

@amitdo Thanks. I had been told that the 4.x version of Tesseract would be the next big leap; now I believe it.

@amitdo
Collaborator

amitdo commented Oct 25, 2017

@olfaa

olfaa commented Apr 20, 2018

  1. Please, how can I use a BLSTM to segment a page into text lines?
  2. Can you give me a BLSTM architecture model that relates to online document page segmentation of text?
Thank you :)

This issue was closed.