
LSTM vs BLSTM vs MDLSTM #630

Closed
ghost opened this issue Dec 30, 2016 · 15 comments

@ghost

ghost commented Dec 30, 2016

I understand that much of the new Tesseract 4.0 uses a customized implementation of OCRopus, relying essentially on the new LSTM recognition engine.

The main problem is that most of the decisions being taken focus on English and other Latin-script languages, which can already reach 95%+ recognition rates easily. My concern is letting other languages, such as Arabic, reach that precision ceiling as well.

Methods such as BLSTM (bidirectional LSTM) and the two-dimensional variant, MDLSTM, can achieve character-level accuracies of 92% and 96% respectively without explicit segmentation of words (a toy sketch of the idea follows after the links below).

So my question is: are there plans to extend the current LSTM to an MDLSTM (multi-dimensional LSTM)? That could help all languages pass that precision ceiling.

I am planning to test Tesseract 4.0's LSTM on Arabic and want to post results in the future; I hope to see a recognition improvement while testing.
Thank you, Ray, and all contributors, for your hard work; you are appreciated.

More information about BLSTM and MDLSTM:
https://www.nist.gov/sites/default/files/documents/itl/iad/mig/OpenHaRT2013_WorkshopPres_A2IA.pdf
http://www.a2ialab.com/lib/exe/fetch.php?media=presentations:icdar2015_chinese_slides.pdf
https://goo.gl/0wUNfm
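
For readers unfamiliar with the idea, here is a toy sketch (plain NumPy, unrelated to Tesseract's or OCRopus's actual code) of what "recognition without explicit segmentation" means: a bidirectional LSTM reads the pixel columns of a line image in both directions and emits one character distribution per column; a CTC-style decoder then turns those per-column outputs into text, so character boundaries never have to be found explicitly.

```python
# Toy BLSTM line reader: one timestep per pixel column, no character segmentation.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix holds all 4 gates (input, forget, output, candidate).
        self.W = rng.standard_normal((4 * hidden_size, input_size + hidden_size)) * 0.1
        self.b = np.zeros(4 * hidden_size)

    def step(self, x, h_prev, c_prev):
        z = self.W @ np.concatenate([x, h_prev]) + self.b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c_prev + i * g
        h = o * np.tanh(c)
        return h, c

def blstm_over_columns(line_image, fwd, bwd, W_out, b_out):
    """line_image: (height, width) array; each pixel column is one timestep."""
    _, width = line_image.shape
    h_f, c_f = np.zeros(fwd.hidden_size), np.zeros(fwd.hidden_size)
    h_b, c_b = np.zeros(bwd.hidden_size), np.zeros(bwd.hidden_size)
    fwd_states, bwd_states = [], []
    for t in range(width):                      # left-to-right pass
        h_f, c_f = fwd.step(line_image[:, t], h_f, c_f)
        fwd_states.append(h_f)
    for t in reversed(range(width)):            # right-to-left pass
        h_b, c_b = bwd.step(line_image[:, t], h_b, c_b)
        bwd_states.append(h_b)
    bwd_states.reverse()
    outputs = []
    for h_f, h_b in zip(fwd_states, bwd_states):
        logits = W_out @ np.concatenate([h_f, h_b]) + b_out
        e = np.exp(logits - logits.max())
        outputs.append(e / e.sum())             # softmax over character classes
    return np.stack(outputs)                    # shape: (width, num_classes)

rng = np.random.default_rng(0)
height, hidden, num_classes = 48, 32, 120       # 120 stands in for an Arabic charset
fwd, bwd = LSTMCell(height, hidden, rng), LSTMCell(height, hidden, rng)
W_out = rng.standard_normal((num_classes, 2 * hidden)) * 0.1
b_out = np.zeros(num_classes)
line = rng.random((height, 200))                # stand-in for a normalized line image
probs = blstm_over_columns(line, fwd, bwd, W_out, b_out)
print(probs.shape)                              # (200, 120): one distribution per column
```

In a real system a CTC decoder would collapse these per-column distributions into a character string; an MDLSTM replaces the column-wise recurrence with recurrences over both image dimensions.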

@amitdo
Collaborator

amitdo commented Dec 30, 2016

In principle, Tesseract is probably as accurate (or slightly more accurate) than ocropy/clstm.

Tesseract has official trained models for ~100 languages. ocropy has official models for English and German only. Unlike ocropus, Tesseract works on Windows.

BLSTM is implemented and used.

2D-LSTM is also implemented in the library. I think (though I'm not sure) it is not used by the released traineddata files. Using 2D-LSTM means a much longer training time, and for OCRing printed text the accuracy will not necessarily be better than with a 1D BLSTM.

BTW, ocropy doesn't have 2D-LSTM support.
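
(For anyone testing, a minimal sketch of how to pick the engine at run time, assuming pytesseract, Pillow, and a Tesseract 4.0 install whose traineddata includes both the legacy and the LSTM components; --oem 1 selects the new LSTM engine, --oem 0 the legacy one, and --psm 6 simply assumes a single block of text. The file name is a placeholder.)

```python
# Minimal sketch: run the same page through the legacy engine and the new LSTM
# engine and compare the output. Assumes pytesseract and Pillow are installed
# and the traineddata contains both engines; the file name is a placeholder.
from PIL import Image
import pytesseract

image = Image.open("arabic_page.png")

legacy = pytesseract.image_to_string(image, lang="ara", config="--oem 0 --psm 6")
lstm = pytesseract.image_to_string(image, lang="ara", config="--oem 1 --psm 6")

print("Legacy engine:\n", legacy)
print("LSTM engine:\n", lstm)
```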

@Shreeshrii
Collaborator

Shreeshrii commented Dec 30, 2016 via email

@amitdo
Collaborator

amitdo commented Dec 30, 2016

> The main problem is that most of the decisions being taken focus on English and other Latin-script languages

This is not really the situation with the LSTM engine.

The difference in accuracy between Latin-script languages and Arabic is due to:

  1. Better traineddata files for the Latin-script languages.
  2. The 'complexity' of the script; Arabic is much more complex.

@amitdo
Collaborator

amitdo commented Dec 30, 2016

Also, the OCR stage depends on the layout analysis stage, which is weaker for Arabic.

@amitdo
Collaborator

amitdo commented Dec 30, 2016

Shree, Indic scripts are even more complex...

@roozgar

roozgar commented Dec 30, 2016 via email

@ghost
Author

ghost commented Dec 30, 2016

@amitdo Thanks for clearing things up; improved pre-processing may let a 1D LSTM outperform the more complex MDLSTM. You were right.
I see: the main issue is not the OCR engine directly, but layout analysis/segmentation/classification.
Perhaps I should focus on a combination of Tesseract's LSTM and a computer-assisted transcription method,
somewhat similar to: https://sites.google.com/site/paradiitproject/project-definition

@Shreeshrii So Tesseract 4.x has the capability to produce more sophisticated and complex structures.

@roozgar I was looking for a method that achieves an 85%+ recognition rate for Arabic.
Tesseract 3.x used Cube for Arabic, which made me lose hope, but thanks to the developers of Tesseract 4.0 for introducing the new LSTM engine, the hope is back and the community is excited. I am looking forward to testing this version after reading that you got 80% recognition.
@roozgar, can you share your training process, the tif/box files, and the traineddata?
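
If results do get posted, it would help to measure them the same way. Here is a minimal sketch for computing character-level accuracy from an OCR output file and a ground-truth transcription, using only the Python standard library; the file names are placeholders.

```python
# Minimal sketch: character accuracy = matched characters / ground-truth length,
# computed with difflib from the standard library. File names are placeholders.
import difflib

def character_accuracy(ocr_text: str, truth_text: str) -> float:
    matcher = difflib.SequenceMatcher(None, truth_text, ocr_text, autojunk=False)
    # Total size of the character blocks shared by OCR output and ground truth.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(truth_text), 1)

with open("page_ocr.txt", encoding="utf-8") as f:
    ocr = f.read()
with open("page_truth.txt", encoding="utf-8") as f:
    truth = f.read()

print(f"Character accuracy: {character_accuracy(ocr, truth):.1%}")
```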

@roozgar

roozgar commented Dec 30, 2016 via email

@ghost
Author

ghost commented Dec 30, 2016

@roozgar what operating system are you using?

@Shreeshrii
Collaborator

Shreeshrii commented Dec 31, 2016 via email

@roozgar

roozgar commented Dec 31, 2016 via email

@ghost ghost closed this as completed Jan 2, 2017
@amitdo
Collaborator

amitdo commented Jan 8, 2017

This is what is used for most of the languages:
https://github.com/tesseract-ocr/tesseract/wiki/VGSLSpecs#full-example-a-multi-layer-lstm-capable-of-high-quality-ocr

I think it is 2D-LSTM.
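
For anyone trying to read that page: it documents Tesseract's VGSL network-description language. Below is an illustrative spec in the same format with the layer syntax decoded in comments; the layer sizes here are invented for the example, so check the link for the actual string. A spec like this is what gets passed to lstmtraining's --net_spec when training a network from scratch.

```python
# Illustrative VGSL-style spec (made-up sizes; see the wiki link for the real one).
net_spec = (
    "[1,36,0,1 "   # input: batch 1, height 36, variable width, 1 channel (grayscale)
    "Ct5,5,16 "    # 5x5 convolution with 16 outputs and tanh non-linearity
    "Mp3,3 "       # 3x3 max-pooling
    "Lfys48 "      # LSTM run over the y (vertical) dimension, summarizing it away,
                   # which collapses the 2-D input to a 1-D sequence along x
    "Lfx96 "       # forward LSTM along x (the writing direction)
    "Lrx96 "       # reversed LSTM along x; with the previous layer this gives
                   # bidirectional (BLSTM-style) context
    "Lfx256 "      # another forward LSTM along x
    "O1c105]"      # 1-D softmax output over 105 character classes, trained with CTC
)
print(net_spec)
```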

@ghost
Author

ghost commented Jan 9, 2017

@amitdo Thanks. I had been told that the 4.x version of Tesseract would be the next big leap; now I believe it.

@amitdo
Collaborator

amitdo commented Oct 25, 2017

@olfaa

olfaa commented Apr 20, 2018

  1. Please, how can I use a BLSTM to segment a page into text lines?
  2. Can you give me a BLSTM architecture model that relates to online document page segmentation of text?
Thank you :)

This issue was closed.