
Build failure with leptonica 1.83 #87

Closed
risicle opened this issue Jan 28, 2023 · 7 comments

Comments

@risicle

risicle commented Jan 28, 2023

Leptonica 1.83 moved a number of struct definitions into "private" headers, notably Pix and Box et al.

This causes a build failure:

src/TessTools.cpp:147:25: error: member access into incomplete type 'PIX' (aka 'Pix')
   l_uint32 *datas = pixs->data;
...

To address this, an extra #include <leptonica/pix_internal.h> needs to be added to src/TessTools.h.

On top of this, it looks like this version got rid of the library's lept alias, so references to -llept in qt-box-editor.pro need to be switched to -lleptonica.
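A minimal sketch of the header change, assuming TessTools.h reaches Leptonica through <leptonica/...> include paths as described above. It pulls in pix_internal.h only for Leptonica 1.83 and newer, using the LIBLEPT_MAJOR_VERSION / LIBLEPT_MINOR_VERSION macros from allheaders.h, so the same header keeps building against older releases. The helper needsPixInternal is hypothetical, added only to make the version rule explicit and checkable.

```cpp
// Sketch for src/TessTools.h (assumption: Leptonica headers live under
// <leptonica/...>). __has_include keeps the sketch compilable on
// machines without Leptonica installed.
#if defined(__has_include)
#  if __has_include(<leptonica/allheaders.h>)
#    include <leptonica/allheaders.h>
     // Leptonica 1.83 moved the Pix/Box struct definitions into a private
     // header, so member access like pixs->data needs pix_internal.h.
     // Undefined macros evaluate to 0, so very old releases skip this.
#    if LIBLEPT_MAJOR_VERSION > 1 || \
        (LIBLEPT_MAJOR_VERSION == 1 && LIBLEPT_MINOR_VERSION >= 83)
#      include <leptonica/pix_internal.h>
#    endif
#  endif
#endif

// Hypothetical helper mirroring the preprocessor check above:
// true when Leptonica major.minor is 1.83 or newer.
constexpr bool needsPixInternal(int major, int minor) {
    return major > 1 || (major == 1 && minor >= 83);
}
```

An alternative that avoids the private header entirely is to replace direct field access such as pixs->data with Leptonica's public accessors (pixGetData(), pixGetWpl(), and so on), which do not depend on the struct layout. The -llept to -lleptonica change in qt-box-editor.pro is still needed separately.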

@zdenop
Owner

zdenop commented Jan 29, 2023

qt-box-editor was, IMO, relevant for tesseract 3.x training (the legacy engine) and does not provide any value for the current tesseract version...
So what is the value in being able to build it with the latest versions of leptonica and tesseract?

@risicle
Author

risicle commented Jan 30, 2023

Simply that older leptonica versions have security vulnerabilities, meaning we (NixOS) can't ship them.

Perhaps this is an indication that we should just drop the qt-box-editor package, but as long as it's relatively straightforward to keep it building, we probably will do so with patches.

@zdenop
Owner

zdenop commented Jan 31, 2023

It is not a problem to include a patch here; I just wonder whether people are really actively using this.

@dpward

dpward commented Mar 11, 2023

Yes. The current version of Tesseract still supports the legacy (non-LSTM) OCR engine. The LSTM model takes significantly longer to train, according to the Tesseract documentation itself.

@zdenop
Owner

zdenop commented Mar 11, 2023

The LSTM engine does not need to be trained from scratch (the legacy engine does). E.g. you can fine-tune it to fix only the problem cases.
IMO LSTM training is (or could be) faster, as you do not need to take care of the bounding boxes of letters, and training based on tutorials like this seems to be pretty easy.

Anyway, I made the requested changes to the QTB code.

@dpward

dpward commented Mar 11, 2023

Unfortunately LSTM doesn't seem to work well on matching basic monospace without word recognition.

@zdenop
Owner

zdenop commented Oct 14, 2024

Fixed.

zdenop closed this as completed Oct 14, 2024