Bad .box in tesseract 4 #1448

Closed
int255 opened this issue Apr 3, 2018 · 4 comments

int255 commented Apr 3, 2018


Environment

  • tesseract 4.00.00alpha
  • Commit Number:
  • macOS High Sierra (10.13.4)
    Darwin ****** 17.5.0 Darwin Kernel Version 17.5.0: Tue Mar 13 20:39:15 PDT 2018; root:xnu-4570.51.1~36/RELEASE_X86_64 x86_64

Current Behavior:

  1. Run tesseract to generate a .box file:
    tesseract -l chi_sim --oem 1 raster_20.png a -c tessedit_create_boxfile=1
    (using tessdata_best)
  2. Open the result in jTessBoxEditor to check the bounding boxes (see the attached screenshot).
    Although the characters are recognized correctly, the bounding boxes are badly misplaced.
    The result is much worse than with tesseract 3.05.0 and is not usable at all.
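
For anyone who wants to inspect the generated .box file without a GUI editor, here is a minimal Python sketch. It assumes the usual Tesseract box layout of one symbol per line, `symbol left bottom right top page`, with the origin at the bottom-left corner of the image. The names `Box`, `parse_box_text` and `to_top_left` are made up for this example and are not part of Tesseract.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """One line of a box file (hypothetical helper type, not a Tesseract API)."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Split from the right so the symbol field is taken whole,
    # even if it is longer than one character.
    parts = line.rstrip("\n").rsplit(" ", 5)
    if len(parts) != 6:
        raise ValueError(f"not a box line: {line!r}")
    left, bottom, right, top, page = (int(p) for p in parts[1:])
    return Box(parts[0], left, bottom, right, top, page)


def parse_box_text(text: str) -> List[Box]:
    # Skip blank lines; every other line must be a valid box line.
    return [parse_box_line(l) for l in text.splitlines() if l.strip()]


def to_top_left(box: Box, image_height: int) -> Tuple[int, int, int, int]:
    # Convert bottom-left-origin coordinates to (x, y, width, height)
    # with a top-left origin, as most image libraries expect.
    return (box.left, image_height - box.top,
            box.right - box.left, box.top - box.bottom)
```

Printing the width and height of each box this way makes it easy to spot boxes that are far larger than any glyph on the page.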

screen shot 2018-04-03 at 11 34 08

Expected Behavior:

The bounding boxes should tightly surround each glyph. My input is already a very clean binary image.
Instead, the bounding boxes are much worse than those from tesseract 3.05.00 and are totally unusable.
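
To make "tightly surrounding" measurable, one rough approach is to compute the tight box of the ink pixels inside each reported box and compare the two. The sketch below is only an illustration under stated assumptions: the image is a list of rows of 0/1 values (1 = ink), rectangles are `(left, top, right, bottom)` with a top-left origin and exclusive right/bottom edges (so Tesseract's bottom-left-origin boxes would need converting first), and `tight_bbox` and `iou` are hypothetical helper names.

```python
from typing import List, Optional, Tuple

# (left, top, right, bottom), top-left origin, right/bottom exclusive.
Rect = Tuple[int, int, int, int]


def tight_bbox(image: List[List[int]], region: Rect) -> Optional[Rect]:
    """Smallest rectangle containing all ink pixels inside `region`."""
    left, top, right, bottom = region
    xs, ys = [], []
    for y in range(max(top, 0), min(bottom, len(image))):
        row = image[y]
        for x in range(max(left, 0), min(right, len(row))):
            if row[x]:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no ink inside the region at all
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def iou(a: Rect, b: Rect) -> float:
    """Intersection over union of two rectangles (1.0 means identical)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A reported box whose IoU with its own tight box is close to 1.0 fits the glyph well; a low value means the box contains a lot of empty space or spans more than the glyph.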

Suggested Fix:

N/A

I have also attached the raw PNG:
raster_20

Shreeshrii (Collaborator) commented Apr 3, 2018 via email

Shreeshrii (Collaborator) commented

Duplicate of #1276

@zdenop Please close.


sagimann commented Oct 3, 2018

The problem is that with --oem 0, OCR does not work well on non-solid backgrounds. The point is: if the bounding boxes are not used by the line recognizer, what other data is available to correctly locate each symbol in the image?

Shreeshrii (Collaborator) commented

@sagimann Please post your comment under #1276 so that all related discussion about incorrect box coordinates is in one place. Thanks!
