Need to extract more features #1

Closed
OlivierBinette opened this issue Jan 15, 2021 · 1 comment · Fixed by #3

Comments

@OlivierBinette
Member

We need to extract two more features: word confidence and font size.

@zbw8388

zbw8388 commented Jan 17, 2021

One possible way to get the font size: https://github.com/tesseract-ocr/tesseract/blob/0f7212bba7d075d357ab211c9a5617750bdb8f1a/src/ccmain/ltrresultiterator.cpp#L179

Relevant discussion: tesseract-ocr/tesseract#1074 (comment)

An easier way to do this: edit the config file in [tesseract base folder]/tessdata/configs and change hocr_font_info 0 to hocr_font_info 1. Alternatively (and preferably), we can add a new config file to this repo so that people don't have to modify their own Tesseract installation.
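A rough sketch of the repo-local config idea, in Python. Several things here are assumptions rather than anything confirmed in this thread: the file name hocr_font and the helper names are hypothetical, the config contents assume that tessedit_create_hocr 1 plus hocr_font_info 1 is all that is needed, and the title parser assumes hOCR word titles carry fields like "x_wconf 93; x_fsize 12". The command relies on the tesseract CLI accepting config files as trailing arguments.

```python
import re
from pathlib import Path

# Assumed config contents: turn on hOCR output and font info.
HOCR_FONT_CONFIG = "tessedit_create_hocr 1\nhocr_font_info 1\n"


def write_hocr_font_config(directory, name="hocr_font"):
    """Write the config file (hypothetical name) and return its path."""
    path = Path(directory) / name
    path.write_text(HOCR_FONT_CONFIG, encoding="utf-8")
    return path


def tesseract_command(image, output_base, config_path):
    """Build the CLI call: tesseract IMAGE OUTPUTBASE CONFIGFILE."""
    return ["tesseract", str(image), str(output_base), str(config_path)]


def parse_word_title(title):
    """Pull x_wconf and, if present, x_fsize from an hOCR word title.

    The title layout is an assumption; fields that are missing are
    simply left out of the result.
    """
    features = {}
    for key in ("x_wconf", "x_fsize"):
        match = re.search(rf"\b{key}\s+(\d+(?:\.\d+)?)", title)
        if match:
            features[key] = float(match.group(1))
    return features
```

The command list can be passed to subprocess.run once Tesseract is installed. Keeping the config in the repo means the user's tessdata/configs folder stays untouched.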
