Add different classes to hocr output depending on BlockType #2432

Merged
merged 1 commit into from
May 16, 2019

Conversation

nickjwhite

These classes are taken from the hOCR specification, and seem
to map well onto the BlockType types. There are probably more that
could be added.

Note that I haven't read the hOCR specification closely, so there's a
chance I misunderstood something. @kba, can you take a look and
check that the hOCR produced is valid and sensible?
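As a rough illustration of the idea described above, the sketch below picks an hOCR class name from a block type. It is not the merged diff: `BlockKind`, `HocrBlockClass`, and `OpenBlockTag` are hypothetical names invented for this example, the enum is a small stand-in for Tesseract's block types, and the particular pairings are assumptions. Only the class names themselves (`ocr_carea`, `ocr_photo`, `ocr_separator`, `ocr_noise`) come from the hOCR specification.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for Tesseract's block-type enum; only a few
// illustrative values are listed here.
enum class BlockKind {
  kFlowingText,
  kHeadingText,
  kFlowingImage,
  kHorzLine,
  kVertLine,
  kNoise
};

// Assumed mapping from block kind to an hOCR class name. Anything without
// a more specific class falls back to the generic content-area class.
const char* HocrBlockClass(BlockKind kind) {
  switch (kind) {
    case BlockKind::kFlowingImage:
      return "ocr_photo";
    case BlockKind::kHorzLine:
    case BlockKind::kVertLine:
      return "ocr_separator";
    case BlockKind::kNoise:
      return "ocr_noise";
    default:
      return "ocr_carea";
  }
}

// Builds the opening tag for a block element. The id format is an
// assumption chosen for this example.
std::string OpenBlockTag(BlockKind kind, int page, int index) {
  assert(page >= 1 && index >= 1);  // ids in this sketch are 1-based
  return std::string("<div class='") + HocrBlockClass(kind) +
         "' id='block_" + std::to_string(page) + "_" +
         std::to_string(index) + "'>";
}
```

With this sketch, a horizontal rule on page 1 would be emitted as a `div` with class `ocr_separator`, while ordinary text blocks keep the generic `ocr_carea` class.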

@zdenop zdenop added the "output" label (issues related output formats) May 14, 2019
@zdenop zdenop merged commit 7e9d2f4 into tesseract-ocr:master May 16, 2019
@zdenop
Contributor

zdenop commented May 16, 2019

@nickjwhite: thanks. Could you have a look at issue 2303 and see if you are able to fix it?

@stweil
Member

stweil commented May 16, 2019

@nickjwhite, @zdenop, I just updated branch 4.1. Would you suggest adding PR #2432 to that branch, too?

@zdenop
Contributor

zdenop commented May 17, 2019

Yes, all commits that do not affect the API should be cherry-picked to the 4.1 branch.

@stweil
Member

stweil commented May 19, 2019

I updated 4.1 accordingly, thank you.
