Skip to content

Commit

Permalink
Merge pull request tesseract-ocr#2432 from nickjwhite/hocrmoretypes
Browse files Browse the repository at this point in the history
Add different classes to hocr output depending on BlockType
  • Loading branch information
zdenop authored May 16, 2019
2 parents b124a5f + 068eb4c commit 7e9d2f4
Showing 1 changed file with 15 additions and 2 deletions.
17 changes: 15 additions & 2 deletions src/api/hocrrenderer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -209,8 +209,21 @@ char* TessBaseAPI::GetHOCRText(ETEXT_DESC* monitor, int page_number) {
AddBoxTohOCR(res_it.get(), RIL_PARA, hocr_str);
}
if (res_it->IsAtBeginningOf(RIL_TEXTLINE)) {
hocr_str << "\n <span class='ocr_line'"
<< " id='"
hocr_str << "\n <span class='";
switch (res_it->BlockType()) {
case PT_HEADING_TEXT:
hocr_str << "ocr_header";
break;
case PT_PULLOUT_TEXT:
hocr_str << "ocr_textfloat";
break;
case PT_CAPTION_TEXT:
hocr_str << "ocr_caption";
break;
default:
hocr_str << "ocr_line";
}
hocr_str << "' id='"
<< "line_" << page_id << "_" << lcnt << "'";
AddBoxTohOCR(res_it.get(), RIL_TEXTLINE, hocr_str);
}
Expand Down

0 comments on commit 7e9d2f4

Please sign in to comment.