Feature Request: Add Bounding Box info to LSTM choices #2580

Shreeshrii · 2019-07-18T04:43:15Z

Newly made enhancements by @noahmetzger provide accurate bounding box info at the character level as well as the LSTM choices for the character.

#2554
#2576

An earlier commit by Nick White had added the option to include character bounding boxes in hocr output.

06b7a7b

Currently it is possible to get HOCR output with both options as follows:

 tesseract -l eng --psm 6 --dpi 300 -c lstm_choice_mode=4 -c lstm_choice_amount=0  -c hocr_char_boxes=1  --tessdata-dir ~/tessdata_best choices.png choices40 hocr

The output is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title></title>
  <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
  <meta name='ocr-system' content='tesseract 5.0.0-alpha-315-g5a47' />
  <meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par ocr_line ocrx_word ocrp_wconf'/>
 </head>
 <body>
  <div class='ocr_page' id='page_1' title='image "choices.png"; bbox 0 0 293 90; ppageno 0'>
   <div class='ocr_carea' id='block_1_1' title="bbox 16 18 270 71">
    <p class='ocr_par' id='par_1_1' lang='eng' title="bbox 16 18 270 71">
     <span class='ocr_line' id='line_1_1' title="bbox 16 18 270 71; baseline -0.012 0; x_size 68.5; x_descenders 17.125; x_ascenders 17.125">
      <span class='ocrx_word' id='word_1_1' title='bbox 16 18 206 71; x_wconf 42'>
             <span class='ocrx_cinfo' title='x_bboxes 16 19 42 71; x_conf 99.041275'>B</span>
             <span class='ocrx_cinfo' title='x_bboxes 49 20 76 71; x_conf 99.038635'>A</span>
             <span class='ocrx_cinfo' title='x_bboxes 84 19 107 70; x_conf 98.950821'>S</span>
             <span class='ocrx_cinfo' title='x_bboxes 117 19 139 69; x_conf 91.848969'>O</span>
             <span class='ocrx_cinfo' title='x_bboxes 148 19 174 70; x_conf 99.027092'>B</span>
             <span class='ocrx_cinfo' title='x_bboxes 181 18 206 69; x_conf 98.989304'>C</span>
       <span class='ocrx_cinfo' id='lstm_choices_1_1_1'><span class='ocr_glyph' id='choice_1_1_1' title='x_confs 94.258354'>B</span></span>
       <span class='ocrx_cinfo' id='lstm_choices_1_1_2'><span class='ocr_glyph' id='choice_1_1_2' title='x_confs 95.207481'>A</span></span>
       <span class='ocrx_cinfo' id='lstm_choices_1_1_3'><span class='ocr_glyph' id='choice_1_1_3' title='x_confs 95.032639'>S</span></span>
       <span class='ocrx_cinfo' id='lstm_choices_1_1_4'><span class='ocr_glyph' id='choice_1_1_4' title='x_confs 86.357178'>O</span></span>
       <span class='ocrx_cinfo' id='lstm_choices_1_1_5'><span class='ocr_glyph' id='choice_1_1_5' title='x_confs 95.195663'>B</span></span>
       <span class='ocrx_cinfo' id='lstm_choices_1_1_6'><span class='ocr_glyph' id='choice_1_1_6' title='x_confs 92.210503'>C</span></span></span>
      <span class='ocrx_word' id='word_1_2' title='bbox 242 18 270 68; x_wconf 88'>
             <span class='ocrx_cinfo' title='x_bboxes 242 18 270 68; x_conf 98.37661'>6</span>
       <span class='ocrx_cinfo' id='lstm_choices_1_2_1'><span class='ocr_glyph' id='choice_1_2_1' title='x_confs 88.696487'> </span></span>
       <span class='ocrx_cinfo' id='lstm_choices_1_2_2'><span class='ocr_glyph' id='choice_1_2_2' title='x_confs 91.191261'>6</span></span></span>
     </span>
    </p>
   </div>
  </div>
 </body>
</html>

This feature request is for combining the output from both options with accurate bounding boxes and confidence values at character level.

stweil · 2019-07-18T08:08:34Z

The formatting can also be improved. @noahmetzger, is the calculation of the confidence values correct? See my code comment.

noahmetzger · 2019-07-18T08:40:02Z

Yes, the calculated confidence values differ slightly from the original ones: my values use the new bounding boxes to evaluate the confidence, while the old algorithm relates to the old character boundaries.

stweil · 2019-07-18T11:35:42Z

@kba, I just tried to find the right form from the hOCR spec.

Why does the example for x_bboxes have 8 parameters there? Shouldn't there be exactly 4 parameters?

And what would be the recommended form for character choices with bounding boxes?

kba · 2019-07-18T13:57:30Z

> Why does the example for x_bboxes have 8 parameters there? Shouldn't there be exactly 4 parameters?

The example is for two characters.
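To illustrate the point about multiple characters, here is a minimal Python sketch that splits an x_bboxes value into one 4-tuple per character. The helper name `split_x_bboxes` and the sample value (the boxes of the first two characters from the output above, joined into one property) are assumptions for illustration only, not part of Tesseract or the hOCR spec.

```python
def split_x_bboxes(value: str) -> list[tuple[int, ...]]:
    """Split an x_bboxes value into one (x0, y0, x1, y1) tuple per character."""
    numbers = [int(token) for token in value.split()]
    if len(numbers) % 4 != 0:
        raise ValueError("x_bboxes needs a multiple of 4 coordinates")
    return [tuple(numbers[i:i + 4]) for i in range(0, len(numbers), 4)]


# Hypothetical value covering two characters, hence 8 parameters.
print(split_x_bboxes("16 19 42 71 49 20 76 71"))
```

With a single character per ocrx_cinfo span, as in the Tesseract output in this issue, the list simply has one entry.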

> And what would be the recommended form for character choices with bounding boxes?

http://kba.cloud/hocr-spec/1.2/#segmentation would seem the most appropriate mechanism, using <ins> and <del> in a <span class="alternatives">. However, I have not come across that construct in the wild.

stweil · 2019-07-18T18:45:42Z

So the first word from the image above could be encoded like this?

  <span class='ocrx_word' id='word_1_1' title='bbox 16 18 206 71; x_wconf 90'>
   <span class='ocrx_cinfo' title='x_bboxes 16 19 42 71; x_conf 99.040695'>B</span>
   <span class='ocrx_cinfo' title='x_bboxes 49 20 76 71; x_conf 99.036415'>A</span>
   <span class='ocrx_cinfo' title='x_bboxes 84 19 107 70; x_conf 99.001602'>S</span>
   <span class='ocrx_cinfo' title='x_bboxes 117 19 139 69; x_conf 98.663239'>O</span>
   <span class='ocrx_cinfo' title='x_bboxes 148 19 174 70; x_conf 99.032227'>B</span>
   <span class='ocrx_cinfo' title='x_bboxes 181 18 206 69; x_conf 98.988968'>C</span>
   <span class='alternatives'>
    <ins class='alt' id='choice_1_1_1' title='nlp 0.011549211; x_confs 98.851723'>B</ins>
    <del class='alt' id='choice_1_1_2' title='nlp 0.3656525; x_confs 69.374382'>R</del>
   </span>
   <span class='alternatives'>
    <ins class='alt' id='choice_1_1_3' title='nlp 0.0096266214; x_confs 99.041954'>A</ins>
    <del class='alt' id='choice_1_1_4' title='nlp 0.38926715; x_confs 67.755325'>a</del>
   </span>
   <span class='alternatives'>
    <ins class='alt' id='choice_1_1_5' title='nlp 0.0097376015; x_confs 99.030968'>S</ins>
    <del class='alt' id='choice_1_1_6' title='nlp 0.244562; x_confs 78.304741'>5</del>
   </span>
   <span class='alternatives'>
    <ins class='alt' id='choice_1_1_7' title='nlp 0.0167614; x_confs 98.33783'>O</ins>
    <del class='alt' id='choice_1_1_8' title='nlp 0.17149627; x_confs 84.240341'>0</del>
   </span>
   <span class='alternatives'>
    <ins class='alt' id='choice_1_1_9' title='nlp 0.013082595; x_confs 98.700264'>B</ins>
   </span>
   <span class='alternatives'>
    <ins class='alt' id='choice_1_1_10' title='nlp 0.0096449163; x_confs 99.040146'>C</ins>
    <del class='alt' id='choice_1_1_11' title='nlp 0.32846478; x_confs 72.002831'>(</del>
   </span>
  </span>

I added the required nlp property to the alternatives and wonder how many digits are reasonable for float values in hOCR. Maybe the spec should suggest 4 or 5 digits.

@noahmetzger, I just noticed that the characters for the alternatives still need to be escaped (HOcrEscape). How can I get the bounding boxes of the different alternatives, or are they all identical?

Shreeshrii · 2019-07-19T04:00:10Z

The example with nlp in https://github.com/kba/hocr-spec/blob/master/1.2/spec.md only uses single digits after decimal point.

<span class="ocr_cinfo" title="bbox 0 0 300 100; nlp 1.7 2.3 3.9 2.7; cuts 9 11 7,8,-2 15 3">hello</span>

stweil · 2019-07-19T07:09:39Z

If I am not mistaken, 1.7 would be a recognition probability of 18 %. That's not really good. The other values are even worse, so that seems to be a bad example.

Shreeshrii · 2019-07-23T08:32:02Z

@stweil Shouldn't it be formatted as below, considering that the alternatives are for each character?
Then ins class indicates the character that was chosen and del class indicates the choices that were discarded.

<span class='ocrx_cinfo' title='x_bboxes 16 19 42 71; x_conf 99.040695'>
   <span class='alternatives'>
    <ins class='alt' id='choice_1_1_1' title='nlp 0.011549211; x_confs 98.851723'>B</ins>
    <del class='alt' id='choice_1_1_2' title='nlp 0.3656525; x_confs 69.374382'>R</del>
   </span>
</span>
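A rough Python sketch of how this per-character layout could be generated, only to make the proposal concrete. The function `format_char_alternatives` and its parameters are hypothetical; it is not how Tesseract writes hOCR. It assumes nlp is the negative natural log of the confidence, escapes the character text, and limits floats to 4 digits as suggested earlier in this thread.

```python
import math
from html import escape


def format_char_alternatives(bbox, char_conf, choices, digits=4):
    """Build one ocrx_cinfo span with nested alternatives.

    bbox: (x0, y0, x1, y1) of the character.
    char_conf: confidence in percent for the x_conf property.
    choices: list of (text, confidence_percent); the first entry is the chosen one.
    """
    box = " ".join(str(v) for v in bbox)
    lines = [
        f"<span class='ocrx_cinfo' title='x_bboxes {box}; x_conf {char_conf:.{digits}f}'>",
        " <span class='alternatives'>",
    ]
    for index, (text, conf) in enumerate(choices):
        tag = "ins" if index == 0 else "del"  # chosen vs. discarded
        nlp = -math.log(conf / 100.0)         # assumed definition of nlp
        lines.append(
            f"  <{tag} class='alt' title='nlp {nlp:.{digits}f}; "
            f"x_confs {conf:.{digits}f}'>{escape(text)}</{tag}>"
        )
    lines += [" </span>", "</span>"]
    return "\n".join(lines)


print(format_char_alternatives((16, 19, 42, 71), 99.040695,
                               [("B", 98.851723), ("R", 69.374382)]))
```

The x_conf and x_confs values are passed separately because, as noted above, the two confidence values are calculated differently.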

Shreeshrii · 2019-07-23T08:41:54Z

> If I am not mistaken, 1.7 would be a recognition probability of 18 %. That's not really good. The other values are even worse, so that seems to be a bad example.

http://kba.cloud/hocr-spec/1.2/#segmentation has a different (obsolete?) example.

<span class="alternatives">
<ins class="alt" title="nlp 0.3">hello</ins>
<del class="alt" title="nlp 1.1">hallo</del>
</span>

stweil · 2019-07-23T10:32:17Z

nlp 0.3 would be 74 % (exp(-0.3) * 100), nlp 1.1 would be 33 %.
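The conversion used here can be checked with a few lines of Python. This is only a sketch of the formula quoted above (exp(-nlp) * 100), assuming natural logarithms; the function names are made up for this example.

```python
import math


def nlp_to_percent(nlp: float) -> float:
    """Convert a negative log probability to a percentage."""
    return math.exp(-nlp) * 100.0


def percent_to_nlp(percent: float) -> float:
    """Inverse conversion; percent must be in (0, 100]."""
    if not 0.0 < percent <= 100.0:
        raise ValueError("percent must be in (0, 100]")
    return -math.log(percent / 100.0)


for nlp in (0.3, 1.1, 1.7):
    print(f"nlp {nlp} -> {nlp_to_percent(nlp):.0f} %")
```

The same relation links the nlp and x_confs values in the alternatives example earlier in this thread, for instance x_confs 98.851723 and nlp 0.011549211.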

Shreeshrii · 2019-09-09T13:30:27Z

@noahmetzger Does your new code regarding choices implement this?

noahmetzger · 2019-09-09T13:51:55Z

@Shreeshrii Yes, the new code should now be compatible with hocr_char_boxes.

Shreeshrii · 2019-09-09T14:13:46Z

Thanks. I tried it now using lstm_choice_iterations instead of lstm_choice_amount. The output is better formatted than before, but the confidence levels in both are still different.

tesseract -l eng --psm 6 --dpi 300 -c lstm_choice_mode=2 -c lstm_choice_iterations=0  -c hocr_char_boxes=1  --tessdata-dir ~/tessdata_best choices.png -  hocr
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title></title>
  <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
  <meta name='ocr-system' content='tesseract 5.0.0-alpha-381-g4257' />
  <meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par ocr_line ocrx_word ocrp_wconf'/>
 </head>
 <body>
  <div class='ocr_page' id='page_1' title='image "choices.png"; bbox 0 0 293 90; ppageno 0'>
   <div class='ocr_carea' id='block_1_1' title="bbox 16 18 270 71">
    <p class='ocr_par' id='par_1_1' lang='eng' title="bbox 16 18 270 71">
     <span class='ocr_line' id='line_1_1' title="bbox 16 18 270 71; baseline -0.012 0; x_size 68.5; x_descenders 17.125; x_ascenders 17.125">
      <span class='ocrx_word' id='word_1_1' title='bbox 16 18 206 71; x_wconf 42'>
       <span class='ocrx_cinfo' title='x_bboxes 16 19 42 71; x_conf 99.041275'>B</span>
        <span class='ocrx_cinfo' id='lstm_choices_1_1_1'>
         <span class='ocrx_cinfo' id='choice_1_1_1' title='x_confs 94.258354'>B</span>
        </span>
       <span class='ocrx_cinfo' title='x_bboxes 49 20 76 71; x_conf 99.038635'>A</span>
        <span class='ocrx_cinfo' id='lstm_choices_1_1_2'>
         <span class='ocrx_cinfo' id='choice_1_1_2' title='x_confs 95.207481'>A</span>
        </span>
       <span class='ocrx_cinfo' title='x_bboxes 84 19 107 70; x_conf 98.950821'>S</span>
        <span class='ocrx_cinfo' id='lstm_choices_1_1_3'>
         <span class='ocrx_cinfo' id='choice_1_1_3' title='x_confs 95.032639'>S</span>
        </span>
       <span class='ocrx_cinfo' title='x_bboxes 117 19 139 69; x_conf 91.848969'>O</span>
        <span class='ocrx_cinfo' id='lstm_choices_1_1_4'>
         <span class='ocrx_cinfo' id='choice_1_1_4' title='x_confs 86.357178'>O</span>
        </span>
       <span class='ocrx_cinfo' title='x_bboxes 148 19 174 70; x_conf 99.027092'>B</span>
        <span class='ocrx_cinfo' id='lstm_choices_1_1_5'>
         <span class='ocrx_cinfo' id='choice_1_1_5' title='x_confs 95.195663'>B</span>
        </span>
       <span class='ocrx_cinfo' title='x_bboxes 181 18 206 69; x_conf 98.989304'>C</span>
        <span class='ocrx_cinfo' id='lstm_choices_1_1_6'>
         <span class='ocrx_cinfo' id='choice_1_1_6' title='x_confs 92.210503'>C</span>
        </span>
      </span>
      <span class='ocrx_word' id='word_1_2' title='bbox 242 18 270 68; x_wconf 88'>
       <span class='ocrx_cinfo' title='x_bboxes 242 18 270 68; x_conf 98.37661'>6</span>
        <span class='ocrx_cinfo' id='lstm_choices_1_2_1'>
         <span class='ocrx_cinfo' id='choice_1_2_1' title='x_confs 91.191261'>6</span>
        </span>
      </span>
     </span>
    </p>
   </div>
  </div>
 </body>
</html>

Also, shouldn't the class='ocrx_cinfo' for each character from both commands be combined together?

noahmetzger · 2019-09-09T14:29:57Z

Yes, the confidence levels will stay different because they come from different rating procedures. The second rating is the procedure that is also used to evaluate the choices. It is based on the confidence levels inside the beam search that finds the best path, and it is mainly there to be compared with the other choices.

Shreeshrii · 2019-09-09T16:05:37Z

Ok.
So, how do I get the bounding boxes with the new values (not using the old hocr_char_boxes=1)?

Shreeshrii · 2019-11-13T12:59:48Z

Example in reply to https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/WX4yZUMUsYQ/Sp5QcN9hBQAJ

 tesseract -l eng --psm 6 --dpi 300  -c hocr_char_boxes=1  --tessdata-dir ~/tessdata_best choices.png choices hocr

HOCR output

 <div class='ocr_page' id='page_1' title='image "choices.png"; bbox 0 0 293 90; ppageno 0'>
   <div class='ocr_carea' id='block_1_1' title="bbox 16 18 270 71">
    <p class='ocr_par' id='par_1_1' lang='eng' title="bbox 16 18 270 71">
     <span class='ocr_line' id='line_1_1' title="bbox 16 18 270 71; baseline -0.012 0; x_size 68.5; x_descenders 17.125; x_ascenders 17.125">
      <span class='ocrx_word' id='word_1_1' title='bbox 16 18 206 71; x_wconf 42'>
       <span class='ocrx_cinfo' title='x_bboxes 16 19 42 71; x_conf 99.041275'>B</span>
       <span class='ocrx_cinfo' title='x_bboxes 49 20 76 71; x_conf 99.038635'>A</span>
       <span class='ocrx_cinfo' title='x_bboxes 84 19 107 70; x_conf 98.950821'>S</span>
       <span class='ocrx_cinfo' title='x_bboxes 117 19 139 69; x_conf 91.848969'>O</span>
       <span class='ocrx_cinfo' title='x_bboxes 148 19 174 70; x_conf 99.027092'>B</span>
       <span class='ocrx_cinfo' title='x_bboxes 181 18 206 69; x_conf 98.989304'>C</span>
      </span>
      <span class='ocrx_word' id='word_1_2' title='bbox 242 18 270 68; x_wconf 88'>
       <span class='ocrx_cinfo' title='x_bboxes 242 18 270 68; x_conf 98.37661'>6</span>
      </span>
     </span>
    </p>
   </div>
  </div>
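For reading such output back, a minimal Python sketch using only the standard library could collect the character, bounding box and confidence from each ocrx_cinfo span that carries an x_bboxes property. The class `CharBoxParser` and the function `extract_char_boxes` are hypothetical helpers, and the property names are taken from the output shown in this issue; other Tesseract versions may format things differently.

```python
from html.parser import HTMLParser


class CharBoxParser(HTMLParser):
    """Collect text, bbox and confidence from ocrx_cinfo spans with x_bboxes."""

    def __init__(self):
        super().__init__()
        self.chars = []
        self._pending = None  # character span currently being read

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        title = attr.get("title") or ""
        if tag != "span" or attr.get("class") != "ocrx_cinfo" or "x_bboxes" not in title:
            return  # skips words, lines and the lstm_choices spans
        props = {}
        for part in title.split(";"):
            fields = part.split()
            if fields:
                props[fields[0]] = fields[1:]
        conf = props.get("x_conf")
        self._pending = {
            "text": "",
            "bbox": tuple(int(v) for v in props["x_bboxes"]),
            "conf": float(conf[0]) if conf else None,
        }

    def handle_data(self, data):
        if self._pending is not None:
            self._pending["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._pending is not None:
            self.chars.append(self._pending)
            self._pending = None


def extract_char_boxes(hocr: str) -> list:
    parser = CharBoxParser()
    parser.feed(hocr)
    return parser.chars


sample = """<span class='ocrx_word' id='word_1_2' title='bbox 242 18 270 68; x_wconf 88'>
 <span class='ocrx_cinfo' title='x_bboxes 242 18 270 68; x_conf 98.37661'>6</span>
</span>"""
print(extract_char_boxes(sample))
```

Spans without x_bboxes, such as the lstm_choices entries in the earlier outputs, are ignored by this sketch.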

stweil added the feature request label Jul 18, 2019

amitdo added the bounding box label Mar 12, 2021
