Feature Request: Add Bounding Box info to LSTM choices #2580
Comments
The formatting can also be improved. @noahmetzger, is the calculation of the confidence values correct? See my code comment.
Yeah, the calculated confidence values differ slightly from the original confidence, because my values use the new bounding boxes to evaluate the confidence, while the old algorithm relates to the old character boundaries.
@kba, I just tried to find the right form in the hOCR spec. Why does the example for And what would be the recommended form for character choices with bounding boxes?
The example is for two characters.
http://kba.cloud/hocr-spec/1.2/#segmentation would seem the most appropriate mechanism, using
So the first word from the image above could be encoded like this?
I added the required @noahmetzger, I just noticed that the characters for the alternatives still need to be escaped (
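To illustrate the escaping point, here is a minimal Python sketch (not Tesseract's implementation) that renders a list of character choices as hOCR-style alternatives. It assumes the spec's `alternatives` pattern with `ins`/`del` elements carrying an `nlp` title property, with the first choice treated as the accepted reading; the function name `alternatives_span` and the `(character, nlp)` input format are hypothetical. `html.escape` turns `&`, `<`, `>` and quotes into entities, so a recognized `<` or `&` cannot break the surrounding markup.

```python
from html import escape


def alternatives_span(choices):
    """Render (character, nlp) pairs as an hOCR-style alternatives group.

    Hypothetical helper: the first choice is emitted as the accepted
    reading (<ins>), the others as rejected readings (<del>).
    """
    parts = []
    for i, (char, nlp) in enumerate(choices):
        tag = "ins" if i == 0 else "del"
        # escape() replaces &, <, > and quotes with entities so that a
        # recognized character cannot be mistaken for markup.
        parts.append(
            f"<{tag} class='alt' title='nlp {nlp:.1f}'>{escape(char)}</{tag}>"
        )
    return "<span class='alternatives'>" + "".join(parts) + "</span>"


print(alternatives_span([("T", 0.1), ("<", 1.7)]))
```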
The example with nlp in https://github.com/kba/hocr-spec/blob/master/1.2/spec.md only uses single digits after the decimal point.
If I am not mistaken, 1.7 would be a recognition probability of 18%. That's not really good. The other values are even worse, so that seems to be a bad example.
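The arithmetic can be checked with a short Python sketch, assuming `nlp` is a negative natural-log probability (which matches the 18% figure, since exp(-1.7) ≈ 0.18). The helper names are hypothetical.

```python
import math


def nlp_to_probability(nlp):
    """Convert a negative natural-log probability to a probability."""
    return math.exp(-nlp)


def probability_to_nlp(p):
    """Inverse conversion; p must lie in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(p)


# Prints 90%, 61% and 18% for the three values.
for nlp in (0.1, 0.5, 1.7):
    print(f"nlp {nlp:.1f} -> p = {nlp_to_probability(nlp):.0%}")
```

Rounding `nlp` to one decimal, as in the spec example, allows an error of up to ±0.05, so the implied probability can be off by a factor of about exp(0.05) ≈ 1.05.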
@stweil Shouldn't it be formatted as below, considering that the alternatives are for each character?
http://kba.cloud/hocr-spec/1.2/#segmentation has a different (obsolete?) example.
@noahmetzger Does your new code regarding choices implement this?
@Shreeshrii Yes, the new code should be compatible with
Thanks. I tried it now using
Also, shouldn't the
Yeah, the confidence levels will stay different, as they come from different rating procedures. The second rating is the procedure that is also used to evaluate the choices. It is based on the confidence levels inside the beam search that finds the best path, and it is mainly there so that it can be compared with the other choices.
Ok.
Example in reply to https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/WX4yZUMUsYQ/Sp5QcN9hBQAJ
hOCR output
Recent enhancements by @noahmetzger provide accurate character-level bounding boxes as well as the LSTM choices for each character.
#2554
#2576
An earlier commit by Nick White added the option to include character bounding boxes in hOCR output.
06b7a7b
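For consumers of the character-box output, here is a minimal Python sketch of reading a per-character `title` attribute. It assumes the `title` follows hOCR's convention of `;`-separated properties, each a name followed by its values, and that the box sits in an `x_bboxes` property, as in the hypothetical title `x_bboxes 260 196 276 237; x_conf 99.14`. The function names are illustrative.

```python
def parse_title(title):
    """Split an hOCR title attribute into {property_name: [values, ...]}."""
    props = {}
    for group in title.split(";"):
        # Naive whitespace split: quoted values containing spaces
        # (e.g. image file names) are not handled.
        fields = group.split()
        if fields:
            props[fields[0]] = fields[1:]
    return props


def char_box(title):
    """Return (x0, y0, x1, y1) from the 'x_bboxes' property of a character span."""
    x0, y0, x1, y1 = (int(v) for v in parse_title(title)["x_bboxes"][:4])
    return x0, y0, x1, y1


title = "x_bboxes 260 196 276 237; x_conf 99.14"  # hypothetical example title
print(char_box(title))                          # (260, 196, 276, 237)
print(float(parse_title(title)["x_conf"][0]))   # 99.14
```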
Currently, it is possible to get hOCR output with both options as follows:
The output is as follows:
This feature request is for combining the output from both options with accurate bounding boxes and confidence values at character level.
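A minimal Python sketch of what combining could mean at the data level: zip the per-character outputs of both options into one record per character. The record layout, the function names and the sample values are hypothetical, and the `title` string only imitates the `x_bboxes ...; x_conf ...` style rather than reproducing what Tesseract emits.

```python
from dataclasses import dataclass, field


@dataclass
class CharResult:
    """One recognized character with its box, confidence and LSTM choices."""
    text: str
    bbox: tuple   # (x0, y0, x1, y1) in image pixels
    conf: float   # confidence in percent (0-100)
    choices: list = field(default_factory=list)  # [(char, conf), ...], best first


def combine(chars, boxes, confs, choices):
    """Merge per-character outputs into one CharResult per character.

    Assumes index i refers to the same glyph in all four lists; a length
    mismatch means the outputs are not aligned.
    """
    if not (len(chars) == len(boxes) == len(confs) == len(choices)):
        raise ValueError("per-character outputs are not aligned")
    return [CharResult(t, b, c, alts)
            for t, b, c, alts in zip(chars, boxes, confs, choices)]


def cinfo_title(result):
    """Build a title attribute in the style of the character-box output."""
    x0, y0, x1, y1 = result.bbox
    return f"x_bboxes {x0} {y0} {x1} {y1}; x_conf {result.conf:.2f}"


results = combine(
    ["T", "o"],
    [(260, 196, 276, 237), (278, 205, 291, 237)],        # hypothetical boxes
    [99.1, 97.3],
    [[("T", 99.1), ("I", 0.6)], [("o", 97.3), ("0", 2.1)]],
)
print(cinfo_title(results[0]))  # x_bboxes 260 196 276 237; x_conf 99.10
```

The length check is a cheap guard: if the two outputs were produced from different segmentations (the thread notes that the old and new code use different character boundaries), the indices will not line up.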