OSD with --psm 0 creates wrong result in latest version #1926

Closed
CanadianHusky opened this issue Sep 23, 2018 · 31 comments
Labels
accuracy OSD Orientation and Script Detection

Comments

@CanadianHusky

For the attached file, the latest master version creates a nonsense result.

Command line:
tesseract --psm 0 "C:\temp\input.png" "C:\output"

OSD result with master

Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 0.30
Script: Latin
Script confidence: 4.44

OSD Result with 20180608 version

Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 1.76
Script: Latin
Script confidence: 35.83

The file has no rotation!
The old version is correct; according to my findings, the current master branch code behaves wrongly and claims a 180-degree rotation.

[Attached image: input]

Best Regards
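The OSD reports in this thread are plain `Key: value` lines, so comparing two builds is easy to automate. A minimal Python sketch, assuming the report layout shown above; `parse_osd` is a hypothetical helper, not part of Tesseract, and any line it does not recognize (warnings, prompts) is skipped:

```python
# Hypothetical helper: parse the text report printed by `tesseract ... --psm 0`.
# Field names and types are taken from the reports quoted in this thread.
FIELD_TYPES = {
    "Page number": int,
    "Orientation in degrees": int,
    "Rotate": int,
    "Orientation confidence": float,
    "Script": str,
    "Script confidence": float,
}


def parse_osd(report: str) -> dict:
    """Return the known OSD fields of a report as a dict of typed values."""
    result = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        convert = FIELD_TYPES.get(key.strip())
        if sep and convert is not None:  # ignore warnings and other noise
            result[key.strip()] = convert(value.strip())
    return result


# The two reports quoted above.
MASTER_REPORT = """Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 0.30
Script: Latin
Script confidence: 4.44"""

OLD_REPORT = """Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 1.76
Script: Latin
Script confidence: 35.83"""

if __name__ == "__main__":
    master, old = parse_osd(MASTER_REPORT), parse_osd(OLD_REPORT)
    for key in master:
        if master[key] != old[key]:
            print(f"{key}: master={master[key]} old={old[key]}")
```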

@stweil
Member

stweil commented Sep 23, 2018

Could you please try the old version with -l osd? Does that change the result?

@CanadianHusky
Author

Yes, it does change the result: the output becomes the same incorrect result.

Command line:
tesseract --psm 0 -l osd "C:\input.png" "C:\output"

Result with tesseract v4.0.0-beta.1.20180608

Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 0.30
Script: Latin
Script confidence: 4.44

This is the wrong orientation, with a poor confidence value.

How does this make sense if running without -l osd produces the correct result?

@stweil
Member

stweil commented Sep 23, 2018

As far as I know, --psm 0 always needs -l osd and won't detect the orientation (= always give 0 degrees) without it. The newer code enforces -l osd even if it was not specified on the command line.

Obviously at least the two tested versions of Tesseract come to the result that your image is upside down.

@CanadianHusky
Author

I ran the following tests:

Version = beta 1, beta 4 and the latest compiled master branch
Command: tesseract --psm 0 "C:\input.png" stdout

Result = wrong
All of them detect a 180-degree rotation on the image above, which is wrong.

Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 0.30
Script: Latin
Script confidence: 4.44

Version = 4.0.0-alpha.20170804
Command: tesseract --psm 0 "C:\input.png" stdout

Result = correct

Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 1.76
Script: Latin
Script confidence: 35.83

Version = 4.0.0-alpha.20170804, only adding -l osd
Command: tesseract --psm 0 -l osd "C:\input.png" stdout

Result = wrong

Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 0.30
Script: Latin
Script confidence: 4.44

Further tests with 3.05.02: same story. Adding -l osd causes an incorrect result. How does that make sense?

[image]

I am sorry, but there seems to be some weird bug when the osd option is activated via the command line.

@amitdo
Collaborator

amitdo commented Sep 24, 2018

> As far as I know, --psm 0 always needs -l osd and won't detect the orientation (= always give 0 degrees) without it.

If you give tesseract a traineddata via '-l <lang>' that includes data for the legacy engine, other than osd, it will still be able to detect the orientation if the traineddata matches the language used in the input image.

@CanadianHusky
Try using tesseract with --psm 0 -l eng

@CanadianHusky
Author

CanadianHusky commented Sep 24, 2018

@amitdo, with your suggestion the engine prints the following warning and produces a wrong result:

C:\Program Files (x86)\Tesseract-OCR>tesseract --psm 0 -l eng "C:\temp14\ocr-good\input.png" stdout
Warning, ignoring -l eng for --psm 0
Warning. Invalid resolution 0 dpi. Using 70 instead.
Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 0.30
Script: Latin
Script confidence: 4.44

@amitdo
Collaborator

amitdo commented Sep 24, 2018

> Warning, ignoring -l eng for --psm 0

@stweil, your code is causing this...

I think osd should be the default for psm 0, to enable both script and orientation detection, but --psm 0 -l <lang> should be allowed. This might be useful when the script is known and the user just needs orientation detection.
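The two rules under discussion can be stated compactly. A Python sketch for illustration only: `enforced_lang` models the behavior described above (osd forced for --psm 0, with the warning shown), and `proposed_lang` models the proposal. Both names are hypothetical and this is not Tesseract's source code; the `eng` fallback for other page segmentation modes reflects Tesseract's default language.

```python
from typing import Optional, Tuple


def enforced_lang(psm: int, requested: Optional[str]) -> Tuple[str, Optional[str]]:
    """Behavior at the time: --psm 0 always switches to osd, warning about any other -l."""
    if psm == 0:
        warning = None
        if requested is not None and requested != "osd":
            warning = f"Warning, ignoring -l {requested} for --psm 0"
        return "osd", warning
    return requested or "eng", None  # eng is Tesseract's default language


def proposed_lang(psm: int, requested: Optional[str]) -> str:
    """Proposal: osd is only the *default* for --psm 0; an explicit -l is honored."""
    if requested is not None:
        return requested
    return "osd" if psm == 0 else "eng"
```

With the proposed rule, `--psm 0 -l eng` keeps the legacy `eng` data, which can detect orientation, while a plain `--psm 0` still gets osd for script detection.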

@amitdo
Collaborator

amitdo commented Sep 24, 2018

I removed @stweil's change and tested it with a traineddata that contains the legacy data:

tesstest i1926.png - --psm 0 -l eng

Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 1.76
Script: Latin
Script confidence: 35.83

With best/fast traineddata, tesseract segfaults because they lack the legacy data needed by psm 0.

@CanadianHusky
Author

@amitdo
That result is correct.
I hope you can get that into the master branch soon.
Thank you

@stweil
Member

stweil commented Sep 24, 2018

Just reverting my commit will produce segfaults again (see issue #1855 or @amitdo's comment above).

I wrote the code which enforces -l osd because I thought that was necessary to detect script and orientation. Script detection won't be able to detect different scripts with -l eng or other language models of course (therefore I am not sure what the script confidence indicates in that case). But if orientation detection works (@amitdo, did you test it? Testing the orientation == 0 case is not sufficient) and people need that, the code can be changed.

Even then some questions still remain:

  • Should Tesseract use -l osd with --psm 0 if no other language was selected by the user?
  • Why does -l osd "detect" a flipped image for the test image here?

@stweil
Member

stweil commented Sep 24, 2018

@CanadianHusky, what is the result with old Tesseract and --psm 0 -l deu (as you have German construction plans)?

@CanadianHusky
Author

Here is the result of the test.
The results differ, and both are correct with the 4.00.00alpha release.

[image]

@CanadianHusky
Author

> Even then some questions still remain:
>
> Should Tesseract use -l osd with --psm 0 if no other language was selected by the user?

In my opinion, yes. BUT beware... the result is wrong at the moment.

[image]

> Why does -l osd "detect" a flipped image for the test image here?

I'm not familiar enough with the detection engine logic to comment meaningfully, but for the average user it is clearly a bug.

@amitdo
Collaborator

amitdo commented Sep 24, 2018

https://ai.google/research/pubs/pub35506

According to this paper, 'osd' is able to recognize only 30 Latin characters!
'eng' will obviously recognize much more.

If there are enough characters in the text, osd will usually work quite well, but for short text it might fail.

@amitdo
Collaborator

amitdo commented Sep 24, 2018

'OINZM6890'

Is this upright or rotated 180 degrees?

That's one reason why osd only has 30 Latin chars.

A second reason is ambiguity across different scripts.
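To make the ambiguity concrete: turned upside down, every character of that string still reads as a valid character, so glyph shapes alone cannot decide. A Python sketch; the rotation table is an approximation for typical sans-serif glyphs (an assumption, not data from Tesseract's osd model) and `rotate_180` is a hypothetical helper:

```python
from typing import Optional

# Approximate 180-degree counterparts for typical sans-serif glyphs (assumption).
ROTATE_180 = {
    "O": "O", "I": "I", "N": "N", "Z": "Z", "S": "S", "H": "H", "X": "X",
    "0": "0", "8": "8", "6": "9", "9": "6", "M": "W", "W": "M",
}


def rotate_180(text: str) -> Optional[str]:
    """Return how `text` reads upside down, or None if a glyph has no valid counterpart."""
    rotated = []
    for ch in reversed(text):  # a 180-degree turn also reverses the reading order
        if ch not in ROTATE_180:
            return None
        rotated.append(ROTATE_180[ch])
    return "".join(rotated)
```

Here `rotate_180("OINZM6890")` gives `"0689WZNIO"`: both readings are plausible strings of capitals and digits, so a decision has to come from context such as words or character statistics.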

@CanadianHusky
Author

While I understand why it's impossible to determine from those characters alone (carefully selected: some are almost symmetrical or have a symmetric equivalent) whether the text is upright or rotated 180 degrees, it should still be possible to determine the orientation of the image I supplied at the top of the post, because there are relations between the characters (words and chains of letters). I am sure the bright minds at Google have some good ideas on how to improve orientation detection.
The image above contains more than just the "trouble" characters. The kMinLimit is really annoying as well; I constantly have to work around it by tampering with the source code because it is not a command-line option.
Much appreciated

@amitdo
Collaborator

amitdo commented Sep 24, 2018

> I am sure the bright minds at Google have some good ideas how to improve recognition for orientation detection.

It's mostly one bright mind from Google who has worked on tesseract in recent years. Currently, he is on vacation from any tesseract work.

@amitdo
Collaborator

amitdo commented Sep 24, 2018

Meanwhile, he let us play with his toy... :-)

@CanadianHusky
Author

Understood...
Then I would try the following algorithm to determine whether the orientation is 0 or 180 (90/270 is essentially the same).

The abstract you linked to earlier shows how bounding boxes and blobs are detected.

Consecutive runs of "offending" characters that disturb OSD are extremely rare in real content. Your example 'OINZM6890' is the extreme case, and we should live with the fact that OSD will fail on that sequence of characters.

For anything else, I would have the algorithm look at characters in pairs and try to give them "meaning". That is, pairs such as OA, OB, OC, OD, etc. could be trained as OSD data, and that training repeated for all combinations (IA, IB, IC, NA, NB, ...).
With these models, it is known which way is "up".
There will be a few hundred character pairs/combinations to train.
I'm not sure what kind of effort is required to teach this to the OCR/OSD engine; mine is just a suggestion.
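A rough sketch of the pair idea, assuming a classifier has already produced a text reading of the page for each orientation hypothesis (that step is not shown). It swaps trained pair models for simple bigram counts from a reference corpus; all names are hypothetical, and this is not how Tesseract's OSD works:

```python
from collections import Counter


def bigram_counts(corpus: str) -> Counter:
    """Count adjacent letter pairs inside the words of a reference corpus."""
    counts = Counter()
    for word in corpus.upper().split():
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts


def pair_score(text: str, counts: Counter) -> int:
    """Sum of corpus counts for every adjacent pair in `text` (higher = more plausible)."""
    return sum(counts[text[i:i + 2]] for i in range(len(text) - 1))


def pick_orientation(upright: str, flipped: str, counts: Counter) -> int:
    """Return 0 if the upright reading looks more like real text, otherwise 180."""
    return 0 if pair_score(upright, counts) >= pair_score(flipped, counts) else 180
```

For example, with `counts = bigram_counts("ON NO ONION SOON NOON")`, the reading `"SOON"` scores 8 while its upside-down reading `"NOOS"` scores 4, so `pick_orientation("SOON", "NOOS", counts)` returns 0. A real implementation would need a large corpus and image-level features rather than already-recognized text.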

@amitdo
Collaborator

amitdo commented Sep 25, 2018

> ... (@amitdo, did you test it? Testing the orientation == 0 case is not sufficient) and people need that, the code can be changed.

convert i1926.png -rotate 180 i1926-r180.png

tesseract i1926-r180.png - --psm 0 -l eng

Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 1.20
Script: Latin
Script confidence: 35.00

tesseract i1926-r180.png - --psm 0 -l osd

Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 0.43
Script: Latin
Script confidence: 1.67
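One way to turn this into a quick check: run OSD on the page and on a copy turned 180 degrees, and verify that the reported angle shifts by 180. A small Python sketch using the "Orientation in degrees" values reported in this thread; `flip_consistent` is a hypothetical helper, and it only tests self-consistency:

```python
def flip_consistent(original_deg: int, flipped_deg: int) -> bool:
    """True if turning the page by 180 degrees shifts the reported orientation by 180."""
    # Only 180 degrees is checked: it is symmetric, so no rotation-direction
    # convention has to be assumed (unlike 90/270).
    return flipped_deg % 360 == (original_deg + 180) % 360


# Values from this thread for i1926.png (upright) and i1926-r180.png.
RESULTS = {
    "eng": (0, 180),
    "osd": (180, 0),
}

if __name__ == "__main__":
    for lang, (orig, flipped) in RESULTS.items():
        correct = orig == 0  # the original page is known to be upright
        print(f"-l {lang}: consistent={flip_consistent(orig, flipped)} correct={correct}")
```

Both languages pass the consistency check, yet only eng is correct on the upright original, so a flip test alone cannot catch a systematically inverted decision.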

> Should Tesseract use -l osd with --psm 0 if no other language was selected by the user?

From #1926 (comment):

> I think osd should be the default for psm 0, to enable both script and orientation detection, but --psm 0 -l <lang> should be allowed.

@amitdo
Collaborator

amitdo commented Sep 25, 2018

The previous test was 180 degrees rotation.

With 90 degrees rotation, I get:

Too few characters. Skipping this page
Error during processing.

For both -l osd and -l eng.

@amitdo
Collaborator

amitdo commented Sep 25, 2018

@CanadianHusky,

After resizing your image to 300%, tesseract with psm 0 works fine, with or without my change.

@zdenop
Contributor

zdenop commented Sep 25, 2018

I really doubt that we can solve this in tesseract. I would only bother if the image contained just text (e.g. a line, a paragraph, or a page). Input images with tables, noise and graphics have always been a problem for fully automatic processing in tesseract.

Without wide testing, I assume that the correct orientation detection result of 4.0.alpha is more a coincidence than proof of a bug introduced at a later stage.

I just checked the image with Leptonica's orientation detection function (i.e. with no connection to the OSD or eng tesseract data), and it claims that the image needs to be corrected by 180 degrees....

@CanadianHusky
Author

> Without wide testing I assume that correct of result of orientation detection 4.0.alpha is more coincidence than prove of bug implementation in later stage.

The attached 7zip file contains my complete accuracy and performance test set of 128 corner images that I use for orientation detection. The matching OSD result files are included as well for review.
Some pages do not have an OSD result because the engine exited with 'too few characters' (my kMinCharstoTry is set to 10 instead of the 50 used in the current source; otherwise the engine exits far too early and gives up when it shouldn't).

ALL of the results are correct. There isn't a single mistake in the results!
- The rotation detection value is correct, as long as there is enough meaningful content.
- The confidence level makes sense, and I use it in a meaningful way, with a threshold, to take the best bet for orientation.
If the engine exits with 'too few characters', it is justified and correct, for example on a file like
cimg2_TR_070f1701-36b0-41d0-92e0-630e120f5acd.png

All of this was done with this version, and therefore I do not believe that this is pure coincidence. There are too many different content types to be correct by pure coincidence.
[image]

With the additional algorithm that I added in an external processor and multiple OSD/OCR passes, I have a false detection rate of approx. 1 in 20,000 files with the 4.00.alpha version. That is a brilliant result.
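A minimal sketch of thresholding the orientation confidence over several OSD passes. This is not the author's external processor, which is not shown in this thread; the "highest confidence per orientation" rule, the helper name and the default threshold of 1.0 are assumptions. The threshold is illustrative only: in the reports quoted in this thread the wrong answers came with confidences below 1 (0.30, 0.43) and the right ones above (1.76, 1.20).

```python
from typing import Iterable, Optional, Tuple


def decide_orientation(passes: Iterable[Tuple[int, float]],
                       threshold: float = 1.0) -> Optional[int]:
    """Pick the orientation backed by the highest confidence across OSD passes.

    Each pass is (orientation_in_degrees, orientation_confidence). Returns None
    when no orientation reaches `threshold`, i.e. the caller should not rotate.
    """
    best = {}
    for degrees, confidence in passes:
        best[degrees] = max(confidence, best.get(degrees, confidence))
    if not best:
        return None
    degrees, confidence = max(best.items(), key=lambda item: item[1])
    return degrees if confidence >= threshold else None
```

Returning None instead of guessing keeps low-confidence pages unrotated, which can then be sent to another pass or to manual review.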

The results with the recent beta and master versions are horrible.
Enlarging the input to 300% may be a workaround. I have not tested it yet, but it is guaranteed to be a (very undesirable) performance killer.

I am not skilled enough in C++ to dig down into the engine code and try to figure out what changed.

@CanadianHusky
Author

[Attachment: Cache.zip]

@amitdo
Collaborator

amitdo commented Oct 6, 2018

If you only need rotation detection, you can now use -l eng with --psm 0.
Without explicit -l <lang>, -l osd will be used by default with --psm 0.

You can use languages other than eng.

@stweil
Member

stweil commented Oct 6, 2018

@CanadianHusky, please test with latest code and -l eng to get the old behavior and close this issue if that works for you.

@CanadianHusky
Author

Hello, I pulled the latest master branch and compiled it as is. There seems to be some sort of problem, because the result is not correct:

[image]

The engine still gives a warning and produces the wrong result:

Warning, ignoring -l eng for --psm 0

The second fix, setting -c min_characters_to_try=xx, seems to do something and reacts more or less as expected. However, I am not sure that I mean the same thing by "characters" as the engine does: the cut-off threshold seems to be inconsistent, or I do not understand how the engine really counts characters.
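For scripting such tests, the invocation can be assembled as an argument list. A Python sketch, assuming a build that includes the fixes discussed here (honoring `-l eng` with `--psm 0`, and the `min_characters_to_try` variable) and a `tesseract` binary on `PATH`; `osd_command` and `run_osd` are hypothetical helpers:

```python
import subprocess
from typing import List, Optional


def osd_command(image: str, lang: Optional[str] = None,
                min_chars: Optional[int] = None) -> List[str]:
    """Build a `tesseract ... --psm 0` command line that writes the OSD report to stdout."""
    cmd = ["tesseract", image, "stdout", "--psm", "0"]
    if lang is not None:
        # Without -l, a fixed build falls back to osd for --psm 0.
        cmd += ["-l", lang]
    if min_chars is not None:
        cmd += ["-c", f"min_characters_to_try={min_chars}"]
    return cmd


def run_osd(image: str, **kwargs) -> str:
    """Run tesseract and return everything it printed (requires tesseract installed)."""
    result = subprocess.run(osd_command(image, **kwargs),
                            capture_output=True, text=True, check=False)
    # Combine both streams: the report and warnings may go to different ones.
    return result.stdout + result.stderr
```

For example, calling `run_osd("input.png", lang="eng", min_chars=n)` for `n` in `range(10, 19)` would reproduce the kind of threshold sweep described here.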

A minimalistic example on this image:
[image]

[image]

Why does the engine produce a result with -kMin=17, but exit with -kMin=18?

The image has 13 characters to a human, 14 if you count the space, and 16 "characters" if you count the dots of the letter i as separate blobs/chars.
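A plausible part of the explanation is that the engine works with blobs (connected groups of dark pixels) rather than characters as a human reads them, and the exact count depends on Tesseract's own segmentation and filtering. The blob idea can be illustrated with a tiny Python sketch that counts 4-connected components in a binary bitmap; this is a simplification for illustration, not Tesseract's blob finder, and `count_blobs` is a hypothetical helper:

```python
from typing import List


def count_blobs(bitmap: List[str]) -> int:
    """Count 4-connected groups of '#' pixels in a bitmap given as equal-length rows."""
    rows = len(bitmap)
    cols = len(bitmap[0]) if rows else 0
    seen = set()
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] != "#" or (r, c) in seen:
                continue
            blobs += 1  # an unvisited ink pixel starts a new blob
            seen.add((r, c))
            stack = [(r, c)]
            while stack:  # iterative flood fill over this blob's pixels
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and bitmap[ny][nx] == "#" and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return blobs


# "Hi" drawn in a 5x7 grid: the H is one blob, the i is a dot plus a stem.
LETTERS_HI = [
    "#..#..#",
    "#..#...",
    "####..#",
    "#..#..#",
    "#..#..#",
]

if __name__ == "__main__":
    print(count_blobs(LETTERS_HI))  # 2 letters, 3 blobs
```

Dotted letters inflate the count, while touching characters that merge into one blob deflate it, so a blob-based threshold need not match a human character count.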

The -kMin issue is not critical, just a curiosity, but according to my test the orientation detection is not working.

My test was done as follows, in case it makes a difference:
I had a binary installed from an earlier 4.00alpha build in C:\Program Files (x86)\Tesseract OCR\ (this was done with a precompiled installer and extra font downloads).

I pulled the latest GitHub master and compiled it. It compiled without errors and created a bunch of DLL and EXE files.
I replaced those DLL and EXE files in the Program Files dir.
I launched tesseract from BOTH the compiled project dir and the Program Files dir. Both show the same version code, and both create the same wrong result with regard to OSD with -l eng.

I appreciate any further tips.
Based on the message, I think the source is not fully working as expected and is still trying to use the "osd" language data instead of "eng".

Thanks

@amitdo
Collaborator

amitdo commented Oct 7, 2018

> I pulled the latest master branch and compiled it as is.

> Warning, ignoring -l eng for --psm 0

This warning was removed a few days ago.

@stweil
Member

stweil commented Oct 7, 2018

@CanadianHusky, the snapshots show that you used 4.0.0-rc1. The fix was added later, so please pull the latest code from Git master to get it and build again.

@CanadianHusky
Author

@stweil
I pulled this version 10 minutes ago using the "download as zip" option.

[image]

It compiled fine without errors.

The output still shows 4.0.0-rc1, but the result is correct this time.

[image]

The engine outputs a warning message and returns the correct rotation and a confidence value > 1.

Worth noting:
The confidence with the 2017 alpha build was 1.76 on the same file with min_characters_to_try = 10; it has now dropped to 1.38.
With min_characters_to_try = 11, the confidence is 1.64.
With min_characters_to_try = 12, the confidence is 1.76 (the same as in the 2017 version) and does not go higher, even if the setting is increased to 100. This proves that everything is in order now.

All other rotation test files (more than 100) have been checked against this version. The engine, combined with the additional external pre/post-processing that I have added, now makes zero mistakes in orientation detection, even with very little meaningful content.

In my opinion, it is now a significant improvement and allows tesseract to be used for orientation detection when the content is not a "regular" block of text.

Thank you very much to all that helped. I have closed the issue.


4 participants