Recognition aborts at "baselines" which are only a point #606
Comments
If I remove all lines which have a WIDTH of 1, 2 or 3, the recognition works for the remaining lines without an exception. There are also some lines with a WIDTH of 0, but those don't cause an exception.
The line is invalid and should be skipped in the recognizer, but this case isn't caught. BTW
Where do these lines come from anyway? The segmenter filters out extremely short line segments like these and IIRC the eScriptorium UI would make drawing point-sized line segments very difficult.
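The skip suggested above could look roughly like the following. This is only a sketch, not kraken's actual code: the names `baseline_length`, `is_degenerate` and `usable_lines`, the `(line_id, points)` input shape and the `min_length` threshold of one pixel are all assumptions made for illustration.

```python
import math


def baseline_length(points):
    """Return the total polyline length of a baseline given as (x, y) pairs."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def is_degenerate(points, min_length=1.0):
    """True if the baseline has fewer than two points or is shorter than min_length.

    min_length is a hypothetical threshold in pixels, not a kraken setting.
    """
    return len(points) < 2 or baseline_length(points) < min_length


def usable_lines(lines, min_length=1.0):
    """Yield only the (line_id, points) pairs whose baseline is usable."""
    for line_id, points in lines:
        if is_degenerate(points, min_length):
            continue  # skip the invalid line instead of raising
        yield line_id, points
```

A baseline whose two points coincide has length 0 and is dropped, while ordinary lines pass through unchanged.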
I think the user created those lines accidentally by manually clicking in the eScriptorium panel where it's possible to add, change or delete baselines. Maybe it's sufficient to click without drawing, and that will add a "baseline" point.
I can confirm that the recognition works if I remove only the two lines whose baseline is a single point from the ALTO file.
... and I was able to add a baseline with zero length. I could not create it directly, but it is possible to change an existing baseline with two points so that both points are at the same position.
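Until the recognizer skips such lines itself, the manual workaround of removing point baselines from the ALTO file could be scripted. The following is a minimal sketch using only the Python standard library. It assumes that each `TextLine` carries a `BASELINE` attribute holding whitespace- or comma-separated coordinates; `parse_baseline` and `drop_point_baselines` are hypothetical helper names, not part of kraken or eScriptorium.

```python
import re
import xml.etree.ElementTree as ET


def parse_baseline(value):
    """Parse a BASELINE attribute string into a list of (x, y) tuples."""
    numbers = [float(n) for n in re.split(r"[\s,]+", value.strip()) if n]
    return list(zip(numbers[0::2], numbers[1::2]))


def drop_point_baselines(root):
    """Remove TextLine elements whose baseline has fewer than two distinct points.

    Lines without a BASELINE attribute are left untouched.
    Returns the number of removed lines.
    """
    removed = 0
    # Snapshot the element list so removals do not disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            # Compare the local tag name so any ALTO namespace version matches.
            if child.tag.rsplit("}", 1)[-1] != "TextLine":
                continue
            baseline = child.get("BASELINE")
            if baseline is None:
                continue
            if len(set(parse_baseline(baseline))) < 2:
                parent.remove(child)
                removed += 1
    return removed


# Example usage (file names are placeholders):
# tree = ET.parse("page.xml")
# print(drop_point_baselines(tree.getroot()), "line(s) removed")
# tree.write("page.filtered.xml", encoding="utf-8", xml_declaration=True)
```

The check counts distinct points instead of looking at WIDTH, because the failing lines were identified here by their point-sized baseline.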
There was a report in the eScriptorium Gitter chat about recognition failing for a certain image. With the provided export (export_doc1_consular_cards_1_alto_202405140257.zip), the issue can be reproduced not only in eScriptorium but also with the latest kraken on the command line.
I modified kraken.py to get a full exception backtrace and found that this part of the ALTO XML triggers the exception:
Normally kraken would process lots of lines before reaching that fatal line, but when I move that line to the first position, the exception occurs early: