Fix separator line detector #3082
…ngth When detecting vertical separators, the blob aligner is used to glue line segments (often segmented due to artificial cracks). But (unlike LineFinder) it has many parameters that are not relative to pixel density/resolution. This change decreases the minimum absolute length in pixels for vertical separators.
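To illustrate the difference between an absolute pixel threshold and a resolution-relative one, here is a minimal Python sketch. It is not Tesseract code and not what this commit does (the commit only lowers the absolute minimum); the function names, the 0.25 inch default, and the list-of-lengths input are all assumptions made for the example.

```python
def min_length_px(min_length_inches: float, dpi: int) -> int:
    """Convert a physical minimum length into pixels for a given resolution.

    Hypothetical helper: the name and the inch-based unit are assumptions.
    """
    return max(1, round(min_length_inches * dpi))


def keep_vertical_separators(segment_lengths_px, dpi, min_length_inches=0.25):
    """Keep only the segments that are long enough relative to the resolution.

    segment_lengths_px: lengths (in pixels) of candidate vertical segments.
    The 0.25 inch default is an arbitrary illustrative value.
    """
    threshold = min_length_px(min_length_inches, dpi)
    return [length for length in segment_lengths_px if length >= threshold]
```

With a threshold expressed this way, the same physical line length passes or fails consistently whether the page was scanned at 300 or 600 DPI, which a fixed pixel count cannot guarantee.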
…tion When checking horizontal line partitions for possible interpretation as underline formatting, avoid confusing the hline partition itself with an overlapping neighbour (which would delete it).
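The self-overlap pitfall described above can be sketched in a few lines of Python. This is an illustration only, not the Tesseract implementation: the `Box` type, the `overlaps` test, and `overlapping_neighbours` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical stand-in for a partition)."""
    left: int
    top: int
    right: int
    bottom: int


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share a non-empty area."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


def overlapping_neighbours(hline: Box, partitions):
    """Return partitions overlapping the hline, excluding the hline itself.

    Without the identity check, the hline would always be reported as its
    own overlapping neighbour, since every box overlaps itself.
    """
    return [p for p in partitions if p is not hline and overlaps(hline, p)]
```

The identity check (`is not`) rather than an equality check is a deliberate choice in this sketch: two distinct partitions with identical coordinates should still count as neighbours of each other.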
@bertsky Since you are fixing line detection, can you also look into the incorrect segmentation when the script extends below the baseline? See the example: a zip file with sample images and hOCR and ALTO output is enclosed.
@Shreeshrii sorry, but this does not address textline detection directly. It's about the v/h-line (foreground separator) detection that is a preliminary to block/column detection. (I have renamed the title to make this difference clearer.) But your case looks like merely an artefact of the layout extraction API or ALTO representation: the bboxes are too coarse here. We should try to allow retrieving polygons. There was a (slightly off-topic) discussion about this in #2971, but there is no dedicated issue or PR yet. Does the recognition result also degrade where line bboxes overlap? (This is currently the best indicator of the internal row structure, IIUC.)
Yes, the recognition result degrades in such cases, which has been reported earlier in other issues. I checked with PageViewer today and noticed the overlaps. Should I open this as a separate issue?
Please do!
Thank you, @bertsky.
These commits break the |
As I have already said somewhere (I cannot find it now): sorry, I'll have a look at that unit test and make another PR as soon as I find some time.
This fixes 2 minor issues with foreground line separator detection.
Here is an example (using the ALTO renderer and PageViewer for display):