add semantics to coordinate system #13
Comments
Good points ;-)
Splendid! Would you like me to do a PR?
I don't know. It just seemed like the minimal option. As in: "no interpretation is guaranteed, help yourself!" Or in having no specification at all. Implementors could try to always compute the outer hull, or try their luck with path interpretations...
Ok, fair enough. (Closed by description – at least one pair must repeat – or closed by convention – the first pair is meant to be repeated?) But what about cases where the region is non-contiguous, e.g. because a TextRegion gets flowed over by an ImageRegion, or a TextLine by a GraphicRegion? In that case, having only a single path necessitates including the intruders, so the only way to get rid of them for further processing (layout / dewarping / recognition) would be to offer a
I was hoping you say so. (But I do get non-planar polygons from Tesseract sometimes, and some contour libraries never bother to close their paths.)
Yes, as with most of the semantics, this would be a matter of some non-XSD validation. In OCR-D, we are planning to write one using geometry heuristics. Is there some place I can look at the respective rules in Aletheia?
That is also what Tesseract does itself (if asked to return a raw image of blocks from layout analysis). But if this is forbidden by the schema, it's totally up to the processor/library trying to produce PAGE to handle this case if it does arise internally. (Maybe it's still worth commenting on in the schema, though.)
Closed by convention (first pair repeats).
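For implementors, "closed by convention" could be handled roughly as in the sketch below. It assumes the convention means that the outline is implicitly closed by an edge from the last pair back to the first, so the first pair need not be written twice in the points list. The helper names are hypothetical.

```python
def edges(coords):
    """Yield the edges of an outline given as a list of (x, y) points,
    including the implicit closing edge from the last point to the first."""
    for i, start in enumerate(coords):
        yield start, coords[(i + 1) % len(coords)]


def signed_area(coords):
    """Shoelace formula over the implicitly closed outline.

    The sign depends on the winding direction; with the y axis pointing
    down (image coordinates) it is flipped relative to the usual
    mathematical convention.
    """
    return sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in edges(coords)) / 2
```

With this reading, a four-point square yields four edges, and its area comes out the same whether or not a producer also repeats the first pair explicitly (the extra edge has zero length).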
Thanks a lot for all the clarification! I hope the PR meets your approval.
Coordinates are at the heart of stand-off annotation formats. In PAGE-XML, all visible elements must have a `CoordsType`, which must have a `@points`. There is even some syntax for that enforced by a regular expression. However, the standard lacks any semantics for the coordinate system whatsoever. There is not even a comment about this, so with luck, at least all implementors guessed consistently.

IMO we need to specify that:
- `@points` always describes (a list of x-y pairs of) absolute pixel coordinates ("absolute" meaning they refer to the root image in `PageType/@imageFilename` with the upper left corner as `0,0`)

Moreover, we should clarify whether:
- `@points` has a topology of
- `@points` must obey certain constraints like

This is highly relevant for implementors, especially when polygon processing and `AlternativeImage` processing on multiple hierarchy levels in the presence of skew become common practice – which is currently happening within OCR-D (for showcases see our Tesseract and our Ocropy preprocessing and segmentation wrappers).

(Cf. altoxml/schema#49)
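To illustrate the proposed reading of `@points`, here is a minimal sketch of how a consumer might parse the attribute into absolute pixel coordinates. The regular expression is an assumption written in the spirit of the schema's syntax rule, not copied from it, and both function names are hypothetical. Whether points lying exactly on the right or bottom image border should count as valid is itself one of the open questions; the sketch assumes they do not.

```python
import re

# Assumed syntax: at least two "x,y" pairs of non-negative integers,
# separated by single spaces. The pattern in the actual schema may differ.
POINTS_PATTERN = re.compile(r"^\d+,\d+( \d+,\d+)+$")


def parse_points(points):
    """Turn a @points string into a list of (x, y) integer pixel tuples."""
    if not POINTS_PATTERN.match(points):
        raise ValueError(f"malformed @points value: {points!r}")
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split(" ")]


def within_image(coords, width, height):
    """Check that every point lies inside an image of the given pixel size,
    taking 0,0 as the upper left corner (assumption: x < width, y < height)."""
    return all(0 <= x < width and 0 <= y < height for x, y in coords)
```

For example, `parse_points("0,0 10,0 10,5")` gives `[(0, 0), (10, 0), (10, 5)]`, which fits into a 20x10 image but not into a 10x10 one under the border assumption above.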