ALTO for Handwriting #56
Comments
That is actually extremely pertinent to my work right now. For basic manuscripts with completely straight, vertical/horizontal writing, ALTO works quite well, but anything more complex would be helped by a free-form baseline capability. hOCR limits the definition to a polynomial, but a sequence of line segments is more appropriate for highly curled/circular lines.
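To illustrate the point about curled or circular lines, here is a minimal sketch, not taken from any ALTO or hOCR tooling. It assumes a baseline is stored as a list of (x, y) points joined by line segments; the helper names `circular_baseline`, `is_function_of_x` and `polyline_length` are hypothetical. A polynomial of the form y = f(x) needs x to increase along the line, which a circular baseline does not satisfy, while a sequence of segments handles it without trouble.

```python
import math


def circular_baseline(cx, cy, r, n=8):
    """Sample n + 1 points around a full circle, returned as an open polyline."""
    return [
        (cx + r * math.cos(2 * math.pi * i / n),
         cy + r * math.sin(2 * math.pi * i / n))
        for i in range(n + 1)
    ]


def is_function_of_x(points):
    """True if x strictly increases along the polyline, so y = f(x) could describe it."""
    return all(x1 > x0 for (x0, _), (x1, _) in zip(points, points[1:]))


def polyline_length(points):
    """Total length of the line segments joining consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


straight = [(0, 0), (5, 1), (10, 0)]   # nearly horizontal line of writing
curled = circular_baseline(0, 0, 10)   # writing that follows a circle

print(is_function_of_x(straight))  # a polynomial in x could fit this one
print(is_function_of_x(curled))    # x doubles back, so y = f(x) cannot
print(polyline_length(curled))     # the polyline form still measures fine
```

The segment representation makes no assumption about writing direction, which is why it also covers vertical and spiral layouts.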
The shape-element usage discussion might be useful to you. I used the bounding box coordinates from the Cloud Vision API, but ALTO has allowed polygon, circle and ellipse shape types since version 3.1, and these are available down to the glyph level.
Stupid question: Does the |
@mittagessen an "open polygon" is an oxymoron: a polygon is by definition "a closed plane figure bounded by three or more line segments." If what is meant is a series of points connected by line segments, maybe the name should be changed (not that I have an elegant suggestion).
@urieli Open polygonal chains are sometimes known as open polygons. The shortest unambiguous name would be polyline. The easiest way to deal with this rather special case would be to extend the |
@urieli, @mittagessen - I like the BASELINE suggestion. Technically, the schema doesn't distinguish between open and closed polygons, though the documentation does identify its use for bounding shapes. Issue 22 targets changing BASELINE to PointsType, which I think would address this.
@artunit Changing |
@mittagessen The schema does not currently annotate |
This issue seems to be addressed: ALTO is now used for encoding handwriting in two major projects (Transkribus and eScripta), and the change to BASELINE has been published in version 4.2 of the ALTO schema.
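For readers who want to consume such a baseline, here is a rough sketch of reading a point-list BASELINE from a TextLine with Python's standard library. Several things here are assumptions rather than statements from this thread: the sample fragment is made up and deliberately minimal (it omits attributes a real ALTO file would carry), the namespace URI is taken to be the ALTO v4 one, and the points are assumed to be written as `x,y` pairs separated by spaces. The helpers `parse_points` and `baselines` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumed ALTO v4 namespace; adjust if your files declare a different one.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"

# Made-up, minimal fragment for illustration only (not schema-complete).
SAMPLE = f"""<alto xmlns="{ALTO_NS}">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine ID="line_1" BASELINE="100,200 150,195 200,205"/>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def parse_points(value):
    """Turn 'x1,y1 x2,y2 ...' into [(x1, y1), (x2, y2), ...].

    Commas and whitespace are both treated as separators (an assumption).
    """
    numbers = [float(token) for token in value.replace(",", " ").split()]
    if len(numbers) % 2:
        raise ValueError(f"odd number of coordinates in {value!r}")
    return list(zip(numbers[0::2], numbers[1::2]))


def baselines(xml_text):
    """Map each TextLine ID to its baseline polyline, skipping lines without one."""
    root = ET.fromstring(xml_text)
    return {
        line.get("ID"): parse_points(line.get("BASELINE"))
        for line in root.iter(f"{{{ALTO_NS}}}TextLine")
        if line.get("BASELINE")
    }


print(baselines(SAMPLE))
```

The resulting list is an open polyline: the first and last points are not joined, which matches the baseline use discussed above rather than a closed bounding shape.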
ALTO could have great value for handwriting representation. This is an initial example of what it might look like. I have taken the coordinates and confidence levels from the Cloud Vision API and its beta support for handwriting recognition, though I have rounded the Glyph confidence numbers.