ALTO for Handwriting #56

Closed
artunit opened this issue Jan 27, 2019 · 9 comments

@artunit
Member

artunit commented Jan 27, 2019

ALTO could have great value for handwriting representation. This is an initial example of what it might look like. I have taken the coordinates and confidence levels from the Cloud Vision API and its beta support for handwriting recognition, though I have rounded the Glyph confidence numbers.

Handwriting Sample

<?xml version="1.0" encoding="UTF-8"?>
<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns="http://www.loc.gov/standards/alto/ns-v4#" 
  xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-0.xsd" 
  xmlns:xlink="http://www.w3.org/1999/xlink">
  <Tags>
    <RoleTag ID="HW01" TYPE="Handwritten"/>
  </Tags>
  <Layout>
    <Page WIDTH="1266" HEIGHT="107" PHYSICAL_IMG_NR="0" ID="page_0">
      <PrintSpace HPOS="0" VPOS="0" WIDTH="1266" HEIGHT="107">
        <TextBlock ID="block_0" HPOS="15" VPOS="16" WIDTH="1236" HEIGHT="81">
          <TextLine ID="line_0" TAGREFS="HW01" HPOS="15" VPOS="16" WIDTH="1236" HEIGHT="81">
            <String ID="string_0" HPOS="17" VPOS="28" WIDTH="97" HEIGHT="88" WC="0.98" CONTENT="This">
              <Glyph CONTENT="T" HPOS="17" VPOS="29" WIDTH="23" HEIGHT="80" GC="98.000000"/>
              <Glyph CONTENT="h" HPOS="42" VPOS="30" WIDTH="23" HEIGHT="80" GC="63.000000"/>
              <Glyph CONTENT="i" HPOS="68" VPOS="29" WIDTH="22" HEIGHT="80" GC="99.000000"/>
              <Glyph CONTENT="s" HPOS="92" VPOS="28" WIDTH="22" HEIGHT="80" GC="99.000000"/>
            </String>
            <String ID="string_1" HPOS="137" VPOS="27" WIDTH="84" HEIGHT="81" WC="0.99" CONTENT="was">
              <Glyph CONTENT="w" HPOS="137" VPOS="27" WIDTH="29" HEIGHT="81" GC="99.000000"/>
              <Glyph CONTENT="a" HPOS="172" VPOS="26" WIDTH="26" HEIGHT="81" GC="100.000000"/>
              <Glyph CONTENT="s" HPOS="198" VPOS="26" WIDTH="23" HEIGHT="80" GC="100.000000"/>
            </String>
            <String ID="string_2" HPOS="254" VPOS="25" WIDTH="84" HEIGHT="81" WC="0.99" CONTENT="a"/>
            <String ID="string_3" HPOS="341" VPOS="20" WIDTH="167" HEIGHT="80" WC="0.99" CONTENT="pleasant">
              <Glyph CONTENT="p" HPOS="341" VPOS="23" WIDTH="16" HEIGHT="80" GC="99.000000"/>
              <Glyph CONTENT="l" HPOS="358" VPOS="23" WIDTH="16" HEIGHT="80" GC="98.000000"/>
              <Glyph CONTENT="e" HPOS="372" VPOS="23" WIDTH="19" HEIGHT="80" GC="99.000000"/>
              <Glyph CONTENT="a" HPOS="397" VPOS="22" WIDTH="19" HEIGHT="80" GC="99.000000"/>
              <Glyph CONTENT="s" HPOS="412" VPOS="22" WIDTH="22" HEIGHT="80" GC="100.000000"/>
              <Glyph CONTENT="a" HPOS="445" VPOS="21" WIDTH="22" HEIGHT="80" GC="99.000000"/>
              <Glyph CONTENT="n" HPOS="462" VPOS="21" WIDTH="19" HEIGHT="80" GC="100.000000"/>
              <Glyph CONTENT="t" HPOS="485" VPOS="20" WIDTH="23" HEIGHT="80" GC="100.000000"/>
            </String>
            <String ID="string_4" HPOS="531" VPOS="18" WIDTH="83" HEIGHT="81" WC="0.99" CONTENT="and">
              <Glyph CONTENT="a" HPOS="531" VPOS="18" WIDTH="29" HEIGHT="81" GC="99.000000"/>
              <Glyph CONTENT="n" HPOS="566" VPOS="19" WIDTH="25" HEIGHT="80" GC="99.000000"/>
              <Glyph CONTENT="d" HPOS="592" VPOS="18" WIDTH="22" HEIGHT="80" GC="99.000000"/>
            </String>
            <String ID="string_5" HPOS="673" VPOS="13" WIDTH="212" HEIGHT="81" WC="0.99" CONTENT="reflective">
              <Glyph CONTENT="r" HPOS="673" VPOS="17" WIDTH="23" HEIGHT="81" GC="99.000000"/>
              <Glyph CONTENT="e" HPOS="698" VPOS="16" WIDTH="23" HEIGHT="80" GC="100.000000"/>
              <Glyph CONTENT="f" HPOS="725" VPOS="15" WIDTH="19" HEIGHT="80" GC="98.000000"/>
              <Glyph CONTENT="l" HPOS="741" VPOS="15" WIDTH="19" HEIGHT="80" GC="99.000000"/>
              <Glyph CONTENT="e" HPOS="766" VPOS="15" WIDTH="19" HEIGHT="80" GC="100.000000"/>
              <Glyph CONTENT="c" HPOS="782" VPOS="15" WIDTH="19" HEIGHT="80" GC="98.000000"/>
              <Glyph CONTENT="t" HPOS="807" VPOS="14" WIDTH="19" HEIGHT="80" GC="100.000000"/>
              <Glyph CONTENT="i" HPOS="825" VPOS="13" WIDTH="15" HEIGHT="80" GC="99.000000"/>
              <Glyph CONTENT="v" HPOS="839" VPOS="13" WIDTH="19" HEIGHT="80" GC="100.000000"/>
              <Glyph CONTENT="e" HPOS="862" VPOS="13" WIDTH="23" HEIGHT="80" GC="100.000000"/>
            </String>
            <String ID="string_6" HPOS="911" VPOS="9" WIDTH="146" HEIGHT="81" WC="0.99" CONTENT="journey">
              <Glyph CONTENT="j" HPOS="911" VPOS="12" WIDTH="23" HEIGHT="80" GC="95.000000"/>
              <Glyph CONTENT="o" HPOS="938" VPOS="11" WIDTH="19" HEIGHT="80" GC="99.000000"/>
              <Glyph CONTENT="u" HPOS="954" VPOS="10" WIDTH="19" HEIGHT="80" GC="96.000000"/>
              <Glyph CONTENT="r" HPOS="981" VPOS="11" WIDTH="16" HEIGHT="80" GC="70.000000"/>
              <Glyph CONTENT="n" HPOS="989" VPOS="10" WIDTH="16" HEIGHT="80" GC="97.000000"/>
              <Glyph CONTENT="e" HPOS="1011" VPOS="10" WIDTH="22" HEIGHT="80" GC="84.000000"/>
              <Glyph CONTENT="y" HPOS="1035" VPOS="9" WIDTH="22" HEIGHT="80" GC="99.000000"/>
            </String>
            <String ID="string_7" HPOS="1101" VPOS="7" WIDTH="60" HEIGHT="80" WC="0.99" CONTENT="for">
              <Glyph CONTENT="f" HPOS="1101" VPOS="8" WIDTH="22" HEIGHT="80" GC="99.000000"/>
              <Glyph CONTENT="o" HPOS="1126" VPOS="7" WIDTH="19" HEIGHT="80" GC="99.000000"/>
              <Glyph CONTENT="r" HPOS="1145" VPOS="7" WIDTH="16" HEIGHT="80" GC="99.000000"/>
            </String>
            <String ID="string_8" HPOS="1183" VPOS="6" WIDTH="46" HEIGHT="81" WC="0.99" CONTENT="me">
              <Glyph CONTENT="m" HPOS="1183" VPOS="7" WIDTH="22" HEIGHT="80" GC="98.000000"/>
              <Glyph CONTENT="e" HPOS="1207" VPOS="6" WIDTH="22" HEIGHT="80" GC="99.000000"/>
            </String>
            <String ID="string_9" HPOS="1222" VPOS="5" WIDTH="25" HEIGHT="80" WC="0.97" CONTENT=","/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
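
As a rough illustration of how a consumer might read a file like the sample above, the following Python sketch parses a trimmed copy of the first String and lists the glyphs whose confidence falls below a threshold. The helper name `low_confidence_glyphs` and the threshold of 90 are assumptions made for illustration, and the GC values are read on the 0-100 scale used in this sample.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample; the "alto" prefix is only a local alias.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}

# Trimmed copy of the sample above: one TextLine holding the first String.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page WIDTH="1266" HEIGHT="107" PHYSICAL_IMG_NR="0" ID="page_0">
      <PrintSpace HPOS="0" VPOS="0" WIDTH="1266" HEIGHT="107">
        <TextBlock ID="block_0" HPOS="15" VPOS="16" WIDTH="1236" HEIGHT="81">
          <TextLine ID="line_0" HPOS="15" VPOS="16" WIDTH="1236" HEIGHT="81">
            <String ID="string_0" HPOS="17" VPOS="28" WIDTH="97" HEIGHT="88" WC="0.98" CONTENT="This">
              <Glyph CONTENT="T" HPOS="17" VPOS="29" WIDTH="23" HEIGHT="80" GC="98.000000"/>
              <Glyph CONTENT="h" HPOS="42" VPOS="30" WIDTH="23" HEIGHT="80" GC="63.000000"/>
              <Glyph CONTENT="i" HPOS="68" VPOS="29" WIDTH="22" HEIGHT="80" GC="99.000000"/>
              <Glyph CONTENT="s" HPOS="92" VPOS="28" WIDTH="22" HEIGHT="80" GC="99.000000"/>
            </String>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def low_confidence_glyphs(alto_xml, threshold=90.0):
    """Return (word, glyph, confidence) for every Glyph with GC below threshold."""
    root = ET.fromstring(alto_xml)
    hits = []
    for string in root.iterfind(".//alto:String", ALTO_NS):
        # Strings without Glyph children (such as "a" in the sample) are skipped.
        for glyph in string.iterfind("alto:Glyph", ALTO_NS):
            gc = float(glyph.get("GC"))
            if gc < threshold:
                hits.append((string.get("CONTENT"), glyph.get("CONTENT"), gc))
    return hits


print(low_confidence_glyphs(SAMPLE))
```

With the default threshold, only the "h" in "This" is reported, which matches the one visibly low GC value in that word.
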
@mittagessen
Contributor

That is actually extremely pertinent to my work right now. For basic manuscripts with completely straight vertical or horizontal writing, ALTO works quite well, but anything more complex would benefit from a free-form baseline capability. hOCR limits the baseline definition to a polynomial, but a sequence of line segments is more appropriate for highly curled or circular lines.

@artunit
Member Author

artunit commented Jan 28, 2019

The shape-element usage discussion might be useful to you. I used the bounding box coordinates from the Cloud Vision API, but ALTO has allowed polygon, circle, and ellipse shape types since version 3.1, and these are available down to the glyph level.

@mittagessen
Contributor

Stupid question: Does the POLYGON shape define an open or a closed polygon? For baselines open would be more appropriate but the documentation doesn't elaborate on that point.

@urieli

urieli commented Feb 15, 2019

@mittagessen an "open polygon" is an oxymoron: a polygon is by definition "a closed plane figure bounded by three or more line segments." If what is meant is a series of points connected by line segments, maybe the name should be changed (not that I have an elegant suggestion).

@mittagessen
Contributor

@urieli Open polygonal chains are sometimes known as open polygons. The shortest unambiguous name would be polyline.

The easiest way to deal with this rather special case would be to extend the BASELINE attribute to allow polylines instead of a single line segment. That would also keep the existing semantics of the shape elements.

@artunit
Member Author

artunit commented Feb 17, 2019

@urieli, @mittagessen - I like the BASELINE suggestion. Technically, the schema doesn't distinguish between open and closed polygons, though the documentation does identify its use for bounding shapes. Issue 22 targets changing BASELINE to PointsType, which I think would address this.

@mittagessen
Contributor

@artunit Changing BASELINE to points type is exactly what I had in mind, although I am unsure if the change breaks backward compatibility unnecessarily. The old model just used a single y-coordinate, so the encoding differs even for perfectly straight baselines.
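
To make the compatibility concern concrete, here is a minimal Python sketch of a reader that accepts both encodings. It assumes the older form is a single y-coordinate and the newer form is a list of x and y values separated by whitespace or commas, and it assumes a straight baseline spans the TextLine from HPOS to HPOS + WIDTH. The function name `baseline_points` is hypothetical.

```python
import re


def baseline_points(baseline, hpos, width):
    """Return a BASELINE value as a list of (x, y) points.

    baseline: raw attribute text, either one y-coordinate (old style)
              or an even-length list of x and y values (polyline style).
    hpos, width: HPOS and WIDTH of the enclosing TextLine, used only to
                 expand the old single-value form into two points.
    """
    values = [float(v) for v in re.split(r"[\s,]+", baseline.strip())]
    if len(values) == 1:
        # Old encoding: assume a horizontal line across the whole TextLine.
        y = values[0]
        return [(float(hpos), y), (float(hpos + width), y)]
    if len(values) % 2 != 0:
        raise ValueError("polyline baseline needs an even number of coordinates")
    # New encoding: consecutive values form (x, y) pairs.
    return list(zip(values[0::2], values[1::2]))


# Old style: a single y-coordinate on a line with HPOS=15 and WIDTH=1236.
print(baseline_points("97", 15, 1236))
# Polyline style: three points following a slightly rising line.
print(baseline_points("15 97 600 90 1251 86", 15, 1236))
```

Under these assumptions a straight baseline still changes from one number to two points, which is the encoding difference described above.
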

@artunit
Member Author

artunit commented Feb 18, 2019

@mittagessen The schema does not currently annotate BASELINE and I guess it would come down to whether existing implementations would be broken. A point is normally two coordinates though there could be the notion that one is implicit for single values in the annotation. The schema also has the notion of a typesetting point, or 1/72 of an inch, so it would probably be good to define the different uses of point. In the same vein, PointsType is defined as a list of points and I think it would be useful to allow these to be written as a list of pairs, e.g. instead of 200 400 203 405 210 420, something like (200,400),(203,405),(210,420).
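
A parser that tolerates both notations can be short. The following Python sketch is only an illustration: the parenthesised pair notation is the proposal from this comment rather than something taken from the schema, and `parse_points` is a hypothetical helper.

```python
import re

# Matches integers and simple decimals, with an optional leading minus sign.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_points(text):
    """Parse a flat list or a list of (x,y) pairs into [(x, y), ...]."""
    numbers = [float(n) for n in NUMBER.findall(text)]
    if len(numbers) % 2 != 0:
        raise ValueError("expected an even number of coordinates")
    # Pair up consecutive values regardless of the punctuation between them.
    return list(zip(numbers[0::2], numbers[1::2]))


flat = parse_points("200 400 203 405 210 420")
pairs = parse_points("(200,400),(203,405),(210,420)")
print(flat == pairs)
```

Because the sketch only looks at the numbers, it cannot tell a malformed pair list from a valid flat list; a stricter reader would validate the punctuation as well.
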

@artunit artunit self-assigned this Jun 3, 2019
@artunit
Member Author

artunit commented Sep 27, 2020

This issue seems to be addressed: ALTO is now used for encoding handwriting in two major projects (Transkribus and eScripta), and the change to BASELINE has been published in version 4.2 of the ALTO schema.

@artunit artunit closed this as completed Sep 27, 2020