
Position for rotated text #59

Closed
silviu22 opened this issue Apr 1, 2019 · 4 comments

@silviu22

silviu22 commented Apr 1, 2019

I am a little confused about the coordinates and width/height of rotated text.

I believe HPOS,VPOS are the (x,y) coordinates of the top-left corner of the text block, and WIDTH/HEIGHT seem to be the width and height of the bounding rectangle containing the whole text.

This seems obvious for normal (horizontal) text. Is this the case for rotated text as well?

For example, when text is rotated by 90 degrees, the old width becomes the height and the old height becomes the width. To describe the question a little better, I came up with 4 cases:

  • A - horizontal text
  • B - Text rotated 270 degrees
  • C - Text rotated 90 degrees
  • D - Text rotated 45 degrees

Please take a look at this file: Text Position.pdf
In that file, (x,y) is HPOS,VPOS for a particular word, and W/H is the width/height of that word.

I believe the answers are as follows:

  • A : (x,y) starting point (HPOS,VPOS) is P1
  • B : (x,y) starting point (HPOS,VPOS) is P1
  • C : (x,y) starting point (HPOS,VPOS) is P4
  • D : (x,y) starting point (HPOS,VPOS) is P4

Note that if HPOS,VPOS is always the top-left of the displayed text, then it has a different meaning for the program that is supposed to display such text.

Case D might best explain what I mean. To draw text at 45 degrees, you will typically tell the computer to draw the text at 45 degrees starting from point P2 (on the baseline). You will usually not tell it to display the text at point P4 (the top-left corner). So I would have to do quite a lot of work to deduce the point P2 from which text drawn at 45 degrees ends up with its top-left corner at P4. It can be done, but it takes some work.
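The work of recovering P2 from P4 can be sketched as follows. This is a minimal Python sketch under my own assumptions, not behaviour defined by ALTO:

- Coordinates are image coordinates, with the y axis pointing down.
- The angle is in degrees, counterclockwise as seen on the page.
- The text is rotated about P2.
- P2 is the bottom-left corner of the unrotated W1 × H1 text box. In other words, the baseline is approximated by the box's bottom edge, and descenders are ignored.
- The name `baseline_origin` is hypothetical.

```python
import math

def baseline_origin(hpos, vpos, w1, h1, angle_deg):
    """Hypothetical helper: recover the drawing origin P2 (bottom-left of the
    unrotated w1 x h1 text box) from the top-left corner (hpos, vpos) of the
    axis-aligned box around the rotated text.

    Assumes image coordinates (y points down) and angle_deg measured
    counterclockwise as seen on the page, with the text rotated about P2.
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    # Corners of the unrotated box relative to P2: bottom-left, bottom-right,
    # top-right, top-left ("up" is negative y in image coordinates).
    corners = [(0.0, 0.0), (w1, 0.0), (w1, -h1), (0.0, -h1)]
    # Counterclockwise (on-page) rotation written for y-down coordinates.
    rotated = [(x * c + y * s, -x * s + y * c) for x, y in corners]
    # The bbox top-left is P2 plus the smallest rotated offsets, so undo them.
    min_x = min(x for x, _ in rotated)
    min_y = min(y for _, y in rotated)
    return hpos - min_x, vpos - min_y
```

For angle 0 this gives (hpos, vpos + H1), the bottom-left of the box. For 90 degrees (text reading bottom to top) it gives approximately (hpos + H1, vpos + W1), which is the bottom-right corner of the bounding box.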

So, to recap, can someone confirm that HPOS,VPOS for case D (text at 45 degrees) is point P4? Also, for case D, there are two possible pairs of values that could be considered width/height:

  • (W1, H1) - the size of the unrotated text
  • (W2, H2) - the width and height of the smallest rectangle that contains the rotated text

I believe the WIDTH/HEIGHT values that we are supposed to write are W2, H2 (smallest rectangle containing the rotated text).
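The two pairs are related by plain trigonometry. Rotating a W1 × H1 rectangle by θ gives an enclosing axis-aligned rectangle with:

- W2 = W1·|cos θ| + H1·|sin θ|
- H2 = W1·|sin θ| + H1·|cos θ|

Here is a small Python sketch of that relation; the function name is hypothetical.

```python
import math

def rotated_bbox_size(w1, h1, angle_deg):
    """Hypothetical helper: (W2, H2) of the smallest axis-aligned rectangle
    enclosing a w1 x h1 rectangle rotated by angle_deg (either direction)."""
    a = math.radians(angle_deg)
    # Absolute values make the result independent of rotation direction.
    c, s = abs(math.cos(a)), abs(math.sin(a))
    return w1 * c + h1 * s, w1 * s + h1 * c
```

At 90 degrees this reduces to the width/height swap described above, up to floating-point noise. At 45 degrees, W2 and H2 are both (W1 + H1)/√2.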
@artunit
Member

artunit commented Apr 6, 2019

This is a good question and I am hoping one of my Board brethren with more experience with using ALTO with rotated blocks can weigh in. I believe that HPOS,VPOS are for the center of the block and then the rotation is applied when the rotation attribute is used.

@bertsky
Contributor

bertsky commented Jul 12, 2019

Please allow me to weigh in. I have just finished tackling those issues for PAGE-XML within OCR-D, where with pc:RegionType/@orientation exists a perfect analogue of @ROTATION, and all segments' @points are likewise absolute (always referencing the pixel xy coordinates of the page image source). See here for an in-depth discussion, and its implementations for Tesseract and for Ocropy.

(That discussion aims to solve not just the particular problem of rotation but the wider issues of relative coordinates when using binary image data for segments at each step in the hierarchy – blocks, lines, words –, which is possible to represent in PAGE-XML via pc:AlternativeImage/@filename. But that should be the same for ALTO-XML with its ComposedBlockType/@FILEID.)

Let me start off my answer with a quote from the spec. In the xsd:documentation of @ROTATION (my emphasis, this is also mentioned specifically in the changelog), we have:

Tells the rotation of e.g. text or illustration within the block.

So this merely informs about the skew of the binary image data within the annotated region (being described by the bounding box with @HPOS / @VPOS / @HEIGHT / @WIDTH or by a polygon with Shape/POINTS). Naturally, the bbox will have to be larger than the actual block's outline if it is rotated. Using a polygon would always be a more precise alternative representation.

Therefore, yes @silviu22, your block D has its HPOS/VPOS at P4 and its WIDTH/HEIGHT is W2/H2 (referring to your drawing – your verbalization is somewhat unfortunate, because it describes W2/H2 as the width/height of the smallest rectangle of the rotated text; surely, you are referring to "rotated" as rotated in the image, but that's usually called "skewed", whereas "rotated" is the respective countermeasure).
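Here is a minimal Python sketch that makes the bbox-versus-polygon point concrete. It takes a skewed block (centre, unrotated size, angle), lists its four corners, formats them as a polygon points string, and derives the enclosing HPOS/VPOS/WIDTH/HEIGHT. It rests on these assumptions:

- Coordinates are image coordinates, with y pointing down.
- Angles are counterclockwise as seen on the page.
- The points string uses the "x,y x,y ..." form; check the POINTS syntax of the ALTO version you target.
- Bbox values are rounded outward to whole pixels.
- All function names are hypothetical.

```python
import math

def rotated_rect_corners(cx, cy, w, h, angle_deg):
    """Hypothetical helper: corners of a w x h rectangle centred at (cx, cy),
    rotated counterclockwise (as seen on the page) by angle_deg, in image
    coordinates with y pointing down. Order: TL, TR, BR, BL before rotation."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c + y * s, cy - x * s + y * c) for x, y in half]

def polygon_points(corners):
    """Format corners as an 'x,y x,y ...' string (assumed format)."""
    return " ".join(f"{round(x)},{round(y)}" for x, y in corners)

def enclosing_bbox(corners):
    """(HPOS, VPOS, WIDTH, HEIGHT) of the axis-aligned box around the corners,
    rounded outward so the box always contains the polygon."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    hpos, vpos = math.floor(min(xs)), math.floor(min(ys))
    return hpos, vpos, math.ceil(max(xs)) - hpos, math.ceil(max(ys)) - vpos
```

Take a 100 × 20 block centred at (100, 100) and skewed by 45 degrees. The bbox comes out as (57, 57, 86, 86), about 7,400 px² for text covering 2,000 px². That is why the polygon is the more precise representation.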

This is not a big deal. It only starts to get complicated when we extract and annotate a binary, cropped and deskewed image for the block (via @FILEID), and offer this to the next lower segmentation: now the runtime coordinate system is relative to that (skewed) block, and has to be converted back to absolute before writing the segmentation results. To do that, coordinates have to be rotated back, and shifted by the offset of the parent block.

It is in this detail that the comment by @artunit makes some sense, but happens to be wrong:

I believe that HPOS,VPOS are for the center of the block and then the rotation is applied when the rotation attribute is used.

No, HPOS / VPOS are always the top-left corner of blocks. But indeed, deskewing does rotate around the center of the region, at (HPOS+0.5*WIDTH) / (VPOS+0.5*HEIGHT). That just means the above mentioned compensatory (passive) rotation of coordinates has to be accompanied by translation (from the top-left corner to the center) before rotation and back-translation (from the center to the top-left corner) afterwards.
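The translate, rotate, back-translate sequence can be sketched in Python as below. These assumptions are mine, not the schema's:

- Coordinates are image coordinates, with y pointing down.
- `rotation` is the skew angle in degrees, counterclockwise as seen on the page, so deskewing rotated the cropped block by -rotation.
- The block image was cropped at the bbox (HPOS, VPOS, WIDTH, HEIGHT) and rotated about its centre, optionally onto an expanded canvas of size (dw, dh).
- The function names are hypothetical.

```python
import math

def rotate_ccw(x, y, angle_deg):
    """Rotate (x, y) about the origin, counterclockwise as seen on the page
    (image coordinates, y axis pointing down)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return x * c + y * s, -x * s + y * c

def block_to_page(u, v, hpos, vpos, width, height, rotation, dw=None, dh=None):
    """Hypothetical helper: map (u, v) in the deskewed block image back to
    absolute page coordinates. (dw, dh) is the deskewed image size; it
    defaults to the bbox size (no canvas expansion)."""
    dw = width if dw is None else dw
    dh = height if dh is None else dh
    # 1. translate from the deskewed image's top-left corner to its centre
    x, y = u - dw / 2, v - dh / 2
    # 2. undo the deskewing by rotating back by the skew angle
    x, y = rotate_ccw(x, y, rotation)
    # 3. re-express relative to the bbox top-left, then add the block offset
    return x + width / 2 + hpos, y + height / 2 + vpos

def page_to_block(x, y, hpos, vpos, width, height, rotation, dw=None, dh=None):
    """Inverse of block_to_page: absolute page point -> deskewed block image."""
    dw = width if dw is None else dw
    dh = height if dh is None else dh
    # Remove the block offset and move to the bbox centre.
    x, y = x - hpos - width / 2, y - vpos - height / 2
    # Apply the deskewing rotation (clockwise by the skew angle).
    x, y = rotate_ccw(x, y, -rotation)
    # Move from the centre to the deskewed image's top-left corner.
    return x + dw / 2, y + dh / 2
```

When the deskewed image keeps the bbox size, `dw`/`dh` default to WIDTH/HEIGHT. Both translations then use the same centre, which is exactly the sequence described above. Passing an expanded size covers deskewing implementations that enlarge the canvas so that no pixels are cut off.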

@cneud as you can see I answered myself here.

@artunit
Member

artunit commented Jul 12, 2019

Thanks @bertsky, @Jo-CCS pointed out my error at the last Board meeting. I was sure I had read this somewhere but the schema is indeed the definitive word on this.

@silviu22
Author

Thank you @bertsky and @artunit for the clarification.

To me, skewed text meant distorted text, like italic text. (I assumed skewed text is still written horizontally, but with the top of the text dragged left or right by a certain amount, the same way italic text leans to the right.) But if you prefer the term "skewed" instead of "rotated", that is fine with me.

It will take a fair amount of calculation to draw this skewed text from the coordinates of the bounding rectangle, but that is fine as long as the convention is clear.
