Trainable reading order #492
Fix normalization factors for RO datasets
The model and greedy decoding used in both papers are the same; the only change is the hierarchical decoder, which doesn't really work when lines can be found outside of regions (at least not easily, and assuming I understand their implementation correctly). But I never got their code to run properly (it obviously has only a vague connection to the one used to produce the paper) and substituted my own for everything but the decoder. It shouldn't really matter, though, as the whole thing is really simple (the primary reason I implemented it in the first place).
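For reference, a minimal sketch of what such a greedy decoding step can look like, assuming a matrix `P` where `P[i, j]` is the predicted probability that element `i` precedes element `j` (names and tie-breaking are illustrative, not the actual code in this branch):

```python
import numpy as np

def greedy_order(P):
    """Greedily build a total order from pairwise precedence probabilities.

    P[i, j] is the estimated probability that element i comes before j.
    At each step the remaining element most likely to precede all other
    remaining elements is emitted next.
    """
    remaining = list(range(P.shape[0]))
    order = []
    while remaining:
        scores = [sum(P[i, j] for j in remaining if j != i) for i in remaining]
        best = remaining[int(np.argmax(scores))]
        order.append(best)
        remaining.remove(best)
    return order

# toy example: element 0 should come before 1, which comes before 2
P = np.array([[0.5, 0.9, 0.8],
              [0.1, 0.5, 0.7],
              [0.2, 0.3, 0.5]])
print(greedy_order(P))  # [0, 1, 2]
```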
They also dropped the RDF model in the second paper (although it did perform better than the MLP on the table dataset). Moreover, they introduced a region-level model (with a slightly different feature vector) in the second paper, which is needed for the hierarchical mode. But while they trained on multiple datasets, they did not investigate how well models generalize from one domain to another or whether training would benefit from cross-domain data. To me, all of that is significant enough to warrant additional experimentation. (Especially if you look at the mistakes their model still makes: even if the rank distance is low, that does not per se mean you get a high-quality order. The prediction sometimes alternates wildly between different parts of the page, not just at difficult/ambiguous spots.)
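(To make the rank distance point concrete: a low average rank distance does not rule out an order that keeps jumping between page halves. A toy Spearman footrule computation, not the metric implementation from the papers:)

```python
def footrule_distance(pred, truth):
    """Mean absolute difference of rank positions (Spearman footrule).

    pred and truth are lists of the same element ids, each giving one
    complete reading order.
    """
    pos_pred = {e: i for i, e in enumerate(pred)}
    pos_truth = {e: i for i, e in enumerate(truth)}
    return sum(abs(pos_pred[e] - pos_truth[e]) for e in truth) / len(truth)

# an order alternating between two page halves still scores a modest 1.5
print(footrule_distance([0, 4, 1, 5, 2, 6, 3, 7], list(range(8))))
```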
The hierarchical model simply restricts the topology during training (avoiding arbitrary line pairs); it does not look at coordinates (in particular not whether polygons are properly contained). When decoding, you first apply the region-level model to the regions, then the hierarchical line-level model to the lines within each region.
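In pseudocode terms, the hierarchical decoding described above amounts to something like the following sketch (the `order_regions`/`order_lines` helpers stand in for the region-level and line-level pairwise models; the names are hypothetical):

```python
def hierarchical_order(regions, lines_by_region, order_regions, order_lines):
    """Order regions with the region-level model, then order the lines
    inside each region with the line-level model and concatenate.

    Only line pairs from the same region are ever compared, which is the
    topology restriction mentioned above.
    """
    reading_order = []
    for region in order_regions(regions):
        reading_order.extend(order_lines(lines_by_region[region]))
    return reading_order
```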
It's inconsistent, yes; the last state was likely from somewhere in the middle of the second paper. But I can reproduce their published figures from it. See the issue above for the details.
I agree it's not necessary to re-use their code in the end. But IMO the hierarchical variant (and probably also some transfer learning) should be used. And/or perhaps one could restrict the decoder with some basic geometric rules to avoid obvious silly mistakes.
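Such geometric rules could be as crude as damping pairwise decisions that contradict coarse layout cues, e.g. (purely illustrative, threshold and scaling made up):

```python
def apply_geometric_prior(P, centroids, column_gap=200):
    """Dampen 'i before j' probabilities when i sits far to the right of j,
    a rough proxy for not jumping back across a column boundary in
    left-to-right column layouts.

    P: pairwise precedence matrix, centroids: (x, y) line centre points.
    """
    P = P.copy()
    for i in range(len(centroids)):
        for j in range(len(centroids)):
            if centroids[i][0] - centroids[j][0] > column_gap:
                P[i, j] *= 0.1
    return P
```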
We don't have terribly high hopes for the generalization of this method, as the model is a) deeply tied to the segmentation model's typology and b) unable to deal with mixed-directional inputs.
The issue arises when lines exist outside of regions. Ordering the regions first and then the lines inside each region doesn't work in that case, as there's no way to know where to insert the non-region-affiliated lines: you can't feed mixed line-region feature pairs into either the region-level or the line-level model.
I also have doubts, but perhaps this could be factored in with additional features (e.g. enabling/disabling segment categories with additional input, both during training and inference).
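One possible shape for such an input is a class-activation mask appended to the per-line feature vector, along these lines (hypothetical feature layout, not part of this PR):

```python
import numpy as np

def line_feature(line, class_ids, active_classes):
    """Per-line feature vector with a class-activation mask appended.

    line: dict with 'class', 'center' (x, y) and 'extent' (w, h) entries.
    class_ids: mapping from class name to index.
    active_classes: categories enabled for this dataset/run, so a single
    model could learn to ignore classes that are switched off.
    """
    onehot = np.zeros(len(class_ids))
    onehot[class_ids[line['class']]] = 1.0
    mask = np.zeros(len(class_ids))
    for cls, idx in class_ids.items():
        mask[idx] = 1.0 if cls in active_classes else 0.0
    return np.concatenate([onehot, np.asarray(line['center']),
                           np.asarray(line['extent']), mask])
```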
Sure – generalization to bottom-up or left-to-right text line order systems is unlikely to be possible. But within the same system you still have divergent material (as the paper shows) – with/without columns, marginals, tables, print vs. handwriting, etc. I would hope to at least gain some coverage in that area by curating training data/schemes.
Oh, now I remember. That's why you insisted ALTO v4.3 should have reading order even on the line level. I would argue that either this kind of segment is typical and common – in which case you should not need a hierarchical/region-level model – or it is special and rare – in which case probably the best design would be to add a dedicated hierarchy level (say 'insertion'):
Fix small bugs in the Feature/reading order branch
Would drastically slow down display of help message in CLI drivers
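(Presumably this refers to eager top-level imports of heavy libraries; the usual remedy is to defer them into the command body so that rendering `--help` stays cheap. A generic sketch, not the actual change in this branch:)

```python
import click

@click.command()
def rorder():
    """Hypothetical subcommand: heavy dependencies are imported inside the
    command body instead of at module level."""
    import torch  # deferred: not needed just to display the help text
    ...
```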
@mittagessen are there any pretrained segmentation models with neural reading order available that one can already try? And do you plan to add RO to the builtin blla.mlmodel? Finally, do you have eval results to share (or will there be a paper)?
The code is a fairly straightforward adaptation of the method mentioned above, so any results should translate (it certainly isn't publishable from my point of view). We've run some tests and it generally seems to perform better than the heuristic for specific use cases. The big BUT here, though, is that the net only uses line features (class, position, and extents) without any visual features for determining order. That makes it a poor choice for a default model covering different text directions, as it doesn't know whether it is looking at Latin or Arabic text (in the absence of such line classes) and will order columns incorrectly. I'd say it is mostly useful for people who are training a new segmentation model for some material that isn't well captured by the default model and want better reading order for only slight computational overhead and no manual annotation effort. I've written some basic documentation here.
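For illustration, a pair scorer over exactly those line features (class one-hot, centre position, extents) could look like the sketch below; the layer sizes and names are assumptions, not the actual network in this branch:

```python
import torch
import torch.nn as nn

class PairOrderNet(nn.Module):
    """Small MLP estimating P(line_a precedes line_b) from line features
    only; with no visual features it has no notion of script direction."""
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * feat_dim, 64),
                                 nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, feat_a, feat_b):
        return torch.sigmoid(self.net(torch.cat([feat_a, feat_b], dim=-1)))

def line_features(cls_onehot, center, extent):
    """Per-line feature vector: class one-hot + (x, y) centre + (w, h)
    extents, normalised to page size elsewhere."""
    return torch.cat([cls_onehot, center, extent])
```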
This pull request is an implementation of this article, modelling the ordering of baselines/regions as estimated binary order relations.
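In outline, training such a binary order relation model turns each ground-truth reading order into labelled line pairs; a minimal sketch (building on the hypothetical `PairOrderNet` above, not the training loop actually used in the PR):

```python
import torch

def training_step(model, optimizer, feats, gt_order):
    """One step over all ordered line pairs of a page.

    feats: (n, feat_dim) tensor of per-line features.
    gt_order: ground-truth reading order as a list of line indices.
    Every pair (i, j) with i != j is an example labelled 1 if i precedes j.
    """
    rank = {line: r for r, line in enumerate(gt_order)}
    loss_fn = torch.nn.BCELoss()
    loss, pairs = 0.0, 0
    for i in range(len(gt_order)):
        for j in range(len(gt_order)):
            if i == j:
                continue
            label = torch.tensor([1.0 if rank[i] < rank[j] else 0.0])
            loss = loss + loss_fn(model(feats[i], feats[j]), label)
            pairs += 1
    loss = loss / pairs
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```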
ToDo