kraken 5.2.4 on eScriptorium recognition artefacts #605

Closed
johnlockejrr opened this issue May 13, 2024 · 11 comments

Comments

@johnlockejrr

johnlockejrr commented May 13, 2024

I'm not sure if this is eScriptorium or kraken related; I just want to point it out. Same model, same image, on different installs:

  1. eScriptorium with kraken 4.3.13 (python 3.8)

esc-kraken-4

  2. eScriptorium with kraken 5.2.4 (python 3.10)

esc-kraken-5

Both the segmentation and recognition models were trained with kraken 5.2.4.

@dstoekl

dstoekl commented May 13, 2024

looks like shapely

@dstoekl

dstoekl commented May 13, 2024

the polygon is too big and the recognizer wasn't trained on lines where the letters are only a quarter of the line height.
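
A rough way to spot lines like this is to compare each line polygon's height with the typical height on the page. The sketch below is only an illustration and is not part of kraken or eScriptorium: `bbox_height`, `flag_oversized`, and the `factor` threshold are hypothetical names and values, and it assumes roughly horizontal lines, so the bounding-box height is a crude stand-in for the real line height.

```python
from statistics import median


def bbox_height(polygon):
    """Vertical extent of a polygon given as [(x, y), ...]."""
    ys = [y for _, y in polygon]
    return max(ys) - min(ys)


def flag_oversized(polygons, factor=2.0):
    """Return indices of polygons much taller than the median line on the page.

    `factor` is an arbitrary threshold chosen for illustration.
    """
    heights = [bbox_height(p) for p in polygons]
    typical = median(heights)
    return [i for i, h in enumerate(heights) if h > factor * typical]


# Made-up example polygons: two ordinary lines and one that is far too tall.
lines = [
    [(0, 0), (100, 0), (100, 20), (0, 20)],
    [(0, 30), (100, 30), (100, 52), (0, 52)],
    [(0, 60), (100, 60), (100, 140), (0, 140)],
]
print(flag_oversized(lines))  # [2]
```

A flagged line is only a hint that the polygon may be oversized; headings or lines with long ascenders and descenders can trip the same check.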

@johnlockejrr
Author

Here it is on kraken 4.x, same model.

image

@johnlockejrr
Author

johnlockejrr commented May 13, 2024

The models I used in this test:
mcdonald.zip

@mittagessen
Owner

It isn't a model issue but the polygonization is wrong. I'll have a look. The rotation code changed between 4.x and 5.x so it's either that or other shapely shenanigans.

@mittagessen
Owner

Could you also send me the image file and any ALTO/PageXML you've got? It's difficult to debug without being able to run a test case.

@johnlockejrr
Author

export_doc23_memar_marqah_mcdonald_alto_202405131147.zip
Sure, here is the image with ALTO (from 4.x).

@mittagessen
Owner

Thanks. It's mostly so I can make sure the baselines are identical.
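
For anyone wanting to run the same check locally, here is a minimal sketch of comparing baselines between two ALTO exports using only the Python standard library. It is not kraken's own code. It assumes each `TextLine` carries a `BASELINE` attribute holding a flat list of coordinates and that line `ID`s match between the two files; `read_baselines`, `diff_baselines`, and the `tol` tolerance are hypothetical names and values picked for illustration.

```python
import re
import xml.etree.ElementTree as ET


def read_baselines(alto_xml):
    """Map TextLine ID -> list of (x, y) baseline points from an ALTO string."""
    root = ET.fromstring(alto_xml)
    result = {}
    for el in root.iter():
        # Match on the local tag name so any ALTO namespace version works.
        if el.tag.rsplit('}', 1)[-1] != 'TextLine':
            continue
        raw = el.get('BASELINE')
        if not raw:
            continue
        # Accept both "x y x y" and "x,y x,y" coordinate layouts.
        nums = [float(n) for n in re.split(r'[,\s]+', raw.strip())]
        line_id = el.get('ID') or f'line-{len(result)}'
        result[line_id] = list(zip(nums[0::2], nums[1::2]))
    return result


def diff_baselines(a, b, tol=0.5):
    """Return IDs whose baselines are missing or differ by more than `tol` pixels."""
    changed = []
    for line_id in sorted(set(a) | set(b)):
        pa, pb = a.get(line_id), b.get(line_id)
        if pa is None or pb is None or len(pa) != len(pb):
            changed.append(line_id)
        elif any(abs(xa - xb) > tol or abs(ya - yb) > tol
                 for (xa, ya), (xb, yb) in zip(pa, pb)):
            changed.append(line_id)
    return changed


# Made-up sample data standing in for two exports of the same page.
OLD = ('<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#"><Layout><Page>'
       '<PrintSpace><TextBlock>'
       '<TextLine ID="l1" BASELINE="10 50 200 52"/>'
       '<TextLine ID="l2" BASELINE="10 90 200 91"/>'
       '</TextBlock></PrintSpace></Page></Layout></alto>')
NEW = OLD.replace('10 90 200 91', '10 90 200 99')

print(diff_baselines(read_baselines(OLD), read_baselines(NEW)))  # ['l2']
```

If the baselines come out identical and the recognition output still differs, the difference has to arise after segmentation, for example in how the line polygons are computed.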

@johnlockejrr
Author

Any update on this matter?

@mittagessen
Owner

Apparently, the error persists on some other image data.

@mittagessen mittagessen reopened this May 23, 2024
@mittagessen
Owner

Nope, not true after all. Just crappy output of the polygonizer.
