Ocr of rotated image #121

Open
josef821 opened this issue Sep 30, 2024 · 4 comments

@josef821

Hi, thanks for your useful OCR engine.
It works well, but when I give it a rotated image it returns bad results.
[Image: Screenshot 2024-09-30 161933]

Are there any tips for fixing this?

josef821 changed the title from "ocr of rotated image" to "Ocr of rotated image" on Sep 30, 2024
@robertknight
Owner

Currently the recognition model and layout logic assume that the image is approximately upright (some amount of rotation or skew is OK) and that the text is read left to right. To work with rotated or severely skewed images, they need to be rotated / de-skewed as a preprocessing step. Eventually this should be integrated into this library, but in the meantime you could try something like:

  1. Call the OcrEngine::detect_words method to detect bounding boxes of connected areas (the white regions in the top-left image)
  2. Infer the orientation from the positions and aspect ratios of the boxes (e.g. if most boxes are tall rather than wide, that means the text is probably rotated by 90°)
  3. Use functions in the imageproc crate to rotate the image based on the inferred orientation
  4. Perform OCR on the rotated image
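
Steps 2 and 3 could be sketched roughly as follows. This is a dependency-free illustration, not code from this library: `WordBox`, `Orientation`, `infer_orientation` and `rotate90_cw` are hypothetical names, the boxes are assumed to have been copied out of whatever the detection step returns, and the rotation works on a plain row-major grayscale buffer rather than through imageproc. Box shape alone can only separate horizontal text from text turned a quarter-turn; it cannot tell upright from upside-down, or clockwise from counter-clockwise.

```rust
/// Size of one detected word box (hypothetical stand-in for the
/// rectangle type returned by the detection step).
#[derive(Clone, Copy, Debug)]
pub struct WordBox {
    pub width: f32,
    pub height: f32,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Orientation {
    /// Most boxes are clearly wider than tall.
    Horizontal,
    /// Most boxes are clearly taller than wide: likely a 90 degree rotation.
    Sideways,
    /// No boxes, or a tie between the two groups.
    Unknown,
}

/// Count clearly wide vs. clearly tall boxes and return the majority.
/// `min_ratio` is expected to be >= 1.0; boxes whose width/height ratio
/// lies between 1/min_ratio and min_ratio are treated as uninformative.
pub fn infer_orientation(boxes: &[WordBox], min_ratio: f32) -> Orientation {
    let mut wide = 0usize;
    let mut tall = 0usize;
    for b in boxes {
        if b.width <= 0.0 || b.height <= 0.0 {
            continue; // skip degenerate boxes
        }
        let ratio = b.width / b.height;
        if ratio >= min_ratio {
            wide += 1;
        } else if ratio <= 1.0 / min_ratio {
            tall += 1;
        }
    }
    if wide > tall {
        Orientation::Horizontal
    } else if tall > wide {
        Orientation::Sideways
    } else {
        Orientation::Unknown
    }
}

/// Rotate a row-major grayscale image by 90 degrees clockwise.
/// Returns the new pixel buffer together with the new width and height.
pub fn rotate90_cw(pixels: &[u8], width: usize, height: usize) -> (Vec<u8>, usize, usize) {
    assert_eq!(pixels.len(), width * height, "buffer size must match dimensions");
    let mut out = vec![0u8; pixels.len()];
    for y in 0..height {
        for x in 0..width {
            // Source pixel (x, y) lands at column height-1-y, row x.
            let dst_x = height - 1 - y;
            let dst_y = x;
            // The rotated image is `height` pixels wide.
            out[dst_y * height + dst_x] = pixels[y * width + x];
        }
    }
    (out, height, width)
}
```

If the result is `Sideways`, one option is to try both quarter-turn directions and keep whichever gives the more plausible OCR output.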

A more sophisticated approach would be to use an image classification model to infer the orientation of each word, or a sample of words. If a suitable model were created in e.g. PyTorch and exported to ONNX, it could then be converted to RTen and used in the above preprocessing pipeline instead of heuristics.
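
If such a classifier existed, its per-word outputs would still need to be combined into a single decision for the page. A minimal sketch of that aggregation, assuming a hypothetical model that returns one of four rotation classes plus a confidence score for each sampled word (the `Rotation` enum and `vote_rotation` function are illustrative names, not part of any existing crate):

```rust
/// Rotation classes a hypothetical per-word classifier could output.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Rotation {
    Deg0,
    Deg90,
    Deg180,
    Deg270,
}

impl Rotation {
    pub const ALL: [Rotation; 4] = [
        Rotation::Deg0,
        Rotation::Deg90,
        Rotation::Deg180,
        Rotation::Deg270,
    ];

    /// Position of this class in `Rotation::ALL`.
    pub fn index(self) -> usize {
        match self {
            Rotation::Deg0 => 0,
            Rotation::Deg90 => 1,
            Rotation::Deg180 => 2,
            Rotation::Deg270 => 3,
        }
    }
}

/// Confidence-weighted vote over per-word predictions.
/// Returns `None` when there is no usable prediction.
pub fn vote_rotation(predictions: &[(Rotation, f32)]) -> Option<Rotation> {
    let mut scores = [0.0f32; 4];
    for &(rotation, confidence) in predictions {
        // Ignore NaN, infinite and non-positive confidences.
        if confidence.is_finite() && confidence > 0.0 {
            scores[rotation.index()] += confidence;
        }
    }
    let mut best: Option<(usize, f32)> = None;
    for (i, &score) in scores.iter().enumerate() {
        if score > 0.0 && best.map_or(true, |(_, b)| score > b) {
            best = Some((i, score));
        }
    }
    best.map(|(i, _)| Rotation::ALL[i])
}
```

Summing confidences rather than counting votes lets a few confident predictions outweigh many uncertain ones; a plain majority count would be an equally reasonable choice.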

@josef821
Author

Thanks for the reply.
I will do that. I will check all the masks to see whether the lines are rotated or not.
Your layout analysis is not good enough yet. I read your replies to others: you want to create a model that clusters and sorts words to get line bounding boxes. How long do you think it will take to be able to publish the layout analysis model with its training code?

@robertknight
Owner

> How long do you think it will take to be able to publish the layout analysis model with its training code?

I don't know. All the code that exists is in the ocrs-models repository, but for layout analysis that only includes some non-functional prototypes.

In the meantime, if you happen to be working with documents that have a predictable layout, you can always substitute the find_text_lines step with custom code.
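
As a rough illustration of what such custom code might look like for a simple layout of straight horizontal lines, the sketch below groups word boxes into lines by vertical position and then sorts each line left to right. `Rect` and `group_into_lines` are hypothetical names; the library's own rectangle and line types are not used here, so the boxes would need converting in both directions.

```rust
/// Axis-aligned word box (hypothetical stand-in for the library's type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub left: f32,
    pub top: f32,
    pub right: f32,
    pub bottom: f32,
}

impl Rect {
    pub fn center_y(&self) -> f32 {
        (self.top + self.bottom) / 2.0
    }
}

/// Group words into text lines, top to bottom. A word joins the current
/// line when its vertical center lies within the vertical extent of that
/// line's first word; otherwise it starts a new line. Words within each
/// line are returned in left-to-right order.
pub fn group_into_lines(words: &[Rect]) -> Vec<Vec<Rect>> {
    let mut sorted = words.to_vec();
    sorted.sort_by(|a, b| a.center_y().total_cmp(&b.center_y()));

    let mut lines: Vec<Vec<Rect>> = Vec::new();
    let mut current: Vec<Rect> = Vec::new();
    for word in sorted {
        let cy = word.center_y();
        let fits = current
            .first()
            .map_or(false, |first| cy >= first.top && cy <= first.bottom);
        if !fits && !current.is_empty() {
            // Close the current line and start a new one.
            lines.push(std::mem::take(&mut current));
        }
        current.push(word);
    }
    if !current.is_empty() {
        lines.push(current);
    }

    for line in &mut lines {
        line.sort_by(|a, b| a.left.total_cmp(&b.left));
    }
    lines
}
```

This deliberately ignores columns, curved baselines and skew, so it only suits documents whose layout is known in advance.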

@josef821
Author

josef821 commented Oct 5, 2024

The existing layout analysis does not work well for curved layouts or complex images. I am waiting for your layout analysis.
Thanks
