Make --text-line-images debug option apply recognition preprocessing #30

Merged
3 commits merged on Feb 27, 2024

robertknight (Owner) commented Feb 27, 2024

Make the --text-line-images debug option apply the same preprocessing that is applied before lines are fed into the text recognition model. This includes:

  • Resizing the image to be 64px high and with a max width of 800px
  • Converting the image from color to gray
  • Extracting only the polygon containing the line's words, and masking off other pixels in black

This makes the option more useful for debugging recognition accuracy issues, because problems arising from the preprocessing become visible in the saved images.
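The resizing step can be sketched as below. This is only an illustration of the sizes listed above, not the ocrs implementation: the function name `resized_line_size`, the aspect-ratio-preserving scaling, the rounding mode and the lower bound of 1px are all assumptions.

```rust
/// Target height of a recognition input, as stated in the description.
const LINE_HEIGHT: u32 = 64;
/// Maximum width of a recognition input, as stated in the description.
const MAX_LINE_WIDTH: u32 = 800;

/// Compute the (height, width) that a text line image is resized to.
///
/// Assumption: the width is scaled by the same factor as the height
/// (preserving the aspect ratio), rounded to the nearest pixel, and then
/// clamped to `1..=MAX_LINE_WIDTH`. ocrs may round or clamp differently.
fn resized_line_size(src_height: u32, src_width: u32) -> (u32, u32) {
    assert!(src_height > 0 && src_width > 0, "image must be non-empty");
    let scale = LINE_HEIGHT as f64 / src_height as f64;
    let width = (src_width as f64 * scale).round() as u32;
    (LINE_HEIGHT, width.clamp(1, MAX_LINE_WIDTH))
}
```

Under these assumptions, a very long line is squeezed horizontally once it hits the width cap, which is the kind of distortion the saved images would make visible.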

In the process of doing this, functions in ocrs that return dynamic errors were changed to use anyhow::Error rather than Box<dyn Error> as the error type. This is more convenient to work with in ocrs-cli, which already uses anyhow.
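One practical difference between the two error types is thread safety: a plain `Box<dyn Error>` is not `Send`, whereas `anyhow::Error` is `Send + Sync`. The std-only sketch below illustrates that constraint using `Box<dyn Error + Send + Sync>` as a stand-in; it does not use anyhow itself, and `parse_height` and `parse_on_worker` are hypothetical helpers, not ocrs functions.

```rust
use std::error::Error;
use std::thread;

/// A dynamic error type that can cross thread boundaries.
type SendError = Box<dyn Error + Send + Sync>;

/// Hypothetical fallible step; stands in for something like loading a model.
fn parse_height(text: &str) -> Result<u32, SendError> {
    // `?` converts the `ParseIntError` into the boxed dynamic error.
    let height: u32 = text.trim().parse()?;
    if height == 0 {
        return Err("height must be non-zero".into());
    }
    Ok(height)
}

/// Run the fallible step on a worker thread and hand back its result.
///
/// This compiles only because `SendError` is `Send`. With a plain
/// `Box<dyn Error>` the closure's return value could not leave the thread.
fn parse_on_worker(text: &'static str) -> Result<u32, SendError> {
    thread::spawn(move || parse_height(text))
        .join()
        .expect("worker thread panicked")
}
```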

Add `OcrEngine::prepare_recognition_input` method that returns an image with the
same preprocessing applied as `OcrEngine::recognize_text` does before it feeds
input into a model. This is useful for debugging scenarios where ocrs produces
different / worse output than the PyTorch model training/evaluation tools.

`anyhow::Error` provides a better dynamic error type than `dyn Error` as it can
capture context and be sent between threads. Using it here also enables
propagating these errors in ocrs-cli which is already using anyhow.

Make the `--text-line-images` option save images with the same preprocessing
applied as when preparing images to feed into the recognition model. This makes
accuracy errors arising from preprocessing issues easier to debug. This
preprocessing includes:

 - Resizing the image to 64px high and a max width of 800px
 - Extracting only the polygon containing the line's words, with other pixels
   masked off
 - Converting the image to grayscale
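The masking and grayscale steps listed above can be sketched as follows. This is a rough illustration under stated assumptions, not the ocrs code: the function names are hypothetical, the gray conversion assumes ITU-R BT.601 luma weights scaled to [0, 1], and a pixel is treated as inside the polygon when its center passes an even-odd ray casting test.

```rust
/// Convert an RGB pixel to a gray value in [0, 1].
/// Assumption: ITU-R BT.601 luma weights; the weights ocrs uses may differ.
fn rgb_to_gray(r: u8, g: u8, b: u8) -> f32 {
    (0.299 * r as f32 + 0.587 * g as f32 + 0.114 * b as f32) / 255.0
}

/// Even-odd (ray casting) test for whether `(x, y)` lies inside `polygon`.
fn point_in_polygon(polygon: &[(f32, f32)], x: f32, y: f32) -> bool {
    let mut inside = false;
    let n = polygon.len();
    for i in 0..n {
        let (xi, yi) = polygon[i];
        let (xj, yj) = polygon[(i + n - 1) % n];
        // Toggle when a horizontal ray from (x, y) crosses edge j -> i.
        if (yi > y) != (yj > y) && x < (xj - xi) * (y - yi) / (yj - yi) + xi {
            inside = !inside;
        }
    }
    inside
}

/// Set every pixel whose center lies outside `polygon` to black (0.0).
/// `gray` is a row-major image holding `width * height` values.
fn mask_outside_polygon(
    gray: &mut [f32],
    width: usize,
    height: usize,
    polygon: &[(f32, f32)],
) {
    assert_eq!(gray.len(), width * height);
    for row in 0..height {
        for col in 0..width {
            let (cx, cy) = (col as f32 + 0.5, row as f32 + 0.5);
            if !point_in_polygon(polygon, cx, cy) {
                gray[row * width + col] = 0.0;
            }
        }
    }
}
```

Viewing the result of steps like these is what lets a reader spot cases where the polygon clips part of a character or where neighbouring lines leak into the input.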

robertknight merged commit 9d56a86 into main on Feb 27, 2024
2 checks passed
robertknight deleted the expose-recognition-preprocessing branch on February 27, 2024 01:01