feat(ocr): added support for RapidOCR engine #415

Merged: 10 commits, Nov 27, 2024

@Swaymaw Swaymaw commented Nov 22, 2024

  • Added the RapidOCR model as an OCR engine option.
  • Added options for configuring the RapidOCR model during document conversion through pipeline options.
  • Updated documentation, added tests, and updated dependencies (extras) to reflect the added engine support.
  • Updated examples to demonstrate the use of RapidOcrOptions.

This change lets users work with the RapidOCR-OnnxRuntime engine, which provides higher accuracy and performance in use cases that involve complex PDF files.
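
As a rough sketch of how the new option is selected from user code: the snippet below assumes the import paths used in docling's own examples (`docling.datamodel.pipeline_options`, `docling.document_converter`) and that the optional RapidOCR dependencies are installed. The helper name `build_rapidocr_converter` is hypothetical and only for illustration.

```python
def build_rapidocr_converter():
    """Return a DocumentConverter that runs OCR on PDFs through RapidOCR.

    docling is imported lazily because it, and the RapidOCR extra,
    are optional dependencies that may not be installed.
    """
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import (
        PdfPipelineOptions,
        RapidOcrOptions,
    )
    from docling.document_converter import DocumentConverter, PdfFormatOption

    # Enable OCR and select RapidOCR as the engine via the pipeline options.
    pipeline_options = PdfPipelineOptions()
    pipeline_options.do_ocr = True
    pipeline_options.ocr_options = RapidOcrOptions()

    # Attach the pipeline options to the PDF input format.
    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )
```

With docling installed, `build_rapidocr_converter().convert("path/to/file.pdf")` would run the conversion (the path is a placeholder), and the result's `document.export_to_markdown()` gives the extracted text.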

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the
    conventional commits.
  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

mergify bot commented Nov 22, 2024

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?:
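
For reference, the title rule can be tried locally. This is a minimal sketch that assumes the `~=` operator behaves like an unanchored regular-expression search, as Python's `re.search` does; the helper name `is_conventional` is made up for this example.

```python
import re

# Pattern copied from the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


print(is_conventional("feat(ocr): added support for RapidOCR engine"))  # True
print(is_conventional("added support for RapidOCR engine"))  # False
```

The scope in parentheses is optional, so `fix: typo` also passes, while a title without one of the listed types and a colon does not.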

Signed-off-by: Swaymaw <[email protected]>

dolfim-ibm commented

@Swaymaw would you suggest we need both PaddleOCR and RapidOCR in Docling? Or one of the two is enough?

@dolfim-ibm dolfim-ibm self-requested a review November 25, 2024 07:58

dolfim-ibm commented

Please see the test results, can you please address those?

@dolfim-ibm dolfim-ibm requested a review from cau-git November 25, 2024 08:42

Swaymaw commented Nov 25, 2024

> @Swaymaw would you suggest we need both PaddleOCR and RapidOCR in Docling? Or one of the two is enough?

I would say we can stick with RapidOCR alone: it is much faster than PaddleOCR at the same accuracy, and much simpler to install and work with. RapidOCR also makes it easier to train and run inference with custom detection, classification, and recognition model paths, which will improve the overall usability of the framework with use-case-specific models.

dolfim-ibm commented

OK, let's focus on getting this PR running then. There are still a few installation issues in CI for onnx.

cau-git commented Nov 27, 2024

@Swaymaw Thanks for the configuration option enhancements, this matches what I had in mind.

However, to better align with an in-development global configuration system in docling (see here) without breaking this config interface down the line, we will take the liberty of temporarily hiding all device-related configuration options in RapidOcrOptions from users and making AUTO the implicit default. That way we don't need to delay the merge of this PR, and we will revisit how to expose the configuration options in the short term.
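
To illustrate the idea of a hidden device setting with an implicit AUTO default, here is a small standalone sketch. It is not docling code: `Device`, `OcrOptionsSketch`, `resolve_device`, and the `text_score` field are hypothetical names chosen for the example, and the CUDA-or-CPU fallback is an assumed policy.

```python
from dataclasses import dataclass, field
from enum import Enum


class Device(str, Enum):
    """Hypothetical device choices; AUTO defers the decision to runtime."""

    AUTO = "auto"
    CPU = "cpu"
    CUDA = "cuda"


@dataclass
class OcrOptionsSketch:
    """Hypothetical options class, not the real RapidOcrOptions."""

    # Illustrative public option that users may set.
    text_score: float = 0.5
    # init=False keeps the device out of the constructor, so users cannot
    # set it yet; it silently defaults to AUTO.
    _device: Device = field(default=Device.AUTO, init=False, repr=False)


def resolve_device(options: OcrOptionsSketch, cuda_available: bool) -> Device:
    """Turn AUTO into a concrete device (assumed policy: CUDA if present)."""
    if options._device is Device.AUTO:
        return Device.CUDA if cuda_available else Device.CPU
    return options._device
```

Because the field is excluded from the constructor, the public interface stays unchanged if device selection later moves to a global configuration system.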

@cau-git cau-git left a comment

LGTM

@cau-git cau-git merged commit 85b2999 into DS4SD:main Nov 27, 2024
7 checks passed