feat: Add adaptive OCR, factor out treatment of OCR areas and cell filtering #38

Merged
cau-git merged 5 commits into main from cau/adaptive-ocr on Aug 20, 2024

Contributor

@cau-git cau-git commented Aug 20, 2024

  • Adds a new get_bitmap_rect method to the PDF backends
  • Implements an algorithm to find the minimal rectangles that need to be OCRed
  • Implements an algorithm to filter out OCR cells that overlap with programmatic cells
  • Enables OCR by default in the pipeline options
  • Adds a unit test for the docling-parse PDF backend
  • Adds a new dependency on rtree
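
To illustrate the two algorithms listed above, here is a minimal sketch. It is not the code merged in this PR: the `Box` class, `merge_overlapping`, and `filter_ocr_cells` are hypothetical names, coordinates are assumed to use a top-left origin, and the overlap checks are brute force rather than using an rtree spatial index.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned rectangle, top-left origin (l < r, t < b)."""
    l: float
    t: float
    r: float
    b: float

    def intersects(self, other: "Box") -> bool:
        # Strict inequalities: boxes that only touch at an edge do not overlap.
        return (self.l < other.r and other.l < self.r
                and self.t < other.b and other.t < self.b)

    def union(self, other: "Box") -> "Box":
        # Smallest box that encloses both inputs.
        return Box(min(self.l, other.l), min(self.t, other.t),
                   max(self.r, other.r), max(self.b, other.b))


def merge_overlapping(boxes: List[Box]) -> List[Box]:
    """Merge intersecting bitmap rectangles until no two results overlap.

    Assumption: each overlapping group is replaced by its bounding box,
    so every region is sent to OCR at most once.
    """
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if merged[i].intersects(merged[j]):
                    merged[i] = merged[i].union(merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


def filter_ocr_cells(ocr_cells: List[Box], programmatic_cells: List[Box]) -> List[Box]:
    """Keep only OCR cells that overlap no programmatic (embedded-text) cell."""
    return [
        cell for cell in ocr_cells
        if not any(cell.intersects(p) for p in programmatic_cells)
    ]


# Usage: two overlapping bitmaps collapse into one OCR area, one stays separate.
bitmaps = [Box(0, 0, 10, 10), Box(5, 5, 15, 15), Box(100, 100, 110, 110)]
ocr_areas = merge_overlapping(bitmaps)

# An OCR cell that duplicates programmatic text is dropped, the other is kept.
programmatic = [Box(0, 0, 5, 5)]
ocr_cells = [Box(1, 1, 4, 3), Box(101, 101, 105, 103)]
kept = filter_ocr_cells(ocr_cells, programmatic)
```

The pairwise scans here are quadratic in the number of boxes. A spatial index such as the one provided by rtree can answer the same intersection queries faster on pages with many cells, which may be why the dependency was added.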

@cau-git cau-git requested review from dolfim-ibm and maxmnemonic and removed request for dolfim-ibm August 20, 2024 12:36
Signed-off-by: Christoph Auer <[email protected]>
@cau-git cau-git merged commit e94d317 into main Aug 20, 2024
7 checks passed
@cau-git cau-git deleted the cau/adaptive-ocr branch August 20, 2024 13:28
2 participants