gimesia/ocr-gui

🐍💻 Python Project for MAIA 8, semester 1 Software Engineering class

🔍👀 Text recognition with GUI

A GUI tool that extracts text from images, either captured with the user's webcam or uploaded from the file system.

Developing this program helped me get familiar with, and showcase, the tools provided by PyTesseract and PyQt5, two popular packages that I had never used before.

The tool can be useful for:

  • digitizing the information on grocery receipts
  • digitizing the text on paper handouts from other classes where no digital document was provided (e.g. Applied Mathematics course papers)
  • extracting text from screenshots
  • etc.

Usage:

📦 Dependencies:

  • numpy
  • opencv
  • pytesseract
  • pyqt5
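
A quick check that the dependencies are importable before launching the GUI could look like the sketch below. It assumes the usual import names for these packages (`cv2` for opencv, `PyQt5` for pyqt5); the helper `missing_modules` is a hypothetical name and is not part of this repository.

```python
from importlib.util import find_spec

# Import names of the dependencies listed above (opencv is imported as cv2).
REQUIRED_MODULES = ["numpy", "cv2", "pytesseract", "PyQt5"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

This only checks the Python packages; it does not verify that the Tesseract engine itself is installed.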

📝 Manual:

  1. Make sure all the dependencies are installed, and change the path variables in config.py if needed.

  2. Run main.py. The main GUI window opens (this can take a couple of seconds, as OpenCV has to access the computer's webcam).

  3. Capture an image with the webcam using the "Capture Frame" button or by pressing *space*, or upload an image from the computer with the "Upload Image" button. The image to be analysed appears on the right side. (Note: if the webcam's resolution is too low, the OCR might not work as intended.)

  4. Press the "Analyze" button to open the Analyzer window

  5. Cut the ROI (region of interest). This is done by clicking on the image: the first click selects the corner of the bounding box closest to the cursor, and the second click moves that corner to the clicked coordinates. When the bounding box is as desired, press the "Cut" button.

  6. The ROI is cut out of the original image and oriented to a vertical line. If the orientation is not correct, adjust the rotation with the "Rotate" buttons. When the orientation is as desired, press the "OCR" button.

  7. The image with the recognized text marked on it will appear on the left, while on the right the extracted text is shown in an editable textbox. If the text recognition has made mistakes, correct them in the textbox, then save the results by pressing the "Save" button. The analyzer window closes when the text is saved, and the process returns to Step 2.
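
The two-click corner editing described in step 5 can be sketched as a small state machine. This is a minimal illustration, not the code in BBoxEditorWidget.py: the class name `BBoxEditor`, its fields, and the assumption that the box is stored as four independent corner points are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class BBoxEditor:
    """Hypothetical model of the four-corner bounding box editing in step 5."""

    corners: List[Point]
    selected: Optional[int] = None  # index of the corner picked by the first click

    def click(self, x: int, y: int) -> None:
        if self.selected is None:
            # First click: pick the corner closest to the cursor
            # (squared distance is enough for a comparison).
            self.selected = min(
                range(len(self.corners)),
                key=lambda i: (self.corners[i][0] - x) ** 2
                + (self.corners[i][1] - y) ** 2,
            )
        else:
            # Second click: move the picked corner to the clicked coordinates.
            self.corners[self.selected] = (x, y)
            self.selected = None


editor = BBoxEditor(corners=[(0, 0), (100, 0), (100, 50), (0, 50)])
editor.click(95, 5)    # selects the corner at (100, 0)
editor.click(120, 10)  # moves it to (120, 10)
print(editor.corners)  # [(0, 0), (120, 10), (100, 50), (0, 50)]
```

Keeping the selection state separate from the drawing code means the click logic can be tested without opening a PyQt5 window.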

📁 Directory structure:

├── test_images
|   └─ images for demonstration purposes
|
├── AnalyserWindow.py
├── main.py
|
├── widgets
|   ├──BBoxEditorWidget.py
|   ├──CutImagePreviewWidget.py
|   └──OCRWidget.py
|
├── utils.py
├── config.py

💡 Notes:

  • PyTesseract works best when the text is vertical, the text color is black, and the background is white
  • The Tesseract engine that PyTesseract relies on needs to be installed separately; a Windows installer is available at UB Mannheim's GitHub
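
Because the engine is installed separately, PyTesseract has to be told where the executable is if it is not on the system PATH. The sketch below shows one way to do that. The function names and the `configured_path` parameter (standing in for a value kept in config.py) are hypothetical; `pytesseract.pytesseract.tesseract_cmd` is the setting PyTesseract reads.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_tesseract(configured_path: Optional[str] = None) -> Optional[str]:
    """Return a usable path to the Tesseract executable, or None.

    `configured_path` stands in for a value kept in config.py (hypothetical).
    """
    if configured_path and Path(configured_path).is_file():
        return configured_path
    # Fall back to whatever `tesseract` is on the system PATH.
    return shutil.which("tesseract")


def configure_pytesseract(configured_path: Optional[str] = None) -> str:
    """Point pytesseract at the engine, raising if it cannot be found."""
    import pytesseract  # imported lazily so find_tesseract works without it

    path = find_tesseract(configured_path)
    if path is None:
        raise FileNotFoundError("Tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling `configure_pytesseract()` once at startup, before any OCR call, would surface a missing engine as a clear error instead of a failure in the middle of the analysis.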
