franyack/ocr-tesseract

OCR Project with Tesseract

This project uses Tesseract and Python to solve OCR problems.

Starting from an image (or a folder of images), the goal is to convert visual information (e.g. words and numbers inside the image) into text (e.g. a CSV file with the relevant information).
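As a rough illustration of that image-to-CSV flow, here is a minimal sketch. It is not the code in this repository: the function names (`tesseract_ocr`, `list_images`, `folder_to_csv`), the list of image extensions, and the two-column CSV layout are all assumptions made for the example. It assumes the `pytesseract` and `Pillow` packages are installed alongside the Tesseract binary.

```python
import csv
from pathlib import Path
from typing import Callable, List

# Assumed set of image extensions; adjust to whatever your images use.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def tesseract_ocr(image_path: Path) -> str:
    """Return the text Tesseract finds in one image (hypothetical helper)."""
    # Imported lazily so the rest of the module works without these packages.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)


def list_images(folder: Path) -> List[Path]:
    """Return the image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def folder_to_csv(folder, csv_path, ocr: Callable[[Path], str] = tesseract_ocr) -> int:
    """Write one CSV row (filename, text) per image and return the row count."""
    rows = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename", "text"])
        for image_path in list_images(Path(folder)):
            writer.writerow([image_path.name, ocr(image_path).strip()])
            rows += 1
    return rows
```

Passing the OCR step in as a parameter keeps the folder walk and CSV writing testable without Tesseract; calling `folder_to_csv("images", "output.csv")` would use the Tesseract-backed default.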

All the scripts are based on the work of Adrian Rosebrock and his website, PyImageSearch. So, thanks Adrian!


Environment setup

  • Install Tesseract on your machine. You can find out how to do this here

  • Configure your development environment. Using pyenv is highly recommended, since it makes managing your Python projects easier. More information here

  • Once your environment is ready, install the requirements. To do this, run the following from the root of this project:

    pip install -r requirements.txt
    
  • That's all! You can now run any Python script. For example:

    python scripts/ocr_folder_process.py --folder folder_path_where_you_have_your_images_to_process --blacklist "|/\[](){}" 
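To make the `--folder` and `--blacklist` options above more concrete, the sketch below shows one way a script could parse them and turn the blacklist into a Tesseract option string. The internals of `scripts/ocr_folder_process.py` are not shown here, so `parse_args` and `build_tesseract_config` are hypothetical names and this is only an assumption about how such a script might work. `tessedit_char_blacklist` is the Tesseract configuration variable for characters the engine should not output.

```python
import argparse
import shlex


def parse_args(argv=None):
    """Parse the two options used in the example command (hypothetical)."""
    parser = argparse.ArgumentParser(description="Run OCR on every image in a folder.")
    parser.add_argument("--folder", required=True,
                        help="path to the folder containing the images to process")
    parser.add_argument("--blacklist", default="",
                        help="characters Tesseract should never output")
    return parser.parse_args(argv)


def build_tesseract_config(blacklist: str) -> str:
    """Build a Tesseract option string that blacklists the given characters."""
    if not blacklist:
        return ""
    option = "tessedit_char_blacklist=" + blacklist
    # Quote the option so characters such as "|", "\" and "(" survive a
    # shell-style split of the config string (an assumption about how the
    # string is consumed on POSIX platforms).
    return "-c " + shlex.quote(option)
```

The resulting string could then be passed as the `config` argument of `pytesseract.image_to_string`, for example `pytesseract.image_to_string(image, config=build_tesseract_config(args.blacklist))`.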
    
