OCR Project with Tesseract

This project uses the Tesseract OCR engine and Python to solve OCR problems.

Starting from an image (or a folder of images), the goal is to convert visual information (e.g. words and numbers inside the image) into text (e.g. a CSV file with the relevant information).
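To make the image-to-text-to-CSV idea concrete, here is a minimal sketch. It assumes the `pytesseract` and `Pillow` packages are used to call Tesseract (this README does not say which wrapper the scripts use), and the one-row-per-line `text_to_rows` helper and its `image`/`text` columns are hypothetical; the real scripts may extract different fields.

```python
import csv
import io


def ocr_image(path: str) -> str:
    """Run Tesseract on one image and return the recognized text.

    Assumes pytesseract and Pillow are installed and the tesseract
    binary is on PATH.
    """
    # Imported lazily so the CSV helpers below work without Tesseract.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image)


def text_to_rows(source: str, text: str) -> list[list[str]]:
    """Hypothetical helper: one CSV row per non-empty recognized line."""
    return [[source, line.strip()] for line in text.splitlines() if line.strip()]


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize the rows, preceded by a header row, to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["image", "text"])
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `rows_to_csv(text_to_rows("receipt.png", ocr_image("receipt.png")))` returns the CSV text for one image (`receipt.png` is a placeholder file name). Keeping OCR and CSV formatting in separate functions makes the text-handling part testable without Tesseract installed.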

All the scripts are based on the work of Adrian Rosebrock and his website, pyimagesearch. So, thanks, Adrian!

About project

Environment setup

1. Install Tesseract on your machine. You can find how to do this here.
2. Configure your development environment. Using pyenv is highly recommended, so you can manage your Python projects easily. More information here.
3. Once your environment is ready, install the requirements. From the root of this project, run:
```
pip install -r requirements.txt
```
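Before running any script, you can check that Python sees both the Tesseract binary and its wrapper. This is a sketch that assumes `requirements.txt` includes `pytesseract` (not confirmed here); `shutil.which` is standard library, and `pytesseract.get_tesseract_version()` is a pytesseract function that runs `tesseract --version`.

```python
import shutil


def check_environment() -> dict:
    """Collect basic facts about the local OCR setup."""
    status = {
        # Full path to the tesseract executable, or None if it is not on PATH.
        "tesseract_path": shutil.which("tesseract"),
        "pytesseract_installed": False,
        "tesseract_version": None,
    }
    try:
        import pytesseract  # assumption: listed in requirements.txt
    except ImportError:
        return status
    status["pytesseract_installed"] = True
    try:
        # Executes `tesseract --version` and parses the version number.
        status["tesseract_version"] = str(pytesseract.get_tesseract_version())
    except pytesseract.TesseractNotFoundError:
        pass  # wrapper is installed, but the binary could not be executed
    return status


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `tesseract_path` or `tesseract_version` comes back as `None`, revisit step 1 before running the scripts.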

That's all! You can now run any of the Python scripts. For example:

```
python scripts/ocr_folder_process.py --folder folder_path_where_you_have_your_images_to_process --blacklist "|/\[](){}"
```
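For orientation, here is a hypothetical sketch of what a folder-processing script could look like; it is not the project's actual `scripts/ocr_folder_process.py`. It mirrors the `--folder` and `--blacklist` flags from the command above, passes the blacklist to Tesseract through its `tessedit_char_blacklist` variable, and writes one CSV row per image. The `--output` flag, the extension list, and all function names are assumptions, and the default OCR backend assumes `pytesseract` and `Pillow`.

```python
import argparse
import csv
import shlex
from pathlib import Path
from typing import Callable, Optional

# Hypothetical set of file extensions treated as images.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def list_images(folder: Path) -> list[Path]:
    """Return the image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def build_config(blacklist: str) -> str:
    """Build a Tesseract config string that forbids the given characters.

    shlex.quote keeps backslashes and brackets intact, assuming the config
    string is later split shell-style (pytesseract does this on POSIX).
    """
    if not blacklist:
        return ""
    return f"-c tessedit_char_blacklist={shlex.quote(blacklist)}"


def tesseract_ocr(path: Path, config: str) -> str:
    """Default OCR backend (assumes pytesseract and Pillow are installed)."""
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, config=config)


def process_folder(folder: Path, blacklist: str, output: Path,
                   ocr: Callable[[Path, str], str] = tesseract_ocr) -> int:
    """OCR every image in `folder`, write one CSV row each, return the count."""
    config = build_config(blacklist)
    count = 0
    with output.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["image", "text"])
        for path in list_images(folder):
            writer.writerow([path.name, ocr(path, config).strip()])
            count += 1
    return count


def main(argv: Optional[list[str]] = None) -> int:
    """Command-line entry point mirroring the --folder/--blacklist flags."""
    parser = argparse.ArgumentParser(description="OCR every image in a folder (sketch).")
    parser.add_argument("--folder", required=True, type=Path)
    parser.add_argument("--blacklist", default="")
    # Hypothetical flag: where to write the results.
    parser.add_argument("--output", type=Path, default=Path("ocr_results.csv"))
    args = parser.parse_args(argv)
    return process_folder(args.folder, args.blacklist, args.output)
```

The `ocr` parameter lets you swap in a different backend (or a stub for testing) without touching the file handling. To use it as a script, call `main()` inside an `if __name__ == "__main__":` guard.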

franyack/ocr-tesseract

Folders and files

Latest commit

History

Repository files navigation

OCR Project with Tesseract

Index:

About project

Environment setup

References

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages