This project uses Tesseract and Python to solve OCR problems.
Starting from an image (or a folder of images), the goal is to convert visual information (e.g. words and numbers inside the image) into text (e.g. a CSV file with the relevant information).
All the scripts are based on the work of Adrian Rosebrock and his website, pyimagesearch. So, thanks Adrian!
Install Tesseract on your machine. You can find out how to do this here.
Configure your development environment. Using pyenv is highly recommended, since it makes managing your Python projects easier. More information here.
Once your environment is ready, install the requirements. To do this, run the following from the root of this project:
pip install -r requirements.txt
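Before running the scripts, it may help to confirm that the Tesseract binary is reachable from your environment. The snippet below is a minimal sketch, not part of this project: the function name `find_tesseract` is hypothetical, and it assumes the executable is called `tesseract` and is expected on your PATH.

```python
import shutil


def find_tesseract(which=shutil.which):
    """Return the path of the tesseract executable, or None if it is not on PATH.

    `which` is injectable only to make the function easy to test;
    by default it is the standard library's shutil.which.
    """
    return which("tesseract")


if __name__ == "__main__":
    path = find_tesseract()
    if path:
        print(f"Tesseract found at {path}")
    else:
        print("Tesseract not found on PATH; check your installation.")
```

If the executable is not found, revisit the Tesseract installation step before going further.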
That's all! You can now run any of the Python scripts. For example:
python scripts/ --folder folder_path_where_you_have_your_images_to_process --blacklist "|/\[](){}"
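To give an idea of what a script driven by `--folder` and `--blacklist` might do, here is a minimal sketch. It is not the project's actual code: the names `build_config`, `tesseract_ocr` and `folder_to_csv`, the list of image extensions, and the CSV columns are all hypothetical, and it assumes the `pytesseract` and `Pillow` packages are available. The blacklist is passed to Tesseract through its `tessedit_char_blacklist` configuration variable.

```python
import csv
import shlex
from pathlib import Path

# Assumption: which file extensions count as images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def build_config(blacklist: str) -> str:
    """Build a Tesseract config string that excludes the given characters."""
    if not blacklist:
        return ""
    # Quote the value so characters such as | \ ( ) survive
    # shell-style splitting of the config string.
    return "-c " + shlex.quote(f"tessedit_char_blacklist={blacklist}")


def tesseract_ocr(image_path: Path, config: str) -> str:
    """Run Tesseract on one image (assumes pytesseract and Pillow are installed)."""
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, config=config)


def folder_to_csv(folder, csv_path, blacklist="", ocr=tesseract_ocr) -> int:
    """OCR every image in `folder` and write one CSV row per image.

    Returns the number of images processed.
    """
    config = build_config(blacklist)
    images = sorted(
        p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename", "text"])
        for path in images:
            writer.writerow([path.name, ocr(path, config).strip()])
    return len(images)
```

A call such as `folder_to_csv("my_images", "result.csv", blacklist="|/\\[](){}")` would then produce one row per image. The `ocr` argument is kept replaceable so the folder and CSV logic can be tried out without Tesseract installed.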