Extract Text from PDFs and Images Using Tesseract

Tesseract is an optical character recognition (OCR) engine for various operating systems. It is free software, released under the Apache License, Version 2.0.

Built with:
- Django
- Tesseract
- Pdf2Image
Prerequisites:
virtualenv venv
source venv/bin/activate
cd text_grabber
pip install -r requirements.txt
Run the Django app:
python3 manage.py runserver 0.0.0.0:8000
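
How the extraction might look (sketch):

This README does not show the extraction code itself, so the following is only a minimal sketch of how Pdf2Image and Tesseract are commonly combined, not the app's actual implementation. It assumes the pytesseract and Pillow packages are installed alongside pdf2image, and that the Tesseract and Poppler binaries are available on the system. The function name extract_text and its optional parameters are hypothetical and exist only for illustration.

```python
from pathlib import Path


def _default_pdf_to_images(path):
    # pdf2image renders each PDF page to a PIL image (requires Poppler).
    from pdf2image import convert_from_path
    return convert_from_path(str(path))


def _default_open_image(path):
    # Pillow loads a single image file (PNG, JPEG, ...).
    from PIL import Image
    return Image.open(path)


def _default_ocr(image):
    # Assumption: pytesseract is used as the Python wrapper around Tesseract.
    import pytesseract
    return pytesseract.image_to_string(image)


def extract_text(path, ocr=None, pdf_to_images=None, open_image=None):
    """Return OCR text for a PDF (all pages) or a single image file.

    The three optional callables are hypothetical hooks that let the
    OCR backend be swapped out, for example in tests.
    """
    path = Path(path)
    ocr = ocr or _default_ocr
    if path.suffix.lower() == ".pdf":
        # One image per page, in page order.
        pages = (pdf_to_images or _default_pdf_to_images)(path)
    else:
        pages = [(open_image or _default_open_image)(path)]
    # OCR each page and join the results with newlines.
    return "\n".join(ocr(page).strip() for page in pages)
```

Calling extract_text("scan.pdf") would run OCR on every page of the PDF, while extract_text("photo.png") would process the single image. The third-party imports are deferred into the helper functions so the module can be imported even where the OCR dependencies are not installed.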