bangla-pdf-to-text-OCR

This repository describes how to create a TXT document or EPUB from an image-based (scanned) PDF.

These scripts have only been tested on Windows PCs. On Linux, update the directory paths in ocr.py.

Prerequisites:

Tesseract engine (https://github.com/UB-Mannheim/tesseract/wiki) - during setup, make sure to select Bangla as a language.
Calibre (https://calibre-ebook.com/download)
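
To confirm that the Bangla language data was actually installed, a small check along the following lines may help. This is only a sketch: it assumes the `tesseract` executable is on your PATH (the Windows installer does not always add it) and that Bangla appears under Tesseract's language code `ben`. The helper names `parse_langs` and `installed_langs` are made up for this example and are not part of ocr.py.

```python
import shutil
import subprocess


def parse_langs(listing):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in listing.splitlines()]
    # The first line is a header such as 'List of available languages ... (3):'.
    return [line for line in lines if line and not line.lower().startswith("list of")]


def installed_langs(tesseract="tesseract"):
    """Return the installed language codes, or None if Tesseract is not on PATH."""
    exe = shutil.which(tesseract)
    if exe is None:
        return None
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Read both streams in case a build prints the listing to stderr.
    return parse_langs(result.stdout + "\n" + result.stderr)
```

If `installed_langs()` returns a list that contains `"ben"`, the Bangla data is available. A result of `None` only means the executable was not found on PATH, not that Tesseract is missing.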

LET'S GO >>>

After setting up Tesseract on your device, run: pip3 install pytesseract
Open Calibre and add your PDF book. Select it, click Convert books, and convert it to ZIP.
Open the directory where the ZIP file is located and unzip it.
After unzipping, you will find the page images inside the extracted folder.
The images are named something like index-1_***.jpg.
Copy the Python script (ocr.py) into that directory and run it.
The script produces a book.txt document.
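
For readers who want to see what the OCR step involves, here is a minimal sketch of the kind of loop such a script might run. It is not the repository's ocr.py. The function names (`natural_key`, `tesseract_ocr`, `build_book`) are hypothetical, `ben` is assumed to be the Tesseract language code for Bangla, and the Windows path in the comment is an assumed default install location.

```python
import re
from pathlib import Path


def natural_key(path):
    """Sort key that places 'index-1_10.jpg' after 'index-1_2.jpg'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", Path(path).name)]


def tesseract_ocr(image_path, lang="ben"):
    """Run Tesseract on one image file and return the recognized text."""
    # Imported here so the rest of the sketch loads without pytesseract.
    import pytesseract

    # On Windows, pytesseract may need the full path to tesseract.exe
    # (assumed default location shown):
    # pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
    return pytesseract.image_to_string(str(image_path), lang=lang)


def build_book(image_dir=".", out_path="book.txt", ocr=tesseract_ocr):
    """OCR every .jpg in image_dir in page order and write the text to out_path."""
    images = sorted(Path(image_dir).glob("*.jpg"), key=natural_key)
    with open(out_path, "w", encoding="utf-8") as out:
        for image in images:
            out.write(ocr(image))
            out.write("\n")
    return len(images)
```

Calling `build_book()` from inside the unzipped folder would write book.txt there. The numeric sort matters because a plain alphabetical sort would place page 10 before page 2, and writing with UTF-8 encoding keeps the Bangla characters intact on Windows.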

Open the TXT file in Google Docs, correct any OCR errors, and download it as EPUB.
