This repository describes how to create a TXT document or EPUB from an image-based (scanned) PDF.
The code has only been tested on Windows PCs. On Linux, update the directories in ocr.py.
Prerequisites:
- Tesseract engine (https://github.com/UB-Mannheim/tesseract/wiki). During setup, make sure to select Bangla as an additional language.
- Calibre (https://calibre-ebook.com/download)
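
Once Tesseract is installed, a quick check along these lines may help confirm that the Bangla language data is present. This is only a sketch: it assumes the tesseract executable is on your PATH (the Windows installer may not add it for you), it relies on Tesseract's language code for Bangla being "ben", and the function names parse_languages and bangla_available are made up for this example.

```python
import shutil
import subprocess


def parse_languages(listing: str) -> list[str]:
    """Extract language codes from the output of `tesseract --list-langs`."""
    lines = [line.strip() for line in listing.splitlines() if line.strip()]
    # The first line is a header such as "List of available languages (3):".
    return [line for line in lines if not line.lower().startswith("list of")]


def bangla_available() -> bool:
    """Return True if Tesseract is on PATH and lists the "ben" language."""
    exe = shutil.which("tesseract")
    if exe is None:
        return False
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Depending on the Tesseract version, the list may go to stdout or stderr.
    return "ben" in parse_languages(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    print("Bangla data found" if bangla_available() else "Bangla data not found")
```

If the check reports that the data is missing, re-running the Tesseract installer and ticking Bangla in the language list is the likely fix.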
Let's go:
- After setting up Tesseract on your device, run:
pip3 install pytesseract
- Open Calibre and add your PDF book. Select it, click Convert books, and choose ZIP as the output format.
- Open the directory where the ZIP is located and unzip the file.
- After unzipping, you will find the page images inside the folder, named something like
index-1_***.jpg
- Paste the Python script (ocr.py) into that directory and run it.
- When the script finishes, you will find a
book.txt
document.
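
For orientation, the OCR step above can be sketched roughly as follows. This is not the actual ocr.py, only an illustration under assumptions: the images are .jpg files in one folder, page order follows the numbers in the file names, and Bangla is requested with the Tesseract language code "ben". The names page_key, tesseract_ocr and images_to_text are hypothetical. Pillow, which the sketch uses to open images, is installed as a dependency of pytesseract.

```python
import re
from pathlib import Path


def page_key(path: Path) -> list[int]:
    """Sort key built from the numbers in the file name, so page 10 follows page 9."""
    return [int(n) for n in re.findall(r"\d+", path.stem)]


def tesseract_ocr(path: Path) -> str:
    """OCR one image with Tesseract's Bangla model (assumed language code "ben")."""
    # Imported here so the rest of the sketch can be used without these packages.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang="ben")


def images_to_text(folder: Path, output: Path, ocr=tesseract_ocr) -> int:
    """OCR every .jpg in `folder` in page order and write the text to `output`."""
    pages = sorted(folder.glob("*.jpg"), key=page_key)
    with output.open("w", encoding="utf-8") as book:
        for page in pages:
            book.write(ocr(page))
            book.write("\n")
    return len(pages)
```

Calling images_to_text(Path("."), Path("book.txt")) from inside the image folder would write one book.txt covering all pages. If Tesseract is not on your PATH, pytesseract lets you point to it by setting pytesseract.pytesseract.tesseract_cmd to the full path of the executable.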
Finally, open the TXT file in Google Docs, edit it as needed, and download it as an EPUB.