Using the tesseract CLI tool

Tesseract OCR has a command-line utility which is woefully under-documented. Thanks to Alexandru Nedelcu I figured out how to use it today.

To install on macOS:

brew install tesseract

To convert an image into an annotated PDF (which you can then copy and paste text out of, and which will be correctly indexed by Spotlight):

tesseract image.png output-file -l eng pdf

The second output-file argument there is the path and filename of the output - note that I didn't include a .pdf extension because Tesseract adds that automatically - so the output will be in a file called output-file.pdf.

To get out just the plain text:

tesseract image.png output-file -l eng txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tesseract-cli.md

tesseract-cli.md

Using the tesseract CLI tool

Files

tesseract-cli.md

Latest commit

History

tesseract-cli.md

File metadata and controls

Using the tesseract CLI tool