diff --git a/README.md b/README.md
index a3d0bda4e8..de2ec6fbcd 100644
--- a/README.md
+++ b/README.md
@@ -24,7 +24,7 @@ and GitHub's log of [contributors](https://github.com/tesseract-ocr/tesseract/gr
 
 Tesseract has **unicode (UTF-8) support**, and can **recognize more than 100 languages** "out of the box".
 
-Tesseract supports **various output formats**: plain-text, hocr(html), pdf, tsv, invisible-text-only pdf.
+Tesseract supports **various output formats**: plain text, hOCR (HTML), PDF, invisible-text-only PDF, TSV. The master branch also has experimental support for ALTO (XML) output.
 
 You should note that in many cases, in order to get better OCR results, you'll need to **[improve the quality](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality) of the image** you are giving Tesseract.
 
diff --git a/doc/tesseract.1.asc b/doc/tesseract.1.asc
index 4e024fb19c..0aca4f7d4a 100644
--- a/doc/tesseract.1.asc
+++ b/doc/tesseract.1.asc
@@ -90,6 +90,7 @@ OPTIONS
  contains a list of variables and their values, one per line, with a
  space separating variable from value. Interesting config files
  include: +
+ * `alto` - Output in ALTO format (file extension `.xml`).
  * `hocr` - Output in hOCR format (file extension `.hocr`).
  * `pdf` - Output PDF (file extension `.pdf`).
  * `tsv` - Output TSV (file extension `.tsv`).