Home
Stefan Weil edited this page May 10, 2020
tesstrain (formerly ocrd-train) is a collection of scripts and documentation for training Tesseract's LSTM-based recognition (supported by Tesseract 4 and newer releases).
It currently includes a Makefile that allows training from real line images with ground truth (text transcriptions).
Such data is available from a number of sources; see https://github.com/cneud/ocr-gt for a list.
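Before training from line images, it can help to check that every image has a matching transcription. The following is a minimal Python sketch, not part of tesstrain itself. It assumes that a line image such as `line01.png` or `line01.tif` sits next to a transcription named `line01.gt.txt`; this naming scheme, the helper name `find_training_pairs`, and the directory layout are assumptions for illustration and should be adapted to the data set in use.

```python
from pathlib import Path


def find_training_pairs(data_dir, image_suffixes=(".png", ".tif")):
    """Return (image, transcription) path pairs found in data_dir.

    Assumed naming scheme (hypothetical, adjust as needed):
    an image `name.png` is paired with a transcription `name.gt.txt`.
    Images without a transcription are skipped.
    """
    pairs = []
    for image in sorted(Path(data_dir).iterdir()):
        # Only consider files with one of the expected image suffixes.
        if image.suffix.lower() not in image_suffixes:
            continue
        # Build the expected transcription path next to the image.
        transcription = image.with_name(image.stem + ".gt.txt")
        if transcription.is_file():
            pairs.append((image, transcription))
    return pairs
```

Calling `find_training_pairs("data/my-ground-truth")` (a hypothetical directory) returns a sorted list of path pairs; comparing its length with the number of image files shows how many lines lack a transcription.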
Training from synthetic images is supported by training scripts (Shell, Python) which are still part of the Tesseract code base.
- Training Fraktur with Austrian Newspapers
- Training Fraktur with Neue Zürcher Zeitung
- Training Fraktur with GT4HistOCR
- Training Fraktur and Handwriting with German primers
- Training Arabic Handwriting
- Training Handwritten Text with German Konzilsprotokolle