Shreeshrii/tesstrain-bali

tesstrain-bali

Finetune training and OCR evaluation of Tesseract 5.0.0 Alpha for the Balinese script, using tesstrain, the training workflow for Tesseract 4 implemented as a Makefile. Certain file locations and scripts have been modified compared to the source repos.

OCR evaluation is done using the ISRI Analytic Tools for OCR Evaluation with UTF-8 support and the ocrevalUAtion tool.
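For readers unfamiliar with the character error rate (CER) that such tools report, the following is a minimal sketch of the usual definition: edit distance between the OCR output and the ground truth, divided by the ground-truth length. It is not the implementation used by the ISRI tools or ocrevalUAtion, and the function names `levenshtein` and `cer` are illustrative only. It also assumes that errors are counted per Unicode code point; for Balinese, counting per grapheme cluster would give different numbers.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, counted per Unicode code point (an assumption)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substituted character out of four gives a CER of 0.25 (25%).
    print(cer("abcd", "abxd"))
```

Under this definition, a CER of 8% corresponds to roughly 8 character edits per 100 ground-truth characters.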

bali1 - Balinese script - Version 1

Training by replacing the top layer was done using five Balinese Unicode fonts and synthetic training text, with jav1.traineddata as the Start_Model to continue from. The training was aborted at around 8% CER because the training and evaluation images had not been shuffled across fonts.
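To illustrate what shuffling across fonts means here, the sketch below shuffles a list of line-image samples from several fonts before cutting it into training and evaluation sets, so that neither set is dominated by a single font. This is only an illustration of the idea, not the procedure used in this repository. The font names `FontA` to `FontE`, the file paths, the 90/10 ratio, and the function name `shuffled_split` are all hypothetical.

```python
import random


def shuffled_split(samples, train_ratio=0.9, seed=0):
    """Shuffle samples with a fixed seed, then split into (train, eval) lists."""
    items = list(samples)
    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]


if __name__ == "__main__":
    # Hypothetical data: 5 fonts with 20 line images each, listed font by font.
    fonts = ["FontA", "FontB", "FontC", "FontD", "FontE"]
    samples = [(font, f"{font}/line_{i:03d}.tif") for font in fonts for i in range(20)]

    train, evaluation = shuffled_split(samples)
    print(len(train), len(evaluation))
    # Without shuffling, the last 10% of the list would all come from FontE.
    print(sorted({font for font, _ in evaluation}))
```

A plain shuffle does not guarantee that every font appears in the evaluation set; splitting each font's samples separately would be needed for that.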

Balinese script - Version 2

Training by replacing the top layer was done using five Balinese Unicode fonts and synthetic training text, with bali1.traineddata as the Start_Model to continue from.
