A Dataset for Urdu Textline OCR

The dataset contains grayscale text images and their corresponding transcriptions in UTF-8. Each .rar file contains folders with nested subfolders of augmented images, plus a single text folder. The dataset contains three types of images:
- Low-resolution text-line images
- High-resolution text-line images
- Word images
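The description does not say how images are matched to their transcriptions, so the following is only a minimal sketch of loading image/text pairs from one extracted archive. It assumes a hypothetical layout: the text folder is named `text`, it holds one UTF-8 `.txt` file per image, and an image and its transcription share the same file stem. `load_pairs` and its `key` parameter are illustrative names, not part of the dataset or the linked repository.

```python
from pathlib import Path

# HYPOTHETICAL layout assumptions, not confirmed by the dataset description:
# - the .rar archive has already been extracted to `root`
# - the single text folder is named "text" and holds one UTF-8 .txt per image
# - an image and its transcription share the same file stem
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def load_pairs(root, text_dir_name="text", key=lambda p: p.stem):
    """Return sorted (image_path, transcription) pairs found under `root`."""
    root = Path(root)
    # Decode every transcription as UTF-8, keyed by its file stem.
    labels = {
        p.stem: p.read_text(encoding="utf-8").strip()
        for p in (root / text_dir_name).glob("*.txt")
    }
    pairs = []
    # rglob walks the nested (augmented) image folders recursively.
    for img in sorted(root.rglob("*")):
        if img.is_file() and img.suffix.lower() in IMAGE_EXTS:
            label = labels.get(key(img))
            if label is not None:
                pairs.append((img, label))
    return pairs
```

If augmented copies carry an extra suffix in their file names (for example a hypothetical `0001_blur.png`), passing a custom key such as `key=lambda p: p.stem.split("_")[0]` maps them back to the transcription of the original line.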
| | Low Res Folder | High Res Folder | Words Folder |
|---|---|---|---|
| Unedited images | 20,787 | 23,018 | 118,013 |
| Chars in unedited images | 1,602,435 | 2,234,487 | 1,080,079 |
| Words in unedited images | 370,381 | 515,498 | |
| Total augmented images | 119,652 | 483,378 | 1,063,772 |
| Size | 2.2 GB | 8.7 GB | 9.6 GB |
| Link | Low Res Dataset | High Res Dataset | Words Dataset |
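As a quick sanity check on the table above, the per-image averages and the augmentation ratio can be derived directly from its counts. This is a small illustrative calculation; the dictionary layout and the `summarize` helper are hypothetical, and the Words folder has no word count, so that ratio is skipped for it.

```python
# Counts copied from the table above; the Words folder has no word count.
stats = {
    "Low Res": {"unedited": 20787, "chars": 1602435, "words": 370381, "augmented": 119652},
    "High Res": {"unedited": 23018, "chars": 2234487, "words": 515498, "augmented": 483378},
    "Words": {"unedited": 118013, "chars": 1080079, "words": None, "augmented": 1063772},
}


def summarize(s):
    """Derive average characters/words per unedited image and augmentation ratio."""
    out = {}
    for name, row in s.items():
        out[name] = {
            "chars_per_image": row["chars"] / row["unedited"],
            "augmented_per_unedited": row["augmented"] / row["unedited"],
        }
        # Word counts are only reported for the two text-line folders.
        if row["words"] is not None:
            out[name]["words_per_line"] = row["words"] / row["unedited"]
    return out


for name, vals in summarize(stats).items():
    print(name, {k: round(v, 2) for k, v in vals.items()})
```

For example, the high-resolution folder has exactly 21 augmented images per unedited line (483,378 / 23,018), while its lines average about 97 characters versus about 77 in the low-resolution folder.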
A trained model with minimal code is deployed here: https://github.com/HassamChundrigar/Urdu-Ocr