HassamChundrigar/Urdu-Augmented-TextLines-Dataset

Urdu-Augmented-TextLines-Dataset

A dataset for Urdu text-line OCR. The dataset contains grayscale text images and their corresponding text in UTF-8. Each .rar file contains folders with nested folders of augmented images, plus a single text folder. The dataset contains three types of images:

  • Low Resolution text line images
  • High resolution text line images
  • Words images
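The folder layout described above (nested folders of augmented images plus one text folder) suggests pairing each image with its transcription. The following is only a minimal sketch under stated assumptions: the README does not give the text folder's name, the file extensions, or the naming scheme, so `text_dir_name="text"`, the `.txt` suffix, the image extensions, and the idea that an image and its transcription share a file stem are all hypothetical. Check them against an extracted archive before relying on this.

```python
from pathlib import Path

# Assumption: image files use one of these extensions (not stated in the README).
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def pair_images_with_text(root, text_dir_name="text"):
    """Return (image_path, transcription) pairs found under an extracted archive.

    Assumptions (hypothetical, not confirmed by the README):
    - the single text folder is `root / text_dir_name` and holds UTF-8 .txt files;
    - an image and its transcription share the same file stem.
    """
    root = Path(root)

    # Index transcriptions by file stem.
    labels = {}
    for txt in (root / text_dir_name).glob("*.txt"):
        labels[txt.stem] = txt.read_text(encoding="utf-8").strip()

    # Walk all nested folders and keep images that have a matching transcription.
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS and path.stem in labels:
            pairs.append((path, labels[path.stem]))
    return pairs
```

Calling `pair_images_with_text("path/to/extracted/archive")` would then yield a list that can be fed to an OCR training pipeline; images without a matching transcription are skipped rather than raising an error.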

Summary of Dataset

| | Low Res Folder | High Res Folder | Words Folder |
|---|---|---|---|
| Unedited images | 20787 | 23018 | 118013 |
| Chars in unedited images | 1602435 | 2234487 | 1080079 |
| Words in unedited images | 370381 | 515498 | |
| Total augmented images | 119652 | 483378 | 1063772 |
| Size | 2.2 GB | 8.7 GB | 9.6 GB |
| Link | Low Res Dataset | High Res Dataset | Words Dataset |
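A few derived figures can be read off the summary table, such as the number of augmented images per unedited image and the average characters per unedited image. The sketch below only does arithmetic on the counts listed in the table. Treating the ratio of total augmented images to unedited images as an "augmentation factor" is an interpretation, not something the README states, and the names `SUMMARY` and `derived_stats` are hypothetical.

```python
# Counts copied from the summary table.
SUMMARY = {
    "Low Res": {"unedited": 20787, "chars": 1602435, "augmented": 119652},
    "High Res": {"unedited": 23018, "chars": 2234487, "augmented": 483378},
    "Words": {"unedited": 118013, "chars": 1080079, "augmented": 1063772},
}


def derived_stats(row):
    """Compute per-image ratios from one column of the summary table."""
    return {
        # Assumption: this ratio is read as an average augmentation factor.
        "augmented_per_unedited": row["augmented"] / row["unedited"],
        "chars_per_unedited": row["chars"] / row["unedited"],
    }


if __name__ == "__main__":
    for name, row in SUMMARY.items():
        stats = derived_stats(row)
        print(
            f"{name}: {stats['augmented_per_unedited']:.2f} augmented per unedited image, "
            f"{stats['chars_per_unedited']:.2f} chars per unedited image"
        )
```

On these counts the ratio of augmented to unedited images is about 5.8 for the low-resolution folder, exactly 21 for the high-resolution folder, and about 9.0 for the words folder.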

Examples

Low Res Unedited: [image]

Low Res Augmented: [image]

High Res Unedited: [image]

High Res Augmented: [image]

Word Unedited: [image]

Word Augmented: [image]

Project URL:

A trained model with minimal code is available at https://github.com/HassamChundrigar/Urdu-Ocr
