Support for Sgaw and W Pwo Karen languages in the Myanmar validator. #4065

Merged 1 commit on May 5, 2023

ben417 (Contributor) commented on May 4, 2023

Fixes the issue reported in #4061

  1. Added 0x102c and 0x1062 to the tone mark section; in Karen these can be tones too.

  2. Added the optional 0x103a (asat), 0x1037 (dot below), and 0x1038 (visarga) after the tones. Asat is part of the Sgaw tone mark, while dot below and visarga are used as nasal marks following the Pwo tones.
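
As a rough illustration of the two changes above, here is a minimal Python sketch. It is not the validator's actual code: the names `TONES`, `TONE_CLUSTER`, and `tone_cluster_length` are hypothetical, only the code points named in this PR are modelled, and the fixed order asat, dot below, visarga is an assumption made to keep the example small.

```python
import re

# Code points that this PR adds to the tone mark section (hypothetical name).
TONES = "\u102c\u1062"

# A tone, optionally followed by asat (U+103A), dot below (U+1037),
# and visarga (U+1038). The fixed order is an assumption of this sketch.
TONE_CLUSTER = re.compile("[" + TONES + "]\u103a?\u1037?\u1038?")


def tone_cluster_length(text: str, start: int = 0) -> int:
    """Return how many code points starting at `start` form a tone cluster.

    Returns 0 when the code point at `start` is not one of the tones above.
    """
    match = TONE_CLUSTER.match(text, start)
    return len(match.group(0)) if match else 0
```

For example, `tone_cluster_length("\u1062\u103a")` accepts the tone together with its asat, while a bare asat at the start of the string is not treated as a tone cluster.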

And here are some text files for testing:

test_strings.txt - A few Sgaw and Pwo test strings highlighting the errors this PR fixes.
syllables_sgaw.txt - All possible Sgaw syllables.
syllables_pwo.txt - All possible Pwo syllables.

amitdo merged commit ed69e57 into tesseract-ocr:main on May 5, 2023

amitdo (Collaborator) commented on May 5, 2023

Thank you for your contribution.
