Issue 1362: Add support for Sanskrit transliteration in Latin/Roman script #4

Merged: 3 commits, Feb 21, 2018

Conversation

@Shreeshrii
Contributor

@theraysmith

The best/Latin.traineddata does not include all the characters that are required for IAST.

I would appreciate it if you could either add all the characters from this unicharset to the Latin traineddata or provide this as a separate option.

https://sanskritdocuments.org/iast/by-category/stotra.php
http://sarit.indology.info/

Both of these sites host a number of documents in IAST format.
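To make the request above concrete, here is a minimal sketch of how one might check which IAST letters are missing from a unicharset. It is only an illustration: the function names are hypothetical, and it assumes that the first whitespace-delimited token on each unicharset line is the character itself and that characters are compared in NFC form. The character list covers only the IAST letters with diacritics, in lower and upper case.

```python
import unicodedata

# Lowercase IAST letters that carry diacritics (plain ASCII letters are omitted).
IAST_DIACRITICS = "ā ī ū ṛ ṝ ḷ ḹ ṃ ḥ ṅ ñ ṭ ḍ ṇ ś ṣ".split()


def iast_required_chars():
    """Return the set of IAST diacritic letters, lower and upper case, in NFC form."""
    chars = set()
    for ch in IAST_DIACRITICS:
        chars.add(unicodedata.normalize("NFC", ch))
        chars.add(unicodedata.normalize("NFC", ch.upper()))
    return chars


def missing_iast_chars(unicharset_lines):
    """Return a sorted list of IAST letters that do not appear in the given lines.

    Assumption: the first whitespace-delimited token on each line is the
    character; header or count lines simply never match an IAST letter.
    """
    present = set()
    for line in unicharset_lines:
        tokens = line.split()
        if tokens:
            present.add(unicodedata.normalize("NFC", tokens[0]))
    return sorted(iast_required_chars() - present)
```

The lines could come from a unicharset unpacked from a traineddata file (for example with `combine_tessdata -u`) and read with `open(path, encoding="utf-8")`. If the model stores these letters in decomposed form, the NFC comparison would need to be adjusted.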

@Shreeshrii
Contributor

Shreeshrii commented Sep 5, 2017

Plusminus training worked OK for this (unlike for the Devanagari script).

See san_latn.traineddata (version 1) at https://github.com/Shreeshrii/tessdata4alpha/tree/master/best

kamakoti-Latin.txt

kamakoti-san_latn_1.txt

kamakoti

@Shreeshrii
Contributor

Shreeshrii commented Sep 7, 2017

When testing a page that contains both English and IAST (modified transliteration scheme), the Latin traineddata gives better results than the traineddata produced by plusminus tuning.

See attached sample.
apracticalhinds01mathgoog_0050

apracticalhinds01mathgoog_0050-eng+san_latn_1.txt
apracticalhinds01mathgoog_0050-Latin.txt
apracticalhinds01mathgoog_0050-san_latn_1.txt
apracticalhinds01mathgoog_0050-san_latn_2.txt

apracticalhinds01mathgoog_0050-eng.txt
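A comparison like the one attached above could be reproduced along these lines. This is a rough sketch, not the exact procedure used for the attachments: the helper names and the image file name are hypothetical, and it assumes the `tesseract` executable is on the PATH and that the listed traineddata files are available in the given tessdata directory. It relies only on the standard `-l` and `--tessdata-dir` options and names each output after the language string, as in the attachments.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image, lang, tessdata_dir=None):
    """Build the argv for one OCR run; output base is '<image stem>-<lang>'."""
    image = Path(image)
    out_base = f"{image.stem}-{lang}"
    cmd = ["tesseract", str(image), out_base, "-l", lang]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    return cmd


def run_comparison(image, langs, tessdata_dir=None):
    """Run tesseract once per language string (requires tesseract installed)."""
    for lang in langs:
        subprocess.run(build_tesseract_command(image, lang, tessdata_dir), check=True)


# Hypothetical usage; the image name and directory are placeholders:
# run_comparison("page_0050.png",
#                ["eng", "Latin", "san_latn_1", "eng+san_latn_1"],
#                tessdata_dir="best")
```

Each run writes a `.txt` file next to the working directory, so the outputs for the different models can be compared side by side.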
