Indic-trans provides a state-of-the-art transliteration module for converting text between the scripts of Indian languages, including English and Urdu (for example, rendering Hindi text in Roman script).
(This deployment is a work in progress and currently supports text input only.)
Supported languages:
- Hindi
- Bengali
- Gujarati
- Punjabi
- Malayalam
- Kannada
- Tamil
- Telugu
- Oriya
- Marathi
- Assamese
- Konkani
- Bodo
- Nepali
- Urdu
- English
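The usage examples in this README pass three-letter codes such as 'hin', 'tam', and 'eng' as `source` and `target`. The sketch below maps the languages listed above to such codes. Only 'hin', 'tam', and 'eng' are confirmed by this README; the remaining codes are assumptions and should be checked against `indictrans --help` for your installed version. `language_code` is a hypothetical helper, not part of the library.

```python
# Assumed mapping from language name to the three-letter code passed as
# `source` / `target`. Only hin, tam and eng appear in this README; the
# rest are assumptions to verify against `indictrans --help`.
LANGUAGE_CODES = {
    "Hindi": "hin",
    "Bengali": "ben",
    "Gujarati": "guj",
    "Punjabi": "pan",
    "Malayalam": "mal",
    "Kannada": "kan",
    "Tamil": "tam",
    "Telugu": "tel",
    "Oriya": "ori",
    "Marathi": "mar",
    "Assamese": "asm",
    "Konkani": "kok",
    "Bodo": "bod",
    "Nepali": "nep",
    "Urdu": "urd",
    "English": "eng",
}


def language_code(name: str) -> str:
    """Return the code for a language name (hypothetical helper, case-insensitive)."""
    wanted = name.strip().lower()
    for known, code in LANGUAGE_CODES.items():
        if known.lower() == wanted:
            return code
    raise ValueError(f"Unsupported language: {name!r}")


print(language_code("Hindi"))
```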
Features:
- Rule-based and machine learning models: choose between rule-based (default) and machine learning-based transliteration.
- Efficient lookup: build lookup tables so that repeated words are transliterated only once, improving performance on large text corpora.
- K-best transliterations: generate the k best transliterations (up to 5) instead of a single result.
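The lookup idea described above can be illustrated with a small, library-independent sketch: a word-level function is wrapped with a dictionary so each distinct word is processed only once. This is not how indictrans implements its lookup internally; `make_cached_transliterator` and `fake_word_model` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List


def make_cached_transliterator(
    transliterate_word: Callable[[str], str],
) -> Callable[[str], str]:
    """Wrap a word-level function with a lookup table (illustrative only)."""
    lookup: Dict[str, str] = {}

    def transliterate_text(text: str) -> str:
        out = []
        # split() tokenises on whitespace, so runs of spaces are collapsed
        for word in text.split():
            if word not in lookup:
                lookup[word] = transliterate_word(word)
            out.append(lookup[word])
        return " ".join(out)

    return transliterate_text


calls: List[str] = []


def fake_word_model(word: str) -> str:
    """Stand-in for an expensive model call; records each invocation."""
    calls.append(word)
    return word.upper()


cached = make_cached_transliterator(fake_word_model)
print(cached("a b a b a"))  # A B A B A
print(len(calls))           # 2: only the distinct words reached the model
```

The benefit grows with corpus size, because natural text repeats a small set of frequent words many times.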
Requirements:
- Cython
- SciPy
Installation:
- Clone the repository:
  git clone https://github.com/libindic/indic-trans.git
  or:
  git clone https://github.com/irshadbhat/indic-trans.git
- Navigate to the cloned directory:
  cd indic-trans
- Install the requirements, then the package itself:
  pip install -r requirements.txt
  pip install .
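After installing, it can be useful to confirm that the package and its requirements are importable. The following is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper, and the module names `scipy` and `Cython` are the usual import names for those packages.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Report each module without importing it
    for name in ("indictrans", "scipy", "Cython"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```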
Command-line usage:
indictrans --s <source_language> --t <target_language> --input <input_file> --output <output_file>
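To call the command above from a script, one option is to assemble the argument list in Python and hand it to `subprocess`. The sketch below only builds and prints the command; it uses the flags exactly as written above, and the file names `hindi.txt` and `hindi-rom.txt` are placeholders. `build_indictrans_command` is a hypothetical helper.

```python
import shlex
from typing import List


def build_indictrans_command(
    source: str, target: str, input_path: str, output_path: str
) -> List[str]:
    """Assemble the argument list for the indictrans command (hypothetical helper)."""
    return [
        "indictrans",
        "--s", source,
        "--t", target,
        "--input", input_path,
        "--output", output_path,
    ]


cmd = build_indictrans_command("hin", "eng", "hindi.txt", "hindi-rom.txt")
print(shlex.join(cmd))

# To actually run it (requires indictrans to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when file paths contain spaces.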
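Python usage:
from indictrans import Transliterator
# build_lookup=True builds a lookup table so repeated words are not transliterated again
trn = Transliterator(source='hin', target='eng', build_lookup=True)
result = trn.transform("input text")
print(result)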
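For multi-line input, the `transform` call above can be applied line by line. The sketch below works with any object that exposes `transform(str) -> str`; it uses a stub in place of `Transliterator` so that it runs without indictrans installed. `transliterate_lines` and `UppercaseStub` are hypothetical names, not part of the library.

```python
from typing import Iterable, Iterator, Protocol


class HasTransform(Protocol):
    def transform(self, text: str) -> str: ...


def transliterate_lines(trn: HasTransform, lines: Iterable[str]) -> Iterator[str]:
    """Yield each line transliterated, passing blank lines through unchanged."""
    for line in lines:
        stripped = line.rstrip("\n")
        yield trn.transform(stripped) if stripped.strip() else stripped


class UppercaseStub:
    """Stand-in for Transliterator so the sketch runs without indictrans."""

    def transform(self, text: str) -> str:
        return text.upper()


print(list(transliterate_lines(UppercaseStub(), ["one\n", "\n", "two"])))
# ['ONE', '', 'TWO']
```

With the real library, a `Transliterator` instance would be passed in place of `UppercaseStub()`, and `lines` could be an open file object.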
Input (Hindi):
कांग्रेस पार्टी अध्यक्ष सोनिया गांधी, तमिलनाडु की मुख्यमंत्री जयललिता और रिज़र्व बैंक के गवर्नर रघुराम राजन
Python Script
from indictrans import Transliterator
trn = Transliterator(source='hin', target='eng', build_lookup=True)
hin_text = "कांग्रेस पार्टी अध्यक्ष सोनिया गांधी, तमिलनाडु की मुख्यमंत्री जयललिता और रिज़र्व बैंक के गवर्नर रघुराम राजन"
result = trn.transform(hin_text)
print(result)
Output
congress party adhyaksh sonia gandhi, tamilnadu kii mukhyamantri jayalalita aur reserve bank ke governor raghuram rajan
Input (Tamil):
நான் ஒரு மாணவன். இந்தியா ஒரு பெரிய நாடு.
Python Script
from indictrans import Transliterator
trn = Transliterator(source='tam', target='eng', build_lookup=True)
tam_text = "நான் ஒரு மாணவன். இந்தியா ஒரு பெரிய நாடு."
result = trn.transform(tam_text)
print(result)
Output
naan oru maanavan. india oru periya naadu.